January 22, 2018 – To help celebrate the 2018 sesquicentennial (150th anniversary) of the University of California system, CSHE researchers have created the UC ClioMetric History Project (UC-CHP), which takes a Big Data approach to exploring the history and role of the UC campuses in the state of California. The project is building a massive data repository documenting the university’s history through its students, faculty, courses, and finances. Zach Bleemer is the UC-CHP Director and lead researcher; CSHE Senior Research Fellow John Aubrey Douglass is the project’s principal investigator.
No other university has such a rich store of available institutional data, providing a unique opportunity to assess a university’s social and economic impact and to analyze its finances and operations over time.
Today, UC-CHP announces its new website, uccliometric.org, which houses much of the data collected by the project thus far. The website also hosts new interactive graphics and analysis of the university’s contributions to the state of California’s growth, health, economic mobility, and gender/ethnic equality in the 20th century and today, with more content to be rolled out throughout the sesquicentennial year.
Under Bleemer’s direction, and with a team of undergraduate research assistants, the project has processed thousands of volumes of historical university records using a newly developed digitization protocol, formatted optical character recognition (fOCR), which transforms scanned structured texts such as directories and catalogs into high-quality computer-readable databases. Many of these databases are now available on the new website, including:
- Annual student enrollment records for most large California universities from the 1890s to 1946, including students’ names, home towns, and majors,
- Annual faculty directories for three UC campuses (Berkeley, Los Angeles, and Davis) and Stanford University from 1900 to the 21st century, including names, departments, and ranks,
- Annual course directories for the same four universities, including more than 850,000 full course descriptions by department and course number (with many linked to their professors), and
- Annual detailed budget allocations for the entire University of California system from 1911 to 2012, including all employee wages through the 1950s and annual department-level allocations for the second half of the 20th century.
Interactive graphics on the website show that many of the best-known social and technological movements of the 20th century are reflected (and sometimes prefigured) in these university records, from the World Wars and the Great Depression to the Space Race, the Women’s Movement, and the rise of Silicon Valley.
UC-CHP is also working with UC registrars’ offices to photograph, process, and analyze historical student transcript records, which were maintained on paper ‘hard cards’ until the late 20th century. These records are processed with fOCR and then integrated with modern digital student records, producing a complete record of student identifying information, demographic characteristics, and course completion and evaluation back to the 1950s or earlier. Because of privacy restrictions, the resulting database is not available to outside researchers, but the UC-CHP website features an interactive graphic visualizing student trends at UC San Francisco (the first campus to complete UC-CHP processing) and will soon showcase findings from several other participating campuses.
A forthcoming publication in the Center’s Research and Occasional Paper Series (ROPS), available in working paper form on the UC-CHP website, provides greater technical detail about the new fOCR protocol developed to produce these new data resources.
Zach Bleemer – Director, UC ClioMetric History Project