Population Data Science

MPC members are building a framework to provide comprehensive longitudinal data for the United States spanning two centuries of change. Our Big Microdata project is now producing complete microdata for all U.S. censuses from 1790 through 1940, comprising approximately 977 million records.

The full-count census data provide a unique laboratory for studying demographic processes and for testing social and economic models. The large scale of the microdata allows us to study particular communities and small, dispersed populations, and also enables large-scale studies that span many places and periods. With complete microdata, individuals and families can be traced from one census to the next using automated record linkage, revealing both individual change over the life course and family change across multiple generations. We are leveraging vital records and genealogies compiled by Ancestry.com to link individuals and households backward from 1940 to 1850.

Simultaneously, we are linking big microdata forward in time to statistical and administrative records dating from 1960 to 2015 and beyond. In the Census Longitudinal Infrastructure Project (CLIP), we are collaborating with the Census Bureau's Center for Administrative Records Research and Applications to develop Protected Identification Keys (PIKs) for the 1940 census. The PIKs will allow us to link the 1940 census microdata to recent censuses and to administrative records (e.g., Social Security, Medicare, death records) and surveys (e.g., the National Health Interview Survey and the Current Population Survey). Big Microdata and CLIP will reconstruct life histories on a massive scale over multiple centuries. These life histories will help answer fundamental questions about the causes and consequences of population change, including fertility, mortality, family composition, life-course transitions, and economic and geographic mobility. We will have unprecedented opportunities to assess the impact of early-life socioeconomic and neighborhood conditions on later health and demographic behavior. Perhaps most important, the new class of data will enable analyses of long-run population dynamics across multiple generations.

In the TerraPop project, another major interdisciplinary effort within population data science, MPC members are developing new technology to make population and health data easily interoperable with raster data derived from satellite imagery and remote sensing. The TerraPop infrastructure allows researchers to convert data between microdata, areal data, and raster data formats, and to merge data from any of the three sources and export it to any of the formats. For example, TerraPop will allow users of Demographic and Health Survey data to attach characteristics of the local environment to DHS records, including information on land cover, agricultural productivity, and climate.
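As a rough illustration of what attaching raster-derived characteristics to survey records involves, here is a minimal Python sketch. It does not use or represent the TerraPop interface. The `Raster` class, the `attach_raster_value` helper, the `lon`/`lat` field names, and the grid values are all hypothetical. The sketch assumes a regular grid in degrees, with rows ordered north to south, and looks up the cell that contains each record's location.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    """Hypothetical regular grid; rows run north to south, columns west to east."""
    values: list        # list of rows, each a list of cell values
    west: float         # longitude of the grid's left edge
    north: float        # latitude of the grid's top edge
    cell_size: float    # cell width and height, in degrees

    def value_at(self, lon, lat):
        """Return the value of the cell containing (lon, lat), or None if outside."""
        col = int((lon - self.west) // self.cell_size)
        row = int((self.north - lat) // self.cell_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[row]):
            return self.values[row][col]
        return None


def attach_raster_value(records, raster, name):
    """Return copies of the records with the raster value added under `name`."""
    return [{**r, name: raster.value_at(r["lon"], r["lat"])} for r in records]


# Toy 2x2 grid and survey locations, invented purely for illustration.
rainfall = Raster(values=[[10, 20], [30, 40]], west=0.0, north=2.0, cell_size=1.0)
clusters = [
    {"cluster_id": 1, "lon": 0.5, "lat": 1.5},
    {"cluster_id": 2, "lon": 1.5, "lat": 0.5},
    {"cluster_id": 3, "lon": 5.0, "lat": 5.0},  # falls outside the grid
]

enriched = attach_raster_value(clusters, rainfall, "rainfall")
```

Each returned record carries the original fields plus the looked-up value, with `None` marking locations outside the grid. A real workflow would also need to handle map projections, missing-data codes, and summaries over areas rather than single cells, none of which this sketch attempts.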

Finally, MPC is investing heavily in the integration of survey data. The Integrated Health Interview Series (now called IPUMS-Health) will soon include the Medical Expenditure Panel Survey. We are expanding IPUMS-CPS with a trove of newly discovered CPS supplements dating from the 1960s, 1970s, and 1980s. In collaboration with NORC and the Food and Drug Administration, we are integrating data from the Behavioral Risk Factor Surveillance System and other surveys. Created in collaboration with the Oxford Centre for Time Use Research, the IPUMS-Time Use database offers historical and comparative time use data spanning a half century and multiple continents. IPUMS-International has begun to identify, acquire, and preserve labor force and health surveys at risk of destruction, to ensure they remain available for research. In collaboration with USAID and ICF International, we recently launched IPUMS-DHS, an integrated version of the DHS that makes these invaluable data easily interoperable across time and space. We have completed work on 76 DHS surveys from 18 countries; we plan to more than double the scope of the integrated DHS project over the next five years and to integrate the similar U.N. Multiple Indicator Cluster Surveys (MICS) as well.

Conveners: Lara Cleveland and Catherine Fitch