Access to data doesn’t guarantee the production of useful research evidence, nor does it guarantee that evidence will be used. But linking large data sets and facilitating researcher access creates opportunities to answer questions that survey, experimental, or qualitative data alone cannot. Data linkages set the stage for scientific discoveries that can inform smarter policies and programs.1
Both researchers and policymakers have been calling for access to big data: large volumes of digital information about individuals, their activities, and their interactions with different systems, including government. Some have gone so far as to argue that the “big data revolution,” which facilitates analysis of large-scale data sets drawn from administrative records or linked records from multiple sources, will shape the future of evidence-based policymaking.
Thanks to efforts underway at the federal, state, and local levels to integrate a variety of data sets and make them accessible, some researchers are already tracking individuals over long periods of time and across multiple systems and institutions, yielding findings with the potential to challenge assumptions and generate new insights. For example, researchers have accessed, linked, and analyzed state education data and federal labor statistics to learn about the effects of statewide economic downturns on school achievement, and others have used state-level accountability and test score data to better understand shifts in, and correlates of, racial and ethnic achievement gaps.
Others have used big data to better understand how and when to respond to problems. At the William T. Grant Foundation, we are convinced that social science research, including research that taps into the potential of big data, can play an important role in addressing the challenge of inequality. We think that neither the extent of inequality nor its effects on youth outcomes is inevitable; both are amenable to social policy. And we believe that high-quality research can help identify and build understanding of approaches that reduce short- and long-term inequalities.
Access to administrative data provides researchers with a unique opportunity to generate research that informs these efforts. Some have argued that this access will also facilitate faster and lower-cost evaluations of federal, state, and local programs and provide better estimates of program costs. But to maximize the value of data, we need to ensure that the research evidence produced from analyses of big data is both useful and used. We think this requires:
- linking administrative data to other sources of information,
- involving decision makers in the process from the outset, and
- establishing strategies and structures to initiate and sustain such efforts.
These ideas are consistent with the Foundation’s focus on improving the use of research evidence. We are interested in investigations that identify and test ways to connect decision makers with research evidence, including research that makes thoughtful use of administrative data. We are also interested in studies that identify and test the incentive structures that encourage the production of research answering decision makers’ most pressing questions. Ultimately, we hope such efforts will foster a culture of evidence in which what is learned about programs and policies to reduce inequality is put into action.
Linking different data sources
The potential of big data would be stronger if the data were integrated with other sources of information. Many also argue that more researchers need access to these combined data. These are central aims of a new project underway at the National Research Council known as the American Opportunity Study (AOS).2 AOS would create a structure that allows all qualified researchers—instead of only a select group—regular access to public record data in a safe, protected environment.
The data that make up the AOS are extensive and would offer new opportunities to study social mobility. The project aims to develop comprehensive links between survey and evaluation studies and administrative data from the Census Bureau and agencies such as the Social Security Administration. The AOS panel would provide repeated observations of individuals’ income, education, occupation, and other demographic characteristics. Parent–child, or intergenerational, matches are also possible with access to data from the IRS, the American Community Survey, and the decennial census.
The structure of AOS is also being designed to inform program and policy evaluations that contribute knowledge about upward mobility for disadvantaged youth. Figure 2 presents the three tiers of the broad plan. The top layer shows data links across decennial censuses, forming a long-term panel, observed at ten-year intervals, of everyone who completes a census form; the middle layer includes surveys, studies, or evaluations of social programs; and the bottom layer represents linked administrative data sets.
If successfully developed, AOS would provide a resource of unparalleled statistical power and an opportunity for large-scale causal research. Survey or evaluation supplements to AOS might reveal the contours and correlates of social mobility as well as potential policy responses. Researchers might investigate intergenerational questions and examine long-run outcomes of early life circumstances. They could also link state data on educational experiences to national sources of earnings and income data (such as the IRS sample) for both parents and children, and look forward and backward to see how they fare on a range of outcomes. As Grusky and colleagues write, “If an AOS of this sort were assembled, it would open up new fields of social science inquiry; increase opportunities for evidence-based policy on poverty, mobility, child development, and labor markets; and otherwise constitute a new social science resource with much reach and impact.”
AOS might also be leveraged to answer questions about the long-term outcomes of policies. This would build on prior work that used administrative data to evaluate the educational and economic payoffs of federal cash-transfer programs and refundable tax credits, and that examined the unintended consequences of policies designed to support families. With AOS, a research team might expand on these findings and use new sources of administrative data or survey work to examine areas where research is currently lacking, such as the effects of policies on behavior and mental health. For example, a team might use linkages to IRS data and to incarceration statistics from the Bureau of Justice Statistics to learn about the effects of youth incarceration on longer-term outcomes, such as post-release earnings and recidivism. Another team might examine Department of Veterans Affairs data to assess the effects of military service on health and economic outcomes later in life. Alternatively, a team might use linked survey data from the Panel Study of Income Dynamics or the National Longitudinal Study of Adolescent to Adult Health to probe potential mechanisms or competing hypotheses about why the observed relationships exist. This range of findings could encourage further study of implementation and context and, in turn, inform smarter policies and programs.
Ultimately, looking across data sets may build comprehensive evidence about how young people fare in the systems through which they live and grow, and provide new understanding about the intersections of these systems. Andrews, Imberman, and Lovenheim, for example, have linked administrative records about K–12 education, postsecondary education, and earnings to examine the impact of two programs in one state for students from low-income high schools. The programs provided additional financial aid and enhanced supports for students once enrolled at one of two highly regarded public universities in Texas. The team’s study demonstrated the potential of such programs to yield long-term earning benefits, and highlighted how differences in the design of these programs may have long-term implications.
These lessons are compelling, but questions remain about what it takes to move from data to “a new social science resource with much reach and impact.” Unless we use this data to produce research evidence that addresses the needs of decision makers, the promise of the moment may go unrealized. We suspect this requires knowing how to foreground decision makers’ information needs at the front end of the process and employing strategies and structures that increase the likelihood that the research evidence produced is ultimately used.