Randomized trials are rapidly becoming standard practice in evaluation research and applied social science. Hundreds, if not thousands, of such trials have been conducted, and many more are on the way. Most large trials involve multiple sites in order to provide the sample sizes that are required and to broaden the generalizability of their findings. Typically, however, these trials have focused only on average program impacts and on impacts for common socio-demographic subgroups. That is about to change, as researchers, policymakers, and practitioners are beginning to see the value of learning about and from variation in program impacts across individuals, across theoretically and policy-relevant subgroups of individuals, and across program sites.
To help promote this path-breaking research agenda, the William T. Grant Foundation, the Spencer Foundation, and the U.S. Institute of Education Sciences are supporting a major project that brings together prominent university-based methodologists and the three research firms (MDRC, Mathematica Policy Research, and Abt Associates, Inc.) that have conducted the most large multi-site trials in education, youth development, and related fields. The project grows out of a 2013 conference on studying impact variation sponsored by the William T. Grant Foundation and reflects key themes of a 2014 federal conference on unpacking the “black box” of program impacts, sponsored by the Office of Planning, Research, and Evaluation of the U.S. Department of Health and Human Services.
Raudenbush and Bloom (2015) outline key features of this ambitious project, describe its statistical foundation, and identify its anticipated benefits. Weiss, Bloom, and Brock (2014) provide a simple conceptual framework for organizing much of the project’s work. The project is organized in two parts: 1) developing and applying methods for learning about impact variation and 2) developing and applying methods for learning from impact variation.
Learning About Impact Variation
Although many observers believe that the effects of educational and social programs must vary substantially, there is little or no rigorous evidence to support or refute this belief. Hence, there is a great deal to be learned about the existence and magnitude of impact variation. Fortunately, much of this learning can be achieved from existing and future multi-site trials.
A simple first step in this process is for researchers to estimate program impacts on the variance of participants’ outcomes (e.g., Bloom & Weiland, 2015; Bloom, 2003; Bryk & Raudenbush, 1988). A program’s impact on the variance of participants’ outcomes indicates the extent to which the program tends to equalize outcomes or widen the outcome gap. For example, effective compensatory programs, which are intended to serve persons who are “most at risk” or “most in need,” will tend to equalize outcomes by bringing up the lower part of the outcome distribution. In contrast, effective “gifted and talented” programs, which are targeted toward the top of the outcome distribution, will tend to widen the gap between the top and bottom. The present project will use existing data from a number of major randomized trials to explore this issue for programs that run the gamut from early childhood education to post-secondary education.
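As a rough illustration of the idea (not the project’s actual analyses), an impact on the outcome variance can be summarized by comparing outcome dispersion across experimental arms. The sketch below uses simulated data for a hypothetical compensatory program that lifts the lower tail of the distribution; all parameter values are made up:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated outcomes: the hypothetical program raises the lower tail,
# so treatment-group outcomes are less dispersed than control outcomes.
control = rng.normal(50.0, 10.0, size=2000)
treatment = rng.normal(54.0, 7.0, size=2000)

var_c = control.var(ddof=1)
var_t = treatment.var(ddof=1)

# The program's "impact on the variance" can be reported as the
# difference in outcome variances (or their log ratio) across arms.
variance_impact = var_t - var_c
log_variance_ratio = np.log(var_t / var_c)

print(f"control variance:   {var_c:.1f}")
print(f"treatment variance: {var_t:.1f}")
print(f"variance impact:    {variance_impact:.1f}")
print(f"log variance ratio: {log_variance_ratio:.2f}")
```

A negative variance impact indicates an equalizing program; a “gifted and talented” program that widens the gap would show a positive one.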
A second and much larger step is for researchers to begin studying how program impacts vary across sites, such as schools, early childhood education centers, or after-school programs. Doing so can provide important information that is masked by the past narrow focus of evaluation studies on average impacts. For example, a broader focus on cross-site distributions of program impacts can: 1) provide policymakers and program managers with a more nuanced understanding of the range of effectiveness of the programs for which they are responsible, 2) help researchers identify especially high-performing sites, which can both illustrate a program’s potential and provide valuable operational lessons, 3) assess whether the program is harmful in some sites, and 4) help researchers assess a program’s equity implications by examining the extent to which sites with the largest impacts have the weakest or strongest outcomes absent the program. The present project team has been active in the development and application of methods for this type of analysis (see Bloom, Raudenbush, Weiss, & Porter, under review; Bloom & Weiland, 2015; Raudenbush, Reardon, & Nomi, 2012).
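One simple way to see how such an analysis works is a method-of-moments sketch: estimate each site’s impact, then subtract the average sampling variance of those estimates from their observed variance to recover the cross-site variance of true impacts. The data and parameter values below are simulated for illustration only; the project’s actual analyses use more sophisticated multilevel models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical multi-site trial: each site has its own true impact,
# drawn from a cross-site impact distribution with mean 5 and SD 4.
n_sites, n_per_arm = 40, 100
true_impacts = rng.normal(5.0, 4.0, size=n_sites)

est_impacts, sampling_vars = [], []
for b in true_impacts:
    y_c = rng.normal(50.0, 10.0, size=n_per_arm)
    y_t = rng.normal(50.0 + b, 10.0, size=n_per_arm)
    est_impacts.append(y_t.mean() - y_c.mean())
    # Sampling variance of this site's impact estimate
    sampling_vars.append(y_t.var(ddof=1) / n_per_arm + y_c.var(ddof=1) / n_per_arm)

est_impacts = np.array(est_impacts)

# Observed variance of site-level estimates overstates true impact
# variation because it includes sampling noise; subtract it out.
tau2_hat = est_impacts.var(ddof=1) - np.mean(sampling_vars)

print(f"grand mean impact:       {est_impacts.mean():.2f}")
print(f"estimated cross-site SD: {np.sqrt(max(tau2_hat, 0.0)):.2f}")
```

The estimated cross-site standard deviation is exactly the quantity a policymaker would want alongside the average impact: it says how much program effectiveness ranges from site to site.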
Learning from Impact Variation
Having documented the nature and magnitude of variation in program impacts, researchers can use this variation to test theories about what social scientists call moderation and mediation of impacts.
Moderation of impacts:
Program impacts can vary because, for example, some types of persons are more likely than others to participate, some types of participants benefit more than others from participating, staff at some sites are more skilled than staff elsewhere, services outside of a program are more widely available and/or more effective at some sites than at others, or because some organizations have greater capacity to enact a new program. These features of program participants or program sites are moderators of program impacts if they influence the program’s effectiveness but cannot be influenced by the program.
The simplest form of moderator analysis is a comparison of impact estimates for subgroups of sample members or program sites. Although subgroup analyses have long been a staple of evaluation studies, they are frequently based on ad hoc or post hoc subgroup definitions and interpreted without assessing the statistical significance of subgroup impact differences, practices that can be more harmful than helpful. Thus future subgroup analyses should be held to a higher standard (Bloom & Michalopoulos, 2013), with a focus on testing well-conceived theories about who is likely to benefit most.
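The higher standard amounts to one extra step that is easy to sketch: rather than only testing whether each subgroup impact differs from zero, test whether the subgroup impacts differ from each other. A minimal illustration with simulated data (subgroup labels, effect sizes, and sample sizes are all invented for the sketch):

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical trial with a pre-specified subgroup contrast: theory
# predicts larger impacts for subgroup A than for subgroup B.
n = 500
y = {
    ("A", 0): rng.normal(50.0, 10.0, n), ("A", 1): rng.normal(56.0, 10.0, n),
    ("B", 0): rng.normal(50.0, 10.0, n), ("B", 1): rng.normal(52.0, 10.0, n),
}

def impact_and_se(g):
    t, c = y[(g, 1)], y[(g, 0)]
    return t.mean() - c.mean(), math.sqrt(t.var(ddof=1) / n + c.var(ddof=1) / n)

imp_a, se_a = impact_and_se("A")
imp_b, se_b = impact_and_se("B")

# The step subgroup analyses often skip: test whether the two subgroup
# impacts actually DIFFER, not just whether each is nonzero.
diff = imp_a - imp_b
se_diff = math.sqrt(se_a**2 + se_b**2)
z = diff / se_diff
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value

print(f"impact A: {imp_a:.2f}, impact B: {imp_b:.2f}")
print(f"difference: {diff:.2f} (z = {z:.2f}, p = {p_two_sided:.4f})")
```

Because the standard error of the difference combines both subgroups’ uncertainty, two subgroup impacts can each look “significant” while their difference is not, which is precisely the misreading this test guards against.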
In addition, most past studies have focused only on subgroups defined in terms of observable pre-randomization characteristics. However, researchers, including members of the present project team, are developing rigorous methods for estimating program impacts for “latent” subgroups that are defined in terms of post-randomization outcomes that cannot be observed for all sample members. Examples of such latent subgroups are: 1) persons who are at high risk of dropping out of school if not assigned to the program, 2) persons who would receive a large “dose” of program services if assigned to the program, or 3) persons who would be exposed to “world of work” activities if assigned to the program and not exposed to these activities if not assigned to the program. The first two latent subgroups are based on a single potential outcome and can be predicted using regression-based methods (e.g., Peck, 2003). The third latent subgroup is based on a pair of potential outcomes and can be predicted using “principal stratification” analysis (e.g., Feller, Grindal, Miratrix, & Page, under review; Page, Feller, Grindal, Miratrix, & Somers, under review). The further development and application of these methods will be an integral part of the present project.
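The regression-based idea can be sketched for the “dose” example. The dose of services is observed only for treatment-group members, so the model is fit in the treatment group using baseline covariates and then used to predict dose for everyone; because the prediction uses only pre-randomization information, comparing arms within a predicted subgroup remains an experimental contrast. This is a deliberately simplified, simulated illustration in the spirit of Peck (2003), not the project’s actual method:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated trial: "dose" is observed only when assigned to treatment,
# so the high-dose subgroup is latent in the control group.
n = 2000
x = rng.normal(0.0, 1.0, n)             # pre-randomization covariate
z = rng.integers(0, 2, n)               # random assignment (0/1)
dose = np.where(z == 1, 5.0 + 3.0 * x + rng.normal(0.0, 1.0, n), np.nan)
y = 50.0 + 2.0 * x + z * (2.0 + 1.5 * x) + rng.normal(0.0, 5.0, n)

# Step 1: model dose from baseline covariates in the TREATMENT group.
slope, intercept = np.polyfit(x[z == 1], dose[z == 1], 1)

# Step 2: predict dose for ALL sample members (valid because x is
# measured before randomization).
pred_dose = intercept + slope * x
high = pred_dose > np.median(pred_dose)

# Step 3: experimental impact estimates within each predicted subgroup.
impact_high = y[high & (z == 1)].mean() - y[high & (z == 0)].mean()
impact_low = y[~high & (z == 1)].mean() - y[~high & (z == 0)].mean()
print(f"impact, predicted high-dose subgroup: {impact_high:.2f}")
print(f"impact, predicted low-dose subgroup:  {impact_low:.2f}")
```

In this simulation the true impact is larger for high-dose members by construction, and the predicted-subgroup comparison recovers that pattern without ever observing control-group doses.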
Mediation of impacts:
Mediators are changes in participants’ exposure to services, activities and staff practices or short-term changes in participants’ knowledge, skills, attitudes, or behavior that are caused by program assignment, which, in turn, promote participants’ long-term success. Hence, mediators are the mechanisms by which programs produce their impacts. Consequently, understanding the role of mediators lies at the heart of understanding program theories. However, because mediator levels are seldom, if ever, randomized, it is not easy to identify their causal effects.
One approach to doing so that is being explored for the present project is to use information about the cross-site relationship between program impacts on mediators and program impacts on participant outcomes. In theory, program sites that generate larger impacts on mediators should generate larger impacts on outcomes. Thus cross-site variation in impacts on mediators should explain cross-site variation in impacts on outcomes, and this relationship can be used to estimate the effect of the mediator on the outcome. The resulting approach is an application of instrumental variables analysis (Reardon, Unlu, Zhu, & Bloom, 2014; Reardon & Raudenbush, 2013; Raudenbush et al., 2012). The distinguishing assumption of the approach (called the “exclusion restriction”) is that the entire program impact being studied is transmitted through the mediator or mediators under study.
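The core of the approach can be sketched with simulated data: compute each site’s impact on the mediator and on the outcome, then regress the outcome impacts on the mediator impacts. Under the exclusion restriction (built into the simulation below, where the program moves the outcome only through the mediator), the slope estimates the mediator’s effect. All parameter values are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

n_sites, n_per_arm = 50, 200
beta = 0.8  # true effect of the mediator on the outcome

impacts_m, impacts_y = [], []
for _ in range(n_sites):
    delta_m = rng.normal(4.0, 1.5)           # site's true impact on the mediator
    m_c = rng.normal(10.0, 3.0, n_per_arm)
    m_t = rng.normal(10.0 + delta_m, 3.0, n_per_arm)
    # Exclusion restriction: the outcome impact runs entirely through m.
    y_c = 20.0 + beta * m_c + rng.normal(0.0, 5.0, n_per_arm)
    y_t = 20.0 + beta * m_t + rng.normal(0.0, 5.0, n_per_arm)
    impacts_m.append(m_t.mean() - m_c.mean())
    impacts_y.append(y_t.mean() - y_c.mean())

# Cross-site IV: the slope of outcome impacts on mediator impacts
# estimates the mediator's effect on the outcome.
slope, intercept = np.polyfit(impacts_m, impacts_y, 1)
print(f"estimated mediator effect: {slope:.2f}  (true value: {beta})")
```

Intuitively, sites serve as a collection of natural experiments: each site’s randomization shifts the mediator by a different amount, and tracing how outcome impacts move with those shifts isolates the mediator’s contribution.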
The present project is also exploring a very different approach to mediational analysis. The approach, which was developed by team member Guanglei Hong, uses weighting as its core estimation method and is based on the assumption of “conditional sequential randomization” (Hong, 2015; Hong, Deutsch & Hill, in press). This assumption implies that not only is program assignment randomized to sample members, but also, conditional on a rich set of highly predictive baseline covariates, it is “as if” mediator values were randomized too. A distinguishing feature of this approach is that it allows for the possibility that the effect of a mediator on treatment-group outcomes can differ from that for control-group outcomes. In this way, a mediator of program impacts can also be a moderator of program impacts.
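A deliberately simplified sketch of the weighting idea: with a binary mediator and no covariates (so that sequential randomization holds by construction in the simulation), treated units are reweighted by the ratio of control-arm to treatment-arm mediator probabilities, which estimates the average outcome treated members would have had if their mediator had followed the control distribution. This is only an illustration of the ratio-of-mediator-probability weighting logic, not the full covariate-conditional method of Hong and colleagues; all numbers are simulated:

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated trial: the program raises mediator take-up from 0.3 to 0.7,
# and both assignment and the mediator affect the outcome.
n = 5000
z = np.repeat([0, 1], n)                   # random assignment
p_m = np.where(z == 1, 0.7, 0.3)
m = rng.binomial(1, p_m)                   # mediator, random given z
y = 10.0 + 2.0 * z + 5.0 * m + rng.normal(0.0, 3.0, 2 * n)

# Estimated mediator probabilities in each arm.
p1 = m[z == 1].mean()
p0 = m[z == 0].mean()

# Weights for treated units: make their mediator distribution look like
# the control group's, estimating E[Y(1, M(0))].
w = np.where(m[z == 1] == 1, p0 / p1, (1 - p0) / (1 - p1))
y1_m0 = np.average(y[z == 1], weights=w)

direct_effect = y1_m0 - y[z == 0].mean()       # impact holding M at control level
indirect_effect = y[z == 1].mean() - y1_m0     # impact transmitted through M
print(f"direct effect:   {direct_effect:.2f}  (true 2.0)")
print(f"indirect effect: {indirect_effect:.2f}  (true 2.0)")
```

In the full method the mediator probabilities are estimated conditional on rich baseline covariates, and separate weights for treatment and control arms allow the mediator’s effect to differ between them, which is how a mediator can also act as a moderator.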
Over the next three years, the project will produce a series of working papers and journal articles, present a series of workshops and conference talks, and produce software that will enable other researchers to apply the methods being developed. In addition, the project will use these methods to learn as much as possible about and from impact variation, using data from existing large multi-site randomized trials in education and related fields.
References
Bloom, H., C. Hill, & J. Riccio (2003) “Linking Program Implementation and Effectiveness: Lessons from a Pooled Sample of Welfare-to-Work Experiments,” Journal of Policy Analysis and Management, 22(4): 551-575.
Bloom, H. & C. Michalopoulos (2013) “When is the Story in the Subgroups? Strategies for Interpreting and Reporting Intervention Effects on Subgroups,” Prevention Science, 14(2): 179-188.
Bloom, H., S. Raudenbush, M. Weiss, & K. Porter (under review) Using Multi-site Evaluations to Study Variation in Effects of Program Assignment, New York: MDRC.
Bloom, H. & C. Weiland (under review) Quantifying Variation in Head Start Effects on Young Children’s Cognitive and Socio-Emotional Skills Using Data from the National Head Start Impact Study, New York: MDRC.
Bloom, H. (2003) “Using ‘Short’ Interrupted Time-Series Analysis to Measure the Impacts of Whole-School Reforms: With Applications to a Study of Accelerated Schools,” Evaluation Review, 27(1): 3-49.
Bryk, A. & S. Raudenbush (1988) “Heterogeneity of Variance in Experimental Studies: A Challenge to Conventional Interpretations,” Psychological Bulletin, 104(3): 396-404.
Feller, A., T. Grindal, L. Miratrix, & L. Page (under review) Compared to What? Variation in the Impacts of Early Childhood Education by Alternative Care-Type Settings, Cambridge, MA: Harvard University.
Hong, G. (2015) Causal Inference in a Social World: Moderation, Mediation, and Spillover. New York: Wiley-Blackwell.
Hong, G., J. Deutsch, & H. Hill (in press) “Ratio-of-Mediator-Probability Weighting for Causal Mediation Analysis in the Presence of Treatment-by-Mediator Interaction,” Journal of Educational and Behavioral Statistics.
Page, L., A. Feller, T. Grindal, L. Miratrix, & M. Somers (under review) A Tool for Understanding Variation in Program Effects across Endogenous Subgroups, Pittsburgh: University of Pittsburgh.
Peck, L. (2003) “Subgroup Analysis in Social Experiments: Measuring Program Impacts Based on Post-Treatment Choice,” American Journal of Evaluation, 24(2): 157-187.
Raudenbush, S. & H. Bloom (under review) Learning About and From Variation in Program Impacts Using Multi-Site Trials, New York: MDRC.
Raudenbush, S., S. Reardon, & T. Nomi (2012) “Statistical Analysis for Multi-site Trials Using Instrumental Variables,” Journal of Research on Educational Effectiveness, 5(3): 303-332.
Reardon, S. & S. Raudenbush (2013) “Under What Assumptions Do Multi-site Instrumental Variables Identify Average Causal Effects?” Sociological Methods and Research, 42(2): 143-163.
Reardon, S., F. Unlu, P. Zhu, & H. Bloom (2014) “Bias and Bias Correction in Multisite Instrumental Variables Analysis of Heterogeneous Mediator Effects,” Journal of Educational and Behavioral Statistics, 1076998613512525.
Weiss, M., H. Bloom, & T. Brock (2014) “A Conceptual Framework for Studying the Sources of Variation in Program Effects,” Journal of Policy Analysis and Management, 33(3): 778-808.