Randomized Control Trial


Each day I post a new Online MCAT CARS Passage. This is for anyone who wants to practice for the Critical Analysis and Reasoning Section.

Every article is selected to meet the AAMC MCAT criteria for MCAT CARS.

Subscribe by email to receive a new free practice passage each morning.

May 31, 2017 – Online MCAT CARS Practice

Question: What is your summary of the author’s main ideas? Post your own answer in the comments before reading those made by others.

Many organizations, from government agencies to philanthropic institutions and aid organizations, now require that programs and policies be “evidence-based.” It makes sense to demand that policies be based on evidence and that such evidence be as good as possible, within reasonable time and budgetary limits. But the way this approach is being implemented may be doing a lot of harm, impairing our ability to learn and improve on what we do.

The current so-called “gold standard” of what constitutes good evidence is the randomized control trial, or RCT, an idea that started in medicine two centuries ago, moved to agriculture, and became the rage in economics during the past two decades. Its popularity is based on the fact that it addresses key problems in statistical inference.

For example, rich people wear fancy clothes. Would distributing fancy clothes to poor people make them rich? This is a case where correlation (between clothes and wealth) does not imply causation.

Harvard graduates get great jobs. Is Harvard good at teaching – or just at selecting smart people who would have done well in life anyway? This is the problem of selection bias.

RCTs address these problems by randomly assigning those participating in the trial to receive either a “treatment” or a “placebo” (thereby creating a “control” group). By observing how the two groups differ after the intervention, the effectiveness of the treatment can be assessed. RCTs have been conducted on drugs, micro-loans, training programs, educational tools, and myriad other interventions.

Suppose you are considering the introduction of tablets as a way to improve classroom learning. An RCT would require that you choose some 300 schools to participate, 150 of which would be randomly assigned to the control group that receives no tablets. Prior to distributing the tablets, you would perform a so-called baseline survey to assess how much children are learning in school. Then you give the tablets to the 150 “treatment” schools and wait. After a period of time, you would carry out another survey to find out whether there is now a difference in learning between the schools that received tablets and those that did not.
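To make the mechanics of such a trial concrete, here is a minimal Python sketch of the tablet example. The test-score scale, the noise levels, and the `true_effect` parameter are invented for illustration; none of them come from the passage or from any real study. The sketch randomly assigns half of 300 schools to receive tablets, takes a baseline and a follow-up measurement, and compares the average learning gain in the two groups.

```python
import random
import statistics


def run_rct(n_schools=300, true_effect=0.0, seed=0):
    """Simulate the tablet trial; all numbers are hypothetical."""
    rng = random.Random(seed)

    # Random assignment: shuffle the schools, give tablets to the first half.
    schools = list(range(n_schools))
    rng.shuffle(schools)
    treatment = set(schools[: n_schools // 2])

    # Baseline survey: an assumed score per school (mean 50, sd 10).
    baseline = {s: rng.gauss(50, 10) for s in range(n_schools)}

    # Follow-up survey: baseline plus noise, plus the effect if treated.
    followup = {
        s: baseline[s] + rng.gauss(0, 5) + (true_effect if s in treatment else 0.0)
        for s in range(n_schools)
    }

    gains_treated = [followup[s] - baseline[s] for s in treatment]
    gains_control = [
        followup[s] - baseline[s] for s in range(n_schools) if s not in treatment
    ]

    # Estimated effect: difference in average gain between the two groups.
    estimate = statistics.mean(gains_treated) - statistics.mean(gains_control)
    return len(gains_treated), len(gains_control), estimate
```

Calling `run_rct(true_effect=5.0)` returns the two group sizes (150 each) and an estimate that should land close to the assumed effect, because random assignment balances everything else across the two groups on average.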

Suppose there are no significant differences, as has been the case with four RCTs that found that distributing books also had no effect. It would be wrong to assume that you learned that tablets (or books) do not improve learning. What you have shown is that that particular tablet, with that particular software, used in that particular pedagogical strategy, and teaching those particular concepts did not make a difference.

But the real question we wanted to answer was how tablets should be used to maximize learning. Here the design space is truly huge, and RCTs do not permit testing of more than two or three designs at a time – and test them at a snail’s pace. Can we do better?

Consider the following thought experiment: We include some mechanism in the tablet to inform the teacher in real time about how well his or her pupils are absorbing the material being taught. We free all teachers to experiment with different software, different strategies, and different ways of using the new tool. The rapid feedback loop will make teachers adjust their strategies to maximize performance.

Over time, we will observe some teachers who have stumbled onto highly effective strategies. We then share what they have done with other teachers.

Notice how radically different this method is. Instead of testing the validity of one design by having 150 out of 300 schools implement the identical program, this method is “crawling” the design space by having each teacher search for results. Instead of having a baseline survey and then a final survey, it is constantly providing feedback about performance. Instead of having an econometrician do the learning in a centralized manner and inform everybody about the results of the experiment, it is the teachers who are doing the learning in a decentralized manner and informing the center of what they found.

Clearly, teachers will be confusing correlation with causation when adjusting their strategies; but these errors will be revealed soon enough as their wrong assumptions do not yield better results. Likewise, selection bias may occur (some places may be doing better than others because they differ in other ways); but if different contexts require different strategies, the system will find them sooner or later. This strategy resembles more the social implementation of a machine-learning algorithm than a clinical trial.
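The decentralized search described above can be sketched roughly in Python. This is only an illustration under strong assumptions: each teacher's strategy is reduced to eight yes/no design choices, the hidden payoff function `score` is made up, and feedback is assumed to be immediate and noise-free. Each teacher repeatedly tries one small change and keeps it only if the feedback improves; at the end the best strategy found is the one that would be shared.

```python
import random

N_CHOICES = 8  # hypothetical number of yes/no design choices per strategy


def make_score(rng):
    """Build an invented payoff with interactions between choices."""
    weights = [rng.uniform(-1, 1) for _ in range(N_CHOICES)]
    pairs = {
        (i, j): rng.uniform(-1, 1)
        for i in range(N_CHOICES)
        for j in range(i + 1, N_CHOICES)
    }

    def score(strategy):
        total = sum(w for w, on in zip(weights, strategy) if on)
        total += sum(v for (i, j), v in pairs.items() if strategy[i] and strategy[j])
        return total

    return score


def crawl(n_teachers=30, rounds=50, seed=1):
    """Let every teacher search locally, guided only by feedback."""
    rng = random.Random(seed)
    score = make_score(rng)

    # Every teacher starts from a random strategy.
    teachers = [
        tuple(rng.random() < 0.5 for _ in range(N_CHOICES))
        for _ in range(n_teachers)
    ]
    start_best = max(score(t) for t in teachers)

    for _ in range(rounds):
        for k, current in enumerate(teachers):
            # Try flipping one design choice; keep it only if results improve.
            i = rng.randrange(N_CHOICES)
            trial = current[:i] + (not current[i],) + current[i + 1:]
            if score(trial) > score(current):
                teachers[k] = trial

    # The center collects the best strategy found so it can be shared.
    best = max(teachers, key=score)
    return start_best, score(best), best
```

Because a teacher only keeps a change that scores higher, no teacher's result ever gets worse, so the best score at the end is at least as good as the best score at the start. With noisy feedback, which is closer to a real classroom, that guarantee would not hold and the search would need repeated measurements.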

In economics, RCTs have been all the rage, especially in the field of international development, despite critiques by the Nobel laureate Angus Deaton, Lant Pritchett, and Dani Rodrik, who have attacked the inflated claims of RCT proponents. One serious shortcoming is external validity. Lessons travel poorly: If an RCT finds out that giving micronutrients to children in Guatemala improves their learning, should you give micronutrients to Norwegian children?

My main problem with RCTs is that they make us think about interventions, policies, and organizations in the wrong way. As opposed to the two or three designs that get tested slowly by RCTs (like putting tablets or flipcharts in schools), most social interventions have millions of design possibilities and outcomes depend on complex combinations between them. This leads to what the complexity scientist Stuart Kauffman calls a “rugged fitness landscape.”
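A quick back-of-the-envelope calculation shows how fast the design space grows. The dimensions and option counts below are hypothetical, chosen only to illustrate the arithmetic; the figure of three designs per trial follows the passage's "two or three designs."

```python
import math

# Hypothetical design dimensions for a classroom-tablet program,
# each with an invented number of options.
options = {
    "software": 20,
    "pedagogy": 10,
    "subject unit": 12,
    "minutes per day": 6,
    "grouping": 4,
    "teacher training": 5,
    "feedback timing": 4,
}

# Every combination of options is a distinct design.
total_designs = math.prod(options.values())

# An RCT that compares three designs at a time would need this many trials
# to cover every design once.
designs_per_trial = 3
trials_needed = math.ceil(total_designs / designs_per_trial)

print(total_designs, trials_needed)
```

Under these made-up numbers, seven modest design dimensions already yield more than a million combinations, which is the sense in which testing two or three designs at a time barely scratches the surface.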

Getting the right combination of parameters is critical. This requires that organizations implement evolutionary strategies that are based on trying things out and learning quickly about performance through rapid feedback loops, as suggested by Matt Andrews, Lant Pritchett and Michael Woolcock at Harvard’s Center for International Development.

RCTs may be appropriate for clinical drug trials. But for a remarkably broad array of policy areas, the RCT movement has had an impact equivalent to putting auditors in charge of the R&D department. That is the wrong way to design things that work. Only by creating organizations that learn how to learn, as so-called lean manufacturing has done for industry, can we accelerate progress.

Adapted from Project Syndicate.


Leave a comment below with what you understood to be the author’s main ideas. Ask about this daily passage in office hours/workshops for help.



This was an article on Ethics.

Have a great day.
Jack Westin
MCAT CARS Instructor.


  1. RCTs are too narrow outside of clinical trials, crawling more holistic


  2. author is against the usage of RCT in other fields besides clinical trials.
    different method is required for different fields


  3. RCTs are widely used, but the author thinks they are not that good b/c they ignore individual differences (can be good for drugs tho)


  4. RCT – randomized control of experiment (evidence-based) has shortcomings such as external validity, etc. The procedure of such defeats the purpose of standardizing and controlling the experiments. Author casts doubts here.


  5. RCTs are limited in their adaptability to the nuances of more complex problems.


  6. Evidence-based research such as RCT has a serious limitation: external validity


  7. RCT=current GS Ex) fancy clothes Ex) Tablet

    Where I stopped reading… looks like the author was using the hypothetical tablet example to consider alternatives to an imperfect current research methodology


  8. RCT is slow for policy. Author describes what RCT is then proposes a better suited method for testing using a hypothetical tablet model.


  9. Author believes that RCTs are slow and ineffective for optimizing outcomes in basically all but drug trials, and believes that a ‘random crawl’ of strategies by many individuals, with adjustments based on feedback, will yield better outcomes.


  10. MIP: RCTs = gold standards b/c use statistics; implementation of RCTs = harming policy work + need feedback loops/constant learning


  11. RCT, not good for social and economic issues (slow and limited), should be replaced by cont. learning techniques such as feedback learning (ex: classroom + tablet).


  12. RCT = previous gold standard, new –> critique = limited
    crawling (thought experiment) = help (freedom to teacher)


  13. MI: RCT is not a universal standard for testing the efficacy of practices before implementation. In schools and other social situations (e.g. Policy plans for nonprofits) a more flexible form of testing should occur.


  14. RCT not effective for policy work because there are so many designs to try out
    We must use random crawl to find the best solution


  15. Theme: RCT should not be used indiscriminately despite its recent popularity; it has its limitations and the author wants to inform his audience. Its implementation is suitable for only a few settings. Tone: informative, persuasive and discerning (limitations of RCT) but definitely not dismissive (acknowledges its strengths and use too)
    Our need for programmes and policies to be “evidence-based” might impair our ability to learn and improve on ourselves
    RCT helps to address key statistical problems as it allows us to distinguish between correlation and causation with its robust statistical tests and analyses, examples: fancy clothes and wealth, Harvard being good at teaching or do they just accept brilliant students
    RCT works on the premise of testing hypotheses (assumptions) and ascertains them if supported by statistical evidence (significance) but may err if the conditions are not set up properly (test is not robust, ie confounders present) and whether we are ultimately answering the right question (is the hypothesis formulated properly and if so, are the right conditions provided to test it out). Example: tablet used to aid learning in schools.
    RCT limited by its design (always treatment vs placebo aka control). Need to revolutionise the way we conduct our study. RCT works with setting initial baseline and making observations based on this baseline with final outcome. Alternative is to constantly provide feedback and adjust baselines. This alternative may predispose us to the fundamental errors (examples: confusing correlation with causation, selection bias) but these problems are exposed in time and are not hidden/masked (possible in RCT outcomes) so it’s a self-correcting algorithm. (justifications for social intervention). Shortcoming is that results from RCT cannot be translated to other phenomena (no external validity).
    Author emphasizes that obtaining the right combination of parameters is critical as seen in social interventions unlike RCT which assumes that it has all the correct parameters in place at the start of the study. Social intervention seeks out the right ones and weeds out the flawed ones in the process. (no specific examples given here)
    RCT appropriate for certain situations like clinical drug trials which test things that we assume to work (drug is efficacious) vs designing things that work (learning if they work), you resort to social intervention (5S, Kanban, poka yoke in manufacturing).


  16. In this passage, the author argues for a different approach involving organizations and the research being done on them. The author mentions that random control trials have been used for many different purposes; however, they are only useful in certain fields, such as drug trials.
    The author’s problem with RCTs is that they provide no feedback, they conduct research at a slow pace, and can only experiment with a few parameters.
    The author proposes a new method where constant change is being implemented when certain strategies do not work. The author thinks that with this constant feedback and change in methods, organizations will find a more efficient way for finding better results.


  17. MIP: RCT=bad for policy making. Machine learning-like method=good for complex issues.


  18. MI: RCT= bad=too lim/slow/ext valid prob=4 clin drug only
    tone: neut


  19. MIP: RCT is not effective method; Better= focus on proper factor combinations through immediate feedback loops


  20. RCTs = widely used =/= effective for policy works.

