Quality and Accountability in Healthcare Delivery: Audit Evidence from Primary Care Providers in India

This paper presents direct evidence on the quality of health care in low-income settings using a unique and original set of audit studies, where standardized patients were presented to a nearly representative sample of rural public and private primary care providers in the Indian state of Madhya Pradesh. Three main findings are reported. First, private providers are mostly unqualified, but they spent more time with patients and completed more items on a checklist of essential history and examination items than public providers, while being no different in their diagnostic and treatment accuracy. Second, the private practices of qualified public sector doctors were identified and the same doctors exerted higher effort and were more likely to provide correct treatment in their private practices. Third, there is a strong positive correlation between provider effort and prices charged in the private sector, whereas there is no correlation between effort and wages in the public sector. The results suggest that market-based accountability in the unregulated private sector may be providing better incentives for provider effort than administrative accountability in the public sector in this setting. While the overall quality of care is low in both sectors, the differences in provider effort may partly explain the dominant market share of fee-charging private providers even in the presence of a system of free public healthcare.

Introduction
Healthcare is a credence good with substantial information asymmetries between patients and providers. This makes it difficult for patients to determine the quality of care they have received, and it is widely believed therefore that unregulated market-based delivery of healthcare is socially undesirable (Arrow, 1963). Further, if optimal care requires the potential denial of services that patients value (such as steroids or antibiotics), market-based healthcare may be over-responsive to demand, leading to socially inefficient provision (Prendergast, 2003). Partly as a result of these considerations, the default policy approach to delivering healthcare for the poor in most low-income countries is through free or nominally-priced medical care in publicly-run facilities staffed by qualified doctors and nurses, who are paid a fixed salary (World Bank, 2003).
However, a majority of households in low-income countries choose to visit fee-charging healthcare providers in the private sector, whose market share exceeds 70 percent in rural India (the focus of our study). This is surprising for two reasons. First, private healthcare providers in India face little de facto regulation and most have no formal medical training (Rohde and Viswanathan, 1995; Banerjee, Deaton and Duflo, 2004; CPR, 2011). Second, while the high use of the private sector could, in part, reflect the absence of public options, this cannot be the only explanation. In our data from rural India, the private sector share of primary care visits (constructed from a household census) is 80 percent even in markets with a qualified public doctor offering free care through public clinics, with more than 60 percent of these visits to private providers with no formal qualifications.
The high market share of unqualified private healthcare providers raises a number of questions about the functioning of healthcare markets in low-income settings. First, why would people choose to pay for care from (mostly) unqualified providers when public clinics are staffed with qualified doctors who offer care at a much lower price? Second, how does the quality of care received vary across public and private healthcare providers? Third, what does an unregulated healthcare market reward and how does this compare with the regulated public sector? Specifically, to what extent are prices in the market and wages in the public sector correlated with quality of care? Answers to these questions have been limited by the lack of evidence on the actual quality of care provided in public and private health facilities in low-income settings.
This paper addresses this gap by presenting among the first direct measures of quality of care using condition-specific metrics in low-income countries, using data from an audit study conducted in rural areas of the Indian state of Madhya Pradesh (MP). Specifically, standardized patients (SPs) were coached to accurately present symptoms for three different conditions (unstable angina, asthma, and dysentery in a child who is at home) to multiple healthcare providers. SPs then made over 1,100 unannounced visits to a near-representative sample of public and private providers of primary healthcare services and recorded condition-specific metrics of care for each interaction. These metrics include the providers' adherence to a checklist of questions and examinations deemed essential for making a correct diagnosis in each case, their likelihood of pronouncing a correct diagnosis, and the appropriateness of the treatments. For brevity, we refer to these metrics as "quality of care".
We present results from two sets of comparisons. First, we sent SPs to a near-representative sample of public and private health facilities and we use these data to compare the representative patient experience across public and private clinics. These differences reflect variation in both provider composition (including knowledge and intrinsic motivation) as well as differential incentives across public and private clinics. To isolate the effect of practicing in the private sector holding provider characteristics constant, we identified the private practices of qualified public doctors (the majority of whom have one) and sent SPs to present the same medical case to the same set of doctors in both their public and private practices. Our second comparison uses this "dual practice sample" and compares the quality of care across the public and private practices of the same doctors on the same set of cases.
We report three main findings. First, while the majority of private providers in the representative sample have no medical qualifications, they exerted significantly higher effort than public providers and performed no worse on diagnosis and treatment. Private providers spent 1.5 minutes more with patients (62 percent more), and completed 7.4 percentage points more items on a checklist of essential history and examination items (47 percent more) than public providers. They were equally likely to pronounce a correct diagnosis (only 4 percent of public providers do so), to offer a correct treatment (27 percent of public providers do so), or to offer clinically unnecessary treatments (provided by 70 percent of public providers).
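As a rough consistency check, the absolute and relative gaps reported above jointly imply approximate public-sector baselines. The calculation below assumes that each percentage in parentheses is measured relative to the public-provider mean. The symbols $t$ (minutes per interaction) and $c$ (share of checklist items completed) are introduced here purely for illustration, and the implied levels are back-of-the-envelope figures rather than estimates reported in the text.

```latex
\begin{align*}
t_{\text{pub}} &\approx \frac{1.5\ \text{min}}{0.62} \approx 2.4\ \text{min},
  & t_{\text{priv}} &\approx 2.4 + 1.5 = 3.9\ \text{min},\\
c_{\text{pub}} &\approx \frac{7.4\ \text{pp}}{0.47} \approx 15.7\%,
  & c_{\text{priv}} &\approx 15.7 + 7.4 = 23.1\%.
\end{align*}
```

The distinction between percentage points (the absolute gap of 7.4) and percent (the relative gap of 47) matters here: a small absolute difference is a large relative one because the implied baseline is low.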
Second, in the dual practice sample the same doctors spent more time with SPs, completed more items on the checklist, and were also more likely to offer a correct treatment in their private practices, relative to their public practices. Notably, we do not find evidence of differential over-treatment, with equivalently high rates of unnecessary treatments, use of antibiotics, and total number of medicines in both types of practices. These differences are conditional on seeing the doctor and therefore understate the difference in the quality of patient experiences across public and private practices of the same doctor, because the expected number of trips to the clinic to see a qualified doctor is considerably higher in the public practice (due to high absence rates).
Third, we find a positive correlation between the fees charged by private providers and measures of quality such as the time spent, the fraction of checklist items completed, and likelihood of providing a correct treatment. However, we also find a positive correlation between prices and the total number of medications given, including unnecessary treatment. In the public clinics, SPs were provided free or nominally-priced care. Since there is no variation in prices, we examine the correlation between doctors' compensation and quality of care and find no correlation between salaries (or desirability of posting) in the public sector and any measure of quality of care delivered.
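To make the notion of a price-quality correlation concrete, the following is a minimal sketch that computes a plain Pearson correlation between fees and checklist completion. The numbers and the names `fees_rs` and `checklist_share` are made up for illustration only; they are not data from the study, and the sketch is not the estimation procedure used in the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical private-sector interactions (illustrative values only).
fees_rs = [20.0, 40.0, 60.0, 80.0]          # fee paid per interaction, in rupees
checklist_share = [0.10, 0.20, 0.30, 0.40]  # fraction of checklist items completed

# A value near +1 means higher fees go together with more checklist items.
price_quality_corr = pearson(fees_rs, checklist_share)
```

The same function could be applied to salaries and quality measures in the public sector, where a value near zero would correspond to the absence of a relationship described above.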
Further, while public healthcare is free to the consumer, it is not free to the taxpayer. We calculate the per-patient cost in the public sector and conservatively estimate it to be four times higher than the fees charged by private providers in our sample. Thus, the unregulated private market for healthcare in rural MP, which is mainly staffed by unqualified providers, appears to deliver higher provider effort and comparable quality of care, at a much lower cost per patient.
To help interpret our results, we develop a simple theoretical framework that models provider-patient interactions in two stages: consultation and treatment. Patients present their initial symptoms to the provider, based on which he forms a prior distribution regarding the true ailment. Higher effort in the consultation stage yields a more precise posterior distribution. The treatment choice is determined by a combination of the physician's desire to cure the patient, market incentives for over-treatment, and patients' demand for medication. The main insight of the model is that while providers will typically exert more effort in their private practice, the effect on overall patient health is ambiguous. If the default effort level of doctors under low-powered incentives is reasonably high, the marginal gains from additional effort in private practice are outweighed by the costs of over-treatment resulting from market incentives. On the other hand, if the default effort level is low, the benefits of higher effort in the private sector (and the resulting increase in precision of the posterior) may outweigh the costs of over-treatment under market incentives.
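The link between consultation effort and posterior precision can be illustrated with a standard normal-normal updating example. This is a stylized sketch under assumptions introduced here, not the specification of the framework described above: the true ailment is summarized by a scalar $\theta$, the prior has precision $\tau_0$, and effort $e > 0$ scales the precision $\tau_s$ of the signal $s$ obtained during consultation.

```latex
\begin{align*}
\theta &\sim \mathcal{N}\!\left(\mu_0,\ \tau_0^{-1}\right), \qquad
s \mid \theta \sim \mathcal{N}\!\left(\theta,\ (e\,\tau_s)^{-1}\right),\\[4pt]
\theta \mid s &\sim \mathcal{N}\!\left(
  \frac{\tau_0\,\mu_0 + e\,\tau_s\, s}{\tau_0 + e\,\tau_s},\
  \frac{1}{\tau_0 + e\,\tau_s}\right).
\end{align*}
```

Under these assumptions the posterior precision $\tau_0 + e\,\tau_s$ rises linearly in effort, so each additional unit of effort shrinks the posterior variance by less than the previous one. This is one simple way to see why the marginal gain from extra effort can be small when default effort is already high and large when it is low.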
Our methodological contribution helps address the fundamental problem of inferring quality in healthcare, where the optimal action is patient and condition specific, and inefficiencies include under-treatment, over-treatment, or both (Pauly, 1980). Specifically, there are four advantages to the use of unannounced SPs relative to existing measures in the literature, which are based on tests of provider knowledge, observation of medical practices or analysis of prescriptions.5 First, the use of SPs ensures a common set of patient and illness characteristics, which limits concerns about differential patient sorting across clinics on the basis of personal or illness characteristics, as might be the case when observing real patient-provider interactions. Second, the SP method allows us to objectively score the quality of care using condition-specific metrics (checklist completion, diagnosis, and treatment) because we know the actual illness being presented and the optimal care associated with the case. In the case of real observations, we would observe only the presenting symptoms and would have to speculate about the true underlying illness. Third, we are able to observe prices charged for completed transactions, which allows us to study to what extent the unregulated market rewards quality, which improves upon audit studies in other settings that obtain price quotes but do not complete the purchase.6 Finally, Hawthorne effects are not a concern in the SP context because providers do not know that they are being observed.
Substantively, the advances in measurement above combined with our ability to observe the same doctor across public and private practices allow us to provide the first direct comparison of the quality of care across public and private sectors.7 We also provide the first evidence on how market prices for healthcare behave in an unregulated setting and show that there is a positive correlation between price and checklist completion/correct treatment, but also between price and unnecessary treatments. This suggests that in unregulated markets for healthcare, market prices do reflect some information on the quality of care, but also that patients cannot evaluate whether they are being over-treated and charged for medically unnecessary treatments.
5 Medical vignettes, which measure provider knowledge, allow for standardization of case-mix, but do not measure provider practice, which has been shown to differ markedly from provider knowledge in multiple contexts (Rethans et al., 1991; Leonard and Masatu, 2005; Das and Hammer, 2007).
6 For instance, first price offers can be very different from the price of the completed transaction if the distribution of willingness to pay is different across populations. See for instance, Ayres and Siegelman (1995) and Goldberg (1996) for an example of how the lack of completed sales data can lead to misleading conclusions in audit studies of car sales. In our case, the "sale" is always completed as the SP leaves only after the provider has completed the interaction and the price has been paid.
7 Our approach extends a literature testing for moral hazard in agricultural labor markets by comparing worker effort and output under different contractual arrangements (Shaban, 1987; Foster and Rosenzweig, 1994a,b) to a credence good setting where output is harder to measure for both customers and researchers, and where there is substantial direct provision of the good by the public sector.
These findings are consistent with the broader empirical literature on credence goods that has demonstrated over-provision of services to the detriment of customer welfare in settings ranging from caesarian sections to car mechanics and cab rides for tourists (Wolinsky, 1993; Gruber and Owings, 1996; Dulleck and Kerschbamer, 2006; Dulleck, Kerschbamer and Sutter, 2011; Schneider, 2012). As is well known, inefficiencies in market provision do not imply that public provision will do better and a key contribution of our paper is the ability to compare public and private provision of a canonical credence good such as healthcare.
Combined with our theoretical framework, our results suggest that in settings of poor governance and administrative accountability in the delivery of healthcare through the public sector (Banerjee, Deaton and Duflo, 2004; Banerjee, Duflo and Glennerster, 2008), market-based provision of healthcare may present a legitimate alternative in spite of its many theoretical (and empirical) weaknesses. Our results have direct implications for global policy debates on the organization and delivery of healthcare services in low-income countries with low state capacity to deliver effective oversight over public healthcare systems. We discuss these along with caveats in the conclusion.
The rest of this paper is organized as follows. Section 2 describes healthcare provision in rural India and MP; section 3 describes the standardized patient (SP) methodology, our measures of healthcare quality, and discusses sampling and representativeness; section 4 presents a simple theoretical framework to interpret our results; section 5 presents the main results; section 6 discusses pricing and cost-effectiveness; section 7 discusses robustness to alternative explanations, and section 8 concludes with a discussion of policy implications and caveats.

Healthcare in Rural India
Healthcare in India is delivered by both public and private clinics and hospitals. In the public sector, patients can obtain primary care on a walk-in basis in facilities differentiated by their level of specialization ranging from district hospitals and Community Health Centers (CHCs) to Primary Health Centers (PHCs), and sub-centers. PHCs, CHCs, and hospitals are supposed to be staffed with trained doctors, who are expected to make diagnoses and either treat or refer patients as appropriate. Sub-centers are supposed to be staffed with qualified nurses with doctors visiting on a fixed rotation. Most doctors hold a Bachelor of Medicine and Bachelor of Surgery (MBBS) degree, the rough equivalent of an MD in the US, and receive a fixed salary from the government, with no variable compensation based on either patient load or quality of care.9 Consultations in public clinics are free or nominally priced. Patients are also supposed to receive free medication, if available. Although a federally-funded insurance program for inpatient hospital care was introduced in 2007, the tax-funded public system of care was the only source of (implicit) public insurance at the time of our study.
Although public facilities are theoretically accountable to administrative norms and procedures (documented in the Civil Service Codes for each state), both the perceptions of staff members and process measures of effort suggest severe deficiencies. Nationwide, doctor absences averaged 43 percent on any given day in 2003 and 40 percent in 2010 (CPR, 2011; Muralidharan et al., 2011). These absences do not occur on predictable days or hours (Banerjee, Deaton and Duflo, 2004) and they are not easy to address at a system-level (Banerjee, Duflo and Glennerster, 2008; Hanna and Dhaliwal, 2015). When asked about adherence to administrative rules, more than 80 percent of public sector doctors agree that the rules and norms are frequently flouted and that appropriate 'payments' can allow providers to circumvent disciplinary proceedings, even due to grave negligence (La Forgia and Nagpal, 2014).
While official policy documents of the Government mainly focus on improving the public system of primary healthcare (Planning Commission of India, 2013), data from household surveys consistently show that the fee-charging private sector accounts for over 70 percent of primary care visits (DHS, 2007; Selvaraj and Karan, 2009; CPR, 2011). Barriers to entry for private healthcare providers are low. Provider qualifications range from MBBS degrees to no medical training at all, and clinics can range from well-equipped structures to small one-room shops, the provider's residence, or the patients' home for providers that make home visits. Providers operate on a fee-for-service basis, and prices often include the cost of medicines. While providers operating without a medical license are not legal and face the threat of an occasional raid, they have come to be the dominant source of care in these markets (as the data below will show).
9 India also recognizes medical degrees from alternative schools of medicine including the BAMS (Bachelors in Ayurvedic Medical Sciences), the BHMS (Bachelor of Homeopathic Medical Sciences) and the BUMS (Bachelor of Unani Medical Sciences). However, providers with these qualifications are only licensed to prescribe medication in line with their training and are not given prescription rights on allopathic medicine. They also are not typically posted in the frontline healthcare system of PHCs, CHCs, and district hospitals that prescribe allopathic medicine.

Market Sampling and Summary Statistics
Our study was carried out in the Indian state of Madhya Pradesh (MP), one of India's poorer states, with a GDP/capita of ∼$600/year (or ∼$1500/year in PPP terms) in 2010-11 (the period of the study). We first drew a representative sample of 100 villages across 5 districts, stratified by geographic region and an index of health outcomes. We then conducted a household census in these villages, where we asked respondents to name all providers from whom they sought primary care in the past thirty days and their locations (including providers outside the village, but in market clusters on the nearest main road). We then surveyed all providers in all of these locations, regardless of whether or not the providers themselves had been mentioned in the sample villages, thereby obtaining a census of all providers in the healthcare market that catered to sampled villages.
Table 1 (columns 1-3) presents summary statistics based on the provider census (Panel A) and the household census (Panel B) in these markets; columns 4-6 compare villages sampled for the SP study to the representative villages. The table highlights three key features of the health market in rural India. First, villages are served by a large number of providers once the health market is correctly accounted for by including clusters on major roads close to the village. There are 11 primary care providers per market and 46 percent of households reported visiting a primary care provider in the 30 days prior to the survey.
Second, the majority of providers are private (7 out of 11 or 64 percent), and they account for 89 percent of household visits; excluding paramedical public health workers (typically responsible for preventive, maternity and child care) increases the fraction further to 93 percent. The share of visits to private providers (with or without qualifications) is 88 percent when there is a public provider in the market, and is 83 percent even when there is a public MBBS doctor in the same market.
Third, 48 percent of all providers and 77 percent of all private providers (5.4 per village) have no formal medical training, but account for 77 percent of household visits. There is less than one MBBS doctor per market, and 94 percent are available only outside the village. The distribution of MBBS providers is uneven. Only 30 percent of all villages have recourse to an MBBS provider (public or private) in their market, and only 5 percent within village boundaries. Private unqualified providers remain the dominant providers of care in most settings, accounting for 74 percent of all visits when there is a public provider in the same market, and 60 percent when there is a public MBBS doctor in the same market. MBBS doctors account for only 4 percent of all patient interactions (Panel B).

The Standardized Patient (SP) Methodology
Used routinely in the training and evaluation of medical students in high-income countries, including the United States, SPs are highly-trained 'fake patients' who present symptoms of an illness to a physician like any other normal patient. Details of the SP interaction are then used to evaluate the quality of care received by a typical patient (Rethans et al., 1991). SPs are coached to present their initial symptoms and answer any questions that the physician may ask as part of history taking, in a manner consistent with the underlying condition. We followed the same method (adapted to local conditions) and sent unannounced SPs to healthcare providers in our sample during the course of a normal working day.
A total of 22 SPs were recruited from the districts where the study was conducted. Using a team comprising a professional SP trainer, two medical doctors, and a medical anthropologist familiar with the local forms of presenting symptoms and illnesses, SPs were coached to accurately and consistently present one of three cases: unstable angina in a 45 year-old male, asthma in a 25 year-old female or male, and dysentery in a child who was at home presented by the father of the child (see Das et al. (2012) and Appendix B for details on the SP protocols).11 SPs visited sampled providers, who did not know they were receiving standardized patients and therefore should have treated them as new patients.12 After the interaction, SPs were debriefed within an hour with a structured questionnaire that documented the questions and examinations that the provider completed or recommended, the treatments provided, and any diagnoses offered. The SPs retained any medicines dispensed in the clinic and paid all fees charged by providers at the end of the interaction.
The SPs depicted uncomplicated textbook presentations of the cases, and a panel of doctors who advised the project concurred that appropriate history taking and examinations should lead providers towards the correct diagnosis and treatment. Cases were specifically chosen so that the opening statement by the SPs would be consistent with multiple underlying illnesses, but further questioning should have led to an unambiguous (correct) diagnosis. This allows us to measure provider quality through adherence to an essential checklist of questions and examinations that would allow them to accurately make a diagnosis and provide a correct treatment. We also chose these cases since they represented conditions with high or growing incidence in India and other middle- and low-income countries and minimized risk to SPs that could arise from unsafe invasive examinations, such as a blood test with an unsterilized needle.
[...] for these staff to be the main healthcare providers in public clinics and also prescribe medication (given high doctor absence rates).
11 Das et al. (2012) discusses the SP methodology in further detail, and presents summary statistics on overall quality of care in this setting. The current paper focuses on the economics of unregulated healthcare markets and we do not replicate the analysis in Das et al. (2012). See Appendix B for further details on how the SP method was implemented. Details on case presentations and instruments are posted on www.healthandeducationinindia.org
12 The research ethics board of Innovations for Poverty Action approved this design following a successful pilot with informed consent in Delhi. We describe sampling and representativeness in the next section.
We also picked cases where the role of suitable medical advice was important because real patients would be unlikely to be able to categorize them as "life threatening" or "potentially non-harmful" and triage themselves into clinics or hospitals. For instance, the SP with unstable angina complains of chest pain which, even in countries with advanced health systems, is often mistaken by patients as arising from heartburn, exertion or muscle strain. Similarly, wheezing and shortness of breath in asthma may arise from short-term allergies to environmental contaminants. Finally, for any child with diarrhea, a key contribution of a health care provider is to assess whether the symptoms reflect a bacterial or viral infection (and thus whether the patient requires antibiotics) and the degree of dehydration, each of which may be difficult for parents to assess.

Healthcare Provider Sampling and Summary Statistics
Our study first uses the census of healthcare providers described earlier to construct a near-representative sample of public and private healthcare providers in rural MP in three of the five sampled districts. While our SPs were recruited from the districts in our sample, they were never residents of the villages where they presented themselves to health providers. Since providers in rural areas might know their patients, the SPs had to justify their presence in the area by mentioning, for example, work-related travel or visits to relatives. For such excuses to be plausible, our final sample dropped villages that could not be accessed by paved roads and comprised a total of 46 villages across three districts. While these sampled villages have more providers on average than the entire representative set of villages, there is no difference in the composition of providers across the frame and sample (Table 1).
Since SPs visited clinics to obtain primary care, we excluded community health workers, midwives, and providers that only made home visits. We sampled providers in all public clinics (up to two providers per clinic) and a maximum of six private providers in each market for a total of 247 providers in 235 clinics, and SPs completed interactions with 224 providers. Details of the sampling and the visits are in Appendix A.
Data from this 'representative sample' allow us to compare care provided across typical public and private clinics in rural MP (all our estimates are re-weighted by the inverse of the sampling probabilities to provide population representative averages). However, this comparison would reflect a combination of varying composition of providers (including their knowledge or professionalism), as well as the effect of practicing in the private sector.
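The re-weighting mentioned above can be sketched as a weighted mean in which each observation receives a weight equal to the inverse of its sampling probability. The record layout, the field names `checklist_share` and `sampling_prob`, and the numbers below are hypothetical and chosen only to illustrate the mechanics; the actual weights are constructed as described in Appendix A.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    checklist_share: float  # fraction of checklist items completed (0 to 1)
    sampling_prob: float    # probability that this provider was sampled (hypothetical)


def ipw_mean(visits):
    """Mean outcome with weights equal to 1 / sampling probability."""
    weights = [1.0 / v.sampling_prob for v in visits]
    weighted_sum = sum(w * v.checklist_share for w, v in zip(weights, visits))
    return weighted_sum / sum(weights)


# Illustrative data: providers sampled with lower probability stand in for
# more providers in the population and therefore count for more.
visits = [Visit(0.20, 1.0), Visit(0.40, 0.5), Visit(0.10, 0.25)]

weighted = ipw_mean(visits)
unweighted = sum(v.checklist_share for v in visits) / len(visits)
```

In this toy example the weighted mean is lower than the unweighted one because the provider with the lowest sampling probability also has the lowest checklist share, which shows how ignoring the weights could distort a population average.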
To isolate the role of private sector practice, we identified the universe of public MBBS doctors posted to PHCs and CHCs from all five study districts, even if these clinics were not located in the village-based sampling scheme. We then identified the private practices of these doctors (we found a private practice for 61 percent). We sampled and successfully administered SP visits to 118 public MBBS doctors. Our 'dual sample' consists of the 88 doctors in this MBBS sample who also have a private practice, and for 69 of these, SPs presented cases in both their public and private practices. The 'dual sample' enables a comparison of the quality of care provided by the same doctor on the same case across his public and private practices. We note that our completion rates were higher in the private (93 percent) compared to public practices (75 percent). The lower public sector completion rates resulted from long-term absences and leave among sampled public sector providers, leading to non-completion despite multiple attempts. If (a) there is heterogeneity in the public-private difference across doctors, and (b) doctors who are more absent also provide lower quality care in their public sector practice, our public-private differences may be underestimated. Appendix A and Tables A.1 and A.2 provide further details on the sampling and construction of the representative and the dual sample of providers.
Table 2 (columns 1-3) provides summary statistics for the representative sample of providers. The providers are mostly middle-aged men and just under 60 percent have completed 12 or more years of education (Table 2, Panel A). Providers' practices have been open for 13-15 years, and private and public providers self-report an average of 16 and 28 patients per day, respectively. Most practices (82 percent of private and 100 percent of public) dispense medicines in the clinic itself and are equipped with the infrastructure and medical devices required for routine examinations, such as stethoscopes and blood pressure cuffs. In the representative sample, public providers are more likely to have an MBBS degree (26 percent vs. 8 percent) and private providers charge an average of Rs.51 per interaction. Consistent with nominally priced public care, our SPs paid Rs.3.7 on average in public clinics.
Column 4 presents summary statistics on the universe of public MBBS doctors, while columns 5-7 present these for the 88 public MBBS doctors in the dual sample and test if they are comparable. Overall, doctors with and without dual practices are similar on observable characteristics, but the former have a longer tenure at their current location.
There is no significant difference in the equipment reported across these practices (Columns 8-10), although the overall number of patients seen is higher in the public practice and the fees charged are higher in the private practice.
We randomly assigned three SPs to each sampled provider in the representative sample, one presenting each of the three cases. For the dual sample, we sent SPs presenting the asthma and dysentery cases to both practices of the same provider. 15 Since the rarity of unstable angina could raise suspicions if providers saw two travelers presenting the same case (even though visits were typically separated by a few weeks), we randomized the providers into two groups: one that received an unstable angina patient in his/her private practice and another that received the case in the public clinic. We show that the randomization was valid in Table A.3.
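The two-group assignment for the unstable angina case can be sketched as a simple random split. This is only an illustration: the function name, the seed, and the provider ids are hypothetical, and the text does not say whether the actual randomization was stratified.

```python
import random

def assign_angina_sector(provider_ids, seed=0):
    """Randomly split providers into two groups of equal size (within one).

    Returns a dict mapping provider id -> "private" or "public", i.e. the
    practice in which that provider receives the unstable angina case.
    """
    rng = random.Random(seed)      # fixed seed so the draw is reproducible
    shuffled = list(provider_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {pid: ("private" if i < half else "public")
            for i, pid in enumerate(shuffled)}

# Hypothetical ids; 69 is used only because the text reports 69 doctors
# who were visited in both practices.
assignment = assign_angina_sector(range(69), seed=42)
n_private = sum(1 for sector in assignment.values() if sector == "private")
print(n_private, len(assignment) - n_private)
```

Balance on pre-determined characteristics would then be checked across the two groups, which is the role Table A.3 plays in the paper.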

Measuring Quality of Care
We use three measures of quality of care. Our first metric is the extent to which the provider adhered to a checklist of questions and examinations required for making a differential diagnosis on each of the presented cases. For instance, these questions and exams would allow a doctor to distinguish between heartburn (that has gastrointestinal origins) and a heart attack, or between viral diarrhea and dysentery. These items represent a parsimonious subset of the Indian government's own guidelines, and the list we use was developed by a panel of Indian and American doctors (the items are described for each case in Table A.4). 16 While the most transparent measure of checklist adherence is the percentage of checklist items completed, we also compute an index score using Item Response Theory (IRT), which gives more weight to items that discriminate better among providers. Developed in the context of educational testing, IRT allows us to create a composite measure of provider quality based on questions asked across all three cases, with lower weights on checklist items that are less essential and higher weights on more essential questions that do a better job of discriminating between low and high quality providers (see Das and Hammer (2005) for details). We report both measures in our analysis.
15 Since we had 22 SPs and 3 cases, we made sure that the same case was presented by different SPs in the public and private practices. To ensure that our standardized patients saw the sampled provider when (s)he visited the public clinic and not a substitute, we first interviewed all providers in their private practices or residences without revealing that we knew they also worked in the public sector, and we obtained either their photograph or a detailed description of their physical appearance. SPs portrayed a dummy case (e.g. headache) if the doctor was absent when they visited the public clinic, and we sent in other SPs on our subsequent attempts. As we discuss later, it took significantly more trips to complete an SP case in the public practice relative to the private one, due to the high rates of provider absence in the public practice.
16 The Indian government's National Rural Health Mission (NRHM) has developed triage, management, and treatment protocols for unstable angina, asthma, and dysentery in public clinics, suggesting clear guidelines for patients presenting with any of these conditions. The checklist we use is more parsimonious than what the Indian government's own guidelines recommend. If we use the more extensive checklist, this would deflate the checklist adherence further below the low numbers that we document, but does not affect the relative performance of public and private providers (which is the focus of this paper).
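To illustrate what it means for an item to "discriminate better", the sketch below uses the two-parameter logistic (2PL) item response function, one common IRT specification. The text does not state which IRT model is estimated, so the 2PL form, the function names, and all parameter values here are assumptions for illustration.

```python
import math

def p_complete(theta, a, b):
    """2PL probability that a provider of quality theta completes an item.

    a: discrimination (how sharply the item separates providers)
    b: difficulty (quality level at which completion is a coin flip)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def completion_gap(a, b=0.0):
    """Gap in completion probability between providers at theta=+1 and theta=-1."""
    return p_complete(1.0, a, b) - p_complete(-1.0, a, b)

# Two hypothetical items with equal difficulty but different discrimination.
print("low-discrimination item gap: ", round(completion_gap(0.5), 3))
print("high-discrimination item gap:", round(completion_gap(2.0), 3))
```

The high-discrimination item separates a strong provider from a weak one by a much wider margin, which is why completing it carries more information about provider quality in the scaled score.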
Second, we examine diagnoses: whether or not the provider uttered any diagnosis to the patient, and the accuracy of the diagnosis. We consider a diagnosis incorrect when it cannot even be considered partially correct; for example, a provider tells an asthma patient that she has a gastrointestinal problem or an unstable angina patient that the weather is causing his ailment. Our definitions of correct and incorrect diagnoses are presented in Table A.4, Panel B.
Third, we evaluate the quality of treatment provided. SPs noted all treatment instructions received and retained all prescriptions and medication dispensed in the clinic. These were then classified as correct, palliative, and/or unnecessary/harmful, based on inputs from our panel of doctors, pharmacists, and a pharmaceutical company (Table A.4, Panel C lists the specific treatments that fall into each category). Since providers can dispense or prescribe multiple medicines, we classify each medicine as correct, palliative, or unnecessary/harmful and thus allow the total treatment protocol to be classified into multiple categories at the same time.
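The multi-category coding of a treatment protocol can be sketched as a set-valued classification. The lookup table below is hypothetical and covers a single case only; the actual category lists are those in Table A.4, Panel C.

```python
from typing import Dict, Iterable, Set

# Hypothetical lookup for the asthma case, for illustration only.
ASTHMA_CATEGORIES: Dict[str, str] = {
    "bronchodilator": "correct",
    "allergy medicine": "palliative",
    "antibiotic": "unnecessary/harmful",
}

def classify_treatment(medicines: Iterable[str], lookup: Dict[str, str]) -> Set[str]:
    """Return every category touched by the medicines given in one interaction."""
    # Anything not listed as correct or palliative counts as unnecessary/harmful.
    return {lookup.get(m, "unnecessary/harmful") for m in medicines}

def only_correct(categories: Set[str]) -> bool:
    """True when the patient received a correct treatment and nothing else."""
    return categories == {"correct"}

visit = classify_treatment(["bronchodilator", "antibiotic"], ASTHMA_CATEGORIES)
print(sorted(visit), only_correct(visit))
```

Because the result is a set, one interaction can count as both correct and unnecessary, which is how the outcome "correct treatment only" can be much rarer than "any correct treatment".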
Correct treatment refers to a treatment that is clinically indicated for the specific case and that would relieve/mitigate the underlying condition. Palliative treatments are those that may provide symptomatic relief, or treatments where the providers correctly identified which system was being affected, but which on their own would not cure the patient of the condition that was being presented -for example, allergy medicine for the asthma patient. Treatments classified as unnecessary/harmful were neither correct nor palliative. We group these two potentially distinct categories together because it was difficult to achieve consensus among doctors on what should be considered harmful. Some, for example, would consider antibiotics for the unstable angina patient unnecessary. Others could take a longer view with antibiotic resistance in mind and consider it as ultimately harmful. However, none of the treatments we observed were directly contra-indicated, and hence most of these represent unnecessary treatments as opposed to directly harmful ones. 17 However, even after classifying all medicines as correct, palliative, and unnecessary, there are two challenges in coding the "correctness" of a treatment. The first is: How should we interpret a referral when incentives are very different? In some cases, this may be a good thing (if, for example, the provider refers a heart attack patient to a hospital). In other cases, a "referral" may simply reflect a provider who deflected the case without directing the patient usefully. 18 Since we do not send the SPs to the place that was referred, there is no obvious way of coding the quality of referrals. We therefore try to be conservative in our main analysis and do not treat referrals as correct treatments. When we repeat the analysis treating referrals as correct in the angina case, our results are unchanged (results below).
A second challenge in classifying treatments arises from the proxy nature of the dysentery case. Many providers did not provide a treatment because the child was not presented and instead asked to see the child. We therefore report results for 'checklist completion' using all three cases, but drop the dysentery case for 'diagnosis' and 'treatment' because the patient (the sick child) was not actually presented for this case. All results are robust to dropping the case completely.

Theoretical Framework
A simple theoretical framework helps interpret the results and the optimal effort and treatment choices that a provider is likely to make with and without market incentives, as well as the effects of their choices on patient health outcomes. We present the main insights here, with full derivations in Appendix C. The interaction between doctors and patients is modelled in two stages, consultation and treatment, where providers first engage in (Bayesian) learning about the patient's condition and then treat. A patient enters the clinic and presents her symptoms, based on which the provider forms a prior belief about the underlying disease that caused the symptoms. The provider, who has medical knowledge $K$, exerts effort $e$ and draws a signal $s \sim N(n_{true}, 1/\beta)$, where $n_{true}$ is the correct underlying state and $\beta = eK$. Providers improve the precision of the signal by either exerting higher effort, or being more knowledgeable, or both.
The provider then updates to a posterior belief with posterior mean $\mu$. 19 In the second stage, the provider makes treatment choices based on the posterior belief about the true state. The choice of treatments is expressed as an interval $[\mu - n, \mu + n]$, which maps into the empirical observation that most providers in our setting provide multiple medications. A wider range of treatments has a higher probability of covering the true illness and curing the patient of the current ailment, but also increases long-term health costs. 20 The patient's health outcome given $e$ and $n$ is denoted by $H(e, n) = P_e(n) - h(n)$, where $P_e(n)$ is the probability that $n_{true}$ is covered by the treatment and $h(n)$ is the health cost, which increases with $n$. Thus the optimal outcome for a patient is to receive only the correct treatment, and not receive any additional unnecessary treatments, and we can think of a high-quality provider as someone who provides this outcome, enabled by a precise posterior distribution of the true illness.
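The learning step can be made concrete with the standard normal-normal updating formulas. This is a sketch under an assumption: the prior is taken to be normal with mean $n_0$ and precision $\alpha$, and both symbols are introduced here purely for illustration. Only the signal distribution and $\beta = eK$ are taken from the text.

```latex
\begin{align*}
\text{prior (assumed):}\quad & n \sim N\!\left(n_0, \tfrac{1}{\alpha}\right) \\
\text{signal:}\quad & s \sim N\!\left(n_{true}, \tfrac{1}{\beta}\right), \qquad \beta = eK \\
\text{posterior:}\quad & n \mid s \sim N\!\left(\mu, \tfrac{1}{\alpha + \beta}\right), \qquad
\mu = \frac{\alpha n_0 + \beta s}{\alpha + \beta}
\end{align*}
```

Under these assumptions the posterior variance $1/(\alpha + eK)$ falls with effort at a decreasing rate, which is one way to read the diminishing returns to additional effort discussed in the text.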
In practice, providers will choose effort and treatments to maximize their own utility, which may not be aligned with those of patients. We model provider utility as having three components. First, providers care about curing their patients and overall patient health. This can be attributed partly to altruism, partly to intrinsic motivation to do the right thing, partly to training and professionalism (Hippocratic oath), partly to peer pressure and monitoring, and partly to the liability and malpractice regime. We capture all of these factors with the parameter φ, which should be thought about as representing the extent to which providers value patient health in their utility in a setting without high-powered financial incentives. Thus, a higher φ represents greater alignment between provider and patient utility.
Second, providers also care about financial rewards, which in turn depend on how they are compensated. Under market pricing, providers can charge a consultation fee ($\tau e$) that is a function of a piece rate $\tau$ (determined by their qualifications and reputation) and effort expended (which is observable to patients), and a dispensing fee that increases linearly with the number of medicines provided. They also have an incentive for improving patient health because this helps build their reputation and raises future demand (which we model as an increase in their consulting piece rate over time). However, patients can observe whether they were "cured" more easily than the costs of excessive medication, and this creates an incentive to over-treat because over-treatment increases the probability of spanning the true illness and providing a correct treatment. We denote the observed health outcome as $H_o(e, n)$, and true health as $H(e, n)$.
19 Note that the marginal effect of e on the posterior precision diminishes as e becomes larger as illustrated in Figure 1 (Panel B). Also, as in Foster and Rosenzweig (1995), a doctor with more knowledge may also have a more accurate prior to begin with (or draw a signal with a more accurate mean), in addition to learning faster with additional effort/time. This is not our area of focus as we are more interested in deriving predictions for effort, treatment, and health outcomes for the same doctor across public (low-powered incentives) and private (high-powered incentives) practices. This corresponds to our dual sample.
20 This is a standard assumption in the medical literature and can be motivated by either the building of resistance to unnecessary drugs or by the potential for adverse interactions between drugs.
Third, providers may offer treatments on the basis of patient demand. Patients may self-diagnose their illnesses and demand medications that they think they need, 21 or may simply seek pain-killers, steroids, and other drugs that provide symptomatic relief but are medically inappropriate for their condition. In such cases it can be costly for providers to not provide medicines that patients demand, and we model patient-induced demand as a communication cost paid by providers to convince patients about the providers' choice of treatment.
In the absence of market incentives and patient-induced demand, providers choose effort in the consultation stage and treatment in the treatment stage, where the treatment-stage problem is $V_2(e) = \max_n \{\phi H(e, n)\}$, and $V_1$ and $V_2(e)$ are the maximized utilities in the consultation and treatment stages, respectively. Since there is no marginal incentive for either effort or treatment, these choices will depend only on φ and the cost of effort. The provider then chooses the n that maximizes H(e, n) in the treatment stage (assuming that medicines are provided free to patients, as is the norm in public clinics).
Under market incentives, doctors maximize an objective in which τ is a piece-rate consultation fee, δ represents the extent to which improving patients' current health improves the provider's reputation in the market and generates future pay-offs (we formalize this in Appendix C), and p is a per-unit profit from n. Because the health cost of n is not fully observed in the market but the provider derives pecuniary benefits from n, s/he chooses excessive n, in the range where H(e, n) is decreasing in n. However, compensation for effort (τe) and concern about reputation induce higher effort, which yields a more accurate posterior and increases the probability of the chosen n spanning the true illness. Finally, it is important that the costs of excessive medication are observed to some extent (albeit imperfectly), because this bounds n from going to infinity.
Figures 1 and 2 illustrate the main insights of the model. Market incentives typically lead to higher effort, as shown in panel (A) of Figure 1. When φ is low, providers choose a low level of effort without other incentives, and the difference in the level of effort with and without market incentives leads to a large difference in the posterior precision (panel (B)). Thus, while market compensation provides an incentive to over-treat, it also provides incentives for greater diagnostic effort, which yields a more precise posterior. Since increased posterior precision reduces the benefit of choosing large n, it is possible that n may be smaller with market incentives, as shown in panel (C). With higher effort leading to a greater probability of providing the correct treatment and a smaller n (due to increased diagnostic precision), the resulting health outcome is better with market incentives.
However, as φ increases, the default level of effort without market incentives also increases, and the marginal gain from additional effort on posterior precision is lower and the benefits of additional effort under market incentives are outweighed by the incentives to prescribe more. In this setting (seen in Panel D) providers choose larger n with market incentives, and the health outcomes are likely to be worse than that without market incentives. 22 Figure 2 summarizes this point that market incentives are likely to lead to worse outcomes in settings with a high φ, as may be typical in high-income countries. However, in settings with very low φ as seen in India and other low-income countries, exemplified by high doctor absence rates (Chaudhury et al., 2006), market incentives may lead to better outcomes.
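The link between effort, posterior precision, and the treatment interval can be illustrated numerically. The sketch below assumes a normal posterior whose precision is a hypothetical prior precision plus $eK$; the function names and all parameter values are illustrative and are not taken from Appendix C.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def posterior_precision(effort, knowledge, prior_precision=1.0):
    """Assumed normal-normal update: prior precision plus signal precision e*K."""
    return prior_precision + effort * knowledge

def coverage(n, precision):
    """P(true state lies in [mu - n, mu + n]) for a normal posterior."""
    sd = precision ** -0.5
    return 2.0 * STD_NORMAL.cdf(n / sd) - 1.0

def half_width_for(target, precision):
    """Half-width n needed to reach a target coverage probability."""
    z = STD_NORMAL.inv_cdf((1.0 + target) / 2.0)
    return z * precision ** -0.5

low = posterior_precision(effort=0.5, knowledge=2.0)    # hypothetical low effort
high = posterior_precision(effort=3.0, knowledge=2.0)   # hypothetical high effort

print("coverage at n=1:", coverage(1.0, low), coverage(1.0, high))
print("n for 90% coverage:", half_width_for(0.9, low), half_width_for(0.9, high))
```

For a fixed interval, higher effort raises the chance of covering the true illness; equivalently, a given coverage can be reached with a narrower interval, which is the channel through which n may be smaller under market incentives.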
Finally, we also add patient-induced demand to the provider's optimization problem. With this cost, we get n closer to the value which the patient demands, though the cost is lower for providers who exert higher consultation effort (because this effort makes it easier to convince patients that their desired n is not good for them). This mechanism provides a plausible explanation for the high levels of unnecessary treatment we observe among public providers (who have no marginal incentive to do so). 23
Appendix C formally presents our framework in two parts. We first focus on specifying and solving the provider's utility maximization problem but do not endogenize the price setting process, because the static framework maps into our data and is adequate to interpret our empirical results. The second part provides one way of endogenizing the market incentives, and is shown for theoretical completeness.
22 See Muralidharan and Sundararaman (2011) for an adaptation of the multi-tasking framework of Holmstrom and Milgrom (1991) and Baker (1992) that yields similar insights in the context of performance-linked pay for teachers (showing that outcomes could improve under performance pay if the default level of teacher effort was low, but could worsen if the default level was high). The key difference in our context is that the high-powered incentives do not come from administratively set performance-linked bonuses, but market rewards for performance.
23 Note that patient-induced demand is not necessary to explain high levels of unnecessary treatment in public clinics (though it may partly do so). Since a less precise posterior is correlated with giving out more medication, our model predicts that less knowledgeable providers as well as those who put in low effort will give out more medicines.

Estimation Framework
Our main interest is in estimating differences in the quality of care that patients received from providers in the public and private sectors. In the representative sample, we estimate equation 8, in which we regress each measure of quality $q$ (checklist completion, diagnosis, and treatment) in interaction $i$ between a standardized patient $s$ presenting case $c$ and a provider $p$ in market $m$ on an indicator for the sector (Private), with $\beta_1$ being the term of interest. Since we pool cases and SPs and there may be systematic differences across them, all our specifications include SP and case fixed effects ($\delta_s$ and $\delta_c$). We report three sets of estimates for each quality measure. First, we include only SP and case fixed effects; then we add market fixed effects so that comparisons reflect relative performance in the same market (note that not all markets had both types of providers); finally, we add controls for provider and practice characteristics $X_p$, to adjust for observable differences between public and private providers.
While $\beta_1$ provides a useful estimate of the differences in quality across public and private providers in a representative sample of providers, it is a composite estimate that includes differences in unobservable provider characteristics, as well as the effect of practicing in the private sector. To isolate the impact of private sector practice, we re-estimate equation 8 in the dual sample that only includes data from the cases where we sent the SPs to the public and private practices of the same MBBS doctor. We report three sets of estimates here as well. First, we include only SP and case fixed effects; 24 then we add district fixed effects (since the dual practice sample was drawn from the universe of public MBBS doctors practicing in each district rather than the universe of providers practicing in sampled villages, as was the case for the representative sample); finally, we include controls for observable differences across the public and private practices of the doctors.
24 Note that we do not include provider fixed effects since the angina case was not presented in both the public and private practices of the same doctor, and will drop out if we do so. Since the case was randomly allocated across the public and private practices of the doctor and assignment was balanced on measures of quality on the other case (see Table A.3), our estimates will be an unbiased estimate of the average quality difference across the public and private practices of public MBBS doctors. We also estimate equation 8 with provider fixed effects and the results are unchanged (but driven by variation in the asthma case).

Completion of Essential Checklist of History Taking and Examinations
Columns 1-3 in Table 3 present results from estimating equation 8 in the representative sample. Our outcome variable is 'provider effort', measured by consultation length and checklist completion. While the results are similar across the three specifications, we focus our discussion on the estimates in Panel B, because they compare relative performance within the same market (without controlling for provider characteristics), which is the relevant choice set for patients. Base levels of effort among representative public providers are low. The average public provider spent 2.4 minutes with the SP in a typical interaction and completed 16 percent of checklist items. Private providers spend 1.5 minutes more per patient and complete 7.4 percentage points more items on the checklist (62 percent and 47 percent more than the public providers respectively). When evaluated on the IRT scaled score, private providers score 0.61 standard deviations higher. Figure A.1 shows that time spent with the patient is strongly correlated with the number of checklist items completed, which points to the appropriateness of the checklist, as more time spent with the patient led to greater checklist completion.
Columns 4-6 repeat the analysis in the dual sample, with similar results. Public MBBS doctors appear to be more productive than the typical public provider in the representative sample (many of whom are unqualified) because they complete a slightly higher fraction of checklist items (18 percent) in 35 percent less time (0.9 minutes less). However, this additional productivity is not used to complete more checklist items in the public practice, but rather to reduce the time spent with patients (1.56 minutes versus 2.4 minutes in the representative sample). In their private practices, the same doctors doubled consultation length, completed 50 percent more checklist items, and scored 0.73 standard deviations higher on the IRT-scaled measure of quality. It is worth comparing these differences with those obtained in interventions that are regarded as highly successful. For instance, Gertler and Vermeersch (2013) look at checklist completion as a result of the introduction of performance pay in Rwanda. They find that performance pay increased checklist completion by 0.16 standard deviations; we find that the difference in checklist completion across public and private practices of the same doctor is five times larger.
These differences are seen clearly in Figures 3-5. Figure 3 plots the cumulative distribution functions (CDF) of the IRT-score (based on checklist completion) of public and private providers in the representative sample, Figure 4 does so for the dual sample, and Figure 5 pools all four samples together  plot the corresponding distributions). The distribution of checklist completion for private providers first-order stochastically dominates that of public providers (Figure 3), and the corresponding distribution for the private practices of public providers also first-order stochastically dominates that of their public practices (Figure 4). Finally, checklist completion is higher for public MBBS doctors than for a representative public provider (as would be expected given that the former are more qualified), but it is lower for the public MBBS doctors even relative to a representative sample of private providers (most of whom are unqualified, Figure 5).
Focusing on individual checklist items (Table A.5) shows that private providers in both samples are significantly more likely to perform several items on the checklist on all three cases, and are no less likely to perform any of the items. In addition to β 1 , Table 3 (columns 1-3) also shows that there is no correlation between the possession of any formal medical qualification and checklist completion, suggesting that formal qualifications may be a poor predictor of provider effort.
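For reference, the specification behind the estimates in Table 3 can be written out from its verbal description in the Estimation Framework. The form below is a sketch: the intercept $\beta_0$, the market fixed effect $\delta_m$, the coefficient vector $\gamma$, and the error term $\varepsilon$ are assumed notation, while $q$, $\mathit{Private}$, $\beta_1$, $\delta_s$, $\delta_c$, and $X_p$ follow the text.

```latex
q_{i(scpm)} = \beta_0 + \beta_1\,\mathit{Private}_p + \delta_s + \delta_c + \delta_m
              + \gamma' X_p + \varepsilon_{i(scpm)}
```

In this notation, the first set of estimates omits $\delta_m$ and $X_p$, the second adds $\delta_m$, and the third adds $X_p$; the correlation between qualifications and checklist completion mentioned above would correspond to an element of $\gamma$.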

Diagnosis
Results for diagnosis (Table 4) follow the same format as Table 3 but the dependent variables of interest are whether any diagnosis was given and whether a correct diagnosis was given (both conditional and unconditional on uttering a diagnosis). In the representative sample, 26 percent of public providers offer a diagnosis, of whom only 15 percent offer a correct one. The unconditional probability of a correct diagnosis was only 4 percent.
Private providers in the representative sample are more likely to offer a diagnosis but are not more likely to offer a correct one. The probability of offering a correct diagnosis is higher in the dual practice sample (15 percent vs. 4 percent), which is not surprising since these providers are all trained MBBS doctors. Even among these doctors, however, there is no difference in the rate of correct diagnosis between their public and private practices. Overall, the summary statistics, our price regressions (seen later), and our field work suggest that pronouncing a correct diagnosis (or even just a diagnosis) is not seen by providers (and the market) as being essential in this setting.

Treatment
Table 5 reports on several outcomes related to the treatment offered, coded as discussed in section 3.3. The probability of receiving at least one correct treatment from a representative public provider was 21 percent. However, they offered non-indicated treatments at much higher rates, with a 53 percent probability of providing a palliative treatment, and a 74 percent probability of providing an unnecessary treatment. Since the majority of providers provide unnecessary treatments, the probability of receiving only a correct treatment and nothing more is 2.6 percent. We can also examine two potential proxies for over-treatment: the rate of antibiotic prescriptions and the total number of medicines provided. Antibiotics were prescribed or dispensed in 26 percent of interactions (though they were not indicated for the asthma and angina cases), and an average of 2 medicines per interaction were dispensed.
In the representative sample, we do not find a significant difference between public and private providers on the probability of providing a correct, palliative, or unnecessary treatment; however, point estimates suggest that private providers have a higher probability of providing both correct and unnecessary treatments. Private providers in the representative sample also provide significantly more medicines (over 3 medicines on average, which is 30 percent greater than the public clinics).
In the dual practice sample, the rate of correct treatment is 36 percent higher (13.8 percentage points on a base of 38 percent), and the rate of antibiotic provision is 25 percent lower (11.9 percentage points on a base of 48 percent) in the private relative to the public practice of the same doctor. These results are robust to the inclusion of controls and alternative definitions of correct treatment (see below).
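The percent changes quoted above follow from dividing each percentage-point gap by its public-practice base. A minimal sketch of that arithmetic, using only the rounded figures reported in the text (the function name is illustrative):

```python
def relative_change(pp_difference, base_percent):
    """Convert a percentage-point gap into a percent change relative to the base."""
    return 100.0 * pp_difference / base_percent

# Private minus public practice of the same doctor, dual sample.
correct_treatment = relative_change(13.8, 38.0)
antibiotics = relative_change(-11.9, 48.0)

print(f"correct treatment: {correct_treatment:+.0f}%")
print(f"antibiotics: {antibiotics:+.0f}%")
```

The two results round to the 36 percent increase and 25 percent decrease stated in the paragraph.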

Knowledge and Effort of Public and Private Providers
As predicted by the model, there is a strong correlation between higher provider effort and probability of giving a correct treatment ( Figure 6). Nevertheless, the results in Tables 3 and 5 suggest that even though the typical private provider exerts significantly greater effort than his public counterpart, this greater effort does not lead to better treatment outcomes. The most natural explanation for this is that the representative private provider has lower levels of medical knowledge, but compensates with higher effort, yielding comparable overall levels of treatment accuracy (in line with our theoretical framework). To examine this possibility further, we use the 'discrimination' parameter of each checklist item (as estimated by the IRT-model, see Table A.5), to classify the individual items into terciles of low, medium, and high discrimination items. Here, higher discrimination items are those that are more effective at distinguishing provider quality. In the model, these would correspond to questions and exams that enable a provider to construct a more precise posterior distribution (since β = eK, this can be interpreted as a provider with more knowledge spending the effort more efficiently). 25 Table A.6 reports the same specifications as in Table 3 but compares public and private providers on checklist completion for different levels of item discrimination. By construction, providers are much less likely to complete high discrimination items on the checklist (consistent with low overall quality of care). In the representative sample, private providers complete 11 percentage points more of the low-discrimination checklist items but are no more likely to complete high-discrimination checklist items. However, in the dual sample doctors are significantly more likely to complete both low and high-discrimination items in their private practice. 
These results suggest that while the representative private provider does exert more effort, lower knowledge implies that the marginal product of effort is high only for questions that are easy to ask and interpret.
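The tercile grouping of checklist items by discrimination can be sketched as a rank-based split. The item names and discrimination values below are hypothetical placeholders; the estimated parameters are those reported in Table A.5, and the exact cut-off rule used in the paper is not stated here.

```python
def tercile_labels(discrimination):
    """Label each item low/medium/high by the rank of its discrimination parameter."""
    ranked = sorted(discrimination, key=discrimination.get)
    n = len(ranked)
    labels = {}
    for rank, item in enumerate(ranked):
        # rank * 3 // n maps ranks 0..n-1 onto three near-equal groups 0, 1, 2
        labels[item] = ("low", "medium", "high")[rank * 3 // n]
    return labels

# Hypothetical items and discrimination values, for illustration only.
items = {"item_a": 0.4, "item_b": 0.6, "item_c": 0.9,
         "item_d": 1.3, "item_e": 1.8, "item_f": 2.2}
labels = tercile_labels(items)
print(labels)
```

Checklist completion can then be computed separately within each group, which is the comparison Table A.6 reports.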

Robustness of checklist and treatment results
Our main results pool data across cases to maximize power. For completeness, we also show the results from Tables 3-4 by case (Table A.7). The superior performance of private providers on consultation length and checklist completion is seen in each of the three cases and in both the representative and the dual samples. Consistent with the overall results, private providers in the representative sample do not do better on diagnosis or treatment in any of the individual cases. In the dual sample, MBBS doctors were 14 percentage points more likely to correctly diagnose and 28 percentage points more likely to correctly treat the unstable angina (heart attack) case in their private practice relative to their public practices. In the asthma case, they are 11 percentage points more likely to offer a correct treatment (but this is not statistically significant given the smaller case-specific sample size).
We confirm that the results in Table 4 are robust to alternative definitions of correct treatment, such as treating all 'referrals' as a 'correct' treatment (Table A.8 shows the specific treatments offered by case, including referral frequency; Table A.9 shows that the results are robust to treating all referrals as a correct treatment). As discussed earlier, we include the dysentery case for the analysis of checklist completion but exclude it from the analysis of correct diagnosis and treatment because of the large (and differential) fraction of cases where the provider did not provide these and instead asked to see the child (see Table  A.8). Since there is a possibility that the checklist completion may also be censored in such cases, we also present the checklist completion results without the dysentery case and the results of Table 3 continue to hold (Table A.10). We also show the core results with controls for clinic-level infrastructure and facilities (Table A.11), and all the results continue to hold, suggesting that the results are not being driven by differences in facilities and infrastructure across public and private clinics.

6 Results - Pricing and Cost Effectiveness

Correlates of Prices Charged among Private Providers
Table 6 presents correlations between prices charged and our various metrics of healthcare quality in the representative sample, dual sample, and pooled samples. The odd columns present binary correlations, while the even columns show the correlates of prices charged with measures of quality in multiple regressions. The market rewards several measures of quality of care including time spent, checklist completion rates, and provision of a correct treatment (Table 6, Columns 1, 3 and 5). On the other hand, there is no price premium for pronouncing a correct diagnosis and a price penalty for referrals; whether this penalty is optimal (without a penalty, every provider should just refer the patient) or reduces provider incentives to refer patients adequately is unclear. Finally, there is a price premium for dispensing medicines as well as for the total number of medicines dispensed, which may provide incentives for the provision of excessive medication and is consistent with our theoretical framework. Most of these patterns are repeated in the multiple regressions (Table 6, Columns 2, 4 and 6). We highlight that in multiple regressions, correct treatment is no longer rewarded. This is likely due to the high correlation between the provision of a correct treatment and the checklist completion rate (Figure 6) and the use of medicines. The market prices quality to the extent it is embodied in the checklist, but patients cannot discern whether they received the correct treatment conditioning on checklist completion.
The correlates of pricing observed in Table 6 are in line with those predicted by our modeling framework and point to both strengths and weaknesses of market-based incentives for healthcare provision. On one hand, there appear to be positive incentives for the provision of better quality care (including more effort and providing the correct treatment). On the other hand, the results are consistent with evidence from other settings, which shows that markets for credence goods with asymmetric information between providers and customers often reward over-provision to the detriment of customer welfare. Overall, the results suggest that the market rewards providers who "do more", which is correlated with doing more "good" things as well as more "unnecessary" things. 26 In sharp contrast to the market for private healthcare, the public sector rewards qualifications and age (experience), but there is no correlation between provider wages and any of our measures of quality including the time spent, checklist completion, correct diagnosis, or correct treatment (Table 7). Since public employees receive non-pecuniary rewards for better performance through more desirable job postings, we also present correlations between the desirability of a posting and measures of quality and again find that the only significant correlate of a better posting is age, suggesting that the public sector does not reward the quality of care provided by doctors with either more pay or more desirable job postings. 27

Comparative Cost Effectiveness
While healthcare in the public sector is free or nominally priced to the user, it is not cost-free to the taxpayer. Table A.13 presents estimates of the cost per patient in the public sector, and calculates that the cost per patient interaction is around Rs.240. This is a conservative calculation because it uses only the wage cost in the public sector and thus reflects cost "at point of service" and does not include any cost of infrastructure, facilities, equipment, medicines or administration. By contrast, the fees charged are the only source of revenue for private providers, and hence will cover all operating costs. The public sector calculation also assumes that all patients shown in the official records of the PHC/CHC were true patients. Finally, as is standard in comparative cost effectiveness analysis of this sort, we assume that there is a comparable case mix for primary-health visits across public and private facilities, which is consistent with our data from observing real patients (see section 7.1 below) where we do not find any difference in the symptoms presented across public and private clinics.

Real Patients
While the use of SPs to measure quality of healthcare presents several advantages over the next best method of clinical observations, SPs are limited in the number and types of cases that can be presented. Further, we may worry that the audit methodology represents "off equilibrium" situations in the market that do not extend to its general functioning. 28 We therefore supplemented our data collection after completing the SP modules by conducting day-long observations in provider clinics to code actual provider-patient interactions. We conducted these observations both in the representative and in the dual samples and observed a provider in both his/her private and public practices. While we cannot code the accuracy of the diagnosis and treatment from these observations since we do not observe the underlying illness, we record several observable characteristics of each patient interaction based on over 1000 interactions in both samples. Table 8 reports results from the specification adopted in equation 8 with data from real patient interactions. Private providers spend more time with patients, ask more questions, and are more likely to conduct a physical exam. They also give out more medicines on average. Results from the dual sample are also remarkably similar to those in Tables 3-5, with private providers still exhibiting higher effort but not providing more medicines. Thus, while our SPs present only three specific cases, our results from observing real interactions between patients and providers across the entire set of cases seen in a typical day are very similar to those from the SPs, suggesting that our SP-based results may be valid for a wider range of cases.
26 [...] the public sector and find that both public and private providers have similarly high levels of provision of unnecessary treatments (Table 5).
27 These results are similar to those found in publicly-provided education in India and Pakistan, where teacher salaries increase with qualifications and seniority, but are not correlated with their effectiveness at raising test scores (Muralidharan, 2013; Das and Bau, 2014). Note that the results are robust to excluding observations where we were not able to identify the medicines provided and classify them as correct or not (see Table A.12).

Strategic Diversion of Effort in the Dual Sample
One issue in interpreting the results from our dual sample is the possibility that doctors with private practices may deliberately under-provide effort in their free public practices to shift demand to their fee-for-service private practices (see Jayachandran (2014) for a similar example from education). While we cannot fully rule out this possibility, there is suggestive evidence against it. We compare public providers with and without a private practice and find that providers with a private practice are not any more likely to refer away an SP (Table A.14). We do find that providers with a dual practice provide less effort in their public practices relative to those who do not, but the lack of any evidence of differences in referral rates suggests that these differences may reflect selection rather than strategic behavior, with more publicly conscientious doctors being less likely to have a private practice.
The relevant policy question is whether doctors would start exerting more effort in their public practice if the option of private practice did not exist. It is worth noting, however, that private practice by public MBBS doctors was illegal in MP during the time of our study, and that over 60 percent of providers still had a private practice, consistent with the idea that this is a low φ environment.

Alternative Comparisons in the Representative Sample
A final issue is that our representative sample analysis compares the average public and private provider in a market, but it is not clear if the average is the correct metric for patient choice since patients can choose the best provider in the market. We therefore present an alternative comparison between the best public and best private provider (defined separately for checklist completion and correct treatment) in each market in Table A.15 and find that our results are very similar to those in Tables 3-5.

Discussion and Conclusion
Using an audit methodology, we present the first set of results on the quality of public and privately provided healthcare in a low-income country that features a de facto unregulated private sector. Comparisons of representative public and private provider samples suggest that patients in our setting have few good options for healthcare, public or private. Private sector providers, the majority of whom have no formal medical training, spend more time with patients and are more likely to adhere to a checklist of recommended case-specific questions and examinations, but their effectiveness appears to be ultimately limited by their low level of medical knowledge. Public sector clinics, though theoretically staffed by qualified providers, are characterized by lower provider effort. Posts are vacant and doctors are frequently absent, so that even in a public sector clinic, the patient often sees a provider without formal training. The lower effort (compared to the private sector) appears to offset the benefit of more qualified providers in the public sector, and ultimately there is little difference in correct treatment or the overuse of incorrect medicines across a representative sample of public and private providers. At least on the basis of these data, there is little evidence that patients are harming their health more by going to the private relative to the public sector, and the price paid could well reflect patient demand for provider effort (including more reliable presence at the clinic).
Comparing the same provider in the public and private sector allows us to isolate the effect of customer accountability in the private sector and compare it with administrative accountability in the public sector. The first appears to perform better on all counts. Adherence to checklists and correct treatment rates are higher in the provider's private clinic, and rates of incorrect treatments are identical in both sectors.
Better treatment according to medical guidelines is consistent with the hedonic price-effort relationship in the private sector, which is absent in the public sector. Providers in the private sector earn more when they complete more of the medically necessary checklist and when they provide a correct treatment, showing that the market rewards certain key aspects of high quality. Where customer accountability fails is in its ability to control the extent of unnecessary medication. Patients frequently receive treatments that they do not need, and they pay for them. Surprisingly, however, the rate of provision of unnecessary medication is equally high in the public clinics. Finally, our best estimates of cost per patient interaction suggest that the public healthcare system in India spends at least four times more but does not deliver better outcomes than the private sector.
Indian and global health policy debates have been hampered by a lack of empirical evidence on the quality of clinical interactions in the public and private sector. Under the status quo, considerable attention has been focused on inadequate access to publicly-provided healthcare and the need to increase spending on the public healthcare system (Reddy et al., 2011; Shiva Kumar et al., 2011; Planning Commission of India, 2013). Our results suggest that enthusiasm for the public sector as the primary source of healthcare in resource-poor settings has to be tempered by the extent to which administrative accountability is enforced in the system. More broadly, the quality of healthcare depends both on provider knowledge and effort, and there are likely positive returns to investing in improved incentives for effort in the public system of healthcare delivery (where providers are more qualified) or increased training and credentialing among private healthcare providers, who have better incentives for effort. 29 Current policy thinking often points in the opposite direction, with a focus on hiring, training, and capacity building in the public sector on one hand (without much attention to their incentives for effort), and considerable resistance to training and providing legitimacy to unqualified private providers on the other (Reddy et al., 2011; Shiva Kumar et al., 2011; Planning Commission of India, 2013). This viewpoint is often justified by ad hoc assumptions that patients, particularly those who are poor and illiterate, are unable to make accurate decisions regarding their health care. While certainly possible, such an assertion would have to be backed by empirical evidence on patient demand and quality of care. Our paper is one of the first attempts to do so, and expanding this methodology to other conditions and settings will allow for a richer understanding of the functioning of medical systems in settings with low resources and administrative capacity.
Notes: Standard deviations in parentheses. The number of providers available to a village was determined by a provider census, which surveyed all providers in all locations mentioned by households in 100 sample villages, when asked where they seek care for primary care services, regardless of whether or not the particular provider was mentioned by households. Unqualified providers report no medical training. All others have training that ranges from a correspondence course to a medical degree. "Outside villages" are typically adjacent villages or villages connected by a major road. The 30-day visit rate was calculated from visits to providers reported by households in a complete census of households in the 100 sample villages. The type of provider they visited was determined by matching reported providers to providers surveyed in the provider census.
Notes: Standard deviations are in parentheses. Unit of observation is a provider. The dual practice sample consists of providers who received a standardized patient in both their public and private practices. Provider mapping and complete provider census yielded information about whether or not a provider operates more than one practice. The representative sample did not employ the intense reconnaissance to find both the public and private practices of the same provider, and thus the proportion of dual practice providers can be considered self-reported. In the dual practice sample, however, the existence of additional medical practices was verified by repeated observation.

Panel C: SP, case and market/district fixed effects
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the market level are in parentheses. All regressions include a constant. Observations are standardized provider-patient interactions. Market fixed effects are used for the representative sample, and district fixed effects for dual practice sample.
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the market level are in parentheses. All regressions include a constant. Observations are standardized provider-patient interactions. Market fixed effects are used for the representative sample, and district fixed effects for dual practice sample. In columns (6) and (12) the dependent variable is total number of medicines recommended to the patient (dispensed and/or prescribed).
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the market level are in parentheses. The desirability index is constructed using principal component analysis of proximity to several amenities (paved road, bus stop, railway station, Internet, post-office and bank), availability of infrastructure (stethoscope, sphygmomanometer, torchlight, weighing scale, hand washing facility, drinking water, staff toilet, patient toilet, fridge, sterilizers, electric connection, electric supply, power generator, telephone, computer, IV drip, cots/beds, disposable syringes), and PHC size (number of staff and number of patients). In binary regression columns, each coefficient represents a separate regression of prices on the row variable, a constant and district fixed effects. Multiple regressions include district fixed effects.

Panel B: including patient controls and market/district fixed effects
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the village level are in parentheses. Observations are patient-provider interactions, and the sample has been limited to the SP sample. All regressions include controls for patients' characteristics and patients' presenting symptoms. Controls for patients' characteristics include: whether patient has no education, number of questions asked by patient, and patients' asset index. Controls for patients' presenting symptoms include: number of days patient has been sick, patients' ease in performing activities of daily living, and indicators for a number of presenting symptoms (fever, cold, diarrhea, weakness, injury, vomiting, dermatological problem, pregnancy, and pain). In columns (5) and (10) the dependent variable is total number of medicines recommended to the patient (dispensed and/or prescribed).

A.1 Mapping of Providers: Representative Sample
We first randomly selected five districts in the state of Madhya Pradesh, stratified by region and an index of health outcomes. In each district, we sampled 20 villages by probability proportional to size (PPS) sampling. Because of the rural focus of the study, we restricted the sampling frame to villages with population below 5,000. The sample of villages is thus representative of rural Madhya Pradesh.
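The village draw described above can be illustrated with a minimal sketch of PPS sampling. The text does not say which PPS variant was used, so the systematic variant shown here, the function name `pps_systematic_sample`, and the village populations are all assumptions made for illustration only.

```python
import random
from itertools import accumulate


def pps_systematic_sample(sizes, n, rng):
    """Pick n unit indices with probability proportional to size.

    Systematic PPS: lay the units end to end on a line of length sum(sizes),
    choose a random start in [0, step), and take every step-th point.
    A unit larger than the step could be picked twice; that cannot happen
    when every unit is smaller than the step, as in the example below.
    """
    total = sum(sizes)
    step = total / n
    start = rng.random() * step
    cumulative = list(accumulate(sizes))
    picks, idx = [], 0
    for k in range(n):
        point = start + k * step
        # Advance to the unit whose interval contains the point.
        while idx < len(cumulative) - 1 and cumulative[idx] <= point:
            idx += 1
        picks.append(idx)
    return picks


# Hypothetical sampling frame for one district: 300 villages, all below
# 5,000 inhabitants as in the study's restriction. The numbers are made up.
rng = random.Random(42)
village_populations = [rng.randint(200, 4999) for _ in range(300)]
sampled_villages = pps_systematic_sample(village_populations, 20, rng)
```

Larger villages occupy longer intervals on the cumulative line and are therefore more likely to contain one of the 20 equally spaced selection points.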
In each sampled village, we conducted at least three Participatory Resource Assessments in different locations within the village and obtained a list of all the health care providers that households sought primary care services from. These lists were used primarily to identify the geographical locations that households sought care from. For instance, households may seek care from providers within the village, but also on the nearest highway. If 5 percent or more households reported visiting a provider in an outside location, we identified that location as a "cluster village" and considered it a part of the "health care market" for the sampled village. Fifty-five sampled villages have one cluster village, 13 villages have two, and one village has three. The remaining 31 villages have no cluster villages (i.e. less than 5 percent of primary health care visits were to a location outside the village). For our sample as a whole, we identified 184 unique locations, including the 100 sampled villages.
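As a rough illustration, the 5 percent rule and the location count can be written out as follows. The function name `cluster_villages` and the example shares are hypothetical; the village counts are the ones reported in the text.

```python
def cluster_villages(outside_visit_shares, threshold=0.05):
    """Return outside locations visited by at least `threshold` of households."""
    return sorted(loc for loc, share in outside_visit_shares.items()
                  if share >= threshold)


# Hypothetical shares of households in one sampled village that reported
# visiting a provider in each outside location (names and values made up).
example_shares = {"highway junction": 0.12, "market town": 0.03}
example_clusters = cluster_villages(example_shares)

# Counts reported in the text: sampled villages by number of cluster villages.
villages_by_cluster_count = {0: 31, 1: 55, 2: 13, 3: 1}
n_sampled = sum(villages_by_cluster_count.values())
n_cluster = sum(k * v for k, v in villages_by_cluster_count.items())
n_locations = n_sampled + n_cluster
```

With these counts, the 100 sampled villages plus 84 cluster villages give the 184 locations reported above.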
Surveyors then visited each location and administered a provider census to all health care providers in the location, regardless of whether they had been mentioned in the participatory assessments. The provider census details the provider's demographic, practice and clinic characteristics.
Following the provider census in the villages, we administered a short household census with information on household demographics and health care seeking behavior. For each household member, we asked about incidence of any illness (primary or otherwise) in the past one month, if they sought medical attention for that illness, and (if yes), the name and address of the provider they visited (regardless of the location of the provider). Surveyors mapped the household visits to the provider lists; this is the mapping we use to compute the fraction of visits to public and private providers and providers with different qualifications in the text. In instances where households reported visiting providers not already on the list, we probed for providers' names, addresses and practice details and added the providers to our listing and census exercise. We verified through this exercise that we had covered at least 95 percent of all providers visited by households in each village. This exhaustive mapping process ensured that we mapped the complete "health market" where households in our sampled villages sought primary care services.

A.2 Sampling of Providers for SP visits: Representative Sample
We conducted the SP work in three out of the five districts in our sample because of logistic considerations. Although SPs were recruited from the local community, they needed plausible reasons for their presence in the village (which they were not from), and the typical narrative was that they were traveling and/or passing by the village. In order to minimize SP detection, we excluded 5 remote markets (as assessed by road access) from the possible 60 markets, where we thought a traveling excuse might not be plausible.
We sampled providers for the SP work from a smaller set of "eligible providers" than what we had mapped. We classified all public nurses and midwives (ANMs), community health workers (ASHA), day-care center workers (Anganwadi), chemists and pharmacists as ineligible for SP work, as they provided primarily preventive care such as vaccinations. We also exclude mobile and itinerant providers from the sample. Finally, we exclude 55 providers with whom we could not complete the provider census (typically due to the unavailability of the provider; we were able to conduct the census with 17 of these providers in subsequent rounds). These restrictions remove 7 markets from our study, primarily because there were no eligible providers in these markets. We also drop two other markets because they share a cluster with other sampled villages and do not have eligible providers inside the village. Our study in the representative sample therefore covers 46 markets in 3 districts of Madhya Pradesh (see Table A.1). Based on the eligibility criteria defined above, these 46 markets have 649 eligible providers (130 public and 519 private) from which we sample.
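A minimal sketch of these eligibility rules and of the resulting market count is given below. The record fields (`type`, `mobile`, `census_complete`) and the three example records are hypothetical; the market and provider counts are those reported in A.2.

```python
# Provider types treated as ineligible for SP work in the text.
INELIGIBLE_TYPES = {"ANM", "ASHA", "Anganwadi", "chemist", "pharmacist"}


def is_eligible(provider):
    """Apply the three exclusions described above to one provider record."""
    return (provider["type"] not in INELIGIBLE_TYPES
            and not provider["mobile"]
            and provider["census_complete"])


# Hypothetical census records, for illustration only.
records = [
    {"id": 1, "type": "private clinic", "mobile": False, "census_complete": True},
    {"id": 2, "type": "ASHA", "mobile": False, "census_complete": True},
    {"id": 3, "type": "private clinic", "mobile": True, "census_complete": True},
]
eligible_ids = [r["id"] for r in records if is_eligible(r)]

# Market counts as reported in the text.
markets = 60   # possible markets in the three SP districts
markets -= 5   # remote markets excluded to limit SP detection
markets -= 7   # markets removed by the eligibility restrictions
markets -= 2   # markets sharing a cluster, no eligible provider in the village
eligible_providers = 130 + 519  # public + private
```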
In each market we randomly sampled up to two eligible providers in each public clinic and up to six private providers in each market. 30 In the private sector, we sampled one provider per clinic. We sampled a total of 247 providers of which 45 are public providers and 202 are private providers (Appendix Table A.1). We also sampled all MBBS providers in both public and private sectors.
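The sampling rule for one market can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run in the field: the function name `sample_providers`, the record fields, and the example market are hypothetical, and the text does not say how the inclusion of all MBBS providers interacts with the caps, so here they are simply added on top.

```python
import random
from collections import defaultdict


def sample_providers(market, rng, per_public_clinic=2, private_cap=6):
    """Return the set of sampled provider ids for one market."""
    public_by_clinic = defaultdict(list)
    private_by_clinic = defaultdict(list)
    for p in market:
        target = public_by_clinic if p["sector"] == "public" else private_by_clinic
        target[p["clinic"]].append(p["id"])

    sampled = set()
    # Up to two eligible providers in each public clinic.
    for ids in public_by_clinic.values():
        sampled.update(rng.sample(ids, min(per_public_clinic, len(ids))))
    # One provider per private clinic, then up to six of those per market.
    one_per_clinic = [rng.choice(ids) for ids in private_by_clinic.values()]
    sampled.update(rng.sample(one_per_clinic, min(private_cap, len(one_per_clinic))))
    # Assumption: every MBBS provider is included regardless of the caps.
    sampled.update(p["id"] for p in market if p["mbbs"])
    return sampled


# Hypothetical market: one public clinic with three providers and eight
# private clinics with two providers each; no MBBS providers.
market = (
    [{"id": f"pub{i}", "sector": "public", "clinic": "PHC", "mbbs": False}
     for i in range(3)]
    + [{"id": f"priv{i}", "sector": "private", "clinic": f"c{i // 2}", "mbbs": False}
       for i in range(16)]
)
sampled = sample_providers(market, random.Random(0))
```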

A.3 Completion of SPs: Representative Sample
We sampled 247 providers in 235 clinics and SPs were completed with 224 providers in 215 clinics. Of these, 214 providers are those we sampled. Furthermore, for 27 SP interactions (corresponding to 10 providers) we saw a provider who had not been sampled. We knew the identity of the provider because he/she had been included in the census, but was practicing in a clinic different from their own. For 18 observations (corresponding to 8 public and 2 private clinics and 10 public and 2 private providers sampled) we do not know the identity of the provider. These were most likely staff mapped to the clinic who are not licensed to provide care, but do so when the doctor is absent.
The discrepancy between whom we sampled and whom we actually saw does not affect the interpretation of our results in Panels A and B of Tables 3-5, but it does in Panel C. Panels A and B present results without provider controls; therefore, whether or not we have background data on the provider is not relevant. Because these panels include the small fraction of observations where we saw unintended and unknown providers, the public-private difference here should be interpreted as the difference across random visits to providers' clinics rather than to specific providers. In Panel C, we present results including provider controls. Here, for the 27 interactions where we saw providers we did not sample but had mapped (and for whom we conducted the provider census), we use their background information. The 18 observations where we do not know the provider at all are dropped from the estimation sample.

A.4 Mapping of Providers: Dual Practice Sample
We obtained a list of all Primary Health Centers (PHCs) and Community Health Centers (CHCs) from the Ministry of Health of Madhya Pradesh. Excluding PHCs/CHCs which were mapped as part of the representative sample, we mapped 200 more facilities in this round. Of these 200 facilities, 40 did not have an MBBS provider posted (see Appendix Table A.2). In the remaining 160 PHCs/CHCs we located 216 providers (some providers were mapped to multiple facilities). Our field team then undertook detailed field work to find out if these providers operated private practices and, if yes, to locate their private practices. We were able to locate a private practice for 132 of the 216 providers (61.1 percent); this is the sample we call the "dual practice sample". After the mapping, we administered the provider census to all providers. To the extent possible, the census was administered in the private clinic of the provider.

A.5 Sampling of Providers: Dual Practice Sample
We sampled one provider from every PHC/CHC with preference for a dual practice provider. Often a provider is posted to multiple public facilities, and in cases where there were no additional providers in these facilities, we randomly sampled the provider from one of the multiple facilities they were posted to. With this sampling strategy, we sampled from 143 of the 160 facilities we could have sampled from. Of the 143 providers, 94 operated private practices (65.7 percent, see Table A.2).

A.6 Completion of SPs
We completed SPs with 118 of the 143 providers sampled; the shortfall arose primarily because providers had left the clinic or were away on "long leave" in the 6-month phase between the listing and the SP work. We made a minimum of 4 attempts to complete SPs with these providers, and were forced to stop trying at that point due to the heightened risk of detection. Of the 49 providers without private facilities, we completed an SP with 30 providers (61.2 percent). Of the 94 providers with private practices, we were able to complete at least one SP with 88 providers (93.6 percent, either public or private). The number of dual practice providers for whom we have at least one observation in both public and private is 69. As discussed in the text, and seen in Table A.10, it was difficult to complete a case in the public practice of the public MBBS doctors (because of high absence rates), and we had a lower completion rate in the public practices of these doctors than in the private practices (75 percent versus 93 percent). To the extent that absence is correlated with lower quality practice, this would suggest that we underestimate the public-private difference in our results.
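The counts reported in A.4 to A.6 can be tied together with a short arithmetic check. The helper name `pct` is an illustrative choice; all numbers come from the text.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


# A.4: mapping
facilities_with_mbbs = 200 - 40          # facilities with an MBBS provider posted
share_located_private = pct(132, 216)    # providers with a located private practice

# A.5: sampling
sampled_total = 49 + 94                  # without / with a private practice
share_sampled_dual = pct(94, 143)

# A.6: completion
completed_total = 30 + 88
completion_no_private = pct(30, 49)
completion_dual = pct(88, 94)
```

The two completion groups add up to the 118 completed providers out of 143 sampled.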

B.1 Description of tracer conditions and relevance for India
SPs presented either a case of unstable angina, asthma, or dysentery of an absent child.
• Unstable Angina: A 45-year-old male complains of chest pain the previous night. Appropriate history taking would reveal classic signs (radiating, crushing pain) and risk factors (smoking, untreated diabetes, and family history of cardiac illness) of unstable angina or an imminent myocardial infarction.
• Asthma: A 25-year-old male or female presents with difficulty breathing the night before the visit. When questioned appropriately, the SP reveals that the episode lasted for 10 to 15 minutes and involved a "whistling" sound (wheezing) and that he or she has had similar episodes before, often triggered by house cleaning and cooking smoke. The SP also reports a family history of similar symptoms.
• Dysentery: A 26-year-old father of a 2-year-old complains that his child has diarrhea and requests medicines. When probed, the SP reveals details of their water source and sanitation habits, in addition to the presence of fever and the frequency and quality of the child's stools.
For all cases, checklists of recommended history questions and examinations were developed together with an advisory committee and SPs were trained to recall the questions asked and examinations performed. These were then recorded during debriefing with the supervisor using a structured questionnaire within an hour of the interaction. In a recent study, we test the reliability of recall by comparing audio recordings with recall and find a very high correlation of 0.63 (p<.001) (Das et al., 2015).

B.2 Relevance of Cases
Incidence of cardiovascular and respiratory diseases has been increasing, and diarrheal disease kills more than 200,000 children per year in India (Black et al., 2010; Patel et al., 2011). The Indian government's National Rural Health Mission (NRHM) has developed triage, management, and treatment protocols for unstable angina, asthma, and dysentery in public clinics, suggesting clear guidelines for patients presenting with any of these conditions (Jindal et al., 2005). The cases were also chosen to minimize risk to standardized patients since they could not portray any symptoms of infection given the documented high propensity to administer medicines intravenously with unsterilized needles and to use thermometers that have not been appropriately disinfected (Banerjee, Deaton and Duflo, 2004).

B.3 SP recruitment, script development and SP training
A total of 22 SPs were recruited from an initial group of 45 who were extensively screened and trained for 3 weeks. The age and sex of recruited SPs corresponded to the relevant tracer conditions. For instance, angina was depicted by male SPs between 40 and 50 years old.
Scripts were developed under the guidance of an anthropologist with active SP participation that described the social and family contexts of the patient if a provider were to ask questions about these details. Script development and SP training jointly ensured, first, that the clinical symptoms and case history reflected the social and cultural milieu of which the SP was assumed to be a member and, second, that the presentation of symptoms and answers to history questions were consistent with biomedical facts about the disease. SPs were trained to present symptoms and answer questions pertaining to case history that were medically correct. For example, all opening statements and questions pertaining to the type of cough and its duration were standardized. SPs were also trained to distinguish between questions to which answers could be improvised but had to be appropriate to the social role of the SP and answers that had to be given using local idioms but in a standardized format without any alterations.
All SPs underwent rigorous training for 100-150 hours that started with a focus on the cases and the development of scripts and proceeded to memorization and appropriate role-playing, as well as techniques to perfect recall of the questions asked and examinations completed during the interaction. Following the training, SPs visited doctors who were working with our team to provide feedback on their presentation and depiction of the cases. Finally, dry runs were completed with unannounced visits to consented providers to help build the confidence of the SPs and take them through a number of "real-life" situations. Field work started once protocols were in place for the variety of these experiences.
The study was first piloted in Delhi with 64 consented providers who had been previously informed that they would be visited by an SP within the next 6 months (see Das et al. (2012)). In the pilot phase of the study, a total of 248 out of a potential 256 SP interactions were completed. Within a month of the SP visit, field-workers visited the consented providers to enquire if they had been visited by an SP. In cases where the provider felt that an SP visit had occurred, we elicited the sex, approximate age and symptoms of the SP. We could confirm a match between the providers' suspicions and the actual SP sent to the provider in only 2 cases for a detection rate of less than 1 percent.
Institutional review boards at Harvard University and Innovations for Poverty Action and the Central and State governments in India granted clearance for the study. For the Delhi pilot, consenting providers were informed they would receive a standardized patient in the following 6 months. No standardized patients were harmed or exposed to risk in this stage and detection rates among consented providers were below 1 percent. To minimize detection in rural Madhya Pradesh, where providers are more likely to recognize their entire patient population, the study proceeded as an audit, and providers were not aware that they were being visited by standardized patients. Clearance was granted for this deception design because the risks to providers and their patients were minimal, whereas accurate measures of provider practice were nonexistent. The expected length of clinical interactions, patient loads, and levels of provider anxiety induced by the cases were thought to be small, and standardized patients had to pay providers whatever they charged. The waiver of consent is consistent with the principle that where the research subject provides a public service to other customers, the public have a right to know about the quality of the service provided (Norris, 2002).

B.4 Categorizing treatment in SP interactions
In rural Madhya Pradesh, as in much of India, providers often dispense medicines in the clinic rather than prescribe them for purchase from external chemists (some do both). Our field staff recorded the names of all dispensed or prescribed medicines in SP exit interviews and used multiple resources to classify medicines as accurately as possible. Field staff were given a list of commonly used drugs in India along with their medical classification, and the CIMS Drug Information System (in print), which they used to record exact medicine names and classes. For drugs that were not immediately confirmed, they consulted local chemists and pharmacists and obtained correct names to the extent possible. Along with names, field staff coded whether the medicines belonged to any of the following categories: ayurvedic, homeopathic, antibiotics, analgesics, anti-ulcer medication, steroids, anti-allergy medicines, cardiac medicines, psychiatric medicines, other.
To construct our main treatment variables (correct treatment, palliative treatment and unnecessary treatment), we obtained from a panel of doctors in the United States and India a full list of correct and palliative treatments/medicines for each case. These include nitrates, aspirin, clopidogrel, anti-platelet agents, blood thinners, beta blockers, morphine, other pain control, ACE inhibitors and vasodilators for unstable angina; ORS, electrolytes and zinc for dysentery; and inhaled corticosteroids, leukotriene inhibitors, cromones, inhaled anticholinergics and oral corticosteroids for asthma (see Table A4).
After medicine coding in the field, members of the ISERDD team in Delhi verified the codes assigned to all medicines and recoded them when necessary. To further ensure the coding was correct, we used a third party, a pharmaceutical consulting firm in Delhi, to independently verify our classification of medicines.
Medicine coding is relatively straightforward in instances where providers prescribe and SPs receive a written prescription. In cases where providers dispense, it is easier to obtain names when medicines come with packaging than when they do not. In the 1,123 complete SP interactions, SPs were recommended a total of 2,772 medicines corresponding to 969 unique medicines (by medicine names, ignoring unlabeled ones). We are unable to classify 14.18 percent of all 2,772 medicines because they were unlabeled (providers dispensed them as loose samples or in crushed powder form). We are further unable to classify 3.64 percent of medicines (93 unique medicines by name) because we could not match them to secondary information sources. SPs received at least one unclassifiable medicine in 268 interactions (23.9 percent). However, in 211 of these interactions (18.8 percent), SPs received classifiable medicines along with the unclassifiable medicines. In only 57 interactions (5.1 percent) were all medicines unclassifiable.
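As a quick arithmetic check, the interaction-level shares quoted above can be recomputed from the reported counts. The snippet below is only a sketch; the variable names are ours and do not come from the study's code.

```python
# Counts reported in the paragraph above.
TOTAL_INTERACTIONS = 1123
ANY_UNCLASSIFIABLE = 268   # at least one unclassifiable medicine
MIXED = 211                # unclassifiable medicines alongside classifiable ones
ALL_UNCLASSIFIABLE = 57    # every medicine unclassifiable


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


shares = {
    "any": share(ANY_UNCLASSIFIABLE, TOTAL_INTERACTIONS),
    "mixed": share(MIXED, TOTAL_INTERACTIONS),
    "all": share(ALL_UNCLASSIFIABLE, TOTAL_INTERACTIONS),
}

# The two sub-groups should partition the interactions with any
# unclassifiable medicine.
partition_ok = (MIXED + ALL_UNCLASSIFIABLE) == ANY_UNCLASSIFIABLE

print(shares, partition_ok)
```

The recomputed shares agree with the 23.9, 18.8 and 5.1 percent reported in the text, and the two sub-groups add up to the 268 interactions.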
We construct our main treatment variables (correct treatment, palliative treatment and unnecessary treatment) after completing the medicine coding process described above. For each interaction, we determine whether any recommended medicines fall into correct, palliative and/or unnecessary treatments, treating all unlabeled and unidentifiable medicines as unnecessary. It is possible that the unlabeled and unidentifiable medicines constitute correct or palliative treatment, but our results are robust to excluding interactions that include unclassifiable medicines. Moreover, the likelihood that the provider dispenses medicines, and that the provider dispenses an unclassifiable medicine, is decreasing in other measures of provider quality from the SP study; we are therefore reasonably confident that such medicines are more likely to be unnecessary treatments than not.
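The rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's actual code: the `CASE_RULES` lookup and the class labels in it are hypothetical stand-ins for the expert panel's lists in Table A4, and `None` stands for an unlabeled or unidentifiable medicine.

```python
from typing import Dict, Iterable, Optional

# Hypothetical lookup of medicine classes per case. The real lists come from
# the expert panel (Table A4); these entries are placeholders for illustration.
CASE_RULES = {
    "asthma": {
        "correct": {"inhaled_corticosteroid", "oral_corticosteroid"},
        "palliative": {"anti_allergy"},
    },
    "unstable_angina": {
        "correct": {"aspirin", "nitrate"},
        "palliative": {"analgesic"},
    },
}


def treatment_flags(case: str,
                    medicine_classes: Iterable[Optional[str]]) -> Dict[str, bool]:
    """Flag an interaction as containing correct, palliative and/or
    unnecessary treatment. Unclassifiable medicines (None) and any class
    not listed for the case count as unnecessary."""
    rules = CASE_RULES[case]
    flags = {"correct": False, "palliative": False, "unnecessary": False}
    for med_class in medicine_classes:
        if med_class in rules["correct"]:
            flags["correct"] = True
        elif med_class in rules["palliative"]:
            flags["palliative"] = True
        else:
            flags["unnecessary"] = True
    return flags


example = treatment_flags("asthma", ["inhaled_corticosteroid", "antibiotic", None])
print(example)
```

The three flags are not mutually exclusive, so a single interaction can contain both correct and unnecessary treatment, as in the example.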

C Theoretical Appendix
We provide a simple theoretical framework that describes doctors' choice of effort and treatment with and without market incentives, as well as the effects of these choices on patients' health outcomes. The framework incorporates three possible channels that can generate excessive unnecessary treatment. The first channel is ignorance: doctors want to treat patients, but they do not know the cause of the patient's symptoms and give out a cocktail of medicines hoping that one of them will work. Second, there is a pecuniary incentive to sell more medicines. Third, excessive treatment can be driven by patients: patients have their own expectations about proper treatment, and doctors satisfy patients by complying with those expectations.
One key intuition of the framework is that unnecessary treatments are not driven only by market incentives but can also arise from low effort by doctors. When doctors lack the motivation to exert enough effort to substantially reduce their ignorance about the patient's condition, mixing a variety of treatments may even be necessary to maximize the health outcome. Market incentives induce higher effort but also lead to more unnecessary treatment at any given level of effort; thus, the health outcome produced under the market system does not necessarily dominate that of the public system, and vice versa.
Our purpose is to provide a framework that helps interpret the empirical findings on the choice of effort and treatment by doctors facing different incentives. We abstract from market equilibrium components such as pricing, entry and exit decisions of doctors, and strategic interaction among doctors in the market or across sectors (public and private). Patients' expectations also enter exogenously rather than being formed endogenously in the market.
This appendix is comprised of two parts. In section 1, which is the main part of the appendix, we introduce doctors' utility maximization problems and discuss whether market incentives induce higher effort and better health outcome. We first omit the patients' expectation channel from the problem to focus on the effects of market incentives and introduce it again at the end of the section. In section 2, we provide one potential way of endogenizing the market incentives for interested readers.

C.1 Doctors' maximization problem with and without market incentives
A patient visits a doctor endowed with a level of medical knowledge $K$. The patient has an illness defined by the required type of treatment, denoted by $n_{true}$. Patients with different underlying illnesses may experience similar symptoms. In other words, given a set of symptoms, there is a distribution of $n_{true}$ associated with the symptoms. The doctor's job is to identify the true state of the patient and perform adequate treatment. The doctor-patient transaction is modeled as a two-stage process: consultation and treatment. The subscript $i$ for the $i$th doctor is used when there is a need to emphasize heterogeneity among doctors and is suppressed otherwise for notational simplicity.

C.1.1 Consultation stage
A patient visits a doctor. The true state of the patient, $n_{true}$, is unobserved by both the patient and the doctor. The patient describes his symptoms, and the doctor forms a prior belief about the true state given the described symptoms. The prior belief follows a normal distribution: $n_{prior} \sim N(\nu, \frac{1}{\alpha})$. The prior belief can be thought of as the distribution of illnesses in the region that cause the given symptoms. The doctor exerts a costly effort $e$ to learn about $n_{true}$. The effort cost is given by $e^2$. One can interpret $e$ as the number of checklist items completed or the time spent with the patient. $e$ is also observed by the patient. By exerting $e$, the doctor draws a noisy signal $s \sim N(n_{true}, \frac{1}{\beta})$, where $\beta = eK$. The signal is not observed by the patient. Given $s$, the doctor updates her belief about $n_{true}$. The posterior belief of the true state is given by $n_{post} \sim N(\mu, \frac{1}{\alpha+\beta})$ with $\mu = \frac{\alpha\nu + \beta s}{\alpha+\beta}$. This is the result of standard Bayesian normal updating, and hence the proof is omitted. Note that $n_{post} \to n_{true}$ as $\beta \to \infty$.
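For readers who want to see the updating step numerically, the sketch below applies the textbook normal-normal formula: the posterior precision is the sum of the prior precision and the signal precision, and the posterior mean is the precision-weighted average of the prior mean and the signal. The function name and the parameter values in the example are illustrative assumptions.

```python
def posterior_belief(nu: float, alpha: float, s: float, e: float, K: float):
    """Return (posterior mean, posterior variance) for a normal prior
    N(nu, 1/alpha) and a normal signal s with precision beta = e * K."""
    beta = e * K                      # signal precision rises with effort
    precision = alpha + beta          # precisions add under normal updating
    mean = (alpha * nu + beta * s) / precision
    return mean, 1.0 / precision


# Illustrative values only: prior centred at 0, signal at 2.
mean_low, var_low = posterior_belief(nu=0.0, alpha=1.0, s=2.0, e=0.0, K=3.0)
mean_high, var_high = posterior_belief(nu=0.0, alpha=1.0, s=2.0, e=1.0, K=3.0)
print(mean_low, var_low, mean_high, var_high)
```

With zero effort the posterior coincides with the prior; with positive effort the mean moves toward the signal and the variance shrinks.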

C.1.2 Treatment stage
Based on the posterior belief about the true state, the doctor decides the types of treatment she will perform. The treatment is expressed as an interval $[\mu - n, \mu + n]$, and $n$ is interpreted as the variety of the treatment chosen by the doctor. Let $F_e$ denote the cumulative distribution function of the posterior belief given some level of effort $e$. Given $K$, the shape of the posterior belief is governed by $e$. Throughout the appendix, $e$ and $\beta$ are used interchangeably depending on the context. The probability that the interval $[\mu - n, \mu + n]$ includes $n_{true}$ is denoted by $P_e(n)$, where $P_e(n) = F_e(\mu + n) - F_e(\mu - n)$. There is a health cost of using a variety of treatment, given by $n^2$. The expected health outcome, $H$, is a function of $e$ and $n$ and is given by $H(e, n) = P_e(n) - n^2$. Note that for each individual patient, the interval either includes the true state or not, with probabilities $P_e(n)$ and $1 - P_e(n)$, respectively.
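The following sketch evaluates the coverage probability and the expected health outcome numerically, assuming the normal posterior with precision $\alpha + eK$ described in the consultation stage. For a normal distribution the mass within $n$ of the mean is $\mathrm{erf}(n\sqrt{(\alpha + eK)/2})$. The grid search and the parameter values are our own illustrative choices.

```python
import math


def coverage_probability(n: float, e: float, K: float, alpha: float) -> float:
    """P_e(n): posterior mass on [mu - n, mu + n] for a normal posterior
    with precision alpha + e*K."""
    precision = alpha + e * K
    return math.erf(n * math.sqrt(precision / 2.0))


def health_outcome(n: float, e: float, K: float, alpha: float) -> float:
    """H(e, n) = P_e(n) - n^2."""
    return coverage_probability(n, e, K, alpha) - n ** 2


def best_variety(e: float, K: float, alpha: float,
                 n_max: float = 3.0, steps: int = 3000) -> float:
    """Grid search for the n that maximizes H at a given effort level."""
    grid = [n_max * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda n: health_outcome(n, e, K, alpha))


# Illustrative parameter values (not calibrated to anything in the paper).
n_star = best_variety(e=1.0, K=1.0, alpha=1.0)
print(n_star, health_outcome(n_star, e=1.0, K=1.0, alpha=1.0))
```

At the maximizer the posterior density at $\mu + n$ is approximately equal to $n$, which is the condition obtained by setting the derivative of $H$ with respect to $n$ to zero.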
The patient has his own belief about the proper treatment that he expects to receive when visiting a clinic given the symptoms he has, which is denoted by $\bar{n}$. It is assumed that $\bar{n}$ is also known to the doctor. When the chosen $n$ is different from $\bar{n}$, the doctor needs to communicate with the patient to convince him that the doctor's choice of $n$ is the correct treatment. The farther away $n$ is from $\bar{n}$, the more communication needs to be done. Also, the patient is more easily convinced if the doctor has exerted more effort to examine the patient in the first place. The cost of communication is given by $\frac{(n - \bar{n})^2}{e}$. An easy way to reduce this communication cost is to simply give something close to $\bar{n}$. We are particularly interested in the case where $\bar{n}$ is large.

C.1.3 Doctors' optimization problem with and without market incentives
Denote the maximized utility of doctors in the consultation stage and the treatment stage by $V_1$ and $V_2$, respectively. Without market incentives, doctors have low-powered incentives only and maximize their utility, in which $\phi$ governs the magnitude of the low-powered incentives. Doctors may face market incentives in addition to low-powered incentives. In a market environment, doctor $i$ charges a piece rate $\tau_i$ per unit of effort as a consultation fee and also charges $p$ per unit of $n$ for the treatment.$^{31}$ Doctors also care about their reputation in the market, which is determined by the health outcome of their patients. The health outcome is not fully observed in the market because the long-term health cost of excessive treatment is not as easily observed as the immediate relief of the symptoms. Instead, reputation is based on the observed health outcome $H^o$, which is given by $H^o(e, n) = P_e(n) - \gamma_o n^2$, where $0 < \gamma_o < 1$. $\delta$ is a parameter that governs how much doctors care about their reputation in the market. When there are market incentives, doctors maximize the utility given in (10). To focus on how the presence of market incentives shapes the optimal choices of doctors when there is some degree of ignorance, let us omit the third channel, the expectation of the patients, and remove the term $-\frac{(n - \bar{n})^2}{e}$ from the doctor's maximization problem. We reintroduce the term in subsection 1.5.
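The displayed objective functions do not appear in this text. The sketch below is one formulation consistent with the definitions of $H$ and $H^o$, the effort cost $e^2$, the communication cost, and the recursive form of $V_2$ given in section C.2; $\bar{n}$ denotes the patient's expected treatment. It should be read as our reconstruction under those assumptions, and the layout of the original displays may differ.

```latex
% Without market incentives (low-powered incentives only)
\begin{align*}
V_1 &= \max_{e}\; V_2(e) - e^2, \\
V_2(e) &= \max_{n}\; \phi H(e,n) - \frac{(n-\bar{n})^2}{e}.
\end{align*}

% With market incentives (piece rate, treatment price, reputation)
\begin{align*}
V_1 &= \max_{e}\; \tau_i e + V_2(e) - e^2, \\
V_2(e) &= \max_{n}\; \phi H(e,n) + \delta H^o(e,n) + p\,n - \frac{(n-\bar{n})^2}{e}.
\end{align*}
```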
The first order conditions without market incentives are given by (11) for the consultation stage and (12) for the treatment stage, where $f_e$ is the probability density function of the posterior belief given $e$. $f_e(\mu + n)$ captures the marginal benefit of increasing $n$ through a higher probability of covering the correct treatment. The right hand side is the marginal cost of increasing $n$ through the higher health cost of excessive treatment. Note that doctors choose the $n$ that maximizes $H$ at any given $e$. The first order condition in the consultation stage with market incentives is given by (13), and the first order condition in the treatment stage is given by (14). The pecuniary benefit of selling $n$ increases the marginal benefit of $n$. Because $\gamma_o < 1$, the marginal cost of $n$ is smaller than when there are no market incentives. It is easy to see from (12) and (14) that, given $e$, doctors choose a larger $n$ when there are market incentives. Because there is a pecuniary benefit from $n$, and because the cost of excessive $n$ is not fully observed in the market, given $e$, the marginal benefit of $n$ is always greater and the marginal cost always smaller than without market incentives. Thus, doctors choose an excessive $n$ at which $H$ is decreasing in $n$ instead of the $n$ at which $H$ is maximized. This means that slightly decreasing $n$ would improve the health outcome.
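As a hedged sketch, the conditions discussed above can be derived from the definitions $H(e,n) = P_e(n) - n^2$ and $H^o(e,n) = P_e(n) - \gamma_o n^2$, assuming the doctor's treatment-stage payoff is $\phi H$ without market incentives and $\phi H + \delta H^o + p\,n$ with them (communication cost omitted). By symmetry of the normal posterior, $\partial P_e(n)/\partial n = 2 f_e(\mu+n)$. The normalization below is our own and is chosen so that the marginal benefit sits on the left and the marginal cost on the right.

```latex
% Treatment stage without market incentives, cf. (12)
\begin{align*}
\phi\bigl[\,2 f_e(\mu+n) - 2n\,\bigr] = 0
\quad\Longrightarrow\quad
f_e(\mu+n) = n .
\end{align*}

% Treatment stage with market incentives, cf. (14)
\begin{align*}
2(\phi+\delta)\, f_e(\mu+n) + p - 2(\phi+\gamma_o\delta)\, n = 0
\quad\Longrightarrow\quad
f_e(\mu+n) + \frac{p}{2(\phi+\delta)} = \frac{\phi+\gamma_o\delta}{\phi+\delta}\, n .
\end{align*}

% Consultation stage, using the envelope theorem, cf. (11) and (13)
\begin{align*}
\phi\,\frac{\partial P_e(n(e))}{\partial e} = 2e,
\qquad
\tau_i + (\phi+\delta)\,\frac{\partial P_e(n(e))}{\partial e} = 2e .
\end{align*}
```

In this form the market case adds the constant $\frac{p}{2(\phi+\delta)}$ to the marginal benefit and lowers the slope of the marginal cost below 1, since $\gamma_o < 1$; both effects push the chosen $n$ upward at any given $e$.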
Whether market incentives induce higher effort depends on the relative size of the rewards for $e$ and $n$ in the market. As long as the rewards for $n$ are not so large as to dominate those for $e$, doctors choose higher $e$ with market incentives. One of the benefits of exerting higher $e$ is to produce a better health outcome with a smaller $n$. When there is little punishment for excessive treatment in the market and the marginal profit from treatment is large, doctors may find it optimal to reduce $e$ and profit from a large $n$ unless the direct rewards for effort are large enough to offset this force. To see this from the first order conditions (11) and (13), observe that the marginal benefit of $e$ would be larger with market incentives if $n(e)$ were the same with and without market incentives. However, doctors choose a larger $n$ when there are market incentives. Note that, for any given $e$, $n f_e(\mu + n)$ is increasing in $n$ when $n < \frac{1}{\sqrt{\alpha+\beta}}$, maximized when $n = \frac{1}{\sqrt{\alpha+\beta}}$, and decreasing in $n$ when $n > \frac{1}{\sqrt{\alpha+\beta}}$. Also, $n f_e(\mu + n)$ is bounded below by 0 and above by $\frac{1}{\sqrt{2\pi}} \exp\{-\frac{1}{2}\}$. Let $n_1(e)$ and $n_0(e)$ denote the optimal choice of treatment as a function of $e$ with and without market incentives, respectively. Observe that when $n_1(e) < \frac{1}{\sqrt{\alpha+\beta}}$, the marginal benefit of $e$ with market incentives is always greater than that without market incentives. $n_1(e)$ is more likely to be smaller than $\frac{1}{\sqrt{\alpha+\beta}}$ when the market incentives for $n$ are small, i.e., when $p$ is small and $H^o$ is close to $H$. On the other hand, when $n_1(e) > \frac{1}{\sqrt{\alpha+\beta}}$, the left hand side of (11) may be larger than that of (13). However, because $n f_e(\mu + n)$ is bounded, we can always find a $\tau$ that makes the marginal benefit of $e$ with market incentives larger.
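The claim about the shape and bound of $n f_e(\mu + n)$ can be checked numerically. The sketch below assumes $f_e$ is the density of a normal posterior with precision $\alpha + \beta$, evaluates $n f_e(\mu + n)$ on a grid, and compares the maximizer and the maximum with $\frac{1}{\sqrt{\alpha+\beta}}$ and $\frac{1}{\sqrt{2\pi}}\exp\{-\frac{1}{2}\}$. The parameter values are illustrative.

```python
import math


def n_times_density(n: float, alpha: float, beta: float) -> float:
    """n * f_e(mu + n) for a normal posterior with precision alpha + beta."""
    precision = alpha + beta
    density = math.sqrt(precision / (2.0 * math.pi)) * math.exp(-0.5 * precision * n * n)
    return n * density


def grid_maximizer(alpha: float, beta: float,
                   n_max: float = 5.0, steps: int = 50_000) -> float:
    """Grid point in [0, n_max] at which n * f_e(mu + n) is largest."""
    grid = [n_max * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda n: n_times_density(n, alpha, beta))


alpha, beta = 1.0, 3.0                     # illustrative values
n_peak = grid_maximizer(alpha, beta)
peak_value = n_times_density(n_peak, alpha, beta)
upper_bound = math.exp(-0.5) / math.sqrt(2.0 * math.pi)
print(n_peak, peak_value, upper_bound)
```

The bound does not depend on $\alpha$ or $\beta$, which is what allows a sufficiently large $\tau$ to dominate the term in the consultation-stage comparison.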

C.1.4 Market incentives and health outcome
The direction of the effects of market incentives on the level of effort and the health outcome depends on parameter values. One possible outcome of the model is that, when the magnitude of low-powered incentives, $\phi$, is small, the health outcome under market incentives dominates; however, as $\phi$ increases, the health outcome without market incentives starts to dominate.$^{32}$ Figures 3 and 4 in the main text illustrate the mechanism by which such an outcome is produced.
Panel (A) in Figure 3 illustrates a case where market incentives induce higher effort. $MB_{with}$ and $MB_{without}$ are the left hand sides of (20) and (11) with respect to $e$. $MC_{with}$ and $MC_{without}$ are the right hand sides of (20) and (11) with respect to $e$. Holding other parameter values constant, the $MB$ and $MC$ curves are drawn for a small and a large value of $\phi$. $e^*_{with}$ and $e^*_{without}$ are the optimal levels of effort with and without market incentives, respectively, for the small and large $\phi$ values. Because the rewards for higher effort in the market are sufficiently large in this case, $e^*_{with}$ is larger than $e^*_{without}$. With larger $\phi$, the optimal choice of $e$ is higher.
Panel (B) plots the posterior variance $\frac{1}{\alpha+\beta}$, the inverse of the posterior precision, as a function of $e$ holding $K$ constant. The y-axis intercept $\frac{1}{\alpha}$ is the posterior variance when $e = 0$. $\frac{1}{\alpha+\beta}$ decreases with $e$ at a diminishing rate because $\beta = eK$. When $\phi$ is small, a difference in $e$ is translated into a substantial difference in the posterior variance. When $\phi$ is large, the marginal effect of effort on $\frac{1}{\alpha+\beta}$ is small.
Panel (C) illustrates the optimal level of treatment with and without market incentives, $n^*_{with}$ and $n^*_{without}$, when the posterior variance with market incentives is substantially smaller than that without market incentives. $MB_{with}$ and $MB_{without}$ are the left hand sides of (21) and (12) with respect to $n$. $MC_{with}$ and $MC_{without}$ are the right hand sides of (21) and (12) with respect to $n$. The slope of $MC_{with}$ is smaller than 1 because the health cost of excessive treatment is not fully observed, and hence punishment for excessive treatment in the market is weaker than what doctors would impose on themselves due to low-powered incentives. $p$, the unit price of $n$, is added to $MB_{with}$, so $MB_{with}$ asymptotes to $\frac{p}{2(\phi+\delta)}$ rather than to 0. When the posterior variance with market incentives is substantially smaller than that without market incentives, the optimal level of $n$ with market incentives can be smaller in spite of the incentives for excessive treatment.
Panel (D) illustrates the optimal level of treatment with and without market incentives, $n^*_{with}$ and $n^*_{without}$, when the posterior variance with market incentives is only slightly smaller than that without market incentives. In this case, the effect of market incentives on excessive treatment dominates, and the optimal level of $n$ is larger with market incentives.
Figure 4 illustrates the health outcome produced with and without market incentives for different values of $\phi$. $H$ increases with $\phi$ because $e$ increases with $\phi$, and $n$ is invariant to $\phi$ given $e$ when there are no market incentives and decreases with $\phi$ when there are market incentives. We argue that, when $\phi$ is low, $H$ is higher with market incentives; however, as $\phi$ increases, $H$ without market incentives starts to dominate that with market incentives. When $\phi$ is very small, doctors choose $e$ close to zero in the absence of other incentives to exert effort. With market incentives, doctors always choose $e$ that is above $\tau$. At low levels of $e$, a small difference in $e$ is translated into a substantial difference in the posterior precision. Although market incentives induce excessive $n$, the effect of higher posterior precision on the health outcome dominates the offsetting effect of excessive $n$. However, as $\phi$ increases, $e$ under both environments increases, and the effect of $e$ on the posterior precision, and hence on the health outcome, becomes smaller. At sufficiently high levels of $e$, the higher $e$ under market incentives generates a difference in the posterior precision that is too small to offset the effect of excessive $n$. Thus, when $\phi$ is high, the health outcome without market incentives is higher.
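The diminishing effect of effort on the posterior variance, which drives the argument above, is easy to see numerically. The sketch below tabulates $\frac{1}{\alpha + eK}$ at a few effort levels; the values of $\alpha$ and $K$ are illustrative assumptions.

```python
def posterior_variance(e: float, K: float, alpha: float) -> float:
    """Posterior variance 1 / (alpha + e*K) at effort level e."""
    return 1.0 / (alpha + e * K)


efforts = [0, 1, 2, 3, 4]
variances = [posterior_variance(e, K=1.0, alpha=1.0) for e in efforts]

# Reduction in variance bought by each additional unit of effort.
reductions = [variances[i] - variances[i + 1] for i in range(len(variances) - 1)]

for e, v in zip(efforts, variances):
    print(f"e = {e}: posterior variance = {v:.3f}")
print("reductions per extra unit of effort:", [round(r, 3) for r in reductions])
```

Each extra unit of effort reduces the variance by less than the previous one, so the same gap in effort between the two regimes matters more when effort is low in both.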

C.1.5 Re-introducing the patient's expectation
When patients have their own expectations about proper treatment, doctors engage in costly communication to convince patients that their choice of $n$ is the correct treatment. The first order conditions with the communication cost, $\frac{(n(e) - \bar{n})^2}{e}$, are given in (15) without market incentives and in (16) with market incentives. Note that, in the treatment stage, the marginal cost of $n$ is smaller than when the patients' expectation channel is omitted if $n < \bar{n}$, and larger if $n > \bar{n}$. Thus, there is an incentive to choose $n$ closer to $\bar{n}$. Figure ?? illustrates the effect of a large $\bar{n}$ in the treatment stage. We consider a case where doctors choose higher effort with market incentives. Panel (A) shows the case where the patient's expectation channel is omitted, and panel (B) the case where the channel exists. In panel (A), $MB_{with}$ and $MB_{without}$ are the left hand sides of (14) and (12) with respect to $n$. $MC_{with}$ and $MC_{without}$ are the right hand sides of (14) and (12) with respect to $n$. In panel (B) they are the left and the right hand sides of (16) and (15). The optimal levels of treatment with and without market incentives are labeled $n^*_{with}$ and $n^*_{without}$. Given $e$, $1 + \frac{1}{e\phi} > \frac{\phi + \gamma_o \delta}{\phi + \delta} + \frac{1}{e\phi + e\delta}$ and $-\frac{1}{e\phi} < -\frac{1}{e\phi + e\delta}$; the MC curve with market incentives has a higher intercept and a smaller slope. The direction of the inequalities still holds when $e$ with market incentives is larger than that without market incentives. At small values of $n$, the MC of $n$ is lower in panel (B) than in (A) because the communication cost decreases as $n$ becomes closer to $\bar{n}$. Thus, with a large $\bar{n}$, the optimal choice of $n$ becomes larger. The effect is larger when there are no market incentives because the level of effort is lower, and hence the communication cost of deviating from $\bar{n}$ is greater. This implies that when patients demand excessive treatment and doctors lack incentives to exert effort without market incentives, we would observe more excessive treatment among public doctors.
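A hedged sketch of the treatment-stage conditions with the communication cost follows. It assumes the treatment-stage payoffs $\phi H - \frac{(n-\bar{n})^2}{e}$ without market incentives and $\phi H + \delta H^o + p\,n - \frac{(n-\bar{n})^2}{e}$ with them, where $\bar{n}$ is the patient's expected treatment, and it normalizes each condition so that the coefficient on $f_e(\mu+n)$ is one. The slopes and intercepts of the right hand sides are the quantities compared in the inequalities above; the original displays may be arranged differently.

```latex
% Treatment stage without market incentives, cf. (15)
\begin{align*}
f_e(\mu+n) = \Bigl(1 + \frac{1}{e\phi}\Bigr)\, n \;-\; \frac{\bar{n}}{e\phi} .
\end{align*}

% Treatment stage with market incentives, cf. (16)
\begin{align*}
f_e(\mu+n) + \frac{p}{2(\phi+\delta)}
= \Bigl(\frac{\phi+\gamma_o\delta}{\phi+\delta} + \frac{1}{e\phi+e\delta}\Bigr)\, n
\;-\; \frac{\bar{n}}{e\phi+e\delta} .
\end{align*}
```

Relative to the case without the expectation channel, the marginal cost line is steeper and shifted down by a term proportional to $\bar{n}$, which is why a large $\bar{n}$ raises the chosen $n$, and more so when $e$ is low.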

C.2 Endogenous market incentives
In this section, we provide a set of assumptions about market structure and patients' preferences that generate market incentives consistent with (10). By allowing the current piece-rate consultation fee $\tau$ to depend on the past realizations of the level of $e$ and $H^o$, we derive the same set of optimality conditions for $e$ and $n$ as in the previous section. The distinction between $\tau$ and $\gamma_e$ becomes clear by specifying the dynamic pricing equation.
The structure of the economy is similar to the one considered by Acemoglu, Kremer and Mian (2008), where teachers produce a noisy signal about their ability (the test scores of their students) by choosing levels of productive and unproductive effort, and the price of each teacher is set at the level where the parents' expected utility from any teacher becomes zero. Because a different price is set for each doctor, different levels of effort and treatment, and the resulting health outcomes, are sustained in the market. We abstract from entry and exit decisions of doctors and assume away any strategic interaction among doctors.
Consider an infinite horizon economy with infinitely lived doctors and patients who live for one period. In every period, a new set of patients enters the economy. Suppose that there is a finite and countable set of symptoms and each patient experiences a subset of symptoms. For example, chronic headache can be one subset and chest pain another, and the union of chronic headache and chest pain can be yet another subset. Furthermore, suppose that separate markets exist for each subset of symptoms and there is no interaction between markets. Each market has $N$ doctors and a continuum of patients of measure one. The measure of the patients is constant over time. Each patient can visit only one doctor.
In the market, doctors face dynamic market incentives in addition to low-powered incentives. Denote the piece rate $\tau$ in this period and the next period by $\tau_0$ and $\tau_1$, respectively. Doctors' choice of $e$ and $n$ in this period affects $\tau_1$ through a law of motion given by $\tau_1 = \tau(H^o_i)$. Doctors discount the future with discount factor $\delta$, $0 < \delta < 1$. The doctors' present value of lifetime utility is presented in recursive form, with the treatment-stage problem $V_2(e) = \max_n \phi H - \frac{(n - \bar{n})^2}{e} + np + \delta V_1(\tau_1)$ subject to the law of motion for $\tau_1$. The first order condition in the consultation stage with market incentives is given by (18), where $e(\tau_1)$ is the optimal choice of $e$ given $\tau_1$ and $\frac{\partial V_1}{\partial \tau_1} = e(\tau_1)$ comes from the envelope condition. The effect of the current choice of $e$ on the future piece rate $\tau_1$ is given by $\frac{\partial \tau_1}{\partial e}$. The first order condition in the treatment stage is given by (19). Patients derive utility from the health outcome, the doctor's effort, and treatment. Patients do not observe the true health outcome $H$ and base their utility on the observed health outcome $H^o$. Patients believe that higher doctor effort will lead to a better health outcome. Patients' expected utility of visiting doctor $i$ in period $t$, $EU_{i,t}$, is specified with $\gamma_n > 0$, where $\gamma_n$ captures the pure consumption value of a unit of $n$. The utility patients derive from the effectiveness of $n$ is embedded in $H^o$. We make the following three assumptions about the market.
Assumption 1. Treatment can be purchased outside the doctor's clinic. For example, there are pharmacies where patients can purchase medicines. There are infinitely many suppliers of treatment and the quality of a unit of $n$ is homogeneous. The suppliers are price takers.
Assumption 2. Doctors are price takers.
Assumption 3. The economy is in the steady state where each doctor makes the same choice of $e_{i,t} = e_{i,t-1}$ and $n_{i,t} = n_{i,t-1}$ $\forall t$.
Under Assumption 1, Bertrand competition among patients implies that $p = \gamma_n$. Thus, the added value of a doctor visit comes from identifying effective treatment through consultation but not from consuming $n$ itself. With Assumption 2, Bertrand competition among patients leads to $\tau_{i,t}$ at which $EU_{i,t} = 0$ for all $i$ in every $t$. Assumption 3 implies that the expected observed health outcome in period $t$ is the same as that in period $t - 1$. Period $t + 1$ begins, and the same sequence of events repeats.
In the steady state, $\tau^*$ denotes the steady state piece rate for a given doctor, and $e^*$ and $n^*$ denote the optimal choices. Rewriting the first order conditions with market incentives by plugging the steady state expressions into (18) and (19), we have
$$(1 - \delta)\tau^* + (\phi + \delta) f_e(\mu + n(e^*)) \frac{n(e^*) K}{\sqrt{\alpha + e^* K}} = 2e^* \qquad (20)$$
and the treatment-stage condition (21), which involves $f_e(\mu + n^*)$.
Notes: Reasons for not completing SP surveys include the provider having been transferred and the provider not being found. In almost all cases our field staff made at least three attempts to complete a case. During fieldwork we replaced five sampled providers with other providers: in two cases because the provider was on sick leave, in two cases because the provider had been transferred, and in one case because the provider had gone on training.

Representative sample Dual practice sample
Notes: In Unstable Angina, the alternate definition for correct treatment codes referrals and referrals for ECG as correct. Note the large and significant differences in "asked to see the child" across public and private providers in the representative and dual samples. If we were to assume the same rate of correct treatment in these cases, then the differences in correct treatment are no longer significant in either sample. If we carry out a bounding exercise, the differences are still not significant, and the standard errors are too wide for meaningful inference. This is why we exclude the dysentery case in our pooled analysis of treatment across cases.

Panel C: SP and market/district fixed effects
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the market level are in parentheses. All regressions include a constant. Observations are standardized provider-patient interactions. Columns (1) and (2) also include case fixed effects. Market fixed effects are used for the representative sample, and district fixed effects for the dual practice sample. The alternative definition for Unstable Angina adds "referral" and "referral for ECG" as correct treatment.
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the market level are in parentheses. All regressions include a constant. Observations are standardized provider-patient interactions, except in the IRT score column, where each observation is a provider. The score is computed using all cases; plausible values scores are used. Market fixed effects are used for the representative sample, and district fixed effects for the dual practice sample.
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the market level are in parentheses. All regressions include a constant and controls for provider qualifications, age, gender, and patient load. Observations are standardized provider-patient interactions. Dual sample refers to providers who operate both public and private clinics. Market fixed effects are used for the representative sample, and district fixed effects for the dual practice sample. Columns (1)-(3) include all cases and can be compared with Table 3. The remaining columns include Unstable Angina and Asthma cases only; compare Columns (4)-(6) with Table A6 and Columns (7)-(12) with Table 4. In column (12) the dependent variable is the total number of medicines recommended to the patient (dispensed and/or prescribed).
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the market level are in parentheses. Observations are standardized provider-patient interactions. Interpretation of coefficients in "Binary regressions" needs caution. Each coefficient represents a separate regression of prices on the row variable and case and district fixed effects. Multiple regressions include case and district fixed effects.
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the market level are in parentheses. All regressions include a constant. Observations are standardized provider-patient interactions. In column (13) the dependent variable is the total number of medicines recommended to the patient (dispensed and/or prescribed).
Notes: *** Significant at 1%, ** Significant at 5%, * Significant at 10%. Robust standard errors clustered at the market level are in parentheses. All regressions include a constant and SP, case and market fixed effects. Observations are standardized provider-patient interactions. In column (11) the dependent variable is the total number of medicines recommended to the patient (dispensed and/or prescribed).