When I started clinical research, nearly 30 years ago, retrospective studies were considered unreliable. And this was a generally accurate assessment. Hundred-patient chart reviews, a typical retrospective study of the time, were mostly a waste of paper. Data were usually unreliable and highly biased, and sample sizes inadequate. Furthermore, such analyses typically included an explicit or implicit historical control, which is inherently invalid.

Over the last decade, electronic records have enormously increased the availability of clinical data. For example, about half of the academic medical centres in the USA already use electronic records and it is likely that virtually all healthcare facilities will soon; a similar trend is apparent in the UK. (Converting clinical records into research-ready data is far more difficult and expensive than most imagine, but will perhaps become easier as the systems mature.) The revolutionary switch from paper to electronic health records has provided investigators with huge amounts of relatively reliable data. Even a single centre can now generate records for tens – or hundreds – of thousands of patients.

National efforts have also produced a variety of high-quality data. Major peri-operative efforts in the USA include the National Surgical Quality Improvement Program (NSQIP), the Society of Thoracic Surgeons (STS) National Database, the Closed Claims Registry of settled malpractice cases, the Multicenter Perioperative Outcomes Group (MPOG) and the Anesthesia Quality Institute's National Anesthesia Clinical Outcomes Registry (NACOR). Most states make billing records available to qualified investigators, as does the Centers for Medicare & Medicaid Services (CMS). Similar registries exist in other countries and others are in development. For example, the UK has large amounts of data from the National Health Service (NHS) that have already helped guide care and policy decisions 1.
The UK has also led in development of operation-specific registries, including one for hip fractures, and recently established the anaesthesia-focused Health Services Research Centre (see www.niaa-hsrc.org.uk/). There have also been important European-wide efforts 2.

Dense local registries and national databases range from nearly a million patients to many tens of millions: this is what I mean by 'Big Data'. Size matters because with sufficient patients it is possible to study rare diseases, accurately evaluate 'hard' outcomes such as mortality, and generate appropriate comparison groups for case-control and retrospective cohort studies.

The quality and density of registries vary substantially. For example, very large national sources such as the CMS database often contain little more than basic demographic characteristics, hospital and provider identifiers, billing codes, duration of hospitalisation, and in-hospital mortality. Other national databases, such as the NSQIP and STS registries, contain more clinical data and outcome details. Generally, larger databases contain less detail, with the most dense information available from single-centre sources. For example, databases at the Cleveland Clinic essentially make the entire medical record available for research.

Adequately powered randomised controlled trials (RCTs) – or meta-analyses of enough small trials 3 – remain the highest level of clinical evidence. But registry analyses provide distinct advantages in some respects. For example, RCTs usually include relatively strict enrolment criteria designed to reduce variability (and thus sample size); they similarly exclude patients less likely to benefit from treatment or at risk of treatment-induced complications. This restrictive approach enhances internal validity, but at the expense of generalisability. Randomised trials are also expensive and time-consuming; there are thus few of them.
In contrast, registry analyses can be conducted quickly and at modest cost. A further difficulty with RCTs is that results of even the best trials age quickly as practice changes and new therapeutic options become available. The extent to which randomised data apply to current patients is thus sometimes unclear. Another limitation of RCTs is that they are prohibitively expensive or simply impossible for rare diseases (such as malignant hyperthermia) and rare outcomes (such as respiratory arrests on surgical wards). And of course it is impossible to randomise non-modifiable factors such as race, sex, obesity and smoking – all of which are of interest since they considerably influence prognosis.

Registry data can be used in many ways. But there are four broad areas of particular interest: 1) case-control and retrospective cohort studies; 2) health services research; 3) quality assessment; and 4) modelling for and conduct of prospective studies. I will review each in turn.

By far the most common use of Big Data is case-control and retrospective cohort studies. These approaches are especially valuable for evaluation of non-modifiable factors, rare conditions and rare outcomes – each of which is difficult or impossible to approach through RCTs.

Case-control studies identify groups that are similar except in having or not having a disease of interest. The groups are then compared backwards in time on exposure. (In this context, epidemiologists use 'disease' for any outcome of interest including pain, functional recovery, cost or vital status. Similarly, 'exposure' includes race, genetics, socioeconomic status and medical treatments.) Case-control studies are always retrospective. Because absolute risk cannot be determined from case-control analyses, they are best reserved for rare diseases.
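
As a rough illustration of the case-control logic just described, the sketch below computes an odds ratio, with a Woolf (log-based) confidence interval, from a 2×2 table of exposure by disease status. The function name and all patient counts are hypothetical and chosen only for the example; they are not drawn from any registry discussed here.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 case-control table.

    The interval uses Woolf's log method; z=1.96 gives roughly 95% coverage.
    """
    a, b, c, d = (exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    # Odds of exposure among cases (a/b) divided by odds among controls (c/d).
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts: 100 cases and 400 controls sampled by the investigator.
or_1, lo_1, hi_1 = odds_ratio(40, 60, 100, 300)

# Doubling the number of controls (same exposure mix) leaves the odds ratio
# unchanged, although the apparent 'risk' among the exposed would change.
or_2, lo_2, hi_2 = odds_ratio(40, 60, 200, 600)

print(f"OR with 400 controls: {or_1:.2f} ({lo_1:.2f} to {hi_1:.2f})")
print(f"OR with 800 controls: {or_2:.2f} ({lo_2:.2f} to {hi_2:.2f})")
```

Because the investigator fixes the ratio of cases to controls, the proportion of exposed patients who appear 'diseased' in such a table is an artefact of sampling; the odds ratio is unaffected by that choice, and it approximates relative risk only when the disease is rare.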
The logic for cohort comparisons is opposite to that of case-control studies: groups that are similar except in having or not having an exposure of interest are compared forwards in time for development of disease. Cohort comparisons can be conducted prospectively. For example, blinded RCTs are a type of prospective cohort study in which treatment is controlled by the investigators and outcome assessors are masked. But cohort studies can also be conducted retrospectively through analysis of registry data in which exposure and subsequent development of disease are quantified.

Aside from chance (random variation), the major sources of error in clinical research are selection bias, confounding and measurement bias. Large, blinded RCTs are considered reliable because randomisation prevents selection bias and confounding, while blinding prevents performance and measurement bias. Fortunately, advanced statistical techniques now help limit bias and confounding in retrospective analyses. Among the major tools now routinely used are multivariable regression, generalised estimating equations and propensity matching. Perhaps because of these techniques, retrospective analyses and randomised data are usually consistent, although retrospective studies tend to overestimate treatment effects 4.

Broadly speaking, health services research evaluates health systems, often with respect to cost and outcome 5. For example, it is of considerable policy interest to know if larger hospitals perform better than smaller ones, whether academic hospitals perform better than others, the effects of geography, and the effects of surgical or critical care volume on cost and outcome. In almost all cases, comparisons will be made on the basis of electronic institutional data. By far the most commonly available data in the USA are those reported to payers, including insurance companies and the CMS.
Typically, they include basic demographic characteristics, codes for baseline conditions and procedures performed, and limited outcomes such as hospital stay and in-hospital mortality.

The difficulty with comparing across institutions is that baseline patient risk and the procedures performed vary. Fair comparisons thus require accurate risk adjustment. Perhaps the most commonly used system is the Charlson Comorbidity Index 6, a formula developed from a limited number of patients that has not been extensively validated. More recently, the Risk Quantification Index was developed from the NSQIP database 7. It is based on Current Procedural Terminology (CPT) codes, uses transparent methodology, and provides good discrimination (see www.ClevelandClinic.org/RQI). There are also various commercial risk-adjustment methods; however, their details are proprietary and few are publicly validated.

Possibly the best adjustment methods are the Risk Stratification Indices (RSI), developed from a national sample of about 35 million Medicare records 8 and subsequently validated using about 24 million records in the California Inpatient Database 9. Development was strictly by algorithm and thus entirely objective; separate models are available for inpatient mortality, 30-day and 1-year mortality, and hospital stay. Only International Classification of Diseases (ICD-9) codes, which are available for virtually every patient, are needed. The indices provide excellent discrimination and, after calibration, are remarkably accurate even on samples as small as 5000 patients. Accuracy is improved by including present-on-admission coding, which is already available in selected states and is incorporated into ICD-10 codes. The RSI models are freely available, with the coefficients, statistical code and sample files all posted on the internet (see www.ClevelandClinic.org/RSI).

There is increasing regulatory pressure for hospitals to measure and report various quality measures.
Often certification and/or payments are linked to the reports. For example, the Cleveland Clinic reports more than 125 discrete quality measures. Similar reports are required for NHS hospitals in the UK. Furthermore, hospitals often evaluate additional performance measures internally for their own quality enhancement purposes. I have every expectation that quality reporting requirements will only increase as medicine transforms from a pay-for-procedures system towards pay-for-outcomes.

That so many quality measures are used is an enormous change from just ten years ago and results from the ready availability of electronic data and increasing emphasis on the need to guide internal improvements. Once established, reports can be easily generated at timely intervals or even displayed in real time on computer 'dashboards'. Copious literature indicates that providing timely feedback enhances compliance 10, 11.

Electronic monitoring can be combined with decision-support systems that identify non-compliant situations in real time, while corrections can still be made. For example, temperature monitoring and management is a requirement for both the Surgical Care Improvement Project (linked to payment) and the Physicians Quality Improvement System (publicly reported). The Clinic's electronic anaesthesia records are monitored by a decision-support system that alerts clinicians when patients are not meeting the measures' requirements of core temperature ≥ 36°C and/or active over-body warming. That then allows caregivers to start (or start adequately documenting) forced-air warming. As a result, compliance with these measures now approaches 100%.

Among the most difficult aspects of trial design is choosing optimal enrolment criteria. Tighter enrolment criteria reduce variability, and thus sample size, but simultaneously slow recruitment since fewer patients qualify.
Less restrictive enrolment facilitates patient recruitment (and generalisability), but at the expense of variability. A further complication is that various types of patients contribute differently to outcomes. For example, sicker patients are less common than generally healthy ones, but may be more likely to experience an outcome of interest such as a postoperative myocardial infarction. How best to balance competing enrolment interests to optimise speed and cost has traditionally been considered an 'art' of trial design. But increasingly, Big Data allows investigators to model various inclusion criteria statistically and choose the best approach based on patient availability and – importantly – the extent to which various patients are likely to contribute outcomes. Such models are especially helpful when data can be obtained from the relevant institutions.

Sample size estimation is another aspect of trial design that is a bit of an art. Sample size for dichotomous outcomes is mostly a function of baseline event rate and expected treatment effect (i.e. difference between control and experimental groups). Sample size for continuous outcomes is mostly determined by population variability and expected treatment effect. While statistical projections can be complicated, the real difficulty is in the assumptions used for the statistical modelling. Until recently, investigators would make assumptions about event rates, population variability and anticipated treatment effects from available literature and their experience. And they were often wrong, resulting in improperly powered studies – usually under-powered. Good modelling of retrospective data improves estimates of each critical assumption, thus enhancing the accuracy of such estimates.

One of the major trends in clinical trials is towards larger sample sizes 12: large peri-operative trials now enrol thousands of patients. With so many patients, data management becomes challenging.
But increasingly, much of the information needed for clinical trials is already collected electronically. Assuming collection is accurate and the data are available to investigators, much less needs to be recorded on traditional case-report forms. This approach reduces front-end cost because investigators need to record less, and it reduces back-end cost (and error) because data do not subsequently have to be manually entered into study computers. It is thus often unnecessary to record data manually that are routinely collected electronically. In some cases, all outcomes can be collected electronically, with patient involvement essentially ending after intervention.

As an extension of electronic data capture, novel study designs can be used that facilitate relatively inexpensive acquisition of large amounts of data. One is real-time randomisation based on decision-support systems. For example, it is possible for a decision-support system to scan the electronic anaesthesia record, identify patients who qualify for a study (say by virtue of sufficiently low blood pressure), and then, in real time, to allocate patients randomly to an intervention. Typically, the outcomes of interest (say, duration of hypotension) would subsequently be obtained from the electronic record. Obviously, this sort of study can only be conducted when research ethics committees waive consent – but they often will when the intervention is unlikely to be harmful and may well prove beneficial. Real-time randomisation is being used to evaluate the potential benefit of clinician alerts in response to 'triple low events' (low mean arterial pressure, low Bispectral Index, and low minimum alveolar fraction) 13.

A second innovative study design is alternating intervention trials. These trials are conducted by implementing one intervention in a group of operating rooms for a defined period, and then switching to an alternative intervention for the subsequent period.
They thus start as a typical before-and-after design that provides limited protection against bias and confounding. But in an alternating intervention trial, the two treatments are alternated many times. This approach limits bias that results from the Hawthorne effect and time-dependent changes. To the extent that treatments are easy to implement and outcomes can be obtained electronically, alternating intervention trials provide large amounts of inexpensive data. For example, this approach was used to evaluate the effects of isoflurane and sevoflurane on hospital stay 14.

Big Data is here to stay – and provides a wonderful opportunity for physicians, epidemiologists and health policy experts to make data-driven decisions that will ultimately improve patient care. Registry analyses will not replace other kinds of clinical research, especially clinical trials. But Big Data will enhance design and conduct of trials, to say nothing of providing robust information about many conditions that cannot be modified experimentally or are simply too rare to evaluate in trials. Appropriate analysis of Big Data will also provide a rational basis for sound health policy decisions.

DIS serves on the Board of the Anesthesia Quality Institute, but has no personal financial interest related to this editorial. No external funding declared.
Daniel I. Sessler