Evaluation (Introduction to Medical Informatics) (http://www.cpmc.columbia.edu/edu/textbook)
LAST REVIEWED: 25 November 1997

evaluation of information systems in health care, especially decision support systems
evaluation is difficult and costly

why evaluate
  -- like other interventions: is it safe and effective?
  -- understand the basic principles of systems
  -- learn to design better systems (formative evaluation vs. summative)
  -- reduce liability
  -- measure benefits to justify the cost
  -- learn where and when a system should be used
  -- only 10% of KBs (knowledge bases) have been evaluated formally

WHAT TO MEASURE
1. structure: look at the pieces of the system and how they are put together
  -- user interface
  -- vocabulary
  -- data schema
  -- knowledge base: accuracy, completeness, consistency, performance
     attempt to get reference standards (e.g., for ECGs)
  -- inference mechanisms
  -- text of advice or explanations
2. function: how the system works
  -- need unbiased test data
  -- need a sufficient number of cases (5 * #Dx * #findings)
  -- do not test on the training set
  -- do not avoid cases the system is known to fail on
  -- need a "gold standard": the right answer to compare to (measures of accuracy are important)
     a gold standard usually does not exist, so a "silver standard" is often used
  -- what to do when there is no gold standard:
     compare to expert opinion
       majority vote
       Delphi method: asynchronous, structured, survey-based interaction
     peer review: multiple experts do a written evaluation of the output
       independent, and blinded (to system vs. human expert)
       variant of peer review = round robin: each expert reviews only some of the cases
         used to generate output
     Turing test: the reviewer of expert system (ES) output is blinded to whether a machine
       or a human expert generated the output
3. process: how providers work
  -- number of tests or procedures ordered
  -- time to do tasks
  -- ...
  -- outcomes are inferred from process
  -- easy and cheap to measure
4. outcomes: how the system affects patients
  -- survival
  -- quality of life (QALY)
  -- length of stay
  -- morbidity, especially regarding prevention of disease
  -- clinical outcome is the most important measure, but also the most costly and most difficult
     reasons: systems act on patients only indirectly
       there are many different mechanisms for impact,
         so it is difficult to control confounding variables
       effects are small, and important outcomes like death are rare
  -- efficacy = would it work if applied correctly?
     vs. effectiveness = does it work as normally applied?
5. cost
  -- charges vs. actual costs

MECHANISM OF IMPACT (Wolff, MCV 1993)
guidelines have their effect through a chain: knowledge -> attitude -> behavior -> clinical outcome
  -- can measure changes at each step
  -- can measure process (knowledge/attitude/behavior) vs. clinical outcome

TYPES
randomized controlled clinical trial
  -- the gold standard of studies
  -- patients enter the study, then are divided into two groups
  -- randomize to avoid bias (the population may be the wrong one, but the result is not biased)
  -- prospective = design study, then collect data (more rigorous)
  -- retrospective = collect data, then design study (cheaper)
  -- expensive
  -- example: Physician inpatient order writing on microcomputer workstations [Tierney; Regenstrief]
     subjects: 68 teams (6 at a time) of MDs, for 5219 patients
     intervention: microcomputer-based order entry with critiques
     measures: inpatient charges, time-motion study
     results: 12.7% lower charges with PC (p=.02)
              shorter length of stay (p=.11)
              33 minutes longer writing orders per 10-hour shift (p<.0001)
other prospective studies
  -- controls: historical (match the experimental group to a similar past group),
     parallel (same time), crossover (group serves as its own control)
retrospective study on existing databases
  -- cheaper
  -- example: Discordance of databases designed for claims payment versus clinical
     information systems [Jollis]
     compared two databases of information on the same patients
     diagnoses: diabetes, unstable angina, ...
     kappa (chance-corrected agreement): agreement ranged from 0.83 to 0.09 (>0.4 is good)
     therefore, claims data cannot be used for clinical work
surveys
  -- get user feedback with a set of questions
  -- e.g., a survey of responses to reminders
critical incident technique
  -- used to assess reasons for success or failure of a process involving people
  -- based upon narrative answers to open-ended questions
  -- categorizing the answers produces a framework from which one can better grasp the issues involved
     1. look at the answers one by one
     2. lump like answers together, and give each group a name
     3. create super-groups, and end up with a hierarchy
     4. look at changes to groups to assess reliability
  -- e.g., user comments on a clinical event monitor [see overhead]
meta-analysis
  -- pull together many studies
  -- can be faced with two studies with non-overlapping results (what does "statistically
     significant" mean? only that the result is unlikely to be due to chance in that study group)
     1. generate a set of criteria for inclusion into the study
     2. perform a literature search; ask experts
     3. analyze the quality of individual studies
     4. measure inter-study consistency
     5. summarize the results
  -- example: Effects of computer-based clinical decision support systems on clinical
     performance and patient outcome [Johnston]
     chose 28 of 793 papers
     computer dosing        3/4 improved clinician performance
     automated diagnosis    1/5 improved clinician performance
     preventive reminders   4/6 improved clinician performance
     quality assurance      7/9 improved clinician performance
     of those that looked at outcomes, 3/10 were positive
routine use may be the ultimate test

EVALUATION PROCESS
1. Wyatt: 3-stage process
  -- prototype: define problem, perform RA
  -- lab
  -- field testing
2. Lab phase: 3 aspects of the user perspective
  -- structure: Is the system needed?
  -- process: Is it easy to use?
  -- outcome: Are the conclusions and explanations accurate?
3. Lab phase: 3 aspects of the experimental perspective
  -- structure: good questions? (KR, knowledge source)
  -- process: does it reason appropriately?
  -- outcome: accuracy of judgment (including outlier data)
4. Field phase: items to measure
  -- structure: survey attitudes; data completeness
  -- process: consultation function; treatment suggested; tests ordered
  -- outcome: survival; morbidity; resource utilization

ISSUES
biases: does the population under study match the population of interest?
  (want to control for confounding variables)
recruitment: how are subjects selected?
  -- prejudice by users for or against the system (especially when system use requires
     data entry) may bias the evaluation
  -- one solution: a trial run of the system
Hawthorne effect: people do better when studied
  -- subjects pay more attention during the study
checklist effect: encourages better organization
  -- solely through more structured and complete data collection
"feedback effect": an audit function can improve behavior by highlighting successes and failures
  -- quantify with a feedback-only group
carry-over: education by the expert system
  -- one solution: increase sample size (department -> medical center)
placebo effect: extra attention can bias patients
intention to treat: analyze subjects by the group they were assigned to (e.g., use of the
  computer program), whether or not they actually used it
system change: changing the system during the evaluation can bias results
  -- solution: "freeze" a version for testing
secular trend: events outside the system have an effect
  -- example: a system to decrease LOS (length of stay) when LOS is declining everywhere anyway
blinding: reduces human bias (patient, provider, analyst)
  -- in a drug trial, a placebo can be used
  -- with systems, users usually know whether they have received advice,
     so blinding is difficult
granularity of analysis
  -- patient
  -- provider
  -- clinic or floor
  -- hospital
ethics: is it proper to withhold alerts?

related reading:
Wyatt J, Spiegelhalter D. Evaluating medical expert systems: what to test, and how? In: Talmon JL, Fox J, editors. Knowledge based systems in medicine: methods, applications and evaluation. New York: Springer-Verlag, 1991: 274-90.
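The case-count rule of thumb under WHAT TO MEASURE, item 2 (5 * #Dx * #findings), is plain arithmetic. A minimal Python sketch follows; the function name `min_test_cases` and its `factor` parameter are hypothetical, and the rule is a heuristic, not a formal power calculation.

```python
def min_test_cases(n_diagnoses: int, n_findings: int, factor: int = 5) -> int:
    """Rule-of-thumb minimum test-set size: factor * #Dx * #findings (factor 5 in the notes)."""
    if n_diagnoses < 1 or n_findings < 1:
        raise ValueError("need at least one diagnosis and one finding")
    return factor * n_diagnoses * n_findings


if __name__ == "__main__":
    # A hypothetical system covering 10 diagnoses and 20 findings.
    print(min_test_cases(10, 20))  # 1000
```

The test cases counted here should also follow the other item-2 rules: unbiased, disjoint from the training set, and including cases the system is known to fail on.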
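When there is no gold standard, the notes suggest a majority vote of experts as a silver standard. The sketch below assumes a *strict* majority (more than half the experts) and skips cases that have none. Both choices, and the names `silver_standard` and `accuracy_vs_silver`, are illustrative assumptions rather than a prescribed method.

```python
from collections import Counter
from typing import Hashable, Mapping, Optional, Sequence


def silver_standard(expert_answers: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the answer given by more than half of the experts, or None if there is none."""
    if not expert_answers:
        return None
    answer, votes = Counter(expert_answers).most_common(1)[0]
    return answer if votes * 2 > len(expert_answers) else None


def accuracy_vs_silver(
    system: Mapping[str, Hashable],
    experts: Mapping[str, Sequence[Hashable]],
) -> float:
    """Fraction of cases with a silver standard on which the system output matches it."""
    scored = []
    for case, answers in experts.items():
        reference = silver_standard(answers)
        if reference is not None:  # cases without a majority are left out
            scored.append((case, reference))
    if not scored:
        raise ValueError("no case has a majority answer")
    hits = sum(system[case] == reference for case, reference in scored)
    return hits / len(scored)
```

Reporting how many cases were dropped for lack of a majority is worth doing alongside the accuracy figure, since those are often the hardest cases.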
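For the Turing test and blinded peer review under WHAT TO MEASURE, item 2, reviewers must not know whether a machine or a human expert produced a given piece of advice. A minimal sketch, assuming each case has one output per source: shuffle the outputs, replace the source labels with anonymous ids, and keep the key away from reviewers. `blind_outputs` is a hypothetical helper.

```python
import random
from typing import Dict, List, Tuple


def blind_outputs(
    outputs: Dict[str, str], seed: int = 0
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Hide which source (e.g. "system" or "expert") produced each output for one case.

    Returns (blinded, key): blinded is a shuffled list of (anonymous id, text) pairs
    shown to reviewers; key maps anonymous id -> source and is held by the analyst.
    """
    rng = random.Random(seed)  # seeded so the blinding can be reproduced
    sources = list(outputs)
    rng.shuffle(sources)
    blinded: List[Tuple[str, str]] = []
    key: Dict[str, str] = {}
    for position, source in enumerate(sources, start=1):
        anon_id = f"output-{position}"
        blinded.append((anon_id, outputs[source]))
        key[anon_id] = source
    return blinded, key
```

Using a different seed per case keeps reviewers from learning that, say, "output-1" is always the machine.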
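Quality-adjusted life years (WHAT TO MEASURE, item 4) weight the time spent in each health state by a utility between 0 (death) and 1 (full health) and sum the results. The sketch below omits discounting and rejects utilities outside [0, 1], although some valuation scales allow states worse than death. `qalys` is a hypothetical name.

```python
from typing import Iterable, Tuple


def qalys(health_states: Iterable[Tuple[float, float]]) -> float:
    """Sum of years spent in each state times that state's utility weight (undiscounted)."""
    total = 0.0
    for years, utility in health_states:
        if years < 0 or not 0.0 <= utility <= 1.0:
            raise ValueError("years must be >= 0 and utility must be in [0, 1]")
        total += years * utility
    return total


if __name__ == "__main__":
    # Hypothetical patient: 2 years in full health, then 3 years at utility 0.5.
    print(qalys([(2.0, 1.0), (3.0, 0.5)]))  # 3.5
```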
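The randomized controlled trial under TYPES starts by randomly dividing the study units into two groups. Below is a minimal sketch using a seeded shuffle, so the allocation can be reproduced and audited. The unit can be a patient, a provider team, or a ward, matching the granularity of analysis listed under ISSUES. Simple shuffling is only one allocation scheme; stratified or block randomization are common alternatives not shown. `randomize_two_arms` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def randomize_two_arms(units: Sequence[T], seed: int) -> Tuple[List[T], List[T]]:
    """Shuffle the units and split them into intervention and control arms.

    With an odd number of units, the intervention arm gets the extra one.
    """
    shuffled = list(units)
    random.Random(seed).shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    teams = [f"team-{i}" for i in range(1, 9)]  # hypothetical physician teams
    intervention, control = randomize_two_arms(teams, seed=42)
    print("intervention:", intervention)
    print("control:", control)
```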
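The Jollis comparison under TYPES reports agreement as kappa: observed agreement p_o corrected for the agreement p_e expected by chance, kappa = (p_o - p_e) / (1 - p_e). The sketch computes Cohen's kappa for two sources coding the same patients (for example, 1 = diagnosis present in the claims database, 0 = absent). `cohens_kappa` is a hypothetical name, and any data you feed it here are illustrative. The sketch does not reproduce the 0.83 to 0.09 range reported in the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two sources coding the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # p_o: fraction of cases on which the two sources agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected if each source coded independently at its own rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both sources use a single category")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    claims = [1, 1, 0, 0]    # hypothetical: diagnosis coded in claims data
    clinical = [1, 0, 0, 0]  # hypothetical: diagnosis in the clinical record
    print(cohens_kappa(claims, clinical))  # 0.5
```

Here raw agreement is 0.75, but half of that is expected by chance, which is why kappa (0.5) is the fairer summary when one category dominates.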
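Intention to treat (under ISSUES) means comparing subjects by the arm they were assigned to, even if they never used the system. The sketch contrasts that with an as-treated comparison, which groups subjects by actual use and is open to selection bias. The `Subject` fields, the function names, and the trial data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Subject:
    assigned_to_system: bool  # arm chosen by randomization
    used_system: bool         # what actually happened during the trial
    outcome: float            # e.g. charges or length of stay (lower is better here)


def intention_to_treat(subjects: List[Subject]) -> Dict[str, float]:
    """Mean outcome by *assigned* arm, regardless of actual use."""
    return {
        "system": mean(s.outcome for s in subjects if s.assigned_to_system),
        "control": mean(s.outcome for s in subjects if not s.assigned_to_system),
    }


def as_treated(subjects: List[Subject]) -> Dict[str, float]:
    """Mean outcome by *actual* use; can be biased by who chooses to use the system."""
    return {
        "system": mean(s.outcome for s in subjects if s.used_system),
        "control": mean(s.outcome for s in subjects if not s.used_system),
    }


if __name__ == "__main__":
    # Hypothetical trial: one subject assigned to the system never used it.
    trial = [
        Subject(assigned_to_system=True, used_system=True, outcome=4.0),
        Subject(assigned_to_system=True, used_system=False, outcome=8.0),
        Subject(assigned_to_system=False, used_system=False, outcome=6.0),
        Subject(assigned_to_system=False, used_system=False, outcome=6.0),
    ]
    print(intention_to_treat(trial))  # {'system': 6.0, 'control': 6.0}
    print(as_treated(trial))
```

In this made-up trial the as-treated comparison favors the system only because the non-user had a worse outcome; the intention-to-treat comparison shows no difference.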