Generative artificial intelligence (GenAI) is any computer system capable of generating text, images, or other types of content, often in response to a prompt or question entered through a chat interface. GenAI comprises large language models (LLMs) and other general-purpose foundation models powered mostly by generative pre-trained transformer (GPT) deep learning technology. Compared with traditional AI models using single data modalities for specific classification or prediction tasks, GenAI comprises task-agnostic, increasingly multimodal models that learn shared representations of different data types and, using suitable prompts, may perform never-before-seen tasks.1 GenAI tools (also termed solutions or applications) are compelling because, unlike traditional AI, they are conversant, interacting directly with humans and generating human-like responses to prompts. These tools, in the form of ChatGPT and other GenAI chatbots, have very quickly captured the interest of researchers, clinicians and industry. Anecdotally, certain GenAI tools, such as ambient AI scribes and assistants, are already being used in many practice areas.2, 3 In the UK, one in five general practitioners now routinely use GenAI for various tasks.4 At the time of submission, this rapid uptake was occurring with little guidance on what use cases (tasks or clinical indications) are most amenable to GenAI, how GenAI tools intended for clinical practice should be used, evaluated and governed, and how to safeguard reliability, safety, privacy, and consent. In addressing these issues, we undertook a narrative review of existing literature, and using this evidence, we propose a phased, risk-tiered approach to implementing GenAI tools, discuss risks and mitigations, and consider factors likely to influence adoption of GenAI by both clinicians and health services. 
Although GenAI encompasses both text and image generation, this review primarily focuses on text-based applications in clinical practice, with image-related applications limited to report generation rather than image generation. Box 1 contains a glossary of terms used when describing GenAI.
We searched PubMed and Google Scholar for articles published between 1 January 2022 and 31 August 2024 using the search terms "generative AI", "large language models", "clinical practice" or "health care". We focused on review articles and grouped them into key application domains to inform our implementation framework: clinical documentation (16), operational efficiency (20), patient safety (11), clinical decision making (42), and patient self-care (4). Seven reviews covering all these domains were also retrieved.5-11 From these reviews, we extracted references outlining the problem(s) being addressed and exemplars of implemented GenAI tools used to solve them. We noted considerable heterogeneity in study design and methodological rigour and a relative paucity of real-world implementations across several domains.
Despite the aforementioned limitations in current evidence, our review suggests that GenAI tools could be implemented over five phases (Box 2). These are sequenced according to increasing levels of patient risk, task complexity, and implementation effort, and decreasing levels of current technical maturity and evidence of safety and effectiveness. The phased approach affords careful introduction of GenAI, beginning with tools that primarily enhance administrative efficiency (lower patient risk), progressing to those directly influencing clinical decisions and patient self-care (higher patient risk and requiring regulatory approval).
Automating clinical documentation: Doctors in clinics can spend up to 2 hours on documentation for each hour of direct clinician–patient interaction;12 hospital residents and nurses spend up to 25%13 and 60%14 respectively of shift time on documentation. Ambient GenAI tools capable of voice transcription and note generation during doctor–patient encounters can decrease documentation time by up to 25% ("keyboard liberation"),15-17 and allow more attentiveness to patients. Similarly, scribes in nurse–patient encounters can double the time the nurse spends on direct patient care.18 Ambient GenAI tools can also generate a readily understood patient summary,19 potentially increasing satisfaction with, and adherence to, care. Ambient scribes could also include real-time advice, such as highlighting missed items in history or overlooked investigation results.20
Synthesising patient information from medical records: When interviewing new patients in the clinic or on ward rounds, clinicians can spend up to a third of the encounter retrieving, reading and synthesising patient summaries from electronic medical records (EMRs) before patient contact.21 GenAI can generate easily interpreted summaries of pertinent history, investigation results and treatments more accurately than clinicians22 while reducing this familiarisation time by around 20%.23
Generating discharge summaries from EMRs: Writing discharge summaries is time consuming, error prone and often slow in reaching recipients,24 with suboptimal patient outcomes.25 GenAI can generate summaries that are more accurate than those of junior doctors in 90% of cases,26 are available at discharge,27 and lessen by a third the time seniors spend in supervision for complex cases.28
Optimising consent: The reading grades (school-grade level of reading skill required for understanding) of most consent forms exceed the population average (8th grade), and the forms often lack procedure-specific information required for informed consent.
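The reading grade of a consent form can be estimated with a readability formula. The reviewed studies are not described here as using any particular formula, so the sketch below is an assumption: it applies the widely used Flesch-Kincaid grade level with a crude vowel-group syllable count, and its output should be read as approximate. The function names and the sample sentences are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels, with a minimum of 1."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the school grade needed to read `text` (Flesch-Kincaid grade level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical consent wording before and after simplification.
original = "Percutaneous coronary intervention necessitates anticoagulation."
simplified = "We open the blocked artery. You will take a blood thinner."

print(f"original grade:   {flesch_kincaid_grade(original):.1f}")
print(f"simplified grade: {flesch_kincaid_grade(simplified):.1f}")
```

A check of this kind could be run on a clinician-verified consent text and on its rewritten version to confirm that the rewrite lowers the estimated grade; it says nothing about whether the rewritten content is clinically accurate.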
Clinicians can use GenAI chatbots that, given clinician-verified text as input, could provide more comprehensible, informative and empathetic versions that take less time to read.29
Automating routine administrative tasks: Scheduling clinic appointments, organising staff rosters, drafting minutes and policy documents, and coding patient records are all labour-intensive but potentially automatable tasks. As examples, GenAI could more quickly create safer and fairer rosters,30 expedite coding for faster remuneration,31 and improve operational decision making.32
Improving hospital capacity management: Overcrowded emergency departments, access block to inpatient beds, delayed discharges and avoidable readmissions are commonplace. GenAI-enabled patient triage and discharge planning33, 34 and command-and-control patient flow systems could assist clinicians and bed managers in optimising bed use.35
Improving workflows in image-based disciplines: Taking radiology as the most mature domain, heavy workloads stress radiologists and cause delays in issuing reports, which can compromise patient care.36 Tools that automate image interpretation and structured reporting37 can reduce total reporting time by a third,38 reducing radiologist burnout and shortening report turnaround times.39 GenAI could potentially optimise referral and reporting prioritisation, patient scheduling and preparedness, and scan protocoling.40 Similar benefits in prioritising, interpreting and reporting digital pathology slides could also be realised with GenAI.41
Facilitating gathering and trending of data: Care-related near misses or adverse events such as medication harm and delirium are currently ascertained retrospectively from medical records or incident reports, with significant lag times.
Such data could be captured, quantified and trended in real-time using LLMs applied to EMRs, thus facilitating more timely recognition of unsafe situations warranting remedial intervention.42-44 Expediting analysis of data: Considerable time and effort are spent in gathering, analysing and reporting quality and safety measures, incident data, and undertaking root cause analyses, with often little impact on care.45, 46 GenAI could aggregate and analyse these data more efficiently,47 identify safety hazards and contributors more quickly, automate audit48 and survey analyses,49 and allow quality and safety staff to redirect resources to proactive safety improvement.50 Retrieving medical evidence to inform decision making: Current online literature search systems (eg, PubMed) take time to search and synthesise data, are limited to simple keyword queries, and often retrieve limited relevant, actionable reports.51 GenAI, particularly using retrieval augmented generation, can very quickly and iteratively, in response to serial prompts, screen available literature and synthesise high quality, actionable evidence with supporting references,52 although the ability of LLMs to assess risk of bias of clinical trials remains limited.53 Reducing diagnostic error: Diagnostic error accounts for 60–70% of all medical errors causing harm, mostly caused by cognitive biases in reasoning.54 Responding to clinician prompts, GenAI could suggest more accurate differential diagnoses or detect and reduce misdiagnosis,55 particularly for complex, undifferentiated general medical cases involving non-expert clinicians.56 Personalising therapies: The response of many patients to specific therapies for diagnosed and confirmed diseases remains unpredictable.57 Applying GenAI to EMRs and genomic databases could identify patient genotypes or phenotypes associated with favourable or unfavourable treatment responses, as seen in various oncological applications.58 More rigorous evaluation will be 
required of consumer-facing applications relying on the do-it-yourself proficiency of users who may lack medical expertise, especially as GenAI chatbots could give seemingly confident, personalised but inappropriate advice.59 Providing medical advice: GenAI symptom checkers can diagnose conditions better than laypeople using traditional online information sources, but remain inferior to vetting by clinicians, with triage decisions for acute conditions particularly problematic.60 However, GenAI chatbots fine-tuned on curated medical knowledge could reliably identify patients' needs and provide informed suggestions.61 Chatbots that can process and draft responses to messages and queries of patients with diagnosed conditions under the care of clinicians can also alleviate clinician burden and enhance patient engagement.62 Improving chronic disease self-management: The use of GenAI chatbots to manage chronic diseases seems well accepted by patients in supporting mental health, physical activity and behaviour change for selected conditions,63 but evidence of effects on patient outcomes is limited.64 Wearable devices integrated with GenAI can potentially detect adverse health states such as falls or clinical deterioration.65 Several risks to patient safety and quality of care require careful consideration.66-73 These relate to: reliability (errors, hallucinations); consistency (different responses to the same question); explainability (few rationales for responses); limited understanding of context; biased responses due to unrepresentative training data; misuse of prompts; potential privacy breaches; little auditability of tool processes and outputs; workflow disruptions and job displacement; depersonalised care; over-reliance of clinicians on GenAI with clinician de-skilling; limited clinician and patient acceptance; and costs and carbon footprint. However, risk mitigation strategies exist and will continue to evolve (Box 3). 
Although many of these risks are common to all forms of AI, certain risks, such as hallucinations, prompt misuse and the inability to be audited, are peculiar to GenAI. GenAI is also not yet capable of higher-order reasoning, contextual understanding, capturing sensory and nonverbal cues, or making moral or ethical judgements. Decision support LLMs may produce inconsistent advice to the same queries and be as prone to cognitive biases as humans.74 GenAI alters its behaviour in response to new data inputs or updating or recalibration of its operations, which may go unannounced. Importantly, in performing several different tasks, acceptable GenAI performance on one "benchmark" task does not translate to other, seemingly related tasks for which it was not trained.75 This challenges the generalisability of any single, point in time evaluation of an evolving model with a large potential task capability. Ensuring the quality of massive datasets used to train GenAI models is challenging compared with traditional AI models trained on smaller, targeted datasets. The behaviour of hugely complex LLMs with billions of parameters performing different tasks cannot be understood, despite knowing their technical architecture. Evaluation and regulation of GenAI tools with their limitless and changing arrays of inputs and outputs is hugely challenging. A single, fit-for-purpose pre-deployment assessment and approval of all GenAI tools, as software as a medical device (SaMD), may not suffice for tools that continue to learn and adapt. Currently the Therapeutic Goods Administration (TGA) regulates some but not all AI tools designed to support clinical decision making as SaMD, but exempts tools, such as GenAI scribes, which provide only documentation or administrative assistance. The TGA's remit for consumer-facing AI tools remains undefined. 
Current regulatory and accreditation processes,76 coupled with amendments in society-wide laws (eg, privacy, consumer and anti-discrimination laws) may be sufficient to cover many GenAI applications. Two regulatory approaches are possible: an application-centric approach, and a system-centric approach. In an application-centric approach, individual tools are evaluated according to task criticality and patient risk. For high risk diagnostic or treatment applications (phases 4 and 5), the tool may be frozen pre-deployment and evaluated in a standard pathway (versus a fast pathway) using pragmatic clinical trials (Box 4).77-79 If approved, the tool could later be locked down, re-opened, retrained (if needed), and re-evaluated for re-approval if any substantive change in function or deviation from benchmark tasks is seen. The US Food and Drug Administration calls for AI developers to provide an algorithm change protocol describing how modifications are generated and validated.80 Lower risk tools (phases 1 and 2) may pass through a fast pathway, requiring only observational studies or post-deployment verification studies for approval. A standardised, actionable, risk-based checklist for evaluating GenAI along multiple axes, including post-deployment monitoring of real-world performance and clinical impact, is needed81-83 as are similar checklists for identifying and resolving ethical concerns.84, 85 Importantly, any GenAI tool must undergo a standardised clinical validation process at the local level using local data, including tools with regulatory approval. Using open-source or open-weight tools hosted on local servers may be the best option for protecting privacy, but requires in-house data scientists and technical staff for model training and tool deployment. 
[Box: Clinical validation stage (by risk level): lower risk applications (phases 1–3); higher risk applications (phases 4 and 5)]
A complementary system-centric approach requires tool developers and deployers (ie, large-scale health services) to wrap a quality assurance framework86 around their GenAI activities, comprising both risk mitigation (Box 3) and life cycle monitoring and evaluation. This framework may include statistical process control analyses that define acceptable bounds around tool accuracy or analyses of downstream effects on proximal clinical outcomes (eg, adverse events, mortality).87 More proxy measures of tool use, such as tracking the number of human-initiated corrections to LLM-created documents, could also be used.88 Developers and deployers might be accredited by an appointed authority to use GenAI tools depending on how well they measure, report and satisfy these parameters. Health services may need to establish dedicated, multidisciplinary clinical AI units to perform these tasks and provide the necessary human expertise and digital infrastructure.89 Such units may also specialise in validating and piloting specific applications before deployment in other similar or affiliated services, given the limited capacity of some services to undertake these tasks for every GenAI tool they may want to deploy.90 A balance is therefore needed between bespoke and more centralised evaluations, with the latter preferred for widely used, high value, high risk or high impact solutions.
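As an illustration of the statistical process control idea mentioned above, the following minimal sketch computes three-sigma control limits for a proportion (a p-chart) from a baseline validation sample and flags later audit batches whose accuracy falls outside those limits. The choice of a p-chart, the three-sigma threshold, the function names and all figures are assumptions made for illustration; none is prescribed by the cited frameworks.

```python
import math
from typing import List, Tuple


def p_chart_limits(baseline_correct: int, baseline_total: int,
                   batch_size: int, z: float = 3.0) -> Tuple[float, float, float]:
    """Return (lower, centre, upper) control limits for per-batch accuracy."""
    if baseline_total <= 0 or batch_size <= 0:
        raise ValueError("baseline_total and batch_size must be positive")
    p_bar = baseline_correct / baseline_total           # baseline accuracy
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / batch_size)
    lower = max(0.0, p_bar - z * sigma)                 # clamp to valid proportions
    upper = min(1.0, p_bar + z * sigma)
    return lower, p_bar, upper


def flag_batches(correct_counts: List[int], batch_size: int,
                 lower: float, upper: float) -> List[int]:
    """Return the indices of batches whose accuracy lies outside the limits."""
    return [i for i, correct in enumerate(correct_counts)
            if not lower <= correct / batch_size <= upper]


# Hypothetical figures: 950 of 1000 outputs judged correct at local validation,
# followed by weekly audits of 100 outputs each.
lower, centre, upper = p_chart_limits(950, 1000, batch_size=100)
weekly_correct = [96, 93, 85, 95]
flagged = flag_batches(weekly_correct, 100, lower, upper)
print(f"limits: {lower:.3f} to {upper:.3f}; flagged weeks: {flagged}")
```

The same calculation could be applied to a proxy measure such as the proportion of LLM-created documents needing human correction, with a flagged batch prompting review of the tool rather than automatic withdrawal.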
Because of its human-like interactivity, GenAI is rapidly gaining acceptance by frontline clinicians for certain tasks (eg, ambient scribes), bringing a cultural shift in how medicine is practised and providing more value over time.91 Clinicians will likely adopt GenAI for common tasks where it has demonstrated acceptable accuracy and safety, is easy to use, aligns with clinical workflows, and enhances clinician–patient interactions.92 Clinician trust will rely on clearly articulated use cases, well defined risk-based clinical testing processes and evidence generation, and ongoing monitoring of performance linked to original indications.93 Consumer trust will centre on tool accuracy, transparency around GenAI use in their care, and privacy assurances.94 Meaningful co-design with diverse consumer groups can help identify concerns and build appropriate safeguards into GenAI implementation.95 All users of GenAI, both health professionals and patients, will need to be well versed, through education and training programs, in its limitations, know how to use it responsibly, consistently apply human judgement to its outputs, and undertake appropriate consent procedures in using AI in care delivery. Every GenAI tool should come with a fact sheet or model card providing information, as necessary, on its function and context, training datasets, performance metrics, bias evaluation, safety assessment, user testing, technical architecture, prompt engineering, and conditions of appropriate use.96 Our narrative review identified several organisational or system factors likely to influence GenAI uptake, with some common to all forms of AI. First is the need for interdisciplinary collaborations involving researchers, data scientists, ethicists, vendors, clinicians and consumers in co-designing and co-evaluating GenAI tools and ensuring they are fit for purpose. 
Second, health services must decide whether to adopt off-the-shelf open-source or proprietary GenAI tools, with local calibration as required, or to develop tools in-house, alone or with a vendor. Whether to integrate tools with EMRs through application programming interfaces or embed them within reconfigured EMRs is another issue for EMR vendors to decide. Guidance that assists health services to assess the suitability of GenAI tools before committing to development and/or deployment is required.97 The financial and environmental impacts of software, hardware and staffing also need consideration.98 Lower cost scalability may be achieved using vertically integrated state and territory eHealth units able to collect data from multi-site EMRs and use them to train and test their own or third-party tools, which, if successful, are then provided to all participating services for local calibration.
Third, it is crucial for industry, regulators, and health services to enhance access for developers to context-specific patient data from EMRs and other sources for GenAI training, while controlling data misuse. Health data are currently siloed and often lack data standards, and access is reliant on multiple data custodians using different site-specific access rules. Harmonising access processes and establishing FHIR-enabled (Fast Healthcare Interoperability Resources) interoperable data exchange using common data formats are essential.
Fourth, introducing GenAI into clinical practice should be guided by best-practice implementation frameworks that optimise human–computer interfaces and clinician/consumer acceptance.99, 100 Fifth, clinicians' legal liability for a poor patient outcome following their justifiable acceptance or rejection of GenAI advice must be defined and managed using strategies derived from emerging legal opinions and early experiences in implementing health care AI (Box 5).101-105 This narrative review has highlighted potential gains from adopting GenAI in clinical practice. While our cited evidence may be criticised for selection bias, recent reviews published over the last two years (during which LLMs such as ChatGPT became available) were consulted, and our list of use cases is not intended to be exhaustive. As GenAI is rapidly evolving and there is a time lag between publication of original work and subsequent incorporation into review articles, we concede that some relevant primary articles may have been missed. We recognise the limitations of current GenAI106 and the urgent need for more real-world research to grow the evidence base on the efficiency, quality and safety of GenAI-assisted care, and identify the tasks and contexts for which this new and rapidly evolving technology is best suited. The advantage of GenAI is its flexibility across multiple tasks and its conversant, natural language interface, rather than superior performance on every task. Some tasks will be better served by fine-tuned supervised machine learning models rather than LLMs.107 Governance and technical standards will be required coupled with rigorous evaluation frameworks that allow users to respond quickly to unanticipated consequences and hazards. We propose a phased, risk-tiered implementation of GenAI tools into health care coupled with risk mitigation strategies. 
As a human invention, GenAI will never be perfect, but judicious selection and cautious introduction may considerably improve current care.108
Open access publishing facilitated by The University of Queensland, as part of the Wiley - The University of Queensland agreement via the Council of Australian University Librarians.
No relevant disclosures.
Not commissioned; externally peer reviewed.
Scott IA: Conceptualization, data curation, writing – original draft. Reddy S: Data curation, writing – original draft, writing – review and editing. Kelly T: Writing – review and editing. Miller T: Writing – review and editing. Van der Vegt A: Writing – review and editing.