CSI
Introduction
A pseudonymized corpus of natural-language clinical documents is a powerful catalyst for text-processing innovations such as Large Language Models (LLMs). It opens new pathways for collaborating with stakeholders and for bringing cutting-edge research and innovation to patient care and healthcare systems.
With pseudonymization as a safeguard for protecting patient privacy, how can hospitals effectively take first steps towards creating more value from clinical narratives?
A prime example is UZ Brussel’s NLP-driven web application, which empowers clinicians to identify patient cohorts by searching for natural language concepts. This solution complements, and can even replace, the traditional structured data queries performed by IT. It also goes beyond ad hoc searches by grouping textual concepts under medical codes, which enables automated, reusable searches that enhance efficiency and precision in patient care and research.
Rationale
With the abundance of healthcare data (and the high costs of gathering it), it should be possible to achieve a higher quality of care. We can develop predictive models that show whether people are at risk before they develop a disease or condition. We can learn more about managing chronic conditions and slowing their progression. To provide the right care for patients, caregivers also need to stay up to date with new research in their medical field. Not all of these challenges can be solved by putting only structured data to work. For example, when identifying drug abuse in the emergency department, structured data solves only part of the problem: it can produce a high number of false positives and miss early indications.
Converting unstructured healthcare data into a structured format is a challenge. Its complexity and heterogeneity make it difficult to fit neatly into tabular form, such as traditional database tables, Excel sheets, or CSV files. Inconsistent and cryptic medical terminology complicates the conversion further, as do clinical jargon, acronyms, abbreviations, and misspellings.
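To illustrate why abbreviations and misspellings complicate conversion, the sketch below normalizes tokens before any matching is attempted. The abbreviation table and vocabulary are hypothetical examples, not lists used by UZ Brussel, and the standard-library `difflib` matcher only stands in for whatever spelling correction a production pipeline would use.

```python
import difflib
import re

# Hypothetical abbreviation table and vocabulary, for illustration only.
ABBREVIATIONS = {"sob": "shortness of breath", "htn": "hypertension"}
VOCABULARY = ["pneumonia", "hypertension", "diabetes"]


def normalize(text: str) -> str:
    """Expand known abbreviations and correct near-miss spellings."""
    # Lowercase and keep alphabetic tokens only; punctuation is dropped.
    tokens = re.findall(r"[a-z]+", text.lower())
    normalized = []
    for token in tokens:
        if token in ABBREVIATIONS:
            normalized.append(ABBREVIATIONS[token])
            continue
        # Only correct longer tokens; short words produce spurious matches.
        if len(token) >= 5:
            close = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.85)
            if close:
                token = close[0]
        normalized.append(token)
    return " ".join(normalized)
```

With these tables, `normalize("Pt with SOB, HTN and pnuemonia")` yields `"pt with shortness of breath hypertension and pneumonia"`. A real abbreviation list would also need context, since many clinical abbreviations are ambiguous.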
Even with templates in the EHR, most physicians continue to dictate notes for their reports. In History and Physical Examination (H&P) notes, the History of Present Illness (HPI) is typically a narrative. In a discharge summary, the hospital course and follow-up plan sections are also mostly narrative. In reports such as an echocardiogram, important values such as the ejection fraction are usually buried in the free text. (Dirk Van Hyfte)
Reviewing these notes individually may be feasible when there are no time constraints, but usually the provider simply does not have the time. Clinical decision support rules cannot be applied either, because the coded data they depend on is not available.
All of this reduces the usefulness of the information, and it makes clear that a breakthrough technology is needed here.
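The ejection fraction example can be made concrete with a small rule-based extractor. This is a minimal sketch, assuming English report text and percent notation; the pattern and the function name `extract_ejection_fraction` are illustrative and do not describe how any particular product extracts the value.

```python
import re
from typing import Optional

# Matches phrases such as "ejection fraction is 35%", "LVEF 40-45%" or "EF: 55 %".
EF_PATTERN = re.compile(
    r"\b(?:ejection\s+fraction|LVEF|EF)\b"   # the measurement name
    r"[^\d%]{0,20}?"                          # short filler such as " is about "
    r"(\d{1,2})(?:\s*-\s*(\d{1,2}))?\s*%",    # a value or a range, then a percent sign
    re.IGNORECASE,
)


def extract_ejection_fraction(report: str) -> Optional[float]:
    """Return the first ejection fraction found, using the midpoint for ranges."""
    match = EF_PATTERN.search(report)
    if match is None:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    return (low + high) / 2
```

For instance, `extract_ejection_fraction("Estimated ejection fraction is 35%.")` returns `35.0`, and a report that never mentions the value returns `None`. Rules like this are brittle against unusual phrasing, which is one reason broader NLP techniques are attractive.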
How can NLP help?
Natural Language Processing (NLP) plays a critical role in complementing traditional queries that rely solely on structured and coded data.
- Capturing Nuanced Information
  - Traditional queries on structured data rely on specific codes and fields, which may not fully reflect the intricacies of a patient’s condition.
  - Free-text notes often contain details such as social history, lifestyle factors, or subtle symptoms not captured in discrete fields.
  - NLP tools can extract this rich information, providing a more holistic view of the patient.
- Uncovering Hidden Insights
  - Structured data can tell you how many patients have a particular diagnosis or which medications they are taking, but it may not reveal deeper context, such as how those medications are tolerated or how severe patients report the side effects to be.
  - NLP can analyze free-text comments, physician notes, and other unstructured sources to uncover trends and patterns that might otherwise remain hidden.
- Enhancing Data Completeness
  - A significant portion of clinical documentation remains in free-text form, whether it’s progress notes, discharge summaries, or patient messages.
  - By applying NLP, organizations can integrate and analyze these additional data streams, enhancing the completeness and usefulness of their data repositories.
- Supporting Advanced Analytics
  - When combined with structured data, insights extracted through NLP enable more advanced analytics, such as predictive modeling or real-time alerts.
  - For instance, NLP can flag language in clinical notes that indicates a patient is at higher risk of complications, thereby triggering targeted interventions.
- Facilitating Research and Quality Improvement
  - Researchers benefit from accessing a richer dataset that goes beyond standard coded fields.
  - Free-text data often includes detailed clinical observations, rationale for treatment decisions, and patient-reported outcomes.
  - By unlocking this information using NLP, healthcare organizations can drive quality improvement initiatives and generate evidence-based insights more effectively.
In sum, NLP is indispensable for translating the wealth of unstructured text data in healthcare into actionable insights, thereby augmenting and enhancing the value of traditional, code-based queries.
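The idea of combining structured data with language flagged in notes can be sketched as follows. Everything here is an assumption made for illustration: the risk phrases, the negation cues, the age threshold, and the `Encounter` fields are hypothetical, and the negation check is a deliberately crude stand-in for proper clinical negation detection.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical phrases a clinical team might treat as risk signals.
RISK_PHRASES = ["recurrent falls", "poor oral intake", "missed doses"]
# A negation cue earlier in the same sentence, within 30 characters of the phrase.
NEGATION = re.compile(r"\b(?:no|denies|without)\b[^.]{0,30}$", re.IGNORECASE)


@dataclass
class Encounter:
    patient_id: str
    age: int    # structured field
    note: str   # free-text clinical note


def risk_phrases_in(note: str) -> List[str]:
    """Return risk phrases that occur at least once without a preceding negation cue."""
    found = []
    lowered = note.lower()
    for phrase in RISK_PHRASES:
        for match in re.finditer(re.escape(phrase), lowered):
            preceding = lowered[: match.start()]
            if not NEGATION.search(preceding):
                found.append(phrase)
                break
    return found


def needs_review(encounter: Encounter, min_age: int = 75) -> bool:
    """Combine a structured criterion (age) with an NLP-derived flag."""
    return encounter.age >= min_age and bool(risk_phrases_in(encounter.note))
```

An 82-year-old whose note reads "No fever. Reports recurrent falls at home." would be flagged, while "Patient denies recurrent falls." would not, because the negation cue sits in the same sentence as the phrase.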
UZ Brussel has been at the forefront of leveraging Natural Language Processing (NLP) to analyze free-form clinical documents.
Through an intuitive self-service web application called CSI, hospital users can efficiently access and explore the wealth of information contained within these documents.
This tool has become an invaluable complement to traditional queries on structured data, significantly enhancing the depth and breadth of analytical capabilities.
By exploring textual concepts, users can easily identify patient cohorts for clinical studies or trials.
Once concepts are selected, grouped, and optionally coded, new patients can be automatically discovered and reported. Clinicians or data managers can easily screen for false-positive results in the web application.
Integration with other use cases is possible with the extensive REST API provided by CSI.
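The workflow of grouping concepts, optionally coding them, and screening out false positives can be pictured with a small data structure. This is a hypothetical sketch and not CSI's actual data model or REST API: the `ConceptGroup` class, the in-memory index, and the placeholder code value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical index: concept text -> patients whose documents mention it.
ConceptIndex = Dict[str, Set[str]]


@dataclass
class ConceptGroup:
    name: str
    concepts: Set[str]
    code: Optional[str] = None                                 # optional medical code for the group
    rejected_patients: Set[str] = field(default_factory=set)  # screened out as false positives

    def cohort(self, index: ConceptIndex) -> Set[str]:
        """Patients matching any concept in the group, minus rejected ones."""
        matched: Set[str] = set()
        for concept in self.concepts:
            matched |= index.get(concept, set())
        return matched - self.rejected_patients


index: ConceptIndex = {
    "community-acquired pneumonia": {"p1", "p2"},
    "lobar pneumonia": {"p3"},
    "chest pain": {"p4"},
}
group = ConceptGroup(
    name="Pneumonia",
    concepts={"community-acquired pneumonia", "lobar pneumonia"},
    code="EXAMPLE-CODE",  # placeholder, not a real ontology code
)
initial_cohort = group.cohort(index)          # p1, p2 and p3
group.rejected_patients.add("p2")             # a reviewer marks p2 as a false positive
index["lobar pneumonia"].add("p5")            # a new document arrives for patient p5
updated_cohort = group.cohort(index)          # p1, p3 and p5
```

Because the group is stored rather than typed in as an ad hoc query, re-running `cohort` on an updated index picks up new patients without redefining the search.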
Manual and Automated Labeling
In SPECTRE-HD we use machine learning approaches (NLP, NER) to expedite large-scale labeling and then validate the results with smaller manual review sets. Clinicians and domain experts are engaged initially to create labels for, or annotate text in, clinical narratives, progress notes, and discharge summaries. With CSI there is no need to apply labels directly to individual documents. Instead, clinicians and domain experts identify, group, and label key concepts (diagnoses, symptoms, social determinants of health) that NLP has already derived from these documents. These key concepts are markers: they are indicative of a condition or disease. After grouping, the key concepts can be tied to coded ontologies.
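Validating automated labels against a smaller manual review set can be sketched as below. This is only an illustration of the general idea, assuming binary labels per document; the function names and the choice of precision as the metric are assumptions, not a description of the SPECTRE-HD procedure.

```python
import random
from typing import Dict, List


def sample_for_review(auto_labels: Dict[str, bool], size: int, seed: int = 0) -> List[str]:
    """Draw a reproducible random sample of automatically labeled document ids."""
    rng = random.Random(seed)
    doc_ids = sorted(auto_labels)
    return rng.sample(doc_ids, min(size, len(doc_ids)))


def precision_on_sample(auto_labels: Dict[str, bool], manual_labels: Dict[str, bool]) -> float:
    """Share of automatically positive documents that reviewers confirmed.

    Only documents present in manual_labels (the review set) are considered.
    """
    reviewed_positives = [d for d in manual_labels if auto_labels.get(d)]
    if not reviewed_positives:
        raise ValueError("review set contains no automatically positive documents")
    confirmed = sum(1 for d in reviewed_positives if manual_labels[d])
    return confirmed / len(reviewed_positives)
```

If reviewers confirm two of three automatically positive documents in the sample, the estimated precision is two thirds; a low value signals that the automated labels need refinement before they are used at scale.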
Contextual Tagging
Beyond simple categorical labels, capture contextual details. For instance, label an X-ray radiology report not just with “pneumonia” but also with severity, location, and associated risk factors.
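A contextual tag can be represented as a small record rather than a single label. The sketch below is one possible shape, with hypothetical field names chosen to mirror the severity, location, and risk-factor details mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextualTag:
    """A finding plus the context that a bare category label would lose."""
    finding: str
    severity: Optional[str] = None
    location: Optional[str] = None
    risk_factors: List[str] = field(default_factory=list)

    def matches(self, finding: str, severity: Optional[str] = None) -> bool:
        """True if the tag has this finding and, when given, this severity."""
        if self.finding != finding:
            return False
        return severity is None or self.severity == severity


tag = ContextualTag(
    finding="pneumonia",
    severity="severe",
    location="right lower lobe",
    risk_factors=["COPD"],
)
```

Keeping the context in separate fields lets a search ask for all pneumonia reports or only the severe ones, which a flat “pneumonia” label cannot distinguish.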
Apply Named Entity Recognition (NER), topic modeling, sentiment analysis, or summarization to extract structured information from clinical narratives, progress notes, and discharge summaries.
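As a minimal illustration of the NER step only, the sketch below tags entities by dictionary lookup. The lexicon and labels are hypothetical, and a lexicon matcher is a simplification swapped in for the trained models or curated terminologies a real system would rely on.

```python
import re
from typing import Dict, List, NamedTuple


class Entity(NamedTuple):
    text: str
    label: str
    start: int
    end: int


# Hypothetical lexicon, for illustration only.
LEXICON: Dict[str, str] = {
    "pneumonia": "DIAGNOSIS",
    "amoxicillin": "MEDICATION",
    "shortness of breath": "SYMPTOM",
}


def find_entities(text: str) -> List[Entity]:
    """Return lexicon terms found in the text, ordered by position."""
    entities = []
    for term, label in LEXICON.items():
        # Whole-word, case-insensitive match for each lexicon term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            entities.append(Entity(match.group(0), label, match.start(), match.end()))
    return sorted(entities, key=lambda e: e.start)
```

Each entity keeps its character offsets, so the structured output can always be traced back to the exact span in the source document.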