Rare Diseases Pilot Project

In Belgium, more than 500,000 people are affected by one of over 6,100 known rare diseases, yet fewer than 2% are recorded in the national rare disease registry (CRRD).1 The low number of reported cases in the CRRD has several causes:

  • Collection and reporting by genetic centers: it was initially foreseen that rare disease cases would be reported by the 8 accredited genetics centers in Belgium
    • However, ~30% of rare diseases are not of genetic origin
    • Not all patients with a rare genetic disease have had a genetic consultation
    • Genetic tests may be performed in a different center from the clinical center visited by the patient
  • There is no financial support for reporting to the CRRD
  • The number of mandatory parameters to be reported is large: for several parameters, the required information is difficult to find and may not be available in the patient file; if reported, it may be partial and unreliable.
  • As most cases are to be reported retrospectively, the information on “consultation” is difficult to complete and might not even be relevant. Removing these items would simplify and facilitate the reporting.
  • The reporting form assumes that case reporting is done at the time of a specific genetic visit. However, most cases are likely to be reported retrospectively. In addition, at Saint-Luc, the clinical centers with the IMR (Institut de Maladies Rares) report the cases, not the genetic center.
  • Challenges in the systematic identification of rare disease cases across all medical specialties
    • technological support
    • effective data management

Today’s approach is centered on diagnoses in genetic centers, since about 80% of known rare diseases have a genetic origin. However, after a rare disease diagnosis, genetic centers struggle to capture all further relevant patient data for submission to the CRRD.

The SPECTRE-HD pilot project introduces an innovative workflow for the identification of individuals with a rare disease. It is based on “textual” diagnostic markers located within structured and unstructured sources of the hospital’s electronic patient records, spanning various medical specialties beyond genetics.

This approach enables earlier detection of potential rare disease cases. Furthermore, the workflow streamlines a potential case’s validation, starting with a data manager in the first line and, if necessary, escalating to a clinician in the second line, particularly when the patient’s rare disease has not yet been confirmed through routine clinical practice.

The resulting rare disease registry empowers the hospital to develop customized care pathways, establish new centers of expertise, and strengthen its eligibility for participation in European Reference Networks (ERNs) dedicated to specific rare diseases. A key objective of the pilot project is to harmonize rare disease registration across hospitals, aligning with national data collection efforts to enable efficient and streamlined data sharing when required. Notably, rather than proactively sharing extensive datasets, only core data points — such as the patient’s identity, diagnosis, and time of diagnosis — can be reported initially to the national registry. Additional data can then be collected as needed, tailored to address specific policy or research questions, ensuring both efficiency and privacy.
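
The "core data points first" idea can be illustrated with a minimal sketch. The field names below are hypothetical and chosen only for illustration; the actual CRRD submission format is not described here. The sketch keeps a fuller record locally and derives the small payload that would be reported initially.

```python
from dataclasses import dataclass, asdict
from datetime import date

# Hypothetical field names; the real CRRD submission format is not described here.
CORE_FIELDS = ("patient_id", "orpha_code", "diagnosis_date")


@dataclass
class RegistryRecord:
    patient_id: str          # identity (for example a pseudonymised identifier)
    orpha_code: str          # diagnosis
    diagnosis_date: date     # time of diagnosis
    referring_provider: str  # kept locally, shared only on request
    clinical_summary: str    # kept locally, shared only on request


def core_payload(record: RegistryRecord) -> dict:
    """Keep only the core data points for the initial national report."""
    full = asdict(record)
    return {name: full[name] for name in CORE_FIELDS}


record = RegistryRecord(
    patient_id="P-0001",
    orpha_code="ORPHA:586",
    diagnosis_date=date(2023, 5, 18),
    referring_provider="Dr. Example",
    clinical_summary="free text kept in the hospital registry",
)
payload = core_payload(record)
```

Additional fields would then be released only in response to a specific policy or research question, which keeps the default data flow small.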

The proposed workflow can be expanded to hospitals beyond the 8 rare disease centers in Belgium.

There is still a need for a holistic approach with clear incentives for hospitals to make the needed people and resources available to ensure the systematic and structured reporting to different registers.

We leverage Orphadata to “collect” rare disease cases from the hospital EHR using the REDCap data management platform. This includes a flow to validate and code the diagnosis (if needed) and to manage the data effectively for both internal use and for sharing with the CRRD.

Registers — or systematic databases for specific diseases and conditions — are essential for creating high-quality, consistent, and reliable clinical data. By collecting, organizing, and analyzing patient information, registers serve as the backbone for evidence-based policy, quality improvement, funding decisions, clinical innovation, and patient empowerment. Registers can transform fragmented healthcare data into actionable insights.

Naturally, and quite correctly, access to many patient outcome registers is tightly restricted, as individual records may raise identification and confidentiality issues. Even where design features have “de-identified” the data, certain combinations of demographic and healthcare attributes can allow inadvertent or malicious re-identification. There may also be concerns about technical overload, in cases where the infrastructure supporting the register was originally designed to give access only to a small group of accredited and vetted personnel.

Both privacy and technical access issues are real.

  • Rational and Data-Driven Decision-Making
    Current healthcare systems often lack robust, systematic outcomes data. Registers fill this gap by offering validated information on patient demographics, treatments, and results, thereby supporting more informed policy decisions.
  • Cost-Effectiveness
    An initial investment in registers pays off by reducing inefficiencies and misallocations in healthcare resources. As policies become more grounded in real-world evidence, the system becomes more efficient, ultimately offsetting the costs of data collection.

2. Targeted Improvement in Quality of Care

  • Benchmarking and Best Practices
    Registers facilitate comparisons of clinical outcomes and enable healthcare providers to benchmark themselves against peers. When outlier results are identified, targeted interventions can be designed to raise the overall standard of care.
  • Continuous Quality Monitoring
    Up-to-date dashboards with Key Performance Indicators (KPIs) ensure that hospitals and clinicians can track their progress over time, identifying trends and intervening when quality measures slip.
  • Data-Driven Budgeting
    With registers, policymakers gain a clearer picture of disease prevalence and treatment outcomes. This guides smarter investments in healthcare, ensuring public funds are directed where they can have the greatest impact.
  • Eliminating Redundancies
    By centralizing data, registers help avoid duplicative initiatives and overlapping services, making healthcare spending more transparent and accountable.
  • Personalized Treatment Pathways
    Registers allow clinicians to see what works best for specific patient populations, paving the way for more individualized and precision-based care.
  • Collaborative Research
    Comprehensive datasets enable multi-center studies and foster collaborations among researchers, leading to faster medical advancements and improved patient outcomes.
  • Informed Participation
    When patients see how their data contributes to a larger registry, they can become more engaged in their own care and appreciate the broader impact of sharing their health information.
  • Shared Decision-Making
    Registries often produce patient-friendly reports that explain different treatment options and likely outcomes, encouraging patients to actively participate in healthcare decisions.
  • Current Gaps in Knowledge
    Many healthcare systems operate in silos, resulting in fragmented and often low-quality real-world data. For example, in Belgium it is estimated that fewer than 50% of people with diabetes are accurately documented.
  • Building Comprehensive Profiles
    Disease registries consolidate scattered data sources—clinical, administrative, laboratory, and patient-reported—into cohesive, high-fidelity datasets that give a more complete picture of patient populations.

7. Multi-Stakeholder Engagement and Collaboration

  • Bringing Everyone to the Table
    Improving data quality and utility requires concerted efforts from governments, payers, providers, patients, technology partners, and researchers.
  • Shared Objectives, Unified Impact
    With registers in place, stakeholders can align on common KPIs, identify gaps, and drive collective innovations that benefit entire patient communities.
  • Disease-Specific Plans
    Each condition—be it diabetes, cardiovascular disease, or a rare illness—warrants a dedicated registry with KPIs defined by patients, specialists, and other experts.
  • Real-Time Dashboards
    Modern registries support up-to-date dashboards and metrics, allowing healthcare providers to continuously refine treatment strategies and foster faster, data-driven policy updates.

Registers are more than just databases; they’re catalysts for systemic change. By shining a light on the true scope and details of diseases, they empower policymakers, clinicians, and patients alike to make informed decisions that enhance care quality, optimize resource allocation, and ultimately improve patient lives. Through multi-stakeholder collaboration and a commitment to high-quality data collection, we can transform today’s fragmented landscape into a well-integrated, evidence-based healthcare system — one condition at a time.

Registers provide high-quality data as the basis for:

  • evidence-based policy
    • we need more systematic and robust outcomes data to make our healthcare system more rational and evidence-based
    • the cost will be recouped through a more efficient and effective system
  • targeted improvement in the quality of care
  • more efficient allocation of public funds
  • clinical applications
  • patient empowerment

Real-world clinical data in hospitals is often fragmented and sometimes lacks quality. For example, fewer than 50% of all people with diabetes in Belgium are currently known. We can fix that. To make progress and fight the inefficiency due to fragmentation, we need to engage in multi-stakeholder dialogues and collaboration at every level. We need registries for every condition, because we need plans for every disease, with up-to-date dashboards of KPIs identified by patients, specialists and other experts. For example, a query in electronic patient records could identify patients who are likely to have diabetes but remain undiagnosed; we could then invite them to visit their GP for an official diagnosis.
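
As a rough illustration of such a query, the sketch below flags patients with an elevated HbA1c result but no coded diabetes diagnosis. The record layout and field names are hypothetical, the 6.5% value is a commonly used diagnostic cut-off, and a real query would run against the hospital's own data model with criteria agreed by clinicians.

```python
# Hypothetical, simplified records; field names are assumptions for illustration.
HBA1C_THRESHOLD_PERCENT = 6.5  # commonly used diagnostic cut-off for diabetes
DIABETES_PREFIXES = ("E10", "E11", "E12", "E13", "E14")  # ICD-10 diabetes mellitus block

lab_results = [
    {"patient_id": "A", "test": "HbA1c", "value_percent": 7.2},
    {"patient_id": "B", "test": "HbA1c", "value_percent": 5.4},
    {"patient_id": "C", "test": "HbA1c", "value_percent": 6.9},
]
coded_diagnoses = [
    {"patient_id": "C", "icd10": "E11.9"},  # already documented as diabetic
    {"patient_id": "B", "icd10": "I10"},    # unrelated diagnosis
]


def undiagnosed_candidates(labs, diagnoses):
    """Patients with an elevated HbA1c but no coded diabetes diagnosis."""
    known = {
        d["patient_id"]
        for d in diagnoses
        if d["icd10"].startswith(DIABETES_PREFIXES)
    }
    elevated = {
        r["patient_id"]
        for r in labs
        if r["test"] == "HbA1c" and r["value_percent"] >= HBA1C_THRESHOLD_PERCENT
    }
    return sorted(elevated - known)


candidates = undiagnosed_candidates(lab_results, coded_diagnoses)
```

The result is only a list of candidates for follow-up, not a diagnosis; confirmation would still happen with the GP.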

Since we naturally cannot implement registers for all conditions at once, we want to focus on a general framework to identify patients and collect data for local registers, while taking concrete action in the domain of rare diseases. This will allow us to make progress where there is a great need today, and gives us a chance to learn lessons that can be applied more broadly. We have found stakeholders in all 8 Functions who are prepared to commit.

Narrative detail in clinical notes offers valuable insights that structured data alone often misses. Rare or newly emerging conditions may not have a well-established ICD-10 code or might be entered incorrectly due to limited code options. Physicians often document these nuances in free text, flagging conditions for future refinement in structured coding systems.

Structured data is fundamentally limited by existing coding schemas, which may lag behind real-world clinical practice. Physicians frequently observe new or evolving conditions in their narrative notes — well before codes are assigned or even created. These “unofficial” observations could reveal emerging comorbidities or rare complications.

Using Clinical Notes for Early Identification of Rare Diseases

Rare diseases often defy quick categorization. Patients may present with vague or atypical symptoms that can lead to delayed or missed diagnoses—sometimes for years. While structured data (like ICD-10 codes) is essential for standardized reporting, it often falls short in uncovering the initial clues of a rare disease. Clinical notes, on the other hand, capture nuanced observations and evolving clinical suspicion. By leveraging advanced text-mining techniques—especially Natural Language Processing (NLP) and Large Language Models (LLMs)—health systems can flag early signs of rare conditions more effectively. Here’s why:

  1. Documenting the “Atypical”

    • Inconsistent or Evolving Symptoms: Rare diseases frequently manifest in ways that don’t fit neatly into an existing code or clinical guideline. Physicians’ free-text documentation may mention peculiar complaints, unusual test results, or off-pattern laboratory values.
    • Early Nuances: Before a physician can assign a code, there must be a working hypothesis. Narrative notes capture these inklings or doubts—such as “suspecting a connective tissue disorder” or “patient with atypical immune response.”
  2. Highlighting Hidden Clues

    • Patient and Family History: Rare conditions often have genetic or familial implications. Clinicians might note family background, similar symptoms among relatives, or previously unexplained health issues, details that are rarely coded.
    • Lifestyle & Environmental Factors: Some rare diseases have triggers or signs that clinicians note in passing—like specific dietary reactions or environmental exposures. NLP can turn these buried clues into actionable signals.
  3. Identifying Patterns Over Time

    • Retrospective Searches: A patient’s clinical path may be scattered across multiple visits and departments. Structured data can miss the thematic connections buried in free-text narratives over time.
    • Cumulative Evidence: NLP tools can aggregate and analyze unstructured notes from multiple encounters, revealing patterns—e.g., repeated complaints about the same unresolved symptom, or persistent laboratory anomalies.
  4. Reducing Diagnostic Odysseys

    • Speeding Up Recognition: The earlier a rare disease is identified, the more quickly targeted interventions or genetic counseling can begin. Tapping into physician notes may shave off months or even years from the diagnostic journey.
    • Guiding Specialist Referral: Clues surfaced through note analysis can prompt primary-care providers to consult specialists earlier, improving care coordination for these complex cases.
  5. Adaptive and Responsive

    • NLP & LLM Advancements: Modern language models are trained on vast amounts of text and can pick up on rare or emerging clinical terms—sometimes even before they’re officially recognized in coding systems.
    • Dynamic Updating: Unlike static code sets, language models can continuously learn from new text data, staying current with novel research findings or changes in medical nomenclature.
  6. Complementing Traditional Data Sources

    • Synergy with Structured Data: While lab results, imaging, and coded diagnoses remain crucial, they only tell part of the story. Clinical notes fill in the gaps by explaining why those codes were assigned or what observations led to certain tests.
    • Comprehensive View of the Patient: Integrating free-text narratives into predictive analytics provides a more holistic picture of patient health, improving the likelihood of catching rare diseases early.
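
The "cumulative evidence" point above can be sketched in a few lines. The example uses plain substring matching as a stand-in for real NLP (it does not handle negation, synonyms, or spelling variants), and the notes and term list are invented for illustration. It reports, per patient, the terms that recur across several distinct encounters.

```python
from collections import defaultdict

# Hypothetical notes: (patient_id, encounter_date, free text).
notes = [
    ("P1", "2022-01-10", "Patient reports joint hypermobility and fatigue."),
    ("P1", "2022-06-02", "Again notes joint hypermobility; skin is hyperextensible."),
    ("P1", "2023-02-14", "Persistent fatigue. Joint hypermobility unchanged."),
    ("P2", "2023-03-01", "Seasonal cough, otherwise well."),
]
TERMS = ("joint hypermobility", "fatigue", "hyperextensible")


def recurring_terms(notes, terms, min_encounters=2):
    """Per patient, list the terms found in at least `min_encounters` encounters."""
    # seen[patient][term] is the set of encounter dates mentioning the term.
    seen = defaultdict(lambda: defaultdict(set))
    for patient_id, encounter_date, text in notes:
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                seen[patient_id][term].add(encounter_date)

    flagged = {}
    for patient_id, by_term in seen.items():
        recurring = sorted(
            term for term, dates in by_term.items() if len(dates) >= min_encounters
        )
        if recurring:
            flagged[patient_id] = recurring
    return flagged


flags = recurring_terms(notes, TERMS)
```

A single mention rarely means much on its own; counting distinct encounters is one simple way to surface a symptom that keeps coming back unresolved.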

Conclusion
Harnessing free-text clinical notes for rare-disease detection does more than just bolster existing data streams—it can be a lifeline for patients who might otherwise spend years in the healthcare system without definitive answers. By applying advanced NLP and LLM techniques to unstructured narrative data, clinicians and researchers can more quickly flag potential cases for further investigation, ultimately improving patient outcomes and reducing the “diagnostic odyssey” that so often accompanies rare conditions.

(from Poster)

Rare disease (RD) knowledge and expertise, especially based on real-life data, are scarce. This leads to a diagnostic odyssey, insufficient treatment options, and decreased quality of life. It is imperative for health care institutions to collect and share standardized RD data to stimulate RD research.

Hospitals need a platform for data discovery, preparation, and provisioning to facilitate RD identification and data collection. By extending the local platform to a shared platform, an ecosystem is created where patient data can be accessed and used with respect for privacy and security, while ensuring control over the lifecycle of the patient’s shared data.

SPECTRE-HD aims to enable health care providers to identify and diagnose RD patients with real-world data from the Electronic Health Record (EHR) and to promote data sharing with other stakeholders in a secure and privacy-preserving environment.

  1. Standardization of a set of relevant data elements for RD to enable harmonization of patient data
  2. Development of a digital ecosystem to enable safe access and use of patient data

Standardization of a set of relevant data elements by representatives of the Belgian RD function hospitals and Sciensano, the public institute of health in Belgium.

  • Selection based on “Set of Common Data Elements for Rare Diseases Registration” from the European Union RD Platform, and the Central Rare Disease Registry (CRRD) in Belgium.
  • Consensus agreement through stakeholder meetings with representatives of the Belgian RD function hospitals and Sciensano.

Development of a digital ecosystem, using:

  • In-house developed applications like the electronic health record (EHR) system and a web application based on iKnow Natural Language Processing (NLP) technology
  • REDCap, a web-based clinical data management platform
  • The Orphanet database for nomenclature and classification of RD
  1. The set of data elements includes the “Set of Common Data Elements for Rare Disease Registration” and the referring health care provider, country of referral, type of service requested, and the base of diagnosis.

  2. UZ Brussel’s IT department built a local Data Integration and Preparation platform (DIP) and a shared, cloud-based Data Access and Processing platform (DAP) to enable the safe sharing of patient data with other institutions and organizations in a digital ecosystem. Possible RD patients are identified using existing structured data and iKnow NLP on unstructured clinical notes, with the help of the Orphanet database, and are validated by clinicians before inclusion in the RD registry in REDCap. The DIP enables a hospital to facilitate data access requests from various stakeholders for secondary use of patient data. The DIP can be implemented in other hospitals and connected to the DAP, where all data can be accessed in a trusted research environment.

The SPECTRE-HD pilot project, led by UZ Brussel’s IT department, built a DIP and a DAP that focus on rare diseases. The efficacy and efficiency of this project are being evaluated. In case of successful implementation, the next steps are to scale up to other Belgian hospitals and to expand to other areas in healthcare.

Did you know that there are more than 6,100 known rare diseases? That large and diverse group struggles with shared challenges that deserve full social attention. Which ones? You can see the 5 separate key facts below.2

A rare disease never affects one person alone, but drags partners and entire families into lifelong uncertainties. After all, most rare diseases are incurable and all too often have an unpredictable and erratic course.

[Five images, one per key fact; image content not available in this text version.]

Milestone 2 – Pilot Study of Rare Diseases

All data capabilities in scope of SPECTRE-HD are made available to be used independently of each other and for any number of use cases. These capabilities are:

  • Supporting a data-driven flow to build a hospital registry of patient data
  • Using NLP to explore textual concepts in free-form clinical documents and to find patients
  • De-identification of free-form clinical documents
  • Using a hospital-local platform for integrating and preparing patient data (HDIP)
  • Using a central platform for accessing and processing patient data (HDAP)

The objective of the pilot study of rare diseases is to create a working proof-of-concept that connects all the data capabilities shared in SPECTRE-HD.

T2.1 Analysis and implementation plan for structured data and clinical documents

Timeline: September 2023 – December 2023

The single goal of this task is to create an inventory of all types of patient data in the hospital that are relevant for the SPECTRE-HD project, specifically the pilot project of rare diseases. For each data source, a plan is also made available for disclosing its data for secondary use in the hospital. This is especially important if patient data is managed in systems other than the hospital’s integrated EHR (for example, for ICD-10 registration). As input to Task 2.1, UZ Brussel will provide a template of the inventory/plan (mid-September).

The remainder of the description of Task 2.1 provides some background information.

Note: the actual implementation of the ETL scripts or pipelines for the relevant data is the subject of Task 2.6 (for structured data) and Task 3.3 (extension for unstructured data) that will both be realized between March and June 2024.

The REDCap platform (set up in Task 2.4) is used to collect structured and unstructured data that is available in the patient’s electronic health records, or in other hospital systems where health-related information is stored.

All data in the SPECTRE-HD project will be collected for two main purposes:

  • identifying patients with a (potentially) rare disease
  • building a hospital registry of patients with a rare disease

1. Identifying patients with a (potentially) rare disease

Certain types of data available in the hospital records can act as indicators or evidence to support the identification of patients with a rare disease. In the SPECTRE-HD project, existing data will be leveraged to signal new cases on the REDCap platform, where they can be confirmed by a data manager or clinician (who typically has a therapeutic relationship with the patient). A patient is identified when his or her data can be linked to one or more ORPHAcodes. ORPHAcodes are internationally recognized as the standard for coding rare diseases, and are maintained by Orphanet.

When the user confirms a new case that has been presented, the decision on the specific rare disease is supported through a list of suggested ORPHAcodes that the user can pick from. The ORPHAcodes can be complemented with other data that help to provide evidence (e.g., the problem list). Note that the complete workflow is the subject of Tasks 2.5 and 2.6.

The ORPHAcodes will be derived from existing data in the patient records. In this SPECTRE-HD project milestone, we will first focus on structured data. In Milestone 3, we will leverage NLP technology to also consider the unstructured data that exists in abundance in our hospitals.

  1. The main source of “direct” evidence for a rare disease is the set of diagnoses that are already coded in the hospital using international coding or classification systems like ICD-10, SNOMED CT, OMOP, ICD-O (for oncology), OMIM (for genetics/genomics), DSM (for psychiatry), ATC or RxNorm. If a diagnosis is not already coded in ORPHAcodes, a mapping is often maintained by the organization that is responsible for the coding system. While these mappings are valuable, they might not always provide a perfect one-to-one match due to the inherent differences and purposes of each coding system. Regular updates and verification against the latest datasets will be essential to maintain accuracy. If an API has been published, we will document how the mapping can be applied in the ETL pipeline, or otherwise provide a number of REDCap reference projects in support of the mapping.

  2. Other evidence is not coded but exists in another structured format, for example as items of a ‘quick pick’ list in an electronic input form used in clinical routine. It is important to also allow this type of information to be used, for example to help in the identification of new cases of a rare disease or to serve as supporting evidence in the validation process. Since this structured information is in most cases proprietary to the hospital, it is necessary to maintain a knowledge base that can be consulted in the ETL pipeline or in the target REDCap platform.
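
To make the mapping step concrete, here is a minimal sketch of a lookup from ICD-10 codes to suggested ORPHAcodes. The small table is illustrative only and should be verified against the current Orphadata files; the function name and fallback rule are assumptions, not part of the project's ETL pipeline. It shows why a mapping is not always one-to-one: a single ICD-10 code can correspond to several rare diseases, so the result is a list of suggestions for the user to pick from.

```python
# Illustrative mapping only: verify every entry against the current Orphadata files.
ICD10_TO_ORPHA = {
    "E84": ["ORPHA:586"],                     # cystic fibrosis
    "G10": ["ORPHA:399"],                     # Huntington disease
    "Q87.4": ["ORPHA:558"],                   # Marfan syndrome
    "G71.0": ["ORPHA:98896", "ORPHA:98895"],  # Duchenne / Becker muscular dystrophy
}


def suggest_orphacodes(icd10_code, mapping=ICD10_TO_ORPHA):
    """Return candidate ORPHAcodes for an ICD-10 code.

    The most specific code is tried first; if it is not mapped, the code is
    shortened one character at a time to fall back on its parent category.
    """
    code = icd10_code.strip().upper()
    while code:
        if code in mapping:
            return list(mapping[code])  # copy so callers cannot alter the table
        code = code[:-1].rstrip(".")
    return []


suggestions_cf = suggest_orphacodes("E84.1")   # falls back to category E84
suggestions_md = suggest_orphacodes("G71.0")   # one code, two candidate diseases
suggestions_none = suggest_orphacodes("Z00.0") # no rare disease mapping
```

Because the output is a list of candidates rather than a single code, the final choice stays with the data manager or clinician who validates the case.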

As mentioned, a hospital can also leverage its unstructured data for the identification of patients with a rare disease. Even with the current widespread use of integrated electronic health records, a lot of health-related information is not structured and is considered “hidden” in the narratives within a variety of clinical documents (like input forms, radiology reports, surgery protocols, hospital discharge or summary letters, etc.). Of course, this information is not actually hidden, as the narratives are in plain sight of caregivers who can gain access to the patient records. It is however difficult to further use this information for analytics, decision making and research. While ideally patient observations are immediately structured in the EHR system during clinical routine, for example by enriching the observations with SNOMED CT codes, it is safe to say there is some way to go before clinicians can do this consistently across the medical departments in the hospital. In the meantime, for secondary use we also need to deal with patient records in a retrospective manner.

At UZ Brussel, NLP technology is used to make the contents of clinical documentation available for secondary use. The main use case is the identification of patient cohorts for reporting and clinical research. A web application named Cohort Selection and Identification (CSI) is being built by UZ Brussel to make a rich set of functionalities available in an intuitive way for hospitals with different levels of data maturity. A major goal in SPECTRE-HD is to make it possible for users in other hospitals to engage with our CSI web application.

UZ Brussel’s NLP technology is based on iKnow, which is at its core a semantic algorithm based on different human language models. An iKnow language model derives entities like concepts and relations from the textual content of a document. A concept is nothing more than a meaningful group of words like “acute coronary syndrome” that occurs literally in the text of the document. The iKnow models that find these entities are based on rules carefully hand-crafted by a team of linguists for 11 different human languages (English, German, Dutch, French, Spanish, Portuguese, Swedish, Russian, Ukrainian, Czech, and Japanese). Importantly, the language models are domain-independent, meaning that no prior knowledge about the domain of the documents (like healthcare) is needed. iKnow was made open source by InterSystems in 2020.3

To put the iKnow engine into practice and to search and explore the entities in clinical documents efficiently, a scalable database is needed to store and index the concepts and relations that are derived by iKnow’s semantic engine. An API is indispensable to query the documents and their metadata. Both a database system and an API implementation for iKnow are available on the IRIS for Health platform. This platform, sold by InterSystems, is the same platform on which the well-known Epic EHR is built. Besides a scalable database, IRIS for Health offers solutions for processing and integrating health data. For the SPECTRE-HD project, a license has been budgeted for the UZA and ZOL hospitals, which do not use the IRIS for Health platform today. This will give every hospital an equal opportunity to evaluate the usefulness of NLP for its patient data.

At the same time, we want to look at scenarios for a sustainable deployment of the NLP data capability past the SPECTRE-HD project horizon. This may involve negotiating licensing conditions with InterSystems, to alleviate the cost for an individual hospital, and perhaps creating an entirely new implementation of the NLP functionalities based on open source iKnow. A lot of this will depend on the evaluation of the collective experiences at the end of the SPECTRE-HD project.

With the support of modern NLP techniques, we will unlock the information in the clinical narratives to enrich our patient records with standard diagnosis codes. By leveraging unstructured data, in contrast with the prior focus on existing codes in the patient records, we can more proactively and more broadly identify patients with a (potentially) rare disease. We will expand our search to any type of patient contact in any medical department. Moreover, we can do this as soon as documents are produced by our caregivers.

This is how it works:

  1. The document ETL pipeline created by each hospital in Task 3.3 transforms a large retrospective batch of clinical documents from possibly multiple sources in the hospital’s EHR system. These are clinical notes, reports, letters, and input forms that may also exist in different formats (PDF, HTML, XML, Word, etc.). The documents are converted to plain text files and stored in a single document database in the IRIS for Health platform.

    • This is a one-time task that may take several days of processing, depending on how far back in time the hospital wants to make its documents ‘searchable’ by users (for example, all documents in the past 5 years, or 10 years).
  2. The iKnow semantic engine processes the complete historic batch of documents on the IRIS platform to generate an index of textual concepts (“iKnow concepts”).

    • This is a one-time task, where no effort from the hospital is needed since the iKnow semantic engine is part of the IRIS platform where both the documents and the index are stored.
  3. The document ETL pipeline from Task 3.3 is scheduled to run daily and transform and load the new notes, reports, letters, and input forms produced in the hospital’s patient records in the past 24 hours.

    • This is an automated task. Besides the scheduling, no additional effort from the hospital is needed.
  4. The iKnow semantic engine is scheduled to run on a daily basis and index textual concepts in any of the new notes, reports, letters and input forms that have been produced in the hospital’s patient records in the past 24 hours.

    • This is an automated task. No additional effort from the hospital is needed.
  5. A clinician uses the CSI web application to efficiently explore all textual iKnow concepts in the hospital’s documents, and to select those concepts that may be indicative of a particular rare disease. The set of iKnow concepts can be simply labeled with one or more ORPHAcodes for rare diseases.

    • Constructing a set of iKnow concepts is a one-time activity that needs to be performed for each rare disease of interest. Creating a relatively complete set of concepts will probably not take more than one hour for a clinician who is familiar with the domain of the rare disease (given that the clinician has attended a training session for the CSI web application).
    • “Post-production”, meaning after an iKnow set has been defined, another automated task can run to notify of any newly detected iKnow concept that matches an existing iKnow set. Consequently, the new concept can be added to the iKnow set by its owner. This helps to ensure that we do not miss new cases due to alternative textual descriptions of concepts related to a rare disease.
    • It is also possible to share iKnow sets between hospitals. Of course, the terminology in documents can differ, and the coverage of an iKnow set may be incomplete when it is used in a hospital other than the one where it was originally constructed. On the other hand, the automated task can notify of any newly detected “local” concept that matches an iKnow set that is reused from another hospital.
    • At any time, the clinician can validate a particular set of iKnow concepts by requesting the application to produce a list of all patients who have an occurrence of at least one of the concepts in their medical records. For each individual patient, the clinician can then proceed with the validation, visualizing the source of each clinical document in which any of the proposed iKnow concepts occurs. This is useful to learn more about the document context of a textual concept.
  6. A flow is scheduled to run daily to report new (potential) cases of a rare disease. This flow is the product of Task 3.7, which adds an NLP extension to the flow from Task 2.6. The NLP extension can be described as follows:

    • For each iKnow set that is annotated with a diagnosis code of a rare disease, the task requests the list of all patients for whom at least one of the concepts in the iKnow set is detected in a note, report, letter, or input form.
      • In a separately scheduled process, the list of patients who are related to an iKnow set is automatically recalculated each time the iKnow semantic engine has indexed the new documents from the past 24 hours (= step 4). This process is already available and requires no effort from the hospital. The calculated list of patients can simply be retrieved with an API request.
    • If a patient is found who was not previously related to the iKnow set, we have a new potential case for the rare disease.
    • The remainder of the flow is identical to the one for structured data (implemented in Task 2.6).
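
The detection of new potential cases in step 6 amounts to a set difference per iKnow set. The following is a minimal Python sketch under stated assumptions: `IKnowSet` and `find_new_potential_cases` are hypothetical names, and the patient list is passed in as a plain set, whereas the real flow would retrieve it with an API request. None of these names come from the IRIS platform or the CSI web application.

```python
from dataclasses import dataclass, field


@dataclass
class IKnowSet:
    """Hypothetical stand-in for an iKnow set labeled with ORPHAcodes."""
    name: str
    orpha_codes: list[str]
    # Patients already related to this set in an earlier run.
    known_patients: set[str] = field(default_factory=set)


def find_new_potential_cases(iknow_set, current_patients):
    """Return patients newly related to the set, then remember them.

    `current_patients` stands in for the list that the real flow
    would retrieve with an API request after the daily indexing run.
    """
    if not iknow_set.orpha_codes:
        # Only sets annotated with a rare disease code are reported.
        return set()
    new_cases = set(current_patients) - iknow_set.known_patients
    iknow_set.known_patients |= new_cases
    return new_cases


# Illustrative data only; the ORPHAcode and patient IDs are made up.
example = IKnowSet("example-set", ["ORPHA:0000"], {"P001"})
day_1 = find_new_potential_cases(example, {"P001", "P002"})
day_2 = find_new_potential_cases(example, {"P001", "P002"})
```

In this sketch the first run reports one new patient and the second run reports none, because the patient has by then been recorded as related to the set.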

2. Building a registry of hospital patients with a rare disease

While the identification of patients with a rare disease is a strong capability in and of itself, it is not the end goal of our project. We want to accumulate knowledge on rare diseases, to better help these patients, and to foster research and innovation. We also want to standardize on a minimal set of data points between the Belgian hospitals, to share data on rare diseases in the context of public health. To this end, we will build a hospital registry to gather information on patients with rare diseases. While each hospital is free to decide whether and which extra data is gathered, at the core is the standard template of “common data points” between hospitals. The design of this standard template happens in Task 2.2 in collaboration with Sciensano, whereas in Task 2.8 we will focus on options to make the registry interoperable with Sciensano’s Central Registry on Rare Diseases. Because the inventory in the current Task 2.1 depends on Task 2.2, these tasks will be executed in parallel.

This registry will be constructed in REDCap and will integrate with the preceding workflow for the identification of patients. After a data manager or clinician validates a new case of a rare disease, another flow will collect data from the hospital’s source systems and map the data to the forms in the registry. Note that some data about the patient may not be (directly) available, in which case it can be manually completed in the REDCap input form. Ideally, this will lead to initiatives in the hospital to enable a structured registration of the “missing” data in the EHR system.
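
The mapping step can be pictured as a lookup from registry fields to source-system keys, with unfilled fields set aside for manual completion. The sketch below is only an illustration: the field names, the source keys, and the `map_to_registry` helper are all assumptions, and it does not call REDCap or any hospital system.

```python
# Hypothetical mapping from registry field names to source-system keys.
FIELD_MAP = {
    "birth_year": "patient.birth_year",
    "sex": "patient.sex",
    "diagnosis_code": "diagnosis.orpha_code",
    "diagnosis_date": "diagnosis.date",
    "first_symptoms_date": "history.first_symptoms_date",
}


def map_to_registry(source_record):
    """Map a flat source record to registry fields.

    Returns the mapped record plus the list of registry fields that
    could not be filled and need manual completion in the input form.
    """
    mapped, missing = {}, []
    for registry_field, source_key in FIELD_MAP.items():
        value = source_record.get(source_key)
        if value in (None, ""):
            missing.append(registry_field)
        else:
            mapped[registry_field] = value
    return mapped, missing


# Illustrative record; the values are made up.
record = {
    "patient.birth_year": 1980,
    "patient.sex": "F",
    "diagnosis.orpha_code": "ORPHA:0000",
    "diagnosis.date": "",
}
mapped, missing = map_to_registry(record)
```

Keeping the list of missing fields separate makes it straightforward to show a data manager which items still have to be entered by hand.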

T2.2 Co-creation of a standard template with Sciensano for Rare Diseases minimal dataset

This task is important to support the FOD/SPF strategy concerning data capabilities.

Once a patient suffering from a rare disease has been identified and validated in the hospital’s local REDCap platform, we want to maintain a registry that gathers any information that is relevant to the disease and that can be used for future activities supporting quality and research. To make the data available for public health purposes, a core set of data points can be distinguished. These data points make up a “minimal dataset” related to demographic information of the patient, the disease code, history of the disease (like the date of first symptoms and the date of diagnosis), the care path, information concerning research, etc.
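
As an illustration of how such data points could be represented, the sketch below models a small, assumed subset of the minimal dataset and derives one value that the two history dates make possible: the number of days between first symptoms and diagnosis. The class name and field names are hypothetical and do not reflect the actual template.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MinimalDatasetRecord:
    """Illustrative subset of the minimal dataset; field names are assumed."""
    pseudonym: str
    disease_code: str
    first_symptoms: Optional[date] = None
    diagnosis: Optional[date] = None

    def days_to_diagnosis(self) -> Optional[int]:
        """Days between first symptoms and diagnosis, if both are known."""
        if self.first_symptoms is None or self.diagnosis is None:
            return None
        return (self.diagnosis - self.first_symptoms).days


# Illustrative values only.
complete = MinimalDatasetRecord(
    "pseudo-1", "ORPHA:0000", date(2021, 3, 1), date(2021, 3, 31)
)
partial = MinimalDatasetRecord("pseudo-2", "ORPHA:0000")
```

Both dates are optional in this sketch because they may be unavailable in the patient file; the derived value is then simply left undefined rather than guessed.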

The minimal dataset on rare diseases will be embodied in a template that is elaborated in cooperation with Sciensano, which is responsible for the Central Registry of Rare Diseases. For this template, we also need to align with the “Set of Common Elements for Rare Diseases” of the EU Platform for Rare Diseases (EU RD Platform) [1]. This is an instrument aimed at increasing the interoperability of RD registries. It contains 16 data elements that are considered essential for research and should be registered by each rare disease registry across Europe.

A platform for ‘Electronic Data Capture’ (EDC) such as REDCap offers a flexible way to manage such a template. The EDC platform makes the rare disease template and other data instruments in the registry directly available to end users, with fields that are automatically filled in from the hospital EHR system where possible. Other fields are manually filled in. With REDCap, it would even be possible

The template helps standardize data capture in EDC systems across hospitals, even when different EHR systems are used. After data has been mapped from the EHR source databases to the standard template, it can be exchanged in a standard way with Sciensano/HealthData. Ideally, if the communication can be managed in both directions, we adhere to the “only once” principle, where a clinician does not need to provide the information more than once for a patient who has visited multiple hospitals.

T2.3 Adapt Rare Diseases backbone for deployment in other hospitals

TBD

T2.4 Make hospital REDCap platform available + setup REDCap project for Rare Diseases

TBD

T2.5 Workshops: operational flow of Rare Diseases pilot

TBD

T2.6 Implement data flows for Rare Diseases project, with data operations stack

TBD

Data will be gathered from the hospital’s relevant source systems using automated ETL pipelines that run on a regular basis (preferably daily) as well as on demand. In some cases, it is more convenient to extract from a data warehouse that is fed by the hospital’s primary systems.
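
A scheduled run and an on-demand run can share the same selection logic if the extraction window is a parameter. The sketch below assumes that each source record carries a `modified` timestamp; that field and both function names are hypothetical and do not describe the actual data operations stack.

```python
from datetime import datetime, timedelta


def extraction_window(run_time, hours=24):
    """Return the (start, end) interval covered by one run."""
    return run_time - timedelta(hours=hours), run_time


def select_new_records(records, run_time, hours=24):
    """Keep records whose `modified` timestamp falls in the window.

    The interval is half-open (start < modified <= end) so that two
    consecutive daily runs never pick up the same record twice.
    """
    start, end = extraction_window(run_time, hours)
    return [r for r in records if start < r["modified"] <= end]


# Illustrative records and run time.
run = datetime(2024, 6, 8, 2, 0)
records = [
    {"id": "a", "modified": datetime(2024, 6, 7, 10, 0)},
    {"id": "b", "modified": datetime(2024, 6, 6, 23, 0)},
    {"id": "c", "modified": datetime(2024, 6, 7, 2, 0)},
    {"id": "d", "modified": datetime(2024, 6, 8, 2, 0)},
]
daily = [r["id"] for r in select_new_records(records, run)]
on_demand = [r["id"] for r in select_new_records(records, run, hours=48)]
```

An on-demand run only needs a different `hours` value (or run time), which keeps the daily schedule and ad hoc reloads consistent with each other.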

Describe Data Operations Stack

T2.7 Test and evaluate Rare Diseases workflow

TBD

T2.8 Test options for interoperability of hospital data on Rare Diseases with Sciensano/HealthData

TBD

First, to address any concerns about an additional workload for physicians: I confirm that in SPECTRE-HD we still have a focus on rare disease identification from existing patient data, validation and an optimized data collection flow (whether it be for CRRD or for the hospital’s own purposes).

In other words, we have no engagement in SPECTRE-HD to routinely complete the CRRD template and share data with Sciensano. We do enable a new flow for a harmonized collection in hospitals of more and better quality data on rare diseases, while keeping an open line with all stakeholders to ensure that this flow is optimized for sharing data downstream.

That being said, within the scope & budget, it is expected that each partner allocates sufficient physician (or equally qualified) time to evaluate the flow. This is important from the point of view of usability and user experience - again, no focus on routine data collection, just for the test cases.

In the meantime, the rare diseases flow has evolved substantially, thanks to feedback that we got from delegates of the RD Functions such as you and UZ Leuven. Some of the changes are:

  • a data manager/coordinator is the first line for the new potential rare disease cases that are reported.
  • a Python API is made available to report new cases to the rare disease flow enabled in REDCap.
  • physicians can only see the patient records of cases that have been assigned to them for validation.
  • the REDCap registry is harmonized with the latest version of the CRRD template (that has a “minimal” set of mandatory items). This will lower the (technical) threshold for reporting new cases to Sciensano (beyond the scope of SPECTRE-HD).

To reiterate the above: when an attributed rare disease diagnosis gets a status of “definitive” or “provisional”, it’s up to the hospital to decide what other items for CRRD are registered. In the dashboard, we apply a simple color scheme with a marker that turns from red to green when the minimally required CRRD items are present. We can go deeper into this in an upcoming demonstration & workshop.

It is also time for a new SPECTRE-HD meeting to discuss status and next steps for all milestones. Can you please indicate which of the following moments are suitable: https://doodle.com/meeting/participate/id/bmMzmvAb

Thank you, Yves

From: THONNARD JOELLE Sent: Friday 7 June 2024 14:35 To: Verbiest, Annelies; Yves Thorrez Cc: Cras, Patrick_ext; Noëlla Pierlet Subject: RE: Data collection from Function RDs in CRRD

Dear all,

Indeed, I agree that it would be easier to split the CRRD aspect and the tools to be developed.

In the context of the tools to be developed and the Minimum Viable Product for mock-up data exchange, could we focus on demographic data and rare disease diagnosis + date, but not the other items included in the CRRD? Would this be ok?

Kind regards, Joëlle

Hi Yves,

Just to make sure: we will not engage ourselves in SPECTRE to complete a rare disease register based on the shared template, right?

I understood that in the scope of SPECTRE we would look for concepts that can be detected in an automated fashion. For the proposed template, I fully understand the relevance of the datapoints. But speaking from the clinical reality, collecting these outside the context of a labor-intensive register is currently not possible, not even with NLP. I would not even be able to do it myself through chart review, because of the data missingness and my lack of expertise for many of these diseases.

Sorry if I misunderstood something here, it’s tricky to jump into an ongoing project and I don’t mean to interrupt! 😊

Gr! Annelies

Annelies Verbiest MD PhD Medical oncologist Tel + 32 3 821 55 70

Medische Oncologie UZA, Drie Eikenstraat 655, 2650 Edegem Tel +32 3 821 30 00 / www.uza.be

Dear Joëlle Dear all

Thank you for the very clear and elaborate feedback from Saint-Luc. I’ve added my comments to the document.

@Annelies: true but the time from first symptoms to date of diagnosis is such an important indicator in rare diseases so it should be included.

Best regards Heini

Thank you Joelle!

Not sure whether this mail was an “fyi” or question for feed-back, as I’m not directly involved in the discussion, but very briefly:

  • Diagnosis & pt demographics: yes
  • Time of first symptoms, date of diagnosis and method of diagnosis: this will require a lot of manual work and it will need to be done by domain experts (in every possible disease domain). Even when done thoroughly, there will still be a lot of missing data.

Kind regards Annelies

Dear Katrien, dear all,

The current proposal is interesting, as it clarifies the parameter definitions and decreases the number of mandatory parameters. In light of information shared during the Sciensano Registry day, some further changes could help achieve a significant increase in reporting. Indeed, during the Sciensano Registry day on 17th May, the status of the CRRD was presented, showing that up to now, it includes only around 22.000 cases, representing a few percent of the cases of rare diseases in Belgium. However, as discussed during the meeting, it is essential that all patients are taken into account so that they can benefit from adequate patient care.

I am sending a proposal in the attached document. In summary, data related to “consultation” might be removed. For example, at Saint-Luc, cases are reported retrospectively, not linked to a specific visit, and by the centers/IMR rather than by the genetic center. On the other hand, allowing several pairs of “diagnosis + date and method of diagnosis” to be reported, if available, could be interesting for research questions. These data might then be linked with other databases, as facilitated by the Health Data Agency (HDA, https://catalog.hda.belgium.be/) (examples shown by the KCE during the registry meeting).

This proposal is indeed open for discussion.

Kind regards,

Joëlle

Dear all

I am contacting you because you are all involved (or will be involved) in the data collection in the CRRD. As you might know, we have been collecting registrations for patients with a rare disease from the genetic centres since 2017. Over the last months, we have been working to increase the data collection by also allowing registration of RD patients from your Function RD. A first step to achieve this was to change the DCD (Data Collection Definition) to also allow data collection from the Function, e.g. by making fields that are too specific to the data collection from the genetic centres ‘non-mandatory’.

I presented the DCD during the last working group meeting. You can find the technical sheet and the short summary of the DCD attached to this mail (please note that we at Sciensano are still testing the DCD, which might result in minor changes). A next step in the start-up of the data collection from the Functions is the harmonization of registrations between Functions, to make sure that the numbers of registered patients are comparable and that everyone is registering in the same manner.

For this I would like to fix a meeting with you to have a separate discussion with all genetic centres and Functions, as not everyone is working and organized in the same way. It is important to identify your differences and to know what is possible from your own genetic centre and Function.

In order to prepare for this meeting I would like to ask you to answer or prepare some aspects to be discussed during this meeting:

  • Will you still keep the data flow separate, with one flow from the genetic centre and one flow from the Function RD?
  • Do you prefer to work with REDCap or Healthdata for sending registrations to Sciensano?
  • Please provide me with an estimate of the number of patients that are expected to be sent to Sciensano on a yearly basis.
  • Are you thinking about identifying RD patients retrospectively and planning to send a registration for those patients to Sciensano?

Please contact me to fix a date for this discussion. I will send this mail to my contact list in both the genetic centre and the Function. If I have missed someone who is involved, please forward this mail.

We will also address the point on registration for patients with a predisposition.

Feel free to contact me if you have questions.

Best wishes Katrien

Katrien Van Der Kelen, PhD Scientist Epidemiology and public health Health services research T + 32 2 642 54 54
Skype: wiv-isp.katrien.vanderkelen


  1. During the Sciensano Registry day on 17th May 2024, the status of the CRRD was presented, showing that up to now, it includes only around 22.000 cases (“CRRD future - Registry Sciensano registration day CUSL feedback”)

  2. https://www.rarediseaseday.be/samenzeldzaam/

  3. https://github.com/intersystems/iknow/releases