Paul-Ehrlich-Institut

A Feasibility Study for the Risk Evaluation of COVID-19 Vaccines (RiCO) at the Population Level in Germany – Utilisation of Various Secondary Data Bodies for Pharmacovigilance and Further Research

Carrying out comprehensive research into the safety and mode of action of COVID-19 vaccines in Germany will require overcoming several data and methodological hurdles. The data linkage and the fundamental evaluability of the required health data will be tested as part of a feasibility study by the University Hospital Cologne, Ruhr University Bochum and the Paul-Ehrlich-Institut. This article describes the methodological approach and data flow utilised for the study as well as the different ways in which this data can be used.

Image: COVID-19 statistics and stethoscope (Source: ronstik/Shutterstock)

Background

The World Health Organization (WHO) characterised the COVID-19 outbreak as a pandemic on 11 March 2020. COVID-19 is caused by an infection with the SARS-CoV-2 virus, which originated in China and from there spread globally very quickly. The clinical picture covers a wide range of symptoms and conditions. The most commonly observed symptoms include cough, fever, runny nose, and loss of smell and taste. The disease progression varies greatly in symptoms and severity; infections can range from symptomless up to severe pneumonia followed by respiratory failure and death. COVID-19 can manifest itself in a variety of ways that affect other organ systems besides the lungs. Which systems are affected depends on factors such as the density of the ACE-2 receptors in the tissues that allow the virus to enter the cell. In addition to direct cytopathic (cell-altering) effects, excessive immune reactions and vascular dysfunction as a result of hypercoagulability (increased blood clotting) are observed.

COVID-19 vaccine candidates were developed under extreme time pressure all over the world in 2020 and tested in preclinical and clinical studies. This period of development also saw new approaches taken with adenovirus-based and mRNA vaccines.

All authorised COVID-19 vaccines have shown high efficacy against SARS-CoV-2 infections in phase I–III clinical trials. Due to the limitations of clinical trials (limited number of samples, very homogeneous study population due to strict inclusion and exclusion criteria), the safety of the novel vaccines also needed to be monitored particularly intensively after authorisation. In parallel with the vaccination campaigns, pharmacoepidemiological studies were therefore carried out in many countries, some at the population level, to investigate the safety of the various vaccines.

Vaccination Campaign in Germany

The COVID-19 vaccination campaign in Germany, which began on 27 December 2020, was designed in such a way that the majority of vaccinations were initially carried out in the newly established vaccination centres. Vaccinations in senior care and nursing homes (by mobile vaccination teams), hospitals, private companies, public health offices, and pharmacies followed. All vaccinations administered were reported electronically to the Robert Koch-Institut's (RKI) Digital Vaccine Monitoring system (Digitales Impfquotenmonitoring, DIM), whereby any personal data was pseudonymised via an algorithm specially developed by the Federal Printing Office (Bundesdruckerei). Registered doctors began administering COVID-19 vaccinations in April 2021. Case numbers were reported in a timely manner by the vaccinating practices to the National Association of Statutory Health Insurance Physicians (Kassenärztliche Bundesvereinigung, KBV) and on a staggered basis to the Association of Statutory Health Insurance Physicians (ASHIP, German: Kassenärztliche Vereinigung, KV) responsible for the federal state where the practice was located. Unlike for other vaccinations, case numbers were not forwarded to health insurance providers. The ASHIP data is currently being pseudonymised by the Bundesdruckerei (using the same algorithm as the DIM) as part of the RKI's vaccine surveillance and can be used in combination with the DIM data by the RKI and the Paul-Ehrlich-Institut. Since April 2023, the administration of COVID-19 vaccinations has been considered standard medical care and has been billed via the statutory health insurance providers; see the Association of Statutory Health Insurance Physicians Westfalen-Lippe (2024) as an example.

All steps in the data processing chain related to the recording and reporting of vaccinations in the first few years of the pandemic had to be redesigned and implemented for the vaccination campaign. These steps included the entire data flow from the various reporting bodies, pseudonymisation via the Bundesdruckerei, and the provision of the data to the RKI and the Paul-Ehrlich-Institut. The algorithm used by the Bundesdruckerei for pseudonymisation was also newly developed.

These extensive new developments and the structured distribution of the various data required for pharmacovigilance at the population level have made monitoring of the new COVID-19 vaccines very difficult in Germany in comparison to other countries. This is because, in addition to the vaccination data itself, data must be available for mapping vaccine-related outcomes (especially adverse events). In the German system, this is most likely to be achieved by using the routine data of the almost 100 statutory and about 40 private health insurance providers, which includes relevant hospital stays, emergency care, etc. Another challenge is the lack of a unique identifier for all residents, such as the social security number used in Europe's Nordic countries. In Germany, the lifelong individual health insurance number acts as a unique identifier only for the approximately 90 percent of health insurance policyholders who have statutory health insurance.

Population-based Pharmacoepidemiological Study

In 2020, a population-based pharmacoepidemiological study to identify potential risks of COVID-19 vaccines was initiated by the Paul-Ehrlich-Institut (PEI) and the PMV research group at the University Hospital Cologne, in cooperation with the Department of Medical Informatics, Biometry and Epidemiology at the Ruhr University Bochum and the Robert Koch-Institut (RKI). The study uses vaccination data (DIM, vaccinations by registered doctors) and routine statutory health insurance data. The individual data of vaccinated and unvaccinated persons are to be analysed within the framework of this secondary data-based observational study. In addition to investigating deaths at varying intervals from vaccination, the study focuses on medical conditions or adverse events for which there were and are risk indications either in the phase I–III clinical trials or in the course of the national and international vaccination campaign(s). Long-term courses of possible vaccine side effects/complications are also to be investigated, along with the safety of the vaccines in groups of people who were not included in clinical trials before authorisation or who were underrepresented in these studies, such as patients with autoimmune diseases.

It became clear in planning the implementation of this study that the challenges outlined above regarding harnessing data could have a potentially large impact on the quality of the results. These issues involve both the practical use of the data flow as well as the quality and reliability of the identifiers used for linking the data.

Therefore, a feasibility study will be carried out to assess the quality expected from a potential data linkage of the required data. The approach will be described later in this text.

RiCO Feasibility Study Objectives

The objectives of the feasibility study are to investigate the feasibility of the data flow and to determine various quality indicators for the joint use of COVID-19 vaccination data and routine statutory health insurance data for research into the safety of COVID-19 vaccines and into other issues, such as Long COVID.

The feasibility study will implement a data flow that has already been developed to combine and link DIM vaccination data, vaccination data from the outpatient sector (both sets taken from the first pandemic years before April 2023) and routine statutory health insurance data. The technical feasibility of the data linkage will first be tested using a reduced study population compared to the general population. Furthermore, the quality of the available data will be described and an estimate will be made of the proportion of different types of errors in the data linkage (due to the pseudonymisation methods used). Finally, recommendations for various data linkage procedures will also be developed based on the planned analyses.

Methodology

The study is based on two core methodological components, which are described in more detail below. One is the previously mentioned data flow, which is required to make the data usable, and the other is the statistical procedure for analysing the quality of the data and data linkage.

RiCO Data Flow

As described at the beginning of the article, three data sets are required for a pharmacovigilance study on the COVID-19 vaccinations. Two of those data sets (referred to as data set 2a and data set 2b in the figure below) include data on vaccination events from the first pandemic years. Data set 2a comes from the initial vaccination campaign in vaccination centres, care facilities, etc., and was collected via digital vaccination monitoring. Data set 2b comes from the initial vaccinations in the outpatient sector, before remuneration via the statutory health insurance providers under the fifth book of the Social Code, and was collected via the Associations of Statutory Health Insurance Physicians. Both data sets include information on the time of vaccination, the active substance used, the vaccination series (how many vaccinations), the reason for vaccination and other information, such as the postal code of the place of residence, as well as several pseudonymised identifiers. In addition, routine data from the statutory health insurance providers (data set 1) is required in order to operationalise both the observed pharmacovigilance endpoints (such as hospitalisations) and relevant influencing factors (such as morbidity burden, medication).

Figure: RiCO data flow – simplified representation (Source: Paul-Ehrlich-Institut)

The figure shows a simplified version of the data flow that is required to combine this data. The decisive role here is played by the different identifiers (name, date of birth, insurance number) in the data records, which make it possible to merge the data at the level of individual insured persons in the first place. These identifiers underwent a multi-stage pseudonymisation process to maintain data security.

The starting point and weakest link in the chain are the identifiers in the DIM vaccination data (data set 2a). These identifiers are based on the first and last name and date of birth of the vaccinated person, as recorded at the vaccination centre. They could be based on handwritten lists written by centre employees after verbal communication of the information by the person to be vaccinated, especially for vaccinations administered in the first weeks and months of the vaccination campaign.

Other methods include the electronic recording and reading of the health insurance card. The method of recording is not evident in the data generated. A pseudonym was generated from these three identifiers via concatenation and hashing. This process was carried out four times: first on the original information, then on a version with normalised spelling, and finally on two phonetic versions prepared with the aim of catching errors in the initial information. The resulting four pseudonyms were transmitted to the Bundesdruckerei, hashed again using a salt known only to the Bundesdruckerei (as a key), and forwarded as pseudonyms to the Robert Koch-Institut/Paul-Ehrlich-Institut.
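The two-stage pseudonymisation can be illustrated with a minimal Python sketch. It is not the Bundesdruckerei algorithm, whose details are not given here: the choice of SHA-256 and HMAC, the separator, the normalisation rules, and the two phonetic codes (a simplified Soundex and a consonant skeleton) are all assumptions made for illustration, as are all function names.

```python
import hashlib
import hmac
import unicodedata

# Letter-to-digit table of the classic Soundex code (illustrative choice only).
_SOUNDEX = {c: d for d, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items() for c in letters}


def normalise(text: str) -> str:
    """Lower-case, strip accents, and keep only ASCII letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text.lower().replace("ß", "ss"))
    return "".join(ch for ch in decomposed if ch.isalnum() and ch.isascii())


def soundex(name: str) -> str:
    """Simplified Soundex: first letter plus up to three digits."""
    letters = "".join(ch for ch in normalise(name) if ch.isalpha())
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "hw":  # h and w do not separate equal codes
            previous = digit
    return (code + "000")[:4]


def consonant_skeleton(name: str) -> str:
    """Second, cruder phonetic code: drop vowels and collapse doubled letters."""
    out = ""
    for ch in normalise(name):
        if ch.isalpha() and ch not in "aeiouy" and not out.endswith(ch):
            out += ch
    return out


def first_level_pseudonyms(first: str, last: str, dob: str) -> list[str]:
    """Concatenate and hash four variants: original, normalised, two phonetic."""
    variants = [
        (first, last),
        (normalise(first), normalise(last)),
        (soundex(first), soundex(last)),
        (consonant_skeleton(first), consonant_skeleton(last)),
    ]
    return [hashlib.sha256(f"{f}|{l}|{dob}".encode("utf-8")).hexdigest()
            for f, l in variants]


def second_level_pseudonym(pseudonym: str, secret_salt: bytes) -> str:
    """Re-hash with a secret held only by the trusted party (assumed: HMAC)."""
    return hmac.new(secret_salt, pseudonym.encode("ascii"),
                    hashlib.sha256).hexdigest()
```

In this sketch, a record entered as "Jörg Meyer" and one entered as "jorg Maier" with the same date of birth differ in their first two pseudonyms but agree in both phonetic ones, which is the kind of input error the phonetic variants are meant to absorb.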

The same identifiers and pseudonyms are used in the ASHIP vaccination data. According to the available information, this data included recordings of the first name, last name and date of birth – as is customary in the statutory health insurance system – predominantly carried out by swiping the health insurance card and, in individual cases, via the replacement procedure, which requires manual entry of personal data.

It can thus be assumed that the rate of typographical errors is significantly reduced compared to handwriting. In the ASHIP system, a fifth pseudonym (also as a hash value) is formed on the basis of the health insurance number (Krankenversichertennummer, KVNR). Since the KVNR is a unique identifier, the pseudonym formed from it is also to be regarded as unique.

All five identifiers or pseudonyms described can also be formed from the routine statutory health insurance data by passing them through the Bundesdruckerei process.

The vaccination data (2a and 2b) were first forwarded by the Bundesdruckerei to the Robert Koch-Institut and then by the latter to the Paul-Ehrlich-Institut. This corresponds to the procedure defined in the context of the vaccination campaign and the legal basis for vaccination monitoring. The routine data (data set 1) used for the purposes of the RiCO study will be forwarded by the Bundesdruckerei directly to the Paul-Ehrlich-Institut. All three records will be merged there and can be evaluated in a secure server environment.
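How the merged records might be assembled can be sketched as follows. This is a hypothetical illustration, not the procedure used at the Paul-Ehrlich-Institut: the record layout, the function name `link_vaccinations`, and the rule of leaving ambiguous records unlinked are assumptions.

```python
from collections import defaultdict


def link_vaccinations(insured, vaccinations):
    """Attach vaccination records to insured persons via shared pseudonyms.

    insured:      {person_id: set of pseudonyms formed from the routine data}
    vaccinations: list of dicts, each with a "pseudonyms" set plus payload
    Returns ({person_id: [records]}, [records that could not be linked]).
    """
    # Index every pseudonym to the insured persons who carry it.
    index = defaultdict(set)
    for person_id, pseudonyms in insured.items():
        for pseudonym in pseudonyms:
            index[pseudonym].add(person_id)

    linked = defaultdict(list)
    unlinked = []
    for record in vaccinations:
        candidates = set()
        for pseudonym in record["pseudonyms"]:
            candidates |= index.get(pseudonym, set())
        if len(candidates) == 1:
            linked[next(iter(candidates))].append(record)
        else:
            # No candidate, or several: leave unlinked rather than guess.
            unlinked.append(record)
    return dict(linked), unlinked
```

Leaving ambiguous records unlinked trades potential incorrect matches for missing matches; a real linkage could instead rank the pseudonyms, for example by preferring the KVNR-based one where it exists.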

Analysis of Data and Linkage Quality

Since the linkage of the vaccination data and the data from the health insurance companies cannot be carried out exclusively via one unique identifier per person, it is possible that errors may occur during the linkage. In this study, two types of such linkage errors are possible: incorrect matches and missing matches, whereby in the case of missing matches, a distinction must still be made between completely missing matches and partially missing matches.

Incorrect matches occur when the information from the health insurance data is assigned to the wrong person in the vaccination data due to an identical pseudonym. This can only happen if the information used to form the pseudonym is identical for both persons or if, due to one or more input errors, the information in the data appears to be identical.

Missing matches, on the other hand, occur if a person has received at least one COVID-19 vaccination, but it was not possible to assign this information to the person's health insurance data. This can happen, for example, if a person's name or date of birth was not correctly recorded in at least one of the two vaccination records, or if the person changed their name between vaccination and the analysis of the statutory health insurance data. A completely missing match means that the information on all the COVID-19 vaccinations that a person actually received is missing. If matches are only partially missing, one potential scenario is that the information for the second vaccination was linked, but the information for the first vaccination is missing.
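The error types defined above can be made concrete with a small sketch. It assumes a test setting in which the true vaccination history of each person is known, which is not the case in the real data; the function name and data layout are hypothetical.

```python
def classify_linkage(true_doses, linked_doses):
    """Label each person's linkage result.

    true_doses:   {person: set of dose ids the person actually received}
    linked_doses: {person: set of dose ids assigned to the person by linkage}
    """
    result = {}
    for person in true_doses.keys() | linked_doses.keys():
        truth = true_doses.get(person, set())
        linked = linked_doses.get(person, set())
        if linked - truth:
            # At least one dose belonging to someone else was assigned.
            result[person] = "incorrect match"
        elif linked == truth:
            result[person] = "correct"
        elif not linked:
            result[person] = "completely missing"
        else:
            result[person] = "partially missing"
    return result
```

For example, a person with two true doses of which only the second was linked is labelled "partially missing", while a person whose doses were all lost is labelled "completely missing".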

It can be assumed that missing matches occur significantly more often than incorrect matches. However, the exact extent of both problems is unclear. Therefore, one of the main goals of the feasibility study is to estimate how often such linkage errors occur in practice. Certain conclusions can be drawn on the basis of the ASHIP vaccination data and the pseudonyms generated from it using names and dates of birth, such as an estimate of the proportion of incorrect matches, since in this data set individuals can be distinguished based on the health insurance number pseudonym. The proportion of missing matches cannot be easily estimated. Various evaluations should be carried out in order to estimate this proportion. There are also multiple possible linkage procedures, such as taking the postal code into account when linking, which can reduce the proportion of incorrect matches. However, this comes at the price of a higher proportion of missing matches. Future use of the data should involve drawing up proposals as to which type of linkage (with vs. without postal code, which pseudonym) is best suited for which form of analysis.
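One way to approximate the share of potential incorrect matches in the ASHIP data is to count records whose name-based pseudonym is shared by more than one KVNR pseudonym. The sketch below assumes records given as (name pseudonym, KVNR pseudonym, postal code) tuples; the function name and this layout are hypothetical, and the study's actual evaluations may differ.

```python
from collections import defaultdict


def collision_share(records, use_postcode=False):
    """Share of records whose linkage key is carried by more than one person.

    records: iterable of (name_pseudonym, kvnr_pseudonym, postcode) tuples.
    The KVNR pseudonym is treated as the ground truth for "same person".
    """
    records = list(records)
    if not records:
        return 0.0

    def key(name_pseudonym, postcode):
        return (name_pseudonym, postcode) if use_postcode else (name_pseudonym,)

    # Collect the distinct persons behind every linkage key.
    persons_per_key = defaultdict(set)
    for name_pseudonym, kvnr_pseudonym, postcode in records:
        persons_per_key[key(name_pseudonym, postcode)].add(kvnr_pseudonym)

    ambiguous = sum(
        1 for name_pseudonym, _, postcode in records
        if len(persons_per_key[key(name_pseudonym, postcode)]) > 1
    )
    return ambiguous / len(records)
```

Calling the function once with `use_postcode=False` and once with `use_postcode=True` shows how adding the postal code reduces collisions. It does not show the corresponding cost: a person whose postal code differs between data sources would no longer be matched at all.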

The working group also carried out a simulation study to investigate the influence of different degrees of linkage errors on the analysis results. It was found that with proportions of missing matches that can be realistically expected (up to 20%), no significant systematic errors in the analysis of the vaccine side effects are to be expected if the self-controlled case series method is used for evaluation.
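The intuition behind this finding can be illustrated with a toy simulation. It is not the working group's simulation study; it assumes one event per case, a single 28-day risk window within a 365-day observation period, no age or seasonal effects, and only completely missing matches that occur at random. Partially missing matches, which would misclassify exposure periods, are not modelled. All names and parameter values are assumptions.

```python
import random


def simulate_relative_incidence(n_cases, true_ri, missing_share, seed=0,
                                observation_days=365, risk_days=28):
    """Estimate the relative incidence in a simplified self-controlled case series.

    Each vaccinated case has one event, which falls into the risk window with
    probability RI*Lr / (RI*Lr + Lc). Cases with a completely missing match
    look unvaccinated and therefore drop out of the vaccine-effect estimate.
    """
    rng = random.Random(seed)
    control_days = observation_days - risk_days
    p_risk = true_ri * risk_days / (true_ri * risk_days + control_days)

    in_risk = in_control = 0
    for _ in range(n_cases):
        event_in_risk = rng.random() < p_risk
        if rng.random() < missing_share:
            continue  # completely missing match: case is lost to the analysis
        if event_in_risk:
            in_risk += 1
        else:
            in_control += 1

    # With equal window lengths for all cases, the conditional maximum
    # likelihood estimate reduces to the ratio of the two event rates.
    return (in_risk / risk_days) / (in_control / control_days)


full = simulate_relative_incidence(20000, true_ri=3.0, missing_share=0.0)
reduced = simulate_relative_incidence(20000, true_ri=3.0, missing_share=0.2)
print(f"no missing matches: {full:.2f}, 20% missing: {reduced:.2f}")
```

Because each case serves as its own control, losing whole cases at random reduces precision but does not shift the estimate systematically in this simplified setting.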

Discussion

In countries such as Denmark or Sweden, pharmacoepidemiological studies on the new COVID-19 vaccines were carried out very promptly, and risks such as an increased risk of myocarditis in male adolescents after vaccination with the Pfizer BioNTech mRNA vaccine could be discovered. In Germany, by contrast, merging data from a wide variety of sources requires many steps. This will be done for the first time as part of the RiCO feasibility study.

Other European countries were able to evaluate the data so quickly partly because secondary data-based studies on potential vaccination safety signals have been carried out in those countries for decades. One factor that makes those studies possible is the use of a unique identifier, which a person receives upon first registration at birth or after immigration from another country. That identifier is retained for life and used in all databases. In Germany, there is no such primary identifier that is used for all residents and across all applications. Each individual data source has its own identifier, which makes linking databases much more difficult. Germany also has comparatively strict data protection laws, partly for historical reasons, which only allow the merging of data from different sources after thorough examination and under strict conditions. In addition, there was no existing central research data infrastructure at the beginning of the pandemic that could have been used for scientific purposes. Instead, an entirely new infrastructure with different data flows had to be established under the difficult conditions of the COVID-19 pandemic and specifically developed for possible pharmacoepidemiological investigations. This turned out to be a very lengthy and tedious process. In light of these events, the planned research data centre at the BfArM is a step in the right direction. The RiCO project teaches us that timely secondary data-based analyses of potential safety signals after vaccinations are only possible on the basis of pre-existing research data infrastructure.

The data flows established as part of the feasibility study and the results on the quality of the linkage of DIM, ASHIP, and statutory health insurance data will allow for the planning and initiation of new projects and the continued use of the data. As our working group's simulation study has shown, an evaluation using a self-controlled case series design is preferable when reasonably possible. The analyses that were originally planned are also possible.

Nevertheless, the simulation study shows that the data linkage described above meets high scientific requirements and that the desired analyses should be able to be carried out in principle, provided that a sufficiently large sample of statutory health insurance data can be used.

Long COVID is still a major problem worldwide, as are other secondary diseases of a SARS-CoV-2 infection, such as myocarditis. Risk factors, diagnosis, and treatment of Long COVID are currently the subject of intensive research. It is assumed that the COVID-19 vaccination plays a protective role against Long COVID, whereby some aspects remain unclear, such as the duration of the protection.

Conclusion

The results obtained as part of the feasibility study will make it possible in the future to carry out secondary data-based evaluations of risk factors and the courses of Long COVID and other diseases associated with SARS-CoV-2 infection in Germany as well, taking into account vaccination as a potential protective factor.

Updated: 27.09.2024