Policy EMA/0070 – part 2 Anonymisation of clinical reports for publication (II)

Options to establish data set anonymization

According to Opinion 05/2014 on anonymisation techniques,[1] two options are available to establish whether a dataset is anonymised:

1. Demonstrate that, after anonymisation, none of the following remains possible:

  • Singling out: the possibility of isolating some records of an individual in the dataset;[2]
  • Linkability: the ability to link at least two records concerning the same data subject or a group of data subjects (in the same database or in two different databases);
  • Inference: the possibility of deducing, with significant probability, the value of an attribute from the values of a set of other attributes.[3]
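Singling out and inference can be probed mechanically on a tabular extract. The following is a minimal sketch and not a method prescribed by the Opinion: the record layout, the column names, and the helper names `singled_out` and `inference_confidence` are hypothetical. It flags records that are unique on a chosen set of attributes (singling out) and reports, for each attribute combination, the share of the most common value of a sensitive attribute (a rough proxy for inference).

```python
from collections import Counter, defaultdict

def singled_out(records, attributes):
    """Return the records whose combination of attribute values is unique."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return [r for r, k in zip(records, keys) if counts[k] == 1]

def inference_confidence(records, attributes, sensitive):
    """For each attribute combination, return the share of its most
    common sensitive value (1.0 means the value can be deduced)."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in attributes)].append(r[sensitive])
    return {k: Counter(v).most_common(1)[0][1] / len(v)
            for k, v in groups.items()}

# Hypothetical sample data, for illustration only.
records = [
    {"age_band": "30-39", "sex": "F", "event": "nausea"},
    {"age_band": "30-39", "sex": "F", "event": "nausea"},
    {"age_band": "40-49", "sex": "M", "event": "rash"},
    {"age_band": "40-49", "sex": "M", "event": "headache"},
    {"age_band": "70-79", "sex": "M", "event": "syncope"},
]

unique = singled_out(records, ["age_band", "sex"])
conf = inference_confidence(records, ["age_band", "sex"], "event")
```

In this sample, the 70-79 male participant is singled out, and the two 30-39 female participants share the same event, so that event could be inferred for anyone known to be in that group.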

It is up to the company, taking due account of the ultimate purpose and use of the clinical reports, to decide which option to use (demonstrating that, after anonymisation, the three criteria of singling out, linkability and inference are fulfilled, or performing a risk assessment) and which anonymisation techniques to use in order to achieve adequate anonymisation while retaining a maximum of scientifically useful information.

2. Perform an analysis of re-identification risk.

Measuring the risk of re-identification involves selecting an appropriate metric, setting a suitable threshold, and then measuring the risk in the clinical data to be disclosed. The choice of metric depends on the context of the data release.

Setting an acceptable threshold encompasses:

  • the evaluation of the existing mitigation controls (none in the context of public disclosure)
  • the extent to which a particular disclosure would be an invasion of privacy to the trial participant
  • the motives and the capacity of the attacker to re-identify the data.

Once a threshold has been determined, the actual probability of re-identification can be measured.
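One common way to make these steps concrete is to group records by their quasi-identifier values and take the probability of re-identifying a record as 1 divided by the size of its group. The sketch below assumes that metric; the text above does not prescribe one. The function name, the column names, the sample records and the threshold value are all hypothetical and chosen for illustration only.

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) over groups of records that share
    the same quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    # Maximum risk: the smallest group determines the worst case.
    max_risk = 1 / min(sizes.values())
    # Average risk: mean of 1/group_size over all records,
    # which simplifies to (number of groups) / (number of records).
    avg_risk = len(sizes) / len(keys)
    return max_risk, avg_risk

# Hypothetical sample data, for illustration only.
records = [
    {"age_band": "30-39", "sex": "F", "country": "DE"},
    {"age_band": "30-39", "sex": "F", "country": "DE"},
    {"age_band": "30-39", "sex": "F", "country": "DE"},
    {"age_band": "40-49", "sex": "M", "country": "FR"},
    {"age_band": "40-49", "sex": "M", "country": "FR"},
    {"age_band": "70-79", "sex": "M", "country": "MT"},
]

THRESHOLD = 0.1  # assumed value for illustration, not taken from the guidance

max_risk, avg_risk = reidentification_risk(
    records, ["age_band", "sex", "country"])
acceptable = max_risk <= THRESHOLD
```

For public disclosure, where no mitigation controls exist, the maximum risk is the more conservative of the two figures, which is why the sketch compares it to the threshold. Here the single 70-79 record drives the maximum risk to 1.0, so further generalisation or redaction would be needed.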

MAHs/Applicants are encouraged to use quantitative methods to measure the risk of re-identification as soon as they are in a position to do so.

EMA recommendation to MAHs/Applicants on how to best achieve anonymisation of personal data of trial participants

There are several sections with data results in clinical reports that may contain personal data of trial participants. These include:

  • disposition (recruiting, pre-assignment, period/arms of the trial, etc.) of trial participants
  • protocol deviations
  • demographics
  • other baseline characteristics
  • treatment compliance
  • pharmacodynamics
  • pharmacokinetics
  • efficacy
  • safety (adverse events, laboratory findings, and vital signs).

In general, clinical overviews and clinical summaries do not contain personal data related to trial participants. An exception is the Narratives section of the Summary of Clinical Safety. In addition, some of the tables included in the clinical overviews and clinical summaries may also contain personal data.[4]

Anonymisation of direct and quasi identifiers

Clinical reports submitted to EMA mostly consist of pseudonymised, aggregated data, so direct identifiers are unlikely to be present in the reports. Nonetheless, any direct identifiers still present (e.g. name, email, phone number, signature and full address) should be redacted. Patient ID numbers (including randomisation/treatment numbers or safety case IDs) can be either redacted or re-coded (see the EMA guidance on anonymisation techniques).
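As an illustration of re-coding, the sketch below replaces each patient ID with a fresh random code that has no relationship to the original value. It is only one possible approach under stated assumptions: the function name `recode_ids`, the ID format and the code length are hypothetical, and the EMA guidance itself should be consulted for the actual requirements.

```python
import secrets

def recode_ids(patient_ids):
    """Map each distinct original ID to a unique random code.

    The mapping links codes back to original IDs, so it must be kept
    out of the published material (or destroyed).
    """
    mapping = {}
    used = set()
    for pid in patient_ids:
        if pid not in mapping:
            code = secrets.token_hex(4)  # 8 hexadecimal characters
            while code in used:          # avoid accidental collisions
                code = secrets.token_hex(4)
            used.add(code)
            mapping[pid] = code
    return mapping

# Hypothetical IDs; the same patient appears twice.
original_ids = ["1001-0007", "1001-0008", "1001-0007"]
mapping = recode_ids(original_ids)
recoded = [mapping[pid] for pid in original_ids]
```

Random codes are used here instead of a hash of the original ID because hashes of short, structured identifiers can often be reversed by enumeration.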

Quasi-identifiers are not always to be redacted. The need to redact quasi identifiers will depend on the following aspects:

  • Number of quasi identifiers per trial participant.
  • Frequency of trial participants with the same category/value on a set of quasi identifiers (group size).
  • Size of the population.

It is up to the applicant/MAH to decide which quasi identifiers need to be redacted and which can remain in the reports. The rationale for the decision should be included in the risk assessment section of the anonymisation report to be provided to EMA. The following shows how to proceed with, for instance, patient dates, geographical location and small populations / rare diseases:

  • Dates: individual patient dates can be offset or, alternatively, replaced with relative study days (the Relative Study Day method).[5]
  • Geographical location: can be aggregated or generalised, e.g. from country to region or continent.
  • Small populations and rare diseases: risk assessment is key to ensuring adequate anonymisation.
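The date and location techniques in the list above can be sketched as follows. This is an illustration under assumptions, not the method defined in the guidance or in the TransCelerate paper: the function names, the country-to-region table and the sample dates are hypothetical, and the day-numbering convention (the reference date is day 1 and there is no day 0) is assumed.

```python
from datetime import date, timedelta

def relative_study_day(event_date, reference_date):
    """Convert a calendar date to a study day relative to a reference
    date. Assumed convention: the reference date is day 1, the day
    before it is day -1, and there is no day 0."""
    delta = (event_date - reference_date).days
    return delta + 1 if delta >= 0 else delta

def offset_dates(dates, offset_days):
    """Shift all dates of one participant by the same number of days,
    which preserves the intervals between events."""
    return [d + timedelta(days=offset_days) for d in dates]

# Hypothetical lookup table; a real one would cover every trial country.
COUNTRY_TO_REGION = {"DE": "Europe", "FR": "Europe", "JP": "Asia"}

def generalise_country(country_code):
    """Replace a country code with a broader region."""
    return COUNTRY_TO_REGION.get(country_code, "Other")

reference = date(2015, 3, 10)                    # e.g. first dose
visits = [date(2015, 3, 8), date(2015, 3, 10), date(2015, 3, 24)]

study_days = [relative_study_day(v, reference) for v in visits]
shifted = offset_dates(visits, 17)               # per-participant offset
region = generalise_country("DE")
```

With these inputs the visits become study days -2, 1 and 15, the shifted dates keep the same 2-day and 14-day gaps as the originals, and "DE" is reported as "Europe".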

[1]  Opinion 05/2014 on anonymisation techniques of the Art. 29 WP analyses the effectiveness and limits of existing anonymisation techniques against the EU legal background of data protection and provides recommendations to handle these techniques by taking account of the residual risk of identification inherent in each of them.

[2] In the context of phase 1 of Policy 0070, the dataset is the set of clinical reports published by the Agency.

[3] An anonymisation solution that guards against these three risks is considered robust against identification performed by the most likely and reasonable means the data controller or any third party may employ, and will render the data anonymous.

[4]  Described in ICH M4E (R1), which states that “Narratives should not be included here, unless an abbreviated narrative of particular events is considered critical to the summary assessment of the drug.”

[5]  For additional information see TransCelerate BioPharma, Data De-identification and Anonymization of Individual Patient Data in Clinical Studies: A Model Approach, http://www.transceleratebiopharmainc.com/wp-content/uploads/2015/04/CDT-Data-Anonymization-Paper-FINAL.pdf