Policy EMA/0070 – part 2 Anonymisation of clinical reports for publication (I)

On August 20th, EMA issued guidance to the pharmaceutical industry on the anonymisation of clinical reports, in the context of phase 1 of the policy, i.e. the publication of clinical reports on the EMA website. The guidance, whose terms and indications were briefly anticipated during the EMA webinar of June 24th, aims to assist companies by recommending methods, techniques and processes that could be applied to clinical reports in order to achieve adequate anonymisation while retaining a maximum of scientifically useful information on medicinal products for the benefit of the public. Maxer synthesized the two sources and, as a result, produced this document on the state of the art of anonymisation.


MAHs/Applicants are responsible for submitting clinical reports that have been rendered anonymous for publication under Policy 0070, subject to the Terms of Use (ToU).

The data in the clinical reports must be processed in such a way that it can no longer be used to identify a natural person by using “all the means likely reasonably to be used” by either the controller or a third party, as described in Directive 95/46/EC.

Anonymization techniques

The same data can be adequately anonymised in different ways, depending on the context of the data release. In the case of public data release, the risk of re-identification needs to be very low, whereas for non-public data sharing a higher risk could be acceptable.

Anonymization is a field of active research and is evolving rapidly, and several techniques are available to MAHs/Applicants. The legislation is not prescriptive about the techniques to be used by data controllers.[1] According to the Article 29 Working Party Opinion, examples of techniques that could be applicable to clinical reports are:

  • Masking (the removal of values for variables which allow direct or indirect identification of an individual from the data) is likely to be used by MAHs/Applicants initially since pharmaceutical companies will have to anonymize their data retrospectively, i.e. after the clinical report has already been written. However, masking is more likely to decrease the clinical utility of the data compared to other techniques. Therefore, randomization and generalisation techniques are recommended in order to optimise the clinical usefulness of the information published.
  • Randomization is a family of techniques that alters the veracity of the data in order to remove the strong link between the data and the individual. Recommended techniques include noise addition and permutation. Noise addition can consist of, for example, shifting dates randomly by a few days (forward or backwards), based on a uniform, or another type of, distribution. Permutation may have limitations as regards clinical utility as relationships between attributes can be destroyed. Differential privacy may not be applicable in the context of Policy 0070 since the same documents will be made available to all users.
  • Generalisation consists of generalising, or diluting, the attributes of data by modifying the respective scale or order of magnitude. An example would be a trial participant who suffers from asthma, born on 19 August 1978. This date of birth would be generalised to 1978. Recommended generalisation techniques include aggregation and k-anonymity. L-diversity and t-closeness may not be recommended as they limit inferences significantly. Aggregation involves the replacement of a value by a range, e.g. a trial participant's age is replaced with an age range (age of 56 replaced with a range of 50 to 60). K-anonymity goes a step further by preventing a trial participant from being singled out, since the participant is grouped with, at least, k other trial participants in that range.
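
The randomization and generalisation techniques listed above can be illustrated with a minimal Python sketch. It is not taken from the EMA guidance: the function names, the 3-day shift window, the 10-year band width and the choice of quasi-identifiers are all assumptions made for illustration. The k-anonymity check uses the common convention that every group of records sharing the same quasi-identifier values must contain at least k records.

```python
import random
from collections import Counter
from datetime import date, timedelta


def shift_date(d, max_days=3, rng=random):
    """Noise addition: shift a date by a uniform random offset
    in [-max_days, +max_days] (window size is an assumption)."""
    return d + timedelta(days=rng.randint(-max_days, max_days))


def generalise_birth_date(d):
    """Generalisation: keep only the year of birth."""
    return d.year


def age_band(age, width=10):
    """Aggregation: replace an exact age with a (low, high) range."""
    low = (age // width) * width
    return (low, low + width)


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    values for all quasi-identifiers (0 if there are no records)."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values(), default=0)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier group holds at least k records
    (common convention, assumed here)."""
    return smallest_group(records, quasi_identifiers) >= k


# Hypothetical participants, invented for this example only.
participants = [
    {"age": 56, "sex": "F"},
    {"age": 52, "sex": "F"},
    {"age": 59, "sex": "F"},
    {"age": 41, "sex": "M"},
    {"age": 47, "sex": "M"},
]

# Replace exact ages with 10-year bands before checking group sizes.
generalised = [
    {"age_band": age_band(p["age"]), "sex": p["sex"]} for p in participants
]

print(generalise_birth_date(date(1978, 8, 19)))                # 1978
print(age_band(56))                                            # (50, 60)
print(is_k_anonymous(generalised, ["age_band", "sex"], k=2))   # True
print(is_k_anonymous(generalised, ["age_band", "sex"], k=3))   # False
```

In this sketch, the three women fall into the 50 to 60 band and the two men into the 40 to 50 band, so the smallest group has two records and the data set passes the check for k=2 but not for k=3. A real implementation would also need to decide which variables count as quasi-identifiers, and whether a single random offset should be applied to all dates of the same participant so that intervals between events are preserved.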

This and other information is available in the EMA guidance on anonymization techniques (Section 6.2.2).

– To be continued –

[1] Opinion 05/2014 on anonymisation techniques of the Art. 29 WP analyses the effectiveness and limits of existing anonymization techniques against the EU legal background of data protection and provides recommendations to handle these techniques by taking account of the residual risk of identification inherent in each of them.