Table of Contents



What is Secondary Use of Health Data?

What is the Government Considering?

What is De-Identification?

Problems with De-identification

What is Anonymisation?

What is Re-Identification?

How Easy is Re-identification?

What does the USA say?

What does the UK say?

What else is going on in the UK?


Links to information on secondary use of health data, de-identification and re-identification


The term “Secondary Use” is a polite and non-threatening way of saying “Use of personal, private, health data that was gathered to support an individual’s health care, in ways it was not originally intended”.

There is no doubt that the analysis of large amounts of good quality health data can reveal interesting and valuable insights.

However, the risks to the privacy, safety and wellbeing of patients are significant.

It is important to realise that the health data in health records exists to support the healthcare of specific patients. Patients are usually quite willing for the people they consult to make notes and keep records about their health and treatment.

However, that consent may or may not extend to transferring patient health data to secondary systems like My Health Record. In fact, the recent changes to the legislation that permits My Health Record mean that it is no longer a requirement that patients give their consent for their health data to be collected and stored.

This was done so that the government can register people for a My Health Record without asking their permission. The government’s trials of “opt-out” registration did just that, and they open the way for the government to give all Australians a My Health Record, which can then start to accumulate health data automatically.

When it comes to using a patient’s health data for purposes for which it was not intended, the law allows this provided the patient has given their consent. Given that patients have never been asked for their consent, that condition would seem difficult to meet.

Even if consent had been a requirement, it would only have covered collection for the provision of health care.

We do not yet know how the government intends to address this consent problem.

This webpage discusses the subject of the use of personal, private, health data that was gathered to support an individual’s health care, in ways it was not originally intended.

What is Secondary Use of Health Data?


From an American Medical Informatics Association White Paper.

“Secondary use of health data applies personal health information (PHI) for uses outside of direct health care delivery. It includes such activities as analysis, research, quality and safety measurement, public health, payment, provider certification or accreditation, marketing, and other business applications, including strictly commercial activities.

Secondary use of health data can enhance health care experiences for individuals, expand knowledge about disease and appropriate treatments, strengthen understanding about effectiveness and efficiency of health care systems, support public health and security goals, and aid businesses in meeting customers’ needs. Yet, complex ethical, political, technical, and social issues surround the secondary use of health data.

While not new, these issues play increasingly critical and complex roles given current public and private sector activities not only expanding health data volume, but also improving access to data. Lack of coherent policies and standard “good practices” for secondary use of health data impedes efforts to strengthen the U.S. health care system.

The nation requires a framework for the secondary use of health data with a robust infrastructure of policies, standards, and best practices. Such a framework can guide and facilitate widespread collection, storage, aggregation, linkage, and transmission of health data. The framework will provide appropriate protections for legitimate secondary use.”

Toward a national framework for the secondary use of health data

Challenges for secondary use of clinical data

EHR data does not automatically lead to knowledge

  • Data quality and accuracy is not a top priority for busy clinicians (de Lusignan, 2005)
  • There is “tension” between structured and narrative documentation (Rosenbloom, 2011)
  • Many data idiosyncrasies (Weiner, 2011)
  • “Left censoring”: First instance of disease in record may not be when first manifested
  • “Right censoring”: Data source may not cover long enough time interval
  • Data might not be captured from other clinical (other hospitals or health systems) or non‐clinical (OTC drugs) settings
  • Bias in testing or treatment
  • Institutional or personal variation in practice or documentation styles
  • Inconsistent use of coding or standards
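As a rough illustration of the two censoring problems listed above (this sketch is not from the presentation), the following Python checks whether a patient's record in a single data source might be left or right censored. The `RecordWindow` type, its field names and the two thresholds are hypothetical choices made only for this example; real studies would pick them to suit the condition being studied.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RecordWindow:
    """Hypothetical summary of one patient's data in a single source."""
    coverage_start: date   # first date the source holds any data for the patient
    coverage_end: date     # last date the source holds any data for the patient
    first_diagnosis: date  # first time the condition appears in the record


def censoring_flags(rec, grace_days=90, min_followup_days=365):
    """Flag records where censoring is plausible (thresholds are assumptions)."""
    # Possible left censoring: the diagnosis shows up almost as soon as
    # coverage begins, so the disease may have started before the record did.
    left = (rec.first_diagnosis - rec.coverage_start).days < grace_days
    # Possible right censoring: too little follow-up after the diagnosis
    # for later outcomes to have been captured.
    right = (rec.coverage_end - rec.first_diagnosis).days < min_followup_days
    return {"possible_left_censoring": left, "possible_right_censoring": right}


short_record = RecordWindow(date(2015, 1, 1), date(2015, 6, 1), date(2015, 1, 20))
long_record = RecordWindow(date(2010, 1, 1), date(2016, 1, 1), date(2012, 6, 1))
print(censoring_flags(short_record))
print(censoring_flags(long_record))
```

A flag here only means that censoring is possible. It cannot show when the disease actually began or what happened after the record ends.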

Data in EHRs is incomplete

  • Claims data failed to identify more than half of patients with prognostically important cardiac conditions prior to admission for catheterization (Jollis, 1993)
  • Various approaches generated variable rate of retrieval of cases for quality measurement (Benin, 2005; Rhodes, 2007; Parsons, 2012); algorithmic methods can lead to improvement (Benin, 2011)
  • At Columbia University Medical Center, 48.9% of patients with ICD‐9 code for pancreatic cancers did not have corresponding disease documentation in pathology reports, with many data elements incompletely documented (Botsis, 2010)

Secondary Use of Clinical Data from Electronic Health Records
Presentation by William Hersh
Citations are included in the presentation


The use of health data for scientific research purposes is quite justifiable, provided it is done safely and with respect for the individuals whose data is being used.

The issue is not “should valuable health data be used for secondary purposes?” but “how should it be done?”.

With the advent of Big Data and new and better techniques for data analysis and data linking, both the value of secondary use and the risks of using the data have increased enormously.

Australia is probably well behind the US in this area, and a lot can be learned from US efforts and experiences. Many of the quotes on this page are from published material that originated in the US.

We should only take a different approach if we, or our environment, are different. If nothing else, it’s a good place to start.


What is the Government Considering?

My Health Record Data sharing

“In June 2016 HealthConsult was engaged to assist the Australian Government in developing a framework for the secondary use of data in the My Health Record system for research, policy, system use, quality improvement and evaluation activities.”

About This project

“Under the My Health Records Act 2012 (the Act), health information in a My Health Record may be collected, used and disclosed “for any purpose” with the consent of the healthcare recipient. One of the functions of the System Operator is “to prepare and provide de-identified data for research and public health purposes.”

Before these provisions of the Act will be implemented, a framework for secondary use of My Health Record system data must be established.

HealthConsult’s role is to develop a draft Framework and associated draft Implementation Plan that will facilitate the secondary uses of My Health Record system data.

Project time frames

This project commenced in late June 2016 and is expected to be completed in February 2017.

The public consultation process will begin in late August/early September and will conclude in early November.”


In addition, the Australian Bureau of Statistics has this on its website:

Retention of names and addresses collected in the 2016 Census of Population and Housing

The Australian Bureau of Statistics has decided to retain names and addresses collected in the 2016 Census of Population and Housing in order to enable a richer and dynamic statistical picture of Australia through the combination of Census data with other survey and administrative data.

Whilst the Census has always been valuable in its own right, when used in combination with other data the Census can provide even greater insight. Some examples are:

  • The combination of Census data and education data can provide insight into employment outcomes from the various educational pathways available to Australians, and
  • The combination of Census data and health data can help improve Australia’s understanding and support of people who require mental health services and assist with the design of better programs of support and prevention.

The retention of addresses will also support the ABS Address Register enabling more efficient survey operations, reducing the cost to taxpayers and the burden on Australian households.

This decision has been informed by public submissions, public testing and the conduct of a Privacy Impact Assessment.

Retention of names and addresses collected in the 2016 Census of Population and Housing
Australian Bureau of Statistics website. First published 18 December 2015, last updated 14 April 2016

This gives the strong impression that health data from My Health Record, and potentially other data stores, will be used for a variety of purposes. What these purposes are, and what their limits are, is not clear.

And the Prime Minister has even played the terrorism card:

“Prime Minister Malcolm Turnbull has indicated the Federal Government may want access to the mental health files of individuals suspected of terrorist activity.”

ABC News

Other data sharing initiatives:

PM&C Public Sector Data Management

This is a link to an internal study commissioned by the Secretary of the Department of the Prime Minister and Cabinet on how the Australian Public Service manages public sector data.

And here is an Implementation Report, which “outlines the outputs and initiatives that have been undertaken to date to meet the recommendations of the report and further the public data agenda.”

The biggest risk of sharing de-identified data is that it may be re-identified. The following section is the only mention of re-identification in the internal study.

Think differently about legislation

Significant gains can be made in the short-term by educating staff on how to interpret legislation to share and make better use of data.

This requires a change in mindset for staff to look for ways to make data available within the law. Health is seeking to link extracts of PBS and Medicare Benefits Schedule (MBS) data, to be shared with the States.

Publishing guidelines can also reduce risk aversion. It provides support for staff to take a pragmatic interpretation of the law. For example, the Privacy Guidelines published by the Information Commissioner have enabled staff to release data where it was previously not released due to uncertainty.

The guidelines could indicate that the term ‘reasonably identifiable’ should take into account:

  • who will have access to information;
  • what information they have; and
  • the likelihood of being able to re-identify information.

The Implementation Report does not contain any mention of the term re-identification.
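To give a feel for what “the likelihood of being able to re-identify information” can mean in practice, here is a minimal Python sketch. It counts how many records in a de-identified dataset are unique on a handful of indirect identifiers. The records, the field names and the choice of postcode, birth year and sex as quasi-identifiers are all made up for illustration, and uniqueness is only a crude proxy for re-identification risk, not an official measure.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return Counter(keys)


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination appears only once."""
    counts = group_sizes(records, quasi_identifiers)
    unique = sum(1 for size in counts.values() if size == 1)
    return unique / len(records)


# Made-up "de-identified" records: names removed, indirect identifiers kept.
records = [
    {"postcode": "2600", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2600", "birth_year": 1970, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "2601", "birth_year": 1985, "sex": "M", "diagnosis": "depression"},
    {"postcode": "2602", "birth_year": 1942, "sex": "F", "diagnosis": "cancer"},
]
quasi = ["postcode", "birth_year", "sex"]

print(unique_fraction(records, quasi))            # share of records that stand alone
print(min(group_sizes(records, quasi).values()))  # size of the smallest group
```

A record that is the only one with its combination of values is the easiest to match against another dataset that holds the same fields alongside a name. This is why who has access, and what other information they hold, matters as much as the removal of names.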

The reference to the Department of Health sharing linked PBS and MBS data with the states could well be related to this media release: Govt releases billion-line ‘de-identified’ health dataset