The Observatory of Anonymity
De-identifying data is really hard, and it only gets harder over time. Say the NHS releases prescribing data: date, doctor, prescription, and a random identifier. That’s a super-useful data-set for medical research.
And say that the next year, Addison-Lee or another large minicab company suffers a breach (no human language contains the phrase “as secure as minicab IT”) that exposes many of those patients’ journeys to the appointments where the prescriptions were written.
Merge those two data-sets and you re-identify many of the patients in the data. Subsequent releases and breaches compound the problem, and there’s nothing the NHS can do to either predict or prevent a breach by a minicab company.
Even if the NHS is confident in its anonymization, it can never be confident in the sturdiness of that anonymity over time.
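To make the merge concrete, here is a minimal sketch of that kind of linkage in Python. Everything in it is invented for illustration: the records, the field names, the `surgery_of` lookup (which assumes an attacker can find out where each doctor practises) and the `link` helper are hypothetical, not a description of any real NHS release or minicab breach.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: date, doctor, prescription, random ID.
prescriptions = [
    {"date": "2021-03-01", "doctor": "Dr A", "drug": "drug-x", "patient_id": "p-7f3a"},
    {"date": "2021-03-01", "doctor": "Dr B", "drug": "drug-y", "patient_id": "p-19c2"},
    {"date": "2021-03-02", "doctor": "Dr A", "drug": "drug-z", "patient_id": "p-88e0"},
]

# Assumption: the attacker can look up where each doctor practises.
surgery_of = {"Dr A": "1 High St", "Dr B": "9 Mill Ln"}

# Hypothetical breached journey log: date, passenger name, drop-off address.
journeys = [
    {"date": "2021-03-01", "passenger": "Alice Example", "dropoff": "1 High St"},
    {"date": "2021-03-01", "passenger": "Bob Example", "dropoff": "9 Mill Ln"},
    {"date": "2021-03-02", "passenger": "Carol Example", "dropoff": "5 Park Rd"},
]


def link(prescriptions, journeys, surgery_of):
    """Map each random patient ID to the passengers who were dropped at
    the prescribing doctor's surgery on the prescription date."""
    passengers_at = defaultdict(set)
    for j in journeys:
        passengers_at[(j["date"], j["dropoff"])].add(j["passenger"])

    candidates = {}
    for p in prescriptions:
        key = (p["date"], surgery_of[p["doctor"]])
        candidates[p["patient_id"]] = sorted(passengers_at.get(key, ()))
    return candidates


matches = link(prescriptions, journeys, surgery_of)
for patient_id, names in matches.items():
    print(patient_id, names)
```

In this toy data, two of the three random identifiers end up with exactly one candidate name. Real data would be messier, with many patients per doctor per day, so a single merge would mostly produce lists of candidates. But each repeat visit and each further breach shrinks those lists, which is how the problem compounds.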