Privacy Rule
What counts as PHI under HIPAA?
TL;DR
PHI is individually identifiable health information held or transmitted by a covered entity or business associate in any form or medium, excluding employment records and certain education records; once identifiers are removed following HIPAA’s Safe Harbor or expert determination methods, data is no longer PHI subject to the Privacy Rule.
Understand Protected Health Information (PHI), the 18 identifiers, limited data sets, and the Safe Harbor method for de-identification—with regulatory citations.
Protected Health Information (PHI) is the atomic unit of HIPAA’s Privacy Rule obligations. If information is PHI, a broad set of permitted-use rules, minimum necessary expectations, individual rights, and breach notification triggers apply. If information is properly de-identified under HIPAA, it is no longer PHI for those purposes—though ethics, contracts, and state law may still impose duties.
45 CFR §160.103 defines PHI as individually identifiable health information held or transmitted by a covered entity or business associate in any form or medium, with specific exclusions (such as certain education records governed by FERPA and employment records held by a covered entity in its capacity as an employer).
The two components: health information + identifiability
PHI requires health information—for example diagnoses, lab values, clinical notes, prescriptions, payment information tied to care, or enrollment data—that is also identifiable to an individual. The regulatory detail lives in how “identifiable” is operationalized through the identifier lists and de-identification standards in 45 CFR §164.514.
Tip
Product and analytics teams should map data elements, not just database table names. A field labeled “user_id” may still be PHI if it can be linked to clinical facts and an individual.
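As a rough illustration of mapping data elements rather than table names, the sketch below tags each field and flags any dataset where an identifier and a health fact coexist. The `Field` class, the `may_contain_phi` helper, and the example schema are hypothetical names chosen for this sketch, and the flag is a conservative triage signal, not a legal determination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    is_identifier: bool   # can point to a person directly or through linkage
    is_health_info: bool  # clinical, payment, or enrollment fact


def may_contain_phi(fields):
    """Flag a dataset when identifiers and health information coexist."""
    has_identifier = any(f.is_identifier for f in fields)
    has_health = any(f.is_health_info for f in fields)
    return has_identifier and has_health


# Hypothetical event schema, for illustration only.
visit_events = [
    Field("user_id", is_identifier=True, is_health_info=False),
    Field("icd10_code", is_identifier=False, is_health_info=True),
    Field("page_load_ms", is_identifier=False, is_health_info=False),
]

print(may_contain_phi(visit_events))
```

Tagging at the field level means a table with an innocuous name still gets reviewed when a new column introduces a clinical fact next to an existing identifier.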
The 18 identifiers (Safe Harbor context)
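HIPAA’s Safe Harbor method (45 CFR §164.514(b)(2)) has two requirements: the 18 categories of identifiers must be removed for the individual and for the individual’s relatives, employers, and household members, and the covered entity must have no actual knowledge that the remaining information could be used, alone or in combination, to identify the individual. The list is broader than many engineers expect: it includes device identifiers, IP addresses, and URLs, not only “name + MRN.”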
Dates tied to an individual (other than year) are also sensitive. That is why birth dates, admission dates, and death dates often require transformation or suppression before a dataset can qualify as de-identified under Safe Harbor.
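A minimal sketch of the date handling described above, assuming dates are available as Python `date` objects. It keeps only the year of an event date and, because Safe Harbor also aggregates ages over 89 into a single category, suppresses the birth year when it would reveal such an age. The function names are hypothetical, and the sketch covers dates and ages only, not the other identifier categories.

```python
from datetime import date


def age_on(birth: date, as_of: date) -> int:
    """Age in completed years on the reference date."""
    years = as_of.year - birth.year
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1
    return years


def generalize_birth_date(birth: date, as_of: date) -> dict:
    """Keep the birth year only; hide it when the age is over 89."""
    age = age_on(birth, as_of)
    if age > 89:
        return {"birth_year": None, "age": "90+"}
    return {"birth_year": birth.year, "age": str(age)}


def generalize_event_date(event: date) -> int:
    """Reduce an admission, discharge, or death date to its year."""
    return event.year
```

For example, `generalize_event_date(date(2023, 3, 9))` returns `2023`, and a person born in 1930 evaluated in 2024 falls into the `"90+"` category with no birth year retained.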
Expert determination pathway
Organizations with rich datasets sometimes rely on a qualified expert who applies statistical and scientific methods to determine that the risk of re-identification is very small. This path can preserve more utility than Safe Harbor but requires rigorous documentation and ongoing monitoring when the data or external data landscape changes.
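To give a feel for the kind of statistical screening an expert might run, the sketch below counts how many records share each combination of quasi-identifier values and reports the smallest group. This is a k-anonymity-style measure swapped in for illustration; it is not the regulatory test, and the field names and the `smallest_group_size` helper are hypothetical. The actual determination rests on the expert's judgment and documentation.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the rarest combination of quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values()) if keys else 0


# Hypothetical rows, for illustration only.
rows = [
    {"birth_year": 1980, "zip3": "021", "sex": "F"},
    {"birth_year": 1980, "zip3": "021", "sex": "F"},
    {"birth_year": 1975, "zip3": "021", "sex": "M"},
]

print(smallest_group_size(rows, ["birth_year", "zip3", "sex"]))
```

A group of size one means at least one record is unique on those fields, which is a signal to generalize or suppress values before the dataset is reassessed.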
Limited data sets and data use agreements
A limited data set is PHI that is not fully de-identified. It may include dates and geography such as city, state, and ZIP code, but it must exclude the direct identifiers listed in 45 CFR §164.514(e). It may be used or disclosed only for research, public health, or health care operations, and only under a data use agreement that restricts who can use or disclose the information and prohibits the recipient from re-identifying the data or contacting the individuals.
Limited data sets remain PHI. They reduce exposure but do not eliminate Privacy Rule obligations the way compliant de-identification does.
Operational mistakes that create PHI risk
Teams sometimes assume that “internal analytics sandboxes” are exempt. If the sandbox contains identifiable health information derived from the covered entity or business associate, it is still PHI—even if not production-facing.
Another common gap is log enrichment: application logs that capture clinical codes with user identifiers or device tokens can quietly become PHI repositories without retention or access controls.
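One possible mitigation for the log enrichment gap is to redact identifier fields before a record is written. The sketch below uses the standard library's `logging.Filter`; the `RedactIdentifiers` class and the `key=value` patterns are assumptions made for illustration, and pattern-based redaction will miss identifiers that appear in other shapes.

```python
import logging
import re

# Hypothetical patterns; a real deployment would maintain its own list.
REDACTIONS = [
    (re.compile(r"\buser_id=\S+"), "user_id=[REDACTED]"),
    (re.compile(r"\bdevice_token=\S+"), "device_token=[REDACTED]"),
]


class RedactIdentifiers(logging.Filter):
    """Rewrite the formatted message so identifier values are masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # applies %-style args first
        for pattern, replacement in REDACTIONS:
            message = pattern.sub(replacement, message)
        record.msg = message
        record.args = ()  # args are already merged into the message
        return True       # keep the record, now redacted


handler = logging.StreamHandler()
handler.addFilter(RedactIdentifiers())
```

Attaching the filter to the handler rather than to a single logger means records propagated from child loggers are redacted too. Redaction reduces what lands in the logs; retention limits and access controls are still needed for whatever remains.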
Connecting PHI governance to Security Rule controls
Once data is classified as ePHI, Security Rule safeguards apply: access controls, auditability, integrity protections, and transmission security. The Security Rule covers electronic PHI only; PHI in paper or oral form is still subject to the Privacy Rule’s requirement for appropriate safeguards. Privacy Rule policies then govern who may use the data for treatment, payment, healthcare operations, or pursuant to authorization.
Minimum necessary and PHI flows
Even when a use is generally permitted, the Privacy Rule’s minimum necessary standard usually requires reasonable efforts to limit PHI to the amount needed to accomplish the intended purpose; disclosures to a provider for treatment are among the exceptions. Operationalizing minimum necessary means tying data fields to roles: billing teams see financial identifiers that clinicians have no need to export, and product analytics should not ingest full clinical narratives when aggregated metrics suffice.
45 CFR §164.502(b) frames minimum necessary expectations; your policies should name who decides necessity for non-routine disclosures and how those decisions are documented.
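Tying fields to roles can be sketched as a simple allow-list projection. The roles, field names, and `minimum_necessary_view` helper below are hypothetical; a real policy would come from privacy governance rather than a hard-coded dictionary.

```python
# Hypothetical role-to-field policy, for illustration only.
ROLE_FIELDS = {
    "billing": {"patient_id", "insurance_member_id", "claim_amount"},
    "product_analytics": {"visit_month", "department"},
}


def minimum_necessary_view(record: dict, role: str) -> dict:
    """Return only the fields the role's policy allows.

    Roles without a policy entry receive an empty view (deny by default).
    """
    allowed = ROLE_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "patient_id": "p-001",
    "insurance_member_id": "m-77",
    "claim_amount": 120.0,
    "clinical_note": "free-text narrative",
    "visit_month": "2024-03",
    "department": "cardiology",
}

print(minimum_necessary_view(record, "product_analytics"))
```

Defaulting unknown roles to an empty view keeps a missing policy entry from silently exposing every field.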
Research, public health, and special disclosure pathways
PHI can be shared for research or public health under specific regulatory pathways that may not require individual authorization. Those pathways still impose conditions—such as IRB or privacy board approval, data use agreements, or statistical de-identification—that must be reflected in contracts and technical controls.
Teams building “data collaboration” features should involve privacy counsel early. A feature that feels like simple record linkage can inadvertently create re-identification risk or unauthorized redisclosure if downstream recipients are not bound by appropriate assurances.
Checklist for product and data teams
- Maintain a data dictionary listing tables or events that can contain identifiers + clinical facts.
- Classify environments (prod, staging, analytics) and prohibit PHI in lower environments unless controls match production.
- Review new identifiers introduced by integrations (device serials, wearables tokens, advertising IDs).
- Map retention: PHI should not live in caches or backups longer than necessary for the purpose.
- Train engineers on Safe Harbor vs limited data set vs production PHI—confusion here drives accidental exposure.
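The environment item in the checklist can be expressed as a guard that runs before a dataset is loaded. The sketch below is one possible shape; the `Env` enum, the `CONTROLS_MATCH_PROD` flags, and the `may_load` function are hypothetical, and the flags would need to reflect an actual review of each environment's controls.

```python
from enum import Enum


class Env(Enum):
    PROD = "prod"
    STAGING = "staging"
    ANALYTICS = "analytics"


# Hypothetical record of whether each environment's controls match production.
CONTROLS_MATCH_PROD = {
    Env.PROD: True,
    Env.STAGING: False,
    Env.ANALYTICS: True,
}


def may_load(dataset_contains_phi: bool, env: Env) -> bool:
    """Allow PHI only where controls are recorded as matching production."""
    if not dataset_contains_phi:
        return True
    return CONTROLS_MATCH_PROD.get(env, False)


print(may_load(True, Env.STAGING))
```

An environment missing from the table is treated as not matching production, so new environments are closed to PHI until someone reviews them.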
Sources & citations
- 45 CFR §160.103 — Definitions (PHI)
- 45 CFR §164.514 — Requirements for de-identification
- HHS guidance on de-identification
All content verified against official HHS guidance and the Code of Federal Regulations.
Frequently asked questions
What is the difference between PHI and ePHI?
Are employee health records PHI?
What are the 18 HIPAA identifiers?
What is a limited data set?
If we remove names, is the data automatically de-identified?
Not legal advice. medcomply.ai provides compliance intelligence for educational and operational planning. Consult qualified counsel for legal interpretation.