Privacy Rule

What counts as PHI under HIPAA?

TL;DR

PHI is individually identifiable health information held or transmitted by a covered entity or business associate in any form or medium, excluding employment records and certain education records; once identifiers are removed following HIPAA’s Safe Harbor or expert determination methods, data is no longer PHI subject to the Privacy Rule.

Understand Protected Health Information (PHI), the 18 identifiers, limited data sets, and the Safe Harbor method for de-identification—with regulatory citations.

medcomply.ai editorial team | Published April 10, 2026 | Updated April 10, 2026 | 5 min read

Protected Health Information (PHI) is the atomic unit of HIPAA’s Privacy Rule obligations. If information is PHI, a broad set of permitted-use rules, minimum necessary expectations, individual rights, and breach notification triggers apply. If information is properly de-identified under HIPAA, it is no longer PHI for those purposes—though ethics, contracts, and state law may still impose duties.

45 CFR §160.103 defines PHI as individually identifiable health information held or transmitted by a covered entity or business associate in any form or medium, with specific exclusions (such as certain education records governed by FERPA and employment records held by a covered entity in its capacity as an employer).

The two components: health information + identifiability

PHI requires health information—for example diagnoses, lab values, clinical notes, prescriptions, payment information tied to care, or enrollment data—that is also identifiable to an individual. The regulatory detail lives in how “identifiable” is operationalized through the identifier lists and de-identification standards in 45 CFR §164.514.

Tip

Product and analytics teams should map data elements, not just database table names. A field labeled “user_id” may still be PHI if it can be linked to clinical facts and an individual.

The 18 identifiers (Safe Harbor context)

HIPAA’s Safe Harbor method requires removal of 18 categories of identifiers and no actual knowledge that residual information could be used—alone or in combination—to identify the individual. The list is broader than many engineers expect: it includes device identifiers, IP addresses, and URLs, not only “name + MRN.”

Dates tied to an individual (other than year) are also sensitive. That is why birth dates, admission dates, and death dates often require transformation or suppression before a dataset can qualify as de-identified under Safe Harbor.
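As a rough illustration of the date handling described above, the sketch below reduces dates to the year and collapses ages over 89 into a single "90+" category, which Safe Harbor also calls for. The record layout and the names `generalize_date`, `generalize_age`, and `scrub_dates` are hypothetical, and the sketch covers only the date and age category, not the full identifier list.

```python
from datetime import date

MAX_REPORTABLE_AGE = 89  # ages above this are aggregated into one category


def generalize_date(value: date) -> int:
    """Keep only the year of a date tied to an individual."""
    return value.year


def age_on(birth_date: date, as_of: date) -> int:
    """Age in whole years on the as_of date."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def generalize_age(age: int) -> str:
    """Report the age as-is up to 89, otherwise as the aggregate '90+'."""
    return "90+" if age > MAX_REPORTABLE_AGE else str(age)


def scrub_dates(record: dict, as_of: date) -> dict:
    """Return a new record with the (hypothetical) date fields generalized.

    The birth year is suppressed when it would reveal an age over 89.
    """
    age = age_on(record["birth_date"], as_of)
    birth_year = None if age > MAX_REPORTABLE_AGE else generalize_date(record["birth_date"])
    return {
        "birth_year": birth_year,
        "age": generalize_age(age),
        "admission_year": generalize_date(record["admission_date"]),
    }


if __name__ == "__main__":
    sample = {"birth_date": date(1980, 6, 15), "admission_date": date(2023, 11, 30)}
    print(scrub_dates(sample, as_of=date(2024, 6, 15)))
```

Generalizing dates this way addresses one identifier category only; a dataset still needs the remaining categories removed, and the holder must have no actual knowledge that what is left could identify someone.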

Expert determination pathway

Organizations with rich datasets sometimes rely on a qualified expert who applies statistical and scientific methods to determine that the risk of re-identification is very small. This path can preserve more utility than Safe Harbor but requires rigorous documentation and ongoing monitoring when the data or external data landscape changes.

Limited data sets and data use agreements

A limited data set excludes direct identifiers but is not fully de-identified. It may include dates and city-level geography, for example. It may be used or disclosed only for research, public health, or healthcare operations, and only under a data use agreement that restricts who can use or disclose the information and prohibits the recipient from re-identifying the data or contacting the individuals.

Limited data sets remain PHI. They reduce exposure but do not eliminate Privacy Rule obligations the way compliant de-identification does.

Operational mistakes that create PHI risk

Teams sometimes assume that “internal analytics sandboxes” are exempt. If the sandbox contains identifiable health information derived from the covered entity or business associate, it is still PHI—even if not production-facing.

Another common gap is log enrichment: application logs that capture clinical codes with user identifiers or device tokens can quietly become PHI repositories without retention or access controls.
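One partial mitigation for the log enrichment problem is to redact identifier-like strings before log records are written. The sketch below uses Python's standard `logging.Filter`; the three regular expressions, the `redact` helper, the `RedactingFilter` class, and the logger name are illustrative assumptions. Pattern matching of this kind will miss many identifiers (for example, internal user IDs or device tokens), so it complements retention and access controls rather than replacing them.

```python
import logging
import re

# Illustrative patterns only; real logs will contain identifiers these miss.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def redact(text: str) -> str:
    """Replace each matched pattern with a labeled placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED-{label}]", text)
    return text


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message before any handler sees it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already formatted
        return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("app.hypothetical")
    logger.addFilter(RedactingFilter())
    logger.info("lookup by %s from %s", "jane@example.com", "10.0.0.7")
```

A filter attached to a logger only sees records created through that logger, not through its descendants; attaching the filter to the handlers instead covers every record those handlers emit.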

Connecting PHI governance to Security Rule controls

Once data is classified as ePHI (PHI in electronic form), Security Rule safeguards apply: access controls, auditability, integrity protections, and transmission security. PHI in any medium remains subject to Privacy Rule policies, which govern who may use the data for treatment, payment, healthcare operations, or pursuant to authorization.

Minimum necessary and PHI flows

Even when a use is generally permitted, the Privacy Rule’s minimum necessary standard often requires reasonable efforts to limit PHI to the amount reasonably necessary to accomplish the intended purpose. Operationalizing minimum necessary means tying data fields to roles: billing teams see the financial identifiers they need, clinicians do not need to export those identifiers, and product analytics should not ingest full clinical narratives when aggregated metrics suffice.

45 CFR §164.502(b) frames minimum necessary expectations; your policies should name who decides necessity for non-routine disclosures and how those decisions are documented.
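Tying fields to roles can be sketched as a per-role allowlist applied before a record leaves the system of record. The role names, field names, and the `project_for_role` helper below are hypothetical; which fields are actually necessary for a given role is a policy decision, not something the code can determine.

```python
# Hypothetical allowlists: each role sees only the fields its purpose needs.
ROLE_FIELDS = {
    "billing": {"patient_id", "insurance_member_id", "procedure_code", "balance_due"},
    "clinician": {"patient_id", "diagnosis", "medications", "clinical_note"},
    "product_analytics": {"visit_month", "procedure_code"},
}


def project_for_role(record: dict, role: str) -> dict:
    """Return only the fields the given role is allowed to receive.

    Unknown roles are rejected rather than given a default view.
    """
    if role not in ROLE_FIELDS:
        raise PermissionError(f"no field policy defined for role {role!r}")
    allowed = ROLE_FIELDS[role]
    return {key: value for key, value in record.items() if key in allowed}


if __name__ == "__main__":
    record = {
        "patient_id": "p-001",
        "insurance_member_id": "m-77",
        "procedure_code": "99213",
        "balance_due": 40.0,
        "diagnosis": "E11.9",
        "clinical_note": "free text",
        "visit_month": "2024-03",
    }
    print(project_for_role(record, "product_analytics"))
```

Failing closed on an unknown role keeps a missing policy from silently turning into full access.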

Research, public health, and special disclosure pathways

PHI can be shared for research or public health under specific regulatory pathways that may not require individual authorization. Those pathways still impose conditions—such as IRB or privacy board approval, data use agreements, or statistical de-identification—that must be reflected in contracts and technical controls.

Teams building “data collaboration” features should involve privacy counsel early. A feature that feels like simple record linkage can inadvertently create re-identification risk or unauthorized redisclosure if downstream recipients are not bound by appropriate assurances.

Checklist for product and data teams

  1. Maintain a data dictionary listing tables or events that can contain identifiers + clinical facts.
  2. Classify environments (prod, staging, analytics) and prohibit PHI in lower environments unless controls match production.
  3. Review new identifiers introduced by integrations (device serials, wearables tokens, advertising IDs).
  4. Map retention: PHI should not live in caches or backups longer than necessary for the purpose.
  5. Train engineers on Safe Harbor vs limited data set vs production PHI—confusion here drives accidental exposure.
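The first checklist item can be started with a small script over the data dictionary. The sketch below assumes a hypothetical dictionary format in which each column has already been tagged by a person as an identifier, a clinical fact, or neither; the `Column` type and `tables_needing_phi_review` function are illustrative names, not an established tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    is_identifier: bool = False  # e.g. user ID, device serial, advertising ID
    is_clinical: bool = False    # e.g. diagnosis code, lab value


def tables_needing_phi_review(dictionary: dict[str, list[Column]]) -> list[str]:
    """List tables or events that hold identifiers and clinical facts together."""
    flagged = []
    for table, columns in dictionary.items():
        has_identifier = any(col.is_identifier for col in columns)
        has_clinical = any(col.is_clinical for col in columns)
        if has_identifier and has_clinical:
            flagged.append(table)
    return sorted(flagged)


if __name__ == "__main__":
    data_dictionary = {
        "visits": [Column("user_id", is_identifier=True), Column("dx_code", is_clinical=True)],
        "feature_flags": [Column("flag_name")],
        "devices": [Column("device_serial", is_identifier=True)],
    }
    print(tables_needing_phi_review(data_dictionary))
```

This flags co-location only. A table that holds just identifiers, such as `devices` in the example, can still be PHI if it can be joined to clinical facts stored elsewhere, so linkage across tables needs its own review.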

Sources & citations

  • 45 CFR §160.103, Definitions (PHI)
  • 45 CFR §164.514, Requirements for de-identification
  • HHS guidance on de-identification

All content verified against official HHS guidance and the Code of Federal Regulations.

Frequently asked questions

What is the difference between PHI and ePHI?
PHI is the broader category of protected health information in any medium. Electronic PHI (ePHI) is PHI maintained or transmitted in electronic form and is the primary focus of the HIPAA Security Rule’s technical safeguards.
Are employee health records PHI?
Employment records held by a covered entity in its role as employer are generally excluded from the PHI definition, but health information created or received in healthcare operations or benefits administration may still be PHI depending on context and holder.
What are the 18 HIPAA identifiers?
They include names, geographic subdivisions smaller than a state, dates directly related to an individual (except year), telephone and fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate and license numbers, vehicle identifiers, device identifiers, URLs, IP addresses, biometric identifiers, full-face photos, and any other unique identifying number, characteristic, or code.
What is a limited data set?
A limited data set is PHI that may contain certain dates and geographic information but excludes direct identifiers like name and street address. It may be used or disclosed for research, public health, or healthcare operations with a data use agreement containing required safeguards.
If we remove names, is the data automatically de-identified?
Not necessarily. Removing only names may leave other identifiers that still permit identification. HIPAA recognizes two compliant paths to de-identification: Safe Harbor (remove all 18 categories of identifiers and have no actual knowledge that the remaining information could identify the individual) and expert determination.

Not legal advice. medcomply.ai provides compliance intelligence for educational and operational planning. Consult qualified counsel for legal interpretation.