Every CAPI purchase event carries a block of user identifiers that Meta expects to arrive pre-hashed. Most implementations I audit do the hashing correctly. A meaningful number also log the raw values at some point along the pipeline. That is the leak.
If you are shipping server-side Meta CAPI and you want the Purchase-event PII hashing pipeline to be tight, this tutorial walks through the six-step sequence I used during a Q2 2024 DTC rebuild. No raw email ever lands in Datadog. No phone number ends up in an observability span. The hashed payload and nothing else.
What a leak actually looks like
A leak is not a hack. It is an accident. Someone adds a console.log(order) during debugging, the log ships to production, and suddenly the entire order object (including email, phone, billing_address) lives in your log aggregator for the retention window.
A more subtle leak: an error handler interpolates JSON.stringify(payload) into the error message, which means any Sentry event from that handler carries the raw PII inside the error body. Sentry's default PII scrubber does not know about CAPI payloads.
The structural fix is to route all PII through a single normalization-and-hash utility, and to never log the raw input at any stage of the pipeline. The hashed values are fine to log. The raw values are not.
Default view: only the SHA-256 output is visible. Raw and normalized inputs stay redacted from any log sink.
Prerequisites
Three things to have in place before this tutorial helps you.
A working server-side CAPI handler. If you do not have one, the field guide to Meta CAPI covers the build from scratch. This tutorial assumes you are enriching an existing handler.
A single place where all CAPI events pass through. If you have three handlers each building their own payload, consolidate them first. The hashing pipeline is only useful if every event goes through it.
Log redaction configured on your log aggregator for common PII keys (email, phone, address). This is a belt-and-suspenders move against accidental logs.
Step 1: Define the fields Meta expects
Meta's CAPI reference lists the user_data fields with their expected format and hash requirements. The commonly-used fields for a Purchase event:
| Field | Normalized format | Hashed (SHA-256)? |
|---|---|---|
| em (email) | lowercase, trimmed | yes |
| ph (phone) | digits only with country code, no "+" or symbols | yes |
| external_id | string (customer id) | yes |
| ct, st, zp, country | lowercase, alphanumeric only; country as two-letter ISO code | yes |
| fn, ln (names) | lowercase, trimmed, punctuation removed | yes |
| fbp, fbc | opaque cookie values, as-is | no (already opaque) |
| client_ip_address | IPv4 or IPv6, as-is | no |
| client_user_agent | raw UA string | no |
The fields split into two groups. Hashed fields are PII-sensitive and must be pre-hashed before transmission. Non-hashed fields (cookies, IP, user agent) are transmitted raw; Meta still treats them as sensitive on the server side, but the contract says they must not be hashed.
Step 2: Build a normalizer per field type
Normalize first, hash second. Normalization that happens after hashing is not normalization, it is garbage.
// src/lib/capi/normalize.ts
export function normalizeEmail(raw: string | null | undefined): string | undefined {
if (!raw) return undefined;
const n = raw.trim().toLowerCase();
return n.includes("@") ? n : undefined;
}
export function normalizePhone(
raw: string | null | undefined,
defaultCountry = "1",
): string | undefined {
if (!raw) return undefined;
// Meta's contract wants digits only with country code: strip every non-digit, including "+".
const digits = raw.replace(/\D/g, "");
if (!digits) return undefined;
if (digits.length === 10) return `${defaultCountry}${digits}`;
return digits;
}
export function normalizeAlphaNum(raw: string | null | undefined): string | undefined {
if (!raw) return undefined;
const n = raw.trim().toLowerCase().replace(/[^a-z0-9]/g, "");
return n || undefined;
}
export function normalizeName(raw: string | null | undefined): string | undefined {
if (!raw) return undefined;
const n = raw.trim().toLowerCase().replace(/[^a-zà-ÿ\-']/g, "");
return n || undefined;
}
export function normalizeId(raw: string | number | null | undefined): string | undefined {
if (raw == null || raw === "") return undefined;
return String(raw).trim() || undefined;
}
Keep the normalizers pure. No side effects. No logging. No error handling beyond returning undefined for unusable input. This file will be read, audited, and reused on every handler; its behavior must be boring and predictable.
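Normalize-first is not just style: hashing an un-normalized value produces a digest Meta can never match. A standalone sketch (the email normalizer re-inlined so it runs on its own) makes the point concrete:

```typescript
import { createHash } from "node:crypto";

// Inline copy of the email normalizer so this demo runs standalone.
const normalizeEmail = (raw: string): string => raw.trim().toLowerCase();
const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

const raw = "  Jane.Doe@Example.COM ";
const normalized = normalizeEmail(raw); // "jane.doe@example.com"

// Hash-then-normalize produces a digest Meta can never match;
// normalize-then-hash produces the same digest Meta computes on its side.
console.log(sha256(raw) === sha256(normalized)); // false
console.log(sha256(normalized) === sha256("jane.doe@example.com")); // true
```

Same person, same email, two unrelated digests. That gap is what bad normalization silently costs you in match quality.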
Step 3: Wrap hashing in a single utility
One entry point. Every field goes through it. Any PR that adds a new field to the payload must route through this utility.
// src/lib/capi/hash.ts
import { createHash } from "node:crypto";
import {
normalizeEmail,
normalizePhone,
normalizeAlphaNum,
normalizeName,
normalizeId,
} from "./normalize";
export type HashableField =
| "email"
| "phone"
| "id"
| "name"
| "alphaNum";
const NORMALIZERS = {
email: normalizeEmail,
phone: normalizePhone,
id: normalizeId,
name: normalizeName,
alphaNum: normalizeAlphaNum,
} as const;
export function hashField(kind: HashableField, raw: unknown): string | undefined {
const normalizer = NORMALIZERS[kind] as (v: unknown) => string | undefined;
const normalized = normalizer(raw);
if (!normalized) return undefined;
return createHash("sha256").update(normalized).digest("hex");
}
The utility does three things. It routes any input through the matching normalizer. It returns undefined if the input is unusable. It never touches a logger, a span, or an error sink. The normalized value lives in memory for the SHA-256 call and then falls out of scope.
Step 4: Compose the user_data block in one place
Every call site that builds a CAPI payload imports a single function that composes user_data. This is where the hashing rules are enforced structurally.
// src/lib/capi/user-data.ts
import { hashField } from "./hash";
type UserDataInput = {
email?: string | null;
phone?: string | null;
customerId?: string | number | null;
firstName?: string | null;
lastName?: string | null;
city?: string | null;
state?: string | null;
zip?: string | null;
country?: string | null;
fbp?: string | null;
fbc?: string | null;
clientIp?: string | null;
clientUserAgent?: string | null;
};
export function buildUserData(input: UserDataInput) {
return {
em: hashField("email", input.email),
ph: hashField("phone", input.phone),
external_id: hashField("id", input.customerId),
fn: hashField("name", input.firstName),
ln: hashField("name", input.lastName),
ct: hashField("alphaNum", input.city),
st: hashField("alphaNum", input.state),
zp: hashField("alphaNum", input.zip?.split("-")[0]), // US ZIP+4 -> first 5
country: hashField("alphaNum", input.country),
fbp: input.fbp ?? undefined,
fbc: input.fbc ?? undefined,
client_ip_address: input.clientIp ?? undefined,
client_user_agent: input.clientUserAgent ?? undefined,
};
}
Every field that is supposed to be hashed routes through hashField. Every field that is not (cookies, IP, UA) passes through without transformation. This is a deliberately small, inspectable function; the only way to add a new hashed field is to add a line here.
Step 5: Never log the raw input
The log hygiene rule is one sentence. The input object to buildUserData must not be logged at any point in the request lifecycle.
Enforce this structurally in three places.
First, in your handler, destructure the raw order into the buildUserData input and do not hold a reference to the larger order object in scope during any log statement:
export async function POST(req: Request) {
const order = await req.json();
const userData = buildUserData({
email: order.email ?? order.customer?.email,
phone: order.phone ?? order.billing_address?.phone,
customerId: order.customer?.id,
firstName: order.billing_address?.first_name,
lastName: order.billing_address?.last_name,
city: order.billing_address?.city,
state: order.billing_address?.province_code,
zip: order.billing_address?.zip,
country: order.billing_address?.country_code,
fbp: getCookie(req, "_fbp"),
fbc: getCookie(req, "_fbc"),
clientIp: req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? undefined,
clientUserAgent: req.headers.get("user-agent") ?? undefined,
});
const event = {
event_name: "Purchase",
event_id: hashEventId(order.id, "Purchase"),
event_time: Math.floor(Date.now() / 1000),
action_source: "website",
user_data: userData,
custom_data: buildCustomData(order),
};
// Log only the hashed payload, never `order` or `userData` raw inputs.
logger.info("capi.purchase.fired", {
event_id: event.event_id,
event_name: event.event_name,
user_data_fields_present: Object.keys(userData).filter(
(k) => userData[k as keyof typeof userData] !== undefined,
),
});
await sendToMeta(event);
return new Response("ok");
}
The log carries which fields were populated, not the values. That is plenty for operational observability.
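The handler above calls hashEventId, which lives elsewhere in the codebase. A hypothetical minimal sketch of the guarantee it needs to provide, deterministic output, so the browser pixel and the server event emit the same id and Meta deduplicates the pair:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of the hashEventId helper referenced in the handler:
// a deterministic event_id derived from the order id and event name. Any
// stable derivation works; the only requirement is that browser and server
// compute the identical id for the same order.
function hashEventId(orderId: string | number, eventName: string): string {
  return createHash("sha256").update(`${orderId}:${eventName}`).digest("hex");
}

console.log(hashEventId(1001, "Purchase") === hashEventId(1001, "Purchase")); // true
console.log(hashEventId(1001, "Purchase") === hashEventId(1002, "Purchase")); // false
```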
Second, configure your log aggregator to redact common PII keys at ingestion. Datadog, Logflare, and most providers support this. It is a last-line defense against a console.log(order) that slipped through review.
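Ingestion-time redaction syntax is vendor-specific, but you can add the same belt in application code before anything reaches the logger. A minimal sketch; the PII_KEYS list is an assumption, extend it to match your order schema:

```typescript
// Application-side scrubber: redacts known PII keys from any object before it
// reaches a log sink. This complements, not replaces, ingestion-time redaction.
const PII_KEYS = new Set(["email", "phone", "billing_address", "first_name", "last_name"]);

function scrub(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(scrub);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) =>
        PII_KEYS.has(k) ? [k, "[REDACTED]"] : [k, scrub(v)],
      ),
    );
  }
  return value;
}

console.log(JSON.stringify(scrub({ id: 7, email: "x@y.com", customer: { phone: "555" } })));
// {"id":7,"email":"[REDACTED]","customer":{"phone":"[REDACTED]"}}
```

Wrap your logger so every metadata object passes through scrub, and a stray console.log-style dump degrades into redacted noise instead of a leak.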
Third, make your error handler PII-aware. If the handler catches an error, log the error message and the event_id only, never the raw order object.
try {
await sendToMeta(event);
} catch (err) {
logger.error("capi.purchase.failed", {
event_id: event.event_id,
error_message: err instanceof Error ? err.message : "unknown",
});
// do not log `event`, `order`, or `userData`
throw err;
}
Step 6: Test the pipeline end-to-end
Three tests pin the pipeline against regression.
A hashing test. Given a known raw input, the utility produces a known SHA-256 output. This catches any accidental change to normalization rules.
// hash.test.ts
describe("hashField", () => {
it("produces a stable hash for a normalized email", () => {
const h = hashField("email", " Michael@Example.com ");
expect(h).toBe(
"38a47f4d6d8a58df1e8e2f3c1b2a9d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c"
.slice(0, 64), // replace with real expected value from your environment
);
});
it("returns undefined for unusable input", () => {
expect(hashField("email", "")).toBeUndefined();
expect(hashField("email", null)).toBeUndefined();
expect(hashField("phone", "abc")).toBeUndefined();
});
});
A log-leak test. A property-based test fires a synthetic order through the handler, captures all log output via a test logger, and asserts that no log line contains the raw email, phone, or address. This catches regression if someone adds a debug log during a PR.
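A minimal version of that harness, with handlePurchase as a stand-in for your real handler; the final assertion is the one that matters:

```typescript
// Sketch of the log-leak test: route all log output through a capturing
// test logger, run the handler with a canary value, then assert the canary
// never appears in any captured line.
const lines: string[] = [];
const testLogger = {
  info: (msg: string, meta?: object) => lines.push(`${msg} ${JSON.stringify(meta ?? {})}`),
};

// Stand-in handler: logs field presence, never values (the Step 5 rule).
function handlePurchase(order: { email: string }, logger: typeof testLogger) {
  logger.info("capi.purchase.fired", { user_data_fields_present: ["em"] });
}

handlePurchase({ email: "leak-canary@example.com" }, testLogger);
console.log(lines.join("\n").includes("leak-canary@example.com")); // false
```

In the real suite, feed the canary through every input field (email, phone, names, address) and run it against the actual handler, not a stand-in.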
A contract test. Fire a synthetic event against Meta Test Events with a test_event_code, then fetch the event from Meta's API and assert that user_data contains all expected hashed fields. This catches drift between your payload shape and Meta's expected shape, which is the failure mode the CAPI payload mismatch postmortem was written for.
Common mistakes
Four leaks I see often.
Passing the raw order object to buildUserData. Looks convenient, lets you pass everything through, but the function signature no longer constrains what gets hashed. Keep the input type narrow and explicit.
Hashing with different normalizations in different files. You have utils/hash.ts and lib/crypto.ts and the Shopify webhook handler uses one while the checkout handler uses the other. Match quality drifts between event types because the normalization diverged. One file, one entry point.
Logging the order object in middleware. A logging middleware that snapshots the request body runs before your handler, captures the raw order, and ships it to logs before buildUserData even runs. Exclude CAPI handler paths from request-body logging middleware.
Trusting Sentry's default PII scrubber. Sentry scrubs fields it recognizes. It will not scrub custom nested payloads or JSON-stringified blobs. Configure Sentry's beforeSend hook to drop any extra or contexts entry that looks like an order or a CAPI payload.
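A hedged sketch of such a beforeSend hook. The ORDER_KEYS list and the blob heuristic are assumptions to tune against your payloads; wire the function into Sentry.init({ beforeSend }):

```typescript
// Drops order-shaped blobs from `extra` and `contexts` before the event
// leaves the process. Typed structurally so the sketch runs standalone.
type SentryLikeEvent = {
  extra?: Record<string, unknown>;
  contexts?: Record<string, unknown>;
};

const ORDER_KEYS = ["email", "phone", "billing_address", "user_data"];

function looksLikeOrder(v: unknown): boolean {
  if (!v || typeof v !== "object") return false;
  const json = JSON.stringify(v);
  return ORDER_KEYS.some((k) => json.includes(`"${k}"`));
}

function beforeSend(event: SentryLikeEvent): SentryLikeEvent {
  for (const scope of [event.extra, event.contexts]) {
    if (!scope) continue;
    for (const key of Object.keys(scope)) {
      if (looksLikeOrder(scope[key])) delete scope[key];
    }
  }
  return event;
}

const scrubbed = beforeSend({ extra: { payload: { email: "a@b.c" }, requestId: "r-1" } });
console.log(JSON.stringify(scrubbed)); // {"extra":{"requestId":"r-1"}}
```

Dropping the whole entry is deliberate: a redacted-but-present order blob still invites someone to widen the allowlist later, while a missing one forces the fix upstream.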
What to try next
Wire the same pipeline into every event type, not just Purchase. ViewContent, AddToCart, and InitiateCheckout all need the same user_data enrichment. If your match quality is strong on Purchase and weak on ViewContent, the discrepancy is almost always that the browse-event handlers build their own payload inline instead of routing through buildUserData.
If you want a third-party scan that checks exactly this class of issue (raw PII in logs, normalization drift, missing fields per event type), the CAPI Leak Report covers 14 checks including PII hygiene, and it flags the specific files where raw inputs reach a log sink. It is the scan I wrote after seeing this failure mode on the tracking gap engagement.
For the broader match quality story (why these fields matter to the ad algorithm), the match quality tutorial walks through how each field contributes to the score.
FAQ
Can I hash on the client and send hashed values to my server?
Technically yes, but it makes normalization harder to enforce and the client becomes a PII-handling environment that is harder to audit. Prefer server-side hashing. Send raw PII over HTTPS to your own endpoint (the customer already gave you their email at checkout), hash there, forward to Meta.
Does Meta accept pre-hashed fields from any source, or do I need to sign the hash?
Meta accepts the raw SHA-256 hex digest. No signing. No HMAC. The hash itself is the identifier: Meta computes the same digest from the PII it holds on file and matches on equality.
What about GDPR? Is hashed PII still personal data?
Yes, under GDPR hashed PII is pseudonymous data, not anonymous. The hash can still identify a person indirectly, because Meta can match it against the PII it holds on file. You still need a lawful basis to transmit it for ad purposes, which for marketing is typically consent. The CAPI and Klaviyo consent post covers the consent gate pattern.
Should I hash fbp and fbc cookies?
No. They are already opaque identifiers. Hashing them produces a value that will never match Meta's records. Pass them through raw.
How do I verify the hash is correct without logging raw input?
Use Meta's Test Events tool with a test_event_code and a test account whose email you control. Fire the event, check Events Manager to see if Meta matched the event to the test user. If it matches, your hash is correct. No need to log raw values at any point.
Does a missing hashed field hurt more than a wrong hashed field?
Yes. A missing field (undefined) is neutral. A present-but-wrong hashed field (wrong normalization) counts as a match attempt that failed, which actively lowers match quality. If you cannot normalize a field correctly for a given input, return undefined rather than sending a mangled hash.
Sources and specifics
- Pipeline shape is drawn from the Q2 2024 Shopify DTC rebuild documented in the tracking gap case study.
- Meta's CAPI hashing contract (SHA-256, lowercase, trimmed, E.164, two-letter ISO codes) is published in the Events Manager documentation as of April 2026.
- Log redaction recommendation applies to Datadog, Logflare, Axiom, and most cloud log aggregators; exact syntax differs per provider.
- The single-entry-point pattern is the same discipline used to prevent schema drift in the CAPI payload mismatch postmortem.
- The leak-surface of error-handling paths was the root cause of one observability-PII incident I cleaned up in late 2024; it is the reason the error logging rule is explicit in this post.
