Every CAPI purchase event carries a block of user identifiers that Meta expects to arrive pre-hashed. Most implementations I audit do the hashing correctly. A meaningful number also log the raw values at some point along the pipeline. That is the leak.
If you are shipping server-side Meta CAPI and you want the Purchase-event PII hashing pipeline to be tight, this tutorial walks through the six-step sequence I used during a Q2 2024 DTC rebuild. No raw email ever lands in Datadog. No phone number ends up in an observability span. The hashed payload and nothing else.
What a leak actually looks like
A leak is not a hack. It is an accident. Someone adds a console.log(order) during debugging, the log ships to production, and suddenly the entire order object (including email, phone, billing_address) lives in your log aggregator for the retention window.
A more subtle leak: an error handler interpolates JSON.stringify(payload) into the error message, which means any Sentry event from that handler carries the raw PII inside the error body. Sentry's default PII scrubber does not know about CAPI payloads.
The structural fix is to route all PII through a single normalization-and-hash utility, and to never log the raw input at any stage of the pipeline. The hashed values are fine to log. The raw values are not.
Default view: only the SHA-256 output is visible. Raw and normalized inputs stay redacted from any log sink.
Prerequisites
Three things to have in place before this tutorial helps you.
A working server-side CAPI handler. If you do not have one, the field guide to Meta CAPI covers the build from scratch. This tutorial assumes you are enriching an existing handler.
A single place where all CAPI events pass through. If you have three handlers each building their own payload, consolidate them first. The hashing pipeline is only useful if every event goes through it.
Log redaction configured on your log aggregator for common PII keys (email, phone, address). This is a belt-and-suspenders move against accidental logs.
Step 1: Define the fields Meta expects
Meta's CAPI reference lists the user_data fields with their expected format and hash requirements. The commonly-used fields for a Purchase event:
| Field | Normalized format | Hashed (SHA-256)? |
|---|---|---|
| em (email) | lowercase, trimmed | yes |
| ph (phone) | digits only with country code, no "+" or symbols | yes |
| external_id | string (customer id) | yes |
| ct, st, zp, country | lowercase, alphanumeric only; country as two-letter ISO code | yes |
| fn, ln (names) | lowercase, trimmed, punctuation removed | yes |
| fbp, fbc | opaque cookie values, as-is | no (already opaque) |
| client_ip_address | IPv4 or IPv6, as-is | no |
| client_user_agent | raw UA string | no |
The fields split into two groups. Hashed fields are PII-sensitive and must be pre-hashed before transmission. Non-hashed fields (cookies, IP, user agent) are transmitted raw; Meta still treats them as sensitive on the server side, but the contract says they must not be hashed.
Step 2: Build a normalizer per field type
Normalize first, hash second. Normalization that happens after hashing is not normalization, it is garbage.
// src/lib/capi/normalize.ts
export function normalizeEmail(raw: string | null | undefined): string | undefined {
if (!raw) return undefined;
const n = raw.trim().toLowerCase();
return n.includes("@") ? n : undefined;
}
export function normalizePhone(
raw: string | null | undefined,
defaultCountry = "1",
): string | undefined {
if (!raw) return undefined;
// Meta's contract wants digits only with country code: strip every non-digit, including "+".
const digits = raw.replace(/\D/g, "");
if (!digits) return undefined;
if (digits.length === 10) return `${defaultCountry}${digits}`;
return digits;
}
export function normalizeAlphaNum(raw: string | null | undefined): string | undefined {
if (!raw) return undefined;
const n = raw.trim().toLowerCase().replace(/[^a-z0-9]/g, "");
return n || undefined;
}
export function normalizeName(raw: string | null | undefined): string | undefined {
if (!raw) return undefined;
const n = raw.trim().toLowerCase().replace(/[^a-zà-ÿ\-']/g, "");
return n || undefined;
}
export function normalizeId(raw: string | number | null | undefined): string | undefined {
if (raw == null || raw === "") return undefined;
return String(raw).trim() || undefined;
}
Keep the normalizers pure. No side effects. No logging. No error handling beyond returning undefined for unusable input. This file will be read, audited, and reused on every handler; its behavior must be boring and predictable.
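Normalize-first is not just style: hashing an un-normalized value produces a digest Meta can never match. A standalone sketch (the email normalizer re-inlined so it runs on its own) makes the point concrete:

```typescript
import { createHash } from "node:crypto";

// Inline copy of the email normalizer so this demo runs standalone.
const normalizeEmail = (raw: string): string => raw.trim().toLowerCase();
const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

const raw = "  Jane.Doe@Example.COM ";
const normalized = normalizeEmail(raw); // "jane.doe@example.com"

// Hash-then-normalize produces a digest Meta can never match;
// normalize-then-hash produces the same digest Meta computes on its side.
console.log(sha256(raw) === sha256(normalized)); // false
console.log(sha256(normalized) === sha256("jane.doe@example.com")); // true
```

Same person, same email, two unrelated digests. That gap is what bad normalization silently costs you in match quality.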
Step 3: Wrap hashing in a single utility
One entry point. Every field goes through it. Any PR that adds a new field to the payload must route through this utility.
// src/lib/capi/hash.ts
import { createHash } from "node:crypto";
import {
normalizeEmail,
normalizePhone,
normalizeAlphaNum,
normalizeName,
normalizeId,
} from "./normalize";
export type HashableField =
| "email"
| "phone"
| "id"
| "name"
| "alphaNum";
const NORMALIZERS = {
email: normalizeEmail,
phone: normalizePhone,
id: normalizeId,
name: normalizeName,
alphaNum: normalizeAlphaNum,
} as const;
export function hashField(kind: HashableField, raw: unknown): string | undefined {
const normalizer = NORMALIZERS[kind] as (v: unknown) => string | undefined;
const normalized = normalizer(raw);
if (!normalized) return undefined;
return createHash("sha256").update(normalized).digest("hex");
}
The utility does three things. It routes any input through the matching normalizer. It returns undefined if the input is unusable. It never touches a logger, a span, or an error sink. The normalized value lives in memory for the SHA-256 call and then falls out of scope.
Step 4: Compose the user_data block in one place
Every call site that builds a CAPI payload imports a single function that composes user_data. This is where the hashing rules are enforced structurally.
// src/lib/capi/user-data.ts
import { hashField } from "./hash";
type UserDataInput = {
email?: string | null;
phone?: string | null;
customerId?: string | number | null;
firstName?: string | null;
lastName?: string | null;
city?: string | null;
state?: string | null;
zip?: string | null;
country?: string | null;
fbp?: string | null;
fbc?: string | null;
clientIp?: string | null;
clientUserAgent?: string | null;
};
export function buildUserData(input: UserDataInput) {
return {
em: hashField("email", input.email),
ph: hashField("phone", input.phone),
external_id: hashField("id", input.customerId),
fn: hashField("name", input.firstName),
ln: hashField("name", input.lastName),
ct: hashField("alphaNum", input.city),
st: hashField("alphaNum", input.state),
zp: hashField("alphaNum", input.zip?.split("-")[0]), // US ZIP+4 -> first 5
country: hashField("alphaNum", input.country),
fbp: input.fbp ?? undefined,
fbc: input.fbc ?? undefined,
client_ip_address: input.clientIp ?? undefined,
client_user_agent: input.clientUserAgent ?? undefined,
};
}
Every field that is supposed to be hashed routes through hashField. Every field that is not (cookies, IP, UA) passes through without transformation. This is a deliberately small, inspectable function; the only way to add a new hashed field is to add a line here.
Step 5: Never log the raw input
The log hygiene rule is one sentence. The input object to buildUserData must not be logged at any point in the request lifecycle.
Enforce this structurally in three places.
First, in your handler, destructure the raw order into the buildUserData input and do not hold a reference to the larger order object in scope during any log statement:
export async function POST(req: Request) {
const order = await req.json();
const userData = buildUserData({
email: order.email ?? order.customer?.email,
phone: order.phone ?? order.billing_address?.phone,
customerId: order.customer?.id,
firstName: order.billing_address?.first_name,
lastName: order.billing_address?.last_name,
city: order.billing_address?.city,
state: order.billing_address?.province_code,
zip: order.billing_address?.zip,
country: order.billing_address?.country_code,
fbp: getCookie(req, "_fbp"),
fbc: getCookie(req, "_fbc"),
clientIp: req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? undefined,
clientUserAgent: req.headers.get("user-agent") ?? undefined,
});
const event = {
event_name: "Purchase",
event_id: hashEventId(order.id, "Purchase"),
event_time: Math.floor(Date.now() / 1000),
action_source: "website",
user_data: userData,
custom_data: buildCustomData(order),
};
// Log only the hashed payload, never `order` or `userData` raw inputs.
logger.info("capi.purchase.fired", {
event_id: event.event_id,
event_name: event.event_name,
user_data_fields_present: Object.keys(userData).filter(
(k) => userData[k as keyof typeof userData] !== undefined,
),
});
await sendToMeta(event);
return new Response("ok");
}
The log carries which fields were populated, not the values. That is plenty for operational observability.
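The handler above calls hashEventId, which lives elsewhere in the codebase. A hypothetical minimal sketch of the guarantee it needs to provide, deterministic output, so the browser pixel and the server event emit the same id and Meta deduplicates the pair:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of the hashEventId helper referenced in the handler:
// a deterministic event_id derived from the order id and event name. Any
// stable derivation works; the only requirement is that browser and server
// compute the identical id for the same order.
function hashEventId(orderId: string | number, eventName: string): string {
  return createHash("sha256").update(`${orderId}:${eventName}`).digest("hex");
}

console.log(hashEventId(1001, "Purchase") === hashEventId(1001, "Purchase")); // true
console.log(hashEventId(1001, "Purchase") === hashEventId(1002, "Purchase")); // false
```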
Second, configure your log aggregator to redact common PII keys at ingestion. Datadog, Logflare, and most providers support this. It is a last-line defense against a console.log(order) that slipped through review.
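Ingestion-time redaction syntax is vendor-specific, but you can add the same belt in application code before anything reaches the logger. A minimal sketch; the PII_KEYS list is an assumption, extend it to match your order schema:

```typescript
// Application-side scrubber: redacts known PII keys from any object before it
// reaches a log sink. This complements, not replaces, ingestion-time redaction.
const PII_KEYS = new Set(["email", "phone", "billing_address", "first_name", "last_name"]);

function scrub(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(scrub);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) =>
        PII_KEYS.has(k) ? [k, "[REDACTED]"] : [k, scrub(v)],
      ),
    );
  }
  return value;
}

console.log(JSON.stringify(scrub({ id: 7, email: "x@y.com", customer: { phone: "555" } })));
// {"id":7,"email":"[REDACTED]","customer":{"phone":"[REDACTED]"}}
```

Wrap your logger so every metadata object passes through scrub, and a stray console.log-style dump degrades into redacted noise instead of a leak.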
Third, make your error handler PII-aware. If the handler catches an error, log the error message and the event_id only, never the raw order object.
try {
await sendToMeta(event);
} catch (err) {
logger.error("capi.purchase.failed", {
event_id: event.event_id,
error_message: err instanceof Error ? err.message : "unknown",
});
// do not log `event`, `order`, or `userData`
throw err;
}
Step 6: Test the pipeline end-to-end
Three tests pin the pipeline against regression.
A hashing test. Given a known raw input, the utility produces a known SHA-256 output. This catches any accidental change to normalization rules.
// hash.test.ts
describe("hashField", () => {
it("produces a stable hash for a normalized email", () => {
const h = hashField("email", " Michael@Example.com ");
expect(h).toBe(
"38a47f4d6d8a58df1e8e2f3c1b2a9d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c"
.slice(0, 64), // replace with real expected value from your environment
);
});
it("returns undefined for unusable input", () => {
expect(hashField("email", "")).toBeUndefined();
expect(hashField("email", null)).toBeUndefined();
expect(hashField("phone", "abc")).toBeUndefined();
});
});
A log-leak test. A property-based test fires a synthetic order through the handler, captures all log output via a test logger, and asserts that no log line contains the raw email, phone, or address. This catches regression if someone adds a debug log during a PR.
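A minimal version of that harness, with handlePurchase as a stand-in for your real handler; the final assertion is the one that matters:

```typescript
// Sketch of the log-leak test: route all log output through a capturing
// test logger, run the handler with a canary value, then assert the canary
// never appears in any captured line.
const lines: string[] = [];
const testLogger = {
  info: (msg: string, meta?: object) => lines.push(`${msg} ${JSON.stringify(meta ?? {})}`),
};

// Stand-in handler: logs field presence, never values (the Step 5 rule).
function handlePurchase(order: { email: string }, logger: typeof testLogger) {
  logger.info("capi.purchase.fired", { user_data_fields_present: ["em"] });
}

handlePurchase({ email: "leak-canary@example.com" }, testLogger);
console.log(lines.join("\n").includes("leak-canary@example.com")); // false
```

In the real suite, feed the canary through every input field (email, phone, names, address) and run it against the actual handler, not a stand-in.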
A contract test. Fire a synthetic event against Meta Test Events with a test_event_code, then fetch the event from Meta's API and assert that user_data contains all expected hashed fields. This catches drift between your payload shape and Meta's expected shape, which is the failure mode the CAPI payload mismatch postmortem was written for.
Common mistakes
Four leaks I see often.
Passing the raw order object to buildUserData. Looks convenient, lets you pass everything through, but the function signature no longer constrains what gets hashed. Keep the input type narrow and explicit.
Hashing with different normalizations in different files. You have utils/hash.ts and lib/crypto.ts and the Shopify webhook handler uses one while the checkout handler uses the other. Match quality drifts between event types because the normalization diverged. One file, one entry point.
Logging the order object in middleware. A logging middleware that snapshots the request body runs before your handler, captures the raw order, and ships it to logs before buildUserData even runs. Exclude CAPI handler paths from request-body logging middleware.
Trusting Sentry's default PII scrubber. Sentry scrubs fields it recognizes. It will not scrub custom nested payloads or JSON-stringified blobs. Configure Sentry's beforeSend hook to drop any extra or contexts entry that looks like an order or a CAPI payload.
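A hedged sketch of such a beforeSend hook. The ORDER_KEYS list and the blob heuristic are assumptions to tune against your payloads; wire the function into Sentry.init({ beforeSend }):

```typescript
// Drops order-shaped blobs from `extra` and `contexts` before the event
// leaves the process. Typed structurally so the sketch runs standalone.
type SentryLikeEvent = {
  extra?: Record<string, unknown>;
  contexts?: Record<string, unknown>;
};

const ORDER_KEYS = ["email", "phone", "billing_address", "user_data"];

function looksLikeOrder(v: unknown): boolean {
  if (!v || typeof v !== "object") return false;
  const json = JSON.stringify(v);
  return ORDER_KEYS.some((k) => json.includes(`"${k}"`));
}

function beforeSend(event: SentryLikeEvent): SentryLikeEvent {
  for (const scope of [event.extra, event.contexts]) {
    if (!scope) continue;
    for (const key of Object.keys(scope)) {
      if (looksLikeOrder(scope[key])) delete scope[key];
    }
  }
  return event;
}

const scrubbed = beforeSend({ extra: { payload: { email: "a@b.c" }, requestId: "r-1" } });
console.log(JSON.stringify(scrubbed)); // {"extra":{"requestId":"r-1"}}
```

Dropping the whole entry is deliberate: a redacted-but-present order blob still invites someone to widen the allowlist later, while a missing one forces the fix upstream.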
What to try next
Wire the same pipeline into every event type, not just Purchase. ViewContent, AddToCart, and InitiateCheckout all need the same user_data enrichment. If your match quality is strong on Purchase and weak on ViewContent, the discrepancy is almost always that the browse-event handlers build their own payload inline instead of routing through buildUserData.
If you want a third-party scan that checks exactly this class of issue (raw PII in logs, normalization drift, missing fields per event type), the CAPI Leak Report covers 14 checks including PII hygiene, and it flags the specific files where raw inputs reach a log sink. It is the scan I wrote after seeing this failure mode on the tracking gap engagement.
For the broader match quality story (why these fields matter to the ad algorithm), the match quality tutorial walks through how each field contributes to the score.
FAQ
Can I hash on the client and send hashed values to my server?
Technically yes, but it makes normalization harder to enforce and the client becomes a PII-handling environment that is harder to audit. Prefer server-side hashing. Send raw PII over HTTPS to your own endpoint (the customer already gave you their email at checkout), hash there, forward to Meta.
Does Meta accept pre-hashed fields from any source, or do I need to sign the hash?
Meta accepts the raw SHA-256 hex digest. No signing. No HMAC. The hash itself is the identifier: Meta computes the same digest from the PII it holds on file and matches on equality.
What about GDPR? Is hashed PII still personal data?
Yes, under GDPR hashed PII is pseudonymous data, not anonymous. The hash can still identify a person indirectly, because Meta can match it against the PII it holds on file. You still need a lawful basis to transmit it for ad purposes, which for marketing is typically consent. The CAPI and Klaviyo consent post covers the consent gate pattern.
Should I hash fbp and fbc cookies?
No. They are already opaque identifiers. Hashing them produces a value that will never match Meta's records. Pass them through raw.
How do I verify the hash is correct without logging raw input?
Use Meta's Test Events tool with a test_event_code and a test account whose email you control. Fire the event, check Events Manager to see if Meta matched the event to the test user. If it matches, your hash is correct. No need to log raw values at any point.
Does a missing hashed field hurt more than a wrong hashed field?
Yes. A missing field (undefined) is neutral. A present-but-wrong hashed field (wrong normalization) counts as a match attempt that failed, which actively lowers match quality. If you cannot normalize a field correctly for a given input, return undefined rather than sending a mangled hash.
Sources and specifics
- Pipeline shape is drawn from the Q2 2024 Shopify DTC rebuild documented in the tracking gap case study.
- Meta's CAPI hashing contract (SHA-256, lowercase, trimmed, E.164, two-letter ISO codes) is published in the Events Manager documentation as of April 2026.
- Log redaction recommendation applies to Datadog, Logflare, Axiom, and most cloud log aggregators; exact syntax differs per provider.
- The single-entry-point pattern is the same discipline used to prevent schema drift in the CAPI payload mismatch postmortem.
- The leak-surface of error-handling paths was the root cause of one observability-PII incident I cleaned up in late 2024; it is the reason the error logging rule is explicit in this post.
