
Failure Analysis

The reporter classifies each failure into one of seven root-cause categories, groups similar failures into clusters, and renders the breakdown in the detailed PDF. Classification runs entirely offline — no network call, no AI service.

Overview

Failure analysis turns a wall of stack traces into a ranked set of clusters. Every failed test is matched against seven root-cause categories, then failures that share a cause and signature are grouped together so you can see what broke rather than scrolling through every individual trace.

Seven categories — each failure is sorted into one of seven trained root-cause buckets: assertion failures, timeouts, element-not-found, network errors, navigation errors, environment/config issues, and flaky-passed-on-retry. Anything the classifier can't place with confidence falls back to unknown.
Clusters — failures with the same cause and a similar signature collapse into a single cluster with a representative message and a count.
Match strength — each cluster carries a qualitative strong / moderate / weak label, never a numeric confidence percentage.
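The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's implementation: the `ClassifiedFailure` and `FailureCluster` shapes, the category names, and the `signatureOf` normalisation (lowercasing and masking digits) are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the reporter's internal types are not documented here.
type Category =
  | "assertion" | "timeout" | "element-not-found" | "network"
  | "navigation" | "environment" | "flaky" | "unknown";

interface ClassifiedFailure {
  testName: string;
  category: Category;
  message: string;
}

interface FailureCluster {
  category: Category;
  signature: string;
  representative: string; // first message seen for this cluster
  count: number;          // number of failures grouped into this cluster
}

// Assumed signature: lowercase the message, mask runs of digits, collapse whitespace,
// so "Timeout 5000ms exceeded" and "Timeout 30000ms exceeded" share one signature.
function signatureOf(message: string): string {
  return message
    .toLowerCase()
    .replace(/\d+/g, "<n>")
    .replace(/\s+/g, " ")
    .trim();
}

function clusterFailures(failures: ClassifiedFailure[]): FailureCluster[] {
  const byKey = new Map<string, FailureCluster>();
  for (const f of failures) {
    const signature = signatureOf(f.message);
    const key = `${f.category}|${signature}`;
    const existing = byKey.get(key);
    if (existing) {
      existing.count += 1;
    } else {
      byKey.set(key, { category: f.category, signature, representative: f.message, count: 1 });
    }
  }
  // Rank by count, largest first.
  return [...byKey.values()].sort((a, b) => b.count - a.count);
}
```

Keying on both category and signature keeps two timeouts with different durations together while keeping a timeout and an assertion failure apart, even if their messages happen to look alike.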

Fully offline

Classification runs in-process, using a small embedded Naive Bayes classifier that is trained offline and shipped inside the package. It never sends your stack traces, test names, or any other data over the network, and it does not call any AI or LLM service. The report shows a qualitative match strength (strong, moderate, or weak) rather than a raw probability, because a precise-looking percentage would overstate what a lightweight model should promise.
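To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier that reports a qualitative label instead of a probability. The `ClassifierModel` layout, the Laplace smoothing, and the 0.8 / 0.5 cut-offs are assumptions chosen for illustration; the shipped model format and thresholds are not documented here.

```typescript
type MatchStrength = "strong" | "moderate" | "weak";

// Hypothetical model layout: per-category sample counts and token counts.
interface ClassifierModel {
  vocabularySize: number;
  categories: Record<string, { samples: number; totalTokens: number; tokens: Record<string, number> }>;
}

function classify(model: ClassifierModel, tokens: string[]): { category: string; strength: MatchStrength } {
  const names = Object.keys(model.categories);
  const totalSamples = names.reduce((sum, n) => sum + model.categories[n].samples, 0);

  // Log-space score per category: log prior plus Laplace-smoothed log likelihoods.
  const scores = names.map((n) => {
    const c = model.categories[n];
    let score = Math.log(c.samples / totalSamples);
    for (const t of tokens) {
      const count = c.tokens[t] ?? 0;
      score += Math.log((count + 1) / (c.totalTokens + model.vocabularySize));
    }
    return score;
  });

  // Normalise to a posterior, subtracting the maximum first for numerical stability.
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  let best = 0;
  for (let i = 1; i < exps.length; i++) {
    if (exps[i] > exps[best]) best = i;
  }
  const posterior = exps[best] / total;

  // Assumed cut-offs: the posterior is bucketed and the raw number is discarded.
  const strength: MatchStrength =
    posterior >= 0.8 ? "strong" : posterior >= 0.5 ? "moderate" : "weak";
  return { category: names[best], strength };
}
```

Everything here is plain arithmetic over counts held in memory, which is why a model of this kind can run without any network access.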

Where it renders

How much detail you see depends on the template:

detailed — full failure-analysis section: every cluster, its category, match strength, representative message, and affected test count.
executive — a single one-liner summarising the dominant root cause.
minimal — nothing; the section is omitted entirely.

Setup

Failure analysis is on by default. Pair it with the detailed template to see the full breakdown.

reporter: [['@reportforge/playwright-pdf', {
  template: 'detailed',
  failureAnalysis: { enabled: true, maxClusters: 5 },
}]]

To turn the section off entirely, set failureAnalysis: { enabled: false }. Your tests and PDF are otherwise unaffected.

Local feedback collection (privacy)

When a failure can't be recognised — or is matched only with weak confidence — the reporter records it locally so the classifier can improve over time. These samples are appended to ~/.reportforge/{project}/unclassified.csv.

Tokenized and redacted — the stored text is reduced to tokens with email addresses and filesystem paths stripped out. Raw error text is never written, and no secrets or URLs are captured.
Local only — the file stays on your machine. Nothing is ever transmitted, uploaded, or phoned home.
First-run notice — a one-time notice is printed to stderr the first time anything is collected, so the behaviour is never silent.
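As a rough picture of what "tokenized and redacted" can mean in practice, the sketch below strips URLs, email addresses, and filesystem paths before reducing the text to lowercase word tokens. The regular expressions and the `redactAndTokenize` name are assumptions for illustration only; the reporter's actual redaction rules may differ and are not documented here.

```typescript
// Assumed patterns for illustration; not the reporter's real rules.
const URL_PATTERN = /\b[a-z][a-z0-9+.-]*:\/\/\S+/gi;
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
const WINDOWS_PATH_PATTERN = /[A-Za-z]:\\[^\s:*?"<>|]+/g;
const POSIX_PATH_PATTERN = /(?:~|\.{1,2})?(?:\/[\w.@-]+)+/g;

function redactAndTokenize(raw: string): string[] {
  // URLs are removed first so that their path portion is not left behind
  // by the later path patterns.
  const stripped = raw
    .replace(URL_PATTERN, " ")
    .replace(EMAIL_PATTERN, " ")
    .replace(WINDOWS_PATH_PATTERN, " ")
    .replace(POSIX_PATH_PATTERN, " ");
  // Keep alphabetic tokens of two or more letters; numbers and punctuation are dropped.
  return stripped.toLowerCase().match(/[a-z]{2,}/g) ?? [];
}
```

Only the token list would be stored in a scheme like this, so the original error string cannot be reconstructed from the file.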

Collection is on by default. To opt out, set collectUnclassified: false:

reporter: [['@reportforge/playwright-pdf', {
  failureAnalysis: { collectUnclassified: false },
}]]

To review what has been collected, merge every project's file into a single CSV in the current directory:

npx @reportforge/playwright-pdf reportforge-export-feedback

This writes ./reportforge-feedback.csv. There is no upload in v1 — sharing the exported file is entirely a manual user choice.

Model updates

The classifier model auto-updates over the existing license-refresh channel — no separate download, no extra network call. When the reporter refreshes its license, the server may include a newer model in the same response, which is cached locally for subsequent runs.

Signed and verified — every delivered model is Ed25519-signed and verified against the public key bundled into the reporter at build. The signature is re-checked on every load, so a tampered cache file is rejected and never used.
Fully offline-tolerant — there is no network dependency. The bundled model is always the floor: if there is no cached model, no network, or the cached model fails verification, the reporter falls back to the model shipped in the package.
Monotonic — a delivered model is only adopted when its version is newer than the bundled model; older or equal versions are ignored.
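The three rules above combine into a small decision procedure, sketched here under stated assumptions: versions are treated as plain integers, the `BundledModel` and `SignedModel` shapes are hypothetical, and the Ed25519 check is abstracted behind a `verifySignature` callback rather than implemented.

```typescript
// Hypothetical shapes; the real model format is not documented here.
interface BundledModel {
  version: number; // assumed to be a monotonically increasing integer
  payload: string;
}

interface SignedModel extends BundledModel {
  signature: string;
}

function selectModel(
  bundled: BundledModel,
  cached: SignedModel | undefined,
  verifySignature: (model: SignedModel) => boolean, // stands in for the Ed25519 check
  autoUpdateModel = true,
): BundledModel {
  if (!autoUpdateModel || !cached) return bundled;        // pinned, or nothing cached
  if (!verifySignature(cached)) return bundled;           // tampered or corrupt cache
  if (cached.version <= bundled.version) return bundled;  // older or equal: ignored
  return cached;
}
```

Because every early return yields the bundled model, any failure along the way degrades to the model shipped in the package rather than to an error.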

To pin the bundled model for reproducibility or audit, set autoUpdateModel: false. The reporter then always uses the model shipped in the package and ignores any cached update:

reporter: [['@reportforge/playwright-pdf', {
  template: 'detailed',
  failureAnalysis: { autoUpdateModel: false },
}]]

Options reference

All keys live under the failureAnalysis object.

Key	Type / Default	Description
enabled	boolean · true	Set to false to skip classification and omit the failure-analysis section.
maxClusters	number · 10	Maximum number of clusters to render, ranked by affected test count. Remaining failures are folded into the totals.
minStrength	enum · 'weak'	'weak' · 'moderate' · 'strong'. Clusters below this match strength are hidden.
maxFailuresToAnalyse	number · 500	Upper bound on failures fed into the classifier, keeping analysis fast on very large runs.
collectUnclassified	boolean · true	Set to false to disable local collection of unrecognized failures. When enabled, tokenized and redacted samples are written to ~/.reportforge/{project}/unclassified.csv — local only, never transmitted.
autoUpdateModel	boolean · true	Auto-update the classifier model over the signed license-refresh channel (Ed25519-verified, offline-tolerant, bundled model is the floor). Set to false to pin the bundled model for reproducibility.
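For readers who prefer types to tables, the options can be sketched as a TypeScript interface with the defaults listed above. The interface name, `resolveOptions`, and `visibleClusters` are hypothetical helpers written for this example, not exports of the package; the filtering logic is one plausible reading of how `minStrength` and `maxClusters` interact.

```typescript
type MatchStrength = "weak" | "moderate" | "strong";

interface FailureAnalysisOptions {
  enabled: boolean;
  maxClusters: number;
  minStrength: MatchStrength;
  maxFailuresToAnalyse: number;
  collectUnclassified: boolean;
  autoUpdateModel: boolean;
}

// Defaults as listed in the options table.
const DEFAULT_OPTIONS: FailureAnalysisOptions = {
  enabled: true,
  maxClusters: 10,
  minStrength: "weak",
  maxFailuresToAnalyse: 500,
  collectUnclassified: true,
  autoUpdateModel: true,
};

function resolveOptions(user: Partial<FailureAnalysisOptions> = {}): FailureAnalysisOptions {
  return { ...DEFAULT_OPTIONS, ...user };
}

// Hypothetical cluster shape, reduced to the fields the options act on.
interface RenderableCluster {
  label: string;
  strength: MatchStrength;
  count: number;
}

const STRENGTH_RANK: Record<MatchStrength, number> = { weak: 0, moderate: 1, strong: 2 };

function visibleClusters(clusters: RenderableCluster[], opts: FailureAnalysisOptions): RenderableCluster[] {
  if (!opts.enabled) return [];
  return clusters
    .filter((c) => STRENGTH_RANK[c.strength] >= STRENGTH_RANK[opts.minStrength]) // hide weaker clusters
    .sort((a, b) => b.count - a.count)                                            // rank by affected count
    .slice(0, opts.maxClusters);                                                  // cap the rendered list
}
```

With `{ maxClusters: 2, minStrength: 'moderate' }`, for example, weak clusters are hidden first and only the two largest remaining clusters are kept.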
