What a Production End-to-End Test Suite Teaches You About Testing

Every team that ships software eventually discovers that end-to-end tests are not unit tests with a longer runtime. They are a different discipline with different failure modes, different economics, and a different relationship to the code they verify. This article distills the lessons that emerge from maintaining a real E2E suite against a live production application over time. The examples come from building the test suite for TeamPulse Report, a SaaS product in active development.

1. Auth is infrastructure, not a test case

The first surprise is how much engineering goes into getting into the application before you write a single assertion. In a production suite, sign-in never happens inside individual specs. It happens once, in a global setup that saves the signed-in session state to a file, and every test starts from that file through Playwright's storageState option.

This teaches two things. First, tests that share a pre-warmed auth state run faster and fail less because there is no sign-in network call per spec. Second, authenticated state is itself a test artifact that must be regenerated — it expires, breaks when the auth provider changes, and needs its own documented workflow. If your team struggles with flaky sign-in flows in E2E, the answer is not better retries; it is moving auth out of the tests entirely.
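Because the saved state expires, a global setup can check it before reusing it instead of letting every spec fail at once. The sketch below assumes the JSON layout Playwright writes for storage state (a `cookies` array whose `expires` field is a Unix timestamp in seconds, with `-1` for session cookies). The cookie name `session`, the ten-minute margin, and the `needsRegeneration` helper are hypothetical.

```typescript
// Minimal subset of the JSON written by Playwright's storageState({ path }).
interface StoredCookie {
  name: string;
  value: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

interface StorageState {
  cookies: StoredCookie[];
}

// Returns true when the saved auth state should be regenerated: the
// session cookie is missing, or it expires within `marginSeconds`.
// "session" is a hypothetical name; use your auth provider's cookie.
function needsRegeneration(
  state: StorageState,
  nowSeconds: number,
  cookieName = "session",
  marginSeconds = 600,
): boolean {
  const cookie = state.cookies.find((c) => c.name === cookieName);
  if (!cookie) return true; // never signed in, or the provider changed
  if (cookie.expires === -1) return false; // no fixed expiry was recorded
  return cookie.expires - nowSeconds < marginSeconds;
}
```

A global setup could read the saved file, call `needsRegeneration`, and sign in again only when it returns true, which turns "the auth state expired" into one clear setup step rather than a suite-wide failure.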

A corollary: when your application has feature flags that change the UI, you need separate auth states for different flag configurations ^[1]. One account sees the compliance features; another sees the baseline. The global setup produces both states, and each feature-gated spec selects the state it needs. This keeps feature-flag tests from contaminating unrelated specs.
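One way to keep the states and the specs in sync is a single registry that both read. In this sketch the profile names, file paths, `AUTH_PROFILES`, `storageStateFor`, and `allStorageStates` are all hypothetical; only `test.use({ storageState })`, mentioned after the code, is Playwright API.

```typescript
// Hypothetical registry: one signed-in account per flag configuration.
// Global setup signs in once per entry and writes each storageState file.
const AUTH_PROFILES = {
  baseline: { storageState: "playwright/.auth/baseline.json" },
  compliance: { storageState: "playwright/.auth/compliance.json" },
} as const;

type AuthProfile = keyof typeof AUTH_PROFILES;

// Specs call this instead of hard-coding paths, so a typo in the profile
// name is a compile error rather than a run against the wrong account.
function storageStateFor(profile: AuthProfile): string {
  return AUTH_PROFILES[profile].storageState;
}

// Every file the global setup is expected to produce.
function allStorageStates(): string[] {
  return Object.values(AUTH_PROFILES).map((p) => p.storageState);
}
```

A compliance spec would then begin with `test.use({ storageState: storageStateFor("compliance") });`, and the global setup would iterate over the same registry, signing in once per profile and writing each file.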

2. Wait for what the user sees, not for the network

The single most common failure mode in E2E tests is a timeout waiting for networkidle. Modern pages stream analytics, keep WebSocket connections alive, and run cookie scripts indefinitely. The condition "no in-flight network requests for 500 ms" is often never satisfied against a live production site.

The fix is not a longer timeout. The fix is to stop waiting for the network and start waiting for user-visible content ^[2]. Wait for the heading. Wait for the table header. Wait for the input placeholder. These are what the user actually sees, and they are what the test should assert exists before interacting.

This shift changes how you think about test stability. A test that breaks because a heading text changed is a test that detected a user-visible change — that is a real signal. A test that breaks because an analytics script loaded 50 ms slower than expected is noise. Eliminate the noise by anchoring every navigation wait to a stable DOM element the user would recognize.

// BAD — fragile against production, may never resolve
await page.goto("/audit-log");
await page.waitForLoadState("networkidle");
 
// GOOD — wait for content the user actually sees
await page.goto("/audit-log");
await expect(page.getByRole("heading", { name: /Audit/i })).toBeVisible();
await expect(page.locator("th").filter({ hasText: "Timestamp" })).toBeVisible();
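To see why the network condition can stay unmet, it helps to model it. The sketch below is a simplified model, not Playwright's implementation: it treats each request as a `[start, end]` interval in milliseconds after navigation and reports when a 500 ms window with no request in flight first occurs. The timings in the usage lines are invented to represent a page whose analytics script sends a beacon every 300 ms.

```typescript
interface NetworkRequest {
  start: number; // ms after navigation
  end: number; // ms after navigation
}

// Simplified "network idle" model: returns the time at which `idleMs`
// has passed with no request in flight, or null if that does not happen
// before `timeoutMs` (where a real wait would time out).
function networkIdleAt(
  requests: NetworkRequest[],
  idleMs = 500,
  timeoutMs = 30_000,
): number | null {
  const sorted = [...requests].sort((a, b) => a.start - b.start);
  let busyUntil = 0; // latest end time among the requests seen so far
  for (const r of sorted) {
    // A quiet gap of at least idleMs before this request means idle was reached.
    if (r.start - busyUntil >= idleMs) break;
    busyUntil = Math.max(busyUntil, r.end);
  }
  const idleAt = busyUntil + idleMs;
  return idleAt <= timeoutMs ? idleAt : null;
}

// Invented timings: the document finishes at 800 ms, then an analytics
// beacon fires every 300 ms and takes 50 ms.
const documentLoad: NetworkRequest = { start: 0, end: 800 };
const beacons: NetworkRequest[] = Array.from({ length: 100 }, (_, i) => ({
  start: 800 + i * 300,
  end: 850 + i * 300,
}));

console.log(networkIdleAt([documentLoad])); // 1300
console.log(networkIdleAt([documentLoad, ...beacons])); // null
```

The gap between beacons (250 ms) never reaches the 500 ms window, so no timeout is long enough; the heading and table header, by contrast, are visible once the document has rendered.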

3. Page objects own interaction; specs own assertion

The page object model is not controversial, but the boundary between a page object and a spec is. The litmus test: if a developer changes the page layout, which files should need updating? Only the page object — locators live there. If a developer changes a business rule, which files should need updating? Only the spec — assertions live there.

Page objects expose interaction methods (go to, filter by, click pagination) and return locators ^[3]^[4]. Specs call those methods and assert on the results. In practice, restricting page objects to a single navigation assertion (that the correct page loaded) keeps them readable. That assertion prevents the rest of the test from running against the wrong page.

When the boundary blurs — when page objects contain business-logic assertions or specs build their own locators — the test suite becomes harder to change than the application it tests. Keep the boundary sharp.

// page-objects/audit-log.ts — interactions only
import { expect, type Page } from "@playwright/test";

export class AuditLogPage {
  constructor(private page: Page) {}
 
  async goTo() {
    await this.page.goto("/audit-log");
    await expect(this.page.locator("th").filter({ hasText: "Timestamp" }))
      .toBeVisible();
  }
 
  async filterByCategory(category: string) {
    await this.page.locator("select").selectOption(category);
  }
 
  getCategoryBadges() {
    return this.page.locator("tbody td span")
      .filter({ hasText: /^(AUTH|COMPLIANCE|SECURITY)$/ });
  }
}

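// e2e/audit-log.spec.ts — assertions only
import { test, expect } from "@playwright/test";
import { AuditLogPage } from "../page-objects/audit-log";

test("filtering by category shows only matching badges", async ({ page }) => {
  const auditLog = new AuditLogPage(page);
  await auditLog.goTo();
  await auditLog.filterByCategory("AUTH");

  // Web-first check: retries until no non-AUTH badge remains. A one-shot
  // count() taken right after filtering could still see the unfiltered rows.
  const badges = auditLog.getCategoryBadges();
  await expect(badges.filter({ hasNotText: "AUTH" })).toHaveCount(0);
});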

4. Locators are a communication channel

Every locator you write is a statement about how a user or assistive technology finds an element. A locator hierarchy encodes this:

getByRole with an accessible name says "this element has semantic meaning and a label."
getByPlaceholder says "this element is findable by its placeholder text."
getByTestId says "no accessible alternative exists here."
Raw CSS says nothing about the user's experience.

When a test breaks because a getByRole locator no longer matches, it often reveals an accessibility regression — an element lost its role or its accessible name. When a CSS-locator test breaks, it usually reveals nothing except that someone renamed a Tailwind class. The locator hierarchy is a forcing function ^[1]: it pushes you toward writing tests that catch meaningful breakage instead of cosmetic churn. If you find yourself reaching for getByTestId on a button or heading, that is usually a signal the element is missing a role or accessible label — fixing the accessibility issue often removes the need for a test ID entirely.
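The hierarchy can be written down as an explicit preference order, which is roughly what a locator generator does when it proposes a locator. This sketch is illustrative only: the `ElementInfo` shape and `suggestLocator` function are hypothetical, and the function returns the locator as source text rather than calling Playwright.

```typescript
// Hypothetical description of an element as a test author sees it.
interface ElementInfo {
  role?: string; // ARIA role, explicit or implicit (e.g. "button")
  accessibleName?: string;
  placeholder?: string;
  testId?: string; // value of data-testid
  css: string; // fallback CSS selector
}

// Walks the hierarchy from most to least user-facing and returns the
// first strategy the element supports, as Playwright source text.
function suggestLocator(el: ElementInfo): string {
  if (el.role && el.accessibleName) {
    return `getByRole(${JSON.stringify(el.role)}, { name: ${JSON.stringify(el.accessibleName)} })`;
  }
  if (el.placeholder) return `getByPlaceholder(${JSON.stringify(el.placeholder)})`;
  if (el.testId) return `getByTestId(${JSON.stringify(el.testId)})`;
  return `locator(${JSON.stringify(el.css)})`;
}

console.log(suggestLocator({ role: "button", accessibleName: "Sign in", css: "button.btn-primary" }));
// getByRole("button", { name: "Sign in" })
console.log(suggestLocator({ role: "button", testId: "icon-next", css: "button.p-2" }));
// getByTestId("icon-next")
```

The second call is the smell from the paragraph above: a button with a role but no accessible name falls through to a test ID, and giving it a label would move it back to the top of the hierarchy.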

A practical corollary: always scope locators to the smallest containing region. A global chevron-right icon locator matches sidebar expand buttons, table sort indicators, and pagination controls all at once. Scoping to the pagination bar prevents the sidebar from silently becoming a dependency of a table test.

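// BAD — matches sidebar expand buttons too
const nextBtn = page.locator("button:has(svg.lucide-chevron-right)");

// GOOD — scoped to the pagination bar, found by its "Page N of M" text.
// The container selector still leans on utility classes; prefer a role
// or landmark here if the app exposes one.
const paginationBar = page.locator("div.flex.items-center.justify-between")
  .filter({ hasText: /Page \d+ of \d+/ });
// Assumes "Next" is the rightmost enabled button. On the last page, where
// "Next" is disabled, this would match "Previous" instead.
const nextBtn = paginationBar.locator("button:not([disabled])").last();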

5. `Promise.all` is the correct primitive for navigation

When you click a button that triggers navigation, you are racing two things: the click event dispatch and the application's response. If you await the click and only then start waiting for the response, the response may already have arrived. The test then hangs until it times out, waiting for an event that fired before the listener was registered.

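The correct pattern is Promise.all: start waiting for the expected side effect and dispatch the action together, then await both. List the wait first, because the array's elements are evaluated in order and the listener must exist before the click is dispatched. This matters most for one-shot events such as the API response after submitting a form or paging a table. For link navigation the same form is a safe default, although page.waitForURL also resolves immediately when the URL already matches.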

// Register the response listener before dispatching the click,
// so we don’t miss the response
const [response] = await Promise.all([
  page.waitForResponse((r) => r.url().includes("/api/audit-log")),
  page.getByRole("button", { name: "Next page" }).click(),
]);
expect(response.status()).toBe(200);
 
// Wait for URL change during link navigation
await Promise.all([
  page.waitForURL("**/dashboard"),
  page.getByRole("link", { name: "Dashboard" }).click(),
]);

This is not a Playwright quirk. It is a fundamental property of asynchronous UIs ^[2]: you cannot observe an event you started listening for after the event fired. Calling the wait first inside Promise.all ensures the listener is registered before the action dispatches.
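The ordering problem can be reproduced without a browser. In this sketch `FakePage`, `clickThenWait`, and `clickAndWait` are hypothetical stand-ins, not Playwright APIs: the fake page delivers its response before `click()` resolves, and only a listener that already exists at that moment receives it.

```typescript
type Listener = (status: number) => void;

// Hypothetical stand-in for a page whose API reply arrives before
// click() resolves.
class FakePage {
  private listeners: Listener[] = [];

  // The listener is attached when this method is called,
  // not when the returned promise is awaited.
  waitForResponse(): Promise<number> {
    return new Promise<number>((resolve) => {
      const listener: Listener = (status) => {
        this.listeners = this.listeners.filter((l) => l !== listener);
        resolve(status);
      };
      this.listeners.push(listener);
    });
  }

  async click(): Promise<void> {
    await Promise.resolve(); // asynchronous dispatch
    // Only listeners registered at this moment see the response.
    for (const l of [...this.listeners]) l(200);
  }
}

// Resolves to "timeout" if `p` has not settled within `ms` milliseconds.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Anti-pattern: start listening after the click; the response is missed.
async function clickThenWait(page: FakePage): Promise<number | "timeout"> {
  await page.click();
  return withTimeout(page.waitForResponse(), 100);
}

// Pattern: the wait is listed first, so its listener exists before the click.
async function clickAndWait(page: FakePage): Promise<number | "timeout"> {
  const [status] = await Promise.all([page.waitForResponse(), page.click()]);
  return status;
}
```

`clickThenWait` resolves to `"timeout"` and `clickAndWait` to `200`. Playwright's documented rule is the same: start the wait (for example `page.waitForResponse(...)`) before triggering the action.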

6. Isolated tests are cheaper than debugging order dependencies

Every test should be runnable in isolation, in any order, with no shared mutable data. Playwright already isolates browser state by giving each test a fresh browser context, seeded here from the saved storageState. Data dependencies require more care.

A test that asserts a pagination button works but only succeeds when there are enough rows to paginate is a test with a hidden data dependency ^[4]. The fix is a conditional guard: if the pagination button is visible, click it and assert. If it is not, the test still needs something to affirm the page rendered correctly — an assertion that the data table itself is visible. This prevents the test from passing vacuously when the button is absent.

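const paginationBar = page.locator("div.flex.items-center.justify-between")
  .filter({ hasText: /Page \d+ of \d+/ });
const nextBtn = paginationBar.locator("button:not([disabled])").last();

// isVisible() checks once and does not wait, so let the table render first;
// otherwise a slow load would always take the "no pagination" branch.
await expect(page.getByRole("table")).toBeVisible();

if (await nextBtn.isVisible()) {
  await nextBtn.click();
  await expect(page.getByText(/Page 2 of/)).toBeVisible();
}
// With a single page of results there is nothing to click; the table
// assertion above keeps this case from passing vacuously.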

The alternative — ensuring every test has the right data via a setup step — is tempting but expensive. Each setup call is another network request, another source of flakiness, another coupling between the test and the data layer. Conditional guards are cheaper and more resilient.

7. Scope E2E tests to user journeys, not coverage

E2E tests are expensive. They are slow to run, complex to debug, and sensitive to environmental issues that are invisible in lower-level tests. They should cover four types of checks per page and nothing more:

Does the page load without console errors?
Are the key interactive elements visible?
Does the primary data flow work (filter, paginate, submit)?
Is the page usable on a mobile viewport?
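The first check is cheap to automate. This sketch relies only on the two console-message methods it calls, `type()` and `text()`, which Playwright's `ConsoleMessage` provides; the `ConsoleSource` interface and `collectConsoleErrors` helper are hypothetical. In a spec, the source would be the Playwright `page`, subscribed through `page.on("console", ...)`.

```typescript
// The subset of Playwright's ConsoleMessage this helper relies on.
interface ConsoleMessageLike {
  type(): string; // e.g. "log", "warning", "error"
  text(): string;
}

// Anything that emits "console" events, such as a Playwright page.
interface ConsoleSource {
  on(event: "console", listener: (msg: ConsoleMessageLike) => void): unknown;
}

// Starts recording console errors and returns the live array, so the spec
// can assert on it once the page has loaded.
function collectConsoleErrors(source: ConsoleSource): string[] {
  const errors: string[] = [];
  source.on("console", (msg) => {
    if (msg.type() === "error") errors.push(msg.text());
  });
  return errors;
}
```

Call it before `page.goto(...)` so errors during the initial load are captured, then, once the key content is visible, assert `expect(errors).toEqual([])`.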

Everything else — input validation, error handling, edge cases, state transitions — belongs in unit and integration tests ^[5]^[6]. Those are faster, more precise, and cheaper to maintain.

This is the hardest discipline to hold. Every time a production bug slips through, the reflex is to add an E2E test for that exact scenario. Over time, this balloons the suite into something that takes 45 minutes to run and fails on three unrelated services every other run. A bug that slipped through unit and integration tests usually means the units and integrations were not tested well enough — fix those, not the E2E suite.

8. The CI whitelist is the gate

A suite of 30 specs running against production is not self-regulating. New specs introduce flakiness. Flaky specs erode trust in the entire suite. Once developers learn that a red build "always fails," they stop looking — and then a real regression ships.

The fix is a CI whitelist: only specs explicitly listed in a constant run in CI. Locally, every spec runs. Before a spec is added to the whitelist, it must pass against the target environment at least once, verified by a human. This is a manual gate, but it is the cheapest manual gate — the cost of verifying a spec once is orders of magnitude lower than the cost of a flaky spec in CI for a month.

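// playwright.config.ts
import { defineConfig } from "@playwright/test";

// Specs allowed to run in CI. A spec is added only after it has passed
// against the target environment at least once, verified by a human.
const CI_SPECS = [
  "welcome.spec.ts",
  "audit-log.spec.ts",
  "compliance.spec.ts",
];

export default defineConfig({
  // In CI run only the whitelist; locally run every spec.
  testMatch: process.env.CI ? CI_SPECS.map((s) => `e2e/${s}`) : "**/*.spec.ts",
  // ...
});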
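Because the whitelist is maintained by hand, it can drift from the files on disk: a renamed spec silently drops out of CI, and a new spec silently never enters it. A small audit makes both cases visible. The `auditWhitelist` function and the `billing.spec.ts` and `reports.spec.ts` names below are hypothetical; in practice the list of spec files might come from reading the `e2e/` directory.

```typescript
interface WhitelistAudit {
  pending: string[]; // specs that run locally but are not yet in CI
  stale: string[]; // whitelist entries with no matching spec file
}

// Compares the CI whitelist against the spec files that actually exist.
function auditWhitelist(allSpecs: string[], ciSpecs: string[]): WhitelistAudit {
  const onDisk = new Set(allSpecs);
  const whitelisted = new Set(ciSpecs);
  return {
    pending: allSpecs.filter((s) => !whitelisted.has(s)).sort(),
    stale: ciSpecs.filter((s) => !onDisk.has(s)).sort(),
  };
}

const audit = auditWhitelist(
  ["welcome.spec.ts", "audit-log.spec.ts", "compliance.spec.ts", "billing.spec.ts"],
  ["welcome.spec.ts", "audit-log.spec.ts", "compliance.spec.ts", "reports.spec.ts"],
);
// audit.pending is ["billing.spec.ts"]; audit.stale is ["reports.spec.ts"]
```

One reasonable split is to fail the build on a non-empty `stale` list but only report `pending`, so the human verification gate stays in place.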

9. Test names are documentation

A test named "test_filter_1" tells a future developer nothing. A test named "filtering by category=AUTH shows only AUTH badges" tells them exactly what broke, what was expected, and where to look. The describe block provides scope; the test name provides the assertion.

This is not pedantry. When a CI build fails at 03:00 and the on-call developer opens the test report, the test names are the first thing they read ^[8]. Each name should form a sentence that a teammate can understand without opening the spec file.

// GOOD
test.describe("Audit Log — filters", () => {
  test("filtering by category=AUTH shows only AUTH badges", async ({ page }) => {
    // ...
  });
 
  test("pagination controls are visible when results exceed one page", async ({ page }) => {
    // ...
  });
});
 
// BAD
test("test_filter_1", ...);
test("filters work", ...);
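If the team wants to hold the line mechanically, a lint over test titles can catch the worst offenders before review. This is a hypothetical sketch, not a Playwright feature: `nameProblems` and its two heuristics (no snake_case identifiers, at least four words) are illustrative choices that a team would tune.

```typescript
// Hypothetical heuristics; the thresholds are illustrative, not a standard.
function nameProblems(title: string): string[] {
  const problems: string[] = [];
  // snake_case identifiers such as test_filter_1 read as code, not sentences.
  if (/^[a-z0-9]+(_[a-z0-9]+)+$/i.test(title)) {
    problems.push("reads like an identifier, not a sentence");
  }
  // Fewer than four words rarely states both an action and an outcome.
  const words = title.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length < 4) {
    problems.push("too short to state an action and an expected outcome");
  }
  return problems;
}

for (const title of [
  "test_filter_1",
  "filters work",
  "filtering by category=AUTH shows only AUTH badges",
]) {
  console.log(title, nameProblems(title));
}
```

Both titles from the BAD example are flagged, while the describe-scoped titles from the GOOD example pass.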

10. Trace debugging is the only debugging

When an E2E test fails in CI, the developer cannot attach a debugger or open the browser ^[9]. The trace viewer is the only tool available, and it must be usable. This means: never wrap Playwright actions in custom utility classes that hide what the trace would show.

import type { Page } from "@playwright/test";

// GOOD — direct, traceable. A trace shows the exact locator and action.
export class LoginPage {
  constructor(private page: Page) {}

  async login(email: string, password: string) {
    await this.page.getByPlaceholder("you@example.com").fill(email);
    await this.page.getByPlaceholder("Enter your password").fill(password);
    await this.page.getByRole("button", { name: "Sign in" }).click();
  }
}

// BAD — unnecessary indirection. In the trace, every action's source
// location lands in this generic helper rather than in the step that
// called it, and the target is an opaque selector string.
export class UIActions {
  static async fill(page: Page, selector: string, value: string) {
    await page.locator(selector).fill(value);
  }
}

Direct Playwright API calls produce traces that are self-documenting. Abstractions that hide the API produce traces that are opaque. Choose traceability over abstraction.

11. Soft assertions prevent debugging cycles

A test that checks five elements and stops at the first missing one can take up to five fix-and-retry cycles to confirm all elements are present. A test that uses expect.soft ^[10] for all five checks produces a single failure report listing every missing element. A long run of soft assertions, however, can be a smell that the test covers unrelated concerns and should be split.

test("renders all filter controls", async ({ page }) => {
  await goToAuditLog(page);
  await expect.soft(page.locator("select")).toBeVisible();
  await expect.soft(page.getByPlaceholder("Filter by entity...")).toBeVisible();
  await expect.soft(page.getByPlaceholder("Filter by actor...")).toBeVisible();
});

This is a small syntax change with an outsized impact on debugging efficiency. Reserve it for tests that perform multiple independent checks — page load verification, form field auditing, dashboard widget presence. Do not use it for sequential assertions where the second assertion depends on the first succeeding.
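The difference between the two failure modes is easy to model. The sketch below is a toy stand-in for `expect.soft`, not Playwright's implementation: the hypothetical `SoftChecker` records each failed check and keeps going, then fails once at the end with every failure listed. The control names come from the example above; the `present` set is invented.

```typescript
// Toy model of soft assertions: record failures, keep going, report once.
class SoftChecker {
  private failures: string[] = [];

  check(condition: boolean, message: string): void {
    if (!condition) this.failures.push(message);
  }

  // Like a test with soft failures: it runs to the end, then fails once.
  finish(): void {
    if (this.failures.length > 0) {
      throw new Error(`${this.failures.length} check(s) failed:\n- ${this.failures.join("\n- ")}`);
    }
  }
}

// Invented page state: two of the three filter controls are missing.
const present = new Set(["select"]);
const expected = ["select", "Filter by entity...", "Filter by actor..."];

const soft = new SoftChecker();
for (const control of expected) {
  soft.check(present.has(control), `missing control: ${control}`);
}

let report = "";
try {
  soft.finish();
} catch (e) {
  report = (e as Error).message;
}
console.log(report);
// 2 check(s) failed:
// - missing control: Filter by entity...
// - missing control: Filter by actor...
```

A hard assertion would have stopped at "Filter by entity..." and hidden the second gap until the next run.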

When pairing soft assertions with locators, prefer web-first assertions. expect.soft(page.getByRole("button", { name: "Submit" })).toBeVisible() uses Playwright's built-in retry and auto-waiting. expect.soft(await page.$(".btn-submit")).not.toBeNull() queries the DOM exactly once: page.$ returns null immediately if the button has not rendered yet, so a delayed render is reported as a failure. The former catches delayed renders; the latter does not.

12. Match the spec to the page, not to your memory

Every E2E spec must be written against the deployed page, not against the developer's mental model of it. Columns get renamed. Filters get redesigned. Pagination gets rewritten. A spec that asserts the old structure is worse than no spec — it fails every run and teaches the team to ignore it.

Before writing or updating a spec, visit the page in a browser. Use Playwright codegen to pick locators from the live DOM ^[7]. The spec you write should match what the page actually renders today, not what you think it rendered last week.

npx playwright codegen https://teampulse.report/audit-log

Validate against the target environment before adding the spec to CI:

E2E_BASE_URL=https://staging.teampulse.report \
  npx playwright test e2e/my-new-spec.spec.ts --headed
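The command above assumes the config reads `E2E_BASE_URL`; that wiring is not shown here, so the following is one hypothetical way to do it. The `resolveBaseURL` helper falls back to the production URL used in the codegen example, rejects malformed values early, and assumes the base URL is a bare origin (any path is dropped).

```typescript
// Hypothetical helper for playwright.config.ts: pick the target environment
// from E2E_BASE_URL, defaulting to production.
function resolveBaseURL(
  env: Record<string, string | undefined>,
  fallback = "https://teampulse.report",
): string {
  const raw = env.E2E_BASE_URL?.trim() || fallback;
  const url = new URL(raw); // throws a TypeError on a malformed value
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    throw new Error(`E2E_BASE_URL must be http(s), got ${url.protocol}`);
  }
  return url.origin;
}
```

In the config its result would go in `use: { baseURL: resolveBaseURL(process.env) }`, which is what lets specs call `page.goto("/audit-log")` with a relative path against whichever environment the variable selects.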

The meta-lesson

The thread connecting all twelve lessons is this: E2E tests are a conversation between the test suite and the application. When the application changes, the conversation changes. A well-structured suite makes that conversation clear (is this a real regression or a layout change?), fast (did the test fail because of a timeout or a bug?), and maintainable (can I fix the test by changing one file or five?).

Every practice above — auth as infrastructure, content-based waiting, web-first assertions, scoped locators, Promise.all, isolated tests, CI whitelists, direct API calls, soft assertions — serves one goal: reduce the cost of maintaining the conversation so the team can keep having it.

The alternative is a suite that nobody runs because it takes too long, nobody trusts because it flakes, and nobody fixes because it is too tangled to change. That is not a testing strategy. That is a cautionary tale.

References

[1] Microsoft, "Playwright Best Practices," https://playwright.dev/docs/best-practices, accessed June 2026.

[2] Microsoft, "Playwright Navigations," https://playwright.dev/docs/navigations, accessed June 2026.

[3] Microsoft, "Playwright Issue #1604 — Page Object Model community consensus," https://github.com/microsoft/playwright/issues/1604, accessed June 2026.

[4] Selenium Project, "Page Object Models," https://www.selenium.dev/documentation/test_practices/encouraged/page_object_models/#assertions-in-page-objects, accessed June 2026.

[5] IBM, "End-to-End Testing Best Practices," https://www.ibm.com/think/insights/end-to-end-testing-best-practices, accessed June 2026.

[6] GitLab, "End-to-End Testing Best Practices," https://docs.gitlab.com/development/testing_guide/end_to_end/best_practices/, accessed June 2026.

[7] Microsoft, "Playwright Codegen," https://playwright.dev/docs/codegen, accessed June 2026.

[8] Bunnyshell, "Best Practices for End-to-End Testing in 2026," https://www.bunnyshell.com/blog/best-practices-for-end-to-end-testing-in-2025/, accessed June 2026.

[9] Leapwork, "End-To-End Testing: 2026 Guide," https://leapwork.com/blog/end-to-end-testing/, accessed June 2026.

[10] Microsoft, "expect.soft — Soft Assertions," https://playwright.dev/docs/api/class-test#test-expect-soft, accessed June 2026.