
Backend engineering

How moving work off the request path cut P99 latency by 25%

One of the more useful backend problems I worked on at SalesUp was not a broken endpoint so much as a crowded one. Some request paths were trying to validate input, persist state, kick off campaign-related automation, and then hold the response open until the slower work was underway, so that work effectively became part of the response. The result was predictable: median latency still looked acceptable, but tail latency kept widening under real traffic.

This was the moment the problem became architectural for me. The request path was owning work that did not need to belong to the response cycle.

The problem setup

The slowest paths were the ones that mixed user-facing API behavior with heavier campaign-generation and follow-up automation tasks. Those operations were valid work, but they were poor critical-path work. When they stayed inline, they directly competed with traffic that only needed a fast acknowledgement and a stable state transition.

Request path diagram

Before:
  1. validate request
  2. write state
  3. start campaign and follow-up automation inline
  4. hold the response open while slow work begins

After:
  1. validate request
  2. persist the required state
  3. enqueue slower automation work
  4. return a stable response immediately

The actual improvement came from shrinking the responsibility of the request path, not from a local code micro-optimization.

What I tried first

The first instinct was the usual one: look for expensive steps inside the handler and make them cheaper. That mindset helped in small ways, but it did not change the real shape of the latency problem. As long as the request path still owned slow automation work, cleaner code alone was not going to remove the long tail.

The design change that mattered

The useful shift was to treat the API as a coordinator instead of a finisher. The handler should validate the request, persist the right state, enqueue the slow work, and return a stable response. That sounds obvious in retrospect, but it changed retries, failure handling, and capacity planning in a much cleaner way than squeezing a few milliseconds out of one function.

In practice, the tasks that moved out of the synchronous path were slower campaign-generation and follow-up automation jobs. They still ran, but they no longer blocked the user-facing response.

// before
POST /campaign
  validate(request)
  writeCampaignState()
  generateCampaignContent()
  triggerFollowUpAutomation()
  return response

// after
POST /campaign
  validate(request)
  writeCampaignState()
  enqueue("generate_campaign")
  enqueue("follow_up_automation")
  return accepted_response
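
To make the "after" shape concrete, here is a minimal runnable sketch of a coordinator-style handler. It is not the SalesUp implementation: the in-memory `CAMPAIGNS` dict and `JOBS` queue are hypothetical stand-ins for a database and a job broker, the `name` field and the `pending` status are invented for illustration, and using HTTP 202 for the accepted response is my assumption.

```python
import queue
import uuid

# Hypothetical in-memory stand-ins. A real service would use a database
# for campaign state and a broker-backed queue for background jobs.
CAMPAIGNS: dict = {}
JOBS: queue.Queue = queue.Queue()


def validate(request: dict) -> None:
    # Reject bad input before any state is written or any job is queued.
    if not request.get("name"):
        raise ValueError("campaign name is required")


def write_campaign_state(request: dict) -> str:
    # Persist only what the response depends on, marked as not yet processed.
    campaign_id = str(uuid.uuid4())
    CAMPAIGNS[campaign_id] = {"name": request["name"], "status": "pending"}
    return campaign_id


def enqueue(job_name: str, campaign_id: str) -> None:
    # Hand the slow work to the background path instead of running it inline.
    JOBS.put((job_name, campaign_id))


def post_campaign(request: dict) -> dict:
    validate(request)
    campaign_id = write_campaign_state(request)
    enqueue("generate_campaign", campaign_id)
    enqueue("follow_up_automation", campaign_id)
    # Assumed response shape: 202 signals "accepted, processing continues".
    return {"status_code": 202, "campaign_id": campaign_id, "status": "pending"}
```

Calling `post_campaign({"name": "spring-launch"})` returns as soon as the state is written and two jobs are queued. The slow work is left for a separate worker to pick up.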

What the numbers said

| Signal | Observed change | Why it mattered |
| Throughput | +40% | Removing heavier automation work from the synchronous path let the service absorb more traffic without the same contention. |
| P99 latency | -25% | The long tail improved once slow background work stopped competing directly with response-time-critical requests. |
| Response behavior | Faster acknowledgement, slower work handled out-of-band | The API became easier to reason about because the request path no longer tried to complete unrelated automation before replying. |
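
For readers less familiar with tail metrics, the sketch below shows why a median can look healthy while P99 does not. The sample values are synthetic and chosen purely for illustration; they are not SalesUp measurements, and `latency_summary` is a hypothetical helper name.

```python
import statistics


def latency_summary(samples_ms: list) -> dict:
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": statistics.median(samples_ms), "p99": cuts[98]}


# Synthetic data: 98 fast requests and 2 that waited on slow inline work.
samples = [20.0] * 98 + [900.0] * 2
summary = latency_summary(samples)
```

With these made-up samples the median stays at the fast value while P99 lands on the slow one, which is the pattern a dashboard that only tracks averages or medians tends to hide.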

What still needed care

Moving work off the request path does not remove complexity. It moves it. Once the slower jobs lived in a background path, retries, queue behavior, and failure visibility mattered more. That tradeoff was worth it, but it reinforced the point that architecture decisions only move pain around if you do not also make the new boundary explicit.
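
One way to make that boundary explicit is a worker loop with a bounded retry budget and a visible place for jobs that exhaust it. The sketch below is an assumption-heavy illustration, not the production worker: `Job`, `run_worker`, `MAX_ATTEMPTS`, and the `dead_letter` list are all hypothetical names, and the queue is in-memory.

```python
import queue
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed retry budget, not a figure from the original system


@dataclass
class Job:
    name: str
    campaign_id: str
    attempts: int = 0


def run_worker(jobs: queue.Queue, handlers: dict, dead_letter: list) -> None:
    """Drain the queue, retrying failed jobs until the budget is spent."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        job.attempts += 1
        try:
            handlers[job.name](job.campaign_id)
        except Exception as exc:
            if job.attempts < MAX_ATTEMPTS:
                # Put the job back on the queue for another attempt.
                jobs.put(job)
            else:
                # Keep the failure visible instead of dropping it silently.
                dead_letter.append((job, repr(exc)))
```

Because a job may run more than once under this scheme, each handler needs to be safe to repeat. A real deployment would likely also add backoff between attempts and alerting on the dead-letter list.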

Transferable principle

When an endpoint has a widening tail, the first question I ask now is not “which line is slow?” It is “what work on this path does not actually belong to the response?”
