
Scaling Your Web Infrastructure: When and How to Do It Right

Stratpace Team · 18 January 2026 · 7 min read

Scaling by symptom, not by visitor count

Most "when to scale your infrastructure" advice frames the decision around traffic. "Once you hit ten thousand monthly visitors, do X. Once you hit a hundred thousand, do Y." This is a comforting structure and it's mostly nonsense. Two sites with the same monthly traffic can have wildly different infrastructure needs depending on their workload shape, peak-to-average ratio, and database access pattern.

A more useful frame: scale when the symptoms tell you to. The symptoms are specific and observable, and they show up in your monitoring before they show up in your support inbox. What follows is a staging of the scaling decision that maps to operational reality rather than to a visitor count.

What serverless gives you for free at the early stages

If you're starting in 2026 on Vercel, Cloudflare, Netlify, or AWS Lambda fronted by API Gateway, a number of scaling problems are quietly solved for you on day one.

You don't manage server capacity. The platform spins up instances on demand and scales them to zero when they're idle. Cold starts on modern serverless runtimes (especially edge runtimes) are well under a second for most workloads, and warm invocations are in the low double-digit milliseconds.

You get a global CDN for static assets without configuring one. The build output is replicated to edge nodes, and the cache hit ratio for assets is close to 100 percent without any work on your part.

You get TLS, HTTP/2, HTTP/3, automatic deploy previews per branch, and DDoS mitigation in front of your origin. Each of these is a few days of work to set up correctly on a self-hosted stack, and you would never have got around to all of them.

For a small or mid-size site with reasonable traffic, this is the answer for years. The architecture is doing the scaling for you, and your job is to write code that doesn't fight it.

What serverless doesn't give you

The thing serverless cannot do for you is open a database connection pool. Each invocation is, conceptually, a fresh process that wants its own connection. At low traffic this is fine. At higher concurrency, your Postgres instance runs out of connections, and the symptom is that requests start failing with "too many clients already" before any other resource is exhausted.

This is the most common scaling problem we see on sites that have outgrown their starting setup. The fix isn't a bigger Postgres. It's a connection pooler.

PgBouncer in transaction-pooling mode sits between your application and Postgres, multiplexing many short-lived application connections onto a small pool of real database connections. Most managed Postgres providers (Neon, Supabase, AWS via RDS Proxy, and others) ship a pooler URL alongside the direct one. Use the pooler URL for serverless workloads. The cost is some loss of features that depend on session state (advisory locks, prepared statement caching), and almost everyone can live with that.
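A minimal sketch of what that looks like in application code, using node-postgres; the environment variable name and pool size are illustrative, and the pooler endpoint is whatever your provider hands you:

```ts
// db.ts — a sketch only; DATABASE_POOLER_URL is an illustrative name for
// the provider's PgBouncer-style endpoint.
import { Pool } from "pg";

// One small pool per warm instance. The real multiplexing happens in the
// pooler, so keep max low to avoid multiplying connections across instances.
export const db = new Pool({
  connectionString: process.env.DATABASE_POOLER_URL,
  max: 5,                    // per-instance cap; the pooler holds the real pool
  idleTimeoutMillis: 10_000,
});

export async function getUser(id: string) {
  // Transaction pooling means no session state: avoid SET, advisory locks,
  // and named prepared statements on these connections.
  const { rows } = await db.query(
    "SELECT id, email FROM users WHERE id = $1",
    [id]
  );
  return rows[0] ?? null;
}
```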

Once the pooler is in place, the next class of database scaling problem appears: read load on the primary. Which brings us to the next stage.

Symptoms and the moves they imply

Here is a map of the early- to mid-stage symptoms and what to do about each one.

Symptom: p95 response time creeps from 200 milliseconds to 600 milliseconds over a few weeks with no obvious traffic change. Almost always the database. Open the slow-query log; you'll find a query whose plan changed because its table grew past a threshold and the planner now prefers a sequential scan. Add or fix the index. This is more often the cause than the framework, the runtime, or the network.
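One way to find the offender, sketched with node-postgres against pg_stat_statements (the extension must be enabled, and the column names here are the Postgres 13+ ones):

```ts
// slow-queries.ts — a sketch: pull the worst offenders from
// pg_stat_statements, then fix the index the plan is missing.
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

export async function topSlowQueries(limit = 10) {
  const { rows } = await db.query(
    `SELECT query, calls, mean_exec_time, total_exec_time
       FROM pg_stat_statements
      ORDER BY mean_exec_time DESC
      LIMIT $1`,
    [limit]
  );
  return rows;
}

// Once the culprit is identified, the fix is usually an index, e.g.:
//   CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
// CONCURRENTLY builds the index without a lock that blocks writes.
```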

Symptom: response times spike under a specific kind of load (a marketing email goes out and the site slows down for ten minutes). You're hitting database connection pressure or you're CPU-bound on a hot endpoint. Connection pressure shows in the pooler metrics; CPU pressure shows in your APM. Either way, the fix isn't more origin instances; it's reducing the work each request does. Cache the hot read, defer the non-essential write, and look at what's running synchronously that could run later.
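A sketch of the first of those moves, caching the hot read in process; the key, TTL, and loader are illustrative, and on serverless the cache is per warm instance, which is still enough to flatten a ten-minute spike:

```ts
// hot-read-cache.ts — a minimal in-process TTL cache, a sketch only.
type Entry<T> = { value: T; expires: number };
const cache = new Map<string, Entry<unknown>>();

export async function cached<T>(
  key: string,
  ttlMs: number,
  load: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key) as Entry<T> | undefined;
  if (hit && hit.expires > Date.now()) return hit.value;
  const value = await load(); // one database round trip on a miss
  cache.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}

// Usage: serve the landing-page data from memory for 30 seconds instead of
// hitting Postgres once per request during the email blast.
// const data = await cached("landing:v1", 30_000, () => fetchLandingData());
```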

Symptom: database CPU sustained over 60 percent during business hours. Time to add a read replica. Route read-heavy traffic (the homepage, the catalogue pages, the marketing pages) to the replica and keep writes on the primary. Most managed Postgres providers make this a one-click operation in 2026. The application change is small; you point your read pool at a different connection string. Be careful about read-after-write expectations: replicas lag slightly behind the primary, and code that writes a row and then reads it back on the very next request will sometimes miss it.
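The application change really is small. A sketch, assuming your provider exposes separate connection strings for the primary and the replica (the environment variable names are illustrative):

```ts
// db.ts — two pools, a sketch only.
import { Pool } from "pg";

const primary = new Pool({ connectionString: process.env.DATABASE_URL });
const replica = new Pool({ connectionString: process.env.DATABASE_REPLICA_URL });

// Reads that tolerate a little replication lag go to the replica.
export const readQuery = (text: string, params?: unknown[]) =>
  replica.query(text, params);

// Writes, and any read that must see the write it just made, stay on the primary.
export const writeQuery = (text: string, params?: unknown[]) =>
  primary.query(text, params);
```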

Symptom: latency for users in regions far from your origin is two to three times worse than for nearby users. If you're already on Vercel or Cloudflare, your static and edge-rendered routes are already fast everywhere; the problem is your dynamic origin region. Either move dynamic routes to the edge runtime where you can, or add edge caching with revalidation for routes whose data doesn't change every second.
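A sketch of the edge-caching option, written in the style of a Next.js App Router route handler; the header values are illustrative and loadCatalogue stands in for your data layer:

```ts
// catalogue/route.ts — a sketch only.
async function loadCatalogue(): Promise<unknown> {
  return { products: [] }; // hypothetical placeholder for the real loader
}

export async function GET(): Promise<Response> {
  const body = JSON.stringify(await loadCatalogue());
  return new Response(body, {
    headers: {
      "Content-Type": "application/json",
      // The CDN serves this for 60s, then serves stale copies for up to
      // another 300s while it refetches in the background.
      "Cache-Control": "public, s-maxage=60, stale-while-revalidate=300",
    },
  });
}
```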

Symptom: an outage you didn't see coming. Almost always one of three things. A third-party service you depend on synchronously went down (Stripe, your auth provider, an analytics endpoint blocking the page). A deploy introduced a regression that only appears under load. Or a database operation locked a critical table for longer than the application could tolerate. The mitigation in all three cases is the same: don't make synchronous calls to third parties from the request path, deploy behind feature flags so regressions can be reverted without a redeploy, and watch your long-running queries.
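A sketch of the first mitigation, keeping a third-party call off the request path; the endpoint is illustrative and the timeout value is a judgment call:

```ts
// analytics.ts — fire-and-forget with a hard timeout, a sketch only.
export function trackEvent(name: string, payload: unknown): void {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 2_000);

  // Deliberately not awaited: the vendor's latency, or outage, never blocks
  // the user's request. On serverless, hand this promise to the runtime's
  // waitUntil-style API so the instance isn't frozen mid-flight.
  fetch("https://analytics.example.com/events", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name, payload }),
    signal: controller.signal,
  })
    .catch(() => {
      // An analytics failure must never become a user-facing error.
    })
    .finally(() => clearTimeout(timer));
}
```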

The pattern: scale the part the symptoms point at. Don't pre-scale on faith.

Monitoring you actually need

Three layers, none optional once you're past hobby traffic.

Real User Monitoring. The web-vitals library, or Vercel Analytics, or whatever your platform offers. This is the only data that tells you what your users actually experience. Synthetic data (Lighthouse, k6 from a single location) is useful for catching regressions; it isn't a substitute.
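Wiring the web-vitals library to a beacon endpoint takes a few lines; the endpoint path is illustrative, the library functions are real:

```ts
// rum.ts — ship Core Web Vitals from real sessions, a sketch only.
import { onCLS, onINP, onLCP, type Metric } from "web-vitals";

function send(metric: Metric): void {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    id: metric.id,
  });
  // sendBeacon survives page unload; fall back to fetch with keepalive.
  if (!navigator.sendBeacon("/api/vitals", body)) {
    fetch("/api/vitals", { method: "POST", body, keepalive: true });
  }
}

onCLS(send);
onINP(send);
onLCP(send);
```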

Application performance monitoring. Sentry, Honeycomb, Datadog, the OpenTelemetry stack. Per-request traces, error rates, slow endpoints. What you need it for, more than anything else, is the slow query that only appears for one user in a thousand and becomes what wakes you up at 2 am.

Database metrics. Connection counts, pool utilisation, query latency percentiles, lock waits, CPU, IOPS. Most managed Postgres providers expose these. Look at them weekly even when nothing's wrong, so the abnormal reading is obvious when it appears.
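If your provider's dashboard doesn't surface connection counts, Postgres itself will. A sketch that samples them from pg_stat_activity, a real system view:

```ts
// pool-check.ts — a sketch only.
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

export async function connectionSnapshot() {
  const { rows } = await db.query(`
    SELECT state, count(*)::int AS n
      FROM pg_stat_activity
     WHERE datname = current_database()
     GROUP BY state
  `);
  // e.g. [{ state: "active", n: 4 }, { state: "idle", n: 20 }]
  return rows;
}
```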

The shape that works best is alerts on rates of change, not on absolute thresholds. "p95 latency is currently 400 milliseconds" might be normal. "p95 latency has doubled in the last hour" is always interesting.
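The check itself is trivial once you have the two numbers; a sketch, with the factor of two as an illustrative threshold:

```ts
// alert.ts — rate-of-change alerting, a sketch only.
export function shouldAlert(
  p95Now: number,
  p95HourAgo: number,
  factor = 2
): boolean {
  if (p95HourAgo <= 0) return false; // no baseline yet
  return p95Now / p95HourAgo >= factor; // "doubled in the last hour"
}

// 400ms might be normal and never fires; 200ms -> 410ms in an hour does:
// shouldAlert(410, 200) === true; shouldAlert(400, 380) === false
```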

A few decisions that age well

Keep the database close to the application. Postgres in eu-west-2 and an application in us-east-1 will hurt at any scale.

Don't put third-party scripts on the critical render path. Every analytics tag, chat widget, or cookie banner loaded synchronously is a vendor with a hand on your error budget. Load them after first paint, with fetchpriority="low", or not at all.
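A sketch of deferring a vendor script until after the page has loaded; the URL is illustrative:

```ts
// third-party.ts — inject a vendor script off the critical path, a sketch only.
export function loadAfterPaint(src: string): void {
  const inject = () => {
    const s = document.createElement("script");
    s.src = src;
    s.async = true;
    s.setAttribute("fetchpriority", "low"); // hint that the browser can wait
    document.head.appendChild(s);
  };
  if (document.readyState === "complete") inject();
  else window.addEventListener("load", inject, { once: true });
}

// loadAfterPaint("https://widget.example.com/chat.js");
```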

Pick boring storage. Postgres handles more workload than people give it credit for, and the cases where you genuinely need a different store are real but rare. If you don't have one of those workloads, Postgres is the answer.

Push slow work onto a queue. Sending emails, calling third-party APIs, generating PDFs: all belong on a queue, not on the request path.
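A sketch using BullMQ, which assumes a reachable Redis; the queue name, payload shape, and retry policy are illustrative:

```ts
// jobs.ts — a sketch only.
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

export const emailQueue = new Queue("emails", { connection });

// In the request handler: enqueue and return immediately.
export async function requestWelcomeEmail(userId: string): Promise<void> {
  await emailQueue.add(
    "welcome",
    { userId },
    { attempts: 3, backoff: { type: "exponential", delay: 5_000 } }
  );
}

// In a separate worker process: do the slow work, with retries on failure.
new Worker(
  "emails",
  async (job) => {
    // await mailer.sendWelcome(job.data.userId); // your provider call here
  },
  { connection }
);
```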

The summary

Scale on symptoms, not on visitor counts. Serverless absorbs most of the early scaling problems for you, in exchange for one specific cost (database connection management) that you pay with a pooler. After that, the moves are the ones the metrics point at: a missing index, a read replica, an edge cache, a synchronous third-party call moved off the critical path. RUM, APM, and database metrics are not optional. The best time to put them in is before the first incident; the second best time is right now.
