Performance & Scalability Design Essentials

You don’t need to build the next Netflix on day one. But you do need to avoid the obvious mistakes that make your app slow before you even launch.

Performance requirements matter upfront - not because you should optimize everything, but because some decisions are expensive to change later. Choosing a relational database when you need to handle millions of writes per second is a problem. Putting your CDN in the wrong region when all your users are on the other side of the world is a problem. Not thinking about these things at all is the biggest problem.

The good news: most performance issues follow predictable patterns. Fix the common ones during design, measure the rest, then optimize what actually matters.

Know Your Performance Requirements

Before you write a single line of code, answer these questions:

How fast does it need to respond?

API endpoints: Usually 200-500ms is acceptable
Page loads: Under 3 seconds for initial load
Interactive elements: Under 100ms or they feel sluggish
Background jobs: Depends entirely on the job

How many people will use it?

10 concurrent users? Don’t overthink it
1,000 concurrent users? You need caching
100,000 concurrent users? You need serious infrastructure planning

What’s the data volume?

1,000 records? Any database works
1 million records? Indexes matter
100 million records? Database design really matters

Write these numbers down. You’ll use them to make trade-off decisions. “Should I cache this?” becomes easier when you know you have 50 users, not 50,000.

A SaaS dashboard for 20 internal employees has different requirements than a public API serving mobile apps. One needs to work reliably. The other needs to work reliably and fast and scale.

Caching Strategy Basics

Caching means storing a copy of data somewhere faster than the original source. It’s the most effective performance improvement you can make.

There are four places to cache, from closest to the user to furthest:

Browser Cache Static assets like images, CSS, JavaScript files. Set HTTP cache headers and browsers store them locally. Your server never sees repeat requests.

Cache-Control: public, max-age=31536000, immutable

This tells browsers: “Keep this file for a year and never check if it changed.” Works great for versioned assets like app.v123.js.

CDN (Content Delivery Network) Geographic distribution of static files. User in Tokyo hits a Tokyo server. User in London hits a London server. Speed of light matters - round-trip time from Tokyo to New York is 200ms just from distance.

Put images, videos, CSS, JavaScript on a CDN. Most cloud providers offer this. Cloudflare, AWS CloudFront, Vercel Edge Network all work.

Application Cache Store database query results or expensive computations in memory. Redis and Memcached are common tools.

# Without cache: hits database every time
def get_user_profile(user_id):
    return db.query("SELECT * FROM users WHERE id = ?", user_id)

# With cache: hits database once, then serves from memory
def get_user_profile(user_id):
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached:
        return cached

    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.set(cache_key, user, expiry=300)  # Cache for 5 minutes
    return user

Database queries that take 50ms drop to under 1ms from cache. That’s a 50x improvement.

Database Cache The database itself caches frequently accessed data in RAM. You usually don’t configure this directly - the database handles it. But knowing it exists helps you understand why the first query is slow and subsequent ones are fast.

What to Cache

Cache things that:

Don’t change often (user profiles, product catalogs)
Are expensive to compute (analytics dashboards, reports)
Are accessed frequently (homepage content, navigation menus)

Don’t cache things that:

Change constantly (real-time stock prices, live sports scores)
Are unique per request (personalized recommendations based on current context)
Are already fast (simple queries on indexed columns)

How Long to Cache

This is a trade-off between performance and freshness.

Static assets: Hours to days (or until you deploy new code)
User profiles: Minutes to hours
Product inventory: Seconds to minutes
Homepage content: Minutes
Search results: Seconds

When in doubt, start with 5 minutes. You can always adjust.

Cache Invalidation

The classic computer science joke: “There are only two hard things in computer science: cache invalidation and naming things.”

It’s hard because you need to know when data changed. The simple approach: set an expiration time and accept slightly stale data. The complex approach: invalidate the cache whenever the underlying data changes.

For most apps, expiration times work fine. User updates their profile? They can wait 5 minutes to see the change reflected everywhere.

Async Processing for Expensive Operations

Some operations are too slow to do during a web request:

Generating a 200-page PDF report
Processing a 4K video upload
Sending 10,000 personalized emails
Running complex data analysis

If these take 30 seconds and you do them synchronously, the user stares at a loading spinner for 30 seconds. Their browser might timeout. They might close the tab.

The solution: acknowledge the request immediately, do the work in the background, notify when done.

# Synchronous - user waits 30 seconds
@app.route('/api/generate-report', methods=['POST'])
def generate_report():
    report = expensive_report_generation()  # 30 seconds
    return report

# Asynchronous - user gets response immediately
@app.route('/api/generate-report', methods=['POST'])
def generate_report():
    job_id = queue.enqueue(expensive_report_generation)
    return {"status": "processing", "job_id": job_id}

The user gets a response in 100ms. A background worker picks up the job, processes it, and sends an email or in-app notification when done.

Background job systems: Celery, Sidekiq, Bull, Cloud Tasks, SQS. Pick one that works with your language and infrastructure.

When to use async processing:

Operation takes more than 2-3 seconds
Operation uses significant CPU or memory
Operation can fail and needs retry logic
User doesn’t need immediate results

When to keep it synchronous:

User needs results to continue (login, payment processing)
Operation is fast (under 500ms)
Operation is simple and unlikely to fail

CDN for Static Assets

Your application server might be in Virginia. Your user might be in Singapore. That’s 15,000 kilometers and 200-300ms of latency just from physics.

Static files - images, CSS, JavaScript, fonts, videos - don’t change per user. Serve them from a CDN and the Singapore user hits a Singapore server. Latency drops to 20ms.

<!-- Slow: serving from your application server -->
<img src="https://yourapp.com/images/logo.png">

<!-- Fast: serving from CDN -->
<img src="https://cdn.yourapp.com/images/logo.png">

Most cloud platforms include CDN as a checkbox option. Vercel, Netlify, AWS CloudFront, Cloudflare - all handle this automatically if you configure them.

The performance gain is dramatic. A page with 20 images might load in 8 seconds from your server. Same page from a CDN loads in 2 seconds.

Database Query Basics

Slow database queries kill performance. Two common problems catch everyone at least once.

The N+1 Problem

This is the classic mistake. You query for a list of items, then loop through and query for related data. One query for the list, N queries for each item. Hence “N+1.”

# Bad - N+1 queries
posts = db.query("SELECT * FROM posts LIMIT 10")  # 1 query
for post in posts:
    author = db.query("SELECT * FROM users WHERE id = ?", post.author_id)  # N queries
    print(f"{post.title} by {author.name}")
# Total: 11 queries for 10 posts

Each query takes 5ms. That’s 55ms total. Seems fine. But if you have 100 posts, it’s 505ms. With 1,000 posts, it’s 5 seconds.

The fix: eager loading. Get everything in one or two queries.

# Good - 2 queries
posts = db.query("SELECT * FROM posts LIMIT 10")  # 1 query
author_ids = [p.author_id for p in posts]
authors = db.query("SELECT * FROM users WHERE id IN (?)", author_ids)  # 1 query

author_map = {a.id: a for a in authors}
for post in posts:
    author = author_map[post.author_id]
    print(f"{post.title} by {author.name}")
# Total: 2 queries for 10 posts

Same result, 95% less database load. Most ORMs have built-in eager loading. Use it.

Missing Indexes

Databases search through every row without indexes. Fast for 100 rows. Slow for 10,000 rows. Painfully slow for 1 million rows.

-- Slow - scans entire table
SELECT * FROM users WHERE email = 'user@example.com';

-- Fast - uses index
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';

Without the index: 200ms. With the index: 2ms.

Index columns you search by, filter by, or join on. Don’t index everything - indexes slow down writes and take up space. But index the columns you actually query.

Common candidates:

Primary keys (usually indexed automatically)
Foreign keys (used in joins)
Email addresses (used in login)
Status fields (used in filtering)
Created dates (used in sorting)

When to Optimize

This might be the most important section.

Don’t optimize code you haven’t written yet. Don’t optimize code that’s already fast enough. Don’t optimize code you’re guessing is slow.

The process:

Build it
Measure it
Identify actual bottlenecks
Fix the slowest thing
Measure again

“Premature optimization is the root of all evil” is attributed to Donald Knuth. He’s right. Complex caching logic, convoluted async processing, denormalized database schemas - these make code harder to understand and maintain. Only add complexity when measurements prove you need it.

That said, there are free optimizations:

Use database indexes on columns you query
Serve static assets from a CDN
Cache responses that don’t change often
Don’t do expensive work synchronously if users can wait

These are cheap to implement and save headaches later.

But choosing a specialized database because “it’s faster” without measuring? Rewriting your API in a “faster language” without profiling? That’s premature optimization.

Your app is probably slow because:

Missing database indexes
N+1 queries
No caching
Static assets served from application server
Synchronous processing of slow operations

Fix those first. Then measure. You might be done.

Common Mistakes

Optimizing the wrong thing. You speed up a function from 5ms to 1ms. Great. But that function is called once per request and the request takes 500ms. You saved 0.8% of the total time.

Measure first. Fix the biggest bottleneck. A 200ms database query matters more than a 5ms function.

Over-caching. Caching adds complexity. Invalidation is hard. Stale data causes bugs. If something is already fast, don’t cache it.

Ignoring the N+1 problem. It’s fine with 10 records. It’s a disaster with 10,000. If you’re looping and querying, you probably have an N+1 problem.

No monitoring. You can’t improve what you don’t measure. Add basic performance monitoring from day one. Many platforms include this. Use it.

Assuming what’s slow. Developers are terrible at guessing where time is spent. I’ve seen teams rewrite entire modules to optimize something that took 2% of request time while ignoring a database query taking 60%.

Key Takeaways

Performance isn’t about building the fastest possible system. It’s about building a system that’s fast enough for your users and your scale.

Start with these practices:

Define acceptable response times before you build
Cache static assets and serve from a CDN
Use background jobs for slow operations
Index database columns you query
Watch for N+1 queries

Then measure real usage and optimize what actually matters.

Your first version doesn’t need to scale to a million users. It needs to work reliably for the users you actually have. When you grow, measure where it slows down, then optimize those specific bottlenecks.

Good enough today beats perfect someday. Ship it, measure it, improve it.