TCP connection limiting in Go: Blocking Accept vs Active Load Shedding

I recently reviewed a Go codebase that implemented connection limiting in a TCP proxy. The goal was to protect the backend from being overwhelmed by too many concurrent connections.

The initial implementation used a semaphore to gatekeep the Accept() call. While this seems intuitive, it relies on the kernel’s backlog behavior in ways that can cause significant operational issues.

TL;DR #

Don’t block before calling Accept(). If you do, the OS continues to complete TCP handshakes and queues connections in the kernel backlog. Clients believe they are connected, but your application is not processing them.

Instead, use Active Load Shedding: Accept() the connection immediately, check your limits, and if full, explicitly Close() the connection. This gives the client immediate feedback. Pair the active soft limit with a higher hard cap as a safety valve, and keep a buffer between the two (e.g., 20% or more), sized by your file-descriptor and memory limits.

The “Blocking Accept” Pattern #

The pattern I encountered looked like this:

// Accept blocks until a permit is available, then accepts the next
// connection; the permit is released when the returned conn is closed.
func (l *LimitedListener) Accept() (net.Conn, error) {
    // 1. Wait for a permit (blocks when the limit is reached)
    l.semaphore <- struct{}{}

    // 2. Accept the connection
    conn, err := l.listener.Accept()
    if err != nil {
        <-l.semaphore // release the permit if accept failed
        return nil, err
    }

    return &wrappedConn{conn, l.semaphore}, nil
}

The logic is straightforward: “Wait until we have capacity, then accept the next client.” It reads like cooperative concurrency, but it ignores the underlying TCP mechanics.
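
The Accept path hands the permit to a wrappedConn, which must return it when the connection is closed. A minimal sketch of such a wrapper (an assumption; real code should also guard against double Close, e.g., with sync.Once):

type wrappedConn struct {
    net.Conn
    semaphore chan struct{}
}

// Close closes the underlying connection and gives the permit back,
// freeing capacity for the next client.
func (c *wrappedConn) Close() error {
    err := c.Conn.Close()
    <-c.semaphore
    return err
}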

The Kernel Backlog #

The problem lies in what happens when Accept() is not called.

When your application stops calling Accept(), the operating system does not stop receiving connections. The kernel continues to complete the 3-way TCP handshake with clients. Once established, these connections are placed into the accept queue (backlog). This queue grows until it hits the limit defined by the listen(2) backlog argument (itself capped by net.core.somaxconn on Linux).

In the blocking model, when the semaphore fills up, the application loop pauses. Meanwhile, new clients are still connecting. The kernel ACKs their SYN packets and queues them, expecting the application to pick them up shortly.
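
You can see this with a small experiment: a listener that never calls Accept() still looks fully connectable from the client side. A minimal sketch (how many connections the kernel will queue depends on the listen backlog, e.g., net.core.somaxconn on Linux):

package main

import (
    "fmt"
    "net"
)

func main() {
    // Listen, but never call Accept().
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        panic(err)
    }
    defer ln.Close()

    // The kernel still completes the handshakes and parks the
    // connections in the accept queue, so every Dial succeeds.
    for i := 0; i < 5; i++ {
        conn, err := net.Dial("tcp", ln.Addr().String())
        if err != nil {
            fmt.Println("dial failed:", err)
            continue
        }
        defer conn.Close()
        fmt.Println("client", i, "connected via", conn.LocalAddr())
    }
}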

The Client Experience #

From the client’s perspective, the connection appears established.

  1. The TCP handshake is complete. The client starts sending data.
  2. Since the application hasn’t called Accept(), it isn’t reading from the socket. The client’s writes fill the TCP receive window and then block.
  3. Eventually, the client times out. This often takes significantly longer than a connection refusal.

This behavior masks the overload. Upstream load balancers may even consider the node healthy because the TCP port is open and accepting connections (into the kernel backlog), effectively black-holing traffic.
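
From the client side, the failure mode looks roughly like this sketch: Dial succeeds, the first writes succeed because they land in socket buffers, and the error that finally surfaces is a timeout rather than a refusal (probeStalledServer is a hypothetical helper; it needs the log, net, and time imports):

func probeStalledServer(addr string) {
    conn, err := net.Dial("tcp", addr)
    if err != nil {
        log.Println("refused:", err) // the fast feedback we would prefer
        return
    }
    defer conn.Close()

    // Keep writing until the server's receive window and our send
    // buffer fill up, then the deadline fires.
    conn.SetWriteDeadline(time.Now().Add(5 * time.Second))
    buf := make([]byte, 64*1024)
    for {
        if _, err := conn.Write(buf); err != nil {
            log.Println("write stalled:", err) // i/o timeout, not "connection refused"
            return
        }
    }
}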

Active Load Shedding #

An alternative is to move the gatekeeping after the Accept() call.

The logic changes to:

  1. The application loop spins on listener.Accept() as fast as possible.
  2. Once you hold the net.Conn, check the concurrency limit (e.g., with a non-blocking send to a buffered channel).
  3. If over the limit, Close() the connection immediately.

func (l *LimitedListener) Accept() (net.Conn, error) {
    for {
        // 1. Always Accept immediately
        conn, err := l.listener.Accept()
        if err != nil {
            return nil, err
        }

        // 2. Check limits non-blocking
        select {
        case l.semaphore <- struct{}{}:
            // We have capacity, proceed
            return &wrappedConn{conn, l.semaphore}, nil
        default:
            // 3. Overload! Shed the load.
            conn.Close()
            metrics.Inc("connection_shed")
            continue
        }
    }
}

Why this is better

  • Immediate feedback: the client receives a FIN (or an RST, depending on how you close) right away instead of hanging.
  • The kernel backlog stays close to empty, so valid requests are processed with minimal latency when capacity frees up.
  • You can count exactly how many connections were rejected. In the blocking model, the queued-up connections are invisible to the application until they show up in OS-level packet counters.
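
If you want the rejection to look like an abort (RST) rather than a graceful close (FIN), one option on a *net.TCPConn is to set SO_LINGER to zero before closing; a small sketch:

// shedWithReset aborts the connection instead of closing it gracefully.
// SetLinger(0) discards unsent data and makes Close send an RST.
func shedWithReset(conn net.Conn) {
    if tc, ok := conn.(*net.TCPConn); ok {
        tc.SetLinger(0)
    }
    conn.Close()
}

An aborted connection also skips TIME_WAIT on the closing side, which matters if you expect to shed connections in large volumes.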

Visualization

It’s Gemini 3 release day, so we can even visualize this. Note how the kernel backlog grows and clients eventually time out as you balance the traffic load against the server limit.

How modern proxies handle this #

It is worth noting that mature proxies like Nginx and HAProxy often use both patterns, but for different purposes.

Process protection with Blocking Accept

They use a global limit to protect the proxy process itself from running out of file descriptors or memory. When this limit is reached, they stop calling Accept(), causing the kernel backlog to fill.

Traffic control with Active Load Shedding

For controlling traffic to backends or limiting specific clients, they use active strategies.

  • Nginx: The limit_conn module accepts the connection, checks the limit, and then actively closes it (or returns 503 for HTTP) if the limit is exceeded.
  • Traefik: The InFlightConn middleware functions as a gatekeeper after the connection is accepted, closing it immediately if the limit is reached.

The key takeaway is that “Blocking Accept” should be a last-resort safety valve for the process itself, not the primary mechanism for shaping traffic or protecting backends.

Soft vs Hard Limits (in practice) #

For production systems, you often need both strategies working in tandem.

  1. Soft limit (active load shedding) is your primary traffic control. When the limit is reached (e.g., 10k active connections), you Accept() and immediately Close() (or return HTTP 429). The client gets immediate feedback and can retry.
  2. Hard limit (blocking accept) is a safety valve set higher than the soft limit. If the soft-limit logic itself is overwhelmed, or if you are running out of file descriptors, this limit stops the Accept() loop. The kernel backlog then fills, protecting the process from crashing.

Keep a buffer between the soft and hard limits. A good rule of thumb is to set the hard limit at least 20-25% higher than the soft limit; for example, a soft limit of 10k connections pairs with a hard limit of roughly 12-12.5k. This buffer lets the application accept and rapidly close excess connections without triggering the hard-limit stall. Keep the hard limit below your system’s ulimit -n, minus descriptors reserved for logs and backend connections.
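
For HTTP services, the same soft-limit idea can live in a middleware that sheds with a 429 instead of closing raw TCP connections; a minimal sketch using net/http (limitInFlight is a hypothetical helper):

// limitInFlight allows at most max concurrent requests; beyond that it
// responds with 429 Too Many Requests so clients can back off and retry.
func limitInFlight(next http.Handler, max int) http.Handler {
    sem := make(chan struct{}, max)
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        select {
        case sem <- struct{}{}:
            defer func() { <-sem }()
            next.ServeHTTP(w, r)
        default:
            w.Header().Set("Retry-After", "1")
            http.Error(w, "server busy", http.StatusTooManyRequests)
        }
    })
}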

Implementation sketch #

func (l *HybridListener) Accept() (net.Conn, error) {
    for {
        // 1. HARD LIMIT (Blocking)
        // Protects the process from running out of FDs.
        // Blocks here if hard limit is reached -> Kernel backlog fills.
        l.hardLimit <- struct{}{}

        conn, err := l.listener.Accept()
        if err != nil {
            <-l.hardLimit
            return nil, err
        }

        // 2. SOFT LIMIT (Non-blocking)
        // Traffic shaping. Fail fast if over business limit.
        select {
        case l.softLimit <- struct{}{}:
            // Success: tracked by both limits
            return &wrappedConn{conn, l.softLimit, l.hardLimit}, nil
        default:
            // Soft limit full: Reject immediately
            conn.Close()
            <-l.hardLimit // Release hard limit reservation
            metrics.Inc("connection_shed")
            continue // Loop back to accept next
        }
    }
}
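
Wiring it up might look like the following (a sketch that assumes HybridListener has listener, softLimit, and hardLimit fields, and that handle is your connection handler, which must Close() the conn to release both permits):

ln, err := net.Listen("tcp", ":8080")
if err != nil {
    log.Fatal(err)
}
limited := &HybridListener{
    listener:  ln,
    softLimit: make(chan struct{}, 10000), // soft cap: traffic shaping
    hardLimit: make(chan struct{}, 12500), // hard cap: ~25% headroom
}
for {
    conn, err := limited.Accept()
    if err != nil {
        log.Fatal(err)
    }
    go handle(conn) // handle must Close() conn to release the permits
}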

Infrastructure Tuning #

  • Ensure your LB health check endpoint is not affected by the soft connection limit. It should live on a separate port or be whitelisted; otherwise, a soft-limit rejection might cause the LB to mark the node as unhealthy and cascade the failure. Hard limits are trickier: if an instance is hard-overloaded, you may actually want the LB to take it out of rotation.
  • Increase ulimit -n and fs.file-max above your hard limit.
  • Increase net.core.somaxconn and net.ipv4.tcp_max_syn_backlog to absorb bursts when the hard limit is briefly hit.
  • Active shedding leaves many sockets in TIME_WAIT on the side that closes first (the proxy). net.ipv4.tcp_tw_reuse helps reuse such sockets for outgoing connections (e.g., proxy-to-backend), and lowering net.ipv4.tcp_fin_timeout frees sockets stuck in FIN_WAIT_2 sooner.
  • Widen the ephemeral port range for outbound connections to backends via net.ipv4.ip_local_port_range (e.g., 1024-65535).
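
On the application side, a startup check can verify that the process file-descriptor limit actually covers the hard limit plus reservations; a Unix-only sketch (ensureFDLimit is a hypothetical helper built on the syscall and fmt packages):

// ensureFDLimit checks that RLIMIT_NOFILE covers the hard connection
// limit plus reserved descriptors, raising the soft limit if needed.
func ensureFDLimit(hardLimit, reserved uint64) error {
    need := hardLimit + reserved

    var rl syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
        return err
    }
    if rl.Max < need {
        return fmt.Errorf("RLIMIT_NOFILE max %d is below required %d; raise ulimit -n", rl.Max, need)
    }
    if rl.Cur < need {
        // Raise the soft limit up to what we need (recent Go runtimes
        // already raise it toward the hard limit at startup).
        rl.Cur = need
        return syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl)
    }
    return nil
}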

Summary #

In network programming, relying on the kernel queue for backpressure is rarely the right choice for user-facing services. Accept the work, assess capacity, and if full, explicitly reject the connection.

Resources #

  1. SYN packet handling in the wild (Cloudflare) - A deep dive into the mechanics of Linux TCP queues.
  2. Using load shedding to avoid overload (AWS Builders Library) - Broader patterns for protecting services from overload.
  3. Building Blocks of TCP (High Performance Browser Networking) - Essential reading for understanding TCP handshakes and queues.