TCP connection limiting in Go: Blocking Accept vs Active Load Shedding
I recently reviewed a Go codebase that implemented connection limiting in a TCP proxy. The goal was to protect the backend from being overwhelmed by too many concurrent connections.
The initial implementation used a semaphore to gatekeep the Accept() call. While this seems intuitive, it relies on the kernel’s backlog behavior in ways that can cause significant operational issues.
TL;DR #
Don’t block before calling Accept(). If you do, the OS continues to complete TCP handshakes and queues connections in the kernel backlog. Clients believe they are connected, but your application is not processing them.
Instead, use Active Load Shedding: Accept() the connection immediately, check your limits, and if full, explicitly Close() the connection. This provides immediate feedback to the client. Use active load shedding as the primary control, with a higher hard cap as a safety valve; keep a buffer between the two limits (at least ~20%), sized by your FD/memory limits.
The “Blocking Accept” Pattern #
The pattern I encountered looked like this:
func (l *LimitedListener) Accept() (net.Conn, error) {
	// 1. Wait for a permit
	l.semaphore <- struct{}{}

	// 2. Accept the connection
	conn, err := l.listener.Accept()
	if err != nil {
		<-l.semaphore // release if accept failed
		return nil, err
	}
	return &wrappedConn{conn, l.semaphore}, nil
}
The logic is straightforward: “Wait until we have capacity, then accept the next client.” It reads like cooperative concurrency, but it ignores the underlying TCP mechanics.
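For completeness, a minimal sketch of the supporting types this snippet assumes (the exact layout is my guess, not taken from the reviewed codebase; the critical detail is that the wrapper's Close() must release the permit, or the limiter leaks and Accept() blocks forever):

type LimitedListener struct {
	listener  net.Listener
	semaphore chan struct{} // buffered to the connection limit
}

type wrappedConn struct {
	net.Conn
	semaphore chan struct{}
}

// Close releases the permit so a blocked Accept() can proceed.
// NOTE: a production version should guard against double Close (e.g., sync.Once).
func (c *wrappedConn) Close() error {
	err := c.Conn.Close()
	<-c.semaphore
	return err
}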
The Kernel Backlog #
The problem lies in what happens when Accept() is not called.
When your application stops calling Accept(), the operating system does not stop receiving connections. The kernel continues to complete the 3-way TCP handshake with clients. Once established, these connections are placed into the accept queue (backlog). This queue grows until it hits the limit defined by the listen(2) backlog argument.
In the blocking model, when the semaphore fills up, the application loop pauses. Meanwhile, new clients are still connecting. The kernel ACKs their SYN packets and queues them, expecting the application to pick them up shortly.
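A tiny experiment (a hypothetical local setup, not from the reviewed codebase) makes this visible: the listener below never calls Accept(), yet the client's dial succeeds because the kernel completes the handshake and parks the connection in the accept queue.

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Listen, but never call Accept(): the kernel still completes handshakes.
	ln, err := net.Listen("tcp", "127.0.0.1:9000")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// The client connects successfully even though nothing will ever read it.
	conn, err := net.DialTimeout("tcp", "127.0.0.1:9000", time.Second)
	fmt.Println("dial error:", err) // nil: the handshake completed in the kernel
	if conn != nil {
		conn.Close()
	}
}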
The Client Experience #
From the client’s perspective, the connection appears established.
- The TCP handshake is complete. The client starts sending data.
- Since the application hasn’t called Accept(), it isn’t reading from the socket. The client’s writes fill the TCP receive window and then block.
- Eventually, the client times out. This often takes significantly longer than a connection refusal.
This behavior masks the overload. Upstream load balancers may even consider the node healthy because the TCP port is open and accepting connections (into the kernel backlog), effectively black-holing traffic.
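To see the same thing from the client's side, a minimal sketch against the never-accepting listener above (the 64 KiB payload and 2-second deadline are arbitrary; imports as in the previous example):

// demoClient shows the client-side symptom: writes succeed while the client
// send buffer and the server's receive window absorb data, then block, then
// fail with an i/o timeout instead of "connection refused".
func demoClient() {
	conn, err := net.DialTimeout("tcp", "127.0.0.1:9000", time.Second)
	if err != nil {
		panic(err) // a refusal would surface here; it doesn't
	}
	defer conn.Close()

	conn.SetWriteDeadline(time.Now().Add(2 * time.Second))
	payload := make([]byte, 64*1024)
	for {
		if _, err := conn.Write(payload); err != nil {
			fmt.Println("write failed:", err) // i/o timeout, long after "connecting"
			return
		}
	}
}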
Active Load Shedding #
An alternative is to move the gatekeeping after the Accept() call.
The logic changes to:
- The application loop spins on listener.Accept() as fast as possible.
- Once you hold the net.Conn, check the concurrency limit (e.g., with a non-blocking send to a buffered channel).
- If over the limit, Close() the connection immediately.
func (l *LimitedListener) Accept() (net.Conn, error) {
	for {
		// 1. Always Accept immediately
		conn, err := l.listener.Accept()
		if err != nil {
			return nil, err
		}

		// 2. Check limits non-blocking
		select {
		case l.semaphore <- struct{}{}:
			// We have capacity, proceed
			return &wrappedConn{conn, l.semaphore}, nil
		default:
			// 3. Overload! Shed the load.
			conn.Close()
			metrics.Inc("connection_shed")
			continue
		}
	}
}
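Wiring this up looks like any other net.Listener; a sketch where the listen address, buffer size, and handle() function are illustrative:

// Illustrative wiring: wrap the raw listener; shedding happens inside Accept().
ln, err := net.Listen("tcp", ":8080")
if err != nil {
	log.Fatal(err)
}
limited := &LimitedListener{
	listener:  ln,
	semaphore: make(chan struct{}, 1024), // max concurrent connections
}
for {
	conn, err := limited.Accept()
	if err != nil {
		log.Fatal(err)
	}
	go handle(conn) // handle() must Close() the conn to release the permit
}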
Why this is better
- Immediate feedback: the client receives a FIN (or a RST, depending on how you close; see the sketch after this list). There is no hanging.
- The kernel backlog stays empty, ensuring that valid requests are processed with minimal latency when capacity frees up.
- You can count exactly how many connections were rejected. In the blocking model, the rejected connections are invisible to the application until they show up in OS-level packet counters.
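To force the RST variant, Go exposes the socket linger option on *net.TCPConn; a minimal sketch (assuming conn is the just-accepted connection in the shedding branch):

// A linger of 0 makes Close() discard unsent data and send a RST,
// so the client fails fast instead of seeing a graceful shutdown.
if tc, ok := conn.(*net.TCPConn); ok {
	tc.SetLinger(0)
}
conn.Close()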
Visualization
It’s Gemini 3 release day, so we can even visualize this. Note how the kernel backlog grows and clients eventually time out as you adjust the traffic load against the server’s limit.
How modern proxies handle this #
It is worth noting that mature proxies like Nginx and HAProxy often use both patterns, but for different purposes.
Process protection with Blocking Accept
They use a global limit to protect the proxy process itself from running out of file descriptors or memory. When this limit is reached, they stop calling Accept(), causing the kernel backlog to fill.
- Nginx: worker_connections acts as this hard limit.
- HAProxy: The maxconn setting defines this process-wide limit.
Traffic control with Active Load Shedding
For controlling traffic to backends or limiting specific clients, they use active strategies.
- Nginx: The limit_conn module accepts the connection, checks the limit, and then actively closes it (or returns 503 for HTTP) if the limit is exceeded.
- Traefik: The InFlightConn middleware functions as a gatekeeper after the connection is accepted, closing it immediately if the limit is reached.
The key takeaway is that “Blocking Accept” should be a last-resort safety valve for the process itself, not the primary mechanism for shaping traffic or protecting backends.
Soft vs Hard Limits (in practice) #
For production systems, you often need both strategies working in tandem.
- Soft limit (active load shedding) is your primary traffic control. When the limit is reached (e.g., 10k active connections), you Accept() and immediately Close() (or return HTTP 429). The client gets immediate feedback and can retry.
- Hard limit (blocking accept) is a safety valve, set higher than the soft limit. If the soft limit logic itself is overwhelmed or you are running out of file descriptors, this limit stops the Accept() loop. This causes the kernel backlog to fill, protecting the process from crashing.
Keep a buffer between the soft and hard limits. A good rule of thumb is to set the hard limit at least 20-25% higher than the soft limit. This buffer lets the application accept and rapidly close excess connections without triggering the hard-limit stall. Also make sure the hard limit stays below your system’s ulimit -n (minus file descriptors reserved for logs and backend connections).
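As a sketch of that sizing, the two caps can be derived from the process file-descriptor limit at startup (Linux; the 30% reservation and 25% headroom are assumptions to tune for your workload):

// Derive hard/soft caps from RLIMIT_NOFILE. Percentages are illustrative.
var rl syscall.Rlimit
if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
	log.Fatal(err)
}
reserved := rl.Cur * 30 / 100       // FDs kept for logs, backend dials, etc.
hardLimit := int(rl.Cur - reserved) // blocking safety valve
softLimit := hardLimit * 100 / 125  // ~25% headroom below the hard limit
log.Printf("soft=%d hard=%d (fd limit=%d)", softLimit, hardLimit, rl.Cur)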
Implementation sketch #
func (l *HybridListener) Accept() (net.Conn, error) {
	for {
		// 1. HARD LIMIT (Blocking)
		// Protects the process from running out of FDs.
		// Blocks here if hard limit is reached -> Kernel backlog fills.
		l.hardLimit <- struct{}{}

		conn, err := l.listener.Accept()
		if err != nil {
			<-l.hardLimit
			return nil, err
		}

		// 2. SOFT LIMIT (Non-blocking)
		// Traffic shaping. Fail fast if over business limit.
		select {
		case l.softLimit <- struct{}{}:
			// Success: tracked by both limits
			return &wrappedConn{conn, l.softLimit, l.hardLimit}, nil
		default:
			// Soft limit full: Reject immediately
			conn.Close()
			<-l.hardLimit // Release hard limit reservation
			metrics.Inc("connection_shed")
			continue // Loop back to accept next
		}
	}
}
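The sketch assumes a wrapped connection whose Close() returns both permits, so the hard limit unblocks Accept() and the soft limit frees a slot; a minimal version (field order matches the unkeyed literal above):

type wrappedConn struct {
	net.Conn
	softLimit chan struct{}
	hardLimit chan struct{}
}

// Close releases both permits after closing the underlying connection.
// NOTE: a production version should guard against double Close (e.g., sync.Once)
// so each permit is released exactly once.
func (c *wrappedConn) Close() error {
	err := c.Conn.Close()
	<-c.softLimit
	<-c.hardLimit
	return err
}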
Infrastructure Tuning #
- Ensure your LB health check endpoint is not affected by the soft connection limits. It should live on a separate port or be whitelisted; otherwise, a soft limit rejection might cause the LB to mark the node as unhealthy, cascading the failure. Hard limits are a bit trickier: you may actually want the LB to take a hard-overloaded instance out of rotation.
- Increase ulimit -n and fs.file-max above your hard limit.
- Increase net.core.somaxconn and net.ipv4.tcp_max_syn_backlog to absorb bursts when the hard limit is briefly hit.
- Since active shedding creates many sockets in TIME_WAIT, enable net.ipv4.tcp_tw_reuse and reduce net.ipv4.tcp_fin_timeout to free up resources faster.
- Maximize the ephemeral port range net.ipv4.ip_local_port_range (e.g., 1024-65535).
Summary #
In network programming, relying on the kernel queue for backpressure is rarely the right choice for user-facing services. Accept the work, assess capacity, and if full, explicitly reject the connection.
Resources #
- SYN packet handling in the wild (Cloudflare) - A deep dive into the mechanics of Linux TCP queues.
- Using load shedding to avoid overload (AWS Builders Library) - Broader patterns for protecting services from overload.
- Building Blocks of TCP (High Performance Browser Networking) - Essential reading for understanding TCP handshakes and queues.