Understanding Envoy Proxy's Hot Restart Implementation
Joshua Varghese
Posted on November 9, 2024
As modern distributed systems grow in complexity, the ability to update proxy configurations without dropping active connections has become crucial. In this post, I'll break down how Envoy Proxy implements its hot restart mechanism, a feature that allows seamless configuration updates and binary upgrades without disrupting existing connections.
What is Hot Restart?
Hot restart (or hot reload) is a mechanism that allows a proxy server to reload its configuration or upgrade its binary while keeping existing client connections alive. In Envoy's design, the new process takes over the listening sockets. The old process keeps the connections it has already accepted and closes them gracefully over a drain period. Because an established connection is never moved between processes, the handover itself does not drop it. It simply finishes on the process that accepted it.
Envoy's Approach
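Envoy implements hot restart with a parent/child model. The running (old) process is the parent, and the newly started process, launched with the next --restart-epoch value, is the child. Despite the names, they are not an operating-system parent and child. Both are typically started by an external restarter wrapper, and they coordinate over Unix domain sockets, which is also how the parent hands its listening socket descriptors to the child. Here's how it works: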
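- Shared Memory Architecture: Envoy maps a shared memory region that both the old and new processes can access. In current releases the region is small. It holds a version field that both processes must agree on, the locks that keep the two processes' log writes from interleaving, and a few flags. Older releases also kept statistics in shared memory. Today, counters and gauges are instead sent from the old process to the new one over the Unix domain socket channel. Below is a simplified, illustrative view of the HotRestartImpl class, not its actual definition: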
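```cpp
#include <cstdint>

struct SharedMemory;                          // mapped region: version, cross-process locks, flags
namespace Stats { class StatDataAllocator; }  // allocator used when stats lived in shared memory

// Simplified, illustrative sketch; not Envoy's actual class definition.
class HotRestartImpl {
private:
  static constexpr uint64_t MAX_STAT_SEGMENTS = 256;  // illustrative value
  SharedMemory* shmem_;                        // region mapped by both the old and new process
  Stats::StatDataAllocator* stats_allocator_;  // stats storage in the older shared-memory design
};
```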
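The shared region follows a create-once, attach-later pattern. Below is a minimal sketch of a header that a hot-restartable process might place at the start of its shared region. SharedRegion, kRegionVersion and attachRegion are hypothetical names for illustration, not Envoy's API. The mapping itself (for example shm_open plus mmap) is left out so the logic can be exercised on an ordinary buffer.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical header at the start of a shared memory region (illustrative
// names, not Envoy's definitions).
struct SharedRegion {
  uint64_t size;                // bytes the creating process laid out
  uint64_t version;             // layout version both processes must agree on
  std::atomic<uint64_t> flags;  // cross-process state bits, updated atomically
};

constexpr uint64_t kRegionVersion = 1;  // arbitrary value for this sketch

// Epoch 0 formats the region; later epochs only attach and verify it.
// `mem` must point to suitably aligned memory visible to both processes
// (in practice a MAP_SHARED mapping; any buffer works for the logic here).
SharedRegion* attachRegion(void* mem, uint32_t restart_epoch) {
  if (restart_epoch == 0) {
    auto* region = new (mem) SharedRegion();  // construct the header in place
    region->size = sizeof(SharedRegion);
    region->version = kRegionVersion;
    region->flags.store(0);
    return region;
  }
  auto* region = static_cast<SharedRegion*>(mem);
  // An incompatible layout means this binary cannot hot restart from the old one.
  if (region->size != sizeof(SharedRegion) || region->version != kRegionVersion) {
    return nullptr;
  }
  return region;
}
```

The sketch checks both the size and the version before touching anything else. That lets a mismatched binary fail fast instead of misreading memory laid out by a different build.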
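- Socket Passing Process: The hot restart follows these key steps:
Initialize Shared Memory: The first Envoy process (restart epoch 0) creates the shared memory region. Each later process opens the existing region and refuses to start if its layout version does not match.
Socket Duplication: While setting up its listeners, the new process asks the old one for the listening socket bound to each address. The old process sends the descriptor over the Unix domain socket, so both processes now refer to the same kernel socket.
Graceful Handover: Once the new process is listening, it tells the old process to drain. The old process stops accepting, so new connections land on the new process. Connections the old process already holds are closed gracefully over the drain period (set with --drain-time-s).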
Here's a simplified sketch of the child's side of socket passing. The helper types are stand-ins, not Envoy's actual API:
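```cpp
#include <vector>

// Hypothetical stand-ins for Envoy's RPC channel and listener setup.
struct ParentConnection {
  std::vector<int> retrieveListenSockets();  // descriptors arrive via SCM_RIGHTS
  void drainParentListeners();               // asks the old process to start draining
};
void createListenerFromSocket(int fd);

class HotRestartingChild {
public:
  void initialize() {
    // Request the parent's listen sockets instead of binding new ones.
    std::vector<int> fds = parent_.retrieveListenSockets();
    // Start accepting on the inherited sockets.
    for (int fd : fds) {
      createListenerFromSocket(fd);
    }
    // Only now that we are accepting, tell the parent to drain.
    parent_.drainParentListeners();
  }
private:
  ParentConnection parent_;
};
```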
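The child's socket lookup comes down to a simple rule. If the old process has a socket bound to the same address, reuse it; otherwise bind a new one. The sketch below models the parent's answers as a map from address to descriptor and takes the bind step as a callback. listenSocketFor and ParentSockets are hypothetical names, not Envoy's API, and the address strings are only examples.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical: descriptors the old process reported, keyed by listener address.
using ParentSockets = std::map<std::string, int>;

// Returns the descriptor the new process should listen on for `address`:
// the inherited one if the old process had a listener there, else a fresh bind.
int listenSocketFor(const std::string& address, const ParentSockets& parent_sockets,
                    const std::function<int(const std::string&)>& bind_new) {
  auto it = parent_sockets.find(address);
  if (it != parent_sockets.end()) {
    return it->second;  // same kernel socket: the address never stops being bound
  }
  return bind_new(address);  // address is new in this configuration
}
```

Reusing the inherited socket matters because the address stays bound the whole time. There is no moment when the port is closed and incoming connections are refused, and the new process never has to bind an address the old one still holds.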
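- State Transfer: The state Envoy actually moves between processes is its statistics. Counters and most gauges are sent from the old process to the new one over the Unix domain socket channel. Established connections are not transferred at all; the old process keeps them and drains them. A simplified sketch of the old process's side: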
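```cpp
#include <chrono>
#include <thread>

// Simplified sketch of the old (parent) process; not Envoy's actual code.
// Envoy drives draining with event-loop timers rather than a sleep loop;
// listeners_, hasActiveConnections() and drain_time_ are illustrative members.
void HotRestartImpl::drainListeners() {
  // 1. Stop accepting; new connections now go to the new process.
  for (auto& listener : listeners_) {
    listener->stopAcceptingConnections();
  }
  // 2. Give existing connections a bounded window (cf. --drain-time-s) to finish.
  const auto deadline = std::chrono::steady_clock::now() + drain_time_;
  while (hasActiveConnections() && std::chrono::steady_clock::now() < deadline) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  // 3. Nothing to signal: the new process tells this one to exit after
  //    --parent-shutdown-time-s, closing any connections still open.
}
```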
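Envoy's documentation describes draining as becoming more aggressive as the drain period elapses. One simple way to model that, assuming a linear ramp, is to close each connection at a given check with probability elapsed / drain_time. shouldDrainClose is a hypothetical helper written for this post; it is not Envoy's DrainManager interface.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical helper: decide whether to close a connection at this check,
// assuming the close probability ramps linearly from 0 to 1 over the window.
bool shouldDrainClose(std::chrono::milliseconds elapsed,
                      std::chrono::milliseconds drain_time,
                      std::mt19937_64& rng) {
  if (elapsed >= drain_time) {
    return true;  // drain window is over: close everything still open
  }
  if (elapsed.count() <= 0) {
    return false;  // draining just started: keep serving
  }
  // Uniform draw in [0, drain_time); landing below `elapsed` happens with
  // probability elapsed / drain_time.
  std::uniform_int_distribution<int64_t> draw(0, drain_time.count() - 1);
  return draw(rng) < elapsed.count();
}
```

Spreading the closes out over the window avoids a thundering herd of clients all reconnecting to the new process at the same instant.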
Key Implementation Challenges
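- File Descriptor Handling: Envoy has to hand listening sockets to the new process without leaking or mismatching descriptors:
Uses SCM_RIGHTS ancillary messages on a Unix domain socket to pass file descriptors between processes
Matches sockets by listener address: the new process asks for each address in its configuration, and if the old process has no listener there, the new process binds a fresh socket
Relies on kernel reference counting: a passed descriptor is a new reference to the same open socket, which stays open until both processes have closed their copies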
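Passing a descriptor with SCM_RIGHTS uses the standard POSIX sendmsg/recvmsg ancillary-data interface. The sketch below sends one descriptor across a connected Unix domain socket. sendFd and recvFd are helper names chosen for this post rather than Envoy functions, and error handling is reduced to return values. The same calls work for a listening TCP socket or any other descriptor.

```cpp
#include <cassert>
#include <cstring>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Send one file descriptor over a connected Unix domain socket using SCM_RIGHTS.
bool sendFd(int channel, int fd) {
  char byte = 'F';  // at least one byte of ordinary data must accompany the fd
  iovec iov{&byte, 1};
  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
  return sendmsg(channel, &msg, 0) == 1;
}

// Receive one file descriptor; returns -1 on failure. The kernel installs a
// new descriptor in the receiving process that refers to the same open file.
int recvFd(int channel) {
  char byte;
  iovec iov{&byte, 1};
  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  if (recvmsg(channel, &msg, 0) != 1) {
    return -1;
  }
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg == nullptr || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS) {
    return -1;
  }
  int fd;
  std::memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
  return fd;
}
```

In a hot restart the two ends of the channel live in different processes. The kernel behaves the same way within a single process, which makes the helpers easy to try out: the received descriptor has a new number but refers to the same open file.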
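- Connection State Management: Envoy sidesteps the hardest version of this problem by never migrating a live connection, so none of the following has to be serialized and handed to the new process:
TCP connection parameters
TLS session information
Protocol-specific state (HTTP/2 streams, etc.)
Instead, each connection finishes on the process that accepted it. How it is wound down depends on the configured filters. For HTTP, a draining process can ask clients to reconnect, for example by sending an HTTP/2 GOAWAY frame.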
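One way to avoid carrying this state across processes at all is to ask HTTP clients to reconnect, which lands them on the new process. The sketch below shows that decision in isolation. The Protocol and DrainAction types and the drainActionFor and markForDrain helpers are hypothetical, not Envoy's HTTP connection manager API, and headers are modeled as a plain map with lowercase keys.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Protocol { Http11, Http2 };
enum class DrainAction { None, AddConnectionClose, SendGoAway };

// Hypothetical: how a draining process nudges a client off an existing connection.
DrainAction drainActionFor(Protocol protocol, bool draining) {
  if (!draining) {
    return DrainAction::None;
  }
  // HTTP/1.1: mark the next response so the client closes and reconnects.
  if (protocol == Protocol::Http11) {
    return DrainAction::AddConnectionClose;
  }
  // HTTP/2: a GOAWAY frame tells the peer to open no new streams here,
  // while streams already in flight can complete.
  return DrainAction::SendGoAway;
}

// Applies the HTTP/1.1 case to a response's headers.
void markForDrain(Protocol protocol, bool draining,
                  std::map<std::string, std::string>& response_headers) {
  if (drainActionFor(protocol, draining) == DrainAction::AddConnectionClose) {
    response_headers["connection"] = "close";
  }
}
```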
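- Configuration Compatibility: The new process loads its configuration from scratch, so the configuration itself may change between restarts. Envoy does not diff it against the old one. Listeners at unchanged addresses inherit the old sockets, and new addresses get freshly bound ones. What must match is the hot restart protocol itself. Both binaries need the same hot restart version, which can be compared ahead of time using the --hot-restart-version flag and the /hot_restart_version admin endpoint. The check amounts to something like this (an illustrative sketch, not Envoy's code):
```cpp
#include <string>

// Illustrative sketch: the compatibility that matters is the hot restart
// version (shared memory layout and RPC protocol), not the bootstrap config.
bool isHotRestartCompatible(const std::string& running_version,
                            const std::string& new_binary_version) {
  return running_version == new_binary_version;
}
```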
Conclusion
Diving into Envoy's hot restart implementation has been quite the journey! It's fascinating to see how the project tackles the challenge of swapping out a running proxy without dropping connections. The coordination between parent and child processes, the careful handling of file descriptors, and the orderly draining of the old process all come together to make this possible.
What really stands out is how much thought went into making the system robust. It's not just about passing sockets around. It's also about handling edge cases, such as a listener address the old process never bound, keeping the two processes' hot restart protocol compatible, and bounding how long the old process may linger.
Note: This is a high-level overview based on Envoy's open-source implementation. For the most up-to-date and detailed information, please refer to the official Envoy documentation (https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/arch_overview) and source code.