Overview
This article analyzes the source code of agentgateway, the AI Agent bus gateway, and tries to explain the components involved in service lifecycle management and how they collaborate. It covers the startup and initialization of each service component, port listening, how service termination signals propagate between components, and how graceful service shutdown (Drain) is implemented.
My understanding of Rust, and especially of tokio’s asynchronous programming style, is limited, so please point out any errors or omissions.
AI Agent Bus Gateway agentgateway Implementation Analysis Series:
- AI Agent Bus Gateway agentgateway Implementation Analysis Part 1
- AI Agent Bus Gateway agentgateway Implementation Analysis Part 2 (this article)
It is recommended to read in order.
Introduction
If you were asked to develop a reverse HTTP proxy, which part of the design would you expect to be the most difficult and complex? The thread model? Protocol decoding? Flow-control buffers? I think it is gracefully shutting down (Drain) the service, and beyond that, hot-upgrading it without dropping connections.
Just as much of an application’s code goes to exception handling, much of a proxy’s code goes to graceful shutdown (Drain). The key to understanding a service’s architecture is understanding its lifecycle:
- Core components or sub-services
- Initialization process of components or sub-services
- Graceful shutdown (Drain) of components or sub-services
The sections below analyze these aspects.
Service Lifecycle
agentgateway has several major sub-services:
- admin service
- metrics server
- readiness service
- Gateway service
The last two run in the worker thread pool; the others run on the main thread. The Gateway service is, of course, the one that carries the core load.
When stopping the service, it is of course necessary to notify these sub-services. agentgateway makes extensive use of Rust’s channel mechanism to achieve communication between asynchronous futures.
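As a minimal illustration of channel-based notification, the sketch below uses a channel's closure as the stop signal. It makes several assumptions: it replaces tokio tasks and tokio channels with std threads and std::sync::mpsc so that it runs without dependencies, and the names Command, sub_service, and run_channel_demo are hypothetical rather than taken from agentgateway.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Messages a hypothetical sub-service accepts (not agentgateway's real types).
enum Command {
    Work(u32),
}

/// Runs until the command channel closes, i.e. until every Sender is dropped.
fn sub_service(rx: Receiver<Command>, done: Sender<u32>) {
    let mut total = 0;
    // `recv` returns Err once all senders are gone; that closure is the stop signal.
    while let Ok(Command::Work(n)) = rx.recv() {
        total += n;
    }
    // Report back on a second channel before exiting.
    done.send(total).unwrap();
}

/// Starts one sub-service, feeds it work, stops it, and returns its report.
pub fn run_channel_demo() -> u32 {
    let (cmd_tx, cmd_rx) = channel();
    let (done_tx, done_rx) = channel();
    let handle = thread::spawn(move || sub_service(cmd_rx, done_tx));

    for n in 1..=3 {
        cmd_tx.send(Command::Work(n)).unwrap();
    }
    drop(cmd_tx); // closing the channel tells the sub-service to stop

    let total = done_rx.recv().unwrap();
    handle.join().unwrap();
    total
}
```

Calling run_channel_demo() from a main function returns 6, the sum of the three work items. The receiver needs no separate "stop" message: dropping the last sender is enough, which is one reason channels fit shutdown signalling well.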
app.rs > Bound::wait_termination(self) is the service stop entry point and is also responsible for notifying each sub-service:
- It receives the operating system SIGTERM signal.
- It triggers the graceful shutdown (Drain) process: start_drain_and_wait() sends a message to Signal(DrainTrigger).
- Each sub-service listens on Watch(DrainWatcher) and, after receiving the message from Signal(DrainTrigger), executes its Drain operation, such as sending HTTP connection-close notifications.
- After its Drain operation completes, the sub-service reports back to Signal(DrainTrigger).
- Finally, after all sub-services have reported back, the overall service can stop.
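The handshake described above can be modelled in a few lines. The sketch below is an assumption-laden simplification, not agentgateway's implementation: it uses std threads, a Mutex, and a Condvar instead of tokio, and the types SketchTrigger and SketchWatcher and the functions new_drain and run_drain_demo are hypothetical stand-ins for the roles of Signal(DrainTrigger) and Watch(DrainWatcher). Here, dropping a watcher is how a sub-service reports that its drain is complete.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// State shared by the trigger and all watchers (hypothetical types).
struct State {
    draining: bool,  // set once by the trigger
    watchers: usize, // watchers that have not finished draining yet
}

struct Shared {
    state: Mutex<State>,
    changed: Condvar,
}

/// Plays the role of Signal(DrainTrigger).
pub struct SketchTrigger(Arc<Shared>);

/// Plays the role of Watch(DrainWatcher).
pub struct SketchWatcher(Arc<Shared>);

pub fn new_drain() -> (SketchTrigger, SketchWatcher) {
    let shared = Arc::new(Shared {
        state: Mutex::new(State { draining: false, watchers: 1 }),
        changed: Condvar::new(),
    });
    (SketchTrigger(shared.clone()), SketchWatcher(shared))
}

impl SketchTrigger {
    /// Announce the drain, then block until every watcher has been dropped.
    pub fn start_drain_and_wait(self) {
        let mut st = self.0.state.lock().unwrap();
        st.draining = true;
        self.0.changed.notify_all();
        while st.watchers > 0 {
            st = self.0.changed.wait(st).unwrap();
        }
    }
}

impl SketchWatcher {
    /// Block until the trigger announces the drain.
    pub fn wait_for_drain(&self) {
        let mut st = self.0.state.lock().unwrap();
        while !st.draining {
            st = self.0.changed.wait(st).unwrap();
        }
    }
}

impl Clone for SketchWatcher {
    fn clone(&self) -> Self {
        self.0.state.lock().unwrap().watchers += 1;
        SketchWatcher(self.0.clone())
    }
}

impl Drop for SketchWatcher {
    /// Dropping a watcher is the feedback: this sub-service is done draining.
    fn drop(&mut self) {
        self.0.state.lock().unwrap().watchers -= 1;
        self.0.changed.notify_all();
    }
}

/// Runs four sub-services as threads and returns, sorted, those that drained.
pub fn run_drain_demo() -> Vec<String> {
    let (trigger, watcher) = new_drain();
    let drained = Arc::new(Mutex::new(Vec::new()));
    let mut handles = Vec::new();
    for name in ["admin", "metrics", "readiness", "gateway"] {
        let w = watcher.clone();
        let drained = drained.clone();
        handles.push(thread::spawn(move || {
            w.wait_for_drain();
            // A real sub-service would close listeners and finish requests here.
            drained.lock().unwrap().push(name.to_string());
            drop(w); // report completion to the trigger
        }));
    }
    drop(watcher); // the original watcher belongs to no sub-service
    trigger.start_drain_and_wait(); // returns after all four watchers are dropped
    let mut out = drained.lock().unwrap().clone();
    for h in handles {
        h.join().unwrap();
    }
    out.sort();
    out
}
```

Because each thread records its name before dropping its watcher, start_drain_and_wait() cannot return until all four sub-services have finished their drain step. Tying the acknowledgement to Drop means a sub-service cannot forget to report back, even on an early return.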
The following is a detailed description of this process:
Figure: agentgateway component service lifecycle collaboration
If the figure above does not render correctly, it can be opened with Draw.io.
Conclusion
I have studied the C++ code of Envoy Proxy in depth. It relies heavily on OOP and polymorphism-based design, and it decouples component subsystems through event-driven callbacks. This makes the code rather long, with many concepts and terms, and sometimes it feels a bit like Java.
agentgateway and Istio ztunnel, the project whose design it references, share a company, Solo.io, and a developer, John Howard. Both use Tokio + Rust async, which is a bit like Golang’s goroutines. There is a comparison on Reddit: How Tokio works vs go-routines?.
Judged on code reading alone, the simple, pragmatic, and unpretentious style of Tokio + Rust async seems easier to get started with. Of course, this presupposes an understanding of the basics of Rust async + Tokio.
Why study agentgateway
This is a good question. I believe that AI applications have arrived, and the infrastructure for them still needs to be improved. As a traditional infrastructure programmer (not an application programmer), I would rather have AI rely on my work first than wait for AI to replace my job, and then hope that everyone can coexist peacefully. Gateway-type infrastructure is destined to be indispensable in the AI governance of large enterprises.
Finally, I would like to say that I have been unemployed as a middle-aged programmer for a while, and I have faced a lot of unanswered messages in the process of looking for a job. If you think there is any position that can accommodate an old man like me who still has some enthusiasm, you may wish to do a good deed. After all, everyone gets old, thank you. Infinite merit 🙏