Excerpt From:
This article is excerpted from the Flow Control section in an open source book I’m writing: Istio & Envoy Insider.
If the figures in this article are not clear, you can go back to the original book.
My book, Istio & Envoy Insider, includes an Envoy source code deep dive, an in-depth analysis of Envoy fundamentals, and an analysis of Istio fundamentals. But it’s not a traditional “deep dive into xyz source code” type of book. On the contrary, I have done my best not to paste source code directly into the book. Reading source code is a necessary step to grasp the details of an implementation, but browsing source code in a book is generally a very bad experience. So this book uses source code navigation diagrams to let readers see the full picture of the implementation, rather than getting lost in fragmented source code snippets and forgetting the whole picture.
Flow Control
Like any HTTP proxy, Envoy takes flow control very seriously. Because CPU and memory resources are limited, it is important to prevent a single flow from taking up too many resources. As with any software built on an asynchronous, multi-threaded, multiplexed architecture, flow control is never a simple task.
If someone asked me what the hardest part of learning the Envoy implementation was, my answer would be flow control, and there is very little information about it on the web. Some readers may ask: if it is so difficult, why study it, and does the study have any value? In my opinion, it has at least the following values:
- Envoy is a critical component that all business traffic must pass through, so it cannot be allowed to fail. Its memory usage should be understood when we estimate service resources, so that the estimate is well founded.
- Understanding how Envoy behaves and how service degrades when traffic overruns capacity makes it possible to take precautions in advance.
- Because flow control involves every participant in the data flow path, studying it is itself a way to understand how Envoy’s flow components relate to each other.
It should be noted that “flow control” in this section does not mean what we usually do for microservice APIs, that is, limiting API TPS so that high-frequency API calls do not overload and crash the service. It is more of a backpressure-based protection that prevents a single connection or HTTP/2 stream from using too much buffer memory while Envoy is processing a data stream such as a request body or response body.
Envoy has an Envoy Flow Control document that describes some of these details. In this section, I record the results of my own study based on that document, and I have also added a lot of my own interpretation.
Flow control in Envoy is accomplished by limiting each Buffer with watermark callbacks. When a Buffer contains more data than the configured limit, a high watermark callback is triggered, which sets off a series of events that eventually notify the data source to stop sending data. This suppression may be immediate (e.g., stopping reads from a socket) or gradual (e.g., stopping HTTP/2 window updates), so all Buffer limits in Envoy are considered soft limits.
When the Buffer eventually drains (usually to half of the high watermark, to avoid jittering back and forth), a low watermark callback is triggered to notify the sender that it can resume sending data.
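The watermark mechanism described above can be illustrated with a small Python sketch. This is a toy model, not Envoy's actual buffer class: the `WatermarkBuffer` name, its methods, and the choice of a low watermark at exactly half the high watermark are assumptions made for illustration.

```python
class WatermarkBuffer:
    """Toy byte buffer with high/low watermark callbacks (hypothetical, not Envoy's class)."""

    def __init__(self, high, on_above_high, on_below_low):
        self.high = high
        self.low = high // 2           # assumption: low watermark is half of high
        self.data = bytearray()
        self.above_high = False        # True between the high and the low callback
        self.on_above_high = on_above_high
        self.on_below_low = on_below_low

    def add(self, chunk):
        # Soft limit: data is always accepted; the callback only asks the
        # data source to stop sending.
        self.data += chunk
        if not self.above_high and len(self.data) > self.high:
            self.above_high = True
            self.on_above_high()

    def drain(self, n):
        del self.data[:n]
        if self.above_high and len(self.data) <= self.low:
            self.above_high = False
            self.on_below_low()


events = []
buf = WatermarkBuffer(100,
                      lambda: events.append("pause"),
                      lambda: events.append("resume"))
buf.add(b"x" * 120)   # crosses the high watermark: "pause" fires once
buf.add(b"x" * 30)    # still accepted (soft limit), no second callback
buf.drain(60)         # 90 bytes left, still above the low watermark
buf.drain(50)         # 40 bytes left: "resume" fires
```

The gap between the two thresholds is what prevents the pause/resume pair from firing on every small write and drain around a single limit.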
The following first details the flow control process using the simple TCP implementation, followed by the more complex HTTP/2 flow control process.
Some flow control terms
- back up - A situation in which data is congested in one or more intermediate buffers because traffic flows to the destination slowly or poorly, so the buffers run out of space.
- buffers fill up - The buffer space reaches its upper limit.
- backpressure - A feedback mechanism that lets a system keep responding to requests rather than crashing under load when its processing capacity is exceeded. It comes into play when the rate of incoming data exceeds the rate at which data is processed or output, which would otherwise lead to congestion and potential data loss. For more details, see: Backpressure explained - the resisted flow of data through software
- drained - The emptying of a Buffer. Generally refers to a buffer being consumed from above the low watermark down to below the low watermark, or even to empty.
- HTTP/2 window - The HTTP/2 standard’s flow control mechanism. The WINDOW_UPDATE frame indicates the number of octets the sender may transmit in addition to the existing flow control window. See “Hypertext Transfer Protocol Version 2 (HTTP/2) - 5.2. Flow Control” for details.
- http stream - A stream as defined by the HTTP/2 standard. For details, see “Hypertext Transfer Protocol Version 2 (HTTP/2) - 5. Streams and Multiplexing”
- High/Low Watermark - A design pattern that controls memory or buffer consumption with a high and a low threshold, so that control operations are not triggered by frequent jitter around a single threshold. See “What are high and low water marks in bit streaming” for details.
TCP flow control implementation
Flow control for TCP and TLS endpoints is handled through coordination between the Network::ConnectionImpl write buffer and the Network::TcpProxy filter.
The flow control for Downstream is as follows.
- The Downstream Network::ConnectionImpl::write_buffer_ buffers too much data. It calls Network::ConnectionCallbacks::onAboveWriteBufferHighWatermark().
- Network::TcpProxy::DownstreamCallbacks receives onAboveWriteBufferHighWatermark() and calls readDisable(true) on the Upstream connection.
- When the Downstream write buffer has drained, it calls Network::ConnectionCallbacks::onBelowWriteBufferLowWatermark().
- Network::TcpProxy::DownstreamCallbacks receives onBelowWriteBufferLowWatermark() and calls readDisable(false) on the Upstream connection.
The flow control for Upstream is roughly the same.
- The Upstream Network::ConnectionImpl::write_buffer_ buffers too much data. It calls Network::ConnectionCallbacks::onAboveWriteBufferHighWatermark().
- Network::TcpProxy::UpstreamCallbacks receives onAboveWriteBufferHighWatermark() and calls readDisable(true) on the Downstream connection.
- When the Upstream write buffer has drained, it calls Network::ConnectionCallbacks::onBelowWriteBufferLowWatermark().
- Network::TcpProxy::UpstreamCallbacks receives onBelowWriteBufferLowWatermark() and calls readDisable(false) on the Downstream connection.
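The cross-wiring between the two connections in the lists above can be sketched as follows. This is a simplified Python model under stated assumptions: `Connection` and `PeerThrottle` are hypothetical stand-ins for Network::ConnectionImpl and the TcpProxy callback classes, and buffer sizes are tracked as plain byte counts.

```python
class Connection:
    """Toy connection: a watermarked write buffer plus a read switch."""

    def __init__(self, high):
        self.high, self.low = high, high // 2   # assumption: low is half of high
        self.buffered = 0
        self.above_high = False
        self.read_enabled = True
        self.callbacks = None   # object exposing the two watermark methods

    def read_disable(self, disable):
        self.read_enabled = not disable

    def write(self, n):
        self.buffered += n
        if not self.above_high and self.buffered > self.high:
            self.above_high = True
            self.callbacks.on_above_write_buffer_high_watermark()

    def drained(self, n):
        self.buffered = max(0, self.buffered - n)
        if self.above_high and self.buffered <= self.low:
            self.above_high = False
            self.callbacks.on_below_write_buffer_low_watermark()


class PeerThrottle:
    """Plays the role of DownstreamCallbacks / UpstreamCallbacks:
    it throttles reads on the *other* connection."""

    def __init__(self, peer):
        self.peer = peer

    def on_above_write_buffer_high_watermark(self):
        self.peer.read_disable(True)

    def on_below_write_buffer_low_watermark(self):
        self.peer.read_disable(False)


downstream, upstream = Connection(high=100), Connection(high=100)
downstream.callbacks = PeerThrottle(upstream)    # downstream backs up: pause upstream reads
upstream.callbacks = PeerThrottle(downstream)    # upstream backs up: pause downstream reads

downstream.write(150)              # slow client: downstream write buffer backs up
paused = upstream.read_enabled     # reads from upstream are now disabled
downstream.drained(120)            # 30 bytes left, below the low watermark
resumed = upstream.read_enabled    # reads from upstream are enabled again
```

The key point is that a full write buffer on one side is relieved by stopping reads on the opposite side, since that is where the data comes from.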
The subsystem and callback mechanism are covered in this book in the section: Callback design pattern.
HTTP2 Flow Control Implementation
Because the various Buffers in the HTTP/2 technology stack are quite cumbersome, each segment of the path, from a Buffer exceeding its Watermark limit to pausing data from the data source, is described separately in the Envoy document.
If you don’t know much about Envoy’s http-connection-manager and http filter chain, you are advised to read http connection manager first. The following assumes that the reader already has this knowledge.
HTTP2 flow control general flow
Simplest Upstream connection congestion scenario
For HTTP/2, when filters, streams, or connections back up, the end result is readDisable(true) being called on the source stream. This results in the stream ceasing to consume window, and so not sending further flow control window updates to the peer. This will result in the peer eventually stopping sending data when the available window is consumed (or nghttp2 closing the connection if the peer violates the flow control limit) and so limiting the amount of data Envoy will buffer for each stream.
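The effect of withholding window updates can be sketched in a few lines of Python. This is a toy model of a single stream, not nghttp2 or Envoy code: `ReceivingStream`, `SendingPeer`, the 64-byte window, and the policy of replenishing the window immediately when reads are enabled are all assumptions for illustration.

```python
class ReceivingStream:
    """Toy receive side of one HTTP/2 stream (hypothetical names)."""

    def __init__(self):
        self.read_disabled = False
        self.unconsumed = 0   # bytes received but not yet consumed

    def on_data(self, n):
        """Accept n bytes; return the WINDOW_UPDATE credit to send back."""
        if self.read_disabled:
            self.unconsumed += n   # held back: no window update for these bytes
            return 0
        return n                   # consumed at once: window is replenished

    def read_disable(self, disable):
        """Return the credit released by this call (non-zero only on resume)."""
        self.read_disabled = disable
        if disable:
            return 0
        credit, self.unconsumed = self.unconsumed, 0
        return credit


class SendingPeer:
    """Toy sender that may only send what its flow control window allows."""

    def __init__(self, window):
        self.window = window

    def send(self, n):
        sent = min(n, self.window)
        self.window -= sent
        return sent

    def on_window_update(self, credit):
        self.window += credit


peer, stream = SendingPeer(window=64), ReceivingStream()
stream.read_disable(True)          # something downstream of this stream backed up
sent = []
for _ in range(3):
    n = peer.send(40)              # the peer tries to send 40 bytes each round
    peer.on_window_update(stream.on_data(n))
    sent.append(n)

credit = stream.read_disable(False)   # resume: unconsumed data is consumed
peer.on_window_update(credit)         # and the window is handed back to the peer
```

In this model the peer can push at most one window's worth of data after the pause, which is why the limit is soft but still bounded per stream.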
Figure: Upstream connection back up and backpressure
The Unbounded buffer in the figure above does not mean that the buffer has no limit; it means that the limit is a soft limit.
Upstream connection and Upstream http stream back-up at the same time
When readDisable(false) is called, any outstanding unconsumed data is immediately consumed, which results in resuming window updates to the peer and the resumption of data.
Note that readDisable(true) on a stream may be called by multiple entities. It is called when any filter buffers too much, when the stream backs up and has too much data buffered, or when the connection has too much data buffered. Because of this, readDisable() maintains a count of the number of times it has been called to both enable and disable the stream, resuming reads when each caller has called the equivalent low watermark callback.
For example, if the TCP window upstream fills up and results in the network buffer backing up, all the streams associated with that connection will readDisable(true) their downstream data sources. When the HTTP/2 flow control window fills up, an individual stream may use all of the window available and call a second readDisable(true) on its downstream data source. When the upstream TCP socket drains, the connection will go below its low watermark and each stream will call readDisable(false) to resume the flow of data. The stream which had both a network level block and an H2 flow control block will still not be fully enabled. Once the upstream peer sends window updates, the stream buffer will drain and the second readDisable(false) will be called on the downstream data source, which will finally result in data flowing from downstream again.
Example:
- If the Upstream TCP window fills up and causes the network write buffer to back up, all streams associated with that connection will readDisable(true) their Downstream data sources.
- At the same time, if the HTTP/2 flow control window fills up, a single stream may use all of the available window and call a second readDisable(true) on its Downstream data source.
- Then, as the Upstream TCP write buffer continues to send and drains, the connection will fall below its low watermark and each stream will call readDisable(false) to resume the data flow. However, a stream that is blocked at both the network level and the H2 flow control level will still not be fully enabled.
- Once the Upstream peer sends an HTTP/2 window update, the stream buffer will drain and a second readDisable(false) will be called on the Downstream data source, which will finally cause data to flow from the Downstream again.
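The counting behaviour in the example above can be sketched as a small Python model. `DownstreamSource` and its `read_disable_count` field are hypothetical names; the sketch only assumes what the text states, namely that reads resume once every disable has been matched by an enable.

```python
class DownstreamSource:
    """Toy data source whose reads are gated by a disable counter."""

    def __init__(self):
        self.read_disable_count = 0

    def read_disable(self, disable):
        if disable:
            self.read_disable_count += 1
        elif self.read_disable_count > 0:
            self.read_disable_count -= 1

    @property
    def reading(self):
        # Reads flow only when every disable has been undone.
        return self.read_disable_count == 0


a, b = DownstreamSource(), DownstreamSource()   # sources of two streams on one connection
states = []

for s in (a, b):
    s.read_disable(True)           # upstream connection backs up: all streams pause
a.read_disable(True)               # stream A also exhausts its H2 window
states.append((a.reading, b.reading))

for s in (a, b):
    s.read_disable(False)          # upstream connection drains
states.append((a.reading, b.reading))   # A is still blocked by the H2 window

a.read_disable(False)              # peer sends a window update for stream A
states.append((a.reading, b.reading))
```

A plain boolean would have re-enabled stream A as soon as the connection drained, even though its stream-level buffer was still full; the counter avoids that.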
Figure: Upstream connection and Upstream http stream back-up at the same time
Collaboration of Router::Filter during Upstream back-up
The two main parties involved in flow control are the router filter (Envoy::Router::Filter) and the connection manager (Envoy::Http::ConnectionManagerImpl). The router is responsible for intercepting watermark events for its own buffers, the individual upstream streams (if codec buffers fill up) and the upstream connection (if the network buffer fills up). It passes any events to the connection manager, which has the ability to call readDisable() to enable and disable further data from downstream.
Figure: Collaboration of Router::Filter during Upstream back-up
Collaboration of Http::ConnectionManagerImpl when Downstream back-up
On the reverse path, when the downstream connection backs up, the connection manager collects events for the downstream streams and the downstream connection. It passes events to the router filter via Envoy::Http::DownstreamWatermarkCallbacks and the router can then call readDisable() on the upstream stream. Filters opt into subscribing to DownstreamWatermarkCallbacks as a performance optimization to avoid each watermark event on a downstream HTTP/2 connection resulting in “number of streams * number of filters” callbacks. Instead, only the router filter is notified and only the “number of streams” multiplier applies. Because the router filter only subscribes to notifications when it has an upstream connection, the connection manager tracks how many outstanding high watermark events have occurred and passes any on to the router filter when it subscribes.
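The "track outstanding events and replay them on subscription" behaviour can be sketched as below. This is a simplified Python model: `ActiveStream`, `RouterSketch`, and all method names are hypothetical stand-ins that mirror the roles described in the paragraph, not Envoy's real signatures.

```python
class RouterSketch:
    """Stands in for the router filter; counts disables on the upstream stream."""

    def __init__(self):
        self.upstream_read_disable_count = 0

    def on_above_write_buffer_high_watermark(self):
        self.upstream_read_disable_count += 1

    def on_below_write_buffer_low_watermark(self):
        self.upstream_read_disable_count -= 1


class ActiveStream:
    """Stands in for the connection manager's per-stream state."""

    def __init__(self):
        self.high_watermark_count = 0   # outstanding high watermark events
        self.subscribers = []

    def call_high_watermark_callbacks(self):
        self.high_watermark_count += 1
        for sub in self.subscribers:
            sub.on_above_write_buffer_high_watermark()

    def call_low_watermark_callbacks(self):
        self.high_watermark_count -= 1
        for sub in self.subscribers:
            sub.on_below_write_buffer_low_watermark()

    def add_downstream_watermark_callbacks(self, sub):
        self.subscribers.append(sub)
        # Replay events that happened before the subscriber existed, so it
        # starts out in the correct (possibly already paused) state.
        for _ in range(self.high_watermark_count):
            sub.on_above_write_buffer_high_watermark()


stream = ActiveStream()
stream.call_high_watermark_callbacks()   # downstream connection backs up
stream.call_high_watermark_callbacks()   # downstream stream backs up as well

router = RouterSketch()                  # router gets its upstream connection late
stream.add_downstream_watermark_callbacks(router)
after_subscribe = router.upstream_read_disable_count

stream.call_low_watermark_callbacks()
stream.call_low_watermark_callbacks()
after_drain = router.upstream_read_disable_count
```

Without the replay step, the two later low watermark events would arrive at a subscriber that never saw the matching high watermark events, and its counter would go out of balance.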
Figure: Collaboration of Http::ConnectionManagerImpl when Downstream back-up
HTTP decode/encode filter flow control detail
Each HTTP and HTTP/2 filter has an opportunity to call decoderBufferLimit() or encoderBufferLimit() on creation. No filter should buffer more than the configured bytes without calling the appropriate watermark callbacks or sending an error response.
Filters may override the default limit with calls to setDecoderBufferLimit() and setEncoderBufferLimit(). These limits are applied as filters are created so filters later in the chain can override the limits set by prior filters. It is recommended that filters calling these functions should generally only perform increases to the buffer limit, to avoid potentially conflicting with the buffer requirements of other filters in the chain.
Most filters do not buffer internally, but instead push back on data by returning a FilterDataStatus on encodeData()/decodeData() calls. If a buffer is a streaming buffer, i.e. the buffered data will resolve over time, it should return FilterDataStatus::StopIterationAndWatermark to pause further data processing, which will cause the ConnectionManagerImpl to trigger watermark callbacks on behalf of the filter. If a filter can not make forward progress without the complete body, it should return FilterDataStatus::StopIterationAndBuffer. In this case if the ConnectionManagerImpl buffers more than the allowed data it will return an error downstream: a 413 on the request path, 500 or resetStream() on the response path.
Decoder filters
For filters which do their own internal buffering, filters buffering more than the buffer limit should call onDecoderFilterAboveWriteBufferHighWatermark if they are streaming filters, i.e. filters which can process more bytes as the underlying buffer is drained. This causes the downstream stream to be readDisabled and the flow of downstream data to be halted. The filter is then responsible for calling onDecoderFilterBelowWriteBufferLowWatermark when the buffer is drained to resume the flow of data.
Decoder filters which must buffer the full request should respond with a 413 (Payload Too Large) when encountering a request body too large to buffer.
The decoder high watermark path for streaming filters is as follows:
- When an instance of Envoy::Router::StreamDecoderFilter buffers too much data it should call StreamDecoderFilterCallback::onDecoderFilterAboveWriteBufferHighWatermark().
- When Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter receives onDecoderFilterAboveWriteBufferHighWatermark() it calls readDisable(true) on the downstream stream to pause data.
And the low watermark path:
- When the buffer of the Envoy::Router::StreamDecoderFilter drains it should call StreamDecoderFilterCallback::onDecoderFilterBelowWriteBufferLowWatermark().
- When Envoy::Http::ConnectionManagerImpl receives onDecoderFilterBelowWriteBufferLowWatermark() it calls readDisable(false) on the downstream stream to resume data.
Encoder filters
Encoder filters buffering more than the buffer limit should call onEncoderFilterAboveWriteBufferHighWatermark if they are streaming filters, i.e. filters which can process more bytes as the underlying buffer is drained. The high watermark call will be passed from the Envoy::Http::ConnectionManagerImpl to the Envoy::Router::Filter which will readDisable(true) to stop the flow of upstream data. Streaming filters which call onEncoderFilterAboveWriteBufferHighWatermark should call onEncoderFilterBelowWriteBufferLowWatermark when the underlying buffer drains.
Filters which must buffer a full response body before processing further should respond with a 500 (Server Error) if encountering a response body which is larger than the buffer limits.
The encoder high watermark path for streaming filters is as follows:
- When an instance of Envoy::Router::StreamEncoderFilter buffers too much data it should call StreamEncoderFilterCallback::onEncoderFilterAboveWriteBufferHighWatermark().
- When Envoy::Http::ConnectionManagerImpl::ActiveStreamEncoderFilter receives onEncoderFilterAboveWriteBufferHighWatermark() it calls ConnectionManagerImpl::ActiveStream::callHighWatermarkCallbacks().
- callHighWatermarkCallbacks() then in turn calls DownstreamWatermarkCallbacks::onAboveWriteBufferHighWatermark() for all filters which registered to receive watermark events.
- Envoy::Router::Filter receives onAboveWriteBufferHighWatermark() and calls readDisable(true) on the upstream request.
The encoder low watermark path for streaming filters is as follows:
- When the buffer of an instance of Envoy::Router::StreamEncoderFilter drains it should call StreamEncoderFilterCallback::onEncoderFilterBelowWriteBufferLowWatermark().
- When Envoy::Http::ConnectionManagerImpl::ActiveStreamEncoderFilter receives onEncoderFilterBelowWriteBufferLowWatermark() it calls ConnectionManagerImpl::ActiveStream::callLowWatermarkCallbacks().
- callLowWatermarkCallbacks() then in turn calls DownstreamWatermarkCallbacks::onBelowWriteBufferLowWatermark() for all filters which registered to receive watermark events.
- Envoy::Router::Filter receives onBelowWriteBufferLowWatermark() and calls readDisable(false) on the upstream request.
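The encoder paths above form a simple call chain, which the following Python sketch traces. All classes here are hypothetical stand-ins named after the roles in the lists; the snake_case method names mirror the C++ callbacks but are not Envoy's API.

```python
trace = []


class UpstreamRequest:
    def read_disable(self, disable):
        trace.append(f"upstream.readDisable({str(disable).lower()})")


class RouterFilter:
    """Registered for downstream watermark events; throttles the upstream request."""

    def __init__(self, upstream_request):
        self.upstream_request = upstream_request

    def on_above_write_buffer_high_watermark(self):
        self.upstream_request.read_disable(True)

    def on_below_write_buffer_low_watermark(self):
        self.upstream_request.read_disable(False)


class ActiveStream:
    """Connection manager side: fans events out to registered filters."""

    def __init__(self):
        self.watermark_subscribers = []

    def call_high_watermark_callbacks(self):
        trace.append("callHighWatermarkCallbacks")
        for sub in self.watermark_subscribers:
            sub.on_above_write_buffer_high_watermark()

    def call_low_watermark_callbacks(self):
        trace.append("callLowWatermarkCallbacks")
        for sub in self.watermark_subscribers:
            sub.on_below_write_buffer_low_watermark()


class EncoderFilterCallbacks:
    """What a streaming encoder filter calls when its own buffer fills or drains."""

    def __init__(self, stream):
        self.stream = stream

    def on_encoder_filter_above_write_buffer_high_watermark(self):
        self.stream.call_high_watermark_callbacks()

    def on_encoder_filter_below_write_buffer_low_watermark(self):
        self.stream.call_low_watermark_callbacks()


stream = ActiveStream()
stream.watermark_subscribers.append(RouterFilter(UpstreamRequest()))
callbacks = EncoderFilterCallbacks(stream)

callbacks.on_encoder_filter_above_write_buffer_high_watermark()  # filter buffer too full
callbacks.on_encoder_filter_below_write_buffer_low_watermark()   # filter buffer drained
```

The response data comes from upstream, so an overfull encoder (response-path) buffer is relieved by pausing reads on the upstream request.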
HTTP and HTTP/2 codec upstream send buffer
Below I am using the original document directly. However, I have included diagrams that I have drawn to make it easier to understand.
The upstream send buffer Envoy::Http::Http2::ConnectionImpl::StreamImpl::pending_send_data_ is H2 stream data destined for an Envoy backend. Data is added to this buffer after each filter in the chain is done processing, and it backs up if there is insufficient connection or stream window to send the data. The high watermark path goes as follows:
- When pending_send_data_ has too much data it calls ConnectionImpl::StreamImpl::pendingSendBufferHighWatermark().
- pendingSendBufferHighWatermark() calls StreamCallbackHelper::runHighWatermarkCallbacks().
- runHighWatermarkCallbacks() results in all subscribers of Envoy::Http::StreamCallbacks receiving an onAboveWriteBufferHighWatermark() callback.
- When Envoy::Router::Filter receives onAboveWriteBufferHighWatermark() it calls StreamDecoderFilterCallback::onDecoderFilterAboveWriteBufferHighWatermark().
- When Envoy::Http::ConnectionManagerImpl receives onDecoderFilterAboveWriteBufferHighWatermark() it calls readDisable(true) on the downstream stream to pause data.
For the low watermark path:
- When pending_send_data_ drains it calls ConnectionImpl::StreamImpl::pendingSendBufferLowWatermark().
- pendingSendBufferLowWatermark() calls StreamCallbackHelper::runLowWatermarkCallbacks().
- runLowWatermarkCallbacks() results in all subscribers of Envoy::Http::StreamCallbacks receiving an onBelowWriteBufferLowWatermark() callback.
- When Envoy::Router::Filter receives onBelowWriteBufferLowWatermark() it calls StreamDecoderFilterCallback::onDecoderFilterBelowWriteBufferLowWatermark().
- When Envoy::Http::ConnectionManagerImpl receives onDecoderFilterBelowWriteBufferLowWatermark() it calls readDisable(false) on the downstream stream to resume data.
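The two paths above can be traced with a compact Python model. The classes are hypothetical stand-ins for the codec stream, the router filter, and the connection manager; byte counts replace real buffers, and the low watermark at half the high watermark is an assumption.

```python
trace = []


class DownstreamStream:
    def read_disable(self, disable):
        trace.append(f"downstream.readDisable({str(disable).lower()})")


class DecoderFilterCallbacks:
    """Connection manager side of the decoder filter callbacks."""

    def __init__(self, downstream):
        self.downstream = downstream

    def on_decoder_filter_above_write_buffer_high_watermark(self):
        self.downstream.read_disable(True)

    def on_decoder_filter_below_write_buffer_low_watermark(self):
        self.downstream.read_disable(False)


class RouterFilter:
    """Subscribes to the upstream stream's callbacks and relays them."""

    def __init__(self, decoder_callbacks):
        self.decoder_callbacks = decoder_callbacks

    def on_above_write_buffer_high_watermark(self):
        trace.append("router: high")
        self.decoder_callbacks.on_decoder_filter_above_write_buffer_high_watermark()

    def on_below_write_buffer_low_watermark(self):
        trace.append("router: low")
        self.decoder_callbacks.on_decoder_filter_below_write_buffer_low_watermark()


class UpstreamCodecStream:
    """Toy stand-in for the H2 stream that owns pending_send_data_."""

    def __init__(self, high):
        self.high, self.low = high, high // 2
        self.pending_send_data = 0
        self.above_high = False
        self.stream_callbacks = []

    def queue(self, n):
        # Data that cannot be sent yet because the H2 window is exhausted.
        self.pending_send_data += n
        if not self.above_high and self.pending_send_data > self.high:
            self.above_high = True
            for cb in self.stream_callbacks:
                cb.on_above_write_buffer_high_watermark()

    def window_opened(self, n):
        # The peer granted window: up to n pending bytes leave the buffer.
        self.pending_send_data -= min(n, self.pending_send_data)
        if self.above_high and self.pending_send_data <= self.low:
            self.above_high = False
            for cb in self.stream_callbacks:
                cb.on_below_write_buffer_low_watermark()


codec = UpstreamCodecStream(high=100)
codec.stream_callbacks.append(RouterFilter(DecoderFilterCallbacks(DownstreamStream())))
codec.queue(150)           # no H2 window available: pending data backs up
codec.window_opened(120)   # window update arrives: 30 bytes remain
```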
Figure: Collaboration of Router::Filter during Upstream back-up
HTTP and HTTP/2 network upstream network buffer
Below I am using the original document directly. However, I have included diagrams that I have drawn to make it easier to understand. Going further, I found a bug in the original document that should be fixed.
The upstream network buffer is HTTP/2 data for all streams destined for the Envoy backend. If the network buffer fills up, all streams associated with the underlying TCP connection will be informed of the back-up, and the data sources (HTTP/2 streams or HTTP connections) feeding into those streams will be readDisabled.
The high watermark path is as follows:
- When Envoy::Network::ConnectionImpl::write_buffer_ has too much data it calls Network::ConnectionCallbacks::onAboveWriteBufferHighWatermark().
- When Envoy::Http::CodecClient receives onAboveWriteBufferHighWatermark() it calls onUnderlyingConnectionAboveWriteBufferHighWatermark() on codec_.
- When Http::Http2::ConnectionImpl (the original document incorrectly uses Envoy::Http::ConnectionManagerImpl here) receives onAboveWriteBufferHighWatermark() it calls runHighWatermarkCallbacks() for each stream of the connection.
- runHighWatermarkCallbacks() results in all subscribers of Envoy::Http::StreamCallbacks receiving an onAboveWriteBufferHighWatermark() callback.
- When Envoy::Router::Filter receives onAboveWriteBufferHighWatermark() it calls StreamDecoderFilterCallback::onDecoderFilterAboveWriteBufferHighWatermark().
- When Envoy::Http::ConnectionManagerImpl receives onDecoderFilterAboveWriteBufferHighWatermark() it calls readDisable(true) on the downstream stream to pause data.
The low watermark path is as follows:
- When Envoy::Network::ConnectionImpl::write_buffer_ is drained it calls Network::ConnectionCallbacks::onBelowWriteBufferLowWatermark().
- When Envoy::Http::CodecClient receives onBelowWriteBufferLowWatermark() it calls onUnderlyingConnectionBelowWriteBufferLowWatermark() on codec_.
- When Http::Http2::ConnectionImpl (rather than Envoy::Http::ConnectionManagerImpl as written in the original document) receives onBelowWriteBufferLowWatermark() it calls runLowWatermarkCallbacks() for each stream of the connection.
- runLowWatermarkCallbacks() results in all subscribers of Envoy::Http::StreamCallbacks receiving an onBelowWriteBufferLowWatermark() callback.
- When Envoy::Router::Filter receives onBelowWriteBufferLowWatermark() it calls StreamDecoderFilterCallback::onDecoderFilterBelowWriteBufferLowWatermark().
- When Envoy::Http::ConnectionManagerImpl receives onDecoderFilterBelowWriteBufferLowWatermark() it calls readDisable(false) on the downstream stream to resume data.
As with the downstream network buffer, it is important that when new upstream streams are associated with an existing upstream connection that is over its buffer limits, the new streams are created in the correct state. To handle this, the Envoy::Http::Http2::ClientConnectionImpl tracks the state of the underlying Network::Connection in underlying_connection_above_watermark_. If a new stream is created when the connection is above the high watermark, the new stream has runHighWatermarkCallbacks() called on it immediately.
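The "new stream starts in the correct state" rule can be sketched as follows. This is a toy Python model: `ClientConnection` and `Stream` are hypothetical stand-ins, and only the `underlying_connection_above_watermark` flag corresponds to a name mentioned in the text.

```python
class Stream:
    def __init__(self):
        self.high_watermark_calls = 0   # outstanding high watermark notifications

    def run_high_watermark_callbacks(self):
        self.high_watermark_calls += 1

    def run_low_watermark_callbacks(self):
        self.high_watermark_calls -= 1


class ClientConnection:
    """Toy codec connection that remembers whether the network buffer is full."""

    def __init__(self):
        self.underlying_connection_above_watermark = False
        self.streams = []

    def on_underlying_connection_above_write_buffer_high_watermark(self):
        self.underlying_connection_above_watermark = True
        for s in self.streams:
            s.run_high_watermark_callbacks()

    def on_underlying_connection_below_write_buffer_low_watermark(self):
        self.underlying_connection_above_watermark = False
        for s in self.streams:
            s.run_low_watermark_callbacks()

    def new_stream(self):
        s = Stream()
        self.streams.append(s)
        if self.underlying_connection_above_watermark:
            # The stream missed the earlier event, so deliver it now.
            s.run_high_watermark_callbacks()
        return s


conn = ClientConnection()
s1 = conn.new_stream()
conn.on_underlying_connection_above_write_buffer_high_watermark()
s2 = conn.new_stream()                     # created while the connection is backed up
during = (s1.high_watermark_calls, s2.high_watermark_calls)
conn.on_underlying_connection_below_write_buffer_low_watermark()
after = (s1.high_watermark_calls, s2.high_watermark_calls)
```

If the late stream did not receive the catch-up call, the later low watermark event would drive its counter below zero, or it would keep sending into a connection that is already backed up.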
Figure: Collaboration of Router::Filter when Upstream connection back-up