Envoy WASM Network Filter to fix illegal HTTP Header

The story

The story takes place in the summer of 2022 AD. God (a pseudonym) found that HTTP requests which had worked normally before Istio was introduced were answered with HTTP status code 400 once they passed through the Istio Gateway. The traffic in question carries HTTP headers that do not comply with the HTTP/1.1 specification. For example, there is an extra space before the colon:

GET /headers HTTP/1.1\r\n
Host: httpbin.org\r\n
User-Agent: curl/7.68.0\r\n
Accept: */*\r\n
SpaceSuffixHeader : normalVal\r\n
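
To make the offending pattern concrete, here is a minimal sketch that checks a single header line (without its trailing CRLF) for whitespace between the field name and the colon. The function name `hasSpaceBeforeColon` is made up for this article; it is not part of Envoy or http-parser.

```cpp
#include <string_view>

// Hypothetical helper: returns true when a header line has a space or tab
// directly before its first colon, e.g. "Name : value".
bool hasSpaceBeforeColon(std::string_view line)
{
  const auto colon = line.find(':');
  if (colon == std::string_view::npos || colon == 0)
  {
    return false; // no colon, or an empty field name: not the case we look for
  }
  const char before = line[colon - 1];
  return before == ' ' || before == '\t';
}
```

Called on `SpaceSuffixHeader : normalVal` it reports the problem, while `Host: httpbin.org` passes.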

After begging God to fix the problem, the “innocent” programmers prepared for the worst and set out to build a Noah’s Ark.

The Plan - Two Noah’s Arks

When people talk about Istio, they are mostly talking about Envoy. The HTTP/1.1 parser used by Envoy is nodejs/http-parser, a library written in C that has not been updated for 2 years. The most straightforward idea is to make the parser tolerate the problem HTTP header. So the programmer fired up the search engine.

Ark 1 - Make the Parser Compatible

If choosing a search engine is a matter of circumstances, then choosing search keywords is a matter of skill and experience. I won’t go into detail about how programmers search. The engine finally took me to the http-parser issue “White spaces in header fields will cause parser failed #297”:

Set the HTTP_PARSER_STRICT=0 solved my issue, thanks.

The parameter above has to be set at compile time for istio-proxy / Envoy / http-parser. With it, a header name followed by a space is accepted.
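
The effect of the switch can be pictured with a simplified sketch. This is an illustration of the idea, not the library's actual code: I assume that strict mode accepts only the RFC 7230 token characters in a header name, and that lenient mode additionally lets a space through. The names `isTchar` and `isHeaderNameChar` are made up for this article.

```cpp
#include <cctype>
#include <string_view>

// RFC 7230 "tchar": the characters allowed in a header field name.
bool isTchar(char c)
{
  if (std::isalnum(static_cast<unsigned char>(c)))
  {
    return true;
  }
  return std::string_view("!#$%&'*+-.^_`|~").find(c) != std::string_view::npos;
}

// Hypothetical model of the strict/lenient difference: lenient mode also
// treats a space as part of the header name instead of rejecting it.
bool isHeaderNameChar(char c, bool strict)
{
  return isTchar(c) || (!strict && c == ' ');
}
```

In this model the space in `SpaceSuffixHeader :` is rejected when `strict` is true, which matches the 400 response, and accepted when it is false.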

The company in question is large and has its own infrastructure department. Large companies generally customize and compile open source projects themselves instead of using the binary releases directly. So the programmers struggled for a few days to produce a custom build of the infrastructure department’s istio-proxy with HTTP_PARSER_STRICT=0 added. Testing confirmed that it did solve the compatibility problem.

But this workaround has several problems:

  • A recompiled binary gives the infrastructure department a reason not to support us with other problems later, and it easily introduces more unknown risks.
  • A basic principle of problem solving is to contain both the impact of the problem and the risk of the solution. Avoid introducing more bugs to fix one bug.
    • If the Istio Gateway lets the illegal header pass through, then the sidecar proxies and application services further upstream must also tolerate it and pass it through. The risk is unknown.

Ark 2 - Fix the Problem Header

Envoy claims to be a programmable proxy. Many people know that all kinds of functionality can be added with custom HTTP Filters, which of course includes customizing and rewriting HTTP headers.

But please think carefully. You may have read my earlier article “Reverse Engineering and Cloud Native Field Analysis Part2 - eBPF Tracks Istio/Envoy Startup, Monitoring and Thread Load Balancing (In Chinese)”, or watched [Envoy Internals Deep Dive - Matt Klein, Lyft (Advanced Skill Level)] by Matt Klein of Lyft, the original author of Envoy:

image-20220306085636137

It shows that the parse error occurs in the HTTP Codec, before any HTTP Filter runs! So an HTTP Filter cannot fix it.

To verify this, I set a breakpoint on the http_parser_execute function of http-parser with gdb and inspected the call stack. For the gdb setup, see “gdb debugging istio proxy (envoy)”.

An HTTP Filter does not work, so what about a TCP Filter? In theory, a TCP Filter can correct the problem header before the byte buffer is handed to the HTTP Codec. It is not just a matter of overwriting bytes in place, though: bytes may have to be removed, which changes the length of the buffer…
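
As a rough sketch of what “cropping” means, the function below removes the whitespace between a header name and its colon. It is written for this article and makes simplifying assumptions: the whole header block arrives in one buffer, lines end with CRLF, and folded (continuation) header lines are not handled. The name `repairHeaderBlock` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: returns a copy of `in` where, on every header line,
// spaces and tabs directly before the first colon are removed. The request
// line and everything from the blank line onwards are copied unchanged.
std::string repairHeaderBlock(std::string_view in)
{
  std::string out;
  out.reserve(in.size());
  std::size_t pos = 0;
  bool requestLine = true;
  while (pos < in.size())
  {
    const std::size_t eol = in.find("\r\n", pos);
    if (eol == std::string_view::npos || eol == pos)
    {
      break; // incomplete line, or the blank line that ends the headers
    }
    const std::string_view line = in.substr(pos, eol - pos);
    const std::size_t colon = line.find(':');
    if (!requestLine && colon != std::string_view::npos)
    {
      std::size_t nameEnd = colon;
      while (nameEnd > 0 && (line[nameEnd - 1] == ' ' || line[nameEnd - 1] == '\t'))
      {
        --nameEnd; // crop whitespace between the name and the colon
      }
      out.append(line.substr(0, nameEnd));
      out.append(line.substr(colon));
    }
    else
    {
      out.append(line);
    }
    out.append("\r\n");
    pos = eol + 2;
    requestLine = false;
  }
  out.append(in.substr(pos)); // blank line, body, or a partial line
  return out;
}
```

The output is shorter than the input whenever something was cropped, which is why the filter has to replace the buffer rather than patch it in place.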

There are two ways to implement TCP Filter (hereinafter called Network Filter):

  • Native C++ Filter
    • Relatively good performance, since no buffer copy is required. But Envoy has to be recompiled.
  • WASM Filter
    • Because of the sandbox VM, the buffer has to be copied between the VM and the native program, which costs CPU, memory, and latency.

As mentioned above, recompiling Envoy is not acceptable in this company, so the programmers can only choose the WASM Filter.

If the “innocent” programmer were a pure architect, he could call it a day once he had figured out the approach and drawn an architecture diagram in a PPT, and that would be the happy ending. Unfortunately, “innocent” programmers are destined to spend several sleepless days building “Ark No. 2”. Even the planks and nails have to be made by hand…

WASM Network Filter Quick Start

WASM Language

There are several languages to choose from for writing WASM Filters: Rust, Go, and C++. In terms of memory safety and modern language features, C++ would be the last choice. However, the “innocent” programmers chose C++, for a reason that came out of careful consideration:

Reuse the http-parser that Envoy uses, built with the compile-time option HTTP_PARSER_STRICT=0 to enable its compatibility mode.

To fix a problematic HTTP header, you first have to locate it (that is, parse up to it) in the byte buffer. Of course a fancier HTTP parser could be used, and each of the languages above has its own. But who guarantees that their results are compatible with Envoy’s? Will new problems be introduced? Using the same parser as Envoy is therefore a good choice. If there is a problem in the parser, Envoy itself would have the same problem even without this Filter. In other words, the parser is more or less guaranteed not to introduce new problems.

The wild WASM Network Filter

The luckiest programmers can always find template code to copy and paste, or a workaround in some issue, via a search engine, Stack Overflow, or GitHub, and finish the task easily. “Unlucky” programmers often have to solve problems with no well-known answer (although I love those), and end up wearing themselves out without a good KPI to show for it.

Plenty of WASM HTTP Filter material and reference implementations can be found on the Internet, but there are very few WASM Network Filter examples. Some of them only read the buffer bytes and collect simple statistics. None of them modifies the byte stream at L3/L4, let alone parses HTTP on top of it.

Proxy WASM C++ SDK

Open source means more than open code; it is also an opportunity for people to seek the truth. “Unlucky” programmers remember learning Visual C++ MFC in 2002 with nothing to go on but the MSDN documentation; those who have not been there do not know the pain.

However small the WASM Network Filter niche is, it is still open source. Not only the SDK but also the interface definition, the ABI spec, is open source. The important references are:

Designing WASM Network Filter

Sticking to my usual style of writing less and posting more diagrams:

Figure: Design of WASM Network Filter

WASM Network Filter Implementation

**For various reasons I will not paste all of the code. What follows is pseudo code rewritten specially for this article.**

The source code from https://github.com/nodejs/http-parser is used. It really consists of just two files: http_parser.h and http_parser.c. First download them and save them to a new project directory, say $REPAIRER_FILTER_HOME. The biggest advantage of this http-parser is that it has no dependencies and is easy to import.

Now start writing the core code, in a file I will call repairer_filter.cc:

#include ...
#include "proxy_wasm_intrinsics.h"
#include "http_parser.h" //from https://github.com/nodejs/http-parser 

/**
Each Filter configuration corresponds to an object instance
**/
class ExampleRootContext : public RootContext
{
public:
  explicit ExampleRootContext(uint32_t id, std::string_view root_id) : RootContext(id, root_id) {}

    
  // Filter start event
  bool onStart(size_t) override
  {
    LOG_DEBUG("ready to process streams");
    return true;
  }
};

Next comes the core class:

/**
One object instance for each downstream connection
**/
class MainContext : public Context
{
public:
  http_parser_settings settings_;
  http_parser parser_;
  ...

  // Constructor, called once for each new downstream connection when it becomes
  // available, e.g. after the TLS handshake, or after the TCP connection is
  // established for plain text. HTTP 1.1 supports persistent (keep-alive)
  // connections, so this object must handle multiple requests.
  explicit MainContext(uint32_t id, RootContext *root) : Context(id, root)
  {
    logInfo(std::string("new MainContext"));

    // Zero all callbacks first, then assign the ones we need by name, so the
    // code does not depend on the field order of http_parser_settings
    // (on_url and on_status sit between on_message_begin and on_header_field).
    http_parser_settings_init(&settings_);
    http_parser_init(&parser_, HTTP_REQUEST);
    parser_.data = this;
    // Register callbacks for the HTTP parser
    settings_.on_message_begin = [](http_parser *parser) -> int
    {
      return static_cast<MainContext *>(parser->data)->on_message_begin();
    };
    settings_.on_header_field = [](http_parser *parser, const char *at, size_t length) -> int
    {
      return static_cast<MainContext *>(parser->data)->on_header_field(at, length);
    };
    settings_.on_header_value = [](http_parser *parser, const char *at, size_t length) -> int
    {
      return static_cast<MainContext *>(parser->data)->on_header_value(at, length);
    };
    settings_.on_headers_complete = [](http_parser *parser) -> int
    {
      return static_cast<MainContext *>(parser->data)->on_headers_complete();
    };
    ...
  }
   
  // Called when a new buffer arrives. Because of how the network delivers data,
  // one HTTP request can be split across multiple buffers, so this may be
  // called several times for a single request.
  FilterStatus onDownstreamData(size_t length, bool end_of_stream) override
  {
    logInfo(std::string("onDownstreamData START"));      
    ...
        
    WasmDataPtr wasmDataPtr = getBufferBytes(WasmBufferType::NetworkDownstreamData, 0, length);

    {
      std::ostringstream out;
      out << "onDownstreamData length:" << length << ",end_of_stream:" << end_of_stream;
      logInfo(out.str());
      logInfo(std::string("onDownstreamData Buf:\n") + wasmDataPtr->toString());
    }

    // Parse the HTTP bytes here. The parser invokes the callbacks registered above;
    // we implemented them to record where the problem header is and to fix it.
    size_t parsedBytes = http_parser_execute(&parser_, &settings_, wasmDataPtr->data(), length); // callbacks
    ...
        
    // Envoy drains `length` bytes of the buffer, which requires start = 0:
    // see proxy-wasm-cpp-sdk proxy_wasm_api.h setBuffer()
    // see proxy-wasm-cpp-host src/exports.cc set_buffer_bytes()
    // see Envoy source/extensions/common/wasm/context.cc Buffer::copyFrom()
    size_t start = 0;
        
    // WasmResult setBuffer(WasmBufferType type, size_t start, size_t length, std::string_view data,
    //                           size_t *new_size = nullptr)
    // Ref. https://github.com/proxy-wasm/spec/tree/master/abi-versions/vNEXT#proxy_set_buffer
    // Set content of the buffer buffer_type to the bytes (buffer_data, buffer_size), replacing size bytes, starting at offset in the existing buffer.
    // setBuffer(WasmBufferType::NetworkDownstreamData, start, length, data);
    setBuffer(WasmBufferType::NetworkDownstreamData, start, length, outputBuffer);
    return FilterStatus::Continue; // hand the repaired bytes on to the next filter
  }
    
  /**
   * Called when the HTTP stream (connection) is closed
   */
  void onDone() override { logInfo("onDone " + std::to_string(id())); }
};
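
The elided part between parsing and setBuffer is where outputBuffer gets built. A minimal sketch of one way to do it, assuming the on_header_field callback recorded each field name as an offset and length into the input (for example `at - base`), and assuming the lenient parser reports the name with its trailing whitespace included. `FieldSpan` and `buildOutputBuffer` are names made up for this article, not part of the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical record of one header field name inside the input buffer.
struct FieldSpan
{
  std::size_t offset; // where the field name starts
  std::size_t length; // length as reported by the parser callback
};

// Copies `input` while dropping trailing spaces/tabs from each recorded
// field name. Spans must be sorted by offset and must not overlap.
std::string buildOutputBuffer(std::string_view input, const std::vector<FieldSpan> &fields)
{
  std::string out;
  out.reserve(input.size());
  std::size_t copied = 0; // everything before this index is already handled
  for (const FieldSpan &f : fields)
  {
    assert(f.offset >= copied && f.offset + f.length <= input.size());
    const std::size_t end = f.offset + f.length;
    std::size_t keep = end;
    while (keep > f.offset && (input[keep - 1] == ' ' || input[keep - 1] == '\t'))
    {
      --keep; // trailing whitespace of the field name is not copied
    }
    out.append(input.substr(copied, keep - copied));
    copied = end;
  }
  out.append(input.substr(copied)); // the rest is copied unchanged
  return out;
}
```

The result can be shorter than the input, which fits the setBuffer call above: it replaces `length` bytes starting at offset 0 with the new contents.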

Registration:

static RegisterContextFactory register_ExampleContext(CONTEXT_FACTORY(MainContext),
                                                      ROOT_FACTORY(ExampleRootContext),
                                                      "myparser_id");

Because the filter parses raw buffers, cases such as an HTTP request or header that spans buffers have to be handled, and HTTP 1.1 keep-alive connections must be supported as well. The last time I did a C++ project was 17 years ago, and it took this programmer a week (with overtime) to implement a working prototype. It has not been optimized or tested for performance impact. A sandbox VM implementation is bound to affect service latency. See my previous analysis:
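
To show why headers that span buffers are awkward, here is a small stand-alone sketch. It is not the real filter and does not use http-parser: it is a hand-written state machine that holds back whitespace seen inside a header name until the next byte decides whether it sat in front of a colon. It assumes requests without bodies (so a blank line is followed by the next request on the same connection) and ignores folded header lines. The class name `StreamingHeaderRepairer` is made up for this article.

```cpp
#include <string>
#include <string_view>

class StreamingHeaderRepairer
{
public:
  // Feeds one chunk and returns the bytes that are safe to forward now.
  // Whitespace at the end of a header name is kept in pending_ until a
  // later byte (possibly in the next chunk) resolves it.
  std::string feed(std::string_view chunk)
  {
    std::string out;
    out.reserve(pending_.size() + chunk.size());
    for (const char c : chunk)
    {
      switch (state_)
      {
      case State::RequestLine:
        out.push_back(c);
        if (c == '\n') state_ = State::LineStart;
        break;
      case State::LineStart:
        if (c == '\r')
        {
          out.push_back(c);
          state_ = State::BlankLine;
          break;
        }
        state_ = State::Name;
        [[fallthrough]];
      case State::Name:
        if (c == ' ' || c == '\t')
        {
          pending_.push_back(c); // undecided: hold it back
        }
        else if (c == ':')
        {
          pending_.clear(); // whitespace before the colon is dropped
          out.push_back(c);
          state_ = State::Value;
        }
        else
        {
          out.append(pending_); // whitespace was inside the name: keep it
          pending_.clear();
          out.push_back(c);
          if (c == '\n') state_ = State::LineStart;
        }
        break;
      case State::Value:
        out.push_back(c);
        if (c == '\n') state_ = State::LineStart;
        break;
      case State::BlankLine:
        out.push_back(c);
        if (c == '\n') state_ = State::RequestLine; // keep-alive: next request
        break;
      }
    }
    return out;
  }

private:
  enum class State { RequestLine, LineStart, Name, Value, BlankLine };
  State state_ = State::RequestLine;
  std::string pending_;
};
```

If one buffer ends with `SpaceSuffixHeader ` and the next one starts with `: normalVal`, the space is withheld from the first output and dropped once the colon arrives. The same object then goes on to the next request on the connection.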

My real-life Istio Performance Tuning - Part 1

Figure: WASM in Flame Graph

WASM Network Filter usage

In practice, this Network Filter has to be added to Envoy’s filter chain. Since I’m talking about Istio here, that means using Istio’s EnvoyFilter:

kubectl apply -f - <<"EOF"

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: myparser-envoy-wasm-net-filter
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
      app: istio-ingressgateway
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: GATEWAY
      listener:
        portNumber: 8443
        filterChain:
          filter:
            name: "envoy.filters.network.http_connection_manager"
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.network.wasm
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
          config:
            name: "myparser"
            root_id: "myparser_id"
            vm_config:
              runtime: "envoy.wasm.runtime.v8"
              code:
                local:
                  filename: "/etc/istio/myparser-envoy-wasm-net-filter/myparser.wasm"
              allow_precompiled: true 

EOF

Reflections

This is the best of times: architects have a variety of open source components that can simply be glued together to implement requirements.

This is the worst of times: out-of-the-box components spoil architects. Flying high on other people’s work, we feel confident and think we possess the magic. But when someone is unlucky enough to step into a pit and fall, he gets badly hurt, because of his ignorance of how things really work.

My all-time hero, Brendan Gregg, once said:

You never know a company (or person) until you see them on their worst day

The real test of a programmer or architect is not drawing a grand blueprint (PPT) for a new project, nor how many new concepts and technologies he knows. It is this: when the existing architecture has a problem and there is no prior experience to draw on, how does he find a solution within the technical and non-technical constraints, and stay ready to solve the new problems that the solution itself causes?

ending

WRITTEN BY
Mark Zhu
An old developer