Exploring how Apache, Tomcat, and Spring Boot decompress request data

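We usually configure the web server to compress response data, for example with Apache's mod_deflate module, by enabling compression on a Tomcat connector, or by adding a web filter to a Java web project to compress specific responses. When a client then sends an HTTP request declaring a header such as Accept-Encoding: gzip, the server may compress the response and add the Content-Encoding: gzip response header.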

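Sometimes the body of an HTTP POST is so large that the client is likewise required to compress the request data before sending it. This article focuses on how the server can automatically decompress compressed data sent by the client.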

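Let's start with Apache2 on Ubuntu 20.04. The command apt-get install apache2 installs Apache 2.4.41 with the mod_deflate module enabled automatically. The module's configuration file, /etc/apache2/mods-enabled/deflate.conf, restricts compression to a specific set of response content types. Let's test compression of HTML content: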

$ curl -ivs -H "Accept-Encoding:gzip" http://localhost/index.html | more
* Trying 127.0.0.1:80...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET /index.html HTTP/1.1
> Host: localhost
> User-Agent: curl/7.68.0
> Accept: */*
> Accept-Encoding:gzip
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Tue, 04 May 2021 14:27:43 GMT
< Server: Apache/2.4.41 (Ubuntu)
< Last-Modified: Fri, 30 Apr 2021 14:53:21 GMT
< ETag: "2aa6-5c131c5e6bc5b-gzip"
< Accept-Ranges: bytes
< Vary: Accept-Encoding
< Content-Encoding: gzip
< Content-Length: 3138
< Content-Type: text/html
<
{ [3138 bytes data]
* Connection #0 to host localhost left intact
HTTP/1.1 200 OK
Date: Tue, 04 May 2021 14:27:43 GMT
Server: Apache/2.4.41 (Ubuntu)
Last-Modified: Fri, 30 Apr 2021 14:53:21 GMT
ETag: "2aa6-5c131c5e6bc5b-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 3138
Content-Type: text/html
jN?�Hk���ǁ)�
o�6$KcjX<&'����p8�ڇd����Y-S�통�L3�,[�px2��������d4��['+f�.�4���
�;����w)�3�nS�#��:��BT�%��Tif�?Mo� =%Í`�R-ى�Ԛ�LrK �/�l��ד�-v�

mod_deflate is working: as long as the request carries Accept-Encoding: gzip, the response comes back compressed and includes the Content-Encoding: gzip header.

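Now let's see how mod_deflate decompresses request data; the compression itself happens on the client. By default mod_deflate does not decompress requests, even when the request carries the Content-Encoding: gzip header. To test how request data is handled, install PHP with apt install php; afterwards PHP files are served without any extra configuration. In the web root, create a test.php file that simply prints the request body it receives, prefixed with "received post body: ".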

curl -X POST http://localhost/test.php -d 'hello world!'
received post body: hello world!

What if we send a compressed POST body?

$ echo 'hello world!' | gzip > body.gz
$ cat body.gz -
�H���W(�/�IQ����
^C
curl -iv -H 'Content-Encoding: gzip' http://localhost/test.php --data-binary @body.gz --output -
* Trying 127.0.0.1:80...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 80 (#0)
> POST /test.php HTTP/1.1
> Host: localhost
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Encoding: gzip
> Content-Length: 33
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 33 out of 33 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Tue, 04 May 2021 14:45:09 GMT
Date: Tue, 04 May 2021 14:45:09 GMT
< Server: Apache/2.4.41 (Ubuntu)
Server: Apache/2.4.41 (Ubuntu)
< Content-Length: 54
Content-Length: 54
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
<
received post body: �H���W(�/�IQ����
* Connection #0 to host localhost left intact

Even with Content-Encoding: gzip, Apache did not decompress the request data.

To make Apache decompress the request data automatically when it sees Content-Encoding: gzip, add the line SetInputFilter DEFLATE to /etc/apache2/mods-enabled/deflate.conf.

Reload Apache with apachectl graceful and test again:

curl -iv -H 'Content-Encoding: gzip' http://localhost/test.php --data-binary @body.gz --output -
* Trying 127.0.0.1:80...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 80 (#0)
> POST /test.php HTTP/1.1
> Host: localhost
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Encoding: gzip
> Content-Length: 33
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 33 out of 33 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Tue, 04 May 2021 14:48:23 GMT
Date: Tue, 04 May 2021 14:48:23 GMT
< Server: Apache/2.4.41 (Ubuntu)
Server: Apache/2.4.41 (Ubuntu)
< Content-Length: 34
Content-Length: 34
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
<
received post body: hello world!
* Connection #0 to host localhost left intact

That works: Apache now understands the compressed data in the request. What happens if we drop the Content-Encoding: gzip header?

curl http://localhost/test.php --data-binary @body.gz --output -
received post body: �H���W(�/�IQ����

Garbage again, which is no surprise.
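
For readers who drive these tests from Java instead of curl, here is a minimal sketch of how a client might build the same kind of request with the JDK's java.net.http API (Java 11+). The class name GzipPostClient and its helper methods are made up for illustration, and the URL simply reuses the test.php endpoint from above. The request is only built, not sent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the original article.
class GzipPostClient {

    /** Gzip-compresses the given bytes in memory. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return buffer.toByteArray();
    }

    /** Builds (but does not send) a POST whose body is gzip-compressed. */
    static HttpRequest buildRequest(String url, String body) {
        byte[] compressed = gzip(body.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(url))
                // Without this header the server has no reason to decompress.
                .header("Content-Encoding", "gzip")
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofByteArray(compressed))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost/test.php", "hello world!\n");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

As with curl, the Content-Encoding header is what tells the server that the body is compressed; the body bytes alone carry no such signal for it.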

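OK, let's move on to Tomcat. If an Apache instance sits in front of Tomcat to dispatch requests, we can rely entirely on Apache to handle compressed request data. But how can Tomcat decompress request data automatically when it faces clients directly?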

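Unfortunately, Tomcat's Connector configuration can only compress response data and cannot decompress request data. Whether to support this is still under discussion; see "Add support for request compression". So we have to solve it inside our own Java application with a web filter.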

Since we have to solve this in our own application, let's go straight to a Spring Boot web application. Spring Boot's application.properties offers response compression settings similar to those of the Tomcat Connector; see https://docs.spring.io/spring-boot/docs/2.4.4/reference/html/appendix-application-properties.html#common-application-properties-server. The relevant properties are:

  1. server.compression.enabled
  2. server.compression.excluded-user-agents
  3. server.compression.mime-types
  4. server.compression.min-response-size

Like Tomcat, Spring Boot cannot automatically decompress request data.

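These Spring Boot settings also apply to embedded servers other than Tomcat, such as Jetty, Undertow, and Netty (Netty is not a servlet container, but Spring Boot supports it as an embedded server).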

echo 'hello world!' | gzip > body.gz
curl -X POST -H "Content-Encoding: gzip" http://localhost:8081/ --data-binary @body.gz --output -
%1F%EF%BF%BD%08%00%EF%BF%BDi%EF%BF%BD%60%00%03%EF%BF%BDH%EF%BF%BD%EF%BF%BD%EF%BF%BDW%28%EF%BF%BD%2F%EF%BF%BDIQ%EF%BF%BD%02%00%EF%BF%BD%EF%BF%BD%EF%BF%BD%01%0D%00%00%00=
curl -ivs -X POST -H "Content-Encoding: gzip" http://localhost:8081/ --data-binary @body.gz --output -
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8081 (#0)
> POST / HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Encoding: gzip
> Content-Length: 33
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 33 out of 33 bytes
< HTTP/1.1 200
HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
Content-Type: text/plain;charset=UTF-8
< Content-Length: 169
Content-Length: 169
< Date: Tue, 04 May 2021 15:36:31 GMT
Date: Tue, 04 May 2021 15:36:31 GMT
<
%1F%EF%BF%BD%08%00%EF%BF%BDi%EF%BF%BD%60%00%03%EF%BF%BDH%EF%BF%BD%EF%BF%BD%EF%BF%BDW%28%EF%BF%BD%2F%EF%BF%BDIQ%EF%BF%BD%02%00%EF%BF%BD%EF%BF%BD%EF%BF%BD%01%0D%00%00%00=
* Connection #0 to host localhost left intact

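What comes back is the URL-encoded form of the compressed data; either way, the application cannot understand the compressed data in the request.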
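
The shape of that output suggests what probably happened: the gzip bytes were read as UTF-8 text for a form parameter name, every invalid byte became the replacement character U+FFFD, and the result was URL-encoded again with an empty value after "=". The sketch below reproduces that effect with the JDK alone. It illustrates the suspected mechanism and is not the actual Tomcat or Spring code path; the class GarbledFormBody is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: shows how gzip bytes turn into "%1F%EF%BF%BD%08%00..."
class GarbledFormBody {

    /** Gzip-compresses a string in memory. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Misreads binary data as a UTF-8 form parameter name and URL-encodes it back. */
    static String misreadAsForm(byte[] binary) {
        // Bytes that are not valid UTF-8 (such as 0x8B in the gzip magic) become U+FFFD.
        String asText = new String(binary, StandardCharsets.UTF_8);
        // U+FFFD is encoded as %EF%BF%BD; the trailing "=" stands for the empty value.
        return URLEncoder.encode(asText, StandardCharsets.UTF_8) + "=";
    }

    public static void main(String[] args) {
        System.out.println(misreadAsForm(gzip("hello world!\n")));
    }
}
```

The gzip magic bytes 1F 8B 08 00 come out as %1F%EF%BF%BD%08%00, matching the start of the response above. The conversion is lossy, so the original bytes cannot be recovered from this text.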

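To decompress request data in a Java web application, we still need a custom web filter. In the filter method, replace the original HttpServletRequest with one whose InputStream is wrapped in a GZIPInputStream, so that later reads from it are decompressed automatically.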
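
The core of that idea can be sketched with the JDK alone, leaving the servlet API out: given the Content-Encoding value and the raw body stream, hand back either the stream itself or a GZIPInputStream around it. The class GzipBodyWrapper and its methods are hypothetical names for this illustration; in a real filter the same wrapping would happen inside an HttpServletRequestWrapper's getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical, servlet-free sketch of the wrapping step.
class GzipBodyWrapper {

    /** Returns a decompressing view of the body when the header mentions gzip, else the body itself. */
    static InputStream wrapIfGzip(String contentEncoding, InputStream body) {
        boolean gzipped = contentEncoding != null
                && contentEncoding.toLowerCase(Locale.ROOT).contains("gzip");
        if (!gzipped) {
            return body;
        }
        try {
            // The constructor reads and validates the gzip header immediately.
            return new GZIPInputStream(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole (possibly compressed) body as UTF-8 text. */
    static String readBody(String contentEncoding, byte[] rawBody) {
        try (InputStream in = wrapIfGzip(contentEncoding, new ByteArrayInputStream(rawBody))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Gzip-compresses a string in memory, standing in for the client side. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(readBody("gzip", gzip("hello world!")));
    }
}
```

Code downstream of the wrapper reads plain bytes and never needs to know that the body arrived compressed.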

As for the code, first create a DelegatingServletInputStream that extends ServletInputStream; the implementation of this class is copied from the spring-test package.

Then create the Filter; in Spring Boot it is enough to declare it as a Spring bean.

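The filter does only a rough check: if the Content-Encoding header contains gzip, it treats the request as compressed, wraps the body as new DelegatingServletInputStream(new GZIPInputStream(req.getInputStream())), and passes the request down the chain. A real implementation needs to parse more complex header values strictly and decompress accordingly, for example: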

Content-Encoding: deflate, gzip
Content-Encoding: compress
Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0
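
A stricter treatment of Content-Encoding might look like the sketch below. It relies on the HTTP rule that codings are listed in the order they were applied, so they have to be undone from last to first. Only gzip and deflate are handled, because those are what the JDK ships decoders for; anything else, including compress, is rejected. The class ContentEncodingChain and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch: undo a comma-separated Content-Encoding chain.
class ContentEncodingChain {

    /** Wraps the body so that reading it undoes every listed coding, last one first. */
    static InputStream decode(String contentEncoding, InputStream body) {
        if (contentEncoding == null || contentEncoding.trim().isEmpty()) {
            return body;
        }
        String[] codings = contentEncoding.split(",");
        InputStream current = body;
        try {
            for (int i = codings.length - 1; i >= 0; i--) {
                String coding = codings[i].trim().toLowerCase(Locale.ROOT);
                switch (coding) {
                    case "gzip":
                    case "x-gzip":
                        current = new GZIPInputStream(current);
                        break;
                    case "deflate": // zlib-wrapped deflate, the InflaterInputStream default
                        current = new InflaterInputStream(current);
                        break;
                    case "identity":
                    case "":
                        break;
                    default:
                        throw new IllegalArgumentException("Unsupported Content-Encoding: " + coding);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }

    /** Test helper: applies deflate first, then gzip, i.e. "Content-Encoding: deflate, gzip". */
    static byte[] deflateThenGzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(new GZIPOutputStream(buffer))) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decodes a whole body to UTF-8 text. */
    static String decodeToString(String contentEncoding, byte[] body) {
        try (InputStream in = decode(contentEncoding, new ByteArrayInputStream(body))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = deflateThenGzip("hello world!".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeToString("deflate, gzip", body));
    }
}
```

Rejecting an unknown coding outright, which a filter could turn into a 415 Unsupported Media Type response, seems safer than passing undecoded bytes on to the application.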

Now let's test it:

$ curl -ivs -X POST -H "Content-Encoding: gzip" -H "Content-Type: text/plain" http://localhost:8081/ --data-binary @body.gz --output -
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8081 (#0)
> POST / HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Encoding: gzip
> Content-Type: text/plain
> Content-Length: 33
>
* upload completely sent off: 33 out of 33 bytes
< HTTP/1.1 200
HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
Content-Type: text/plain;charset=UTF-8
< Content-Length: 14
Content-Length: 14
< Date: Tue, 04 May 2021 16:25:54 GMT
Date: Tue, 04 May 2021 16:25:54 GMT
<
hello world!

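That works: the compressed data in the request is decompressed successfully. Note that besides Content-Encoding: gzip, the request above also set Content-Type: text/plain. Without an explicit Content-Type, curl defaults to Content-Type: application/x-www-form-urlencoded, and the body is not decompressed correctly.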

curl -ivs -X POST -H "Content-Encoding: gzip" http://localhost:8081/ --data-binary @body.gz --output -
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8081 (#0)
> POST / HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Encoding: gzip
> Content-Length: 33
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 33 out of 33 bytes
< HTTP/1.1 200
HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
Content-Type: text/plain;charset=UTF-8
< Content-Length: 169
Content-Length: 169
< Date: Tue, 04 May 2021 17:09:48 GMT
Date: Tue, 04 May 2021 17:09:48 GMT
<
%1F%EF%BF%BD%08%00%EF%BF%BDi%EF%BF%BD%60%00%03%EF%BF%BDH%EF%BF%BD%EF%BF%BD%EF%BF%BDW%28%EF%BF%BD%2F%EF%BF%BDIQ%EF%BF%BD%02%00%EF%BF%BD%EF%BF%BD%EF%BF%BD%01%0D%00%00%00=

The reason is that with Content-Type: application/x-www-form-urlencoded, the application does not use the overridden getInputStream() method. It goes through ServletRequestWrapper.getParameterMap() instead and ends up reading from org.apache.catalina.connector.CoyoteInputStream rather than the GZIPInputStream we expected.

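Still, decompressing request data for anything other than application/x-www-form-urlencoded is enough for our needs; hardly anyone sending large payloads would concatenate form key/value pairs and then compress them. We can therefore require clients to always send a Content-Type header.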

With Spring Boot, the filter can also extend OncePerRequestFilter; a filter implemented that way has exactly the same effect.

