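We usually configure the web server to compress response data, for example with Apache's mod_deflate module, by enabling compression on the Tomcat connector, or by adding a web filter to a Java web project that compresses specific responses. When the client declares something like Accept-Encoding: gzip in its HTTP request headers, the server may compress the response and add a Content-Encoding: gzip response header.
Sometimes the HTTP POST body is so large that the client is likewise required to compress the request data before sending it. This post focuses on how the server can automatically decompress compressed data sent by the client.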
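For readers whose client is written in Java, the following is a minimal sketch of the client side of that exchange, using only the JDK (Java 11+). It compresses a body with GZIPOutputStream and builds a POST request carrying Content-Encoding: gzip. The class name GzipRequestBuilder and the URL http://localhost/upload are hypothetical, chosen for illustration only; the request is built but not sent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the original post.
class GzipRequestBuilder {

    // Compress raw bytes into the gzip format.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } // closing the stream flushes the data and writes the gzip trailer
        return buffer.toByteArray();
    }

    // Build (but do not send) a POST request whose body is gzip-compressed.
    static HttpRequest compressedPost(String url, String text) throws IOException {
        byte[] body = gzip(text.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Encoding", "gzip")   // tells the server the body is compressed
                .header("Content-Type", "text/plain") // type of the uncompressed payload
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) throws IOException {
        // The URL is an assumption for illustration only.
        HttpRequest request = compressedPost("http://localhost/upload", "hello world!\n");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Sending the request would be a matter of passing it to java.net.http.HttpClient.send(); whether the server decompresses the body is exactly the question the rest of this post explores.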
Let's start with Apache2 on Ubuntu 20.04. The command apt-get install apache2 installs Apache 2.4.41 with the mod_deflate module enabled automatically. The module's configuration file /etc/apache2/mods-enabled/deflate.conf contains the following:
    <IfModule mod_deflate.c>
        <IfModule mod_filter.c>
            AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css
            AddOutputFilterByType DEFLATE application/x-javascript application/javascript application/ecmascript
            AddOutputFilterByType DEFLATE application/rss+xml
            AddOutputFilterByType DEFLATE application/xml
        </IfModule>
    </IfModule>
This means only the response content types listed above are compressed. Let's test compression of HTML content:
$ curl -ivs -H "Accept-Encoding:gzip" http://localhost/index.html | more
* Trying 127.0.0.1:80...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET /index.html HTTP/1.1
> Host: localhost
> User-Agent: curl/7.68.0
> Accept: */*
> Accept-Encoding:gzip
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Tue, 04 May 2021 14:27:43 GMT
< Server: Apache/2.4.41 (Ubuntu)
< Last-Modified: Fri, 30 Apr 2021 14:53:21 GMT
< ETag: "2aa6-5c131c5e6bc5b-gzip"
< Accept-Ranges: bytes
< Vary: Accept-Encoding
< Content-Encoding: gzip
< Content-Length: 3138
< Content-Type: text/html
<
{ [3138 bytes data]
* Connection #0 to host localhost left intact
HTTP/1.1 200 OK
Date: Tue, 04 May 2021 14:27:43 GMT
Server: Apache/2.4.41 (Ubuntu)
Last-Modified: Fri, 30 Apr 2021 14:53:21 GMT
ETag: "2aa6-5c131c5e6bc5b-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 3138
Content-Type: text/html
jN?�Hk���ǁ)�
o�6$KcjX<&'����p8�ڇd����Y-S�통�L3�,[�px2��������d4��['+f�.�4���
�;����w)�3�nS�#��:��
BT�%��Tif�?Mo� =%Í`�R-ى�Ԛ�LrK �/�l��ד�-v�
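mod_deflate is working: as long as the request carries Accept-Encoding: gzip, the response comes back compressed, with Content-Encoding: gzip in its headers.
Now let's see how mod_deflate decompresses request data; the compression itself happens on the client. By default mod_deflate does not decompress requests, even when the request carries a Content-Encoding: gzip header. To test request handling we install PHP with apt install php; no extra configuration is needed afterwards for .php files to be served. Create a test.php file in the web root with the following content: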
    <?php
    $body = file_get_contents('php://input');
    echo 'received post body: ' . $body . "\n";
    ?>
It simply prints the request body it received.
curl -X POST http://localhost/test.php -d 'hello world!'
received post body: hello world!
What if we send a compressed POST body?
$ echo 'hello world!' | gzip > body.gz
$ cat body.gz -
�H���W(�/�IQ����
^C
curl -iv -H 'Content-Encoding: gzip' http://localhost/test.php --data-binary @body.gz --output -
* Trying 127.0.0.1:80...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 80 (#0)
> POST /test.php HTTP/1.1
> Host: localhost
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Encoding: gzip
> Content-Length: 33
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 33 out of 33 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Tue, 04 May 2021 14:45:09 GMT
Date: Tue, 04 May 2021 14:45:09 GMT
< Server: Apache/2.4.41 (Ubuntu)
Server: Apache/2.4.41 (Ubuntu)
< Content-Length: 54
Content-Length: 54
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
<
received post body: �H���W(�/�IQ����
* Connection #0 to host localhost left intact
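Even with Content-Encoding: gzip, Apache does not decompress the request data.
To make Apache decompress request data automatically when it sees Content-Encoding: gzip, we also need to add the line SetInputFilter DEFLATE to /etc/apache2/mods-enabled/deflate.conf: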
    <IfModule mod_deflate.c>
        <IfModule mod_filter.c>
            SetInputFilter DEFLATE
        </IfModule>
    </IfModule>
Restart Apache gracefully with apachectl graceful and test again:
curl -iv -H 'Content-Encoding: gzip' http://localhost/test.php --data-binary @body.gz --output -
* Trying 127.0.0.1:80...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 80 (#0)
> POST /test.php HTTP/1.1
> Host: localhost
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Encoding: gzip
> Content-Length: 33
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 33 out of 33 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Tue, 04 May 2021 14:48:23 GMT
Date: Tue, 04 May 2021 14:48:23 GMT
< Server: Apache/2.4.41 (Ubuntu)
Server: Apache/2.4.41 (Ubuntu)
< Content-Length: 34
Content-Length: 34
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
<
received post body: hello world!
* Connection #0 to host localhost left intact
That works: Apache now correctly understands the compressed data in the request. What happens if we drop the Content-Encoding: gzip header?
curl http://localhost/test.php --data-binary @body.gz --output -
received post body: �H���W(�/�IQ����
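Garbage again, which is no surprise.
OK, let's move on to Tomcat. If Apache sits in front of Tomcat to dispatch requests, request decompression can be left entirely to Apache. But how do we decompress request data automatically when Tomcat faces clients directly?
Unfortunately, Tomcat's Connector configuration can only compress response data; it cannot decompress request data. Whether to support this is still under discussion, see "Add support for request compression". So we have to solve it inside our own Java application with a web filter.
Since we have to solve this in our own application anyway, let's go straight to a Spring Boot web application. Spring Boot's application.properties has response-compression settings similar to those of the Tomcat Connector; see https://docs.spring.io/spring-boot/docs/2.4.4/reference/html/appendix-application-properties.html#common-application-properties-server for details. The relevant properties are: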
- server.compression.enabled
- server.compression.excluded-user-agents
- server.compression.mime-types
- server.compression.min-response-size
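Like Tomcat, Spring Boot cannot automatically decompress request data.
These Spring Boot settings also apply to the other embedded web servers it supports besides Tomcat, such as Jetty, Undertow, and Reactor Netty.
To test this, here is a simple controller that echoes the request body: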
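    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class HelloController {
        // Echo the request body back to the caller.
        @PostMapping("/")
        public String hello(@RequestBody String body) {
            return body + "\n";
        }
    }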
echo 'hello world!' | gzip > body.gz
curl -X POST -H "Content-Encoding: gzip" http://localhost:8081/ --data-binary @body.gz --output -
%1F%EF%BF%BD%08%00%EF%BF%BDi%EF%BF%BD%60%00%03%EF%BF%BDH%EF%BF%BD%EF%BF%BD%EF%BF%BDW%28%EF%BF%BD%2F%EF%BF%BDIQ%EF%BF%BD%02%00%EF%BF%BD%EF%BF%BD%EF%BF%BD%01%0D%00%00%00=
$ curl -ivs -X POST -H "Content-Encoding: gzip" http://localhost:8081/ --data-binary @body.gz --output -
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8081 (#0)
> POST / HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Encoding: gzip
> Content-Length: 33
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 33 out of 33 bytes
< HTTP/1.1 200
HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
Content-Type: text/plain;charset=UTF-8
< Content-Length: 169
Content-Length: 169
< Date: Tue, 04 May 2021 15:36:31 GMT
Date: Tue, 04 May 2021 15:36:31 GMT
<
%1F%EF%BF%BD%08%00%EF%BF%BDi%EF%BF%BD%60%00%03%EF%BF%BDH%EF%BF%BD%EF%BF%BD%EF%BF%BDW%28%EF%BF%BD%2F%EF%BF%BDIQ%EF%BF%BD%02%00%EF%BF%BD%EF%BF%BD%EF%BF%BD%01%0D%00%00%00=
* Connection #0 to host localhost left intact
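What comes back is the URL-encoded form of the compressed data; either way, the server does not understand the compressed data in the request.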
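One plausible reading of that output, offered as an assumption rather than something the transcript proves: the raw gzip bytes were decoded as UTF-8 text, with each invalid byte replaced by U+FFFD, and the resulting string was then URL-encoded. The sketch below reproduces the first few characters of the response from the first four bytes of any gzip stream. The class name UrlEncodedGzipDemo is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
class UrlEncodedGzipDemo {

    // Decode bytes as UTF-8 (malformed bytes become U+FFFD), then URL-encode the text.
    static String decodeThenUrlEncode(byte[] rawBody) {
        String asText = new String(rawBody, StandardCharsets.UTF_8);
        return URLEncoder.encode(asText, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // gzip magic number 0x1F 0x8B, compression method 0x08, flags 0x00
        byte[] gzipHeader = {0x1F, (byte) 0x8B, 0x08, 0x00};
        System.out.println(decodeThenUrlEncode(gzipHeader)); // prints %1F%EF%BF%BD%08%00
    }
}
```

The byte 0x8B is not valid UTF-8 on its own, so it turns into U+FFFD, whose UTF-8 encoding is EF BF BD. That matches the repeated %EF%BF%BD sequences in the response above and also means the original bytes cannot be recovered from it.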
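To decompress request data in a Java web application we need a custom web filter. In the filter method we replace the original HttpServletRequest and wrap its InputStream in a GZIPInputStream, so that any later read from it decompresses automatically.
Here is the code. First create a DelegatingServletInputStream that extends ServletInputStream; this implementation is copied from the spring-test package.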
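The wrapping idea can be tried outside a servlet container first. The following is a minimal sketch using only the JDK (Java 9+ for readAllBytes), where a ByteArrayInputStream stands in for the request's input stream. The class name GzipStreamWrapDemo and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the original post.
class GzipStreamWrapDemo {

    // Simulates what the client does: gzip the text before sending it.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Stand-in for the filter step: wrap the compressed stream so reads return plain bytes.
    static InputStream decompressing(InputStream compressed) throws IOException {
        return new GZIPInputStream(compressed);
    }

    // Read the whole stream as UTF-8 text and close it.
    static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream requestBody = new ByteArrayInputStream(gzip("hello world!"));
        System.out.println(readAll(decompressing(requestBody))); // prints hello world!
    }
}
```

Code that reads from the wrapped stream does not need to know the body was ever compressed, which is why swapping the stream inside a request wrapper is enough.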
    import java.io.IOException;
    import java.io.InputStream;
    import javax.servlet.ReadListener;
    import javax.servlet.ServletInputStream;
    import org.springframework.util.Assert;

    // Adapts any InputStream to the ServletInputStream type the servlet API expects.
    public class DelegatingServletInputStream extends ServletInputStream {

        private final InputStream sourceStream;

        private boolean finished = false;

        public DelegatingServletInputStream(InputStream sourceStream) {
            Assert.notNull(sourceStream, "Source InputStream must not be null");
            this.sourceStream = sourceStream;
        }

        @Override
        public int read() throws IOException {
            int data = this.sourceStream.read();
            if (data == -1) {
                // Remember end-of-stream so isFinished() can report it.
                this.finished = true;
            }
            return data;
        }

        @Override
        public int available() throws IOException {
            return this.sourceStream.available();
        }

        @Override
        public void close() throws IOException {
            super.close();
            this.sourceStream.close();
        }

        @Override
        public boolean isFinished() {
            return this.finished;
        }

        @Override
        public boolean isReady() {
            return true;
        }

        @Override
        public void setReadListener(ReadListener readListener) {
            // Non-blocking reads are not supported by this simple delegate.
            throw new UnsupportedOperationException();
        }
    }
Then create the filter. In Spring Boot it is enough to declare it as a Spring bean.
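    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.ServletException;
    import javax.servlet.ServletInputStream;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletRequestWrapper;
    import org.springframework.core.annotation.Order;
    import org.springframework.stereotype.Component;

    @Component
    @Order(1)
    public class DecompressFilter implements Filter {

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest req = (HttpServletRequest) request;
            String contentEncoding = req.getHeader("Content-Encoding");
            if (contentEncoding != null && contentEncoding.toLowerCase().contains("gzip")) {
                // Swap in a request whose input stream decompresses on the fly.
                HttpServletRequest delegatingRequest = new HttpServletRequestWrapper(req) {
                    @Override
                    public ServletInputStream getInputStream() throws IOException {
                        return new DelegatingServletInputStream(new GZIPInputStream(req.getInputStream()));
                    }
                };
                chain.doFilter(delegatingRequest, response);
            } else {
                chain.doFilter(request, response);
            }
        }
    }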
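Here we only loosely check whether the Content-Encoding header contains gzip, then treat the request as compressed, wrap its stream as new DelegatingServletInputStream(new GZIPInputStream(req.getInputStream())), and pass it down the chain. A real implementation needs to parse more complex Content-Encoding values strictly and decompress accordingly, for example: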
Content-Encoding: deflate, gzip
Content-Encoding: compress
Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0
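A stricter treatment of the Content-Encoding header could look like the sketch below. It assumes the HTTP convention that codings are listed in the order they were applied, so they must be undone in reverse, and it assumes deflate means zlib-wrapped data, which is what InflaterInputStream expects by default. The class ContentEncodingDecoder and its methods are hypothetical; only gzip and deflate are handled, and anything else (such as compress) is rejected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of the original post.
class ContentEncodingDecoder {

    // Split "deflate, gzip" into [deflate, gzip], lower-cased, in the order applied.
    static List<String> parseCodings(String headerValue) {
        List<String> codings = new ArrayList<>();
        if (headerValue == null) {
            return codings;
        }
        for (String token : headerValue.split(",")) {
            String coding = token.trim().toLowerCase(Locale.ROOT);
            // Leniently ignore empty tokens and "identity" (no transformation).
            if (!coding.isEmpty() && !coding.equals("identity")) {
                codings.add(coding);
            }
        }
        return codings;
    }

    // Wrap the body so that reading it undoes every listed coding, last applied first.
    static InputStream decode(InputStream body, String headerValue) throws IOException {
        List<String> codings = parseCodings(headerValue);
        InputStream in = body;
        for (int i = codings.size() - 1; i >= 0; i--) {
            String coding = codings.get(i);
            switch (coding) {
                case "gzip":
                case "x-gzip":
                    in = new GZIPInputStream(in);
                    break;
                case "deflate":
                    in = new InflaterInputStream(in); // expects zlib-wrapped data
                    break;
                default:
                    // A filter would typically answer with 415 Unsupported Media Type here.
                    throw new UnsupportedOperationException("Unsupported Content-Encoding: " + coding);
            }
        }
        return in;
    }

    // Demo helpers that play the role of the client.
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(raw);
        }
        return buffer.toByteArray();
    }

    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "hello world!".getBytes(StandardCharsets.UTF_8);
        byte[] wire = gzip(deflate(raw)); // matches "Content-Encoding: deflate, gzip"
        try (InputStream in = decode(new ByteArrayInputStream(wire), "deflate, gzip")) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

In the filter, decode(req.getInputStream(), contentEncoding) would take the place of the direct new GZIPInputStream(...) call, and the exact token match avoids treating a value that merely contains the substring gzip as compressed.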
Now let's test it:
$ curl -ivs -X POST -H "Content-Encoding: gzip" -H "Content-Type: text/plain" http://localhost:8081/ --data-binary @body.gz --output -
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8081 (#0)
> POST / HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Encoding: gzip
> Content-Type: text/plain
> Content-Length: 33
>
* upload completely sent off: 33 out of 33 bytes
< HTTP/1.1 200
HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
Content-Type: text/plain;charset=UTF-8
< Content-Length: 14
Content-Length: 14
< Date: Tue, 04 May 2021 16:25:54 GMT
Date: Tue, 04 May 2021 16:25:54 GMT
<
hello world!
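It works: the compressed data in the request is decompressed successfully. Note that besides Content-Encoding: gzip we also added Content-Type: text/plain. Without an explicit Content-Type, curl defaults to Content-Type: application/x-www-form-urlencoded when sending data, and the body is then not decompressed correctly.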
curl -ivs -X POST -H "Content-Encoding: gzip" http://localhost:8081/ --data-binary @body.gz --output -
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8081 (#0)
> POST / HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Encoding: gzip
> Content-Length: 33
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 33 out of 33 bytes
< HTTP/1.1 200
HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
Content-Type: text/plain;charset=UTF-8
< Content-Length: 169
Content-Length: 169
< Date: Tue, 04 May 2021 17:09:48 GMT
Date: Tue, 04 May 2021 17:09:48 GMT
<
%1F%EF%BF%BD%08%00%EF%BF%BDi%EF%BF%BD%60%00%03%EF%BF%BDH%EF%BF%BD%EF%BF%BD%EF%BF%BDW%28%EF%BF%BD%2F%EF%BF%BDIQ%EF%BF%BD%02%00%EF%BF%BD%EF%BF%BD%EF%BF%BD%01%0D%00%00%00=
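This is because with Content-Type: application/x-www-form-urlencoded the application does not use the getInputStream() method we overrode. Instead it goes into ServletRequestWrapper.getParameterMap(), which ends up reading from org.apache.catalina.connector.CoyoteInputStream rather than the GZIPInputStream we expected.
Still, decompressing request data for anything other than application/x-www-form-urlencoded is enough for our needs; hardly anyone sends large data as concatenated form key/value pairs and then compresses it. So we can simply require clients to always send an explicit Content-Type header.
With Spring Boot the filter can also extend OncePerRequestFilter; the following filter has exactly the same effect: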
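    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import javax.servlet.FilterChain;
    import javax.servlet.ServletException;
    import javax.servlet.ServletInputStream;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletRequestWrapper;
    import javax.servlet.http.HttpServletResponse;
    import org.springframework.core.annotation.Order;
    import org.springframework.stereotype.Component;
    import org.springframework.web.filter.OncePerRequestFilter;

    @Component
    @Order(1)
    public class DecompressFilter extends OncePerRequestFilter {

        @Override
        protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain chain)
                throws ServletException, IOException {
            String contentEncoding = request.getHeader("Content-Encoding");
            if (contentEncoding != null && contentEncoding.toLowerCase().contains("gzip")) {
                // Swap in a request whose input stream decompresses on the fly.
                HttpServletRequest delegatingRequest = new HttpServletRequestWrapper(request) {
                    @Override
                    public ServletInputStream getInputStream() throws IOException {
                        return new DelegatingServletInputStream(new GZIPInputStream(request.getInputStream()));
                    }
                };
                chain.doFilter(delegatingRequest, response);
            } else {
                chain.doFilter(request, response);
            }
        }
    }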
Links: