postgres in (?,?) 和 =any(?) 用法/性能对比 | 隔叶黄莺 Yanbin Blog

2023-08-09 | 阅读(636)

刚刚回顾了一下 JDBC 操作 SQL Server 时如何传入列表参数，即如何给 in (?) 条件直接传入一个列表参数，然而本质上是不支持，最终不得不展开为 in (?, ?,...?) 针对每个元素单独设置参数，不定长的参数对于重用已编译 PreparedStatement 语句的帮助不大。

那么 JDBC 操作 PostgreSQL 是何种状态呢？展开为多个参数当然是有效的。继续尝试 Spring 提供的 NamedParameterJdbcTemplate 的操作方式

String query = "select * from users where id in (:ids)";
Map<String, Object> parameters = new HashMap<>();
parameters.put("ids", IntStream.rangeClosed(1, 5).boxed().collect(toList()));
List<Map<String, Object>> maps = namedParameterJdbcTemplate.queryForList(query, parameters);

执行后查看到实际执行的语句是

select * from users where id in (?, ?, ?, ?, ?)

使用 p6spy 捕获所执行的 SQL 语句

在 PostgreSQL 中试着查询视图 pg_stat_activity 来找到历史查询语句，但未成功，所以用了 p6spy 来打印底层执行的 SQL 语句。十数年之前写过如何用 p6spy 来监控 SQL 语句的文章，p6spy 发展到现在使用起来也谈到简单了，只需在项目中引入依赖 p6spy:p6spy:3.9.1，然后再辅以两步

JDBC URL 稍作变化，在 jdbc 与数据库类型之间加入 p6spy 即可。如 jdbc:postgresql://localhost/postgres 变身为 jdbc:p6spy:postgresql://localhost/postgres
启动 Java 程序时加上系统属性 -Dp6spy.config.appender=com.p6spy.engine.spy.appender.StdoutLogger

如此操作数据库时就会在控制台实时打印出实际执行的语句，如上面的完整输出是

1691564330238|79|statement|connection 0|url jdbc:p6spy:postgresql://localhost/postgres|select * from users where id in (?, ?, ?, ?, ?)|select * from users where id in (1, 2, 3, 4, 5)

in (?, ?) 是有参数个数限制的

到这里我们不难看出，所谓的 in (?, ?) 形式是最笨拙的实现方式，效率上感觉不太妙。

那么 PostgreSQL 在用 in (?,?) 的方式时是否有参数个数的限制呢？测试了 10000

parameters.put("ids", IntStream.rangeClosed(1, 10000).boxed().collect(toList()));

能正常返回结果。加到 10000

parameters.put("ids", IntStream.rangeClosed(1, 100000).boxed().collect(toList()));

这下报错了

Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:349)
......
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:651)
... 7 more
Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 100000
at org.postgresql.core.PGStream.sendInteger2(PGStream.java:349)

说是超过了有符号的两字节整数(int2) 的最大小值, 也就是说最大 32767, 超过这个值就会得到同样的错误。

注：在使用 PostgreSQL 15, JDBC 驱动为 42.7.3 的情况下，参数个数可以达到 65535, 即两字节无符号整数大小(uint2)。限制在 PgPreparedStatement 类中找到(这是 postgresql 42.7.3 驱动中的代码)

  final int maximumNumberOfParameters() {
    return connection.getPreferQueryMode() == PreferQueryMode.SIMPLE ? Integer.MAX_VALUE : 65535;
  }

final int maximumNumberOfParameters() {

return connection.getPreferQueryMode() == PreferQueryMode.SIMPLE ? Integer.MAX_VALUE : 65535;

}

默认 preferQueryMode 是 extended, 所以限制是 65535, 如果在 JDBC URL 中配置了 ?preferQueryMode=simple, 则参数个数可达到 Java 整数的最大值 2,147,483,647, 但性能上会差

Specifies which mode is used to execute queries to database: simple means ('Q' execute, no parse, no bind, text mode only), extended means always use bind/ execute messages, extendedForPrepared means extended for prepared statements only.

使用 =any(?) 形式的查询

与 =any(?) 相配合的就是 PostgreSQL 驱动的 setArray() 和 createArrayOf() 方法实现，完整的代码如下

String jdbcUrl = "jdbc:p6spy:postgresql://localhost/postgres";
Connection conn = DriverManager.getConnection(jdbcUrl, "sa", "mypassword");

String query = "select * from users where id = any(?)";
PreparedStatement pstmt = conn.prepareStatement(query);
Object[] params = IntStream.rangeClosed(1, 3).boxed().toArray();
pstmt.setArray(1, conn.createArrayOf("int", params));
ResultSet rs = pstmt.executeQuery();

while (rs.next()) {
    System.out.println(rs.getString(1));
}

String jdbcUrl = "jdbc:p6spy:postgresql://localhost/postgres";

Connection conn = DriverManager.getConnection(jdbcUrl, "sa", "mypassword");

String query = "select * from users where id = any(?)";

PreparedStatement pstmt = conn.prepareStatement(query);

Object[] params = IntStream.rangeClosed(1, 3).boxed().toArray();

pstmt.setArray(1, conn.createArrayOf("int", params));

ResultSet rs = pstmt.executeQuery();

while (rs.next()) {

System.out.println(rs.getString(1));

}

执行上面的代码，p6spy 打印出

1691566338490|83|statement|connection 0|url jdbc:p6spy:postgresql://localhost/postgres|select * from users where id = any(?)|select * from users where id = any('{"1","2","3"}')

这里 params 有三个元素，无论多少个元素，原始的 select * from users where id = any(?) 是不会变的。我们也可以测试一下它能接收多少个元素

Object[] params = IntStream.rangeClosed(1, 10000).boxed().toArray();

10000 个元素，没问题; 100000 也行，1000000 也能应对。应该可以说 =any(?) 接收数组的大小无限制，和直接写 SQL 语句时的 in (1, 2, ...n) 一样的。

JdbcTemplate 使用 `=any(?)`

借助于 JdbcTemplate 时尝试过传递参数时同时传入 int[] argTypes 指定参数类型，无论是 Types.INTEGER 还是 Types.ARRAY 都未成功，最后只能用 PreparedStatementCreator 来显式调用 setArray() 方法，代码如下

String query = "select * from users where id = any(?)";
Object[] params = IntStream.rangeClosed(1, 5).boxed().toArray();
List<String> values = jdbcTemplate.query(
        conn -> {
            PreparedStatement pstmt = conn.prepareStatement(query);
            pstmt.setArray(1, conn.createArrayOf("int", params));
            return pstmt;
        }, (rs, idx) -> rs.getString(2)
);

String query = "select * from users where id = any(?)";

Object[] params = IntStream.rangeClosed(1, 5).boxed().toArray();

List<String> values = jdbcTemplate.query(

conn -> {

PreparedStatement pstmt = conn.prepareStatement(query);

pstmt.setArray(1, conn.createArrayOf("int", params));

return pstmt;

}, (rs, idx) -> rs.getString(2)

);

并未增加多少代码

in(?, ?) 与 =any(?) 的性能比较

这基本上二者还未上擂台就能猜出输赢来，比较的同时还得把另一种写法拉扯进来

select * from (values (1),(2),(3) ) tmp_users(id)

1	select * from (values (1),(2),(3) ) tmp_users(id)

在测试表 users 中有 439175 条记录，id 为主键，也就意味着有索引，然后随机从 users 表中取出若干 id 值作为查询条件。我们在使用 p6spy 输出 Query 时打印的第二列即为查询消耗的时间。

测试下 Schema

create table users(
   id serial,
   name varchar(32),
   age int
);

create table users(

id serial,

name varchar(32),

age int

);

完整的测试代码如下

public class TestPostgreSQL {

    public static void main(String[] args) throws Exception {
        String jdbcUrl = "jdbc:p6spy:postgresql://localhost/postgres";
        Connection conn = DriverManager.getConnection(jdbcUrl, "sa", "mypassword");

        Integer[] userIds = getRandomUserIds(100);
        testSelectValues(conn, userIds);
        testIn(conn, userIds);
        testAny(conn, userIds);
    }

    private static void testSelectValues(Connection conn, Integer[] params) throws Exception {
        String query = String.format("select id from (values %s) tmp_users(id)",
                Arrays.stream(params).map(i->"("+i+")").collect(Collectors.joining(",")));
        Statement stmt = conn.createStatement();
        stmt.executeQuery(query);
    }

    private static void testIn(Connection conn, Integer[] params) throws Exception {
        String query = String.format("select id from users where id in (%s)",
                IntStream.rangeClosed(1, params.length).boxed().map(i->"?").collect(Collectors.joining(",")));
        PreparedStatement pstmt = conn.prepareStatement(query);
        for (int i = 0; i < params.length; i++) {
           pstmt.setInt(i+1, params[i]);
        }
        pstmt.executeQuery();
    }

    private static void testAny(Connection conn, Integer[] params) throws Exception {
        String query = "select id from users where id = any(?)";
        PreparedStatement pstmt = conn.prepareStatement(query);
        pstmt.setArray(1, conn.createArrayOf("int", params));
        pstmt.executeQuery();
    }

    private static Integer[] getRandomUserIds(int num) throws IOException {
        List<Integer> userIds = Files.readAllLines(Paths.get("/Users/yanbin/user_ids.txt"))
                .stream().map(Integer::valueOf).collect(toList());
        Collections.shuffle(userIds);
        return userIds.subList(0, num).toArray(new Integer[]{});
    }
}

public class TestPostgreSQL {

public static void main(String[] args) throws Exception {

String jdbcUrl = "jdbc:p6spy:postgresql://localhost/postgres";

Connection conn = DriverManager.getConnection(jdbcUrl, "sa", "mypassword");

Integer[] userIds = getRandomUserIds(100);

testSelectValues(conn, userIds);

testIn(conn, userIds);

testAny(conn, userIds);

}

private static void testSelectValues(Connection conn, Integer[] params) throws Exception {

String query = String.format("select id from (values %s) tmp_users(id)",

Arrays.stream(params).map(i->"("+i+")").collect(Collectors.joining(",")));

Statement stmt = conn.createStatement();

stmt.executeQuery(query);

}

private static void testIn(Connection conn, Integer[] params) throws Exception {

String query = String.format("select id from users where id in (%s)",

IntStream.rangeClosed(1, params.length).boxed().map(i->"?").collect(Collectors.joining(",")));

PreparedStatement pstmt = conn.prepareStatement(query);

for (int i = 0; i < params.length; i++) {

pstmt.setInt(i+1, params[i]);

}

pstmt.executeQuery();

}

private static void testAny(Connection conn, Integer[] params) throws Exception {

String query = "select id from users where id = any(?)";

PreparedStatement pstmt = conn.prepareStatement(query);

pstmt.setArray(1, conn.createArrayOf("int", params));

pstmt.executeQuery();

}

private static Integer[] getRandomUserIds(int num) throws IOException {

List<Integer> userIds = Files.readAllLines(Paths.get("/Users/yanbin/user_ids.txt"))

.stream().map(Integer::valueOf).collect(toList());

Collections.shuffle(userIds);

return userIds.subList(0, num).toArray(new Integer[]{});

}

下面是 getRandomUserIds 取不同数量值时查询执行的时间(测试时所用驱动的参数个数限制是 32767)

Number of RandomUserIds	testSelectValues(ms)	testIn(ms)	testAny(ms)
100	83	79	75
1,000	168	92	81
10,000	560	330	123
20,000	734	410	171
32767	934	611	262
60,000	1140	N/A	544
100,000	1446	N/A	785
300,000	4828		3875

以上测试数据仅供参考

在 UserIds 量少的情况下分别不大，在多数时候 =any(?) 的方式还是最有效的。在超过 300,000 个 UserIds 时 SelectValues 和 =any(?) 没多大的差别，其实上面设计的测试本身就有问题了, SelectValues 总是从内存中查询数据，而 =any(?) 要从物理表中查询。

从简洁和效率上来讲, 在 JDBC 中还是优先考虑使用 PostgreSQL 的 =any(?) 处理列表参数。

本文链接 https://yanbin.blog/postgres-in-vs-any-comparison/, 来自隔叶黄莺 Yanbin Blog

本文采用署名-非商业性使用-相同方式共享 4.0 国际 (CC BY-NC-SA 4.0) 进行许可。

2 Comments

Inline Feedbacks

View all comments

JDBC 设置 PostgreSQL 查询中 any(?) 的参数 | 隔叶黄莺 Yanbin Blog - 软件编程实践

11 months ago

[…] postgres in (?,?) 和 =any(?) 用法/性能对比，其中关于如何向查询语句中 id = any(?) […]

ddd

好文章

best coffee on SciPy 最优化之最小化I wanted to take a moment to commend you on the outstanding quality of...
seetimee on 体验 Python FastAPI 的并发能力及线, 进程模型感谢
Yanbin on Mockito 3.4.0 开始可 Mock 静态方法有一个补救，新写了一篇 https://yanbin.blog/mockito-mock-static-method-in-multiple...
Yanbin on 升级到 Spring Boot 3 后 javax.inject.Named 不可用怎么，被抄袭了！算是被机器翻译引用的？
oleksi on 升级到 Spring Boot 3 后 javax.inject.Named 不可用https://www.springcloud.io/post/2023-02/springboot-3-javax-inject-name...

使用 p6spy 捕获所执行的 SQL 语句

in (?, ?) 是有参数个数限制的

使用 =any(?) 形式的查询

JdbcTemplate 使用 =any(?)

in(?, ?) 与 =any(?) 的性能比较

JdbcTemplate 使用 `=any(?)`