Skip to main content

How to batch INSERT and UPDATE statements with Hibernate


Courtesy: VLAD MIHALCEA

Introduction

JDBC has long been offering support for DML statement batching. By default, all statements are sent one after the other, each one in a separate network round-trip. Batching allows us to send multiple statements in one-shot, saving unnecessary socket stream flushing.
Hibernate hides the database statements behind a transactional write-behind abstraction layer. An intermediate layer allows us to hide the JDBC batching semantics from the persistence layer logic. This way, we can change the JDBC batching strategy without altering the data access code.
Configuring Hibernate to support JDBC batching is not as easy as it should be, so I’m going to explain everything you need to do in order to make it work.

Testing time

We’ll start with the following entity model:
PostCommentJdbcBatch
The Post has a one-to-many association with the Comment entity:
1
2
3
4
5
@OneToMany(
    cascade = CascadeType.ALL,
    mappedBy = "post",
    orphanRemoval = true)
private List comments = new ArrayList<>();
Or test scenario issues both INSERT and UPDATE statements, so we can validate if JDBC batching is being used:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
LOGGER.info("Test batch insert");
long startNanos = System.nanoTime();
doInTransaction(session -> {
    int batchSize = batchSize();
    for(int i = 0; i < itemsCount(); i++) {
        Post post = new Post(
            String.format("Post no. %d", i)
        );
        int j = 0;
        post.addComment(new Comment(
                String.format(
                    "Post comment %d:%d", i, j++
        )));
        post.addComment(new Comment(
                String.format(
                     "Post comment %d:%d", i, j++
        )));
        session.persist(post);
        if(i % batchSize == 0 && i > 0) {
            session.flush();
            session.clear();
        }
    }
});
LOGGER.info("{}.testInsert took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(
        System.nanoTime() - startNanos
    ));
 
LOGGER.info("Test batch update");
startNanos = System.nanoTime();
 
doInTransaction(session -> {
    List posts = session.createQuery(
        "select distinct p " +
        "from Post p " +
        "join fetch p.comments c")
    .list();
 
    for(Post post : posts) {
        post.title = "Blog " + post.title;
        for(Comment comment : post.comments) {
            comment.review = "Blog " + comment.review;
        }
    }
});
 
LOGGER.info("{}.testUpdate took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(
        System.nanoTime() - startNanos
    ));
This test will persist a configurable number of Post entities, each one containing two Comments. For the sake of brevity, we are going to persist 3 Posts and the Dialectdefault batch size:
1
2
3
4
5
6
7
protected int itemsCount() {
    return 3;
}
 
protected int batchSize() {
    return Integer.valueOf(Dialect.DEFAULT_BATCH_SIZE);
}

Default batch support

Hibernate doesn’t implicitly employ JDBC batching and each INSERT and UPDATEstatement is executed separately:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
 
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}

Configuring hibernate.jdbc.batch_size

To enable JDBC batching, we have to configure the hibernate.jdbc.batch_sizeproperty:
A non-zero value enables use of JDBC2 batch updates by Hibernate (e.g. recommended values between 5 and 30)
We’ll set this property and rerun our test:
1
2
properties.put("hibernate.jdbc.batch_size",
    String.valueOf(batchSize()));
This time, the Comment INSERT statements are batched, while the UPDATEstatements are left untouched:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
 
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
JDBC batch can target one table only, so every new DML statement targeting a different table ends up the current batch and initiates a new one. Mixing different table statements is therefore undesirable when using SQL batch processing.

Ordering statements

Hibernate can sort INSERT and UPDATE statements using the following configuration options:
1
2
properties.put("hibernate.order_inserts", "true");
properties.put("hibernate.order_updates", "true");
While the Post and Comment INSERT statements are batched accordingly, the UPDATE statements are still executed separately:
1
2
3
4
5
6
7
8
9
10
11
12
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}

Adding version data batch support

There’s the hibernate.jdbc.batch_versioned_data configuration property we need to set, in order to enable UPDATE batching:
Set this property to true if your JDBC driver returns correct row counts from executeBatch(). It is usually safe to turn this option on. Hibernate will then use batched DML for automatically versioned data. Defaults to false.
We will rerun our test with this property set too:
1
properties.put("hibernate.jdbc.batch_versioned_data", "true");
Now both the INSERT and the UPDATE statements are properly batched:
1
2
3
4
5
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}

Benchmark

Now that we managed to configure Hibernate for JDBC batching, we can benchmark the performance gain of statement grouping.
  • the test case uses a PostgreSQL database installed on the same machine with the currently running JVM
  • a batch size of 50 was chosen and each test iteration increases the statement count by an order of magnitude
  • all durations are expressed in milliseconds
The results are:
Statement countNo batch Insert durationNo batch Update durationBatch Insert durationBatch Update duration
30218178191144
300311327208217
300010471089556478
300005889603226402301
30000051785578691605220954

Comments

Popular posts from this blog

তৈদুছড়া ঝর্ণা : খাগড়াছড়ি

(সংগৃহীত) তৈদুছড়া ঝর্ণা:তৈদুছড়া এলাকায় খুব কাছাকাছিই বড় ঝর্ণা আছে তিনটি।এছাড়া এক ঝর্ণা থেকে আরেক ঝর্ণায় যাওয়ার পথে আছে বেশ কয়েকটি ছোট ঝর্ণা এবং যাওয়ার ঝিরিপথটাও অসাধারণ সুন্দর। ঝর্ণা দেখার সাথে সাথে পাহাড়ের ঢাল বেড়ে ট্রেকিং- এর থ্রিলটাও পাওয়া যাবে এখানে। পুরো এলাকাতেই জুম চাষ করা হয়,তাই পাহাড়ের ঢালে ঢালে দেখা পাবেন অসংখ্য সুন্দর সুন্দর জুম ঘরের। যেভাবে যাবেন: ঢাকা থেকে শান্তি পরিবহনে সরাসরি আসতে পারেন দিঘীনালায়,অথবা শ্যামলী, এস আলম সেন্টমার্টিন পরিবহনে খাগড়াছড়ি এসে খাগড়াছড়ি থেকে সিএনজ ি তে আসতে পারবেন দিঘীনালা।খাগড়াছড়ি -দিঘীনালা সময় লাগবে ৪০ মিনিট। ভাড়া ৬০টাকা। দিঘীনালার জামতলি থেকে হেঁটে যেতে হবে ঝর্ণায়,দিঘীনালা থেকে জামতলি যেতে পারবেন অটোরিকশায়।ঝর্ণায় যেতে ২-৩ ঘন্টা লাগে,সকালে রওনা দিলে বিকেলের মধ্যেই ফিরে আসা যায়। থাকবেন কোথায়:দিঘীনালায় ফ্যামিলি নিয়ে থাকার মত হোটেল রয়েছে বাস স্টেশনের পাশেই। বাস স্টেশন থেকে তিন মিনিট হাঁটলে বোয়ালখালি বাজারেও পাবেন থাকার হোটেল। সাথে যা যা নিতে হবে:শুকনো খাবার নিয়ে নিতে হবে সাথে,পানিও নিয়ে নেবেন।দড়ি নেবেন,কাজে লাগবে।অবশ্যাই ভালো গ্রিপ আ...

How to automatically rotate catalina.out daily

How to automatically rotate catalina.out daily or when it becomes bigger than 5M 1. Create this file /etc/logrotate.d/tomcat   2. Copy the following contents into the above file /usr/tomcat/default/logs/catalina.out {    copytruncate    daily    rotate 7    compress    missingok     size  5M   }   About the above configuration: Make sure that the path  /var/log/tomcat/catalina.out  above is adjusted to point to your tomcat’s catalina.out daily  -  rotates the catalina.out  daily rotate  – keeps at most  7  log files compress –  compresses   the rotated files size  – rotates if the size of catalina.out is bigger than  5M copytruncate  – truncates the original log file in place after creating a copy, instead of moving the old log file and optionally creatin...