Introduction:
Using batch statements is closely related with prepared statement usage,
and it is also has huge performance benefit. If you need to insert,update or
delete more than one tables or one tables with more than one times in the same connection you should
use batching. By using batching , you will go to database only one time.If you
use more than one tables in a batch statements it is called heterogen batch,
otherwise it is called homogen batch. In heterogen batch you have to use
addBatch(String sql) overloaded method of
statement object and you cannot use prepared statement. Below is an
example of homogen batching.
String CONN_STRING = "jdbc:mysql://localhost:3306/DB_NAME?user=USER&password=PWD";
String UPDATE_SQL = "UPDATE
User SET age=? WHERE name=?";
Connection conn = null;
PreparedStatement pstmt = null;
try {
Integer[] ages = new
Integer[] { 5, 10, 15};
String[] names = new
String[] { “xxx”, “yyy”, “zzz”};
Class.forName("com.mysql.jdbc.Driver");
conn = DriverManager.getConnection(CONN_STRING);
conn.setAutoCommit(false);
conn.setAutoCommit(false);
pstmt = conn.prepareStatement(UPDATE_SQL);
for (int i =
0; i < ages.length; i++) {
pstmt.setInt(1,
ages[i]);
pstmt.setString(2,
names [i]);
pstmt.addBatch();
}
pstmt.executeBatch();
conn.commit();
conn.commit();
} catch
(SQLException e) {
try {
if
(conn != null && !conn.isClosed())
conn.rollback();
//
TODO logging...
} catch
(SQLException e1) {
//
TODO logging...
}
} catch
(Exception e) {
try {
if
(conn != null && !conn.isClosed())
conn.rollback();
//
TODO logging...
} catch
(SQLException e1) {
//
TODO logging...
}
} finally {
try {
if
(pstmt != null)
pstmt.close();
if
(conn != null && !conn.isClosed())
conn.close();
} catch
(SQLException e) {
//
TODO logging...
}
}
Some important points
of using batch statements are;
- You can get the number of rows that were effected by each SQL statement from the array that the executeBatch invocation returns. This number does not include rows that were affected by triggers or by referential integrity enforcement.
- Even if the autocommit mode is on, all the data will be committed to database after executing executeBatch.
- When you use batching, you go to the database only for one time. However the database engine doesn’t necessarily execute all queries in one process.For example in DB2 you have to enable multi-row insert property to execute all batched queries in one process.In this case the executeBatch will not return the effected rows, instead return Statement.SUCCESS_NO_INFO (-2).This will give better performance since the database engine doesn’t need to return the effected rows, and the whole batch will be executed in one process.
- If you have heavy insert or update statements, you can do the insert or update to memory, and later can write to database by an asynchron background thread.
Using single sql instead of batch:
You can use single sql for
insert operation instead of batch statement. In this way you do multi insert
with one sql statement, and again using prepared statement, you can pass your
parameters.Single sql statement executed about 10 times faster than batch
statements in my machine, since the database engine
execute it in a single process like multi-row insert enabled DB2.(mentioned
above in Batch statements)
For update or delete
operations you can use IN keyword to use a single sql for multi operation.
Oracle example:
INSERT ALL
INTO test (id, name) VALUES (1, 'xxx')
INTO test (id, name) VALUES (2, 'yyy')
INTO test (id, name) VALUES (3, 'zzz')
SELECT * FROM
dual;
Db2,Mysql, Mssql, Postgre Sql example:
INSERT INTO test (id, name) VALUES (1, 'xxx'), (2, 'yyy'), (3, 'zzz');
Note: In mysql you can
use rewriteBatchedStatements=true property in connection string.
It creates a single sql statement for all batch data, and send to database to
execute in one process. That means its faster than standart batching.However, its
slower than single sql statement approach because of the cost of creating
single sql statement from all batch data. You can also tune max_allowed_packet property pf mysql to limit number of rows that can be executed in a batch statement. Depending on the processor and database engine, number of rows that can be executed in a batch may vary and you may have to limit max number of rows in a batch.
Conclusion
Batch statements are crucial in performance related applications, since it reduces the database calls, and improve response times. You can also choose to run single sql statements instead of batching to get faster results. Be careful about that, batch statements are not necessarily executed in a single process in database side, that is you may need to enable database specific properties to enable multi-row operations in one process.