Open Source Software and Database Technologies

Friday, October 10, 2014

Using Batch Statements In Java

Introduction:

Using batch statements is closely related with prepared statement usage, and it is also has huge performance benefit. If you need to insert,update or delete more than one tables or one tables with more than one times in the same connection you should use batching. By using batching , you will go to database only one time.If you use more than one tables in a batch statements it is called heterogen batch, otherwise it is called homogen batch. In heterogen batch you have to use addBatch(String sql) overloaded method of statement object and you cannot use prepared statement. Below is an example of homogen batching.

String CONN_STRING = "jdbc:mysql://localhost:3306/DB_NAME?user=USER&password=PWD";

String UPDATE_SQL = "UPDATE User SET age=? WHERE name=?";

Connection conn = null;

PreparedStatement pstmt = null;

try {

Integer[] ages = new Integer[] { 5, 10, 15};

String[] names = new String[] { “xxx”, “yyy”, “zzz”};

Class.forName("com.mysql.jdbc.Driver");

conn = DriverManager.getConnection(CONN_STRING);
conn.setAutoCommit(false);

pstmt = conn.prepareStatement(UPDATE_SQL);

for (int i = 0; i < ages.length; i++) {

pstmt.setInt(1, ages[i]);

pstmt.setString(2, names [i]);

pstmt.addBatch();

}

pstmt.executeBatch();
conn.commit();

} catch (SQLException e) {

try {

if (conn != null && !conn.isClosed())

conn.rollback();

// TODO logging...

} catch (SQLException e1) {

// TODO logging...

}

} catch (Exception e) {

try {

if (conn != null && !conn.isClosed())

conn.rollback();

// TODO logging...

} catch (SQLException e1) {

// TODO logging...

}

} finally {

try {

if (pstmt != null)

pstmt.close();

if (conn != null && !conn.isClosed())

conn.close();

} catch (SQLException e) {

// TODO logging...

}

Some important points of using batch statements are;

You can get the number of rows that were effected by each SQL statement from the array that the executeBatch invocation returns. This number does not include rows that were affected by triggers or by referential integrity enforcement.
Even if the autocommit mode is on, all the data will be committed to database after executing executeBatch.
When you use batching, you go to the database only for one time. However the database engine doesn’t necessarily execute all queries in one process.For example in DB2 you have to enable multi-row insert property to execute all batched queries in one process.In this case the executeBatch will not return the effected rows, instead return Statement.SUCCESS_NO_INFO (-2).This will give better performance since the database engine doesn’t need to return the effected rows, and the whole batch will be executed in one process.
If you have heavy insert or update statements, you can do the insert or update to memory, and later can write to database by an asynchron background thread.

Using single sql instead of batch:

You can use single sql for insert operation instead of batch statement. In this way you do multi insert with one sql statement, and again using prepared statement, you can pass your parameters.Single sql statement executed about 10 times faster than batch statements in my machine, since the database engine execute it in a single process like multi-row insert enabled DB2.(mentioned above in Batch statements)

For update or delete operations you can use IN keyword to use a single sql for multi operation.

Oracle example:

INSERT ALL

INTO test (id, name) VALUES (1, 'xxx')

INTO test (id, name) VALUES (2, 'yyy')

INTO test (id, name) VALUES (3, 'zzz')

SELECT * FROM dual;

Db2,Mysql, Mssql, Postgre Sql example:

INSERT INTO test (id, name) VALUES (1, 'xxx'), (2, 'yyy'), (3, 'zzz');

Note: In mysql you can use rewriteBatchedStatements=true property in connection string. It creates a single sql statement for all batch data, and send to database to execute in one process. That means its faster than standart batching.However, its slower than single sql statement approach because of the cost of creating single sql statement from all batch data. You can also tune max_allowed_packet property pf mysql to limit number of rows that can be executed in a batch statement. Depending on the processor and database engine, number of rows that can be executed in a batch may vary and you may have to limit max number of rows in a batch.

Conclusion

Batch statements are crucial in performance related applications, since it reduces the database calls, and improve response times. You can also choose to run single sql statements instead of batching to get faster results. Be careful about that, batch statements are not necessarily executed in a single process in database side, that is you may need to enable database specific properties to enable multi-row operations in one process.

Using Prepared Statement In Java

Introduction:

Using prepared statements, we can get faster and more secure results when we send queries to the database systems. In this write we’ll discuss both the benefits and performance impact of using prepared statements.

Why more secure ?

Below is a java code written to execute an update statement in mysql database.Normally we should use connection pooling instead of directly connecting the database, but here we use direct connection just for explanation.

String CONN_STRING = "jdbc:mysql://localhost:3306/DB_NAME?user=USER&password=PWD";

String UPDATE_SQL = "UPDATE User SET age=? WHERE name=?";

Connection conn = null;

PreparedStatement pstmt = null;

try {

Class.forName("com.mysql.jdbc.Driver");

conn = DriverManager.getConnection(CONN_STRING);
conn.setAutoCommit(false);

pstmt = conn.prepareStatement(UPDATE_SQL);

pstmt.setInt(1, 30);

pstmt.setString(2, "xxx");

int rows = pstmt.executeUpdate();
conn.commit();

System.out.println(rows + " rows updated");

} catch (SQLException e) {

try {

if (conn != null && !conn.isClosed())

conn.rollback();

// TODO logging...

} catch (SQLException e1) {

// TODO logging...

}

} catch (Exception e) {

try {

if (conn != null && !conn.isClosed())

conn.rollback();

// TODO logging...

} catch (SQLException e1) {

// TODO logging...

}

} finally {

try {

if (pstmt != null)

pstmt.close();

if (conn != null && !conn.isClosed())

conn.close();

} catch (SQLException e) {

// TODO logging...

}

As we see from the above code, the parameters that sql sentence expect were sent by question mark. Lets think the age and name value in the sql sentence comes from external world, ie from screen.If this is the case, when we directly add age and name value to our sql sentence instead of using prepared statement, we allow the external system to modify our sql sentence. On the other hand, with the help of prepared statement, the value that comes from external word will not be part of the sql sentence, instead it will be only a parameter. This prevents sql injection.An injected sql sentence might be

UPDATE User SET age=5 WHERE name=’xxx’ or 1=1. Here the “name” value comes from external world and include value ‘xxx’ or 1=1 and it updates all table rows with age 5. To prevent this you need to add extra controls in your java code, but thanks to the prepared statement, the name parameter cannot be send to be included in sql sentence like ‘xxx’ or 1=1.

Why better performance ?

First, lets examine the steps that a query passes before executing. I try to be generic, but these steps may vary slightly depending on the database engine.

1.) Parsing:

In this step, the sql sentence is checked for correct syntax, spelling and keywords. Also its checked for the existing of tables and columns.Depending on the database engine there may also be some syntax optimizations like converting right outer joins to left outer joins and converting subqueries to equivalent joins.

2.) Compiling:

This step may be optional. If the sql sentence include database specific functions or expressions, they need to be compiled to standart DML language.

3.) Optimizing:

The most important step is optimizing since it directly effects the performance.In the optimizing step the query processor of the database engine try to find out the most effective way to run the query. If the sql sentence is executed for the first time it looks for the available statistics and extract an execution plan(access path) for the query.This execution plan is extracted by looking to statistics of indexes, relations on tables and data load on tables. The plan may include any kind of join techniques including nested loop join, merge join ad hash join .For details of join techniques you can look at here.

4.) Execution:

After determining an execution plan, the query is executed as communicating between the query processor and the database engine.As we said, if there is a previously created plan exists, it will be used by the query processor.That means the first 3 steps will be executed only one time.How is this plan found by the processor may vary database to database.A possible solution may be storing hash of query at the first execution, and later use this hash value to determine if there is a plan exist. The most used plans may also be cached in memory for fast lookup.

Now lets look for the benefit of prepared statement at this point.Creating a execution plan for a query may take longer than executing the query.Since its a heavy load process,after the first execution, the plan is saved for later invocations, and re-used. If you don’t use prepared statement and send the sql parameters as part of the query, each execution will be treated as a different query by the query processor because changed parameters will change sql sentence, and each time it will try to create and use an execution plan.

Application side of using prepared statement:

Up to now we look for the database side of using prepared statements. However there are also application and application server side effects.In java side, since prepared statement is a heavy load object, it can be cached. Most of the application servers have ability of caching prepared statements.If it is cached, not every call of preparedStatement method of Connection object will create a new statement, instead it will get from the cache if there exist one.

Another important point is closing the prepared statement after using. Most of the database implementations have both connection limit and max. number of statements limit per connection. That means if you forget to close prepared statement, and if you use more than one statement for one connection, you can get database error indicating there is no available prepared statement to open.

Conclusion:

Today we talk about prepared statement usage in java. It is so important using prepared statements if you want more secure and faster applications.Prepared statements are used with fine grained jdbc calls and although the first execution is relatively slow because of the impact of creating execution plan, the later invocation is as fast as using stored procedures or sqlj calls. As a related topic you may like this article about batching.

menu

Friday, October 10, 2014

Using Batch Statements In Java

Using Prepared Statement In Java

About Me

Blog Archive

Followers