menu

Monday, November 17, 2014

Improving Database Performance in a Multithreading Environment-3

Introduction:

     We mentioned about "Minimizing Contention" on improving database performance in the second part of this paper[2]. Now we continue with another concept , "Insert Only Systems and Optimizing Insert Performance" that one may consider when try to improve database performance. Update and Delete operations are very likely to cause locks. If we plan an insert only system and don't choose "Serializable" isolation level and choose to select uncommitted data we'll get no lock in our system and have the highest concurrency. However, insert only systems may cause some other problems. One of them is - since every operation like update and delete is done using an insert - the amount of data in the database will increase rapidly. If we don't consider that, because of huge amount of data, we can get performance degredation. To solve that issue we have to archive our data properly and rapidly. Also we can use partitioning to behave our data as partitioned causing a decrease in the amount of data being operated. Another problem with insert only systems appears when an application has too hight insert rate. The reason for that is, almost all tables generally have a clustered index in which the order of the data is same with the table. When an insert request comes to the DBMS it tries to find a good place to the new data to obey the clustered rule. It causes a search operation to find out the correct place to insert. If there is too big data to search, it will slow down our application. 

Now we'll mention about some methods to optimize insert performance.

Insert Performance Issues:

1.) Indexes on tables are usually created for performance reasons to decrease the search time when selecting data. However every new index added to the table also has a negative impact which appears when an insert, update or delete operation is done. With these DML operations every index must be rearranged. Thus, one could try to decrease the number of indexes on the tables. 
Also most of the indexes on today DMBS systems are B tree indexes and if the levels of the index tree or number of pages at levels are high we possibly get performance degradation. To solve these issues many DBMS systems have capability to keep indexes in memory so we have to try to keep affected indexes, at least non-leaf pages which are %1 of total index pages, in memory by changing the cached memory size. 

2.) If there are sequential inserts on our insert only system, we can apply some methods to insert at the end of the table without searching for a place to obey the clustered colums rule. By this way, one can eliminate read I/O since no search will be done, and reduce write I/O since whole data will be written to the same place.

3.) By inserting rows to empty buffers that is to the memory and scheduling disk write to a later time is a good option. By this way, disk I/O eliminated and DBMS will do the I/O when the system is not heavy loaded.

4.) A DBMS system always keeps some logs for recovery reasons. We have to minimize that log records since it will get another I/O cost to our system. To do this, we may have to do different preventions, for example for DB2 database, we can put frequently changed and variable columns to the end of the table, because a variable lenght row is logged from the first changed byte to the end of the row when an update operation is done on that row (This issue is solved by the DB2 release 9 by automatically placing variable columns to the end of the table). We can also decrease log size by changing some parameters on DBMS system, for example in DB2 "LOAD Log No" parameter means there will be no log created when data is loaded. 

5.) Another issue is batch inserts. That means there will be no need to go the database system for every DML operation in a transaction like Update, Delete or Insert. Instead every operation will be finished in one step. As an example, this can be done with java prepared statement batch option. See in this post.

6.) Usually increasing the commit frequency is the preferred way to minimize the lock wait time.

Conclusion:

     In this write we examine database application performance in a multithread environment in terms of insert only systems to prevent lock issues. If we can design our system with a good archive mechanism we may use insert only systems preventing locks and increasing concurrent access.

References and Resources:

Guy Harrison, Oracle® Performance Survival Guide A Systematic Approach to Database Optimization, Prentice Hall, Part IV- V (2009)

No comments:

Post a Comment