
Monday, November 17, 2014

Improving Database Performance in a Multithreading Environment-2

Introduction:

     We discussed some concepts for improving database performance in the first part of this paper [1]. Now we continue with another concept, "Minimizing Contention", that one should consider when trying to improve database performance.

Minimizing Contention:

     In a multithread environment, we can treat the work each thread performs as a transaction. A transaction usually means a sequence of information exchange and related work (such as a database update) that is treated as a unit, both to satisfy a request and to ensure database integrity. A transaction has four important properties: Atomicity, Consistency, Isolation and Durability, known by their initials as the ACID properties.

1.) Atomicity:

     All changes to data are performed as if they are a single operation. That is, either all of the changes are performed or none of them are. For example, in an application that transfers funds from one account to another, the Atomicity property ensures that, if a debit is made successfully from one account, the corresponding credit is made to the other account.
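
To make this concrete, below is a minimal JDBC sketch of such a transfer performed as one atomic unit of work. The Account table and its accno and balance columns are hypothetical, and error handling is reduced to a rollback.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class FundsTransfer {

    // Debits one account and credits another inside a single transaction:
    // either both updates are committed, or neither is.
    public static void transfer(Connection con, String from, String to,
                                BigDecimal amount) throws SQLException {
        con.setAutoCommit(false); // group both updates into one unit of work
        try (PreparedStatement debit = con.prepareStatement(
                 "UPDATE Account SET balance = balance - ? WHERE accno = ?");
             PreparedStatement credit = con.prepareStatement(
                 "UPDATE Account SET balance = balance + ? WHERE accno = ?")) {
            debit.setBigDecimal(1, amount);
            debit.setString(2, from);
            debit.executeUpdate();

            credit.setBigDecimal(1, amount);
            credit.setString(2, to);
            credit.executeUpdate();

            con.commit(); // both changes become visible together
        } catch (SQLException e) {
            con.rollback(); // undo any partial work, preserving atomicity
            throw e;
        }
    }
}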

2.) Consistency:

     Data is in a consistent state when a transaction starts and when it ends. For example, in an application that transfers funds from one account to another, the Consistency property ensures that the total value of funds in the two accounts is the same at the start and at the end of the transaction.

3.) Isolation:

     The intermediate state of a transaction is invisible to other transactions. As a result, transactions that run concurrently appear to be serialized. For example, in an application that transfers funds from one account to another, the Isolation property ensures that other transactions see the transferred funds in one account or the other, but not in both and not in neither.

4.) Durability:

     After a transaction completes successfully, its changes to the data persist and are not undone, even in the event of a system failure. For example, in an application that transfers funds from one account to another, the Durability property ensures that the changes made to each account will not be reversed.

We now describe the anomalies that can be encountered in a multithread environment.

1.) Dirty Reads:

     A dirty read happens when a transaction reads data that is being modified, or was recently inserted, by another transaction that has not yet committed. As an example:
  • Transaction A begins,
  • Update Employee Set salary = 30450 Where empno = '000070', (by transaction A)
  • Transaction B begins,
  • Select * From Employee Where empno = '000070' (by transaction B),
  • Transaction A rolls back.
Transaction B sees data updated by transaction A even though that update has not been committed; since transaction A rolls back, transaction B has read a value that never officially existed. The JDBC sketch below reproduces this sequence.
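
The same sequence can be reproduced from JDBC with two connections. This is a sketch only: the connection URL and credentials are placeholders, and it assumes a driver and database that actually honor TRANSACTION_READ_UNCOMMITTED.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DirtyReadDemo {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details; replace with your own.
        String url = "jdbc:db2://localhost:50000/SAMPLE";
        try (Connection txA = DriverManager.getConnection(url, "user", "pass");
             Connection txB = DriverManager.getConnection(url, "user", "pass")) {

            txA.setAutoCommit(false); // transaction A begins
            try (Statement sa = txA.createStatement()) {
                sa.executeUpdate(
                    "UPDATE Employee SET salary = 30450 WHERE empno = '000070'");
            }

            // Transaction B reads the uncommitted salary: the dirty read.
            txB.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
            try (Statement sb = txB.createStatement();
                 ResultSet rs = sb.executeQuery(
                     "SELECT salary FROM Employee WHERE empno = '000070'")) {
                if (rs.next()) {
                    System.out.println("B sees uncommitted salary: "
                                       + rs.getBigDecimal(1));
                }
            }

            txA.rollback(); // the value B read never officially existed
        }
    }
}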

2.) Non-Repeatable Reads:

     Non-repeatable reads happen when a query returns data that would be different if the same query were repeated within the same transaction. Non-repeatable reads can occur when other transactions are modifying data that a transaction is reading. As an example:
  • Transaction A begins,
  • Select * From Employee Where empno = '000070', (by transaction A)
  • Transaction B begins,
  • Update Employee Set salary = 30000 Where empno = '000070' (by transaction B)
  • Select * From Employee Where empno = '000070', (by transaction A)
Transaction B updates a row viewed by transaction A before transaction A commits. If transaction A issues the same Select statement again, the result will be different.

3.) Phantom Reads:

     A phantom read happens when rows appear in a result set that were not returned by an earlier, identical query in the same transaction. Phantom reads can occur when other transactions insert rows that satisfy the Where clause of another transaction's query. As an example:
  • Transaction A begins,
  • Select * From Employee Where salary > 30000, (by transaction A)
  • Transaction B begins,
  • Insert Into Employee (empno, firstname, midinit, lastname, job, salary) Values ('000350', 'TEST', 'TEST', 'TEST', 'TEST', 75000), (by transaction B)
  • Select * From Employee Where salary > 30000, (by transaction A)
Transaction B inserts a row that would satisfy the query in Transaction A if it were issued again.

All these anomalies can be prevented by using the third ACID property, Isolation. There are four possible isolation levels (named below as they appear on JDBC's java.sql.Connection):

1.) TRANSACTION_READ_UNCOMMITTED: Allows dirty read, non-repeatable read and phantom read.

2.) TRANSACTION_READ_COMMITTED: Prevents dirty read, allows non-repeatable read and phantom read.

3.) TRANSACTION_REPEATABLE_READ: Prevents dirty read and non-repeatable read, allows phantom read.

4.) TRANSACTION_SERIALIZABLE: Prevents dirty read, non-repeatable read and phantom read.
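
These four names are exactly the isolation constants defined on java.sql.Connection, so in a JDBC application the level is selected per connection. The helper below is a small sketch that picks the weakest level preventing the anomalies the caller cares about; whether a given level is honored still depends on the driver and the database.

import java.sql.Connection;
import java.sql.SQLException;

public class IsolationConfig {

    // Chooses the weakest isolation level that still prevents
    // the anomalies the caller cares about.
    public static void configure(Connection con,
                                 boolean preventDirty,
                                 boolean preventNonRepeatable,
                                 boolean preventPhantom) throws SQLException {
        int level;
        if (preventPhantom) {
            level = Connection.TRANSACTION_SERIALIZABLE;
        } else if (preventNonRepeatable) {
            level = Connection.TRANSACTION_REPEATABLE_READ;
        } else if (preventDirty) {
            level = Connection.TRANSACTION_READ_COMMITTED;
        } else {
            level = Connection.TRANSACTION_READ_UNCOMMITTED;
        }
        con.setTransactionIsolation(level);
    }
}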

As we see, if we use the TRANSACTION_SERIALIZABLE isolation level we can get rid of all the anomalies we may encounter. However, in terms of performance, such a high isolation level would choke the system and give us very high response times in our applications. What we have to do is design the system so that it allows concurrent access to the database resources. To do this we can apply several steps:
  • Divide the physical database into partitions so that transactions do not interfere with each other. Using partitions may allow us to read committed data, since no thread will wait for others that deal with data in different partitions. (Consider a case where each client's data lives in a different partition: clients can work individually without waiting for one another, so there is no need to read uncommitted data; every client can read its own committed data.) As another example, suppose you have a table partitioned on one column, but you need to run parallel jobs that select data over a range of a different column. In that case the parallel jobs will most likely lock each other, since the table is not partitioned on the column used by the jobs. Therefore, you need to design and choose the partition columns carefully.
  • Redesign the application so that it avoids long transactions, and increase the commit frequency (see the sketch after this list).
  • Put the data manipulation operations at the end of the transaction, so that the locks they take are held for the shortest possible time.
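As a sketch of the second step, the fragment below processes a large change set in small transactions with frequent commits, instead of one long transaction, so locks are held only briefly. The Employee table from the earlier examples is reused, and the batch size of 100 is an arbitrary assumption to be tuned per workload.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchUpdater {

    // Processes a large change set in small transactions instead of one
    // long one, so locks are held briefly and other threads can proceed.
    public static void raiseSalaries(Connection con, List<String> empnos)
            throws SQLException {
        final int COMMIT_EVERY = 100; // hypothetical batch size, tune per workload
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE Employee SET salary = salary * 1.05 WHERE empno = ?")) {
            int inBatch = 0;
            for (String empno : empnos) {
                ps.setString(1, empno);
                ps.executeUpdate();
                if (++inBatch == COMMIT_EVERY) {
                    con.commit(); // release locks early and often
                    inBatch = 0;
                }
            }
            con.commit(); // commit the final partial batch
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }
}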
Even if we apply all these steps, we may still get locks, deadlocks and livelocks if our application performs Update and Delete operations. Even if we select uncommitted data, the nature of Update, Delete and Insert operations still causes some locks on the system. There are basically three lock scopes; a lock can be held on a row, a page, a table, a partition or a tablespace. Before going into the lock modes, let's explain the lock scopes.

1.) Row Lock:

     You can lock one row of a table. A program can lock one row or a selection of rows while other programs continue to work on other rows of the same table. This type of lock uses more CPU resources than coarser-grained locks, so it has a drawback in usage.

2.) Page Lock:

     The database server stores data in units called disk pages. A disk page contains one or more rows. In some cases it is better to lock a disk page than to lock the individual rows on it. For example, for operations that change a large number of rows, you might choose page-level locking because row-level locking (one lock per row) might not be cost effective.

3.) Table Lock:

     Table locks lock the entire table: while one process works on the table, other processes must wait. This lock is not suitable for online transaction processing applications; it is usually used by online analytical processing applications.

Now we can look at the Lock Modes.

1.) Shared Locks (S-Locks):

     The lock owner and any concurrent process can read, but not change, the locked page or row. Concurrent processes can acquire an S or U lock on the page or row, or might read data without acquiring a page or row lock. For example, a simple Select statement causes an S-lock.

2.) Update Locks (U-Locks):

     The lock owner can read but not change the locked page or row. Concurrent processes can acquire an S lock, or might read data without acquiring a page or row lock, but no concurrent process can acquire a U lock. For example, a Select For Update statement causes a U-lock.

3.) Exclusive Locks (X-Locks):

     The lock owner can read or change the locked page or row. A concurrent process cannot acquire an S, U or X lock on the page or row. However, a concurrent process might read data without acquiring a page or row lock. For example, Update, Delete and Insert statements cause X-locks.

Below is a summary table of allowed locks.


Locking (held \ requested)    S    U    X
S                             Y    Y    N
U                             Y    N    N
X                             N    N    N

(Rows show the lock already held, columns the lock being requested; Y = the request is granted, N = the requester must wait.)
                                                                  Figure 1
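
The modes can be observed in a single JDBC unit of work. In the sketch below, against the Employee table from the earlier examples, the Select ... For Update takes a U-lock on the row or page, which is promoted to an X-lock when the Update runs; the exact locks taken still depend on the DBMS and its lock scope.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LockModesDemo {

    // Reads a salary with intent to update (U-lock), then changes it (X-lock).
    public static void giveRaise(Connection con, String empno) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement select = con.prepareStatement(
                 "SELECT salary FROM Employee WHERE empno = ? FOR UPDATE");
             PreparedStatement update = con.prepareStatement(
                 "UPDATE Employee SET salary = ? WHERE empno = ?")) {
            select.setString(1, empno);
            try (ResultSet rs = select.executeQuery()) { // U-lock acquired here
                if (rs.next()) {
                    BigDecimal raised =
                        rs.getBigDecimal(1).multiply(new BigDecimal("1.10"));
                    update.setBigDecimal(1, raised);
                    update.setString(2, empno);
                    update.executeUpdate();              // promoted to an X-lock
                }
            }
            con.commit(); // locks are released at commit
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }
}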

If we examine these locks, we see that an Exclusive lock caused by an Update operation with a non-unique predicate may lock the whole table (in the case of a table-scoped lock, and when readers are not using the TRANSACTION_READ_UNCOMMITTED isolation level).

You have to be careful with the default isolation level of your database system. For example, if your database's default isolation level is read_committed, your Select statements have to wait for the release of the lock whenever there is an uncommitted Update or Delete on the region you are selecting. If the default isolation level is repeatable_read, then Update and Delete operations have to wait for the release of the lock whenever there is a Select on the region they modify. Finally, if the default level is serializable, then any Insert that satisfies the range of an open Select statement has to wait for the Select's lock to be released, even if the data will be inserted into a different space than the locked one. If you want to select uncommitted data, you should change the isolation level to read_uncommitted or use a database-specific Select statement. For example, the default isolation level of DB2 is read_committed, and if you want to select uncommitted data in DB2, end the Select statement with the UR (Uncommitted Read) clause.
As an important point for MySQL users, the default isolation level of MySQL is repeatable_read, so if you encounter performance problems caused by lock wait time in the database, this may be the cause.
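
As a sketch, both adjustments mentioned above look like this from JDBC (connection handling is omitted):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class IsolationTweaks {

    // DB2: read uncommitted data for a single statement with the UR clause.
    public static void selectUncommittedDb2(Connection db2) throws SQLException {
        try (Statement st = db2.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT * FROM Employee WHERE empno = '000070' WITH UR")) {
            while (rs.next()) {
                // process the (possibly uncommitted) row
            }
        }
    }

    // MySQL: lower the session from the repeatable_read default
    // to read_committed to reduce lock waits.
    public static void lowerMysqlIsolation(Connection mysql) throws SQLException {
        try (Statement st = mysql.createStatement()) {
            st.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED");
        }
        // Equivalently, via the JDBC API:
        // mysql.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    }
}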

Example 1: As an example, consider the following scenario run on a DB2 database instance. We have a table T1 that uses page-level locking, with the following columns and values.

T1 --> There is a unique index on (Col1, Col2)


Col1    Col2
1       1000
2       2000
3       3000
4       4000

Let's say there are two pages: the first three rows are on page one, and the fourth row is on page two. Now consider the following steps.
  • Transaction A starts and executes Update T1.. Where Col1 = 1 (not committed yet)
  • Transaction B starts and tries to execute Update T1.. Where Col1 = 4
Here we have two Update statements, so the isolation level does not matter: an update always locks the target row or page, and concurrent transactions must wait for the release of that lock.

Even though transaction B touches a different page than transaction A, transaction B may still have to wait for transaction A. Neither SQL statement has a unique predicate in its Where clause, and in our example the DBMS runs the queries with a table scan instead of an index (which happens especially when there is a small amount of data). That means transaction A locks the whole table until it commits, so transaction B has to wait for transaction A's commit.
If we ran transaction B with a Delete operation instead, we might not get a lock wait. A delete lock is usually considered lighter than an update lock: most DBMSs do not delete the row physically, but instead set a flag marking the row as deleted, and the row is physically removed at a later time chosen by the optimizer.

     Another point here is using a unique predicate in the Where clause. If we use a unique predicate in one of the Update statements above then, depending on the implementation of the DBMS, we may well get no lock contention, since a unique predicate is enough to maintain concurrency by locking only the matching row.

Example 2: Consider a scenario, again on a DB2 database instance. We have a table T1 using page-level locking and the TRANSACTION_SERIALIZABLE isolation level, with the following columns and values.

T1 --> There is a unique index on (Col1, Col2)

Col1    Col2
1       1000
2       2000
3       3001

Now consider the following steps.
  • Transaction A starts and executes Select * From T1 Where Col1 > 1 (not committed yet)
  • Transaction B starts and executes Insert Into T1 Values (3, 3000)
Here transaction B has to wait for transaction A, since the Select statement does a range scan and the inserted row falls within the Select's result range. Notice that the Insert would place the new data somewhere that is not physically locked at that moment; this means the serializable isolation level enforces a logical lock in addition to the physical lock, by comparing the Where clause of the Select statement against the new data arriving with the Insert statement.

As you see, if we use the TRANSACTION_SERIALIZABLE isolation level, all transactions are serialized, and this really slows down application performance. So we should try to avoid this high isolation level by changing the application logic.
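
When some contention is unavoidable despite these design steps, a common complement is to retry a transaction that fails with a deadlock or lock timeout. The sketch below is generic: the retry count is arbitrary, and detecting deadlocks via SQLTransactionRollbackException is an assumption; some drivers report them with vendor-specific error codes or SQLSTATEs instead.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLTransactionRollbackException;

public class RetryingExecutor {

    public interface Work { void run(Connection con) throws SQLException; }

    // Runs 'work' in a transaction, retrying when the database reports a
    // deadlock or lock timeout as a transaction rollback.
    public static void runWithRetry(Connection con, Work work) throws SQLException {
        final int MAX_ATTEMPTS = 3; // arbitrary retry limit
        for (int attempt = 1; ; attempt++) {
            con.setAutoCommit(false);
            try {
                work.run(con);
                con.commit();
                return;
            } catch (SQLTransactionRollbackException deadlock) {
                con.rollback();
                if (attempt == MAX_ATTEMPTS) throw deadlock;
                // back off briefly before retrying the whole transaction
                try { Thread.sleep(100L * attempt); }
                catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
            } catch (SQLException e) {
                con.rollback(); // non-retryable failure: undo and rethrow
                throw e;
            }
        }
    }
}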

Conclusion:

     In this article, we examined database application performance in a multithread environment in terms of contention problems. There are many aspects that affect the performance of such an application. Contention is one of the most critical points in database performance, since it can cause lock, deadlock or livelock problems. To be able to solve contention problems, one must understand and correctly apply transaction anomalies, isolation levels, lock scopes and lock modes. In the next paper we'll talk about another important concept in database performance, "Optimizing Memory".

References and Resources:

IBM documentation, http://publib.boulder.ibm.com/infocenter/cicsts/v3r2/index.jsp?topic=%2Fcom.ibm.cics.ts.productoverview.doc%2Fconcepts%2Facid.html

Guy Harrison, Oracle® Performance Survival Guide: A Systematic Approach to Database Optimization, Prentice Hall, Parts IV-V (2009)
