Monday, October 27, 2014

Database locks with Spring, Hibernate and Postgres

In the current project we faced the problem of concurrent changes to database, for the data that should be accessed sequentially. Imagine you have the customer's bank account where he can withdraw the money. If the customer is not a person, but company, and if he can have multiple users accessing the bank application, without any locks there's a chance for situation where two or more users depute transfers, that exceed the account balance, but because data is accessed concurrently, they both can make payoff. 

This is the simplest example, but one can imagine a lot of more such cases, that influence a lot of different applications. The lock for the single customer is simple, but imagine if we need lock for more than one customer in the same time. For example if two customers make some transaction between them, and the transaction depends on the account balance on both accounts. In such instance in single database transaction there are both sides required to be locked while the transaction lasts, to avoid the same problem.

Regarding such case we need also to be aware of deadlock problem. If the single row lock is an atomic operation for database, two locks are two operations, and this can produce following deadlock:

  1. In transaction A customer 1 is locked.
  2. In transaction B customer 2 is locked.
  3. In transaction A customer 2 is locked (is waiting).
  4. In transaction B customer 1 is locked (deadlock: A is waiting for 2, and B is waiting for 1, both are locked).

But first let's review the possibilities of what can we use for dealing with this problem in usual database application. Our exemplary stack here is standard Java (Spring + Transactional AOP-s + Hibernate) and Postgres database.

  1. Language-level locks, by usage of object monitors or java.util.concurrent.locks package. Problems: doesn't work for clustered applications and dealing with deadlocks is not obvious and difficult.
  2. Transaction isolation level. Here we'd have following opportunities (see postgres reference):
    1. Read uncommited - this would be rather a gambling, but for postgres this level doesn't exist.
    2. Read commited -  this is the default transaction isolation level causing the problem, because both transactions we consider see only data not commited by other transactions, so they can both work on the same balance "snapshot" from transaction start, and they can read the same value for subtracting from database. This level introduces locks on database level by default, but only for update, not for select. After acquiring the lock during update by transaction A, the transaction B is waiting with its update, but after the transaction A is finished, the B proceeds with its update anyway. So we may have the situation: A reads 100, B reads 100, A subtracts 20 from 100 and updates db with 80, then B subtracts 20 from 100 and updates db with 80. In the result the value in db is 80, while should be 60.
    3. Repeatable read - looks even worse. Two transactions can never see their changes after they are started. But the update lock works in different way. If two transactions want to update the same row, one of them acquires the lock, and second one will fail with ERROR:  could not serialize access due to concurrent update.  So it should be feasible to deal with our problem using this level and update lock, and even better is...
    4. Serializable. It works like repeatable read, but postgres tries to simulate sequential transaction execution. It tries to sort all queries in transaction, so that it looks that transactions are executed one by one. But this is only simulation, and transactions are really executed concurrently, though. Moreover if it comes across the situation where it can't perform two transactions "sequentially", it throw the same exception as for repeatable read.
  3. Explicit database locking. There are many options, specified in postgres reference, but the most common is select for update. This solution may cause deadlocks in the way I described above. Postgres fortunately detects such deadlocks and throws appropriate exception when it happens.
So, it looks that we can use specific transaction isolation level here, or use the explicit locking. All these solutions require to have fail-safe scenario when exception is thrown (could not serialize or deadlock exception). But there are following things I don't like if I think about transaction isolation:
  • They crash after doing a job, and the job can be significant. For example let's assume that the work costs 1 sec for each transaction, and on the end of this work we have this update that fails. If we have for example 10000 users concurrently, we can waste a lot of CPU.
  • Serializable transaction level has its performance requirements. It consumes more RAM and CPU to fulfill its requirements. Again, assuming highly loaded application - do we really need to slow down the database with serializable isolation level to achieve our goals?
The solution devoid of these issues is explicit locking and I decided to carry on with this solution, what I'll describe in further part.

The lock


First let's implement the lock service. The select for update clause in postgres locks the concrete row in database against other select for update-s, but it is still unlocked for ordinary selects. So, for the code we need to acquire the lock we can use select for update, what esures that the rest of application works fine while operation in the meanwhile.

In the real world application we may have a lot of service methods that require lock, so I added here the code preventing to have more than one lock for given customer ID, using TransactionSynchronizationManager resources. On first lock there's transaction listener registered, which cleans these resources. If the deadlock happens in postgres, it is indicated by LockAcquisitionException in java code.


Retry on deadlock


OK, nothing really interesting happened for now, but here comes this interesting part. How can we handle deadlocks properly? It'd be the best to retry whole transaction on deadlock few times, until the lock can be acquired and second transaction finishes.

Here I need to mention the contributor from which I've caught some ideas for retrying transactions - Jelle Victoor. But I consider that his solution has some issues, so I extended it a little. I used the same idea, of using AOP aspect to repeat transaction, but I want to have this working without putting everywhere additional annotations, because in the proposed way I'd need to have a lot of double @Transactional and @DeadLockRetry annotations. So, finally I'd like to have it working transparently with @Transactional.

Moreover let's consider the situation of nested @Transactional services. The usual example is following:
  • @Transactional ServiceA.doJob() -> calls...
    • @Transactional ServiceB.doJob() -> calls...
      • @Transactional ServiceC.doJob() -> and here comes the deadlock exception.
The same problem is when you use separate @DeadLockRetry, because the method with this annotation can be called from other service, and the real method that started the whole transaction is somewhere else, so repeating the code from method with @DeadLockRetry annotation may not work, because we want to retry whole transaction.

The trick is how to detect the @Transactional ServiceA.doJob() service execution and after rolling back whole transaction, retry overall operation. I used similar idea to Jelle - to use AOP proxy (with Aspect4J annotations so remember to add <aop:aspectj-autoproxy /> to your Spring config) and to have transaction manager order 100 (<tx:annotation-driven order="100" transaction-manager="transactionManager" />) and my deadlock aspect with order 99 to ensure it runs first.

If the deadlock aspect comes before transactional, we may detect where is our ServiceA.doJob() using the same TransactionSynchronizationManager as previously. If we are on the top level method, the transaction no longer exists because transactional AOP proxy already rolled it back. If the transaction still exists, this mean that we are not on the top level method, but this is nested transactional method.

The complete source: