Thursday, December 25, 2014

BIRT with Hibernate using POJO-s

In previous BIRT versions there were three ways to use BIRT with your Hibernate datasource:
  1. Just plain SQL against the Hibernate database.
  2. Scripting data source.
  3. Hibernate ODA data source.
Previously I've been using plain SQL, but with a complex database model it becomes very difficult (a lot of properties, joins and other relationships). A scripting data source using JavaScript is a bit too awkward for me; it certainly adds another level of application complexity, and another layer where "things may go wrong". Finally, the Hibernate ODA data source from JBoss Tools - this is what I wanted to use back when I was using Eclipse, but for a plain non-JBoss application there were a lot of problems with these tools (they seem to be tied to Seam). Anyway, AFAIR maintaining a proper Hibernate configuration in Eclipse (especially one that was partially annotation- and partially XML-based) cost me more than the value of having working Hibernate queries in the IDE.

Recently, during work on another application using BIRT, I started to think about a different approach: delegating BIRT data sources to pure Java providers that can be parameterized easily at runtime (without the BIRT API) and that return my own arbitrary Java object models. It can be the Hibernate entity model, or something else, for example a DTO model - whatever I want to use in the report and consider better structured for particular reports. The most important part is that these models refactor easily, with compiler support, during application development.

After a little research and testing it turned out to be feasible using the BIRT POJO data source, and now I consider it the best way of making BIRT reports from Java. The full source code of my example is on GitHub, and the description is below.

Model + Data Set


First let's have a look at the model:



I wanted to use some associated objects in the model, to see how it works in BIRT, so what we have here is a company under which there are a few departments, and our target report will be the list of departments grouped by companies, displayed directly from the database - to be more specific, directly from Hibernate queries.
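The two entities behind this model can be sketched as plain JPA/Hibernate classes (a minimal sketch - field names are illustrative, the exact classes are in the example sources):

import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;

// Company.java - the "one" side of the association
@Entity
public class Company {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    // a company owns a few departments
    @OneToMany(mappedBy = "company", cascade = CascadeType.ALL)
    private List<Department> departments = new ArrayList<Department>();

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public List<Department> getDepartments() { return departments; }
}

// Department.java - the "many" side, pointing back to its company
@Entity
public class Department {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    @ManyToOne
    private Company company;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Company getCompany() { return company; }
    public void setCompany(Company company) { this.company = company; }
}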

The first thing we need is a data set that can be used at BIRT design time. The POJO data source is pretty badly documented, and to figure out how to do it I needed to read some of the BIRT sources. What I discovered during this work is that the data set for BIRT design time can be any class providing a public Object next() method (it then goes through PojoDataSetFromCustomClass in the BIRT engine). So here is my mock data set implementation:

There are no bindings to Hibernate at all at this stage, so we can prepare our object lists for report design time independently from the database itself. My mock data set returns companies with a "Mock" prefix, to make it easy to tell the mock objects apart from the real ones. Under every company there are three departments created (IT, HR and Sales).
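In code, such a mock boils down to something like the sketch below - the only thing BIRT cares about is the public Object next() method (class and constant names here are my own):

// Design-time data set: BIRT only requires a public Object next() method
// that returns the next row object, or null when there are no more rows.
public class MockCompanyDataSet {

    private static final int MOCK_COMPANY_COUNT = 5;
    private int counter = 0;

    public Object next() {
        if (counter >= MOCK_COMPANY_COUNT) {
            return null; // end of data
        }
        Company company = new Company();
        company.setName("Mock Company " + (++counter));
        // every mock company gets the same three departments
        for (String departmentName : new String[] { "IT", "HR", "Sales" }) {
            Department department = new Department();
            department.setName(departmentName);
            department.setCompany(company);
            company.getDepartments().add(department);
        }
        return company;
    }
}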

Now we need to package everything (the entity model and the mock data set) into a single jar that will be connected to BIRT for a while.

Report design


Now we can start preparing a new BIRT report, starting from the POJO data source:


Then we need to add our jar to the data source configuration, and check the option below:



Now we need to configure the POJO data set, giving our mock data set class name as the object source:


The important part here is the key under which BIRT will look for the data set (APP_CONTEXT_KEY_MOCKCOMPANYDATASET). It's not important yet, but it will be at real runtime, where we will switch the mock data set to the real one.

Now we can just select the POJO class and configure the column mappings:


I used a property of the associated Department entity here to see how the column mapping would be done. It looks good, and now on the Preview Results tab we can see how the column data is generated by BIRT from the underlying Company objects:


It fetches them as a list of departments that can be grouped by company, which seems to be a good initial data set for the intended report.

On the layout view I just put a list bound to this data set, and grouped the items by company id:


Now it's possible to generate a mock report from the BIRT designer:

The runtime


At runtime, what we want to do is replace the mock data set from the design stage with real data fetched from the database by Hibernate using the entity model. A lot of this code is just starting the report engine, but the important parts are described below:


First we create the BIRT application context, which is just a HashMap. Into this map we need to put our object collection, under the key configured at design time. My companyRepository.findAll() method returns a Hibernate Query. The "collection" passed to the application context can be a Collection itself (so list() is an option here), but I prefer to use an Iterator (the iterate() option), because for bigger reports it fetches data from the database sequentially, not as one big list.

For the really inquisitive - BIRT can convert the following objects to its internal IPojoDataSet representation:

  1. Any object providing a public Object next() method, using PojoDataSetFromCustomClass.
  2. Collection, using PojoDataSetFromCollection.
  3. Array of objects, using PojoDataSetFromArray.
  4. Iterator, using PojoDataSetFromIterator.
Finally, when the task is ready to run, we need to pass the application context to the task, to replace the mock data set with our real runtime data set, as in the sketch below.
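Putting these pieces together, a simplified runtime sketch could look roughly like this (the report path, output options and the Hibernate Query parameter are assumptions of this example; the essential parts are the application context map and task.setAppContext()):

import java.util.HashMap;
import org.eclipse.birt.core.framework.Platform;
import org.eclipse.birt.report.engine.api.EngineConfig;
import org.eclipse.birt.report.engine.api.HTMLRenderOption;
import org.eclipse.birt.report.engine.api.IReportEngine;
import org.eclipse.birt.report.engine.api.IReportEngineFactory;
import org.eclipse.birt.report.engine.api.IReportRunnable;
import org.eclipse.birt.report.engine.api.IRunAndRenderTask;
import org.hibernate.Query;

public class DepartmentsReportRunner {

    // the same key that was configured for the POJO data set at design time
    private static final String DATASET_KEY = "APP_CONTEXT_KEY_MOCKCOMPANYDATASET";

    public void render(Query companiesQuery) throws Exception {
        // the BIRT application context is just a map
        HashMap<Object, Object> appContext = new HashMap<Object, Object>();
        // iterate() lets BIRT pull entities from Hibernate one by one;
        // list() would work too, but loads everything up front
        appContext.put(DATASET_KEY, companiesQuery.iterate());

        EngineConfig config = new EngineConfig();
        Platform.startup(config);
        IReportEngineFactory factory = (IReportEngineFactory) Platform
                .createFactoryObject(IReportEngineFactory.EXTENSION_REPORT_ENGINE_FACTORY);
        IReportEngine engine = factory.createReportEngine(config);

        IReportRunnable design = engine.openReportDesign("departments.rptdesign");
        IRunAndRenderTask task = engine.createRunAndRenderTask(design);

        HTMLRenderOption options = new HTMLRenderOption();
        options.setOutputFileName("departments.html");
        options.setOutputFormat("html");
        task.setRenderOption(options);

        // replace the design-time mock data set with the real runtime data
        task.setAppContext(appContext);
        task.run();
        task.close();

        engine.destroy();
        Platform.shutdown();
    }
}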

The finally rendered report (all entities in the database have a "Company" prefix) looks like this:
Everything above can easily be built and tested from the command line using my birt-hibernate-example sources.

Monday, October 27, 2014

Database locks with Spring, Hibernate and Postgres

In the current project we faced the problem of concurrent changes to data that should be accessed sequentially. Imagine you have a customer's bank account from which he can withdraw money. If the customer is not a person but a company, it can have multiple users accessing the bank application; without any locks there's a chance that two or more users order transfers that together exceed the account balance, but because the data is accessed concurrently, both withdrawals can succeed.

This is the simplest example, but one can imagine many more such cases, affecting a lot of different applications. The lock for a single customer is simple, but imagine we need a lock for more than one customer at the same time - for example when two customers make a transaction between them, and the transaction depends on the account balance of both accounts. In such a case both sides need to be locked within a single database transaction for as long as it lasts, to avoid the same problem.

In such a case we also need to be aware of the deadlock problem. While a single row lock is an atomic operation for the database, two locks are two operations, and this can produce the following deadlock:

  1. In transaction A customer 1 is locked.
  2. In transaction B customer 2 is locked.
  3. In transaction A customer 2 is locked (is waiting).
  4. In transaction B customer 1 is locked (deadlock: A is waiting for customer 2, B is waiting for customer 1, and both are blocked).

But first let's review what we can use to deal with this problem in a typical database application. Our example stack here is standard Java (Spring + transactional AOP + Hibernate) and a Postgres database.

  1. Language-level locks, using object monitors or the java.util.concurrent.locks package. Problems: doesn't work for clustered applications, and dealing with deadlocks is neither obvious nor easy.
  2. Transaction isolation level. Here we'd have the following options (see the Postgres reference):
    1. Read uncommitted - this would be rather a gamble, but in Postgres this level doesn't exist anyway.
    2. Read committed - this is the default transaction isolation level and the one causing the problem: both transactions we consider see only data already committed by other transactions, so they can both work on the same balance "snapshot" from the transaction start, and both can read the same value to subtract from. This level introduces locks at the database level by default, but only for updates, not for selects. After transaction A acquires the lock during its update, transaction B waits with its update, but after transaction A finishes, B proceeds with its update anyway. So we may have the situation: A reads 100, B reads 100, A subtracts 20 from 100 and updates the row to 80, then B subtracts 20 from 100 and updates the row to 80. As a result the value in the database is 80, while it should be 60.
    3. Repeatable read - looks even worse: two transactions never see each other's changes made after they started. But the update lock works differently. If two transactions want to update the same row, one of them acquires the lock, and the second one fails with ERROR: could not serialize access due to concurrent update. So it should be feasible to deal with our problem using this level and the update lock, and even better is...
    4. Serializable. It works like repeatable read, but Postgres tries to simulate sequential transaction execution: it tries to order all queries in transactions so that it looks as if the transactions were executed one by one. But this is only a simulation - the transactions are still really executed concurrently. Moreover, if it comes across a situation where it can't perform two transactions "sequentially", it throws the same exception as repeatable read.
  3. Explicit database locking. There are many options, specified in the Postgres reference, but the most common is select for update. This solution may cause deadlocks in the way I described above. Fortunately Postgres detects such deadlocks and throws an appropriate exception when that happens.
So it looks like we can use a specific transaction isolation level here, or use explicit locking. All these solutions require a fail-safe scenario for when the exception is thrown (the could not serialize or deadlock exception). But there are the following things I don't like about the transaction isolation approach:
  • They crash after doing the job, and the job can be significant. For example, let's assume that the work costs 1 second for each transaction, and at the end of this work there is the update that fails. If we have, say, 10000 concurrent users, we can waste a lot of CPU.
  • The serializable isolation level has its performance costs. It consumes more RAM and CPU to fulfill its requirements. Again, assuming a highly loaded application - do we really need to slow down the database with the serializable isolation level to achieve our goals?
The solution free of these issues is explicit locking, and I decided to carry on with it, which I'll describe in the next part.

The lock


First let's implement the lock service. The select for update clause in Postgres locks the given row against other select for update-s, but the row remains readable for ordinary selects. So, for the code that needs to acquire the lock we can use select for update, which ensures that the rest of the application keeps working while the operation is in progress.

In a real-world application we may have a lot of service methods that require the lock, so I added code preventing more than one lock for a given customer ID, using TransactionSynchronizationManager resources. On the first lock a transaction listener is registered, which cleans up these resources. If a deadlock happens in Postgres, it shows up as a LockAcquisitionException in the Java code.
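A sketch of such a lock service could look as follows (the Customer entity and the resource key format are assumptions of this example; a JPA pessimistic write lock translates to select ... for update on Postgres):

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.support.TransactionSynchronizationAdapter;
import org.springframework.transaction.support.TransactionSynchronizationManager;

@Service
public class CustomerLockService {

    @PersistenceContext
    private EntityManager entityManager;

    /**
     * Locks the customer row for the current transaction (select ... for update).
     * Must be called inside an active Spring-managed transaction.
     */
    public void lockCustomer(Long customerId) {
        final String resourceKey = "customer-lock-" + customerId;
        if (TransactionSynchronizationManager.hasResource(resourceKey)) {
            return; // this customer is already locked in the current transaction
        }
        // issues "select ... for update" on the customer row
        // (Customer is the assumed domain entity)
        entityManager.find(Customer.class, customerId, LockModeType.PESSIMISTIC_WRITE);
        TransactionSynchronizationManager.bindResource(resourceKey, Boolean.TRUE);
        // remove the marker when the transaction completes (commit or rollback)
        TransactionSynchronizationManager.registerSynchronization(
                new TransactionSynchronizationAdapter() {
                    @Override
                    public void afterCompletion(int status) {
                        TransactionSynchronizationManager.unbindResourceIfPossible(resourceKey);
                    }
                });
    }
}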


Retry on deadlock


OK, nothing really interesting has happened so far, but here comes the interesting part: how can we handle deadlocks properly? It'd be best to retry the whole transaction a few times on deadlock, until the lock can be acquired and the second transaction finishes.

Here I need to mention the author from whom I've taken some ideas for retrying transactions - Jelle Victoor. But I think his solution has some issues, so I extended it a little. I used the same idea of using an AOP aspect to repeat the transaction, but I want it to work without putting additional annotations everywhere, because in the proposed way I'd need a lot of paired @Transactional and @DeadLockRetry annotations. So finally I'd like to have it working transparently with @Transactional.

Moreover, let's consider the situation of nested @Transactional services. The usual example is the following:
  • @Transactional ServiceA.doJob() -> calls...
    • @Transactional ServiceB.doJob() -> calls...
      • @Transactional ServiceC.doJob() -> and here comes the deadlock exception.
The same problem appears when you use a separate @DeadLockRetry, because the method with this annotation can be called from another service, and the real method that started the whole transaction is somewhere else, so repeating only the code from the @DeadLockRetry-annotated method may not work - we want to retry the whole transaction.

The trick is how to detect the @Transactional ServiceA.doJob() service execution and, after the whole transaction has been rolled back, retry the overall operation. I used an idea similar to Jelle's - an AOP proxy (with AspectJ annotations, so remember to add <aop:aspectj-autoproxy /> to your Spring config), giving the transaction manager order 100 (<tx:annotation-driven order="100" transaction-manager="transactionManager" />) and my deadlock aspect order 99 to ensure it runs first.

If the deadlock aspect comes before the transactional one, we can detect where our ServiceA.doJob() is, using the same TransactionSynchronizationManager as before. If we are in the top-level method, the transaction no longer exists, because the transactional AOP proxy has already rolled it back. If the transaction still exists, it means we are not in the top-level method but in a nested transactional method.
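A simplified sketch of such an aspect (the retry limit is arbitrary, and I assume here that the deadlock surfaces as Hibernate's LockAcquisitionException, as described earlier):

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.hibernate.exception.LockAcquisitionException;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;
import org.springframework.transaction.support.TransactionSynchronizationManager;

@Aspect
@Order(99) // runs outside the transaction interceptor, which has order 100
@Component
public class DeadlockRetryAspect {

    private static final int MAX_RETRIES = 3;

    @Around("@annotation(org.springframework.transaction.annotation.Transactional)"
            + " || @within(org.springframework.transaction.annotation.Transactional)")
    public Object retryOnDeadlock(ProceedingJoinPoint pjp) throws Throwable {
        int attempt = 0;
        while (true) {
            try {
                return pjp.proceed();
            } catch (LockAcquisitionException e) {
                // if a transaction is still active we are in a nested @Transactional
                // method - let the exception bubble up to the outermost call
                if (TransactionSynchronizationManager.isActualTransactionActive()
                        || ++attempt > MAX_RETRIES) {
                    throw e;
                }
                // outermost call: the transaction has already been rolled back,
                // so the whole unit of work can safely be retried
            }
        }
    }
}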

The complete source:


Sunday, September 21, 2014

Tracking object references in JavaScript

It is a common case in AngularJS to have some model loaded in the main view (like a list of objects) and to use these objects in other controllers (like an object details view). Usually it's done by holding a reference to the list object in the other controller's scope and interacting with this reference. As long as both references point to the same object everything is fine: changes from the details controller are reflected in the list and vice versa.

But I frequently come across the situation where at least one of these objects is refreshed from the server (e.g. in an async comet event) and this relationship is lost. A lot of case-by-case code has to be written to handle such cases on the client side.

Today I've been thinking about tracking object references in JavaScript generally, and how it can be done. Unfortunately, it looks like it can't. JavaScript doesn't support real references, like e.g. the C language does, that could be used to achieve this. Inside a JS function there's no way to access the reference that carried the object in as the function argument, and thus no way to modify that reference.

But there is indeed a way to get access to the reference - using a closure. A closure holds references to all objects belonging to its scope. After a little playing around I figured out a solution for such a hypothetical reference tracker:

And it works :)

Now for a more complex case. We are re-assigning a single object holding the whole list, and we may have only an item (from the example above) assigned elsewhere. This is not as simple as the previous example, but it's feasible if we think about the model as a persistent object model. All persistent objects have some unique ID assigned. For example, if you use UUIDs on the server side, it can be the UUID; if you use numeric IDs, it can be a combination of the ID and the object type (class), etc. You can always figure out some other way to give your objects a unique ID.

In this case, to achieve the goal we need to keep the previous JSON structure and scan the new one, looking for objects with the same ID, and then reuse our "reference tracker" from above. Here is the full source including the new usage:

To keep things clean, I also added a cleanup() function that removes the $ref property from the objects, so that clean objects go into the JSON representation sent to the server.

Sunday, July 27, 2014

Tomcat, Atmosphere and Spring Security

Here I'd like to describe another interesting case I've been struggling with for the last few days. It involves the following use case: enable asynchronous event support for a Tomcat/Spring multi-tenant SaaS application, where events can be pushed to listening client groups. To be specific, an event should be channeled to one of the following groups: a specific user, all users of a specific tenant, or all users.

Atmosphere + Tomcat

The fancy new technology for async processing in the Java world is Atmosphere, and I use it in this example. Unfortunately I started with a horribly preconfigured Atmosphere, which apparently locked up Tomcat after some number of requests, after which it wasn't able to serve any more. It turned out that a working Tomcat + Atmosphere config is not so obvious, so let's quickly go through all these problems to move on.

I started with the following maven dependency:

<dependency>
    <groupId>org.atmosphere</groupId>
    <artifactId>atmosphere-runtime</artifactId>
    <version>2.1.7</version>
</dependency>

After a lot of struggling I came to the conclusion that there's no way to properly run Tomcat with Atmosphere using this library (at least in version 2.1.7). I started with the standard Atmosphere configuration, which uses the native Tomcat async implementation (Comet support). In this scenario there's a bug in Atmosphere which results in using Tomcat BIO (blocking IO) instead of NIO (non-blocking IO). In effect, a thread is created for each async request, which is then suspended and moved to the waiting pool. When you reach the Tomcat thread pool capacity (the default is 200) you end up with a completely frozen application.

Afterwards I changed the implementation from native Tomcat async support to the Servlet 3 specification, using the following flags:

<init-param>
    <param-name>org.atmosphere.useNative</param-name>
    <param-value>false</param-value>
</init-param>
<init-param>
    <param-name>org.atmosphere.useWebSocketAndServlet3</param-name>
    <param-value>true</param-value>
</init-param>

Using this config it started to work through Tomcat NIO, but odd things started to happen as well - for example random freezes during standard request processing, and a lot of java.lang.IllegalStateException: Cannot forward after response has been committed exceptions. Something similar to this guy's situation.

After a lot of debugging of what is really happening in the Atmosphere and Tomcat threads, I gave up and found the solution in the so-called "native" Atmosphere implementation, which apparently is the same lib with only one class changed, containing fixed native Tomcat support for Atmosphere (scenario 1). It is called:

<dependency>
    <groupId>org.atmosphere</groupId>
    <artifactId>atmosphere-runtime-native</artifactId>
    <version>2.1.7</version>
</dependency>

And is described here. It finally works well using Tomcat Comet support and/or native Tomcat websocket support. Additionally it requires a /META-INF/context.xml with the following content:

<Context>
    <Loader delegate="true"/>
</Context>

Atmosphere + Spring

Now something which is simple and can be found in many examples on the net: how to configure Atmosphere so that it routes requests to the Spring DispatcherServlet. To skip unnecessary words, I'll make it quick:


Things that might need a little more explanation:
  1. org.atmosphere.useNative and org.atmosphere.useWebSocketAndServlet3 make it finally clear that we want to go with native Tomcat async support.
  2. org.atmosphere.cpr.broadcaster.maxProcessingThreads - this limits the number of Atmosphere processing threads (to 10 here). Atmosphere spawns threads to sweep suspended requests (e.g. by flushing their response buffers).
  3. org.atmosphere.cpr.broadcasterLifeCyclePolicy=EMPTY_DESTROY is the lifecycle policy for Atmosphere Broadcaster objects. Usually a Broadcaster has some AtmosphereResource-s assigned, representing open async connections. When all connections for a particular Broadcaster are closed, the Broadcaster object may still be held in memory and reused. For a SaaS application that may handle hundreds of tenants and thousands of users concurrently I consider this a bad pattern. EMPTY_DESTROY tells Atmosphere to release all Broadcaster objects that have no resources assigned, and to remove them from memory.
  4. org.atmosphere.cpr.AtmosphereInterceptor is the important one here, because after Atmosphere invokes a broadcast operation, the response buffers are flushed periodically with all the data written, so they may contain more than a single message in one flush. In that case your client would receive two or more messages in one event listener notification, which is usually unwanted. This can be overcome by using TrackMessageSizeInterceptor on the server side and the trackMessageLength parameter in the Atmosphere client.
  5. AtmosphereSpringControllerResolver enables direct AtmosphereResource injection to Spring controller.

Atmosphere + Spring Security

Now, what we'd like to have is the Spring Security context injected into Atmosphere requests, in order to extract the user from the SecurityContextHolder and apply broadcast operations to suspended requests. The answer to the question of how to do it is simple: you can't.

There are two problems I came across here. First, the Spring Security filters aren't applied to the MeteorServlet, because it's not a regular servlet but a CometProcessor, supporting async requests. For this type of servlet only a CometFilter can be applied, not a regular Filter, which is what Spring Security's DelegatingFilterProxy implements. You can overcome this problem, though, by either wrapping the Spring Security filters with your own CometFilter-s, or by overriding the default FilterChain with your own implementation. Anyway, it still doesn't work.

This is because the default SecurityContextHolder storage strategy is ThreadLocalSecurityContextHolderStrategy, which holds the SecurityContext in a ThreadLocal (this is the only production implementation, and it's hard to imagine a different strategy that would work here). It works well for standard requests, processed in separate threads, but for suspended Atmosphere requests there's a problem. When the resources are swept and the buffers are flushed, all of this happens in the internal Atmosphere thread pool, and one thread serves many AtmosphereResource-s in a single execution, so the SecurityContext can't be bound to the thread - you'd end up with an exception or, much worse, with a different user authorized than intended.

So what I do, and will show in the example further down, is extract the user directly from the HTTP session and use it with the AtmosphereResource to create the appropriate broadcasters.
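A minimal sketch of that extraction (the class name is illustrative; it assumes the Spring Security context sits in the HTTP session under its default key):

import javax.servlet.http.HttpSession;
import org.atmosphere.cpr.AtmosphereResource;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContext;
import org.springframework.security.web.context.HttpSessionSecurityContextRepository;

public class SessionUserResolver {

    /**
     * Reads the authenticated user directly from the HTTP session bound to the
     * suspended Atmosphere request, bypassing SecurityContextHolder entirely.
     */
    public Authentication resolveUser(AtmosphereResource resource) {
        HttpSession session = resource.getRequest().getSession(false);
        if (session == null) {
            return null; // no session - not authenticated
        }
        SecurityContext context = (SecurityContext) session
                .getAttribute(HttpSessionSecurityContextRepository.SPRING_SECURITY_CONTEXT_KEY);
        return context != null ? context.getAuthentication() : null;
    }
}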

One more remark about the overall architecture. Since you can run the DispatcherServlet through Atmosphere, you might be tempted to run your whole application through the MeteorServlet wrapper. But when you consider the facts above - that you can't apply normal filters to this servlet, and in particular you can't apply the security filters - the conclusion is simple: just don't do it. Define your "async" Atmosphere servlet context separately from the regular "sync" DispatcherServlet, and everything will be fine.

Broadcast events to user groups

Before the final implementation, we need to understand how Atmosphere and its Broadcaster-s work internally.

In regular request processing, when a request comes in, Tomcat takes a free thread from the thread pool (or queues the job in the thread pool queue if all threads are busy) and delegates the request processing to this thread. The Spring Security filters extract the user from the HTTP session and put it into the SecurityContext held in the current request thread's ThreadLocal. Then the work is delegated to your servlet, the response buffers are filled, everything is cleaned up and the thread is released back to the pool. The response buffer is then written to the client.

In async request processing Atmosphere waits for incoming requests with its own thread pool. When a request comes in, one of these threads receives it (or, as above, the job is queued until at least one thread is released back to the pool), suspends it, and returns the thread to the pool. The suspension consists of releasing the processing thread while the TCP connection stays open. All these suspended requests are stored in an internal registry and can be accessed at any moment in the application. For all of them the TCP connections are open, and one can write to the open Response object to send async events to the client.

But how do we tell one suspended request apart from another? For our example - how do we find all requests sent by all logged-in users of a specific tenant? For such use cases Atmosphere introduces the Broadcaster concept. With a single suspended request (AtmosphereResource) you can associate one or more broadcasters, and use these broadcasters to send events to chosen clients. With each AtmosphereResource there's a single Broadcaster created, with a random UUID.

Using this idea and knowing this UUID, you may send an async event to each suspended request separately, by choosing the appropriate broadcaster. Another Atmosphere concept is the MetaBroadcaster. It can be used to send an event using all broadcasters matching an expression. For example:
  1. User A connects to async service, the broadcaster with ID="/UUID-1" is created.
  2. User B connects to async service, the broadcaster with ID="/UUID-2" is created.
  3. Using MetaBroadcaster you may send data to either first or second user by broadcastTo("/UUID-1", event) or broadcastTo("/UUID-2", event).
  4. Or you can send event to all users by broadcastTo("/*", event).
This concept adapts well to our use case. Let's assume we have a TENANT_ID and a USER_ID, defining our tenant and its user. We need to assign only one broadcaster to each async request to achieve our goals (a sketch of a helper built on this follows the list):
  1. User connects to async service, the broadcaster with ID="/TENANT_ID/USER_ID" is created.
  2. To send event to this particular user, use broadcastTo("/TENANT_ID/USER_ID", event).
  3. To send event to all logged users of specific tenant, use broadcastTo("/TENANT_ID/*", event).
  4. To send event to all logged users, use broadcastTo("/*", event).
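A sketch of a broadcast helper built around this naming scheme (class and method names are illustrative assumptions, based on the Atmosphere 2.1 API):

import org.atmosphere.cpr.AtmosphereResource;
import org.atmosphere.cpr.Broadcaster;
import org.atmosphere.cpr.BroadcasterFactory;
import org.atmosphere.cpr.MetaBroadcaster;

public class AsyncBroadcastService {

    /** Suspends the request and registers it under the "/tenantId/userId" broadcaster. */
    public void suspend(AtmosphereResource resource, String tenantId, String userId) {
        Broadcaster broadcaster = BroadcasterFactory.getDefault()
                .lookup("/" + tenantId + "/" + userId, true); // create if missing
        broadcaster.addAtmosphereResource(resource);
        resource.suspend();
    }

    /** Event for a single user of a tenant. */
    public void sendToUser(String tenantId, String userId, Object event) {
        MetaBroadcaster.getDefault().broadcastTo("/" + tenantId + "/" + userId, event);
    }

    /** Event for all logged-in users of a tenant. */
    public void sendToTenant(String tenantId, Object event) {
        MetaBroadcaster.getDefault().broadcastTo("/" + tenantId + "/*", event);
    }

    /** Event for everyone. */
    public void sendToAll(Object event) {
        MetaBroadcaster.getDefault().broadcastTo("/*", event);
    }
}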
And here comes the implementation with everything described above:


Finally, in the AsyncDispatcher controller we just need to suspend the request using the AsyncService.suspend() method, to make it all work together.

Friday, May 9, 2014

Global state in AngularJS controllers

Surprisingly for me, controller state in AngularJS is not preserved between controller invocations. I at least expected an option to switch this on and off on demand. In a classic application it was difficult to restore the state of a given view (e.g. to come back to the same page we were on when leaving the view), which from a usability point of view sounds great to me.

But everything needed is right there, so it can be implemented. However, I don't like using $rootScope for this, which can be found in many examples on the net, in the same way I don't use globals - they produce a mess. Here I'd like to propose an elegant solution.

I have a service that holds all the controller states and provides an initialization function that can be used when the state hasn't been cached yet. The code snippet: