Saturday, September 19, 2015

How to avoid subclass joins in a Hibernate model

This article is about the multiple joins Hibernate produces when fetching a superclass in the JOINED inheritance mapping strategy, and generally about how I significantly improved our system's performance using a few simple patterns. There's a lot of material on the internet about this (usually taking the wrong approach), so here I'll present my working solution to the problem.

A few words about motivation. People from Hibernate claim that avoiding joins is not necessary, because joins in today's databases are fast. That might indeed not be a problem when, say, 4 subclasses produce 4 joins on each query. But what I personally really like about Hibernate is that you can truly model your world with classes and have it mapped to the database automatically. Truly modeling your world usually means you have a lot of super- and subclasses, and you may end up with, for example, 30-40 joins for a simple from BaseEntity query, or even hit the 61-join limit (if you use MySQL).

And here comes the problem. First and foremost, even if joins are fast, I don't want 40 of them on simple queries. Even if it didn't influence performance (and after my tests on a big system, it does), I would like to be able to debug some queries in plain SQL, and I wouldn't like those queries to stretch across 5 screens. This is where people start adjusting the model to the queries, which means the real-world model is no longer a real-world model; it becomes a Hibernate-oriented model. That's the second thing I hate to have in a project. I like to keep things clean.

OK, now we can delve into the problem.

Joined strategy defaults


Let's consider the following simple model:

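Just to make the example concrete, a minimal sketch of such a hierarchy (assuming standard JPA annotations and a string discriminator column named classname, matching the tables shown below) could look like this:

// javax.persistence.* annotations
@Entity
@Table(name = "super")
@Inheritance(strategy = InheritanceType.JOINED)
@DiscriminatorColumn(name = "classname")
public class Super {
    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "super_prop")
    private String superProp = "superProp";
}

@Entity
@Table(name = "suba")
public class SubA extends Super {
    @Column(name = "subaprop")
    private String subAProp = "subAProp";
}

@Entity
@Table(name = "subb")
public class SubB extends Super {
    @Column(name = "subbprop")
    private String subBProp = "subBProp";
}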


This is how it's reflected in the database (I use PostgreSQL for this example):

aero=> \d super;
              Table "public.super"
   Column   |          Type          | Modifiers 
------------+------------------------+-----------
 classname  | character varying(255) | not null
 id         | bigint                 | not null
 super_prop | character varying(255) | 
Indexes:
    "super_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "suba" CONSTRAINT "fk_5yjvt9lkf5b24nyxv59n3kbgj" FOREIGN KEY (id) REFERENCES super(id)
    TABLE "subb" CONSTRAINT "fk_q4aajuitkvsk93mn07t66l0pi" FOREIGN KEY (id) REFERENCES super(id)

aero=> \d suba;
              Table "public.suba"
  Column  |          Type          | Modifiers 
----------+------------------------+-----------
 subaprop | character varying(255) | 
 id       | bigint                 | not null
Indexes:
    "suba_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "fk_5yjvt9lkf5b24nyxv59n3kbgj" FOREIGN KEY (id) REFERENCES super(id)

aero=> \d subb;
              Table "public.subb"
  Column  |          Type          | Modifiers 
----------+------------------------+-----------
 subbprop | character varying(255) | 
 id       | bigint                 | not null
Indexes:
    "subb_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "fk_q4aajuitkvsk93mn07t66l0pi" FOREIGN KEY (id) REFERENCES super(id)

So, we have a superclass table and two subclass tables (the JOINED inheritance mapping strategy).

I put some data there:

sessionFactory.getCurrentSession().save(new SubA());
sessionFactory.getCurrentSession().save(new SubB());

And we can observe how this data is distributed in the database:

aero=> select * from super; select * from suba; select * from subb;
 classname | id | super_prop 
-----------+----+------------
 SubA      |  1 | superProp
 SubB      |  2 | superProp
(2 rows)

 subaprop | id 
----------+----
 subAProp |  1
(1 row)

 subbprop | id 
----------+----
 subBProp |  2
(1 row)

So, as expected, the data is distributed across the tables, depending on the subclass type.

Now I ask Hibernate for the list of superclasses using the from Super HQL query, and the final SQL built by Hibernate looks as follows:

select [...] from SUPER super0_ left outer join SUBA super0_1_ on super0_.id=super0_1_.id left outer join SUBB super0_2_ on super0_.id=super0_2_.id

The example above shows the problem in question (now imagine that we have 50 subclasses here, of course :) ).

Explicit polymorphism


People on the internet often try to avoid the multiple-joins problem using explicit polymorphism. To be completely honest, I tried this Hibernate feature to check whether it solves the problem. The misunderstanding about this feature probably originates in the Hibernate javadoc, which states that explicit polymorphism means: This entity is retrieved only if explicitly asked.

So if we don't ask about the SubA class, for example, we should fetch only instances of Super. Let's try it.

I tested two models, the first one with explicit polymorphism on the superclass:

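Roughly like this, using Hibernate's @Polymorphism annotation (a sketch; only the changed class is shown, the rest of the mapping stays as before):

// org.hibernate.annotations.Polymorphism / PolymorphismType
@Entity
@Table(name = "super")
@Inheritance(strategy = InheritanceType.JOINED)
@Polymorphism(type = PolymorphismType.EXPLICIT)
public class Super {
    // id and superProp as before
}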


And the second one with explicit polymorphism on the subclass:

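Again a sketch, with the annotation moved to the subclass:

@Entity
@Table(name = "suba")
@Polymorphism(type = PolymorphismType.EXPLICIT)
public class SubA extends Super {
    // subAProp as before
}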


Neither variant influences the table model or the data. And for both, in response to from Super, we get:

select [...] from SUPER super0_ left outer join SUBA super0_1_ on super0_.id=super0_1_.id left outer join SUBB super0_2_ on super0_.id=super0_2_.id

It just doesn't work as expected.

So, what is explicit polymorphism for? It seems the only application of this feature is the so-called lightweight class pattern, where two or more classes are mapped to the same table. But it doesn't solve our N-joins problem.

Single table + secondary tables


So how do we solve the problem of N joins? I use the following pattern, involving the SINGLE_TABLE inheritance mapping strategy and secondary tables. Let's first look at the class model:

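A sketch of this mapping, assuming the JPA @SecondaryTable together with Hibernate's @Table(appliesTo = ..., fetch = FetchMode.SELECT); table and column names follow the example above:

// JPA annotations from javax.persistence; FetchMode from org.hibernate.annotations
@Entity
@javax.persistence.Table(name = "super")
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "classname")
public class Super {
    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "super_prop")
    private String superProp = "superProp";
}

@Entity
@SecondaryTable(name = "suba", pkJoinColumns = @PrimaryKeyJoinColumn(name = "id"))
@org.hibernate.annotations.Table(appliesTo = "suba", optional = false, fetch = FetchMode.SELECT)
public class SubA extends Super {
    // every subclass property has to point at the secondary table explicitly
    @Column(name = "subaprop", table = "suba")
    private String subAProp = "subAProp";
}

@Entity
@SecondaryTable(name = "subb", pkJoinColumns = @PrimaryKeyJoinColumn(name = "id"))
@org.hibernate.annotations.Table(appliesTo = "subb", optional = false, fetch = FetchMode.SELECT)
public class SubB extends Super {
    @Column(name = "subbprop", table = "subb")
    private String subBProp = "subBProp";
}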


Everything is mapped to a single table, but for each derived class I define a secondary table (using the @SecondaryTable annotation) and tell Hibernate that this table should be fetched with an additional select (using the @Table annotation). The additional hassle here is that every property coming from a subclass has to be marked with @Column (or @JoinColumn for single-ended associations) to tell Hibernate to put it into the secondary table. This really should be done automatically (when the whole class has a secondary table definition), but this is one of many things the Hibernate people refuse to do.

Let's take a glance at the database model:

aero=> \d super; \d suba; \d subb;
              Table "public.super"
   Column   |          Type          | Modifiers 
------------+------------------------+-----------
 classname  | character varying(255) | not null
 id         | bigint                 | not null
 super_prop | character varying(255) | 
Indexes:
    "super_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "suba" CONSTRAINT "fk_5yjvt9lkf5b24nyxv59n3kbgj" FOREIGN KEY (id) REFERENCES super(id)
    TABLE "subb" CONSTRAINT "fk_q4aajuitkvsk93mn07t66l0pi" FOREIGN KEY (id) REFERENCES super(id)

              Table "public.suba"
  Column  |          Type          | Modifiers 
----------+------------------------+-----------
 subaprop | character varying(255) | 
 id       | bigint                 | not null
Indexes:
    "suba_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "fk_5yjvt9lkf5b24nyxv59n3kbgj" FOREIGN KEY (id) REFERENCES super(id)

              Table "public.subb"
  Column  |          Type          | Modifiers 
----------+------------------------+-----------
 subbprop | character varying(255) | 
 id       | bigint                 | not null
Indexes:
    "subb_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "fk_q4aajuitkvsk93mn07t66l0pi" FOREIGN KEY (id) REFERENCES super(id)

The really good thing here is that the database model doesn't change at all, so if you experience problems with multiple joins, you can easily convert your JOINED hierarchy to this mapping.

To be completely clear, let's take a look at the data:

aero=> select * from super; select * from suba; select * from subb;
 classname | id | super_prop 
-----------+----+------------
 SubA      |  1 | superProp
 SubB      |  2 | superProp
(2 rows)

 subaprop | id 
----------+----
 subAProp |  1
(1 row)

 subbprop | id 
----------+----
 subBProp |  2
(1 row)

And finally at the from Super HQL translated to SQL:

select [...] from SUPER super0_
select super_1_.subaprop as subaprop1_83_ from SUBA super_1_ where super_1_.id=1
select super_2_.subbprop as subbprop1_84_ from SUBB super_2_ where super_2_.id=2

There are no joins here anymore.

Is this right?


If we don't want a big join, we have to accept that Hibernate still needs to load all entities somehow. Here, we replaced the join with subclass fetches done in secondary selects, which is the well-known "N+1 selects problem". So is it right to use this pattern or not?

I'd say: it depends. If you have 4-5 subclasses, the join would probably be faster. If you have 50 joins, it probably wouldn't be. Moreover, if you hit the 61-join limit in MySQL, you cannot execute the query at all, and this becomes the only sensible solution.

What I really do when I see such big joins in a project is measure the performance of both approaches and choose the better solution for the real database, with real data and the real queries we execute on the underlying model. I cannot say that "a lot of joins" is an antipattern, or that "N+1 selects" is an antipattern. It just depends on the conditions. For example, if you fetch data for a paginated view, you usually don't fetch more than 10-20 entities in a single query, so the N+1 selects problem is negligible there. If you want to fetch a big amount of data, it probably will be a problem, but there's also probably something wrong with your app if you need to fetch that much data from the database.

So everything should be tested and used thoughtfully (which can be said about pretty much everything in software development :) ).

Further improvements


I wrote this article after a huge refactoring of the system in my current project. Translating from JOINED to the model proposed above was only one of the things I refactored. It turned out that uncontrolled eager fetching of various associations can also produce unbelievable queries. You may say that if one has uncontrolled eager fetching, one also has a bad model. That's true, and this is why the refactoring was needed: to convert a bad model into the best-performing one. I always follow the rule that we focus on the business value of the system in the first place, and the optimization stage comes at the end.

So what more did I do to optimize the model? First and foremost, I now consider eager fetching of collections a bad pattern. This is not a big problem, though, because all to-many associations are lazy by default, and if they become eager, it's a developer's choice, so he should know what he's doing and why. The real problem is with single-ended associations. All single-ended associations are eager by default, and they can produce something really unwanted in the application's SQL.

You may control this "join depth" using the hibernate.max_fetch_depth property. If you set it to 0, Hibernate won't make any deep joins when fetching your entities. But this doesn't mean your single-ended associations become lazily fetched. They will still be fetched (at least the first level of single-ended associations relative to the entity you want to fetch), just with additional selects. To really make them lazy, you also need the additional @LazyToOne annotation. That really turns the objects at these association ends into lazy proxies (you then have very ugly lazy proxies at those ends, but this can still be overcome using Hibernate bytecode instrumentation at compilation stage).
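
For illustration, on a single-ended association this looks roughly as follows (a sketch; the Invoice and Customer entities are made up for the example):

// hibernate.max_fetch_depth=0 is set in the Hibernate properties (assumed setup)
// @LazyToOne / LazyToOneOption come from org.hibernate.annotations
@Entity
public class Invoice {
    @Id
    @GeneratedValue
    private Long id;

    // single-ended associations are eager by default; per the text above,
    // @LazyToOne is what really turns the reference into a lazy proxy
    @ManyToOne(fetch = FetchType.LAZY)
    @LazyToOne(LazyToOneOption.PROXY)
    private Customer customer;
}

@Entity
public class Customer {
    @Id
    @GeneratedValue
    private Long id;
}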

So, is it right to make everything lazy, and how does it influence query performance? It's not that simple. I just don't believe you get good performance by deciding once, at the model level, that "this entity should have these relations fetched lazily, and those relations we will use frequently, so they should be fetched eagerly". It simply depends on how the given entity is used. Sometimes, when you develop your model, it seems you can say something like this (usually thinking about fetching the list for the view where the entity is displayed), but then it turns out that you also use lists of the same entities in various other services, and you just don't need these eagerly fetched associations in those queries.

So I believe it's just not OK to define on the model what should be fetched eagerly and what should not. There's a different place where you can do it: in repository queries. For each query you write, you usually know exactly how it will be used, and you can define appropriate joins in HQL or with setFetchMode() when you use the Criteria API. And that is the right place to define it.
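
Sticking with the hypothetical Invoice/Customer pair from above, the association can then be joined only in the queries that actually need it, roughly like this:

Session session = sessionFactory.getCurrentSession();

// HQL: fetch the customer eagerly only in this particular query
List<Invoice> forTheView = session
        .createQuery("select i from Invoice i join fetch i.customer")
        .list();

// Criteria API: the same thing expressed with setFetchMode()
// (FetchMode here is org.hibernate.FetchMode)
List<Invoice> forTheReport = session
        .createCriteria(Invoice.class)
        .setFetchMode("customer", FetchMode.JOIN)
        .list();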

So what I did during this refactoring boils down to these activities:
  1. Convert some entity hierarchies from JOINED to SINGLE_TABLE with secondary tables, to control the subtree fetching.
  2. Make everything lazy, and
  3. Define appropriate joins at the repository query level, depending on the query usage.
After this job, our high-load database test showed a performance boost of about 560%.

Friday, August 28, 2015

Statistics queries with time range for PostgreSQL and Hibernate

A few days ago I was asked to implement charts of some financial data, using a JS charting library, in an application running on a Postgres database. A lot of financial data is stored in the database every minute, and I just needed to show it on charts.

OK, but charts are nice if you can restrict the amount of data returned from the server to some limited number of items. Returning 1000 items would make the chart both very heavy and unreadable, and would consume a lot of network traffic. I assumed 50 as the average number of items that display nicely on such charts and are also acceptable from the traffic point of view.

It would be nice to split the time data into periods that always yield no more than 50 items, and then return only such statistics to the client side. And here I discovered a very nice and applicable Postgres function: date_trunc.

Equipped with this weapon, I figured out a universal solution for calculating such time-based statistics on a Postgres database using Hibernate. First, I designed a very simple enum to compute the right time-grouping criterion:

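Such an enum basically just maps a chart granulation onto the field name accepted by date_trunc; a sketch (names and the set of granulations are assumptions, not the original code):

public enum TimeGranulation {
    HOUR("hour"),
    DAY("day"),
    WEEK("week"),
    MONTH("month");

    private final String dateTruncField;

    TimeGranulation(String dateTruncField) {
        this.dateTruncField = dateTruncField;
    }

    // builds the SQL grouping expression, e.g. date_trunc('day', create_time)
    public String groupingSql(String column) {
        return "date_trunc('" + dateTruncField + "', " + column + ")";
    }
}
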
It'd be nice, then, to have everything working with any entity type and to be able to customize the grouping query with Hibernate Criteria, because this is the right abstraction for building the SQL query step by step from filters hypothetically associated with chart filters:

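A sketch of how this can be wired with the Criteria API, assuming a hypothetical FinanceEntry entity with a create_time column and a numeric value property (not the original code, just the general shape using Projections.sqlGroupProjection):

// Projections/Restrictions from org.hibernate.criterion, Type/StandardBasicTypes from org.hibernate.type
Session session = sessionFactory.getCurrentSession();
Date to = new Date();
Date from = new Date(to.getTime() - TimeUnit.DAYS.toMillis(7));   // e.g. last week

Criteria criteria = session.createCriteria(FinanceEntry.class)
        .add(Restrictions.between("createTime", from, to))
        .setProjection(Projections.projectionList()
                .add(Projections.sqlGroupProjection(
                        "date_trunc('hour', create_time) as period",   // select part
                        "date_trunc('hour', create_time)",             // group by part
                        new String[] { "period" },
                        new Type[] { StandardBasicTypes.TIMESTAMP }))
                .add(Projections.avg("value")));

List<Object[]> stats = criteria.list();   // [period, avg value] rows for the chart
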
It's also quite simple to autodetect the best data granulation for the chart (e.g. whether it is an hour, a day, a month or something else) depending on the base time range of the chart.

Sunday, July 26, 2015

Modal windows available by URL binding in AngularJS with ui-bootstrap and ui-router

It's well known that ui-bootstrap supports modal windows. These modals don't have any URL and cannot be accessed by URL, though. This is very good, because they should remain just subviews of the main view, not mess with the browser history and, first and foremost, not kill the underlying main view controller, which would reset its current state (filters applied, pagination, etc.). That would be a problem with URL-based modals, especially with ui-router, which seems to be the standard for now.

On the other hand, it's good to have URL-bound modals for one use case: when you send people notification emails with links to perform a requested action. I mean emails like "Attention, you have to do something in the application. You can do it with this link." In a typical scenario this action is already available in the application, usually on a modal window. You just want to send a link to that modal to the user to trigger the requested action, and if you can't bind your modal to a URL, you can only send him a more or less awkward message like "Use this link to log in to the application, and then ... dig around yourself to find where you can do it."

Here is a little ui-router hack to achieve both things - normal, unobtrusive modals, and URL bindings.

NOTE that this is not about having modals under a ui-router URL address. There are many implementations of that on Stack Overflow, and also an example in the ui-router FAQ. But these solutions have a very big flaw: their modals kill the underlying main view controller, and when you go back, you're really back (as with the back button) on the main view with a new controller instance and a completely reset state, which is unacceptable for real-world applications. Maybe next time I'll think a little about how to achieve that without this problem, but for now it's about having normal bootstrap modals that are also bound to URL addresses in ui-router.

First, you need to define in your main HTML the location where the modal will be rendered if it's URL-bound. Let's keep it simple as modal.html (all the code is included at the bottom of this post). You can define the surrounding view as the main view and use it either as the modal target (when the modal window is rendered) or as the whole application layout target (for views other than modal windows), through appropriate router substate configuration and CSS.

Afterwards, let's define a simple module.js with the modal content, one URL parameter, and a resolve function that provides the resolved object (fetched from the server side) to the modal controller. At this point your modal is exposed at the /modal/{ID} URL, and you may use it in your notification emails.

Now, let's support our modal as a regular ui-bootstrap modal for other application views. This is done in modal.service.js by a simple hack that moves parameters from the ui-router configuration to the ui-bootstrap $modal instance. In any view, instead of calling $modal, you can now call modalService(viewName, stateParams) to open the modal window previously defined in the ui-router configuration, and at the same time the modal stays available by URL.

Source code

Saturday, June 20, 2015

Distributed Hibernate Search with ActiveMQ and Spring

This catchy title is not really about distributed Hibernate Search, but about something really close :)

The case I've recently solved was quite different. I have a frontend application that uses a Hibernate-managed database and Hibernate Search, and I needed to add an additional application - let's call it an integration server - which exposes some API web services for the overall system. The integration server uses the same database as the frontend application and enables clients to put data into the database through its web services. The two applications also live on two different physical servers.

Everything looks simple until you start thinking about Hibernate Search index updates from the integration server, while the index is located solely on the frontend application side, because only that part of the system uses it. When you put data into the database from the integration server, the frontend application's full-text index is of course not updated. I was looking for a simple solution to overcome this problem.

First, let's take a look at what Hibernate Search proposes. It supports a distributed index with master-slave replication, where all nodes are connected using JMS. This was something I didn't really need, because only one node uses the index for searching. Moreover, this solution is based on periodic index replication, which means the index is up to date on each node only after some interval. Finally, I didn't like it because it uses JNDI, which is not really the Spring way of solving problems (I don't really like JEE, I prefer working with lightweight Java application stacks).

So I figured out a solution with the following prerequisites:
  1. I don't want to use replication, because I only need to search on the frontend application side.
  2. I want to keep the index physically on the side that really uses it, i.e. the frontend application side.
  3. When the integration server updates data in the database, I need to update the index.
  4. I don't want to use JNDI.
  5. We are already using ActiveMQ, so I want to use it for this solution as well. The AMQ broker is already located on the integration server side.
OK, let's delve into the solution. Here is the Spring config of the important beans on the frontend side:

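The original config is XML; the same beans sketched as Java config (bean names, packages and paths are assumptions) would look more or less like this:

// LocalSessionFactoryBean from org.springframework.orm.hibernate4,
// DefaultMessageListenerContainer from org.springframework.jms.listener,
// ActiveMQConnectionFactory from org.apache.activemq
@Configuration
public class FrontendSearchConfig {

    @Bean
    public LocalSessionFactoryBean sessionFactory(DataSource dataSource) {
        LocalSessionFactoryBean factory = new LocalSessionFactoryBean();
        factory.setDataSource(dataSource);
        factory.setPackagesToScan("com.example.model");
        Properties props = new Properties();
        // local lucene index kept on the frontend side
        props.setProperty("hibernate.search.default.directory_provider", "filesystem");
        props.setProperty("hibernate.search.default.indexBase", "/var/lucene/indexes");
        factory.setHibernateProperties(props);
        return factory;
    }

    @Bean
    public ConnectionFactory amqConnectionFactory() {
        // the broker runs on the integration server side
        return new ActiveMQConnectionFactory("tcp://integration-host:61616");
    }

    @Bean
    public RemoteHibernateSearchController remoteHibernateSearchController() {
        return new RemoteHibernateSearchController();
    }

    @Bean
    public DefaultMessageListenerContainer searchQueueListener(ConnectionFactory amqConnectionFactory,
            RemoteHibernateSearchController controller) {
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(amqConnectionFactory);
        container.setDestinationName("queue.search");
        container.setMessageListener(controller);
        return container;
    }
}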

What do we have here? A standard Hibernate session factory, on which I'm showing the Hibernate Search config that creates and uses a local Lucene index. Then an AMQ connection factory that connects to the broker running somewhere else. Nothing special. The only interesting bean is RemoteHibernateSearchController, which is derived from the standard AbstractJMSHibernateSearchController; that class is already a JMS message listener and only needs to be provided with a Hibernate session from our session factory:

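The controller can stay very small; a sketch, assuming the getSession()/cleanSessionIfNeeded() hooks exposed by AbstractJMSHibernateSearchController:

public class RemoteHibernateSearchController extends AbstractJMSHibernateSearchController {

    @Autowired
    private SessionFactory sessionFactory;

    @Override
    protected Session getSession() {
        // session used to apply the index works received over JMS
        return sessionFactory.openSession();
    }

    @Override
    protected void cleanSessionIfNeeded(Session session) {
        session.close();
    }
}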

Now let's take a look at the integration server config, which is a little more interesting:

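The essential part, sketched as plain Hibernate properties (the property keys follow the Hibernate Search convention; the backend's package name is an assumption):

Properties props = new Properties();
// a fake, never-read index kept entirely in memory
props.setProperty("hibernate.search.default.directory_provider", "ram");
// delegate all index work to our JMS-based backend instead of the standard "lucene" one
props.setProperty("hibernate.search.default.worker.backend",
        "com.example.search.AMQBackendQueueProcessor");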

The same session factory, but configured differently. As the worker backend we use AMQBackendQueueProcessor - our own implementation, shown in a moment, instead of the standard "lucene" one. Our implementation will delegate all Hibernate Search insert/update requests to the listening frontend RemoteHibernateSearchController, through the JMS queue named "queue.search".

I need to mention one thing here. The integration server doesn't use Hibernate Search for searching at all (it is insert/update oriented only). But if Hibernate Search is enabled for the integration server's session, we need at least some index for it to work. This index will never be updated (AMQBackendQueueProcessor delegates all updates to the frontend index through JMS) and will never be read. So I decided to use the "ram" directory provider, which holds this whole few-byte index in memory. You can use any implementation you want, anyway - it's only a fake index.

Then comes the AMQ configuration, where we define only one queue (this config is redundant, but you can add some parametrization to a config built this way). This is the part that really starts the broker, using the TCP transport (in the test environment both servers run on localhost). Note that the AMQ connection factory connects to the (local) broker using the VM transport and doesn't start its own broker.

Now a few words about the initialization order. We will use a little trick in a moment to bind AMQBackendQueueProcessor to Spring, so the order is important. When the session factory bean creates the session factory, the Hibernate Search worker backend needs to be able to work right away. Our backend works over AMQ, so AMQ needs to be initialized before the session factory bean. In the example this is done with the depends-on attribute: the sessionFactory bean depends here on the amqBroker bean (indirectly - the dependency goes through the appContextProvider bean, which also needs to be initialized before the session factory bean starts).

The ApplicationContextProvider bean just implements the common hack for accessing Spring beans from non-Spring-aware code:

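It's the usual ApplicationContextAware trick; a minimal version looks more or less like this:

// org.springframework.context.ApplicationContextAware
public class ApplicationContextProvider implements ApplicationContextAware {

    private static ApplicationContext context;

    @Override
    public void setApplicationContext(ApplicationContext applicationContext) {
        // stored statically so that non-Spring-managed code can reach the container
        context = applicationContext;
    }

    public static <T> T getBean(Class<T> beanClass) {
        return context.getBean(beanClass);
    }
}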

And finally, AMQBackendQueueProcessor overrides the JNDI-based JMS connection factory lookup of the standard JmsBackendQueueProcessor with a Spring-based implementation, using our container config and the appContextProvider hack: