Hibernate, @Fetch(FetchMode.SUBSELECT) and accidentially loading a whole database into memory

This week I was facing a very suspicious performance issue with @Fetch(FetchMode.SUBSELECT) and Hibernate. Let’s imagine the following Entities

@Entity
public class Parent {
 @Fetch(FetchMode.SUBSELECT)
 private List children;
}
@Entity
public class Child {
 private String something;
}

This comes with the following logic:

for (int i = 0; i < parentCount; i += 10) {
 List tenParents = entityManager.createQuery("from Parent").setOffset(i).setMaxResults(10);
 for (Parent parent : tenParents) {
  for (Child child : parent.children) {
   System.out.println(child.something);
  }
 }
}

Long pause at the start

Don’t try to make to much sense from the code above. What it does is loading the Parent-entities in batches of 10 and print the childrens member variable ’something‘. This looks pretty straight forward but was causing a long pause when accessing the children relation of the first Parent-entity. It just didn’t make much sense. We took a look at the SQL statements generated by Hibernate:

SELECT ... FROM Parent LIMIT 0, 10

This is our manual query which loads 10 parents.

SELECT ... FROM Child WHERE parentId IN (SELECT id FROM Parent)

This one is … suspicous. It is the query which caused the program to stop for a while. And it got me thinking.

The magic of SUBSELECT

What Hibernate with SUBSELECT does for SUBSELECT is a pretty brilliant idea. When you load entites using Hibernate, it will remember which query you used to load the data. This query is used to lazily load the child entities. If you used query X to load the data, it will use the same query to load the children using the query X as subquery (i.e. „SELECT … FROM Y WHERE parentId in (X)“).

This is helpful in certain situations. One example is where Fetch.SELECT fails. Fetch.SELECT will load the subentities using the ID of the parent, i.e. „SELECT … FROM Y WHERE parentId = ?“. Combined with @BatchSize(size = 10) this will generate optimized queries like „SELECT … FROM Y WHERE parentId IN (?, ?, ?, …)“. Adding large batch sizes will generate more questionmarks. This forces Hibernate to transfer many IDs to the database server, where using a subselect might simply be faster (of course only if the SQL query used in the subselect is fast).

The issue with SUBSELECTs

The issue in my specific example is: While Hibernate remembers the query used to load the data, it does not remember offset and maxResults! In my case this caused to load more that 1 million Children just to access a few of them in one batch. While it is correct behavior, it will consume a lot of memory which might be unexpected. If this is combined with flushes to the entity manager in the loops above, Hibernate will load the data multiple times.

My solution

In my case switching to @Fetch(FetchMode.SELECT) with @BatchSize was the best solution. My batch size is small so sending a few IDs to the database does no harm.

Copyright © christophbrill.de, 2002-2017.