Monday, February 3, 2014

Grails many to many associations and fetch with join


Lately I have been working on a custom written web solution using Grails MVC framework, that's stacked on top of Java / Groovy, and is parallel to RoR.  Since I've started working with Grails  and it's object-relational mapper GORM that's stacked on top of Hibernate, as someone that comes with no in-depth Hibernate knowledge I came up with following problem:

I had to create many-to-many mapping between entities A and B, let's called them Developer and Project, and wanted to lazy load developers when needed  but, using sql JOIN statement, rather than reading one-by-one which was database-intensive. Examine following definitions:


class Project {
 String title
 
 static hasMany = [ developers : Developer ]
}

class Developer {

 String title
 
 static hasMany = [projects : Project]
 static belongsTo = Project
}

Obviously, there is many to many relationship defined between Developer and Project domain definition, and by Grails official documentation, many to many associations need to have owning end of definition, Project in this case. Owning end means that all saves, and updates to Project will cascade to developer.

Database schema  is created using following statements (taken directly from mysql log)


create table developer (id bigint not null auto_increment, version bigint not null, title varchar(255) not null, primary key (id)) ENGINE=InnoDB
   
create table project (id bigint not null auto_increment, version bigint not 
null, title varchar(255) not null, primary key (id)) ENGINE=InnoDB
 
create table project_developers (project_id bigint not null, developer_id 
bigint not null, primary key (project_id, developer_id)) ENGINE=InnoDB

alter table project_developers add index FK700662CFD8DF43FA (project_id), add constraint FK700662CFD8DF43FA foreign key (project_id) references project (id)

alter table project_developers add index FK700662CF76D8825A (developer_id), 
add constraint FK700662CF76D8825A foreign key (developer_id) 
references developer (id)

Consider following code and log output when inserting values, if Grails is configured to log sql using dataSource.logSql  setting:


 def project = new Project(title:"Awesome web solution")
  
  project.addToDevelopers(new Developer(title:"John Doe"))
  project.addToDevelopers(new Developer(title:"Mark Smith"))
  project.addToDevelopers(new Developer(title:"Joanne Doe"))
  
  project.save()



Hibernate: insert into project (version, title) values (?, ?)
Hibernate: insert into developer (version, title) values (?, ?)
Hibernate: insert into project_developers (project_id, developer_id) values (?, ?)

It simply inserts values into mapped domain object tables, and mapper table. Since repeated rows are ignored in log, there isn't 3 statements for 2nd and 3rd row - there actually present in real mysql log:



 73 Query insert into project (version, title) 
                                      values (0, 'Awesome web solution')
     73 Query insert into developer (version, title) values                                    (0, 'Joanne Doe')
     73 Query insert into developer (version, title) values                                    (0, 'Mark Smith')
     73 Query insert into developer (version, title) values                                    (0, 'John Doe')
     73 Query insert into project_developers 
                                    (project_id, developer_id) values (1, 1)
     73 Query insert into project_developers 
                                    (project_id, developer_id) values (1, 2)
     73 Query insert into project_developers 
                                    (project_id, developer_id) values (1, 3)

And now, what happens with GORM mapping by default if we try to pull all developers that worked on a project. See piece of code below

EXAMPLE 0
def project = Project.findByTitle("Awesome web solution")
  
  project.developers.each { developer ->
   println "${developer.title} works on project '${project.title}'"
  }
  

- and sql log that follows it


// ...
//pulling out developer with given title before

//pulls out all developer IDs that work on a project
Hibernate: select developers0_.project_id as project1_0_0_, developers0_.developer_id as developer2_0_ 
    from project_developers developers0_ 
    where developers0_.project_id=?

//pulls out developer 1
Hibernate: select developer0_.id as id2_0_, developer0_.version as version2_0_, developer0_.title as title2_0_ 
    from developer developer0_ 
    where developer0_.id=?

//pulls out developer 2
Hibernate: select developer0_.id as id2_0_, developer0_.version as version2_0_, developer0_.title as title2_0_ 
   from developer developer0_ 
   where developer0_.id=?

//pulls out developer 3
Hibernate: select developer0_.id as id2_0_, developer0_.version as version2_0_, developer0_.title as title2_0_ 
   from developer developer0_ 
   where developer0_.id=?

Obviously, there is n+1 query executed for project with n developers, which is database-intensive even for projects with small number of developers, since main thread will be constantly put in I/O wait, waiting for database reply, and database will process larger number of queries, so MySQL server CPU is unnecessary utilized. Simple idea for anyone with some knowledge of SQL would be to perform a JOIN instead of row-by-row querying.

Techniques for loading more than one row at a time in GORM, by Grails documentation are

  1. Setting lazy loading to false (pulling all date at once), since GORM mapping is lazy loading associations by default
  2. Specifying 'join' fetch 

 Trying to set lazy loading to false on Project domain, simple does not work, and gives us same sql log:

EXAMPLE 1
class Project {
 String title
 
 static hasMany = [ developers : Developer ]
 
 static mapping = {
  developers lazy:false
 }
}


        select project0_.id as id0_0_, project0_.version as version0_0_, 
               project0_.title as title0_0_ 
        from project project0_ where project0_.id=1
        select developers0_.project_id as project1_0_0_,
             developers0_.developer_id as developer2_0_ 
        from project_developers developers0_ 
        where developers0_.project_id=1

        select developer0_.id as id2_0_, developer0_.version as version2_0_,
               developer0_.title as title2_0_ 
        from developer developer0_ where developer0_.id=2
 select developer0_.id as id2_0_, developer0_.version as version2_0_,
               developer0_.title as title2_0_ 
        from developer developer0_ where developer0_.id=1

        select developer0_.id as id2_0_, developer0_.version as version2_0_,
               developer0_.title as title2_0_ 
        from developer developer0_ where developer0_.id=3

If setting fetching associations using JOIN statement, only one developer is pulled out, since reading project statement is merged with reading developers statement.

EXAMPLE 2
class Project {
 String title
 
 static hasMany = [ developers : Developer ]
 
 static mapping = {
  developers fetch:'join'
 }
}


sql log (directly from MySQL log file):


               select this_.id as id2_0_, 
                       this_.version as version2_0_, 
                       this_.title as title2_0_,
                       developers2_.project_id as project2_2_2_, 
                      developers2_.developer_id as developer1_2_
               from project this_ 
               left outer join project_developers developers2_ 
                     on this_.id=developers2_.project_id 
               where this_.title='Awesome web solution' limit 1
        select developer0_.id as id0_0_, 
                     developer0_.version as version0_0_,
                     developer0_.title as title0_0_ 
               from developer developer0_ 
               where developer0_.id=1

Obviously, when reading entity by findBy, it's translated to 'limit 1' sql statement, and thus limits reading of project developers to only one. Even if reading statement changed to reading project directly, so there's no merging of two Hibernate statements, this approach won't make things better, since every developer is pulled out of database in it's own query, though it will give correct (read ALL) results

EXAMPLE 3:
 def project = Project.get(1)
  
  project.developers.each { developer ->
   println "${developer.title} works on project '${project.title}'"
  }

MySQL  LOG file:


select developer0_.id as id0_0_,
       developer0_.version as version0_0_, 
       developer0_.title as title0_0_ 
from developer developer0_ 
        where developer0_.id=1
 
select project0_.id as id2_0_, project0_.version as version2_0_, 
       project0_.title as title2_0_,
       developers1_.project_id as project2_2_2_, 
       developers1_.developer_id as developer1_2_ 
from project project0_ 
     left outer join project_developers developers1_
          on project0_.id=developers1_.project_id 
 where project0_.id=1
       
select developer0_.id as id0_0_, developer0_.version as version0_0_,
               developer0_.title as title0_0_ 
from developer developer0_ 
where developer0_.id=2

select developer0_.id as id0_0_, developer0_.version as version0_0_,
               developer0_.title as title0_0_ 
from developer developer0_ 
where developer0_.id=1

select developer0_.id as id0_0_, developer0_.version as version0_0_,
               developer0_.title as title0_0_ 
from developer developer0_ 
where developer0_.id=3


Still, large number of queries, for simple result. Analyzing all of above examples implies that mapper table 'projects_developers' is actually what's treated as 'developers' association in Project domain, in the key to achieving the goal is to set join behavior on  OWNED end of an association, that is on Developer domain class:

EXAMPLE 4:

class Project {
 String title 
 static hasMany = [ developers : Developer ]
}

class Developer {

 String title
 
 static hasMany = [projects : Project]
 static belongsTo = Project
 
 static mapping = {
  projects fetch:'join'
 }
}

And following SQL log

select project0_.id as id0_0_, project0_.version as version0_0_,
       project0_.title as title0_0_ 
from project project0_ 
where project0_.id=1

select developers0_.project_id as project1_0_1_, 
       developers0_.developer_id as developer2_1_, 
       developer1_.id as id2_0_, developer1_.version as version2_0_, 
       developer1_.title as title2_0_ 
from project_developers developers0_ 
     inner join developer developer1_ on developers0_.developer_id=developer1_.id 
where developers0_.project_id=1


As you can see only 2 sql queries are executed - one to pull out project, and second to pull out all of the project's developers, and no limit is implied.

It is lengthy post, but hope that this will save someone's time, as I spent quite some time understanding GORM and it's gotchas. For any furhter tweaking like CASCADE constraints on MySQL, pagination parameters, etc, please refer to official GORM documentation.