RavenDB and the LoadDocument feature
To make a long story short: handle with care.
The long story…
In a relational world you must live with joins, full stop. This is neither a problem nor an advantage; it is a sort of non-functional requirement: most of the time a relational model requires joins.
If your object model has a Person class with a list of Addresses, it will be stored in 2 different tables, and there are 2 options to load them:
- shoot yourself in the foot and perform 2 queries;
- issue a join and get all the data in one single round-trip to the db.
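To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, column names and helper functions are made up for illustration; they are not taken from any real application.

```python
import sqlite3

# Hypothetical schema: Person and Address tables linked by Address.PersonId.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Address (Id INTEGER PRIMARY KEY, PersonId INTEGER, Country TEXT, City TEXT);
    INSERT INTO Person VALUES (1, 'Mauro', 'Servienti');
    INSERT INTO Address VALUES (1, 1, 'Italy', 'Milan'), (2, 1, 'Ireland', 'Dublin');
""")


def load_with_two_queries(person_id):
    """Option 1: two round-trips, one per table."""
    person = conn.execute(
        "SELECT FirstName, LastName FROM Person WHERE Id = ?", (person_id,)
    ).fetchone()
    addresses = conn.execute(
        "SELECT Country, City FROM Address WHERE PersonId = ? ORDER BY Id", (person_id,)
    ).fetchall()
    return person, addresses


def load_with_join(person_id):
    """Option 2: one round-trip; person columns repeat on every address row."""
    rows = conn.execute(
        """
        SELECT p.FirstName, p.LastName, a.Country, a.City
        FROM Person p JOIN Address a ON a.PersonId = p.Id
        WHERE p.Id = ?
        ORDER BY a.Id
        """,
        (person_id,),
    ).fetchall()
    # This sketch assumes the person has at least one address (inner join).
    person = rows[0][:2]
    addresses = [row[2:] for row in rows]
    return person, addresses
```

Both functions return the same data; the only difference is how many times we go to the database to get it.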
When it comes to document modeling in a non-relational world there are a few more options, ranging from modeling your documents in a relational fashion (most of the time shooting yourself in the foot) to modeling them in a document-oriented manner.
Modeling in a document-oriented manner opens up an almost infinite range of options, because the right model really depends on the context, the usage, the type of application and much more. For the purpose of this article consider the Person class above; one solution in a document database is the following:
{
  firstName: 'Mauro',
  lastName: 'Servienti',
  addresses: [
    { country: 'Italy', city: 'Milan' },
    { country: 'Ireland', city: 'Dublin' }
  ]
}
We can embed everything in one single document, with the consequence that we do not need a join to load the entire data set: it is always one single round-trip.
Nothing forces us to model the solution as in the above sample, even if it is a good approach. We can also model it in a relational fashion, even if in this case that is the worst solution; in the end I’m just trying to find a simple sample use case for the LoadDocument feature :-)
What we can do is store persons and addresses as separate documents, introducing a relation (an application-level relation) between them, something like:
{ id: 'people/123', firstName: 'Mauro', lastName: 'Servienti', addresses: [ 'addresses/123' ] }
{ id: 'addresses/123', country: 'Italy', city: 'Milan' }
There are valid use cases for the above choice. What we immediately lose is the ability to easily search for a person given an address; with the first design choice we can define an index such as the following:
from doc in docs.People
select new
{
    Content = new object[] { doc.FirstName, doc.LastName, doc.Addresses.Select( x => x.City ) }
}
and we immediately get full-text search capabilities even on addresses, which is pretty cool. In the second scenario, with separate documents, the application must decide where to search, and if we need to search as in the first scenario we have to issue 2 search queries.
LoadDocument to the rescue
from doc in docs.People
let addresses = doc.Addresses.Select( x => LoadDocument(x) )
select new
{
    Content = new object[] { doc.FirstName, doc.LastName, addresses.Select( x => x.City ) }
}
The above index definition solves the problem: we have basically defined a join in an index that, at indexing time, loads the related documents and indexes their content as well.
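If the idea of a join at indexing time feels abstract, the following toy simulation may help. It is not RavenDB code: the in-memory docs dictionary and the load_document, map_person, build_index and search helpers are all hypothetical stand-ins, written only to show that the related document is read while the index entry is built, not while the query runs.

```python
# In-memory stand-in for the document store; ids and shapes follow the sample above.
docs = {
    "people/123": {
        "firstName": "Mauro",
        "lastName": "Servienti",
        "addresses": ["addresses/123"],
    },
    "addresses/123": {"country": "Italy", "city": "Milan"},
}


def load_document(doc_id):
    """Hypothetical stand-in for LoadDocument: fetch a related document by id."""
    return docs[doc_id]


def map_person(person):
    """Build the index entry for one person, pulling in the related addresses."""
    addresses = [load_document(address_id) for address_id in person["addresses"]]
    cities = [address["city"] for address in addresses]
    return {"Content": [person["firstName"], person["lastName"], *cities]}


def build_index():
    """Run the map over every person document, as happens at indexing time."""
    return {
        doc_id: map_person(doc)
        for doc_id, doc in docs.items()
        if doc_id.startswith("people/")
    }


def search(index, term):
    """Naive search: return ids of entries whose content contains the term."""
    term = term.lower()
    return sorted(
        doc_id
        for doc_id, entry in index.items()
        if any(term in str(value).lower() for value in entry["Content"])
    )
```

With this in place, search(build_index(), "milan") finds people/123 with one single query, even though the city lives in a different document.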
Handle with care
If we decide to use this feature, we need to take into account the side effects that the decision introduces. Under the hood RavenDB keeps track of the fact that we have defined a relation, in the index, between a bunch of documents. It does so in order to mark the index as stale whenever an address document changes; this is the expected behavior, because otherwise the index would keep returning outdated results.
Think about it for a minute: RavenDB keeps track of the relation in order to re-evaluate the index at each related document change.
Once again: …at each related document change…
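To get a feeling for what that means, here is a rough toy model of the bookkeeping. It is only a sketch of the idea and says nothing about how RavenDB actually implements it; the TrackingIndex class, its fields and the simulate_shared_address helper are all assumptions made up for this example.

```python
from collections import defaultdict


class TrackingIndex:
    """Toy index that remembers which documents each entry loaded."""

    def __init__(self, docs):
        self.docs = docs
        self.entries = {}
        # related document id -> ids of the people whose entry loaded it
        self.referenced_by = defaultdict(set)
        self.reindex_count = 0

    def index_person(self, person_id):
        person = self.docs[person_id]
        cities = []
        for address_id in person["addresses"]:
            # Record the relation so a later address change can find this person.
            self.referenced_by[address_id].add(person_id)
            cities.append(self.docs[address_id]["city"])
        self.entries[person_id] = [person["firstName"], person["lastName"], *cities]
        self.reindex_count += 1

    def on_document_changed(self, doc_id):
        # A changed person is re-indexed once...
        if doc_id in self.entries:
            self.index_person(doc_id)
        # ...but a changed related document re-indexes every person that loaded it.
        for person_id in sorted(self.referenced_by.get(doc_id, ())):
            self.index_person(person_id)


def simulate_shared_address(people_count):
    """Index people_count people sharing one address, then change that address."""
    docs = {"addresses/1": {"country": "Italy", "city": "Milan"}}
    for n in range(people_count):
        docs[f"people/{n}"] = {
            "firstName": "Person",
            "lastName": str(n),
            "addresses": ["addresses/1"],
        }
    index = TrackingIndex(docs)
    for n in range(people_count):
        index.index_person(f"people/{n}")
    index.reindex_count = 0  # only count the work caused by the change below
    docs["addresses/1"]["city"] = "Rome"
    index.on_document_changed("addresses/1")
    return index


if __name__ == "__main__":
    print(simulate_shared_address(1000).reindex_count)
```

In this toy model one single write to one address document turns into 1000 index entry re-evaluations, one for each person that loaded it.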
Why am I telling all this?
Simply because we have clearly outlined something that can be a real problem if not handled with care. Imagine a scenario where a huge team of experienced developers is migrating an application from a relational model to a document-based model and approaches the migration in a relational manner, as in the second sample, introducing indexes with a lot of “load documents” (where “a lot” means more than 10 per index) everywhere.
Trust me, the result is a nightmare :-) It surely works, but on the developer machine only…
.m