Before moving on to some client-related work, I did some more work on TRAP this
weekend. It's actually going about as fast as I expected so far. As the design
fleshes itself out, I can start to see some of the challenges involved in
making this kind of tool. As is typically the case when getting involved in a
project, some things turn out to be far easier than anticipated and some things
are more difficult, or just annoying. For example, supporting multiple
databases with the same engine will be easy, all encapsulated and behind the
I see performance as the biggest challenge on this project. It's plain to me now
that there's no reason why I can't support all the features I want to support
and have a robust tool. There are two performance issues that I need to address, not necessarily right now, but they need to stay in the
back of my mind so that the design doesn't stand in the way of these changes later. First, the engine is heavily reflection-based right now. In a data-intensive application, the performance hits
incurred from inspecting classes at runtime could start to add up. The idea I have in mind to address this down the line is to take a cue from the core .NET Framework and use Reflection Emit to build some helper assemblies on the fly that
implement the same functionality in non-reflective code. The second performance challenge is, of course, generating efficient SQL.
When Type Manager has a one-to-many relationship with Type Employee, the tool has to decide how to load that. By turning on the
maximum debug levels for OR tools I've used in the past, I often found that when loading relationships the engine would:
- Issue one statement to load the Manager data meeting the criteria.
- Then issue one statement per instance of Manager to read its list of Employees.
Obviously this is much simpler than writing code that can do the table join to get all the information in one call. Writing code that can handle the Manager-to-Employee situation is fairly easy, but what about when Employee has related types, and those types have related types, and so forth?
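To make the difference between the two approaches concrete, here is a minimal sketch. It uses Python and an in-memory SQLite database purely as stand-ins (TRAP itself is a .NET tool), and the table layout, the `CountingDb` wrapper, and both loader functions are assumptions made up for illustration, not anything from TRAP.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory schema (hypothetical, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE manager (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                               manager_id INTEGER REFERENCES manager(id));
        INSERT INTO manager VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO employee VALUES
            (1, 'e1', 1), (2, 'e2', 1), (3, 'e3', 2), (4, 'e4', 3);
    """)
    return conn


class CountingDb:
    """Thin wrapper that counts how many SELECT statements are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.statements = 0

    def query(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params).fetchall()


def load_one_per_parent(db):
    """One statement for the Managers, then one more per Manager."""
    managers = {}
    for manager_id, name in db.query("SELECT id, name FROM manager ORDER BY id"):
        rows = db.query(
            "SELECT name FROM employee WHERE manager_id = ? ORDER BY id",
            (manager_id,),
        )
        managers[name] = [row[0] for row in rows]
    return managers


def load_with_join(db):
    """A single joined statement; rows are regrouped by Manager in memory."""
    managers = {}
    rows = db.query("""
        SELECT m.name, e.name
        FROM manager m LEFT JOIN employee e ON e.manager_id = m.id
        ORDER BY m.id, e.id
    """)
    for manager_name, employee_name in rows:
        staff = managers.setdefault(manager_name, [])
        if employee_name is not None:  # a Manager with no Employees
            staff.append(employee_name)
    return managers


if __name__ == "__main__":
    per_parent, joined = CountingDb(make_db()), CountingDb(make_db())
    assert load_one_per_parent(per_parent) == load_with_join(joined)
    print(per_parent.statements, "statements vs", joined.statements)
```

With three Managers, the first loader issues four statements and the second issues one. The join version is also where the extra work shows up: the flat result set has to be regrouped into parents and children, and that regrouping gets harder with every additional level of relationships.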
Writing an engine that can generate all of this may be complex, or not possible. The "one statement per parent" method and its performance hit were infamously known in the EJB CMP community as the "1+" problem (more commonly written "n+1"), meaning essentially that it is an O(n) + 1
operation: one statement for the parents plus one per parent. I've been thinking quite a bit about the three relationship load options I plan to offer:
- Lazy Load - Related types are not loaded until the property is referenced. A proxy object is created that knows how to load its data once it is dereferenced. For 1:1 relationships, this would likely require some Reflection Emit code to extend a Type on the fly with a proxy. For 1:n and m:n relationships, a proxy class that extends a built-in type is easy enough to create.
- Eager Load - The engine will populate the entire type tree in one call. In most cases the number of SQL statements could be drastically reduced by using a single subselect for each level of relationships.
- Semi-Eager-Threaded Load (need a better name for this one) - Execute the main call, with the "1+" logic being handled by an asynchronous delegate, giving the user the illusion of faster performance. To keep using the same example, the main thread returns as soon as Managers are loaded, but a thread pool thread immediately begins the work of loading Employees, so the collection of Employees is likely already populated before it is referenced. This would still beat up the database pretty good.
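The lazy and semi-eager-threaded options for a 1:n collection can be sketched roughly as follows. This is an illustration in Python rather than .NET, and `LazyList`, `PrefetchedList`, and the `loader` callable are hypothetical names invented for the sketch. A thread pool future stands in for the asynchronous delegate.

```python
from collections.abc import Sequence
from concurrent.futures import ThreadPoolExecutor


class LazyList(Sequence):
    """Collection proxy: calls `loader` the first time its contents are touched.

    Not thread-safe; a real implementation would guard the first load.
    """

    def __init__(self, loader):
        self._loader = loader
        self._items = None  # None means "not loaded yet"

    def _load(self):
        if self._items is None:
            self._items = list(self._loader())
        return self._items

    def __getitem__(self, index):
        return self._load()[index]

    def __len__(self):
        return len(self._load())


class PrefetchedList(LazyList):
    """Same interface, but the loader starts right away on a pool thread."""

    def __init__(self, loader, pool):
        future = pool.submit(loader)
        # The first access blocks only if the background load is still running.
        super().__init__(future.result)


if __name__ == "__main__":
    calls = []

    def load_employees():
        # Stand-in for "SELECT ... FROM employee WHERE manager_id = ?"
        calls.append("query")
        return ["e1", "e2"]

    lazy = LazyList(load_employees)
    print(len(calls))   # nothing has been loaded yet
    print(list(lazy))   # the first access triggers the load
    with ThreadPoolExecutor(max_workers=2) as pool:
        prefetched = PrefetchedList(load_employees, pool)
        print(list(prefetched))
```

Both variants still run one query per parent collection. The threaded version only moves that work off the caller's thread, which is why it does nothing to lighten the load on the database.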
I have quite a bit of the design and 2,000 lines of code done for TRAP right now. The next step is to finish the Types assembly design, specifically to determine exactly how I want the mapping schema to look. The Core assembly and Types assembly together make up the interface
that projects using TRAP will work with. At this point I can share a little bit of what that looks like right now:
So, essentially to find some instances of a type you would get a UnitOfWork, which represents both a connection to a data store and a transaction. You would pass a Criteria object and a System.Type to find instances of objects, and then possibly update
these objects via the same unit of work.
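As a rough illustration of that usage pattern only, here is a Python sketch over SQLite. It is not TRAP's actual API, which is .NET: the method names `find` and `update`, the single-column `Criteria`, the `TABLES` lookup standing in for a mapping schema, and the `Employee` type are all hypothetical.

```python
import dataclasses
import sqlite3


@dataclasses.dataclass
class Employee:
    id: int
    name: str


@dataclasses.dataclass
class Criteria:
    """Hypothetical criteria: a single 'column = value' filter."""
    column: str
    value: object


# Hypothetical type-to-table lookup; stands in for a real mapping schema.
TABLES = {Employee: "employee"}


class UnitOfWork:
    """Wraps one connection and one transaction (commit or roll back on exit)."""

    def __init__(self, conn):
        self.conn = conn

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        return False  # never swallow the exception

    def find(self, criteria, cls):
        """Return the instances of `cls` that match `criteria`."""
        fields = [f.name for f in dataclasses.fields(cls)]
        if criteria.column not in fields:
            raise ValueError(f"unknown column: {criteria.column}")
        sql = (f"SELECT {', '.join(fields)} FROM {TABLES[cls]} "
               f"WHERE {criteria.column} = ?")
        return [cls(*row) for row in self.conn.execute(sql, (criteria.value,))]

    def update(self, obj):
        """Write every non-key field of `obj` back to its row."""
        names = [f.name for f in dataclasses.fields(obj) if f.name != "id"]
        assignments = ", ".join(f"{name} = ?" for name in names)
        values = [getattr(obj, name) for name in names]
        self.conn.execute(
            f"UPDATE {TABLES[type(obj)]} SET {assignments} WHERE id = ?",
            values + [obj.id],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO employee VALUES (1, 'e1'), (2, 'e2');
    """)
    with UnitOfWork(conn) as uow:
        employee = uow.find(Criteria("name", "e1"), Employee)[0]
        employee.name = "renamed"
        uow.update(employee)
    print(conn.execute("SELECT name FROM employee WHERE id = 1").fetchone())
```

Tying the find and the update to the same object keeps both inside one transaction, so a failure partway through rolls back everything done in that unit of work.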
I'm also showing a preview of the main mapper form from the UI. Hopefully after
looking at this you will think "Oh yeah, a Designer like that would make me
very likely to use this tool to get an application up and running quickly." The essential idea is to use reflection to display your types and use the connection information from your Project to display the database schema; the Mapping Types are displayed in a property
grid when an item on either side is selected. Everything I show from the UI is working code, soup to nuts, so this is likely to be at an "Alpha" stage in another week or two.
I have two websites to finish. Next up, I will show the design for the Types assembly. After that come Provider and Engine; these two are by far the most complex and the most closely associated. Stay tuned.