Wow, this is something that's been haunting me for a week or two, and I just couldn't figure it out. I've been on a major performance improvement spree on Project Unblowuppable, which has always been a very database intensive application. After building up a decent sized database in production through the private beta, I pulled it down to play around with locally and find the real bottlenecks. One particular page (the one that does by far the most work) for one user was doing over 1200 db queries! Ouch. I've been trying to optimize that for a few weeks by using :include (in several cases nested includes) in my original queries to eager load as much as possible up front, and by rearranging the code to do things like pull db queries out of loops where possible. On most of the pages this was a big win, but not on that One Evil Page. I used the same techniques there, and spent literally a few weeks (in my spare time) going over the code again and again, and looking at what was going on in a debugger (thank you, RubyMine!!). Nothing worked, but watching exactly what was happening in the debugger and doing lots of research on Google was very interesting.
If you watch what gets retrieved in an ActiveRecord find() call in a debugger, you'll see (of course) that by default it doesn't retrieve any of the associated records (via belongs_to, has_many, etc.) automatically. If you use the :include option, you can tell Rails that you want to pull those associations at the same time, so you're not doing more queries later--quite possibly one at a time in a loop, which is terrible if db queries are a bottleneck for your app. An interesting thing if you peek at the query result is that the :included record(s) may not actually be there in the model if you drill down through it. In fact, what you may see is an new attribute for an association you loaded that's nil! I kept seeing this and thinking "WTF?" But looking at the SQL in the server log, I could see that indeed the associated records were being loaded--but where did Rails put the data?
In most cases, using records with eager loaded associated data worked well, but not always. Eventually I was satisfied that the data was somewhere, even though I couldn't drill down to it and view it directly in the debugger. It turns out ActiveRecord does some method_missing magic to pull that data in. Ok, no problem--there's lots of PFM in AR. :) But on the One Evil Page, even though I was pre-loading the data, it was still doing a bunch of queries to pull it from the DB again. What I finally decided to do was switch from using an ActiveRecord find() (find_by() really) to a standard Ruby Array find(). Looking at the types in the debugger, my collection was an Array, so this should work, right? Not necessarily. Rails does some magic with the find() method on these collections (which are actually AssociationProxies) and instead of just doing an in-memory search through the array, it was still doing a db lookup. So I did some looking through the pickaxe book and saw that in Enumerable collections, find() is actually just an alias for detect(). So I replaced 'find' with 'detect' and PRESTO: with all the eager loading set up, I dropped the number of db queries on the Evil Page from over 1200 to under 150! (And I'm still not done--I think I can still cut it a lot more.)
So the moral of the story? I'm not sure. I don't know if this is a bug or a feature in AR, or if it's just a side effect of my design, but wanted to put this out there in case anybody else is having a similar issue. Do some playing around and see if using eager loading and Ruby's Array detect() can cut down your db queries. (Note: the query_reviewer plug-in is an essential tool for checking your db usage. Not only will it give you the total query count, but it will show you duplicate queries and give you the stack trace of exactly where the problem is! It also shows when indices are missing or not being used, which may just be because mysql decided there wasn't enough data yet to bother.) A related note about Array find_all(): according to this thread, it is also hijacked by AR and the equivalent thing to do is use select() instead. But in my case whether I use find_all() or select() the effect is the same. Not sure why this seems to work differently than find()/detect(). Anyway, just be aware of this sort of thing and play around. Hopefully this isn't some bug that future versions of Rails will fix and break in my app. ;)
A similar thing that may help cut down db queries is using Array length() (or size()) instead of count() to get the number of records in a collection. This is great if you've already loaded the records, especially if the count is being done in a loop, which was my case. I was able to shave off a lot of SQL count queries this way.
An aside about using :include: I guess Rails (pre 2.0?) used to do a big Cartesian join when you did includes, which I can imagine was pretty horrible. The things that caused me to do some head scratching following my AR model objects in the debugger are probably a side effect of the fix for that, which is to do separate (and simple) db queries to load the data separately instead of joined. Looking at the number of SQL queries this looks very efficient, and like a very clever solution. I'm betting there was a lot of work to make this happen so transparently--my hat's off to the Rails team for this and all their other hard work!
Saturday, June 13, 2009
Subscribe to:
Post Comments (Atom)


0 comments:
Post a Comment