Monday, November 2, 2009

Doing deep copies of arrays of ActiveRecord objects in Ruby

So in Kitsch'nware I pull a big collection of ActiveRecord objects from the DB, then loop over it, modifying some of the AR objects on each pass. At the start of the next iteration, I need the collection restored to a clean state. One option is to run another DB query to pull the same data, but that seemed silly and inefficient (I would expect Rails' SQL query caching to take care of it, but I couldn't verify that it actually worked).

My first thought was to copy the original collection at the top of the loop, so I looked into the copying options in Ruby: dup and clone. dup is designed for shallow copies, and what I wanted was a total deep copy of the collection, which I thought clone would do. Alas, no: clone on an array is shallow too, so the copied array still points at the very same AR objects. I did some more looking and found in the ActiveRecord API documentation that when AR clones a record, it copies all the attributes but skips the id. In most cases, that makes sense: if you clone something to put back in the DB, you'd want to give it a new id. But in my case, I wanted an exact copy, id included.
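To make the shallow-copy problem concrete, here is a minimal sketch. It assumes a plain Struct called Item as a hypothetical stand-in for an AR model (so it runs without Rails); the array behavior is the same either way.

```ruby
# Item is a hypothetical stand-in for an ActiveRecord model,
# used only so this sketch runs without Rails or a database.
Item = Struct.new(:id, :name)

original = [Item.new(1, "whisk"), Item.new(2, "ladle")]

shallow_dup   = original.dup
shallow_clone = original.clone

# Each copy is a new Array object...
puts shallow_dup.equal?(original)          # false
# ...but its elements are the very same objects as in the original.
puts shallow_clone[0].equal?(original[0])  # true

# So modifying an element through the copy also changes the original.
shallow_dup[0].name = "spatula"
puts original[0].name                      # spatula
```

In other words, neither dup nor clone on the array gives you fresh elements to scribble on.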

One option was to override clone in my AR classes, but I found what seemed to me a cleaner, easier-to-follow method here and here. It's just a one-liner:

copy_of_your_array = Marshal.load(Marshal.dump(your_array))

and bam! you've got a deep copy of your array. This cut the number of SQL queries on the page (using a copy of the DB pulled from production, which is still pretty small) from almost 400 to about 70, and the time spent in MySQL went down about 80% (those stats are courtesy of the awesome query_reviewer plugin). A few more optimizations to avoid doing the same work over and over in the loop, and I should have a nice performance improvement. One (small) worry is the Ruby overhead this marshalling might incur, but from what I've seen so far it looks fine. Not sure if that will change as my DB gets much bigger.
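Here is a rough sketch of how the Marshal round-trip fits into a loop like the one described above. Product is a hypothetical Struct standing in for an AR model, and dumping once outside the loop is my own variation on the one-liner, on the assumption that it saves re-serializing on every pass.

```ruby
# Product is a hypothetical stand-in for an ActiveRecord model,
# used only so this sketch runs without Rails or a database.
Product = Struct.new(:id, :name)

pristine = [Product.new(1, "whisk"), Product.new(2, "ladle")]

# Serialize once; each iteration then only pays for Marshal.load.
snapshot = Marshal.dump(pristine)

results = 3.times.map do |pass|
  working = Marshal.load(snapshot)      # fresh deep copy for this pass
  working[0].name = "modified in pass #{pass}"
  working[0].name
end

puts results.last        # modified in pass 2
puts pristine[0].name    # whisk (the original is untouched)
puts Marshal.load(snapshot)[0].id  # 1 (ids survive the round-trip)
```

One thing to keep in mind: Marshal.dump raises a TypeError for objects it can't serialize, such as Procs and IO handles, so this only works when everything reachable from the array can be dumped.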