Filter in the Database, not the Application
Code smells are patterns in code that can indicate possible problem areas. One code smell is loading many records out of a database when you only want some of them and doing filtering in the application logic.
The other day, I sniffed out this smell in our suggested users feature. We assign each suggested user a weight that determines their probability of being suggested. A user with a weight that is double another user's has double the odds of being suggested.
To determine what users to suggest, we were loading all suggested users out of the database and using an algorithm that repeatedly picked users based on a weighted probability until it had the desired amount. To pick a set of 5 suggested users, we'd run all.map{|user| [user, user.weight]}.weighted_subset(5) with the following weighted_subset method:
module Enumerable
# given (item, weight) pairs, returns a subset
# with each item's probability of being included
# in the subset proportional to its weight
def weighted_subset quantity_to_include
return [] if quantity_to_include <= 0
subset = weighted_subset(quantity_to_include - 1)
available = reject {|item, weight| subset.include?(item)}
offset = Kernel.rand(available.sum(&:last))
available.inject(0) do |sum, (item, weight)|
sum += weight
next sum unless sum > offset
return subset + [item]
end
subset
end
end
I knew that it would be better to somehow do this in the database, but didn't know exactly how. I figured that we could order the users by a function of their weight in a way that produced the same probabilities of choosing different sets of suggested users as the above algorithm did. I didn't know what function would have that property, though, so I asked a question at mathoverflow. With this help, I was able to remove the weighted_subset method and the code to pick a set of 5 suggested users was now all(:order => '-1 * ln(rand()) / weight', :limit => 5).
A winner is me.