Direct Search
August 25th, 2010
For the past couple of years I’ve been primarily involved with engineering models used in search engines. At times I’ve run into situations where a model I’m using or developing has some parameters that need to be set. For example, a model might have a parameter that is a threshold on a number of times a keyword will be counted before we decide that additional occurrences are probably spam (and, yes, I’m talking about BM25 here). And, at times, either the cost function I would like to use to set the parameters is not differentiable (yeah, I’m thinking about DCG), or I’m perfectly happy to use a quick and dirty method. So I end up going with a direct search algorithm. Here’s what I’ve learned (and haven’t forgotten)…
- I don’t know of any direct search method that scales to more than a dozen-sih parameters.
- Apache Commons Math has two direct search algorithms implemented in its Optimization package that are great place to start. The package also provides a framework for defining the cost function. Check it out: http://commons.apache.org/math/userguide/optimization.html
- Implementations abound in which each parameter is iteratively changed, using a heuristic for direction and possibility momentum for the changes. Evaluation of the cost function usually happens after a single parameter is updated, rather than only after an epoch. Here is a good example lifted from a paper describing the winning solution to the Netflix Prize (http://www.netflixprize.com/assets/ProgressPrize2008_BigChaos.pdf)…
September 29th, 2010 at 6:44 am
I read this several times, still trying to understand fully…:)
Great bit of information on information retrieval … do you implement these things ? my method has been more pragmatic and empirical compared to a theoretical angle for IR.