I am not interested about the web search performed by some web spider or a web crawler here. Those automated sophisticated super fast programs may have many blows and whistles to boost their performance. I am interested here to formalize the adaptive search technique that we often use on a day to day basis without even knowing how strong it is. This relation will help you understand the formal approach behind the adaptive search and thus allow you to use it more rigorously.
What is an adaptive web search? As I mentioned in one of my previous articles it is a cute and smart technique to find what you want using a web search engine, like Google. Let me quote the search technique used to find some literatures for solving a problem described in my previous article.
Say I have a problem to solve that was assigned by some course teacher or my research adviser. I mark some keywords and Google for them. If I don’t find any relevant information I use combination of those keywords or use alternative keywords adapted from the search results. Once I start getting some keywords that produce relevant results in Google, I pass it to Google Scholar. Sometimes I go to some other subject specific search engines to search using those keywords.
I believe you can see the pattern used here. Although the procedure is described here for literature search, it is equally applicable to any other search. The procedure is, start with some initial guess, and depending on the outcome refine your guess after each query. As on each query you are adapting to the outcome, I tend to call it an adaptive web search.
How is it related to Genetic algorithm? Genetic algorithm has its root into our civilization and into the evolution process of whole world. We are all familiar with the phrase “survival of the fittest“. This means nature does not like unfit outcome, so nature refines itself with each evolution. Can you see the similarity with adaptive web search?
Genetic algorithm is a simulation technique that uses a formal approach to simulate above situation and finally come up with an approximate solution to a problem. Wikipedia has two nice articles on genetic algorithm and genetic programming. Quoting from wikipedia:
Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes or the genotype or the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), and modified (recombined and possibly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.
It should be clear by now that the adaptive web search we do is essentially an application of genetic algorithm. So you should feel confident about this searching technique. Here is a summary of the technique again:
- Start with some random or predefined initial guesses
- Search for those keywords
- Select “acceptable” results from your search results and mark down some keywords from it
- Do this until the results are approximately what you were looking for
- Stop if you are searching over and over for many times but you are not getting good results
As adaptive search is a genetic algorithm it also suffers from the shortcomings that genetic algorithm has. First of all adaptive search depends on you first initial guess, which may become so unrelated that your search results are really bad. Secondly, it depends on your judgment of accepting a result. If you expect results too soon and accept no result you may end up unhappy. On the other hand, if you except all results you may end up doing too many searches with all unrelated results.
You may get surprised with the efficiency of this adaptive search technique. If I remember correctly I never had to go for more than three iterations to get some good results. What I often do is, I go through this process two or three times and gather some acceptable results. I start working with the results and collect knowledge related to my problem and branch out my findings. Some of my questions may be answered by this set of results, but some may not be. So I go again searching solutions for the sub problem at hand using the same adaptive search technique I have discussed so far.
As long as you are aware of the pitfalls of adaptive search, this can be a powerful approach for web searching. So folks what approach do you follow for a web search?