Search is broken. Google, Yahoo, Ask, AltaVista, and on and on the list goes.
Hundreds of companies, thousands of individuals. I know why search is broken, and I know what needs to be fixed. Now to figure out the how of fixing.
When you’re looking for information, you search on keywords. Google’s been nice enough to rank results by ‘popularity’ (yeah, it’s called PageRank, and it’s proprietary, but it’s a popularity/relevance ranking). The problem is that you have to know what keywords were used. Some places are nice enough to suggest spelling fixes (it’s not ‘brittany spears’, it’s ‘britney spears’).
But that’s not the issue. The issue is that you don’t know what word, term, or phrase to look for. You have the concept you need to find, like ‘module’. Except you don’t think of that word, you think of ‘chunk’. Bam! You’re out of luck: no author would use the word ‘chunk’ when they mean ‘module’, right?
To fix search, we need to search on not just the keyword, but the concept. In English, you’d use a thesaurus.
So, you’re thinking: “This is easy! I’ll just build a comparator that looks at the keyword and then goes through an index of a thesaurus and finds stuff. And we’ll all be rich!”
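For the record, that "easy" comparator might look something like the sketch below. Everything in it is a made-up assumption for illustration: the tiny `THESAURUS`, the three-entry `DOCUMENTS` corpus, and the `expand_query` and `search` helpers. It is a toy, not how any real engine works.

```python
# Hypothetical toy thesaurus: each word maps to a set of related terms.
THESAURUS = {
    "chunk": {"module", "block", "piece"},
    "module": {"chunk", "component", "unit"},
}

# Hypothetical toy corpus: document id -> text.
DOCUMENTS = {
    1: "how to import a module in your program",
    2: "a chunk of cheese goes well with bread",
    3: "splitting code into a reusable component",
}


def expand_query(keyword):
    """Return the keyword plus every synonym the thesaurus lists for it."""
    return {keyword} | THESAURUS.get(keyword, set())


def search(keyword):
    """Return sorted ids of documents containing the keyword or any synonym."""
    terms = expand_query(keyword)
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if terms & set(text.split())  # any overlap between terms and words
    )


if __name__ == "__main__":
    print(search("chunk"))
    print(search("module"))
```

Searching on ‘chunk’ now finds the document about modules, which is the win. It also still drags in the one about cheese, which is a hint of the trouble to come.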
Hold it, buster. You missed something. This is a perfectly valid English sentence, and you can figure out what I’m saying, too: “Bring me the cooler cooler cooler from the cooler’s cooler.” Cooler is used five times, with the following meanings (at least): hip, less warm, box to keep things cool, jail cell, big refrigerator.
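To see why a plain thesaurus lookup falls over here, consider this rough sketch. The `SENSES` table and its synonym lists are invented for illustration (only the five meanings come from the sentence above), and `expand_all_senses` and `expand_one_sense` are hypothetical helper names.

```python
# Hypothetical sense inventory for "cooler": one entry per meaning listed above.
# The synonym lists are made up for illustration.
SENSES = {
    "cooler": {
        "hip": {"trendier", "hipper"},
        "less warm": {"colder", "chillier"},
        "box to keep things cool": {"icebox", "ice chest"},
        "jail cell": {"jail", "lockup"},
        "big refrigerator": {"refrigerator", "walk-in"},
    }
}


def expand_all_senses(word):
    """Merge the synonyms of every sense, as a naive thesaurus lookup would."""
    merged = set()
    for synonyms in SENSES.get(word, {}).values():
        merged |= synonyms
    return merged


def expand_one_sense(word, sense):
    """Return synonyms for a single sense; picking that sense is the hard part."""
    return set(SENSES.get(word, {}).get(sense, set()))


if __name__ == "__main__":
    print(sorted(expand_all_senses("cooler")))
    print(sorted(expand_one_sense("cooler", "jail cell")))
```

Expanding blindly mixes jail, iceboxes, and the weather into one query. Expanding a single sense gives clean terms, but only if something has already worked out which sense the searcher meant.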
That’s the problem with trying to fix search. Words can mean far too many things in English. But here’s your big chance to figure out a solution: I’ve told you the problem, and I’ve given you the target.
Now go make it work.
Funny you should bring this up: I just had this exact experience while trying to figure out a sticky programming problem. It turns out it’s actually really easy to implement, and there’s even pre-existing Python code for what I wanted. But I wasted three days racking my brain to figure it out because I didn’t know what to call it, and therefore all the reference books, sample code, and Google searches in the world didn’t help.
The interesting thing to me is that it’s not just the wrong search phrases that matter, or the multiple meanings of a word. There are also things that have very specific names, but those names aren’t necessarily synonyms for the keywords you’d use to describe them. A simple example that comes to mind: what if you were looking for a ‘whammy bar’ for a Fender Stratocaster electric guitar? What are the odds that any combination of keywords you’d think to search on would translate to whammy bar / tremolo bar and find you what you wanted?
For that matter, suppose you weren’t familiar with a Fender Stratocaster and you were trying to search for one. What would you call it? Without knowing the name ‘Stratocaster’, or a whole lot of very specific guitar terminology to describe its appearance, good luck finding it with a search engine. Yet if you called up a guitar shop and described it, or showed someone a picture of it, chances are pretty good you could find what you want.
That to me is where search has almost infinite room to grow. Until Google or AskJeeves or whoever can figure out what you *meant* to say and take input from anywhere near as many sources as a human being can, there will always be room for improvement.
Of course, it’s a whole different topic, but there’s also the sticky problem of weeding out the garbage results. As long as there is money to be made by drawing people to your illegitimate/shady site, there are going to be jerks out there trying to game the search results. On top of that, there are plenty of times when what you’re searching for brings up a whole bunch of useless results. Ever tried searching for a computer problem and gotten nothing but 50 pages of “me too!” forum posts? Again, almost infinite room for improvement on the filtering and ranking end. As far as we’ve come, it still seems like the very beginning of the road to me.
Whew, that was probably a bit much for a comment. I’m starting to wonder if I just wrote a comment longer than your blog post 😉
Yeah, methinketh your comment was longer than the post… but that’s ok.
I remember learning about these things called “knowledge systems” in school, and the idea was they were real-language tools where you’d describe what you were looking for in general terms (one at a time), and the system would tree down to what you really wanted. For example: animal, 4 legs, stripes, predator, asia, white, and you’d get to ‘bengal tiger’. Of course, it has to be able to tree from any start point, in any order, and still get there (something like a fully-connected graph, with rules about what *not* to go to next based on the input history; so, something like a dynamic splay tree).
I’ve never seen such a system that worked for more than very small, focused apps – and even then, a lot of times it has to give you choices, it can’t take random input.
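The narrowing idea can be sketched in a few lines, at least at toy scale. This is not a tree or graph as described above; it swaps in plain set filtering, which gets the "any order" property for free. The `KNOWLEDGE_BASE` contents and the `narrow` function are hypothetical, with the tiger's clues taken from the example in this comment.

```python
# Hypothetical toy knowledge base: each entity has a set of descriptive attributes.
KNOWLEDGE_BASE = {
    "bengal tiger": {"animal", "4 legs", "stripes", "predator", "asia", "white"},
    "zebra": {"animal", "4 legs", "stripes", "africa", "herbivore"},
    "polar bear": {"animal", "4 legs", "predator", "arctic", "white"},
    "snow leopard": {"animal", "4 legs", "predator", "asia", "white", "spots"},
}


def narrow(clues):
    """Apply clues one at a time, keeping only entities that match each clue."""
    candidates = set(KNOWLEDGE_BASE)
    for clue in clues:
        candidates = {
            name for name in candidates if clue in KNOWLEDGE_BASE[name]
        }
    return sorted(candidates)


if __name__ == "__main__":
    clues = ["animal", "4 legs", "stripes", "predator", "asia", "white"]
    print(narrow(clues))                  # clues in the order given above
    print(narrow(list(reversed(clues))))  # same clues, reversed order
    print(narrow(["white", "predator"]))  # too few clues: several candidates
```

Because each clue only removes candidates, the order of the clues doesn't change the final answer. It also shows the limits mentioned above: the clues have to match the stored attribute words exactly, and with too few clues you get a list of choices back instead of one answer.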
I agree. This has been a problem for a while. A question I have is: what drives search engines to fix perceived problems in functionality? I think you are correct, antipaucity, regarding the tree methodology. I myself have written ELIZA-esque code in several of my projects to handle just such a problem. The main issue with them was their limited scope. I can agree with Jay about searching for ways to fix relatively simple issues eating hours of my day when, if I had just hacked at it a bit, I would have completed my project in less time than the research took. As an aside, I wonder about the complexity of a tree approach, since the implementations I have been a part of were truly complex.
–JFo