The Misspelled Universe

18 Apr

How many times have you typed something into Google only to be asked “Did you mean: …”? Next time you reach this page, stay a little longer and take a look at the pages that Google did find. This is your gateway to the parallel universe of misspelled words. Well, let me correct myself: these “misspelled” words can belong to a different language altogether, or they might even be rarely used genuine English words with a close resemblance to the heavily used ones.

An entire gamut of information is being denied to us due to mere errors in spelling. Yet to deride these spelling mistakes as “mere errors in spelling” is to ignore a small minority of people who deliberately misspell words so as to make their pages less publicly accessible. This works as an effective low-tech solution, for every underground society demands obscurity.

Then there are people who exploit misspellings to make their living, e.g., people who search auction sites like eBay for misspelled (or mislabeled), and hence hopefully underbid, items. (* eBay now offers a spell-check utility but, surprisingly, some people still refuse to use it.)

Excepting eBay entrepreneurs, one thing that is clear is that we are “losing” this increasingly vast pool of information containing misspelled “keywords” (words we type into a search engine). There is an argument to be made that the quality of an information source with misspelled words may itself be poor, and hence we needn’t worry about the “lost” information. Arguably, the frequency of misspelled words in a peer-reviewed journal is much lower than in, say, my blog. ;) The normative question is: does that rightly consign my blog to obscurity?

Internet search is a classic case of finding a needle in a haystack, and search algorithms are built to dispense with as much “clutter” (hay) as fast as possible, leaving a very small minority of websites that are given genuine value. What we are seeing are two trends implicit in Google’s search algorithm: most of our search needs are about “popular” items (given a higher rank by Google), and it is progressively harder to find “unpopular” sources. On the face of it, the trend is innocuous and even sensible, but the wider ramifications include information hegemony.

Let us turn the discussion around to sites that use “syntactically correct but meaningless verbiage including common search terms” (a sentence like “Indeed, a blind crenelation blasphemously a player inside the stictomys. For example, a whopper behind a ferrocyanide indicates that the saccharinity behind a casino tropez another euphausiacea from another modem.”). People also “Google bomb” (mass posting on blogs/lists that associates a search phrase with an online address). Some sites have in fact automated this by writing programs that automatically go to different blogs/lists and post entries/comments like “poker chips poker – [web address].” This problem is much worse, as it is making it progressively harder for us to find “genuine” (or most popular/reliable) information.

So will there be too much seemingly reliable unreliable information or will we miss a lot of seemingly unreliable reliable information? Chances are that both will happen.