A hundred years ago, when I was a kid, I imagined my very own robot friend. This critter would do things for me: keep me on time for the next exciting adventure, remind me of relevant bits of minutiae about what makes the perfect throwing rock, and remember every page of every book I ever read. It could even make cross-links between topics, so that everything was always exciting and new and mattered to me.
Thirty years later and there’s still nothing quite like the buddy of my dreams. Of course, now that I’m thinking about it, I’d really like it to read all of my emails, keep track of my surfing, grab snippets of stuff that end up on my computer, and put it all into an infinitely variable, intelligent soup.
We’re not there yet. Hell, we peek and poke through shitty email interfaces and use up post-it notes in vain attempts to keep track of the infinite torrent of information that visits us on a daily basis. And we still get lost in the middle of a million things, just wanting to go back to being a kid again, when the most important thing in the world was getting home for dinner before dark.
Stopping Things
When I look at a chunk of text, I try to imagine what a computer needs to do with it. The first step is to whittle down the amount of information the system needs to pay attention to. Enter stopwords.
Stopwords are exactly what you think they are: words which the system can toss away and cheerfully ignore. Words like “a”, “and”, “the”, and so on help a sentence hang together for a human, but tell a computer next to nothing about what the text is about. Let’s kick around an example to illustrate what dropping stopwords does for a computer.
We read:
“The quick brown fox jumps over the lazy dog”
A computer reads:
“quick brown fox jumps over lazy dog”
While the “syntactic sugar” of the English language adds a lot of extra fluff to make communication more palatable for humans, quite a bit of it can be discarded while the gist of the message stays perfectly intact.
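Here is a minimal sketch of that filtering step in Python. The stopword list and the function name `remove_stopwords` are made up for illustration; a real system would use a much longer list and would have to decide for itself whether words like “over” belong on it.

```python
# Hypothetical, deliberately tiny stopword list (an assumption for this
# example); real lists run to hundreds of words.
STOPWORDS = {"a", "an", "and", "the"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Return text with stopwords dropped, comparing case-insensitively."""
    # Split on whitespace and keep only the words not in the stopword set.
    kept = [word for word in text.split() if word.lower() not in stopwords]
    return " ".join(kept)


sentence = "The quick brown fox jumps over the lazy dog"
print(remove_stopwords(sentence))
# prints: quick brown fox jumps over lazy dog
```

This only splits on whitespace, so a word with punctuation stuck to it (like “dog.”) would not match anything in the list. Stripping punctuation first would be the obvious next step.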
Well, that’s the first part of things. Stay tuned for more as it comes.
tom