Besides being the year of war, terrorism, corporate fraud, and blackouts, 2003 was also the year of spam. As more users found their legitimate e-mail vastly outnumbered by spam, spammers and antispam vendors played a constant Tom-and-Jerry game, frantically coming up with ever more sophisticated techniques to outfox each other.
As recently as a year ago, many antispam solutions relied on keyword recognition to separate spam from legitimate e-mail. Spammers outwitted such strategies by interspersing commas, spaces, exclamation points, and deliberate misspellings (such as V!agra) in headers and message content to get through. We've all seen such tricks, but you may not be aware of less obvious ploys that rely on HTML features to foil spam filters. For example, a spammer may intersperse white-on-white text or zero-font-size characters between visible text. You won't see such characters unless you select them with your mouse, but filters take them into account. Other tricks include using the &nbsp; HTML entity to place a space between letters, adding phony HTML style tags, or indicating each letter with an HTML entity. When a keyword filter sees HTML entities and style tags, it simply reads them as text. So if a spammer uses HTML entities for letters and spaces, the filter reads
&#86;&nbsp;&#105;&nbsp;&#97;&nbsp;&#103;&nbsp;&#114;&nbsp;&#97;
What a user sees is Viagra.
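To make the gap between what the filter reads and what the user sees concrete, here is a minimal Python sketch. It is not any vendor's actual filter: the KEYWORDS list and the function names naive_filter and normalized_filter are hypothetical, chosen only for illustration. The first function matches keywords against the raw message source, as the filters described above do. The second strips tags and decodes entities first, which is one plausible countermeasure.

```python
import html
import re

# Hypothetical blocklist, for illustration only.
KEYWORDS = {"viagra"}

# Rough pattern for anything that looks like an HTML tag, real or phony.
TAG_RE = re.compile(r"<[^>]+>")


def naive_filter(raw: str) -> bool:
    """Match keywords against the raw source, entities and tags included."""
    lowered = raw.lower()
    return any(word in lowered for word in KEYWORDS)


def normalized_filter(raw: str) -> bool:
    """Strip tags, decode entities, drop whitespace, then match keywords."""
    text = TAG_RE.sub("", raw)        # remove tags before decoding entities
    text = html.unescape(text)        # "&#86;" -> "V", "&nbsp;" -> U+00A0
    text = re.sub(r"\s+", "", text)   # \s also matches the non-breaking space
    lowered = text.lower()
    return any(word in lowered for word in KEYWORDS)


if __name__ == "__main__":
    entity_spam = "&#86;&nbsp;&#105;&nbsp;&#97;&nbsp;&#103;&nbsp;&#114;&nbsp;&#97;"
    tag_spam = "V<b></b>iagra"
    for sample in (entity_spam, tag_spam):
        print(naive_filter(sample), normalized_filter(sample))
```

The sketch is deliberately crude. Removing all whitespace can join unrelated neighboring words into a false match, and it does nothing about white-on-white or zero-font-size text, which would require inspecting style information. It also does not catch deliberate misspellings such as V!agra.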