Technology Review - Published By MIT

A Better Way to Shoot Down Spam

Junk mail can now be identified based on a single packet of data.

By Rachel Kremen

Wednesday, July 29, 2009


New software developed at the Georgia Institute of Technology can identify spam before it hits the mail server. The system, known as SNARE (Spatio-temporal Network-level Automatic Reputation Engine), scores each incoming e-mail based on a variety of new criteria that can be gleaned from a single packet of data. The researchers involved say the automated system puts less of a strain on the network and minimizes the need for human intervention while achieving the same accuracy as traditional spam filters.


Separating spam from legitimate e-mail, also known as ham, isn't easy. That's partly because of the sheer volume of messages that need to be processed and partly because of e-mail expectations: users want their e-mail to arrive minutes, if not seconds, after it was sent. Analyzing the content of every e-mail might be a reliable method for identifying spam, but it takes too long, says Nick Feamster, an assistant professor at Georgia Tech who oversaw the SNARE research. Letting spam flow into our in-boxes unfiltered isn't a sensible option, either. According to a report released by the e-mail security firm MessageLabs, spam accounted for 90.4 percent of all e-mail sent in June.

"If you're not concerned about spam, I would suggest you turn off your spam filter for about an hour and see what happens," says Sven Krasser, senior director of data-mining research at McAfee. The Santa Clara, CA, company provided raw data for analysis by the Georgia Tech team.

The team analyzed 25 million e-mails collected by TrustedSource.org, an online service developed by McAfee to collate data on trends in spam and malware. Using this data, the Georgia Tech researchers discovered several characteristics that could be gleaned from a single packet of data and used to efficiently identify junk mail. For example, their research revealed that ham tends to come from computers that have a lot of channels, or ports, open for communication. Bots, automated systems that are often used to send out reams of spam, tend to keep open only the e-mail port, known as the Simple Mail Transfer Protocol port.


Furthermore, the researchers found that by plotting the geodesic distance between the Internet Protocol (IP) addresses of the sender and receiver--measured on the curved surface of the earth--they could determine whether the message was junk. (Much like every house has a street address, every computer on the Internet has an IP address, and that address can be mapped to a geographic area.) Spam, the researchers found, tends to travel farther than ham. Spammers also tend to have IP addresses that are numerically close to those of other spammers.
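
As a rough illustration only, and not the SNARE implementation, the two signals described above could be computed along these lines in Python. The function names are hypothetical, the latitude and longitude values are assumed to come from some IP geolocation lookup that is not shown, and the haversine formula treats the earth as a perfect sphere, so it only approximates a true geodesic distance.

```python
import ipaddress
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def geodesic_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between two points.

    Inputs are latitude/longitude in degrees. How an IP address is mapped
    to coordinates is an assumption left outside this sketch.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ip_numeric_distance(ip_a, ip_b):
    """Absolute difference between two IPv4 addresses read as 32-bit integers."""
    return abs(int(ipaddress.IPv4Address(ip_a)) - int(ipaddress.IPv4Address(ip_b)))


if __name__ == "__main__":
    # Hypothetical sender/receiver coordinates (roughly Atlanta and Moscow).
    print(round(geodesic_km(33.75, -84.39, 55.75, 37.62)), "km")
    # Documentation-range addresses (RFC 5737), used purely as placeholders.
    print(ip_numeric_distance("192.0.2.1", "192.0.2.10"))
```

A classifier would presumably treat both numbers as features alongside others rather than as a verdict on their own; the thresholds and weighting are not described in the article and are not guessed at here.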

Dean Malmgren, a PhD candidate at Northwestern University whose work includes developing new methods for identifying spam, says he finds the research interesting. But he wonders how robust SNARE will be once its methodology is widely known. IP addresses, he notes, are easy to fake. So if spammers got wind of how SNARE works, they might, for example, use a fake IP address close to the recipient's.

Comments

  • The odds are in your favor!
    Well, if 90+% of all email is SPAM, just delete all emails and you'll be right 90+% of the time - a helluva lot better than other SPAM filters!
    C'mon, based on my recent experiences, SPAMers are a lot smarter than the people who run the Internet. And those who profit from building ever more Internet bandwidth have no interest in stopping SPAM. If it was stopped, they'd lose a lot of business selling product and services to build more capacity for a couple of years.
    We can stop SPAM in 10 minutes - just charge a penny per email. Make it economically unfeasible - like the US Post Office has with most types of junk mail - and you stop it immediately. And implementing it is simple - charge everyone a few extra bucks a month for the first couple of hundred emails they send so only the big users get billed.
    But it will never happen - as Internet profiteers will never go for it - as I said - they make too much money off the Spammers.
    Maybe we need to compare them to the Big Pharma or the Health Insurance Industry - everybody knows what profiteers they are!!!

    fiberman
    07/29/2009
    Posts:59
    Avg Rating:
    3/5
    • Re: The odds are in your favor!
      We can stop SPAM in 10 minutes - just charge a penny per email. Make it economically unfeasible - like the US Post Office has with most types of junk mail - and you stop it immediately. And implementing it is simple - charge everyone a few extra bucks a month for the first couple of hundred emails they send so only the big users get billed.

      But it will never happen - as Internet profiteers will never go for it - as I said - they make too much money off the Spammers.


      It has nothing to do with profit, and everything to do with implementation.

      Using the post office as an example is disingenuous. The post office is a monopoly. You cannot just set up your own post office, issue your own stamps, and expect letters you sent to actually get anywhere. The post office is able to charge people money for sending letters because you simply have no other way to send said letters. Yes, you could always use UPS or Brown or DHL, but these are massive companies that have required billions of dollars to set up. And you still have to pay to use their services. They can charge you because they control EVERYTHING from receipt of your letter from you, to its delivery at its destination. At no stage does it ever leave their control.

      E-mail servers are a whole different ball of wax. When I set up an e-mail server, I am (quite literally) setting up my own post office. Using that server, I can send e-mail to anyone else in the world without having to rely on anyone else to actually "carry" that e-mail as an e-mail. In other words, the ISPs which own the fibre optic lines that make up the Internet wouldn't be able to tell my e-mail from a web page without directly inspecting the data packet. To do that with every data packet passing through their system would be cost-prohibitive, mainly because any data over a certain size (usually a fraction of what an e-mail or web page usually is) is broken up into many packets which can take any possible path between source and destination. To track and confirm an e-mail would require re-assembling the entire e-mail from its individual packets -- and packets may have bypassed their network entirely (by using a competitor's), rendering any re-assembly impossible. The Internet is based on a LACK of control... and that is its strength. If a network from, say, AT&T goes down, there is always the network from Sprint which can allow data to be re-routed around the blackout area. That way, no one ISP actually "controls" what goes over its network... it can throttle the data, but it cannot control it in any way which is reliable for everyone on the Internet. This lack of control - this flexibility, actually - is the very keystone of what makes the Internet reliable and resistant to interruption.

      Plus, the barrier to entry for e-mail is so low. Setting up your own post office and postal network would require millions - if not billions - of dollars. Setting up a single e-mail server costs me nothing in terms of software (Linux) and may even cost me nothing in terms of hardware (cast-offs from people upgrading).

      Then we get to the issue of who would collect your "penny per e-mail". Since I have just created my own postal service (e-mail server), and since the e-mail server connects directly to the recipient server to deliver e-mail, who would step in to demand payment? If the recipient is another "independent" like me, why should we even acknowledge the transaction? Who would be the central arbiter for collecting these funds, where would they go, and what rights would this central arbiter have to force independents like me to pay? Essentially, forcing senders of e-mail to pay would fracture and break up the Internet into the "haves" and the "have-nots".

      And finally, there is the issue of inertia. E-mail is now so pervasive, and so broadly spread around the world, that any attempted implementation would be stillborn. Entire countries would refuse to play along (especially those poor countries struggling to improve their lot with the Internet - even a penny per e-mail would be too costly for citizens who might earn only a few hundred dollars per year). Businesses would balk. And individual consumers - especially those who are tech savvy - would be the biggest protesters. They would find ways around the pay system, rendering it impotent. Without 100% compliance from day one (and we are 34 years too late for that), setting up a "pay system" for e-mail would be impossible.

      smithsomian
      07/29/2009
      Posts:38
      Avg Rating:
      3/5
  • spammers send free
    If you set up a system to charge for emails sent, it will have to be administered by our ISPs.  The legitimate users will have to pay the charges to the ISPs.  However, spammers use bots, open SMTP servers and/or ISPs in Russia that don't care about this issue.  They will still send spam for free.

    sorgfelt
    07/29/2009
    Posts:7
    Avg Rating:
    4/5
MIT Massachusetts Institute of Technology © 2009 Technology Review. All Rights Reserved.