About two years ago we re-implemented greylisting, sender verification and RBL blacklisting on our MTAs. As a result, we saw a very steep drop in unsolicited email, but also some very interesting mail bounces from applications that didn't set a proper envelope-from address. It taught us that we needed two things: a whitelist service and a blocked-email reporting service.
The blocked-email reporting service was easy: parse the Postfix log files once a day and check all the NOQUEUE entries. Then match the temporarily failed deliveries against the successful deliveries; whatever is left over is a failed delivery.
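The matching step could look roughly like the sketch below. It is not the actual reporting script: the regular expressions assume the usual syslog shape of Postfix reject, qmgr and delivery lines with short hexadecimal queue IDs, and the function name failed_deliveries is made up for illustration.

```python
import re

# Assumed log line shapes; adjust these patterns to the local syslog format.
REJECT_RE = re.compile(
    r"NOQUEUE: reject: .*?: (?P<code>[45]\d\d) .*?"
    r"from=<(?P<sender>[^>]*)> to=<(?P<rcpt>[^>]*)>"
)
QUEUED_FROM_RE = re.compile(r": (?P<qid>[0-9A-F]+): from=<(?P<sender>[^>]*)>")
SENT_RE = re.compile(r": (?P<qid>[0-9A-F]+): to=<(?P<rcpt>[^>]*)>.*status=sent")


def failed_deliveries(lines):
    """Return (sender, recipient) pairs that were rejected and never delivered."""
    rejected = set()
    sender_by_qid = {}
    delivered = set()
    for line in lines:
        m = REJECT_RE.search(line)
        if m:
            rejected.add((m["sender"].lower(), m["rcpt"].lower()))
            continue
        m = QUEUED_FROM_RE.search(line)
        if m:
            # Delivery lines carry only the recipient, so remember the sender per queue ID.
            sender_by_qid[m["qid"]] = m["sender"].lower()
            continue
        m = SENT_RE.search(line)
        if m and m["qid"] in sender_by_qid:
            delivered.add((sender_by_qid[m["qid"]], m["rcpt"].lower()))
    # Whatever was rejected and never showed up as sent is a failed delivery.
    return rejected - delivered
```

Feeding it one day's worth of log lines yields the sender/recipient pairs worth reporting. The sketch ignores ordering, so a delivery logged before the reject also counts as a match.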
The whitelist service is not too difficult, thanks to the check_policy_service feature which gives the black/whitelist daemon the sending MTA, and the envelope-from and envelope-to email addresses. A lookup against the black/whitelist database and all goes fine.
Strictly speaking it is not one lookup but a set of seven lookups.
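To make the policy exchange concrete: Postfix sends the policy daemon a block of name=value lines terminated by an empty line, and expects an action= line followed by an empty line in return. The sketch below only shows that exchange. The function names and the single dictionary standing in for the black/whitelist database are assumptions for illustration, and the seven real lookups are not reproduced here.

```python
def parse_policy_request(text):
    """Parse one policy request: name=value lines terminated by an empty line."""
    attrs = {}
    for line in text.splitlines():
        if not line:
            break  # the empty line marks the end of the request
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def policy_response(attrs, listed):
    """Build the reply for one request.

    `listed` is a hypothetical stand-in for the database: it maps a client
    address or a sender address to "OK" (whitelisted) or "REJECT" (blacklisted).
    """
    keys = (attrs.get("client_address", ""), attrs.get("sender", "").lower())
    for key in keys:
        if key in listed:
            return f"action={listed[key]}\n\n"
    # No opinion: let the remaining Postfix restrictions decide.
    return "action=DUNNO\n\n"
```

Answering DUNNO for anything that is not listed fits the near-zero hit rate: almost every request simply falls through to the other checks.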
Recently we were getting strange errors in the blocked-email reporting service, talking about Internal server configuration errors which were happening on our primary MTA. Not that the email got blocked by it, it just was delivered to our backup MTA. But still it was something which didn't make sense. We found out very soon that it was caused by a slow black/whitelist daemon. Not really a slow black/whitelist daemon, more a slow database. And not really a slow database, just a busy database.
A busy database? All it is doing is euhmmm... let's see... 70 queries per second?!?!?! A quick look at the Cacti graphs showed me that over the last three weeks mail servers went from an average of two messages per second to an average of six messages per second (message delivery attempts that is), and since every message has seven database queries it kind of shot through the roof and the black/whitelisting daemon we had couldn't handle it fast enough.
Time for a redesign. We have about 1500 entries in the black/whitelist database, and a hit-rate of about zero (most of the rejecting of email is done via sender verification and via DNS-RBLs). With such a low hit-rate, it shouldn't be a bad thing to locally cache a copy of it and refresh it every 15 minutes. At six delivery attempts per second and seven lookups each, that saves roughly 37,800 database queries in every 15-minute period.
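The caching idea can be sketched as follows. This is a minimal illustration rather than the rewritten daemon itself: the class name CachedList and the load_all callable (which would fetch all entries from the database in one go) are hypothetical.

```python
import time


class CachedList:
    """Keep a local copy of the black/whitelist and reload it when it gets stale."""

    def __init__(self, load_all, max_age=15 * 60, clock=time.monotonic):
        self._load_all = load_all    # hypothetical callable returning {key: action}
        self._max_age = max_age      # refresh interval in seconds
        self._clock = clock          # injectable so the refresh can be tested
        self._entries = {}
        self._loaded_at = None

    def lookup(self, key):
        """Return the cached action for `key`, or None if it is not listed."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            # One bulk load replaces all the per-message database queries.
            self._entries = dict(self._load_all())
            self._loaded_at = now
        return self._entries.get(key)
```

With only about 1500 entries the whole list fits comfortably in memory, and a change in the database takes at most 15 minutes to show up. A production daemon would probably also want to keep serving the old copy when a reload fails.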
Three hours later, some rewriting of the black/whitelist daemon and the database server is happy again with the two queries per 15 minutes now. And the mail? It kept flowing...