As I mentioned in this post I’m running Exim with SpamAssassin and ClamAV. Because we’re running on a virtual machine, the load could get quite high when other VMs on the same physical host were using too many resources. Loads could get as high as 20 on a 15-minute average. Being the great guys they are, Bytemark switched our whole virtual machine to another physical host upon request. And it all took just 20 minutes! Slick.
Our load problems are pretty much over now because of the physical host move. But while we were suffering high loads I discovered that Exim needs some special settings to handle mail reliably in that situation. This is especially important when using SpamAssassin, because under high load it would take an excessive amount of time to scan an email, and Exim would time out the SpamAssassin process.
To avoid timeouts I set Exim to hold off on processing the queue and just keep incoming mail queued until the load dropped back to a reasonable level:
deliver_queue_load_max = 1.0
This could probably be a bit higher, but we set it to 1.0 to be extra cautious.
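For reference, Exim has two related load thresholds in the main section of its configuration, and it may be worth looking at both. As far as I can tell from the Exim documentation, deliver_queue_load_max abandons a queue run once the load rises above the value, while queue_only_load makes newly arriving messages go straight to the queue without an immediate delivery attempt. A minimal sketch follows; the 1.0 for queue_only_load is only an assumption that mirrors the value above, not something we tested:

```
# Main configuration section of the Exim config file.

# Above this load, incoming messages are only queued and no
# immediate delivery is started (value is an assumption).
queue_only_load = 1.0

# Above this load, a queue run that is in progress is abandoned.
deliver_queue_load_max = 1.0
```

Note that Exim does not allow a comment on the same line as a setting, so the comments sit on their own lines.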
Under extremely high loads Exim had problems just accepting the mail and putting it in the queue. Other mail servers would get a timeout after an SMTP connection had been established, and that could actually make the delivery fail outright, with no later retries. In that case it was better for one of our backup MXes to receive the mail and push it to the main server once the load had dropped. The following setting makes sure Exim does not accept SMTP connections if the load crosses 5.0:
smtp_load_reserve = 5.0
This could also probably be a bit higher, but we set it to 5.0 to be extra cautious. A problem for us was that the load fluctuated too fast, so even though the load would be below 5.0 when a connection was accepted, it could easily be 15 before the mail had been completely delivered.
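One related detail: smtp_load_reserve works together with smtp_reserve_hosts. If smtp_reserve_hosts is unset, Exim accepts no incoming SMTP connections at all while the load is over the limit; if it is set, the hosts listed there are still let in. A minimal sketch, assuming you want mail submitted from the local machine to keep working under load (the 127.0.0.1 entry is a hypothetical example, not part of our setup):

```
# Main configuration section of the Exim config file.

# Above this load, only hosts in smtp_reserve_hosts may connect.
smtp_load_reserve = 5.0

# Hosts that are still accepted above the limit (assumed example).
smtp_reserve_hosts = 127.0.0.1
```

Leaving smtp_reserve_hosts out gives the behaviour described above, where every remote server is turned away and falls back to the backup MXes.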
As a final precaution I also added this line to the SpamAssassin router, so that if the SpamAssassin step fails (times out) the message is passed on to the next router and processing continues:
pass_on_timeout
That’s it. 🙂 I’ve learnt a lot from tweaking Exim. It’s been frustrating but fun too. Our system is fairly stable, and none of us should ever miss out on a mail again. 😉