Opened 10 years ago

Closed 9 years ago

#12 closed task (worksforme)

Framework to fix and fix again mail logs for Poststat

Reported by: Vincent Caron Owned by: Valentin SCHMITT
Priority: minor Component: Postfix log tools
Keywords: Cc: Benjamin Linet

Description

Poststat (see PostfixLogTools) tends to break depending on some configuration details (mail filtering setup, LDA, etc) and must be tuned regularly even for Bearstech only usage.

If other users adopt Poststat, we must assume wilder and wilder log files and be prepared to parse them more or less proprely. The idea is to make sur we can quickly find a fix for some user case without breaking the other ones.

First step: gather some log files, anonimyze them. Any Postfix user can contribute its log. But we need an anonymizer which scrambles in non-reversibles ways email addresses, host names and IP addresses.

Second step: have a reference parsing result for every log file (poststat outputs a list of counters, so we'll use that), build a simple non-regression test for that in a 'make test' target.

Change History (16)

comment:1 Changed 10 years ago by Vincent Caron

For the first step we need to modify some mail.log files to:

  • Only keep postfix/* related lines (one can find imap and pop service messages too)
  • Scramble names (host and email addresses) in a non-reversible way. Ideally the result produces kind of readable names too (maybe keep the TLD as they are). Msgid can be preserved IMHO, it looks safe from a privacy point of view
  • Optionnaly: move dates to a fixed offset, say 2010 Apr 1 - a bit tricky for day wrappings

I think we'll use approx. one day of activity or 10,000 connection entries per log, wichever is larger. Then scramble it, gzip it and commit it.

Should I mention Perl ? :)

comment:2 Changed 10 years ago by Benjamin Linet

Cc: Benjamin Linet added

comment:3 Changed 10 years ago by obibi

(In [496]) Add Perl script to anonymize Postfix mail logs for poststat, see #12.

comment:4 Changed 10 years ago by Vincent Caron

It's fine till there. It only forgets to match the 'host' field added by syslog ("<date> <host> postfix/smtp ...").

Now to implement some nice transformation, you may use the /e modifier: in this case, the right-hand-side (rhs) part of the s/ regex is Perl code. Combine with /g to substitute any number of occurences per line. Maybe add /x to allow whitespaces and readble regexes ('man perlrequick' is handy: lots of information, but no as cryptic as 'man perlre'):

s/ ([0-9]{1,3}\.){3}[0-9]{1,3} / anon_ip($1) /egx;
sub anon_ip {
  my $ip = shift;
  $ip =~ s/ (\d+) / int($1/3) /egx;  # Divide all found numbers by 3
  return $ip;
}

After a while you'll love twist those things until it fits in some magical one-liner:

s| (\d+)\.(\d+)\.(\d+)\.(\d+) | join(".", map {int($_/3)} $1,$2,$3,$4) |egx;

... which "reads" (really) "If you find a int.int.int.int pattern, put distinct 'int' in a list, integer-divide them by 3 then join them with a dot". Which is more or less the intuitive algorithm suggested from the start. The basic idea is to capture the elements you want to transform (IP address bytes in our case), then it goes smoothly in the RHS: compute something with $1,$2,etc.

comment:5 Changed 10 years ago by obibi

(In [501]) Scramble ip addresses by dividing them by 3, see #12,

comment:6 Changed 10 years ago by obibi

(In [503]) Regexps optimizations, see #12.

comment:7 Changed 10 years ago by obibi

(In [504]) Licence added, see #12,

comment:8 Changed 10 years ago by Vincent Caron

Next step is to check the way source:/posstfix-logtools/poststat/poststat parses and count various events. Ex:

zcat mail.log.2.gz |./anonymizer.pl |./poststat 
ce_hangup           : 9226
ce_helo_no_dns      : 3924
ce_helo_not_fqdn    : 890
ce_proto_error      : 102
ce_psbl             : 10151
ce_sorbs            : 3466
ce_spamhaus         : 6529
ce_ssl_error        : 98
ce_timeout          : 154
ci_connect_ext      : 33178
ci_connect_mx       : 1079
me_no_relay         : 3
me_no_target        : 1114
me_sender_no_dns    : 571
mi_count            : 32790
mi_size             : 1229269473

Although the code looks quite simple (a regular list of regexes), it's a bit tricky:

  • The order in which they are evaluated is important
  • Not all log files show the same events

The basic flow of events has been classified along those lines:

ci_*: "connection input", the number of incoming SMTP connections. It's only split in two categories: those from our own MX backups, and others. TODO: for this to work correctly we must be able to accept a list of "--backup-mx <ip>" options in poststat (and even handle the lack of). I also suggest to rename s/ci_connect_mx/ci_connect_int/ (internal, our servers). We should also maintain a ci_connect counter (=ci_connect_ext+ci_connect_int), that will be easier to graph.

ce_*: "connection errors", actually. We account here all events which rejected the whole SMTP transactions (before the DATA stanza), like DNSRBL, greylisting, timeouts, etc. I think we should add a ce_ok counter (oxymoron I know), in order the sum of all ce_* counters = ci_connect; again nicer to graph and show (How many connections made it through my filters ?). One problem I encountered is that the same connections might generate several errors; the line-oriented parsing only pick the first error which is maybe not the best. Some errors are hard to classify to ("Lost connection after...": why the client disconnected ? Because we forced him to ?). TODO: carefully check all error occurences, and tune their parsing; at least every error should be taken into account, I historically did this by die()'ing as soon as I didn't expicitly matched a log line (add a die() at the end of the while() loop).

comment:9 Changed 10 years ago by Vincent Caron

Example where order is important:

    $stat{ci_connect_mx}++,      next if /^connect from ns3\.bearstech\.net/;
    $stat{ci_connect_ext}++,     next if /^connect/;

First check if it's a client connection from one of our backup MX. Otherwise account as a _ext connection.

BTW:

    # Don't count local connections, most of the are reinjections from a/v
    #$stat{ci_connect_local}++,   next if /^connect from localhost/;

It's still true, in most cases mails are forwarded via LMTP to an AV filter (we use Amavisd-new), and the filter re-injects the mail if it's okay with it. That very information is not important, however:

  • I would still like to count locally generated emails, like those sent via the webmail; But I can't see how you can tell them apart (log says postfix/smtpd[17743]: connect from localhost[127.0.0.1]).
  • That would be interesting to have the AV stats. Postfix only knows it correctly has forwarded the mail to Amavis, but doesn't care if it's reinjected back for final delivery. And amavis seems to only log successes, so we would have to track all LTMP forwards and see if they have a matching CLEAN status from Amavis. Quite doable but we dive into the joy of stateful parsers :). That's another reason why I was thinking about Milters and filter integration in Postfix chains: better accountability. Mmm, another ticket :)

comment:10 Changed 10 years ago by Vincent Caron

Last counters:

mi_count: incoming messages, actually "recipients". For one connection, you may have many RCPTs. I try to account all of them here.

mo_delivered: outgoing messages, which means they are delivered (to a local maildir or another mailhub).

me_*: messages rejected, by reason. I think some counters are wrong: me_sender_* should be ce_sender_* (there is only one sender per connection AFAIK - I don't think ESMTP Pipelining is ever used...). Anyway, at this stage, either the recipient exists and/or is relayable, either it's not. Well I'm not sure, needs rethinking, I'm confused.

If everything goes well, you could check that mi_count = mo_delivered + sum of me_* for any parsed mail.log. That's a challenge I think :).

comment:11 Changed 10 years ago by obibi

(In [509]) Improvements on the count of the input connections, see #12.

comment:12 Changed 10 years ago by obibi

(In [521]) Continuing work on MX backup option, see #12.

comment:13 Changed 10 years ago by obibi

(In [522]) "connexion input" functions implemented, see #12.

comment:14 Changed 10 years ago by obibi

(In [536]) Draft to fix the last counters, see #12.

comment:15 Changed 10 years ago by obibi

(In [539]) poststast script updated, see #12.

comment:16 Changed 9 years ago by Vincent Caron

Resolution: worksforme
Status: newclosed

Not really maintained but still sort of working (https://mail.bearstech.com/stat/).

Let's keep it that way, we have so much other stuff to work on.

Note: See TracTickets for help on using tickets.