
top report presentation (PDF) article (Ja.) white list black list watch tool applyers Q&A blog (Ja.) links history contact

Rejection log sorting script

(the former version)

   Here is a shell script for finding mail servers that the S25R anti-spam system has rejected by mistake. If your computer runs both a web server and a mail server, you can easily review rejection records in a web browser by installing this script in a password-protected directory under the cgi-bin directory. You can also run it as a command.
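
   When the script is run as a command rather than as a CGI program, its output still begins with the CGI header line and a blank line. The following is a minimal sketch of a filter that drops that header; the function name strip_cgi_header is made up for this illustration, and the sample input is a stand-in for the script's real output.

```shell
#!/bin/sh
# Hypothetical helper: delete everything up to and including the first
# blank line, i.e. the "Content-Type" header the CGI script prints first.
strip_cgi_header() {
  sed '1,/^$/d'
}

# Stand-in for the first lines of the script's output.
printf 'Content-Type: text/plain\n\nMail rejection log (450 Client host rejected) - sorted\n' |
  strip_cgi_header
```

   Assuming the script were saved under a name such as rejectlog.cgi (a hypothetical file name), it could be viewed on a terminal with something like "./rejectlog.cgi | strip_cgi_header | less".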

Function
   This script reads a Postfix mail log and extracts the records of rejections made by client restrictions with the response code "450" (meaning "try again later"); rejections for other reasons are not extracted. It displays the records sorted so that retry accesses appear consecutively. That is, accesses with the same client IP address, sender address and recipient address are arranged in one sequence, and records that differ in any of the three are separated from each other by a blank line.
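   It also displays the following data at the end: the access count (all extracted accesses, retries included), the estimated message count (the number of distinct combinations of client IP address and recipient address), and the retry sequence count (the number of sequences that contain at least one retry).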
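
   To illustrate the grouping described above, here is a minimal sketch that prints only the grouping key (client IP address, sender address, recipient address) of each extracted record. The function name retry_key and the sample log lines are invented for this example; the log lines imitate the usual Postfix format but do not come from a real log, and plain awk is used here instead of gawk.

```shell
#!/bin/sh
# Hypothetical helper: print "client-IP sender recipient" for every
# "450 Client host rejected" record read from standard input.
retry_key() {
  awk '
  /reject:.+ 450 .*Client host rejected:/ {
    # "from host[ip]" -> keep only the bracketed IP address
    match($0, /from [^]]+\]/)
    client = substr($0, RSTART + 5, RLENGTH - 5)
    ip = substr(client, index(client, "["))
    match($0, /from=<[^>]*>/)
    sender = substr($0, RSTART, RLENGTH)
    match($0, /to=<[^>]*>/)
    rcpt = substr($0, RSTART, RLENGTH)
    print ip, sender, rcpt
  }'
}

# Made-up sample: two accesses with the same key (a retry) and one other.
cat <<'EOF' | retry_key | sort | uniq -c
Oct  3 04:05:06 mail postfix/smtpd[12345]: NOQUEUE: reject: RCPT from unknown[192.0.2.10]: 450 4.7.1 Client host rejected: cannot find your hostname, [192.0.2.10]; from=<sender@example.com> to=<user@example.org> proto=ESMTP helo=<mail.example.com>
Oct  3 04:20:06 mail postfix/smtpd[12399]: NOQUEUE: reject: RCPT from unknown[192.0.2.10]: 450 4.7.1 Client host rejected: cannot find your hostname, [192.0.2.10]; from=<sender@example.com> to=<user@example.org> proto=ESMTP helo=<mail.example.com>
Oct  3 04:30:00 mail postfix/smtpd[12410]: NOQUEUE: reject: RCPT from unknown[198.51.100.7]: 450 4.7.1 Client host rejected: cannot find your hostname, [198.51.100.7]; from=<x@example.net> to=<user@example.org> proto=ESMTP helo=<pc7>
EOF
```

   Each distinct key is printed once, preceded by the number of accesses that share it. A count greater than one corresponds to what this page calls a retry sequence.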
Practical use
   Legitimate mail servers always retry delivery at appropriate intervals after a rejection with the response code "450". This script displays the records of those rejections consecutively, which helps you find accesses from legitimate mail servers that should be added to the white list.
   If accesses displayed in a sequence satisfy all of the following conditions, the client is probably a legitimate mail server which should be listed in the white list.

   Meanwhile, if accesses displayed in a sequence fall under any of the following conditions, the client is probably, or certainly, illegitimate.

   If you find it hard to decide whether the client is legitimate, add it to the white list first, and then remove it if the recipient complains of spam.
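
   Since retry intervals matter when judging a sequence, it can help to see them as numbers. The sketch below is not part of the original tool: the function name retry_intervals is hypothetical, it expects the lines of a single retry sequence in the script's output format (time of day in the third field), and it assumes that consecutive accesses are less than 24 hours apart.

```shell
#!/bin/sh
# Hypothetical helper: print the gap in seconds between consecutive
# accesses of ONE retry sequence (time of day is read from field 3).
retry_intervals() {
  awk '
  {
    split($3, t, ":")
    now = (t[1] * 60 + t[2]) * 60 + t[3]
    if (NR > 1) {
      gap = now - prev
      if (gap < 0)        # crossed midnight; assumes a gap under 24 hours
        gap += 86400
      print gap
    }
    prev = now
  }'
}

# Made-up sequence: accesses at 04:05:06, 04:20:06 and 04:50:06.
printf '%s\n' \
  'Oct  3 04:05:06 unknown [192.0.2.10] from=<sender@example.com> to=<user@example.org> helo=<mail.example.com>' \
  'Oct  3 04:20:06 unknown [192.0.2.10] from=<sender@example.com> to=<user@example.org> helo=<mail.example.com>' \
  'Oct  3 04:50:06 unknown [192.0.2.10] from=<sender@example.com> to=<user@example.org> helo=<mail.example.com>' |
  retry_intervals
```

   For the sample above this prints 900 and 1800, that is, gaps of 15 and 30 minutes.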

Necessary configuration
   The mail log files must be readable by the account under which the HTTP daemon runs. On many systems, you can set this up with commands like the following:
chgrp nobody /var/log/maillog*
chmod g+r /var/log/maillog*
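
   After changing the group and mode, it may be worth confirming that the files really are readable. The following is a small sketch of such a check; the function name check_readable is hypothetical, and because it tests readability for whoever runs it, it only gives a meaningful answer when run under the account the HTTP daemon uses (which account that is depends on your system).

```shell
#!/bin/sh
# Hypothetical helper: report, for each file given, whether the user
# running this check can read it. Returns non-zero if any file is unreadable.
check_readable() {
  rc=0
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "readable: $f"
    else
      echo "NOT readable: $f"
      rc=1
    fi
  done
  return $rc
}

# Check the files named on the command line, if any.
check_readable "$@"
```

   Saved as a file, this could be invoked with the log files as arguments, for example /var/log/maillog and /var/log/maillog.1.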
Alterations
Shell script code
#!/bin/sh
echo "Content-Type: text/plain"
echo
echo "Mail rejection log (450 Client host rejected) - sorted"
echo
#
# (1) Input mail log.
#
cat /var/log/maillog.1 /var/log/maillog | \
#
# (2) Extract records indicating "450 Client host rejected".
#
grep -E 'reject:.+ 450 .*Client host rejected:' | \
#
# (3) Extract essential items.
#
gawk '
{
  client=substr($0, match($0, /from [^]]+\]/)+5, RLENGTH-5)
  sub(/\[/, " [", client)
  sender=substr($0, match($0, /from=<[^>]*>/), RLENGTH)
  rcpt=substr($0, match($0, /to=<[^>]*>/), RLENGTH)
  helo=substr($0, match($0, /helo=<[^>]*>/), RLENGTH)
  printf "%s %2d %s %s %s %s %s\n", $1, $2, $3, client, sender, rcpt, helo
}
' | \
#
# (4) Convert month names into month numbers.
#
gawk '
BEGIN {
  month_num["Jan"]=1
  month_num["Feb"]=2
  month_num["Mar"]=3
  month_num["Apr"]=4
  month_num["May"]=5
  month_num["Jun"]=6
  month_num["Jul"]=7
  month_num["Aug"]=8
  month_num["Sep"]=9
  month_num["Oct"]=10
  month_num["Nov"]=11
  month_num["Dec"]=12
  max_month_num=0
}
{
  $1=month_num[$1]
  if ($1>max_month_num)
    max_month_num=$1
  else if ($1<max_month_num)
    $1+=12
  printf "%3d %2d %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8
}
' | \
#
# (5) Sort according to IP address, sender address and recipient address.
#
sort -k 5,7 | \
#
# (6) Insert a blank line between records with a different triplet.
#
gawk '
BEGIN {
  prev_triplet=""
}
{
  if (prev_triplet!="") {
    if (prev_triplet!=$5 $6 $7)
      print ""
  }
  print
  prev_triplet=$5 $6 $7
}
' | \
#
# (7) Convert retry records in a sequence into one line.
#
gawk '
BEGIN {
  RS=""
}
{
  gsub(/\n/, "\036")
  print
}
' | \
#
# (8) Sort according to date and time.
#
sort -k 1,3 | \
#
# (9) Reconvert retry records in a sequence into multiple lines.
#
gawk '
{
  gsub(/\036/, "\n")
  print
  print ""
}
' | \
#
# (10) Reconvert month numbers into month names.
#
gawk '
BEGIN {
  month_name[1]="Jan"
  month_name[2]="Feb"
  month_name[3]="Mar"
  month_name[4]="Apr"
  month_name[5]="May"
  month_name[6]="Jun"
  month_name[7]="Jul"
  month_name[8]="Aug"
  month_name[9]="Sep"
  month_name[10]="Oct"
  month_name[11]="Nov"
  month_name[12]="Dec"
}
{
  if ($0!="") {
    $1=month_name[($1-1)%12+1]
    printf "%s %2d %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8
  }
  else
    print ""
}
' | \
#
# (11) Output sorted records with counting.
#
gawk '
BEGIN {
  Suppress_single_access_records=0
  RS=""
  acc_count=0
  host_and_rcpt=""
  msg_count=0
  seq_count=0
}
{
  retry_count=gsub(/\n/, "\n")
  acc_count+=1+retry_count
  if (index(host_and_rcpt, $5 $7)==0) {
    ++msg_count
    host_and_rcpt=$5 $7 host_and_rcpt
  }
  if (retry_count>0)
    ++seq_count
  if (!(retry_count==0 && Suppress_single_access_records)) {
    print
    print ""
  }
}
END {
  print "access count =", acc_count, \
      ", estimated message count =", msg_count, \
      ", retry sequence count =", seq_count
}
'
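
   Steps (7) to (9) of the script use a small trick to sort whole retry sequences while keeping their lines together: each blank-line-separated block is folded into one line, the lines are sorted, and the blocks are unfolded again. The sketch below isolates that idea under simplified assumptions; the function name sort_blocks is invented, it sorts on the whole folded line rather than on the date fields, and it uses plain awk with the same separator character (octal 036) as the script.

```shell
#!/bin/sh
# Hypothetical helper: sort blank-line-separated blocks by their text,
# keeping the lines of each block together.
sort_blocks() {
  # Fold: in paragraph mode (RS="") each block is one record; replace its
  # internal newlines with the separator character (code 30, octal 036).
  awk 'BEGIN { RS = ""; sep = sprintf("%c", 30) }
       { gsub(/\n/, sep); print }' |
  # Sort the folded blocks as ordinary lines (byte order for repeatability).
  LC_ALL=C sort |
  # Unfold: turn the separator back into newlines, one blank line per block.
  awk 'BEGIN { sep = sprintf("%c", 30) }
       { gsub(sep, "\n"); print; print "" }'
}

# Two made-up blocks given in reverse order.
printf 'b 2\nb 3\n\na 1\na 4\n' | sort_blocks
```

   In this example the block beginning with "a 1" is printed before the block beginning with "b 2", and each block keeps its two lines in their original order.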