p3sc.pl

p3sc.pl is a POP3 SPAM Checker Perl script. It is designed to login to POP3 email accounts and remove SPAM emails from the server before a "real" email client downloads the email. The script determines which emails are SPAM  from a "white list" of good domains, email addresses, and subject lines.

Caveat: This methodology works well for personal email accounts where you know who the senders are going to be or what the subject lines are going to contain. For business emails this script might not work too well.

This script was written as I was getting annoyed having to download the SPAM emails from my POP3 email accounts. I did not see why I needed to use up my bandwidth to download the email to only delete it.

This script logs in, downloads the headers of the messages, determines from a filter list if any email is SPAM, marks the SPAM email for deletion on the server and then logs out. Your normal email client can then come along and download the non-SPAM emails which are left.

The script is designed to be run from a cron job that schedules the scan.

The script uses 2 additional files written in the XML format. One is the filter file that lists the good, and the bad, email criteria. The other is a configuration file that specifies things like the email accounts and debug levels etc.

I have placed examples of the configuration and filter files at the bottom of this page.

There are a couple of TO DOs noted in the script comments. I would like to implement them when I can find the time.

One of the TO DOs is to manage the filter file via special emails. This has some of the ground work in place.

Enjoy

Note: The script uses the XML::Simple XML parser from CPAN. Which in turn requires either the XML::Parser or XML::SAX. Both also available from CPAN.

#!/usr/bin/perl -w -t
#------------------------------------------------------------------------------------
#
#  p3sc.pl
#
# pop3 spam checker
#
# Script to login to and check POP3 email accounts for SPAM emails
#
#  (C) 2006 Jim Pye - PyeNet Universal
#
# This script is based on the rawpop3.pl script from CPAN.
# Therefore it is released under the GPL as per that script.
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
#
#
# This script is intended to do what it states to do. However as I have no control
# over your machine or the email service providers you use. If this script breaks
# something you have my permission to pick up the pieces. It would be good to let me
# know if something has gone wrong. I will see if my Perl skills are up to fixing it.
#
# This uses a filter file that is designed to list Good email addresses, Domains
# and Subject lines. A White List if you like.
# The File is called by default p3scfilter.xml
#
# A configuration file is used to specify the POP3 servers to use and authentication
# details etc. This file is called by default p3sc.conf
#
# The script uses the XML::Simple XML parser from CPAN.
# http://search.cpan.org/search?query=XML%3A%3ASimple&mode=all
# Which in turn requires either XML::Parser or XML::SAX Both also available from CPAN
#
# Version 0.1 - Original Script
#
# Version 0.2 - Reformatted the output to make it easier to read
#
# Version 0.3 - XML formatted config file and to cycle through multiple email accounts
#
# Version 0.4 - Modified the regex match to use the \Q and \E quoting to allow for
#               metacharacters in the data in the filters file specifically [ and ].
#               Also added the -t to the perl command line above. This is the default.
#
# Version 0.5 - Modifed the search for bad email to skip rest of the tests once a
#               SPAM or GOOD message was found.
#
# Version 0.6 - Started to add the abiltiy to send commands to p3sc to add or
#               remove items from the filter file via emails. Script will look
#               for subject line as specified in config file and treat the email
#               as commands.
#
# TO DOs
#  Issue with disconnecting - see the note before the foreach loop
#
#  Would like to add the downloading of GOOD emails (this was in the original
#  rawpop3.pl script)
#
#  Would like to be able to send special emails to manipulate the filter list
#
#
#  Pye, James Pye, chmod 007
#  The Ultimate Open Source
#
#------------------------------------------------------------------------------------
use strict;
use Socket;
use XML::Simple;

 # These name the current location and versions of the
 # config and filter rules files
 # Configuration file
my $CONFIG_FILENAME = \"p3sc.conf";
 # Expected version of current configuration file
my $CFG_CUR_VERSION = \"1.0";

 # Filter rules file
my $FILTER_RULES;
 # Expected version of current filter rules file
my $FILT_CUR_VERSION = \"1.0";

 # Version tag within Filter and Config files
my $VERSIONTAG = \"version";

 # File name when a message is saved
my $MSG_DOWNLOAD_FILE;

 # Testing mode "1" or Real mode switch "0"
my $g_testing_mode;

my $EOL = \"\015\012";

 # Thousand separator
my $THOUSAND_SEPARATOR = \",";

 # Possible log levels
my $LL_NONE = \0;
my $LL_ERROR = \1;
my $LL_WARNING = \2;
my $LL_NORMAL = \3;
my $LL_VERBOSE = \4;
my $LL_DEBUG = \5;

## " This is an attempt to clean my editor's syntax highlighting due to
## the \ escaping of the references above - JTP

my $g_cur_log_level;

 # Is the log information being output to screen ?
my $g_log_display;

 # Default POP3 account informations (server, port, user, password)
 # These should be overridden by entries from the configuration file
my $g_pop3_server = "";
my $g_pop3_port = 110;
my $g_pop3_username = "";
my $g_pop3_password = "";

my $g_curconn_server = "";
my $g_curconn_port = 0;

 # This needed here as the &read_config routine uses it but will override it
$g_cur_log_level = $$LL_NORMAL;

 # Some configuration check boxes
my $g_getlist_when_connect = 0;
my $g_detail_all_when_connect = 0;

 # State of the POP3 connection
my $g_pop3_is_connected = 0;

 # Number of messages on server
my $g_nbmsgs;

 # variables for the filter rules
my $filter_rule_data;
my $addr_data;

 # variables for the configuration data
my $config_data;
my $account_details;
my $account_field;

 # variable to hold the expected subject line of a command message
my $command_subject;

#---------------------------------------------------------------------------
#
# The heart of the script
#
#---------------------------------------------------------------------------

# Read in the configuration file to setup things like logging levels and
# eMail accounts to parse etc.
&read_config;

# If g_testing_mode = 1 then print a notification that we are running in testing mode
if ($g_testing_mode) {
    &my_log("\n====\nRunning in Testing Mode, Messages will be left on server\n====\n",$$LL_NORMAL);
}

# Read in the Filter Rules
&read_filter_list;

# TO DO
# switch_conn_state does not seem to actually change state it only
# changes a variable so this is why I do not see disconnected message when
# running in testing_mode. The disconnected message is printed only in
# switch_pop3_connect which logs out as well

# Loop through each email account configured in the configuration file
foreach $account_details (@{$config_data->{"email-accounts"}}) {
    foreach $account_field (@{$account_details->{"email-account"}}){
        $g_pop3_server = $account_field->{"server"};
        $g_pop3_port = $account_field->{"port"}[0];
        $g_pop3_username = $account_field->{"login"}[0];
        $g_pop3_password = $account_field->{"password"}[0];

        # set intial connection state to disconnected
        &pop3_switch_conn_state(0);

        # try to connect and login to server
        &c_switch_pop3_connect;

        &c_get_list;

        # If running in testing mode we do not QUIT from server as this
        # would purge the messages flagged as deleted
        if (!$g_testing_mode) {
            &c_switch_pop3_connect;
        }
        &pop3_switch_conn_state(0);
    }
}

# Finished
exit;

# Subroutines here on down

#
# Get a list of messages on the server
# Usage:
#           &c_get_list;
#
sub c_get_list {
    my $e;
    my $err_str;
    my $i;

    if (!$g_pop3_is_connected) {
        &my_log("unable to get list of messages: no POP3 connection is established", $$LL_ERROR);
        return;
    }

    if ($e = &pop3_update_msg_list($err_str)) {
        &my_log($err_str, $$LL_ERROR);
    } else {
        for ($i = 1; $i <= $g_nbmsgs; $i++) {
            &my_log("Entering delete or leave function", $$LL_DEBUG);
            &msg_delete_or_leave($i);
        }
    }
}

#
# Format a number with a separator each 3 digits.
# Usage:
#
#        $s = &fnb($number);
#
sub fnb {
    my $n = $_[0];

    $n =~ s/(\d)(?=(\d\d\d)+(\D|$))/$1$$THOUSAND_SEPARATOR/g;
    return $n;
}

#
# Update the list of server messages.
# Usage:
#
#        $errno = &pop3_update_msg_list($err_str);
#
# This proc assumes the connection with the POP3 server has been established.
# Return 0 if the connection was successful.
# Return a non-zero value if the connection failed. In that case, $err_str contains
# a description of the error.
#
sub pop3_update_msg_list {
    my $e;
    my $nb;
    my $cont;
    my $l;
    my $msg_idx;
    my $msg_size;
    my $total_size = 0;

    if (!$g_pop3_is_connected) {
        $_[0] = "unable to get list of messages: no POP3 connection is established";
        return 1997;
    }

    return $e if $e = &pop3_send_recv_and_ctrl("LIST", "+OK", $_[0]);
    $cont = 1;
    $nb = 0;
    while ($cont) {
        return $e if $e = &sock_recv($l, $_[0]);
        if (($msg_idx, $msg_size) = $l =~ m/(\d+)\s+(\d+)/) {
            $nb++;
            $total_size += $msg_size;
        } elsif ($l =~ m/^\.$/) {
            $cont = 0;
            $g_nbmsgs = $nb;
            &my_log("$g_nbmsgs message(s) of " . &fnb($total_size) . " byte(s)\n", $$LL_NORMAL);
        } else {
            $_[0] = "unable to parse answer from server, answer = \"$l\"";
            return 1999;
        }
    }
}

#
# Delete selected message
# Usage:
#
#        &delete_selected_message($msg_pop3_number);
#
sub delete_selected_message {
    my $msgi = $_[0];
    my $err_str;
    my $e;
    my $test_string = "";

    if ($g_testing_mode) {
        $test_string = "Would have";
    }

    &my_log("Deleting message \#$msgi", $$LL_VERBOSE);
    if ($e = &msg_delete($msgi, $err_str)) {
        &my_log("unable to delete message \#$msgi: $err_str", $$LL_ERROR);
    } else {
        &my_log("$test_string DELETED message \#$msgi\n", $$LL_NORMAL);
    }
}

#
# Delete a given message, identified by its POP3 number.
# Usage:
#
#         $errno = msg_delete($msg_pop3_number, $err_str);
#
sub msg_delete {
    my $msgi = $_[0];
    my $e;

    $e = &pop3_send_recv_and_ctrl("DELE $msgi", "+OK", $_[2]);

    return 0;
}

#
# Delete or leave a given message, identified by its POP3 number.
# This is the heart of this script
# the decision to either delete a message on the POP3 server, or leave it
# for download by a email client is made here
# Usage:
#
#        &msg_delete_or_leave($msg_pop3_number);
#
sub msg_delete_or_leave {
    my $msgi = $_[0];

    my $e;
    my $cont;
    my $line1;
    my $err_str;
    my $l;
    my ($fld_from, $fld_sender, $fld_return_path, $fld_subject, $fld_date);
    my $string;
    my $bad_msg_flag;
    my $suspect_msg_flag;
    my $command_section;
    my $command_inst;
    my $command_flag;
    my $command_type;
    my $command_data;

    $cont = 1;
    $line1 = "";
    $fld_from = "";
    $fld_return_path = "";
    $fld_sender = "";
    $fld_subject = "";
    $fld_date = "";
    $bad_msg_flag = 0;  # if set message is bad
    $suspect_msg_flag = 1; # until cleared message is suspicious

    if ($e = &pop3_send_recv_and_ctrl("TOP $msgi 1", "+OK", $err_str)) {
        &my_log("unable to read message \#$msgi: $err_str", $$LL_ERROR);
        return;
    }

    &my_log("Start to parse header...", $$LL_VERBOSE);
      # The following loop analyzes a given line only after concatenation (lines
      # beginning with a space (or tab) character are merged with the
      # preceeding line).
    while ($cont) {
        if ($e = &sock_recv($l, $err_str)) {
             &my_log($err_str, $$LL_ERROR);
             $cont = 0;
        } else {
               # stop once the header has been read, indicated by CRLF CRLF
             $cont = 0 if $l =~ m/^\.$/;
               # Remove trailing spaces
             $l =~ s/\s+$//;
             if ((!$cont || $l =~ m/^\S/ || $l eq "") && $line1 ne "") {
                   # Identify fields
                 $fld_from = $1 if $line1 =~ m/^From:\s+(.*)$/i;
                 $fld_return_path = $1 if $line1 =~ m/^Return-path:\s+(.*)$/i;
                 $fld_sender = $1 if $line1 =~ m/^Sender:\s+(.*)$/i;
                 $fld_subject = $1 if $line1 =~ m/^Subject:\s+(.*)$/i;
                 $fld_date = $1 if $line1 =~ m/^Date:\s+(.*)$/i;
                 $line1 = "";
             }
               # Replace leading space or tab sequences with a single space character
             $l =~ s/^\s+/ /;
             $line1 .= $l;
         }
    }

    &my_log("Parsed Header", $$LL_VERBOSE);

    &my_log("Subject Line: ".$fld_subject,$$LL_VERBOSE);
    &my_log("From: ".$fld_from,$$LL_VERBOSE);
    &my_log("Return path: ".$fld_return_path,$$LL_VERBOSE);
    &my_log("Sender: ".$fld_sender,$$LL_VERBOSE);

    &my_log("Testing message \#$msgi", $$LL_NORMAL);

# Test here if message is p3sc command message. NOTE This is not completed.
# This is where it would be implemented. JTP 20060531
    &my_log("Testing for p3sc command message",$$LL_VERBOSE);
    if ($fld_subject =~ m/\Q$command_subject\E/i) {
        &my_log("Found command message from: ".$fld_from,$$LL_VERBOSE);

# split off the command part of the subject line based on the :: delimiter
# Instruction (add, delete),
# Flag (good,bad),
# Type (domain, subject, email),
# Data (xxx@xxxx.xxx)

#        $command_section = split $command_subject on ::

#        $command_instr first upto ,
#        $command_flag second upto ,
#        $command_type third upto ,
#        #command_data last

#       if add {
#            write data into new type within filter file

#       if delete {
#            remove data from type within filter file


#       mark message as good, carry onto next message
        $suspect_msg_flag = 0;
        $bad_msg_flag = 0;
        &my_log("Command Message Found: ".$fld_subject,$$LL_NORMAL);
        next;

    }

    if (!$bad_msg_flag) {
        &my_log("Testing Bad domain names:",$$LL_VERBOSE);
        foreach $addr_data (@{$filter_rule_data->{"domain-names"}}) {
            foreach $a (@{$addr_data->{"domain-name"}}) {
                if ($a->{"accept"} eq "yes") {
                    next;
                }
                $string = $a->{"content"};
                &my_log("$string",$$LL_DEBUG);
                if ($fld_from =~ m/\Q$string\E/i) {
                    $bad_msg_flag = 1;
                    &my_log("Bad Domain Found: ".$fld_from,$$LL_NORMAL);
                    last;
                }
            }
        }
    }

    $string = "";

    if (!$bad_msg_flag) {
        &my_log("Testing Bad subject lines:",$$LL_VERBOSE);
        foreach $addr_data (@{$filter_rule_data->{"subject-lines"}}) {
            foreach $a (@{$addr_data->{"subject-line"}}) {
                if ($a->{"accept"} eq "yes") {
                    next;
                }
                $string = $a->{"content"};
                &my_log("$string",$$LL_DEBUG);
                if ($fld_subject =~ m/\Q$string\E/i) {
                    $bad_msg_flag = 1;
                    &my_log("Bad Subject Line Found:  ".$fld_subject,$$LL_NORMAL);
                    last;
                }
            }
        }
    }

    $string = "";

    if (!$bad_msg_flag) {
        &my_log("Testing Bad email addresses:",$$LL_VERBOSE);
        foreach $addr_data (@{$filter_rule_data->{"email-addresses"}}) {
            foreach $a (@{$addr_data->{"email-address"}}) {
                if ($a->{"accept"} eq "yes") {
                    next;
                }
                $string = $a->{"content"};
                &my_log("$string",$$LL_DEBUG);
                if ($fld_from =~ m/\Q$string\E/i) {
                    $bad_msg_flag = 1;
                    &my_log("Bad email Address Found:  ".$fld_from,$$LL_NORMAL);
                    last;
                }
            }
        }
    }

    $string = "";

    if (!$bad_msg_flag && $suspect_msg_flag ) {
        &my_log("Testing Good subject lines:",$$LL_VERBOSE);
        foreach $addr_data (@{$filter_rule_data->{"subject-lines"}}) {
            foreach $a (@{$addr_data->{"subject-line"}}) {
                if ($a->{"accept"} eq "no") {
                    next;
                }
                $string = $a->{"content"};
                &my_log("$string",$$LL_DEBUG);
                if ($fld_subject =~ m/\Q$string\E/i) {
                    $suspect_msg_flag = 0;
                    $bad_msg_flag = 0;
                    &my_log("GOOD Subject Line Found: ".$fld_subject,$$LL_NORMAL);
                    last;
                }
            }
        }
    }

    $string = "";

      # Message is still suspect here, unless $bad_msg_flag is set
    if (!$bad_msg_flag && $suspect_msg_flag ) {
        &my_log("Testing Good domain names:",$$LL_VERBOSE);
        foreach $addr_data (@{$filter_rule_data->{"domain-names"}}) {
            foreach $a (@{$addr_data->{"domain-name"}}) {
                if ($a->{"accept"} eq "no") {
                    next;
                }
                $string = $a->{"content"};
                &my_log("$string",$$LL_DEBUG);
                if ($fld_from =~ m/\Q$string\E/i) {
                    $suspect_msg_flag = 0;
                    $bad_msg_flag = 0;
                    &my_log("GOOD Domain Name Found: ".$fld_from,$$LL_NORMAL);
                    last;
                }
            }
        }
    }

    $string = "";

    if (!$bad_msg_flag && $suspect_msg_flag ) {
        &my_log("Testing Good email addresses:",$$LL_VERBOSE);
        foreach $addr_data (@{$filter_rule_data->{"email-addresses"}}) {
            foreach $a (@{$addr_data->{"email-address"}}) {
                if ($a->{"accept"} eq "no") {
                    next;
                }
                $string = $a->{"content"};
                &my_log("$string",$$LL_DEBUG);
                if ($fld_from =~ m/\Q$string\E/i) {
                    $suspect_msg_flag = 0;
                    $bad_msg_flag = 0;
                    &my_log("GOOD email Address Found: ".$fld_from,$$LL_NORMAL);
                    last;
                }
            }
        }
    }

#
# Call delete_selected_message with POP3 # of message to go
#
#      &delete_selected_message($msg_pop3_number);

    if ($bad_msg_flag || $suspect_msg_flag) {
        &my_log("Message \#$msgi is SPAM !!!", $$LL_NORMAL);
        &my_log("SPAM From: $fld_from and Subject: $fld_subject", $$LL_NORMAL);
        &my_log("Deleting message \#$msgi from server...\n",$$LL_VERBOSE);
        &delete_selected_message($msgi);
    } else {
        &my_log("Message \#$msgi is GOOD !!!\n", $$LL_NORMAL);
    }
}

#
# Connect to a POP3 server.
# Usage:
#
#    $errno = pop3_connect($remote, $port, $user_name, $user_password, $err_str);
#
# Return 0 if the connection was successful.
# Return a non-zero value if the connection failed.
# In that case, $err_str contains a description of the error.
#
sub pop3_connect {
    my $remote = $_[0];
    my $port = $_[1];
    my $uname = $_[2];
    my $upwd = $_[3];

    my $e;
    my $nb_messages;
    my $answer;
    my $is_ok = 0;

    &my_log("will attempt to connect to $remote:$port", $$LL_VERBOSE);
    return $e if $e = &tcp_connect($remote, $port, $_[4]);
    &my_log("Connected to $remote:$port", $$LL_NORMAL);
    $g_curconn_server = $remote;
    $g_curconn_port = $port;

    if (!($e = &pop3_recv_and_ctrl("+OK", $_[4]))) {
        if (!($e = &pop3_send_recv_and_ctrl("USER $uname", "+OK", $_[4]))) {
            if (!($e = &pop3_send_recv_and_ctrl("PASS $upwd", "+OK", $_[4], $answer, "PASS xxxxxxxx"))) {
                if (($nb_messages) = $answer =~ m/^\+OK\s+(\d+)/i) {
                    &my_log("Logged in\, $nb_messages message(s) on the server\n", $$LL_NORMAL);
                } else {
                    &my_log("Logged in\, unknown message count on the server\n", $$LL_NORMAL);
                }
                $is_ok = 1;
            }
        }
    }
    if (!$is_ok) {
        &tcp_close;
    }
    return $e;
}

#
# Close the current POP3 connection.
# Usage:
#
#        $errno = &pop3_disconnect($err_str);
#
# Return 0 if it is OK.
# Return a non-zero value otherwise, and write an error message in $err_str.
#
sub pop3_disconnect {
    my $e;

    &my_log("Logging out - purging deleted messages", $$LL_VERBOSE);
    $e = &pop3_send_recv_and_ctrl("QUIT", "+OK", $_[0]);
    &tcp_close;
    &my_log("Logged out - purged deleted messages", $$LL_NORMAL);

    return $e;
}

#
# Change the state of the variable $g_pop3_is_connected.
# Usage:
#
#        &pop3_switch_conn_state($new_state);
#
sub pop3_switch_conn_state {
    my $new_value = $_[0];
    my $new_state = " ";

    $g_pop3_is_connected = $new_value;
    $new_state = $new_value ? "connected to " : "disconnected from ";
    &my_log("Currently $new_state$g_curconn_server:$g_curconn_port", $$LL_VERBOSE);
}

#
# TO DO
# Clean up why 2 disconnected messages are displayed when in real mode but not
# in testing mode. In testing mode it seems to run this one once
#
#
# Connect (if not connected) or disconnect (if connected) to/from the POP3 server,
# using pop3_connect or pop3_disconnect procedure.
#
sub c_switch_pop3_connect {
    my $e;
    my $err_str;

    if (!$g_pop3_is_connected) {
        $g_pop3_server = "" if !defined($g_pop3_server);
        $g_pop3_port = 0 if !defined($g_pop3_port);
        $g_pop3_username = "" if !defined($g_pop3_username);
        $g_pop3_password = "" if !defined($g_pop3_password);

        if ($e = pop3_connect($g_pop3_server, $g_pop3_port, $g_pop3_username, $g_pop3_password, $err_str)) {
            &my_log($err_str, $$LL_ERROR);
            return;
        }
        &pop3_switch_conn_state(1);
    } else {
        if ($e = &pop3_disconnect($err_str)) {
            &my_log($err_str, $$LL_ERROR);

          # don't return here since we consider there is no POP3 connection any longer
        } else {
            &my_log("Disconnected from $g_curconn_server:$g_curconn_port\n", $$LL_NORMAL);
        }
        &pop3_switch_conn_state(0);
    }
}

#
# Receive a string from the server and control the server response.
# Usage:
#
#        $errno = &pop3_recv_and_ctrl($expected_answer, $err_str [, $answer]);
#
# Return 0 if it is OK.
# Return a non-zero value otherwise, and write an error message in $err_str.
# $answer is optional, if specified, it gives the string returned by the server.
#
sub pop3_recv_and_ctrl {
    my $expected_answer = $_[0];
    my $l;
    my $e;

    return $e if $e = &sock_recv($l, $_[1]);
    $_[2] = $l;
    return $e if $e = &pop3_ctrl($expected_answer, $l, $_[1]);
}

#
# Send a string, receive the answer and control whether the answer is correct.
# Usage:
#
#    $errno = &pop3_send_recv_and_ctrl($sent_str, $expected_answer,  \
#                                     $err_str [, $answer] [, $log_string]);
#
# Return 0 if it is OK.
# Return a non-zero value otherwise, and write an error message in $err_str.
#
sub pop3_send_recv_and_ctrl {
    my ($sent_str, $expected_answer) = @_;
    my $e;

    return $e if $e = &sock_send($sent_str, $_[2], $_[4]);
    return $e if $e = &pop3_recv_and_ctrl($expected_answer, $_[2], $_[3]);
}

#
# Control whether the response of the POP3 server is the one requested.
# Usage:
#
#        $errno = $pop3_ctrl($expected_answer, $answer, $err_str);
#
# Return 0 if it is OK.
# Return a non-zero value otherwise, and write an error message in $err_str.
#
sub pop3_ctrl {
    my ($expected_answer, $answer) = @_;

    if ($answer =~ m/^\Q$expected_answer\E/i) {
        return 0;
    } else {
        $_[2] = "Expected \"$expected_answer\" from remote but received \"$answer\"";
        return 1;
    }
}

#
# Close the current established TCP connection.
# Usage:
#
#        &tcp_close()
#
sub tcp_close {
    close SOCK;
}

#
# Connect to a remote host.
# Usage:
#
#    $errno = &tcp_connect($remote, $port, $err_str)
#
# Return 0 if connection succeeds.
# Return a non-zero value otherwise, and if so, $err_str is the error description.
#
sub tcp_connect {
    my ($remote, $port) = @_;

    my $iaddr;
    my $paddr;
    my $proto;
    my $oldfh;

    $iaddr = inet_aton($remote) or $_[2] = "no host: \"$remote\"", return 1;
    $paddr = sockaddr_in($port, $iaddr);

    $proto = getprotobyname('tcp');
    socket(SOCK, PF_INET, SOCK_STREAM, $proto) or $_[2] = "socket: $!", return 2;
    connect(SOCK, $paddr) or $_[2] = "connect: $!", return 3;

    $oldfh = select(SOCK); $| = 1; $/ = $$EOL; select($oldfh);

    $_[2] = "";
    return 0;
}

#
# Send a line to the SOCK fh. Do NOT include final newline sequence in the parameter.
# Usage:
#
#    $errno = &sock_send($l, $err_str [, $log_string]);
#
# If write is successful, return 0.
# If write fails, return a non-zero value and $err_str contains an error description.
#
sub sock_send {
    my $l = $_[0];

    print(SOCK "$l$$EOL") or $_[1] = $!, return 1;

    $l = $_[2] if defined($_[2]);
    &my_log("\>\>\> $l", $$LL_DEBUG);

    return 0;
}

#
# Receive a line from the SOCK fh.
# Strips final newline sequence from the return value.
# Usage:
#
#    $errno = &sock_recv($l, $err_str));
#
# If reading is successful, return 0 and $l contains the line.
# If reading fails, return a non-zero value
# and $err_str contains an error description.
#
sub sock_recv {
    my $l;

    $l = <SOCK>;
    defined($l) or $_[1] = $!, return 1;
    chomp $l;

    &my_log("\<\<\< $l", $$LL_DEBUG);

    $_[0] = $l;
    return 0;
}

#
# Read the configuration file and populate the variables from this
# Usage:
#
#    &read_config;
#
#
sub read_config {
    my $config_log_level;
    &my_log ("Reading configuration file: ".$$CONFIG_FILENAME, $$LL_VERBOSE);
    my $config_params = new XML::Simple (KeyAttr=>[], ForceArray=>1);

# read XML file
    $config_data = $config_params->XMLin($$CONFIG_FILENAME);

# parse for version # here to make sure no new tags have been added etc.
    my $v;
    $v = $config_data->{$$VERSIONTAG}[0];
    if ($v ne $$CFG_CUR_VERSION) {
        die "Configuration File version is not correct. Should be $$CFG_CUR_VERSION"
    } else {
        &my_log("Configuration File version ".$v." is OK", $$LL_VERBOSE);
    }

    $g_testing_mode = $config_data->{"testing-mode"}[0];
    $FILTER_RULES = $config_data->{"filter-rules"}[0];
    $MSG_DOWNLOAD_FILE = $config_data->{"message-out-file"}[0];
    $config_log_level = $config_data->{"log-level"}[0];
    $g_log_display = $config_data->{"log-display"}[0];
    $command_subject = $config_data->{"command-subject"}[0];

# Select the logging level from the log-level entry in configuration file
# If the configuration file has an error then the default of 3 (NORMAL) is used
    SWITCH: {
        if ($config_log_level =~ m/NONE/i) { $g_cur_log_level = $$LL_NONE }
        if ($config_log_level =~ m/ERROR/i) { $g_cur_log_level = $$LL_ERROR }
        if ($config_log_level =~ m/WARNING/i) { $g_cur_log_level = $$LL_WARNING }
        if ($config_log_level =~ m/NORMAL/i) { $g_cur_log_level = $$LL_NORMAL }
        if ($config_log_level =~ m/VERBOSE/i) { $g_cur_log_level = $$LL_VERBOSE }
        if ($config_log_level =~ m/DEBUG/i) { $g_cur_log_level = $$LL_DEBUG }
     }
}

#
# Read the Filter list and populate the Filter Array
# Usage:
#
#    &read_filter_list;
#
#
sub read_filter_list {
    &my_log ("Reading filter list file: ".$FILTER_RULES, $$LL_VERBOSE);
    my $filter_rules = new XML::Simple (KeyAttr=>[], ForceArray=>1);

# read XML file
    $filter_rule_data = $filter_rules->XMLin($FILTER_RULES);

# parse for version # here to make sure no new tags have been added etc.
    my $v;
    $v = $filter_rule_data->{$$VERSIONTAG}[0];
    if ($v ne $$FILT_CUR_VERSION) {
        die "Filter File version is not correct. Should be $$FILT_CUR_VERSION"
    } else {
        &my_log("Filter File version ".$v." is OK", $$LL_VERBOSE);
    }
}

#
# Log a line to screen based on the level of logging
# Usage:
#
#    &my_log($line, $log_level);
#
sub my_log {
    my ($l, $level) = @_;
    my $prefixe = "";
    my @when = localtime();
    my $header;

    if ($g_log_display) {
        $header = sprintf("%02d/%02d/%04d %02d:%02d:%02d",
        $when[3], $when[4] + 1, $when[5] + 1900, $when[2], $when[1], $when[0]);

        if ($level <= $g_cur_log_level) {
            $prefixe = "**  ERROR: " if $level == $$LL_ERROR;
            $prefixe = "**  WARNING: " if $level == $$LL_WARNING;
            print "$header $prefixe$l\n";
        }
    }

    if ($level <= $$LL_NORMAL) {
    }
}
#########################################################

The configuration file; p3sc.conf


<configuration-file>
<version>1.0</version>
<email-accounts>
<email-account server="pop3.example.com">
<port>110</port>
<login>login.id</login>
<password>somesecret</password>
</email-account>
<email-account server="pop3.another.example.org">
<port>110</port>
<login>another.id</login>
<password>secretsome</password>
</email-account>
</email-accounts>
<testing-mode>1</testing-mode>
<filter-rules>p3scfilter.xml</filter-rules>
<message-out-file>messages.txt</message-out-file>
<log-level>DEBUG</log-level>
<log-display>1</log-display>
<command-subject>p3sc command message</command-subject>
</configuration-file>

The following explains the options in the configuration file:

version: This is "hardcoded" into the script so that the script will only use the correct version of the configuration file

email-account  server: This is the host name of the POP3 server

port: specifies the TCP/IP port that the server is listening on. Default 110 (POP3)

login: The user id for this POP3 account

password: The password for this account. Note, as this is plain text it will be a very good idea to make sure that this file, once generated on your system, is restricted to user read only. chmod u=rw p3sc.conf

The email-account stanza can be repeated, within the email-accounts stanza, for as many email accounts you require.

testing-mode: If set to 1 the messages are left on the server, basically the script does everything it would do normally but does not log out of the server, it just closes the connection. According to the POP3 RFC this should leave all email on the server, even if it was marked for deletion. Once testing is completed set this to 0.

filter-rules: This specifies the name of the file to be used as the filter file. This file is expected to be in the same directory as the script itself.

message-out-file: This is not currently implemented, it will be used for the TO DO that will download good emails.

log-level: This is the verbosity of the information that will be output to the display. 6 Levels are supported each one giving more information than the previous.

  • NONE
  • ERROR
  • WARNING
  • NORMAL
  • VERBOSE
  • DEBUG

Common levels to use are NONE, NORMAL and DEBUG. While setting up and testing it is probably best to use the DEBUG level until it is confirmed that all is working correctly.

Because the logging is output to the screen when the script is run via the cron utility if you place the MAILTO= directive in the crontab file the log will be sent to the email address specified.

log-display: This is either 1 or 0. 1 means the log is displayed to screen. A 0 means it is not. A quick way to turn off logging in other words.

command-subject: This is the string of text that will be looked for to determine if the email is a command for the p3sc script to modify the filter file. This is not implemented as yet but having the string here means that you will be able to define your own personal string for this function.

The Filter file; p3scfilter.xml


<filter-rules>
<version>1.0</version>
<!-- We have issue when data is empty, therefore cannot have empty fields below -->
<email-addresses>
<email-address accept="yes">user@somewhere.net</email-address>
<email-address accept="yes">good.guy@wantedemails.co.uk</email-address>
<email-address accept="no">bad.dude@nowhere.com.au</email-address>
<email-address accept="no">ugly@email.site.co.tw</email-address>
</email-addresses>
<subject-lines>
<subject-line accept="yes">[lpi-discuss]</subject-line>
<subject-line accept="yes">[Familiar]</subject-line>
<subject-line accept="no">SomeUnwantedSubjectLineText</subject-line>
<subject-line accept="no">[BadSubject]</subject-line>
</subject-lines>
<domain-names>
<domain-name accept="yes">novell.com</domain-name>
<domain-name accept="yes">handhelds.org</domain-name>
<domain-name accept="no">sina.com.cn</domain-name>
<domain-name accept="no">go2.pl</domain-name>
</domain-names>
</filter-rules>


The following explains the options in the filter file:

version: This is "hardcoded" into the script so that the script will only use the correct version of the filter file

email-address  accept="yes": These entries are the email addresses of the senders that are considered acceptable. These emails will be left on the server to allow your email client to download. There probably should be a lot of these entries :-)

email-address  accept="no": These entries are the email addresses of the senders that are definitely SPAM. Probably not too many entries as explained below.

subject-line  accept="yes": These entries specify the contents of the subject lines that are acceptable. The entry here does not need to be the whole subject line. Good candidates for these are the mailing list header used by a lot of mailing lists in the subject line.

subject-line  accept="no": Again these entries are the unacceptable subject lines.

domain-name  accept="yes": These specify the domains from which emails are acceptable. Only the portion after the @ is needed, as per the examples above.

domain-name  accept="no": Domains that are not acceptable.

Remember that the unacceptable entries are not really needed as all emails that are not explicity accepted are considered SPAM emails and will be deleted. Specifying entries with the accept="no" option just means the scan process works faster as once an entry is found the processing loop stops and restarts on the next email.