Introduction to regex filters
Posted: Sat Apr 10, 2010 4:55 am
by stan_qaz
I'm far from an expert on regular expressions or using them in MW filters but this should be enough to get you started.
The regex engine that MW is using can be found here along with the documentation and a testing/learning utility:
http://www.regexlab.com/en/deelx/
This is the main Syntax page that links to the details of each expression type:
http://www.regexlab.com/en/deelx/syntax.htm
There is an introductory text here:
http://www.regexlab.com/en/regref.htm
Here's a link to the older copy of MTracer, a regular expression testing tool. This is the version that AVG does not flag:
http://s3.amazonaws.com/Firetrust/MTracer.zip
The main thing to keep in mind is that in regex mode some characters act as commands rather than as text to be matched. For example, "|" is treated as the "OR" command; to match a literal "|" in your text, escape it with a "\" so that it becomes "\|" in your filter rule.
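To see the difference an escape makes, here is a small sketch in Python. Python's re module is only a stand-in here: MW uses the DEELX engine linked above, and while both treat "|" as alternation, other details may differ. The sample strings are made up.

```python
import re

# Unescaped: "|" is the OR command, so this pattern means "spam" OR "eggs".
either = re.compile(r"spam|eggs")

# Escaped: "\|" is a literal pipe, so this pattern means the text "spam|eggs".
literal = re.compile(r"spam\|eggs")

either_hits = either.findall("spam and eggs")    # each word matches on its own
literal_miss = literal.search("spam and eggs")   # no literal pipe in this text
literal_hit = literal.search("menu: spam|eggs")  # the pipe is matched as text

# re.escape adds the backslash for you when the whole string is meant literally.
auto_escaped = re.escape("spam|eggs")
```

The same idea applies to the other command characters: if you want one matched as text, put a "\" in front of it.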
You may notice that regex code is not very easy to read if you come back to it after a few days; it is much easier to write than it is to read and understand. I work around this by keeping a text file with my regex filter rules, along with enough comments on what I was thinking when I wrote each one that I don't have to puzzle out what the regex code is actually doing.
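One way to keep the notes right next to the pattern, sketched in Python with its re.VERBOSE flag. This is an illustration only: nothing in this thread confirms that MW's filter box accepts comments inside a pattern, and the price rule below is a made-up example, so a separate text file remains the safe option for MW itself.

```python
import re

# Hypothetical rule: find a dollar price such as "$19.99" in a subject line.
PRICE_IN_SUBJECT = re.compile(
    r"""
    \$            # a literal dollar sign ("$" alone would mean end of line)
    \d+           # one or more digits
    (?:\.\d{2})?  # optionally a decimal point followed by two digits
    """,
    re.VERBOSE,   # ignore whitespace in the pattern and allow "#" comments
)

price_hit = PRICE_IN_SUBJECT.search("Only $19.99 today!")
price_miss = PRICE_IN_SUBJECT.search("Lunch at 12")
```

Stripped of whitespace and comments, the rule is just \$\d+(?:\.\d{2})? and that compact form is the one to pair with the notes in your text file.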
There are a few examples of working regex filters in these posts if you want to see what they look like before you dive in:
http://forum.firetrust.com/viewtopic.php?p=24888#p24888
http://forum.firetrust.com/viewtopic.php?p=24989#p24989
http://forum.firetrust.com/viewtopic.php?p=24991#p24991 (not perfect but a good complex example)
http://forum.firetrust.com/viewtopic.php?p=25105#p25105
Reading the whole topic for regex related posts will also give you some additional hints and tips:
http://forum.firetrust.com/viewtopic.php?f=48&t=5575
The most important tip may be that the default case sensitivity of the regex engine has been changed in the MW version of the engine:
http://forum.firetrust.com/viewtopic.php?p=25117#p25117
Edit: Added link to introduction.
Re: Introduction to regex filters
Posted: Wed Aug 18, 2010 8:50 am
by Wizcrafts
Stan;
I am looking in vain for a place within the MWP filters section to test RegExpr filter rules, like the one that exists in v6.5.4. Do you know if this built-in test input field exists at this time in v2010.1.0.10?
If there is no test box in this version, where does one go to test regular expressions as they pertain to this incarnation of the program?
Thanks
Re: Introduction to regex filters
Posted: Wed Aug 18, 2010 9:04 am
by stan_qaz
You can go here:
http://www.regexlab.com/en/mtracer/ but the latest version is kicking off an AV alert and getting sent to the bin on my system.
Edit: Sent them a note about the AV issue.
If you have issues with it e-mail me ( stan at stanmiller dot info) and I'll mail you the older 2.1 version from 06-2009 that is working for me.
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 3:58 am
by Nik
Hi stan,
as you say, MTracer has a virus... Is it possible to get the old version?
Anyway, I have one question about filters - it seems that filters don't support Unicode set. Specifically, I have made a filter:
'Subject' Contains Plain Text
[Samskrta]
Filter finds following subjects:
Re: [Samskrita] Kalpa Shastra
RE: [Samskrita]
...
but when there is a devanagari text in the subject, the message isn't filtered, for instance:
RE: [Samskrita] आन्तरराष्ट्रीय रामायण परिषत्
[Samskrita] Learning Saanskrit by fresh approach - Lesson 42 संस्कृतभाषायाः नूतनाध्ययनस्य द्विचत्वारिंशः (४२) पाठः ।
...
Today the same problem happened with a similar filter that searches for "[Indo-Eurasia]". It didn't recognise this subject:
Re: [Indo-Eurasia] FW: Enc: Quadro chinês - ano 1085 !!!
because of the circumflex "e" character.
Is there any solution to this problem?
Regards
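For comparison, here is how the same tag behaves in Python, used only as a stand-in and not as a description of what MW does internally. A plain substring test is not affected by the Devanagari or accented text that follows the tag. The sketch also shows a regex-mode pitfall with this kind of tag: unescaped square brackets form a character class, so they need escaping if the rule is ever switched to regex.

```python
import re

subjects = [
    "Re: [Samskrita] Kalpa Shastra",
    "RE: [Samskrita] आन्तरराष्ट्रीय रामायण परिषत्",
    "Re: [Indo-Eurasia] FW: Enc: Quadro chinês - ano 1085 !!!",
]

# Plain-text style check: a simple substring test on the decoded subject.
plain_hits = [s for s in subjects if "[Samskrita]" in s]

# Regex pitfall: unescaped brackets mean "any ONE of these letters",
# so this matches every subject that contains an S, a, m, s, k, r, i or t.
char_class = re.compile(r"[Samskrita]")
char_class_hits = [s for s in subjects if char_class.search(s)]

# Escaped brackets match the literal tag only.
escaped = re.compile(r"\[Samskrita\]")
escaped_hits = [s for s in subjects if escaped.search(s)]
```

Here plain_hits and escaped_hits both contain the first two subjects, including the Devanagari one, while char_class_hits contains all three.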
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 4:17 am
by rusticdog
stan_qaz wrote:You can go here:
http://www.regexlab.com/en/mtracer/ but the latest version is kicking off an AV alert and getting sent to the bin on my system.
Edit: Sent them a note about the AV issue.
If you have issues with it e-mail me ( stan at stanmiller dot info) and I'll mail you the older 2.1 version from 06-2009 that is working for me.
Email it to me Stan and I'll upload it to the site.
Nik, could you view one of these [Samskrta] emails in the Source tab, copy/paste the full text and email it to me at
[email protected]
I'm fairly sure that the Beta Testers have already mentioned this isn't right, but the extra reminder won't hurt and a working example we can tool around with will help too.
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 4:50 am
by Nik
rusticdog wrote:
Nik, could you view one of these [Samskrta] emails in the Source tab, copy/paste the full text and email it to me at
[email protected]
I'm fairly sure that the Beta Testers have already mentioned this isn't right, but the extra reminder won't hurt and a working example we can tool around with will help too.
I've sent you one message.
Messages are from open Google group:
http://groups.google.com/group/samskrita
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 7:10 am
by stan_qaz
Tracker e-mailed via Gmail.
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 7:20 am
by Wizcrafts
I have now successfully converted all of my published MailWasher Pro 6 filters to the new XML format. Even with Ira's conversion tool it took me 8 hours to get everything corrected so that the filters no longer crash the application. Apparently, the new version crashes if there are any serious code or syntax errors in a regular expression. The previous version just removed that particular filter from the set. Somebody needs to look into this problem.
I found, among other things, that the new filter format does not allow you to use actual angle brackets or ampersands in the filter name or description. Ira's converter missed this, and every filter that had either an angle bracket or an & caused the program to crash, or the custom filter set was instantly replaced entirely with the default set of three filters.
I'm still deciding what to do to recoup my investment of time before I post the new format filters to my website.
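The angle bracket and ampersand trouble is consistent with how XML treats those characters: "<" and "&" are markup, so they are normally written as entities when they are meant as text. A minimal Python sketch follows. The filter and name element names are hypothetical and not taken from the real MailWasher file format, and whether MailWasher accepts entity-escaped names is an assumption.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

filter_name = "Block <script> & friends"  # made-up filter name

# Hypothetical layout; the real MailWasher schema may look different.
raw_xml = f"<filter><name>{filter_name}</name></filter>"
safe_xml = f"<filter><name>{escape(filter_name)}</name></filter>"

# The raw version is not well-formed XML, so a strict parser rejects it.
try:
    ET.fromstring(raw_xml)
    raw_parses = True
except ET.ParseError:
    raw_parses = False

# The escaped version parses, and the parser hands back the original text.
parsed_name = ET.fromstring(safe_xml).find("name").text
```

escape() turns "&" into "&amp;", "<" into "&lt;" and ">" into "&gt;", so the name above is stored as "Block &lt;script&gt; &amp; friends".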
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 7:22 am
by rusticdog
Is it the latest MTracer on the site that is picked up by AVG?
I have an account with AVG that lets me upload files to be whitelisted, though I don't think they'd appreciate me uploading files from another company.
But maybe I can just email the guy who set me up with the whitelist service.
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 7:24 am
by rusticdog
Wizcrafts wrote:I'm still deciding what to do to recoup my investment of time before I post the new format filters to my website.
I asked The Big Cheese to assist with this; I'll remind him.
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 8:21 am
by stan_qaz
The version shown here as an upgrade:
[Attachment: regex-tracker-1.png]
http://www.regexlab.com/en/mtracer/download.htm
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 5:17 pm
by rusticdog
Original post edited to include a download link for the older version of MTracer that isn't picked up by AVG:
http://s3.amazonaws.com/Firetrust/MTracer.zip
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 5:56 pm
by rusticdog
AVG replied and have removed the MTracer detection; the change should be in the next update in a few hours.
Re: Introduction to regex filters
Posted: Sat Aug 21, 2010 6:03 pm
by Nik
AV doesn't complain about this file. Thanks a lot.

Re: Introduction to regex filters
Posted: Wed Aug 25, 2010 8:32 pm
by rusticdog
Nik wrote:Anyway, I have one question about filters - it seems that filters don't support Unicode set. Specifically, I have made a filter:
'Subject' Contains Plain Text
[Samskrta]
Filter finds following subjects:
Re: [Samskrita] Kalpa Shastra
RE: [Samskrita]
...
but when there is a devanagari text in the subject, the message isn't filtered, for instance:
RE: [Samskrita] आन्तरराष्ट्रीय रामायण परिषत्
[Samskrita] Learning Saanskrit by fresh approach - Lesson 42 संस्कृतभाषायाः नूतनाध्ययनस्य द्विचत्वारिंशः (४२) पाठः ।
...
I've got 4 emails currently in my Inbox that show this, but it seems to be working so far. That said, I'm on a slightly different version than you.
There is a possible issue if the UTF-8 encoding runs across multiple lines and breaks oddly, so I'll keep watching as they arrive.
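One way such a break could go wrong, sketched in Python. This is a guess at the kind of problem meant above, not a description of MailWasher's code: long non-ASCII subjects are sent as several encoded chunks on folded header lines, and if a mailer cuts a chunk in the middle of a multi-byte character, decoding each chunk on its own fails even though the joined bytes are fine. The subject and the cut position are made up for the demonstration.

```python
import base64
from email.header import decode_header

subject = "RE: [Samskrita] रामायण"
raw = subject.encode("utf-8")

# Cut one byte into the first Devanagari character (3 bytes each in UTF-8).
cut = len("RE: [Samskrita] ".encode("utf-8")) + 1
pieces = [raw[:cut], raw[cut:]]

# Build a folded header value from two base64 encoded-words (RFC 2047 style).
folded = "\r\n ".join(
    "=?UTF-8?B?" + base64.b64encode(p).decode("ascii") + "?=" for p in pieces
)

# Decoding the first piece alone fails: it ends in the middle of a character.
try:
    pieces[0].decode("utf-8")
    piece_ok = True
except UnicodeDecodeError:
    piece_ok = False

# Joining the raw bytes of all pieces first, then decoding once, works.
joined = b"".join(
    part if isinstance(part, bytes) else part.encode("ascii")
    for part, _charset in decode_header(folded)
)
decoded = joined.decode("utf-8")
```

RFC 2047 says a multi-byte character should not be split across encoded-words, but headers that break this rule do turn up. A filter that only sees the text after such a failed decode would have nothing like "[Samskrita]" to match against.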