Regex Question - Negative Search

Forum for MailWasher Pro 7 and/or older 2011/2012 versions.
stan_qaz
Omniscient Kiwi
Location: Gilbert, Arizona
Posts: 8671
Joined: Fri Jul 25, 2008 5:13 am

Regex Question - Negative Search

Sun Aug 08, 2010 9:02 pm

I know you are not supposed to use regex to search for something NOT being in a string, but I really want to anyway. :-)

What regex code would I use in a filter set like this: "To, Contains, Regex" when I want it to trigger on all e-mails that don't have stan in the To field in some manner but do contain other information that requires the positive match in the filter settings?

I'll combine the negative condition with some other regex code that is working for me. The goal is to filter out mail coming to a good address that doesn't also have my name on the To line.

Stan <Stan@example.com> would be good mail.

Dufus <Stan@example.com> would be spam.
I am not a Firetrust employee just a MW user.
--
First rule of computer consulting: Sell a customer a Linux computer and you'll eat for a day,
sell a customer a Windows computer and you'll eat for a lifetime.
allangjohnson
Travelling Tuatara
Posts: 48
Joined: Fri Jul 09, 2010 9:15 am

Re: Regex Question - Negative Search

Mon Aug 09, 2010 3:45 am

Stan - I don't know if this is what you mean, but for years I've used a set of protective filters near the top of my list, including one such as you describe. The other lists all the forms of address real people use in emailing me - for you that might be

dear stan|stan,|hi, stan|dear mr. gaz| etc.

Spammers just about never begin emails the way people do.

The filters following these I use to categorize spam. The last filter on the list is a "catch-all" which gets any email that is unprotected, addressed to my email address, and not caught by previous filters.

I almost never have a false positive on legitimate emails using this system.

Does that help?

AJ
QuietOne
Omniscient Kiwi
Location: Texas, USA
Posts: 8337
Joined: Thu Jul 24, 2008 3:40 pm

Re: Regex Question - Negative Search

Mon Aug 09, 2010 4:23 am

Hey Stan,

It's gonna be a tricky regex to write, but here are the basics:

Negated Regexes:
  • The ^ character is the CLASS Negation Operator IF AND ONLY IF, it is
    • Used IN a Class; and
    • Used 1st in that Class (i.e. immediately after the opening square bracket)
  • The CLASS operators are of course the Open "[" and Close "]" square brackets;
  • The ^ "Negates" the list of characters in the Class, meaning (and this is critical) "Match characters that are NOT listed in the class.";
  • So it's NOT a list of characters that you WANT to match;
  • It's a list of characters you DON'T want to match.
So essentially the character class you create in your regex is an Exclusion List. Hope this helps. :thumbsup :mrgreen:
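A short Python sketch (standard re module, case-insensitive matching assumed) of what a negated class does with a sample To value. It tests one character at a time, so it cannot tell the word stan from any other arrangement of the same letters.

```python
import re

# [^stan] is a negated character class: it matches ONE character that is
# not s, t, a or n. It says nothing about the word "stan" as a whole.
not_in_set = re.compile(r"[^stan]", re.IGNORECASE)

# The first character outside the list is the space after the name.
first_hit = not_in_set.search("Stan <Stan@example.com>")

# Words built only from the listed letters contain no match at all,
# whatever order the letters are in.
only_listed = [w for w in ("stan", "nast", "tasn") if not not_in_set.search(w)]
```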
Stealth is the Best Weapon.
15" MacBook Pro, OS X 10.14.6, 16GB RAM, Parallels install went kaput so no Windows version, Mozilla Thunderbird. All T'bird accnts: POP3. iPhone Mail Accounts IMAP. Unsubscrbd DNSBL.
And I'm: Just Another βeta-Tester.
stan_qaz
Omniscient Kiwi
Location: Gilbert, Arizona
Posts: 8671
Joined: Fri Jul 25, 2008 5:13 am

Re: Regex Question - Negative Search

Mon Aug 09, 2010 4:58 am

allangjohnson, I've got a batch of filters similar to the ones you use, but what I'm trying to do is protect a couple of e-mail addresses that are widely known and have been harvested by spammers. The vast majority of the spam to these addresses has either no name preceding the e-mail address on the To line (easily filtered) or a wrong name. Filtering on the wrong name isn't practical, as there are thousands of them. What I want to do is tag the majority that do not include my first name.

QuietOne, I looked at classes and I'd likely see decent accuracy from that, but it would match stan, nast, tasn and so forth too. That might be good enough for me if I limited it to only looking at the name portion of the address, but I was really looking for a way to check for the absence of a specific string.

I've tried (s(?!tan)), which is almost there, as it looks for an s not followed by tan. I've also tried turning that into something that looks at the whole To line with an entire header filter, but haven't gotten it to work: something like (to: (?!stan)), which seems like a start but would also need to accept other leading stuff between the to and stan.
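For anyone following along, here is a rough Python sketch of how those two attempts behave on the sample lines from the first post. It uses the standard re module with case-insensitive matching assumed; MailWasher's own engine may differ, and the third and fourth test lines are made up for illustration.

```python
import re

# An "s" that is not immediately followed by "tan".
s_not_tan = re.compile(r"(s(?!tan))", re.IGNORECASE)
# "to: " where "stan" does not start right after the space.
to_not_stan = re.compile(r"(to: (?!stan))", re.IGNORECASE)

good = "To: Stan <Stan@example.com>"
spam = "To: Dufus <Stan@example.com>"

# Both attempts leave the good line alone and trigger on the spam line.
results = {
    "s_not_tan": (bool(s_not_tan.search(good)), bool(s_not_tan.search(spam))),
    "to_not_stan": (bool(to_not_stan.search(good)), bool(to_not_stan.search(spam))),
}

# Weak spots: any other "s" on the line trips the first pattern, and
# anything between "To: " and the name trips the second one.
other_s = bool(s_not_tan.search("To: Stan <sales@example.com>"))
leading_stuff = bool(to_not_stan.search("To: Mr Stan <Stan@example.com>"))
```

Both patterns separate the two sample lines, but each also fires on a line that does carry the name, which is the "leading stuff" problem described above.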
QuietOne
Omniscient Kiwi
Location: Texas, USA
Posts: 8337
Joined: Thu Jul 24, 2008 3:40 pm

Re: Regex Question - Negative Search

Mon Aug 09, 2010 5:27 am

stan_qaz wrote:QuietOne I looked at classes and I'd likely see decent accuracy from that but it would match stan, nast, tasn and so forth too, might be good enough for me if I limited it to only looking at the name portion of the address but I was really looking for a way to check for the absence of a specific string.

I've tried (s(?!tan)) that is almost there as it looks for an s not followed by a tan and I've tried turning that into something that looks at the whole to line with an entire header filter but haven't gotten it to work, something like (to: (?!stan)) which seems like a start but would need to also accept other leading stuff between the to and stan.
WRT your "(to: (?!stan))" idea ... You could always add a good old fashioned "." before the parenthesis couldn't you??? :devil Just kidding.

In any case, I think you're probably on the right track with the "(s(?!tan))" version or something similar. What you're trying to do is a real toughy! :scratch
allangjohnson
Travelling Tuatara
Posts: 48
Joined: Fri Jul 09, 2010 9:15 am

Re: Regex Question - Negative Search

Mon Aug 09, 2010 6:51 am

For that I use a plain text filter near the top to catch "From" fields that contain both my name AND email address, as in

John Smith
johnsmith@domain.com

Anything caught by that is protected. A filter downstream that catches emails using that address will also be catching emails that don't contain the name. So, the first catches the positive and the negatives are caught by the second as what's left over.
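The ordering described above can be sketched in Python as two checks run in sequence. The function name classify and its return labels are made up for illustration, and the name and address are the placeholder values from this post.

```python
MY_NAME = "john smith"               # placeholder name from the post
MY_ADDRESS = "johnsmith@domain.com"  # placeholder address from the post


def classify(field: str) -> str:
    """Hypothetical two-stage check mirroring the filter order."""
    text = field.lower()
    # Filter 1 (near the top): name AND address both present, so protect it.
    if MY_NAME in text and MY_ADDRESS in text:
        return "protected"
    # Filter 2 (downstream): the address is present. The name must be
    # missing, otherwise filter 1 would already have claimed the message.
    if MY_ADDRESS in text:
        return "spam"
    return "unfiltered"
```

The negative test never has to be written: it falls out of the order in which the two positive tests run.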

AJ
stan_qaz
Omniscient Kiwi
Location: Gilbert, Arizona
Posts: 8671
Joined: Fri Jul 25, 2008 5:13 am

Re: Regex Question - Negative Search

Mon Aug 09, 2010 3:47 pm

I have similar filters for my From field, but that won't work really well for my needs on the To field, since I'm already pulling several groups based on conditions there and this is the last type of message that is not being filtered.

Making it worse, I have several valid addresses that have been harvested from friends who insist on using Windows and got virus infested; changing them will do no good until I get new friends or the old ones get smarter. I need a new mother too: she just borrowed my sister's computer to send a few e-mails, and it harvested her address book and got the dedicated address I created for her to contact me with.

If all else fails I'll have to look at a positive-method filter, but it is going to have a bunch of rules and I'm not looking forward to that at all. My filters are as optimized as possible now (while remaining maintainable), and they are written so the internal optimizer can process them into an even more optimized form. The filters.xml file is up to about 50K now and has a significant impact on mail checking speed as well as on reclassification after filter edits.
stan_qaz
Omniscient Kiwi
Location: Gilbert, Arizona
Posts: 8671
Joined: Fri Jul 25, 2008 5:13 am

Re: Regex Question - Negative Search

Mon Aug 09, 2010 5:37 pm

This post might be part of my problem!

I'm doing a simple filter: I want to find all the messages with "=?UTF-8" in the To: field, but setting the filter so that it reads To, Contains, Plain text, =?UTF-8 does not work.

Changing the filter so that it reads Entire header, Contains, Plain text, To: =?UTF-8 does work.
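One possible explanation, offered only as a guess: a To: header whose raw text starts with =?UTF-8 is an RFC 2047 encoded-word, and a mail program may decode it before filling in the To field, so the literal =?UTF-8 would only be visible in the raw header. The Python sketch below illustrates that difference with the standard email.header module. The sample header is made up, and nothing here confirms what MailWasher actually does.

```python
from email.header import decode_header, make_header

# Made-up raw header value: the display name "Stan" as a base64 encoded-word.
raw_to = "=?UTF-8?B?U3Rhbg==?= <myaccount@example.com>"

# What a program would see after decoding the encoded-word.
decoded_to = str(make_header(decode_header(raw_to)))

# A plain-text search for the marker succeeds on the raw text only.
found_in_raw = "=?UTF-8" in raw_to
found_in_decoded = "=?UTF-8" in decoded_to
```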
stan_qaz
Omniscient Kiwi
Location: Gilbert, Arizona
Posts: 8671
Joined: Fri Jul 25, 2008 5:13 am

Re: Regex Question - Negative Search

Mon Aug 09, 2010 6:04 pm

This appears to be working: Entire header, Contains, Regex

To:.*(\s(?!(stan))).*((<.+@example.info>)|(<myaccount@example.com>))

Broken out, that comes to roughly this (pay attention to the parentheses, as they group the individual operations):

To: Looks for the To: in order to select the correct line. It does not seem to be sensitive to the Delivered-To line some servers add, but if that is a problem you could specify a start of line.

.* Zero or more characters.

(\s A whitespace character: the space after the To: (the .* above will give that back if it is needed to make the regex match) or any other space on the line.

(?!(stan))) This is the regex lookahead function in negative mode. It is grouped with the \s above so that it checks that "stan" does not follow a space on the line ahead of the e-mail addresses. The final parenthesis closes the group opened before the \s.

.* Zero or more characters.

((<.+@example.info>) The angle bracket followed by at least one character, the @, my domain name, and then the trailing angle bracket. This is intended to match any address at that domain.

| The logical OR operator.

(<myaccount@example.com>)) The angle bracket followed by my single address at a different domain name, then the trailing angle bracket.
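As a rough check, the breakdown above can be written as a commented pattern for Python's re module. This is only a sketch: it assumes case-insensitive matching and closes the group around \s and the lookahead so that the pattern compiles. MailWasher's own regex engine, and the exact header text it feeds to the filter, may behave differently.

```python
import re

posted = re.compile(
    r"""
    To:                            # select the To: line
    .*                             # zero or more characters
    (\s(?!(stan)))                 # whitespace NOT followed by "stan"
    .*                             # zero or more characters
    (
        (<.+@example.info>)        # any address at the first domain
        |                          # OR
        (<myaccount@example.com>)  # the single address at the other domain
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)

wrong_name = bool(posted.search("To: Dufus <myaccount@example.com>"))
one_space = bool(posted.search("To: Stan<myaccount@example.com>"))
two_spaces = bool(posted.search("To: Stan <myaccount@example.com>"))
```

In Python the lookahead only blocks a match when every space ahead of the address is followed by stan. The one_space line is rejected, but the two_spaces line has a second space in front of the angle bracket, and that space satisfies (?!(stan)), so the pattern still matches it.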
stan_qaz
Omniscient Kiwi
Location: Gilbert, Arizona
Posts: 8671
Joined: Fri Jul 25, 2008 5:13 am

Re: Regex Question - Negative Search

Tue Aug 10, 2010 8:05 pm

Well, I gave this filter a good test: I ran 10,000 messages through MW and found very few that it got wrong. Looking at the messages, it is catching all of the form:

Code:

Stan <myaccount@example.com>
But it is failing to catch ones of the form (with single or double quotes added around the name portion of the line):

Code:

"Stan" <myaccount@example.com>

'Stan' <myaccount@example.com>
I've tried several solutions to this. The one I thought would work was adding a "Custom Defined Character Set" in place of the \s after the first opening parenthesis. I thought that the single and double quotes were both legal there, but they are not working. The hex \x22 and \x27 and the Unicode \u0022 and \u0027 forms are not working either. Using the \s in a set alone does work, as does using a space.

Code:

To:.*(\s(?!(stan))

To:.*([\s'"](?!(stan))

To:.*([\s\x22\x27](?!(stan))

To:.*([\s\u0022\u0027](?!(stan))
So my question is what do I need to do to trigger a match on the two quote characters when they are found on the To: line in header contains regex mode?

Literal, hex and Unicode matches do not seem to be working. I did a simple regex filter looking for the double-quote ["] in a set and it finds it in many locations but not on the To: line.

Are the two quotes, when found on the To: line, being altered from what is in the message source (and shown in the source tab) before being fed to the filter?

Edit: I thought that this had started working for a few minutes but then realized I was looking at the wrong account. It is still missing the single and double quotes and failing the match.
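To compare the four attempts outside MailWasher, here is a small sketch for Python's re module. The group is closed so each fragment compiles, the address alternation from the earlier filter is appended with its dots escaped, and case-insensitive matching is assumed. The anchored pattern is a hypothetical alternative, not something tested in MailWasher.

```python
import re

ADDRESSES = r"(<.+@example\.info>|<myaccount@example\.com>)"

# The four class variants from the post, with the group closed.
variants = [
    r"To:.*(\s(?!(stan)))",
    r"To:.*([\s'\"](?!(stan)))",
    r"To:.*([\s\x22\x27](?!(stan)))",
    r"To:.*([\s\u0022\u0027](?!(stan)))",
]

quoted = [
    'To: "Stan" <myaccount@example.com>',
    "To: 'Stan' <myaccount@example.com>",
]

# Does every variant still match every quoted-name line?
still_flagged = all(
    re.search(v + ".*" + ADDRESSES, line, re.IGNORECASE)
    for v in variants
    for line in quoted
)

# Hypothetical alternative: one lookahead placed right after To: that scans
# the name portion (everything before "<") for the whole word "stan".
anchored = re.compile(
    r"^To:(?![^<\r\n]*\bstan\b).*" + ADDRESSES,
    re.IGNORECASE | re.MULTILINE,
)
quoted_protected = all(anchored.search(line) is None for line in quoted)
```

In Python all four classes accept the quote characters, yet every variant still matches the quoted lines, because the plain space in front of the quote is itself not followed by stan. The anchored version checks the whole name portion in one go, so quoting no longer matters, and stopping at the angle bracket keeps a stan inside the address itself from counting as the name.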