I know you're not supposed to use regex to search for something NOT being in a string, but I really want to anyway. :-)
What regex would I use in a filter set up like this: "To, Contains, Regex", when I want it to trigger on all e-mails that don't have stan in the To field in some manner but do contain other information that requires the positive match in the filter settings?
I'll combine the negative condition with some other regex code that is working for me. The goal is to filter out mail coming to a good address that doesn't also have my name on the To line.
Stan <Stan@example.com> would be good mail.
Dufus <Stan@example.com> would be spam.
Regex Question - Negative Search
- stan_qaz
- Omniscient Kiwi
- Location: Gilbert, Arizona
I am not a Firetrust employee just a MW user.
--
First rule of computer consulting: Sell a customer a Linux computer and you'll eat for a day,
sell a customer a Windows computer and you'll eat for a lifetime.
- allangjohnson
- Travelling Tuatara
Re: Regex Question - Negative Search
Stan, I don't know if this is what you mean, but for years I've used a set of protective filters near the top of my list, including one like the one you describe. Another one lists all the forms of address real people use when emailing me; for you that might be
dear stan|stan,|hi, stan|dear mr. gaz| etc.
Spammers just about never begin emails the way people do.
The filters following these I use to categorize spam. The last filter on the list is a "catch-all" which gets any email that is unprotected, addressed to my email address, and not caught by previous filters.
I almost never have a false positive on legitimate emails using this system.
Does that help?
AJ
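As a rough illustration of the greeting-list idea, here is a sketch using Python's `re` module (an assumption: MailWasher's regex engine may differ in details). The alternation is AJ's list with two additions that are not his: the dot in "mr." is escaped, because a bare "." matches any character, and the list is preceded by `\b` so that "stan," does not fire inside a word such as "Pakistan,". The names `GREETINGS` and `has_personal_greeting` are hypothetical.

```python
import re

# AJ's greeting list as one alternation; "\." matches only a literal dot in "mr.".
# The leading \b (an addition, not part of AJ's list) stops "stan," from
# matching inside a longer word such as "Pakistan,".
GREETINGS = re.compile(r"\b(?:dear stan|stan,|hi, stan|dear mr\. gaz)", re.IGNORECASE)


def has_personal_greeting(body: str) -> bool:
    """Return True if the message body contains one of the listed greetings."""
    return GREETINGS.search(body) is not None


if __name__ == "__main__":
    print(has_personal_greeting("Hi, Stan\nAre you free Friday?"))        # True
    print(has_personal_greeting("Dear valued customer,"))                 # False
    print(has_personal_greeting("Shipping to Pakistan, Peru and Chile"))  # False
```

In AJ's scheme a match like this would sit in a protective filter near the top, so anything it catches never reaches the spam filters below it.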
- QuietOne
- Omniscient Kiwi
- Location: Texas, USA
Re: Regex Question - Negative Search
Hey Stan,
It's gonna be a tricky regex to write, but here are the basics:
Negated Regexes:
- The ^ character is the CLASS Negation Operator IF AND ONLY IF it is:
  - Used IN a Class; and
  - Used 1st in that Class (i.e. immediately after the opening square bracket).
- The CLASS operators are of course the Open "[" and Close "]" square brackets.
- The ^ "Negates" the list of characters in the Class, meaning (and this is critical) "Match a single character that is NOT listed in the class."
- So it's NOT a list of characters that you WANT to match;
- It's a list of characters you DON'T want to match.
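To make the class behaviour concrete, here is a small sketch with Python's `re` module (assuming Python's engine treats classes the way MailWasher's does, which is not verified here). A class always matches exactly one character, so `[stan]` is not the word "stan", and `[^stan]` means "one character other than s, t, a or n".

```python
import re

# A class is a set of single characters, so [stan]+ accepts any mix of those
# letters, including the permutations "nast" and "tasn".
for word in ("stan", "nast", "tasn"):
    assert re.fullmatch(r"[stan]+", word)

# With ^ first inside the brackets, the class matches one character that is
# NOT s, t, a or n.
print(re.findall(r"[^stan]", "stanley"))  # ['l', 'e', 'y']

# Outside a class, ^ is not negation at all: it anchors to the start of the string.
print(re.findall(r"^s", "stan sam"))  # ['s'] (only the leading s)
```

This is why a class on its own can only say which letters may or may not appear, not whether a particular word is absent.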
Stealth is the Best Weapon.
15" MacBook Pro, OS X 10.14.6, 16GB RAM, Parallels install went kaput so no Windows version, Mozilla Thunderbird. All T'bird accnts: POP3. iPhone Mail Accounts IMAP. Unsubscrbd DNSBL.
And I'm: Just Another βeta-Tester.
- stan_qaz
- Omniscient Kiwi
- Location: Gilbert, Arizona
Re: Regex Question - Negative Search
allangjohnson, I've got a batch of filters similar to the ones you use, but what I'm trying to do is protect a couple of e-mail addresses that are widely known and have been harvested by spammers. The vast majority of the spam to these addresses has either no name preceding the e-mail address on the To line (easily filtered) or a wrong name. Filtering on the wrong name isn't practical, as there are thousands of them. What I want to do is tag the majority that do not include my first name.
QuietOne, I looked at classes, and I'd likely see decent accuracy from them, but a class would match stan, nast, tasn and so forth too. That might be good enough if I limited it to looking only at the name portion of the address, but I was really looking for a way to check for the absence of a specific string.
I've tried (s(?!tan)), which is almost there: it looks for an s not followed by tan. I've also tried turning that into something that looks at the whole To line with an Entire header filter, but haven't gotten it to work. Something like (to: (?!stan)) seems like a start, but it would also need to accept other leading text between the To: and stan.
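A lookahead only tests the position where it sits, which is why (to: (?!stan)) stops short. The sketch below uses Python's `re` module (assumed, not verified, to behave like MailWasher's engine here; case-insensitive matching is also an assumption) to contrast checking one position with putting `.*` inside the lookahead so it scans the rest of the line. The names `single_spot` and `rest_of_line` are hypothetical.

```python
import re

# s(?!tan): an "s" that is not immediately followed by "tan".
print(re.findall(r"s(?!tan)", "stan sam"))  # ['s'] (the s in "sam")

# to: (?!stan) only checks the character right after "to: ", so a leading
# quote (or any other text) in front of the name lets it match.
single_spot = re.compile(r"to: (?!stan)", re.IGNORECASE)
print(bool(single_spot.search("To: Stan <myaccount@example.com>")))    # False
print(bool(single_spot.search('To: "Stan" <myaccount@example.com>')))  # True

# Moving .* inside the lookahead makes it scan the rest of the line instead.
rest_of_line = re.compile(r"^to:(?!.*stan)", re.IGNORECASE | re.MULTILINE)
print(bool(rest_of_line.search('To: "Stan" <myaccount@example.com>')))  # False
print(bool(rest_of_line.search("To: Dufus <myaccount@example.com>")))   # True
```

Note that this version scans the whole line, address included, so a spam line such as Dufus <Stan@example.com> would slip through; restricting the scan to the text before "<" avoids that.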
- QuietOne
- Omniscient Kiwi
- Location: Texas, USA
Re: Regex Question - Negative Search
WRT your "(to: (?!stan))" idea ... You could always add a good old fashioned "." before the parenthesis couldn't you??? Just kidding.
stan_qaz wrote:
QuietOne I looked at classes and I'd likely see decent accuracy from that but it would match stan, nast, tasn and so forth too, might be good enough for me if I limited it to only looking at the name portion of the address but I was really looking for a way to check for the absence of a specific string.
I've tried (s(?!tan)) that is almost there as it looks for an s not followed by a tan and I've tried turning that into something that looks at the whole to line with an entire header filter but haven't gotten it to work, something like (to: (?!stan)) which seems like a start but would need to also accept other leading stuff between the to and stan.
In any case, I think you're probably on the right track with the "(s(?!tan))" version or something similar. What you're trying to do is a real toughy!
- allangjohnson
- Travelling Tuatara
Re: Regex Question - Negative Search
For that I use a plain text filter near the top to catch "From" fields that contain both my name AND email address, as in
John Smith
johnsmith@domain.com
Anything caught by that is protected. A filter further down that catches emails using that address will then be catching only the emails that don't contain the name. So the first filter catches the positives, and the negatives are caught by the second as what's left over.
AJ
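The ordering described above (a protective filter first, a catch-all for the same address below it) relies on the first matching filter winning. Here is a minimal Python sketch of that idea under two assumptions: filters are tried top to bottom and the first match wins, and the check is applied to the To header, which is what Stan is filtering on (AJ describes it for From). The `classify` function and the rule labels are hypothetical; the name and address are AJ's placeholders.

```python
from typing import Callable, List, Optional, Tuple

# A rule is a label plus a test on the header text (hypothetical structure).
Rule = Tuple[str, Callable[[str], bool]]

NAME = "john smith"
ADDRESS = "johnsmith@domain.com"

RULES: List[Rule] = [
    # Protective filter near the top: name AND address both present.
    ("protected", lambda to: NAME in to.lower() and ADDRESS in to.lower()),
    # Downstream catch-all on the same address: only sees what the first rule let through.
    ("suspect", lambda to: ADDRESS in to.lower()),
]


def classify(to_header: str, rules: List[Rule] = RULES) -> Optional[str]:
    """Return the label of the first rule that matches, or None if none do."""
    for label, test in rules:
        if test(to_header):
            return label
    return None


if __name__ == "__main__":
    print(classify("John Smith <johnsmith@domain.com>"))  # protected
    print(classify("Dufus <johnsmith@domain.com>"))       # suspect
    print(classify("Someone <other@example.org>"))        # None
```

Neither rule needs a negative match: the "doesn't contain the name" condition comes entirely from the order of the rules.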
- stan_qaz
- Omniscient Kiwi
- Location: Gilbert, Arizona
Re: Regex Question - Negative Search
I have similar filters for my From field, but that approach won't work well for my needs on the To field, since I'm already pulling several groups based on conditions there and this is the last type of message that isn't being filtered.
Making it worse, I have several valid addresses that have been harvested from friends who insist on using Windows and got virus-infested; changing them will do no good until I get new friends or the old ones get smarter. I need a new mother too: she just borrowed my sister's computer to send a few e-mails, and it harvested her address book and got the dedicated address I created for her to contact me with.
If all else fails I'll have to look at a positive-method filter, but it is going to have a bunch of rules and I'm not looking forward to that at all. My filters are as optimized as possible now (while remaining maintainable), and they are written so the internal optimizer can process them into an even more optimized form. The filters.xml file is up to about 50K now and has a significant impact on mail-checking speed as well as on reclassification after filter edits.
- stan_qaz
- Omniscient Kiwi
- Location: Gilbert, Arizona
Re: Regex Question - Negative Search
This post might be part of my problem!
I'm doing a simple filter: I want to find all the messages with "=?UTF-8" on the To: line, but setting the filter to To, Contains, Plain text, =?UTF-8 does not work.
Changing the filter to Entire header, Contains, Plain text, To: =?UTF-8 does work.
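One possible explanation, which is an assumption and not something confirmed for MailWasher: "=?UTF-8?..." is an RFC 2047 encoded-word, and a program that decodes the To field before matching would never see the literal "=?UTF-8" there, while the raw header text still contains it. Python's standard `email.header` module shows the difference; the sample header is made up (the base64 text U3Rhbg== is just "Stan").

```python
from email.header import decode_header, make_header

# Raw To: value as it would appear in the message source (made-up sample).
raw_to = "=?UTF-8?B?U3Rhbg==?= <myaccount@example.com>"

# Decode the RFC 2047 encoded-word into ordinary text.
decoded_to = str(make_header(decode_header(raw_to)))

print(decoded_to)               # Stan <myaccount@example.com>
print("=?UTF-8" in raw_to)      # True: what an Entire header search sees
print("=?UTF-8" in decoded_to)  # False: what a decoded To field would show
```

If MailWasher does work this way, the To field and the Entire header text are not the same input, which would also matter for any regex aimed at the To line.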
- stan_qaz
- Omniscient Kiwi
- Location: Gilbert, Arizona
Re: Regex Question - Negative Search
This appears to be working: Entire header, Contains, Regex
To:.*(\s(?!(stan)).*((<.+@example.info>)|(<myaccount@example.com>))
Broken out, that comes to roughly this (pay attention to the ( ), as they group individual operations):
To: Looks for the To: in order to select the correct line. It does not seem to be sensitive to the Delivered-To line some servers add, but if that is a problem you could anchor it to the start of the line.
.* Zero or more characters.
(\s The space after the To: (the .* above will give that back if it is needed to make the regex match), or any other whitespace on the line.
(?!(stan)) A negative lookahead. It is grouped with the line above, so it checks that "stan" does not follow a space on the line ahead of the e-mail addresses.
.* Zero or more characters.
((<.+@example.info>) The opening angle bracket, at least one character, the @ and my domain name, then the closing angle bracket. This is intended to match any address at that domain.
| The logical OR operator.
(<myaccount@example.com>)) The opening angle bracket, my single address at a different domain, then the closing angle bracket.
- stan_qaz
- Omniscient Kiwi
- Location: Gilbert, Arizona
Re: Regex Question - Negative Search
Well, I gave this filter a good test: I ran 10,000 messages through MW and found very few that it got wrong. Looking at the messages, it is catching all of the form:
    Stan <myaccount@example.com>
But it is failing to catch ones of the form (single or double quotes added around the name portion of the line):
    "Stan" <myaccount@example.com>
    'Stan' <myaccount@example.com>
I've tried several solutions to this. The one I thought would work was replacing the \s after the first opening parenthesis with a custom-defined character set. I thought the single and double quotes were both legal there, but they are not working. Neither are the hex escapes \x22 and \x27 or the Unicode escapes \u0022 and \u0027. Using \s alone in a set does work, as does using a space. The start of the regex, as originally written and with the variations I tried:
    To:.*(\s(?!(stan))
    To:.*([\s'"](?!(stan))
    To:.*([\s\x22\x27](?!(stan))
    To:.*([\s\u0022\u0027](?!(stan))
So my question is: what do I need to do to trigger a match on the two quote characters when they are found on the To: line in Entire header, Contains, Regex mode?
Literal, hex and Unicode matches do not seem to be working. I did a simple regex filter looking for the double quote ["] in a set, and it finds it in many locations but not on the To: line.
Are the two quotes on the To: line being altered from what is in the message source (as seen in the Source tab) before the line is fed to the filter?
Edit: I thought this had started working for a few minutes, but then realized I was looking at the wrong account; it is still missing the single and double quotes and failing the match.
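Here is a sketch of how the quote question behaves in Python's `re` module. This assumes Python resembles MailWasher's engine; MailWasher may also be altering the header before the filter sees it, as the Source tab question suggests. In Python, all three spellings of the set match a double quote. Even so, the set version still matches a quoted name, because several separators on that line (the space after "To:", the closing quote, the space before "<") are not followed by "stan", and the lookahead succeeds at any one of them. Adding characters to the set only adds places where the lookahead can succeed; it never removes one. The fragment is a simplified, balanced version of the start of the regex, not the full filter.

```python
import re

# All three spellings of the set accept both quote characters in Python's engine.
for char_set in (r"""[\s'"]""", r"[\s\x22\x27]", r"[\s\u0022\u0027]"):
    assert re.fullmatch(char_set, '"')
    assert re.fullmatch(char_set, "'")

# Simplified start of the filter with the quote-aware set (balanced parentheses).
fragment = re.compile(r"""To:.*[\s'"](?!stan)""", re.IGNORECASE)
quoted = 'To: "Stan" <myaccount@example.com>'

print(fragment.search(quoted) is not None)  # True: the quoted name is still matched

# Positions where a separator is NOT followed by "stan": each one is enough.
spots = [m.start() for m in re.finditer(r"""[\s'"](?!stan)""", quoted, re.IGNORECASE)]
print(spots)  # [3, 9, 10]
```

Only the opening quote at position 4 is excluded. This points toward a single lookahead from a fixed anchor rather than a longer list of separators.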