Regex syntax  

Author: -
Posted: 2005-8-8 14:07:00

I have managed to form the regex for the following two:

CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>

String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";

LWS = [CRLF] 1*( SP | HT )

String LWS_REGEX = "((\r\n)??( |\\x09)+?)";


However, the following stumped me for hours.

TEXT = <any OCTET except CTLs, but including LWS>


String TEXT_REGEX = ...... // help me out please.
 
Author: -
Posted: 2005-8-8 14:30:00

- wrote:
> I have managed to form the regex for the following two:
>
> CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
>
> String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";
>
> LWS = [CRLF] 1*( SP | HT )
>
> String LWS_REGEX = "((\r\n)??( |\\x09)+?)";
>
>
> However, the following stumped me for hours.
>
> TEXT = <any OCTET except CTLs, but including LWS>
>
>
> String TEXT_REGEX = ...... // help me out please.

Kindly disregard.
 
Author: Lasse Reichstein Nielsen
Posted: 2005-8-8 14:40:00

- <email***@***.com> writes:

> String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";

Too many square brackets. Just use "[\\x00-\\x1f\\x7f]"
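A minimal Java sketch of the single character class suggested above, compiled once and used both to test one character and to scan a string for any CTL. The class and method names (CtlCheck, isCtl, containsCtl) are hypothetical, chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the pattern is the one suggested in the reply.
class CtlCheck {
    // Octets 0-31 plus DEL (127) in a single character class.
    static final Pattern CTL = Pattern.compile("[\\x00-\\x1f\\x7f]");

    // True if the single character c is a CTL.
    static boolean isCtl(char c) {
        return CTL.matcher(String.valueOf(c)).matches();
    }

    // True if any character of s is a CTL. find() looks anywhere in s;
    // matches() would require s to be exactly one CTL.
    static boolean containsCtl(String s) {
        return CTL.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isCtl('\t'));            // true (HT is 0x09)
        System.out.println(isCtl('\u007f'));        // true (DEL)
        System.out.println(isCtl('A'));             // false
        System.out.println(containsCtl("abc\r\n")); // true
    }
}
```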

> LWS = [CRLF] 1*( SP | HT )

I'm not absolutely sure how to read this notation, so I'm guessing
it means one carriage return/line feed pair followed by one or more
spaces or horizontal tabs.

> String LWS_REGEX = "((\r\n)??( |\\x09)+?)";

Why two question marks? And the backslashes might want to be escaped
too. It would look more like
"\\r\\n[\\x20\\x09]+"

(is it mail header format or something like that? :)
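The rules quoted in this thread (CTL, LWS, TEXT, CHAR, token) read like the HTTP/1.1 basic rules in RFC 2616, section 2.2. Assuming that is the source, its augmented BNF uses square brackets for optional elements, so [CRLF] means the CRLF may be omitted, which is what the original "??" was expressing. A sketch under that assumption (the class name LwsCheck is hypothetical). It also shows that "\r\n" and "\\r\\n" in the Java string accept the same input: the first puts real CR and LF characters into the pattern, which the regex engine matches literally; the second hands the escapes to the regex engine.

```java
import java.util.regex.Pattern;

// Hypothetical helper; assumes [CRLF] is optional, as in RFC 2616's notation.
class LwsCheck {
    // Optional CRLF, then one or more SP (0x20) or HT (0x09).
    static final Pattern LWS = Pattern.compile("(?:\\r\\n)?[\\x20\\x09]+");

    static boolean isLws(String s) {
        return LWS.matcher(s).matches();
    }

    // Compares the two spellings of CRLF on the same input.
    static boolean spellingsAgree(String input) {
        return Pattern.matches("\r\n", input) == Pattern.matches("\\r\\n", input);
    }

    public static void main(String[] args) {
        System.out.println(isLws("\r\n  "));   // true: CRLF then spaces
        System.out.println(isLws("\t"));       // true: CRLF omitted
        System.out.println(isLws("\r\n"));     // false: needs at least one SP or HT
        System.out.println(Pattern.matches("\r\n", "\r\n"));   // true
        System.out.println(Pattern.matches("\\r\\n", "\r\n")); // true
    }
}
```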

> However, the following stumped me for hours.
>
> TEXT = <any OCTET except CTLs, but including LWS>

LWS is not an octet, so how much do you want to match?

How about:
"[^\\x00-\\x1f\\x7f]|\\r\\n[\\x20\\x09]+"
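A sketch of matching a whole run of TEXT (zero or more TEXT units), again assuming [CRLF] is optional. Instead of alternating between a single non-CTL character and a full LWS run, which overlap on SP and can backtrack badly when a long run of spaces ends in a CTL, this variant allows HT (0x09) in the single-character class and treats only the CRLF-plus-SP/HT fold specially. It accepts the same strings as repeating the alternation; characters above 0x7F pass, since OCTET covers 0-255. The class name TextCheck is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for a run of TEXT, assuming [CRLF] in LWS is optional.
class TextCheck {
    // Each element is either a single non-CTL character (HT, 0x09, is allowed
    // too, because LWS may be just SP/HT), or a CRLF immediately followed by
    // SP or HT (folded LWS). The two alternatives never start with the same
    // character, so the engine has nothing to backtrack over.
    static final Pattern TEXT = Pattern.compile(
            "(?:[^\\x00-\\x08\\x0a-\\x1f\\x7f]|\\r\\n[\\x20\\x09])*");

    static boolean isText(String s) {
        return TEXT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isText("Hello, world"));     // true
        System.out.println(isText("folded\r\n\tline")); // true: CRLF + HT
        System.out.println(isText("tab\there"));        // true: bare HT is LWS
        System.out.println(isText("bad\r\nline"));      // false: CRLF not followed by SP/HT
        System.out.println(isText("bell\u0007"));       // false: BEL is a CTL
    }
}
```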

/L
--
Lasse Reichstein Nielsen - email***@***.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
 
 
Author: -
Posted: 2005-8-8 15:27:00

Lasse Reichstein Nielsen wrote:
> - <email***@***.com> writes:
>
>
>>String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";
>
>
> Too many square brackets. Just use "[\\x00-\\x1f\\x7f]"
>
>
>>LWS = [CRLF] 1*( SP | HT )
>
>
> I'm not absolutely sure how to read this notation, so I'm guessing
> it means one carriage return/line feed pair followed by one or more
> spaces or horizontal tabs.
>
>
>>String LWS_REGEX = "((\r\n)??( |\\x09)+?)";
>
>
> Why two question marks? And the backslashes might want to be escaped
> too. It would look more like
> "\\r\\n[\\x20\\x09]+"
>
> (is it mail header format or something like that? :)
>
>
>>However, the following stumped me for hours.
>>
>>TEXT = <any OCTET except CTLs, but including LWS>
>
>
> LWS is not an octet, so how much do you want to match?
>
> How about:
> "[^\\x00-\\x1f\\x7f]|\\r\\n[\\x20\\x09]+"
>
> /L

Thanks... One more question:

token = 1*<any CHAR except CTLs>


As corrected, CTL is ([\\x00-\\x1f\\x7f])

CHAR = <any US-ASCII character (octets 0 - 127)>

So it's CHAR = "([\\x00-\\x7F])";

I tried

String regex = "[([\\x00-\\x7F])&&[^([\\x00-\\x1f\\x7f])]]";

and then tested "\u007f".matches(regex); it returns true, which is
obviously wrong.
 
 
Author: Lasse Reichstein Nielsen
Posted: 2005-8-9 7:41:00

- <email***@***.com> writes:

> I tried
>
> String regex = "[([\\x00-\\x7F])&&[^([\\x00-\\x1f\\x7f])]]";

You are guessing blindly now. Good thing it didn't appear to work.
Do read up on the format of regular expressions before trying that
again :)
<URL:http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html>

CHAR except CTL would be the characters 0x20-0x7e, which is most easily
written directly:
"[\\x20-\\x7e]+"
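A sketch of token as quoted in this thread, plus a stricter variant. Assuming the grammar is RFC 2616's, its full definition is 1*<any CHAR except CTLs or separators>, where the separators are ( ) < > @ , ; : \ " / [ ] ? = { } SP HT; since SP is a separator, that range starts at 0x21. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for the token rule.
class TokenCheck {
    // As quoted here: one or more CHARs (0-127) that are not CTLs, i.e. 0x20-0x7e.
    static final Pattern TOKEN = Pattern.compile("[\\x20-\\x7e]+");

    // Stricter variant, assuming RFC 2616: 0x21-0x7e minus the separators.
    // Inside the nested class, backslash, '[' and ']' are escaped for the regex.
    static final Pattern RFC2616_TOKEN =
            Pattern.compile("[\\x21-\\x7e&&[^()<>@,;:\\\\\"/\\[\\]?={}]]+");

    static boolean isToken(String s) {
        return TOKEN.matcher(s).matches();
    }

    static boolean isRfc2616Token(String s) {
        return RFC2616_TOKEN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isToken("text/html"));        // true
        System.out.println(isToken("a\u007fb"));         // false: DEL
        System.out.println(isRfc2616Token("gzip"));      // true
        System.out.println(isRfc2616Token("text/html")); // false: '/' is a separator
    }
}
```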

> and then tested "\u007f".matches(regex); it returns true, which is
> obviously wrong.

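It's what you asked for, even if the "true" is surprising. Inside a
character class, "(" and ")" are ordinary characters, so the string is
a valid regular expression: one big class. In Java's implementation the
"^" in "[^([\\x00-\\x1f\\x7f])]" negates only "(" and ")", while the
nested [\\x00-\\x1f\\x7f] is added back in, so DEL (0x7f) survives the
intersection. (Java 9 and later apply the "^" to nested classes too.)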
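For comparison, the intersection the attempt was reaching for can be written with Java's && operator, without parentheses and without a "^" in front of a class that contains a nested class. A sketch (names hypothetical) that also checks it accepts exactly the same characters as the direct range [\\x20-\\x7e] over 0..255:

```java
import java.util.regex.Pattern;

// Hypothetical helper comparing two ways of writing "CHAR except CTL".
class IntersectionCheck {
    // CHAR (0x00-0x7f) intersected (&&) with a nested class of non-CTLs.
    // No parentheses: inside [...] they would just be literal characters.
    static final Pattern CHAR_EXCEPT_CTL =
            Pattern.compile("[\\x00-\\x7f&&[^\\x00-\\x1f\\x7f]]");

    // The direct range from the reply.
    static final Pattern DIRECT = Pattern.compile("[\\x20-\\x7e]");

    static boolean isCharExceptCtl(char c) {
        return CHAR_EXCEPT_CTL.matcher(String.valueOf(c)).matches();
    }

    // True if both patterns accept exactly the same characters in 0..255.
    static boolean sameAsDirectRange() {
        for (char c = 0; c <= 255; c++) {
            String s = String.valueOf(c);
            if (CHAR_EXCEPT_CTL.matcher(s).matches() != DIRECT.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isCharExceptCtl('\u007f')); // false
        System.out.println(isCharExceptCtl('A'));      // true
        System.out.println(sameAsDirectRange());       // true
    }
}
```

The direct range is still the simpler choice; && earns its keep when the set to remove is not a single contiguous block.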

/L
 
 
Author: Roedy Green
Posted: 2005-8-14 16:08:00

On Mon, 08 Aug 2005 14:07:02 +0800, - <email***@***.com> wrote or
quoted :

>I have managed to form the regex for the following two:

My regex cheat sheet might help you. See
http://mindprod.com/jgloss/regex.html