Regular expressions question!  
Rio





Posted: 2004-10-27 18:00:00

java-programmer >> Regular expressions question!

I'm trying to parse HTML tables, so I need to find the following pieces of text:

<table .....anything between this ..... /table> and

<tr ........ /tr> and

of course

<td> 'part that I need' </td>


I do not know how to put that into a regular expression!

Thanks!


 
Rio





Posted: 2004-10-27 23:38:00

I've been doing some reading on regular expressions, and as I understand it
this regular expression should work:

<table.*table>

as it means: find the literal <table, followed by any character zero or more
times, followed by the literal table>

But it doesn't. Can anybody offer some help?


 
Rio





Posted: 2004-10-27 23:40:00

To make matters worse, I found an online applet that can be used to check
regular expressions, and the above example works there but not in my
program. Irritating.


 
 
Paul Lutus





Posted: 2004-10-28 4:37:00

Rio wrote:

> I've been doing some reading on regular expressions and in my
> understanding this regular expression should work
>
> <table.*table>

In what code, using what classes, using what methods?

>
> as it means find literal <table followed by any character not at all or
> as many times, followed by literal table>
>
> but it doesn't,

Define "It doesn't." Leave nothing to the imagination. Post your code, post
your regular expression, post the example text, post the result you got
when you ran the program against the text, everything.

> anybody can offer some help?

You first.

--
Paul Lutus
http://www.arachnoid.com

 
 
Paul Lutus





Posted: 2004-10-28 4:38:00

Rio wrote:

> To make matters worse I found an online applet that can be used to check
> your regular expressions and the above example works there but not in my
> program.

You mean the program you didn't post?

> Irritating.

I'll say.

--
Paul Lutus
http://www.arachnoid.com

 
 
Rio





Posted: 2004-10-28 16:36:00

OK, first, pardon my misconduct on this group; I'm not a regular user and
I don't know the exact code of conduct!

The problem was solved in the meantime. What I am trying to do is parse HTML
so as to extract the data from each individual table cell on a particular
web page, along with the cell number, row number and table number.

Below is the Tablica class, which holds the methods that perform the task.
The first method called is pronadjiTablice (meaning findTables). It is fed
the StringBuffer ulaznaStranica (meaning incomingPage), which is built with
the BufferedReader.readLine and StringBuffer.append methods.
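The reading step is only described above, not posted, so here is a rough sketch of how such a StringBuffer could be built. The class name PageReader, the method readPage and the sample HTML are made up for illustration. One detail that matters for the regular expressions: readLine drops the line terminators, so the page ends up as one long line. In java.util.regex the dot does not match line terminators unless Pattern.DOTALL is set, so a page kept with its line breaks would probably need that flag.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper, not part of the posted Tablica class.
public class PageReader {

    // Reads everything from the reader into one StringBuffer.
    // readLine() strips line terminators, so the result has no line breaks.
    static StringBuffer readPage(BufferedReader reader) throws IOException {
        StringBuffer page = new StringBuffer();
        String line;
        while ((line = reader.readLine()) != null) {
            page.append(line);
        }
        return page;
    }

    public static void main(String[] args) throws IOException {
        // Sample input standing in for a downloaded web page (assumption).
        String html = "<table>\n<tr><td>a</td></tr>\n</table>\n";
        StringBuffer page = readPage(new BufferedReader(new StringReader(html)));
        System.out.println(page); // prints <table><tr><td>a</td></tr></table>
    }
}
```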

Now the regular expression I was looking for was:
<table.*?table>
I needed the reluctant quantifier *? instead of just * because the greedy
<table.*table> would eat the whole String and find only one match on a page
that contained multiple tables!
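To make the difference between the two quantifiers concrete, here is a small self-contained sketch. The class GreedyVsReluctant, the helper findAll and the sample page are invented for this illustration and are not part of the program described in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: compares the greedy and the reluctant pattern.
public class GreedyVsReluctant {

    // Returns every substring of input that the regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up page with two tables on a single line.
        String page = "<table>A</table><p>x</p><table>B</table>";

        // Greedy: .* runs to the end and backtracks to the LAST "table>",
        // so both tables are swallowed by one match.
        System.out.println(findAll("<table.*table>", page).size());  // prints 1

        // Reluctant: .*? stops at the FIRST "table>" it can reach.
        System.out.println(findAll("<table.*?table>", page).size()); // prints 2
    }
}
```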


Here's the code:

package Paket;

import java.util.regex.*;

public class Tablica {

    // Running counters: table number, row number, cell number
    static int brojTablice = 0;
    static int brojReda;
    static int brojCelije;

    public Tablica() {
    }

    // Finds every table on the page and hands each one to pronadjiRedove
    public static void pronadjiTablice(StringBuffer ulaznaStranica) {
        Pattern uzorakTablice = Pattern.compile("<table.*?table>");
        Matcher mt = uzorakTablice.matcher(ulaznaStranica);

        while (mt.find()) {
            brojTablice++;
            brojReda = 0; // rows are numbered from 1 within each table
            pronadjiRedove(ulaznaStranica.substring(mt.start(), mt.end()));
        }
    }

    // Finds every row in one table and hands each one to pronadjiCelije
    public static void pronadjiRedove(String ulaznaTablica) {
        Pattern uzorakReda = Pattern.compile("<tr.*?tr>");
        Matcher mr = uzorakReda.matcher(ulaznaTablica);

        while (mr.find()) {
            brojReda++;
            brojCelije = 0; // cells are numbered from 1 within each row
            pronadjiCelije(ulaznaTablica.substring(mr.start(), mr.end()));
        }
    }

    // Finds every cell in one row and prints it with its table, row and cell number
    public static void pronadjiCelije(String ulazniRed) {
        Pattern uzorakCelije = Pattern.compile("<td.*?td>");
        Matcher mc = uzorakCelije.matcher(ulazniRed);

        while (mc.find()) {
            brojCelije++;
            System.out.println("Tab " + brojTablice + " " + "Red " + brojReda + " "
                    + "Celija " + brojCelije + " "
                    + ulazniRed.substring(mc.start(), mc.end()));
        }
    }
}
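The class above prints each cell together with its <td ...> and </td> tags. If only the 'part that I need' between the tags is wanted, a capturing group is one way to get it. The following is a sketch under a few assumptions, not a replacement for the code above: the class CellText and the method cellContents are invented names, and the CASE_INSENSITIVE and DOTALL flags are added on the guess that real pages may contain upper-case tags and line breaks. Nested tables would still confuse patterns like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: extracts the text between <td ...> and </td>.
public class CellText {

    // Group 1 captures the cell content; [^>]* skips any attributes on the tag.
    private static final Pattern CELL = Pattern.compile(
            "<td\\b[^>]*>(.*?)</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed content of every cell found in one row.
    static List<String> cellContents(String row) {
        List<String> cells = new ArrayList<String>();
        Matcher m = CELL.matcher(row);
        while (m.find()) {
            cells.add(m.group(1).trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        // Made-up row with mixed-case tags and an attribute.
        String row = "<tr><TD class=\"x\"> first </TD><td>second</td></tr>";
        System.out.println(cellContents(row)); // prints [first, second]
    }
}
```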