Help with writing simple parser for HTML/XML  
sal achhala
Posted: 2003-10-17 21:28:00

I'm having trouble getting started with coding a simple parser and prototype
for my application.

The parser needs to parse HTML and XML pages, strip the tags, and return just
the data. I've read through several Java reference books but seem to be having
trouble getting started with the code.

I'd be grateful for a push in the right direction.

thanks

sal

PS: this is my final-year university Computer Science project.

For more details on the project:
http://www.mellowmoose.org/project.html


 
Richard Reynolds
Posted: 2003-10-17 22:53:00

There's a parser generator called JavaCC (Java Compiler Compiler): you give
it a grammar and it generates a parser for you in Java. It used to have its
own newsgroup, and probably still does. There are loads of free language
grammars available with it; I'm sure there's an HTML one there.


 
Andy Fish
Posted: 2003-10-18 1:15:00

It depends on why you are writing a parser.

If you just want to get some XML or HTML parsed, find existing tools to do
it; if you want to write a parser for the sake of writing a parser, read
books on parser design. XML and well-formed HTML are a doddle to parse with
any standard parsing technique.
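
To illustrate the "existing tools" route, here is a minimal sketch that assumes the input is well-formed XML and uses the SAX parser bundled with the JDK (javax.xml.parsers). The class name XmlTextExtractor and the method extractText are hypothetical, chosen only for this example; emitting a newline at the end of each element is likewise just one possible choice.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class XmlTextExtractor {
    // Hypothetical helper: returns the character data of a well-formed XML
    // document, with a newline appended at the end of every element.
    static String extractText(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Called for the text between tags; may arrive in several chunks.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                text.append('\n');
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Covers parser configuration, I/O, and well-formedness errors.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<page><title>My page!</title>"
                + "<body>This is the data on my page.</body></page>";
        System.out.print(extractText(xml));
    }
}
```

Note that a SAX parser rejects input that is not well-formed, so this only works for XML or for HTML that has already been cleaned up.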


 
 
Paul Lutus
Posted: 2003-10-18 2:39:00

sal achhala wrote:

> I'm having trouble getting started with coding a simple parser and
> prototype for my application.
>
> The parser needs to parse HTML & XML pages to strip the tags and return
> just the data.

This is not difficult. Let's say you have a String that contains an entire
HTML page. You need to strip all the tags and return only the data.
Example:

String page = "<html><title>My page!</title><body>This is the data on my page.</body></html>";

page = page.replaceAll("</.*?>","\n");
page = page.replaceAll("<.*?>","");

System.out.println(page);

Result:

My page!
This is the data on my page.
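
The same tag stripping can also be written as a single character-by-character pass, which is closer to what a hand-written parser does. This is only a minimal sketch: it assumes no tag contains a literal '>' inside an attribute value, and it ignores comments, scripts, and entities such as &amp;. The names TagStripper and stripTags are hypothetical.

```java
class TagStripper {
    // Hypothetical helper: copies the text found outside <...> and emits a
    // newline after each closing tag, mirroring the two replaceAll calls.
    static String stripTags(String page) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;      // true while between '<' and '>'
        boolean closingTag = false; // true if the current tag began with "</"
        for (int i = 0; i < page.length(); i++) {
            char c = page.charAt(i);
            if (!inTag) {
                if (c == '<') {
                    inTag = true;
                    closingTag = i + 1 < page.length() && page.charAt(i + 1) == '/';
                } else {
                    out.append(c); // ordinary data, keep it
                }
            } else if (c == '>') {
                inTag = false;
                if (closingTag) {
                    out.append('\n');
                }
            }
            // Any other character inside a tag is dropped.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<html><title>My page!</title>"
                + "<body>This is the data on my page.</body></html>";
        System.out.print(stripTags(page));
    }
}
```

Unlike the regex version, where "." does not match line terminators by default, this loop also handles tags that span several lines.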

> I've read through several Java reference books but seem to be
> having trouble getting started with the code.

Maybe it's too soon for this project. You may have to start at the
beginning, rather than the middle.

--
Paul Lutus
http://www.arachnoid.com

 
 
sal achhala
Posted: 2003-10-19 2:11:00

Thanks for the info, folks. It's been very helpful.

I would like to code my own parser for the sake of doing so. I'm hoping I can
find a tool to convert HTML to XML, which means I won't need to parse (often
ill-formed) HTML.

I should have mentioned that I've done two semesters' worth of Java and also
other programming languages. The problem I was having was that I haven't
programmed in Java for nearly 2 years, hence I'm out of touch.

BTW Paul, when I was learning HTML I used the HTML editor you designed :-)

once again thanks

sal


 
 