DOM Partial Document Parsing  
auc

Posted: 2004-2-21 2:46:00

I am trying to build a Java client which will read a never ending XML
data stream from a socket. Here is a simplified example of the XML
document:

<?xml version='1.0' encoding='us-ascii'?>
<NeverendingDataStream>
<Data>1.0</Data>
<Data>2.0</Data>

followed by a continuous stream of <Data> elements from a server
application that may not send the closing </NeverendingDataStream> tag
for several hours or days.

The DocumentBuilderFactory and DocumentBuilder parser for building a
DOM object tree fails because of a missing end tag.

Is there a way to force partial document parsing? I have turned
validating off but it continues to throw a fatal error.
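
For reference, the failure described above can be reproduced without a socket. The following is only a minimal sketch: the class name TruncatedDomDemo and the TRUNCATED constant are made up for illustration, and a byte array stands in for the socket stream. It suggests why turning validation off does not help: that setting only affects DTD validation, while an unclosed root element is a well-formedness error.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class TruncatedDomDemo {
    // Hypothetical stand-in for the socket data: the root element is never closed.
    static final String TRUNCATED =
        "<?xml version='1.0' encoding='us-ascii'?>\n"
        + "<NeverendingDataStream>\n"
        + "<Data>1.0</Data>\n"
        + "<Data>2.0</Data>\n";

    // Returns true if the DOM parser rejects the input with a parse error.
    static boolean domParseFails(String xml)
            throws ParserConfigurationException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Disables DTD validation only; well-formedness is still enforced.
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.US_ASCII)));
            return false; // a complete Document was built
        } catch (SAXException e) {
            return true;  // fatal error, e.g. the input ended before the root was closed
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("truncated rejected: " + domParseFails(TRUNCATED));
        System.out.println("closed rejected: "
            + domParseFails(TRUNCATED + "</NeverendingDataStream>"));
    }
}
```

With a live socket the parser would not even get as far as the error until the connection closes, since it keeps waiting for more input.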

Thanks,
Gary V
 
Alan

Posted: 2004-2-21 5:15:00

Well, no, there is no way to build a partial document tree with a DOM parser. But,
if you are talking streaming, you are talking SAX. This is exactly what
the SAX parser is for.

If you aren't familiar with SAX, the brief description from Sun is:

"The 'Simple API' for XML (SAX) is the event-driven, serial-access
mechanism that does element-by-element processing".

So, what does that mean? I think I should just point you to the right
place:

http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPSAX.html

All you do is hand the parser your stream of XML data, and the SAX
parser will invoke callback methods on your handler as each start tag
(with its attributes), end tag, run of text, etc. is detected.
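
To make the callback idea concrete, here is a small sketch that records the order in which the callbacks fire. The class name SaxEventLogDemo and the log format are invented for this example, and a short, complete document is used in place of the real stream.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLogDemo {
    // Parses the XML and returns a description of each callback, in firing order.
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Ignore whitespace-only text between elements.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(
            xml.getBytes(StandardCharsets.UTF_8)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<NeverendingDataStream>"
            + "<Data>1.0</Data><Data>2.0</Data>"
            + "</NeverendingDataStream>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

Each callback runs while the parser is still reading, so nothing has to wait for the end of the document.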


I would also recommend reading the O'Reilly book
Java & XML, 2nd Edition
Solutions to Real-World Problems
By Brett McLaughlin

Hope that gets you on the way..

<briggs />


 
Anton Spaans

Posted: 2004-2-21 5:20:00

In reply to Gary V:

If you call builder.parse(inputStream), where 'builder' is a DocumentBuilder
instance and 'inputStream' is obtained from socket.getInputStream(), does
that work or not?

Even if it does, the parse(inputStream) call won't return until all data has
been received. This means that the thread calling the parse(...) method will
block until the whole document has been read.

You also have the problem that the Document returned by parse(...) is not
available until the end tag has been received.

Therefore, you should call
javax.xml.parsers.SAXParserFactory.newInstance().newSAXParser() to obtain a
SAXParser.

Then call saxParser.parse(inputStream, dh), where 'dh' is an instance of
your own subclass of the org.xml.sax.helpers.DefaultHandler class. Its
startElement(...) and endElement(...) methods are called as elements become
available, so you can track the progress of the document (and build the
document in the meantime).
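
The recipe above might look roughly like the sketch below. The names DataStreamReader, DataHandler and 'sink' are hypothetical, and a byte array that ends without the closing root tag stands in for the socket. Treating every SAXParseException as a plain disconnect is a simplification made for this example, not something a real client should necessarily do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DataStreamReader {
    // Hypothetical handler: passes each <Data> value to 'sink' when its end tag arrives.
    static class DataHandler extends DefaultHandler {
        private final Consumer<Double> sink;
        private final StringBuilder text = new StringBuilder();
        private boolean inData;

        DataHandler(Consumer<Double> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("Data".equals(qName)) {
                inData = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called more than once per element, so accumulate.
            if (inData) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("Data".equals(qName)) {
                inData = false;
                sink.accept(Double.parseDouble(text.toString().trim()));
            }
        }
    }

    // Blocks until the stream ends. A parse error (such as the stream being
    // cut off before the root element is closed) is treated as a disconnect.
    static void read(InputStream in, Consumer<Double> sink)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        try {
            saxParser.parse(in, new DataHandler(sink));
        } catch (SAXParseException e) {
            // Simplification for this sketch: values already delivered stay valid.
        }
    }

    public static void main(String[] args) throws Exception {
        String stream = "<?xml version='1.0' encoding='us-ascii'?>\n"
            + "<NeverendingDataStream>\n<Data>1.0</Data>\n<Data>2.0</Data>\n";
        List<Double> received = new ArrayList<>();
        read(new ByteArrayInputStream(
            stream.getBytes(StandardCharsets.US_ASCII)), received::add);
        System.out.println(received);
    }
}
```

With a real connection you would pass socket.getInputStream() to read(...) and run it on its own thread, since the call blocks for as long as the server keeps the stream open.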

-- Anton.







 
 
auc

Posted: 2004-2-21 12:44:00

In reply to Anton Spaans:

Thanks for the informative reply. I am using an InputStream and have actually
used the SAX parser to parse the stream as it becomes available. I was
hoping to use DOM to parse the XML stream into an object tree, but the more
I thought about it, the more I realized it just can't work on a continuous
stream. I thought there might be something I had overlooked.

Thanks again,
Gary V