Transmitting strings via TCP from a Windows C++ client to a Java server
qqq111
Posted: 2006-2-20 4:02:00

java-programmer, Transmitting strings via tcp from a windows c++ client to a Java server

Hi all,

We have a C++ client that runs on Windows and needs to transmit
char* / wchar_t* strings to and from a Java server.

The client should correctly handle both 'standard' languages and East
Asian languages (i.e. using wchar_t).

Now, I'm sure there is a best practice for doing so; I just haven't
found it yet :-)

My best bet would be to always encode the string in UTF-8 before
sending it over the network, but I could be wrong.

Your help will be highly appreciated.

Thanks,

Gilad

 
Roedy Green
Posted: 2006-2-20 4:28:00

On 19 Feb 2006 12:02:11 -0800, "qqq111" <email***@***.com> wrote,
quoted or indirectly quoted someone who said :

> Now, I'm sure there is a best practice for doing so; I just haven't
> found it yet :-)

How about UTF-8 encoding? It handles all the 16-bit chars (in fact every
Unicode character). It is reasonably efficient for American English, using
just 8 bits per char. It has no endian ambiguity.

HTTP knows about it, and it tends to be a widely accepted encoding.

You could prefix each string with a 1-byte length field giving the count of
either the chars or the bytes inside. Or you could use a Java-style big-endian
2-byte length field (a byte count) compatible with DataInputStream.readUTF.
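A minimal sketch of the Java end of the second framing option, assuming a 2-byte big-endian unsigned byte count (the same header layout readUTF reads) followed by standard UTF-8 bytes. Utf8Frame and its methods are hypothetical names, and the 65535-byte limit simply follows from the 2-byte field. The payload is encoded with String.getBytes rather than writeUTF, because of the writeUTF caveat raised later in the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: frames a string as <2-byte big-endian byte count><standard UTF-8 bytes>.
final class Utf8Frame {
    static final int MAX_BYTES = 0xFFFF; // largest count an unsigned 2-byte field can hold

    static void write(OutputStream out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // standard UTF-8, not writeUTF's variant
        if (utf8.length > MAX_BYTES) {
            throw new IllegalArgumentException("string needs more than 65535 UTF-8 bytes");
        }
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeShort(utf8.length); // high byte first (big-endian)
        dos.write(utf8);
        dos.flush();
    }

    static String read(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in); // adds no buffering, so nothing is read ahead
        int len = dis.readUnsignedShort();            // 0..65535
        byte[] utf8 = new byte[len];
        dis.readFully(utf8); // TCP may deliver the payload in several pieces
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // In-memory helpers so the framing can be tried without a socket.
    static byte[] encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            write(buf, s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static String decode(byte[] frame) {
        try {
            return read(new ByteArrayInputStream(frame));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] frame = encode("Hello, \u4E16\u754C"); // "Hello, " plus two CJK characters
        System.out.println(frame.length + " bytes on the wire");
        System.out.println(decode(frame).equals("Hello, \u4E16\u754C"));
    }
}
```

Under this assumed layout, the C++ client would send the same two header bytes (high byte first) ahead of its UTF-8 payload; readFully matters because a single TCP read may return only part of a message.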

See http://mindprod.com/jgloss/utf.html
--
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
 
Chris Uppal
Posted: 2006-2-20 20:11:00

qqq111 wrote:

> We have a C++ client that runs on Windows and needs to transmit
> char* / wchar_t* strings to and from a Java server.
>
> The client should correctly handle both 'standard' languages and East
> Asian languages (i.e. using wchar_t).

The obvious options are:

Use UTF-8.
Advantages: Compact /if/ you send mostly ASCII text. Easily readable (for
debugging) /if/ you send mostly ASCII text. No byte-order issues.
Disadvantages: Consumes more bandwidth than UTF-16 if you send mostly Chinese,
Japanese, or Korean text (3 bytes per character instead of 2). Requires
explicit en/de-coding on the Windows box (perfectly possible, e.g. with
WideCharToMultiByte() and CP_UTF8, but you have to write the code for it).

Use UTF-16LE.
Advantages: Compact in the cases where UTF-8 is not. Requires no special
handling in the Windows code (since that's the native format for a wstring),
and since you always have to specify an encoding at the Java end, it makes no
difference which encoding you use from the Java point of view.
Disadvantages: Consumes more bandwidth if you send mostly ASCII text (2 bytes
per character instead of 1).
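To make the bandwidth trade-off between the two options concrete, here is a small sketch that counts the bytes each encoding needs for ASCII, Greek, and Japanese sample strings. EncodedSizes is a hypothetical name; the sample strings are arbitrary.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: bytes needed for the same text in UTF-8 and in UTF-16LE.
final class EncodedSizes {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16leBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length; // UTF_16LE writes no BOM
    }

    public static void main(String[] args) {
        String[] samples = {
            "hello",              // ASCII
            "\u03B1\u03B2\u03B3", // Greek alpha, beta, gamma
            "\u65E5\u672C\u8A9E"  // Japanese "nihongo"
        };
        for (String s : samples) {
            System.out.println(utf8Bytes(s) + " bytes as UTF-8, "
                    + utf16leBytes(s) + " bytes as UTF-16LE");
        }
    }
}
```

For these three samples the counts are 5 vs 10, 6 vs 6, and 9 vs 6 bytes, so which encoding is smaller depends entirely on the mix of text actually sent.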

Without knowing your requirements, I can't guess which option would be best
for you, but I don't think any other options make sense.
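Whichever option is chosen, the Java end has to name the charset explicitly. As a sketch of the UTF-16LE case, assuming the Windows client writes the raw contents of a wchar_t buffer (16-bit, little-endian on x86 Windows), the Java side could look like this. WideStringCodec and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for a client that sends the raw bytes of a wchar_t buffer.
final class WideStringCodec {
    static String fromWindowsWide(byte[] bytes) {
        if (bytes.length % 2 != 0) {
            throw new IllegalArgumentException("UTF-16 data must have an even byte count");
        }
        // Name the charset explicitly; the platform default charset is typically not UTF-16LE.
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    static byte[] toWindowsWide(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE); // low byte first, no BOM
    }

    public static void main(String[] args) {
        // L"Hi" as laid out in memory on x86 Windows: 0x0048 then 0x0069, low byte first.
        byte[] wire = {0x48, 0x00, 0x69, 0x00};
        System.out.println(fromWindowsWide(wire));
    }
}
```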

Some other points to consider.

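If you choose UTF-8, then don't use java.io.DataInputStream.readUTF() or the
corresponding write method, DataOutputStream.writeUTF(). They don't do what the
method names suggest: they use a 2-byte length prefix and Java's "modified
UTF-8", which encodes U+0000 as the two bytes C0 80 and encodes each half of a
surrogate pair separately instead of as one 4-byte sequence.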
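A sketch that dumps the bytes writeUTF produces next to standard UTF-8 for two cases where they differ: U+0000 and U+1F600 (a supplementary character that Java stores as a surrogate pair). ModifiedUtf8Demo and its helpers are hypothetical names; writeUTF and String.getBytes are the standard JDK calls.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: where DataOutputStream.writeUTF departs from standard UTF-8.
final class ModifiedUtf8Demo {
    // Bytes produced by writeUTF: a 2-byte length followed by "modified UTF-8".
    static byte[] writeUtfBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\u0000";          // U+0000
        String emoji = "\uD83D\uDE00";  // U+1F600 as a surrogate pair
        System.out.println("writeUTF(U+0000):  " + hex(writeUtfBytes(nul)));
        System.out.println("UTF-8(U+0000):     " + hex(standardUtf8(nul)));
        System.out.println("writeUTF(U+1F600): " + hex(writeUtfBytes(emoji)));
        System.out.println("UTF-8(U+1F600):    " + hex(standardUtf8(emoji)));
    }
}
```

writeUTF gives 00 02 C0 80 for U+0000 where standard UTF-8 gives the single byte 00, and 00 06 ED A0 BD ED B8 80 for U+1F600 where standard UTF-8 gives F0 9F 98 80, so a C++ UTF-8 decoder on the other end would reject or misread both.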

If you choose UTF-16LE, then you should consider whether a BOM (byte order mark)
is forbidden, tolerated, or required by your protocol. Alternatively, you could
mandate plain UTF-16 (either byte order) and /require/ a BOM; that would give
you flexibility if you anticipate creating non-Windows clients (which I doubt).
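For the "UTF-16 with a required BOM" variant, the JDK's UTF_16 charset already does the work: it reads a leading BOM to choose the byte order and drops it, defaulting to big-endian when there is none. The fixed-order UTF_16LE charset does not; it decodes a leading FF FE as U+FEFF, so a UTF-16LE protocol that tolerates a BOM has to strip it itself. A sketch contrasting the two, with Utf16BomDemo as a hypothetical name:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: BOM-aware "UTF-16" decoding vs fixed-order "UTF-16LE" decoding.
final class Utf16BomDemo {
    static String decodeUtf16WithBom(byte[] bytes) {
        // UTF_16 uses a leading BOM to pick the byte order and discards it;
        // with no BOM it assumes big-endian.
        return new String(bytes, StandardCharsets.UTF_16);
    }

    static String decodeUtf16le(byte[] bytes) {
        // UTF_16LE keeps a leading FF FE as U+FEFF, so strip it here.
        String s = new String(bytes, StandardCharsets.UTF_16LE);
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        byte[] le = {(byte) 0xFF, (byte) 0xFE, 0x48, 0x00, 0x69, 0x00}; // BOM + "Hi", little-endian
        byte[] be = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x48, 0x00, 0x69}; // BOM + "Hi", big-endian
        System.out.println(decodeUtf16WithBom(le) + " " + decodeUtf16WithBom(be));
        System.out.println(decodeUtf16le(le));
    }
}
```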

If you choose UTF-8, then you should consider whether a BOM is forbidden or
tolerated by your protocol.

If your choice between UTF-8 and UTF-16 is significantly swayed by bandwidth
considerations, then zlib compression might be worth considering. Java already
understands it (java.util.zip.Deflater and Inflater), and it's easy to use
ZLIB1.DLL from Windows code.
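A sketch of the Java side of that idea, assuming the client uses zlib's own format (RFC 1950, as produced by zlib's compress()), which is what Deflater and Inflater produce and expect with their default nowrap=false setting. ZlibText is a hypothetical name, and compressing UTF-8 is just one choice; the same code would work on UTF-16LE bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: zlib-format (RFC 1950) compression of UTF-8 text.
final class ZlibText {
    static byte[] compress(String s) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION); // zlib header + Adler-32
        try {
            deflater.setInput(s.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native memory
        }
    }

    static String decompress(byte[] zlibData) {
        Inflater inflater = new Inflater(); // expects the zlib header and checksum
        try {
            inflater.setInput(zlibData);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated zlib data or preset dictionary required");
                }
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("hello \u65E5\u672C\u8A9E ");
        }
        String text = sb.toString();
        byte[] packed = compress(text);
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length + " -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(text));
    }
}
```

The compressed block would still need its own length prefix on the wire, since the receiver has to know where it ends.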

If your protocol is of the form:
<character count><character data>
then you should be very clear about what you mean by a "character", especially
if you use UTF-16 (where there may be more 16-bit wchars / Java chars than
actual Unicode characters, because characters outside the Basic Multilingual
Plane take a surrogate pair). Is the BOM (if any) included in the count?
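A sketch showing three different "lengths" for one string, using "a" followed by U+1F600, which takes two Java chars (as it would take two wchar_t units on Windows). CountingCharacters is a hypothetical name; length(), codePointCount() and getBytes() are standard String methods.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: three counts a "<character count>" field could mean.
final class CountingCharacters {
    static int utf16Units(String s) {
        return s.length(); // Java chars, i.e. 16-bit units like wchar_t on Windows
    }

    static int codePoints(String s) {
        return s.codePointCount(0, s.length()); // Unicode code points
    }

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // "a" + U+1F600 (surrogate pair)
        System.out.println(utf16Units(s) + " UTF-16 units, " + codePoints(s)
                + " code points, " + utf8Bytes(s) + " UTF-8 bytes");
        String withBom = "\uFEFF" + s; // a BOM that was not stripped counts too
        System.out.println(utf16Units(withBom) + " UTF-16 units with BOM");
    }
}
```

For this string the counts are 3 UTF-16 units, 2 code points, and 5 UTF-8 bytes, and an unstripped BOM adds one more unit, so the protocol has to pick one of these explicitly.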

-- chris


 
 
Roedy Green
Posted: 2006-2-24 17:04:00

On Mon, 20 Feb 2006 12:10:49 -0000, "Chris Uppal"
<email***@***.com> wrote, quoted or indirectly
quoted someone who said :

>If you choose UTF-8, then you should consider whether a BOM is forbidden or
>tolerated by your protocol.

The BOM for UTF-8 (the UTF-8 encoding of U+FEFF) looks like this:

EF BB BF

It is a misnomer. You don't need a byte order mark for UTF-8, since there are
no lo-hi bytes to order. It is more like a file signature indicating a UTF-8
encoded file. Without it, such a file will, at a casual glance, look no
different from one in any native platform encoding.
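A sketch of a receiver whose protocol tolerates this signature: it skips a leading EF BB BF before decoding. The step is needed because the JDK's UTF-8 decoder keeps a BOM as U+FEFF rather than discarding it; a protocol that forbids the BOM could reject such input instead. Utf8Signature and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode UTF-8, tolerating an optional EF BB BF signature.
final class Utf8Signature {
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == (byte) 0xEF && data[1] == (byte) 0xBB && data[2] == (byte) 0xBF;
    }

    static String decodeTolerant(byte[] data) {
        // The JDK's UTF-8 decoder would otherwise return the BOM as U+FEFF.
        int offset = hasUtf8Bom(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k'};
        byte[] plain = {'o', 'k'};
        System.out.println(decodeTolerant(withBom).equals(decodeTolerant(plain)));
        System.out.println(new String(withBom, StandardCharsets.UTF_8).length()); // BOM kept as a char
    }
}
```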