encoding troubles  
Matthijs Blaas
Posted: 2004-8-19 22:57:00

Hi all!

I have a byte array of data which I want to post to a PHP script. So I send
the string representing the byte array using a POST with UTF-8 encoding.
I figured Java works with UTF-16BE internally, but after receiving the UTF-8
encoded post in PHP and converting it back to UTF-16BE it was not the
same...

I wrote a little test and it turns out that if I convert a byte array to UTF-8
and back it won't match the original byte array, but the string representing it
is the same:

String test="some test";
byte[] dest,temp;

dest = test.getBytes();

try {
temp=new String(dest).getBytes("UTF-8");
}catch(Exception e) { System.out.println(e); }

try {
test=new String(temp,"UTF-8");
System.out.println(test);
dest=test.getBytes();
} catch(Exception e) { System.out.println(e); }

if(dest==temp) System.out.println("Success!");
else System.out.println("failed");

How can I convert the UTF-8 bytes back to the original byte array? And will
this work when posting to PHP? It's important that the binary data remains
intact, rather than the string representation...

Thanks in advance,

Matthijs


 
BigbooTAY
Posted: 2004-8-20 0:03:00

On Thu, 19 Aug 2004 16:57:24 +0200, "Matthijs Blaas"
<email***@***.com> wrote:

>
>String test="some test";
>byte[] dest,temp;
>
>dest = test.getBytes();
>
>try {
> temp=new String(dest).getBytes("UTF-8");
>}catch(Exception e) { System.out.println(e); }
>
>try {
> test=new String(temp,"UTF-8");
> System.out.println(test);
> dest=test.getBytes();
>} catch(Exception e) { System.out.println(e); }
>
>if(dest==temp) System.out.println("Success!");
>else System.out.println("failed");
>
>How can I convert the UTF-8 bytes back to the original byte array? And will
>this work when posting to php? Its important the binary data remains intact
>instead of the string representation...

You can't use == to compare the contents of two arrays, so that test
is never going to work. Actually, by the time you do that test,
you've got two different representations anyway: 'temp' was created
using getBytes("UTF-8") while 'dest' was created with getBytes(),
which would use the system default encoding. They probably do contain
the same bytes, since the string only contains ASCII characters, but
you want to be aware of that.

This test works correctly:

try
{
String str0 = "some test";
byte[] ar0 = str0.getBytes("UTF-8");

String str1 = new String(ar0, "UTF-8");
System.out.println(str1);
}
catch (Exception ex)
{
ex.printStackTrace();
}

BTW, you don't need to know what encoding Java uses internally. All
you need to know is the encoding of the byte array, so you can do the
conversion correctly.
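
To make the point about default versus explicit encodings concrete, here is a minimal sketch. It assumes Java 7 or later for `StandardCharsets` (the thread itself predates it and uses the `"UTF-8"` string form); the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Encode text to bytes and decode it again, naming the charset both times.
    static String roundTrip(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // contains one non-ASCII character
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        // The same string yields different bytes under different encodings.
        System.out.println(utf8.length);                  // 5
        System.out.println(latin1.length);                // 4
        System.out.println(Arrays.equals(utf8, latin1));  // false

        // Text -> bytes -> text is lossless when both steps use the same charset.
        System.out.println(text.equals(roundTrip(text))); // true
    }
}
```

With a pure ASCII string such as "some test" the two arrays would come out identical, which is why the original test could not reveal the difference.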
 
Matthijs Blaas
Posted: 2004-8-20 6:35:00

Thanks for your reply, it indeed sounds more logical to have the byte array
encoding converted before I do anything with it :)

However, the thing is, I convert a string (containing my plaintext) to a byte
array (UTF-8 encoded) and feed this byte array to an encryption engine, which
returns a byte array (in the internal encoding, I presume?). If I want to
decrypt this encrypted block of bytes, it is of course important that the
bytes PHP receives in the posted message are the same as those output by the
engine. I'm a little confused about how to achieve this, as the original
bytes will get lost if I convert them using another encoding. Must I convert
back to the encoding Java uses internally? And isn't this platform specific?

Please help!

-Thijs


 
 
Alan Moore
Posted: 2004-8-20 14:56:00

On Fri, 20 Aug 2004 00:34:59 +0200, "Matthijs Blaas"
<email***@***.com> wrote:

>However thing is, I convert a string(containing my plaintext) to a byte
>array(utf8 encoded), I feed this byte array to an encryption engine which
>returns back a byte array(internal encoding I presume?). If I want to
>decrypt this encoded block of bytes back it is ofcourse important that the
>bytes remain the same as outputted by the engine when php receives the
>posted message. Im a little confused on how to achieve this as the original
>bytes will get lost if I convert them using another encoding? Must I convert
>back to the encoding java uses internal? And isn't this platform specific?

What is this encryption engine? Is it a Java app? You say it works
with byte arrays, but does it know that the bytes are supposed to
represent characters? (I would think, if it's for encrypting text, it
would work with char arrays.) On the other hand, if it doesn't care
what the bytes represent, it shouldn't matter what encoding you use.
It will scramble the bytes in a certain way, and the decryption process
will unscramble them. You just have to make sure you use the same
character encoding at both ends.

I don't have enough information to give you an answer; I'm just trying
to help you ask more fruitful questions.
 
 
Matthijs Blaas
Posted: 2004-8-20 15:52:00

I use this encryption library: http://logi.org/logi.crypto/
The engine I use encrypts blocks of bytes (it doesn't care whether they are
text or something else). Yet it matters what you feed it: I use an RSA
cipher to encrypt a block of data, and when I want to decrypt, I have to feed
it exactly the same block that it output. It should be compatible with
other RSA implementations like OpenSSL. I want the encrypted block of data to
be decrypted in PHP using OpenSSL. However, when I perform a little test in
Java to see whether the data gets messed up, it throws an error saying the
input data is an invalid cipher block:

byte[] source,dest,res;
String plain="test",contents;

try { source=plain.getBytes("UTF-8"); } catch(Exception e) {}
try { dest=Encrypt(source,0,source.length); } catch(Exception e) {}
write dest to utf8 encoded file
read utf8 encoded file in contents

Now when I try to decrypt the file contents, which differ from the
original byte array dest:
try { res=Decrypt(contents.getBytes(),0,contents.getBytes().length); }
catch(Exception e) {}
it throws an error; if I use the original byte array dest it obviously
does work. The string representation seems to be the same for both byte
arrays (new String(dest) & contents), but the bytes are scrambled in the UTF-8
encoded version... Is there any way to get around this problem?

Thanks,

Matthijs


 
 
Gordon Beaton
Posted: 2004-8-20 16:11:00

On Fri, 20 Aug 2004 09:51:35 +0200, Matthijs Blaas wrote:
> try { source=plain.getBytes("UTF-8"); } catch(Exception e) {}
> try { dest=Encrypt(source,0,source.length); } catch(Exception e) {}

I think this part is the problem:

> write dest to utf8 encoded file
> read utf8 encoded file in contents

Compare source and dest arrays - are they equal?

Now create a new String from dest _without_ first storing the results
in a file. Does the new String equals() the original String plain?

According to the following code, contents is a String, so you are
performing an additional conversion from byte[] to String to byte[]
when you write and read the file:

> try { res=Decrypt(contents.getBytes(),0,contents.getBytes().length) }
> catch(Exception e) {}

How do you write dest to the file? How do you read it back? Hint: use
Readers and Writers for text, InputStreams and OutputStreams for
byte[].

/gordon

--
[ do not email me copies of your followups ]
g o r d o n + n e w s @ b a l d e r 1 3 . s e
 
 
Matthijs Blaas
Posted: 2004-8-20 16:33:00

The array which I read from the file is not the same as the dest array, but
their string representations are. This must be because of Java's internal
encoding (which the dest array is encoded in).

this is how I write the array to a file:
output = new BufferedWriter(new OutputStreamWriter(new
FileOutputStream(aFile), "UTF8"));
output.write(new String(new String(dest).getBytes("UTF-8")));

this is how i read the contents back:
BufferedReader in = new BufferedReader(new InputStreamReader(new
FileInputStream(aFile), "UTF8"));
contents = in.readLine();

So at this point contents=new String(dest) but contents.getBytes() != dest
It might be the Reader & Writer used?


 
 
Gordon Beaton
Posted: 2004-8-20 16:53:00

On Fri, 20 Aug 2004 10:32:50 +0200, Matthijs Blaas wrote:
> this is how I write the array to a file:
> output = new BufferedWriter(new OutputStreamWriter(new
> FileOutputStream(aFile), "UTF8"));

The following line takes the dest byte array, converts it using the
default system encoding (which is what?) to a String, converts the
String to a second byte array (using UTF-8), then creates another
String from that (using the system default encoding), and finally (are
you still with me here) writes that second String to the file. If all
has gone well, the UTF8 (hmm, different spelling) representation of
the second String gets stored in the file:

> output.write(new String(new String(dest).getBytes("UTF-8")));

Really, what are you hoping all of those conversions will achieve?
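
Why those conversions hurt can be shown with a small sketch. It uses UTF-8 explicitly at every step (the quoted line mixes in the system default encoding as well), and the three sample bytes are made up to stand in for cipher output; `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyStringDemo {
    // Pushes raw bytes through a String and back, decoding and encoding as UTF-8.
    static byte[] throughString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0x80 and 0xFF are not valid UTF-8 on their own; 0x41 is plain 'A'.
        byte[] cipherLike = { (byte) 0x80, (byte) 0xFF, 0x41 };
        byte[] result = throughString(cipherLike);

        // Each invalid byte was decoded to U+FFFD, the replacement character,
        // so the original values are gone for good.
        System.out.println(Arrays.equals(cipherLike, result)); // false
        System.out.println(result.length);                     // 7
    }
}
```

Both invalid bytes end up as the same replacement character, which also explains how two different byte arrays can have the same string representation.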

> this is how i read the contents back:
> BufferedReader in = new BufferedReader(new InputStreamReader(new
> FileInputStream(aFile), "UTF8"));
> contents = in.readLine();
>
> So at this point contents=new String(dest) but contents.getBytes()
> != dest It might be the Reader & Writer used?

The Reader and Writer seem to be only part of the problem. All of the
unnecessary conversions certainly don't help.

Step 1: don't use any kind of Reader or Writer for handling non-text.

To write the byte[] to the file, use a FileOutputStream directly
(possibly wrapping it in a BufferedOutputStream):

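OutputStream output = new FileOutputStream(aFile);
output.write(dest);
output.close();

To read the byte[] from the file, use a FileInputStream directly
(possibly wrapping it in a BufferedInputStream):

InputStream input = new FileInputStream(aFile);
input.read(contents); // contents must be a byte[] here, not a String

(error checking etc. left out but still necessary; in particular,
read() may return fewer bytes than requested, so check its return
value)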
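
A fuller version of the same idea might look like the sketch below. It assumes Java 7 or later for try-with-resources, uses `DataInputStream.readFully` to deal with short reads, and writes to a temporary file; the class name, method names, and sample bytes are illustrative only.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

public class BinaryFileRoundTrip {
    // Writes the bytes exactly as they are: no Writer, no charset involved.
    static void writeBytes(File file, byte[] data) throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            out.write(data);
        }
    }

    // Reads the whole file back into a byte[] of the file's length.
    static byte[] readBytes(File file) throws IOException {
        byte[] data = new byte[(int) file.length()];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(data); // keeps reading until the array is full
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("cipher", ".bin");
        file.deleteOnExit();

        // Sample bytes that a text round trip would typically damage.
        byte[] dest = { 0x00, (byte) 0x80, (byte) 0xFF, 0x0A, 0x0D };
        writeBytes(file, dest);
        System.out.println(Arrays.equals(dest, readBytes(file))); // true
    }
}
```

The sample deliberately includes 0x0A and 0x0D: a BufferedReader.readLine() call would stop at either of them, which is one more reason not to read binary data as lines of text.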

/gordon

--
[ do not email me copies of your followups ]
g o r d o n + n e w s @ b a l d e r 1 3 . s e
 
 
monroeds
Posted: 2004-8-20 20:23:00

"Matthijs Blaas" <email***@***.com> wrote in message news:<4124bfd7$0$767$email***@***.com>...
> if(dest==temp) System.out.println("Success!");
> else System.out.println("failed");


Try

if (java.util.Arrays.equals(dest, temp)) System.out.println("Success!");

The '==' operator checks to see if the two objects are the same object,
and for arrays dest.equals(temp) does exactly the same thing, so use
Arrays.equals() to compare the contents.
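
The three ways of comparing two arrays can be put side by side in a short sketch (class and method names are made up for illustration):

```java
import java.util.Arrays;

public class ArrayCompare {
    // Compares two byte arrays element by element.
    static boolean sameContents(byte[] x, byte[] y) {
        return Arrays.equals(x, y);
    }

    public static void main(String[] args) {
        byte[] a = { 1, 2, 3 };
        byte[] b = { 1, 2, 3 };

        System.out.println(a == b);             // false: two distinct objects
        System.out.println(a.equals(b));        // false: arrays inherit Object.equals
        System.out.println(sameContents(a, b)); // true: same length, same elements
    }
}
```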
 
 
John C. Bollinger
Posted: 2004-8-20 23:36:00

Matthijs Blaas wrote:

> Thanks for your reply, it indeed sounds more logical to have the byte array
> encoding converted before i do anything with it :)
>
> However thing is, I convert a string(containing my plaintext) to a byte
> array(utf8 encoded), I feed this byte array to an encryption engine which
> returns back a byte array(internal encoding I presume?). If I want to
> decrypt this encoded block of bytes back it is ofcourse important that the
> bytes remain the same as outputted by the engine when php receives the
> posted message. Im a little confused on how to achieve this as the original
> bytes will get lost if I convert them using another encoding? Must I convert
> back to the encoding java uses internal? And isn't this platform specific?

The encrypted byte array no longer represents characters in any
character encoding, so it is inappropriate to attempt to convert it to a
String. A byte[] is exactly what it is. If you MUST transfer it in
String form, then choose an *8-bit* encoding (ISO-8859-1 is probably a
good bet) to get a one-to-one correspondence between bytes and
characters. I'm not sure what your Java / PHP interface looks like, but
do make sure that PHP interprets the characters according to the correct
encoding in order to get the bytes back. Do note that THIS IS A HACK.
You should really be passing around the raw bytes without pretending
that they represent characters.
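
The one-to-one correspondence mentioned above can be checked on the Java side with a short sketch. It only shows that Java's ISO-8859-1 charset round-trips every byte value; whether the string survives the transport and the PHP end is a separate matter. Names are illustrative, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Maps each byte to the char with the same numeric value (0..255).
    static String toLatin1String(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Maps each char back to the byte with the same numeric value.
    static byte[] fromLatin1String(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        String carrier = toLatin1String(all);
        System.out.println(carrier.length());                               // 256
        System.out.println(Arrays.equals(all, fromLatin1String(carrier)));  // true
    }
}
```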


John Bollinger
email***@***.com
 
 
Michael Borgwardt
Posted: 2004-8-21 6:50:00

John C. Bollinger wrote:
> The encrypted byte array no longer represents characters in any
> character encoding, so it is inappropriate to attempt to convert it to a
> String. A byte[] is exactly what it is. If you MUST transfer it in
> String form, then choose an *8-bit* encoding (ISO-8859-1 is probably a
> good bet)

Still too risky. Instead, use a base64 or hex encoding.
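
As a rough sketch of that suggestion: `java.util.Base64` exists only since Java 8, so it was not available when this thread was written, and the hex helpers below are hand-rolled for illustration. All names and sample bytes are assumptions.

```java
import java.util.Arrays;
import java.util.Base64;

public class TextSafeBinary {
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    // Two lowercase hex digits per byte.
    static String toHex(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Expects an even number of hex digits.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] cipher = { 0x00, (byte) 0x80, (byte) 0xFF };

        System.out.println(toBase64(cipher)); // AID/
        System.out.println(toHex(cipher));    // 0080ff

        System.out.println(Arrays.equals(cipher, fromBase64(toBase64(cipher)))); // true
        System.out.println(Arrays.equals(cipher, fromHex(toHex(cipher))));       // true
    }
}
```

Standard Base64 output can contain '+' and '/', so it still needs URL-encoding in a form POST; `Base64.getUrlEncoder()` avoids those two characters. On the PHP side, `base64_decode()` would turn the text back into the raw bytes.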
 
 
Matthijs Blaas
Posted: 2004-8-21 17:22:00

Thanks for all your replies! The subject is much clearer to me now...