Converting Unicode to ASCII  
Bill George
Posted: 2003-9-16 16:22:00

java-programmer, Converting Unicode to ASCII

Any Java code examples for converting Unicode to ASCII?

Many thanks.

 
Joona I Palaste
Posted: 2003-9-16 16:36:00

Bill George <email***@***.com> scribbled the following:
> Any Java code examples for converting Unicode to ASCII?

As Unicode contains thousands of characters that aren't representable in
ASCII, I don't think converting from Unicode to ASCII is possible at
all.

--
/-- Joona Palaste (email***@***.com) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/
"The question of copying music from the Internet is like a two-barreled sword."
- Finnish rap artist Ezkimo
 
Michael Borgwardt
Posted: 2003-9-16 17:17:00

Bill George wrote:
> Any Java code examples for converting Unicode to ASCII?

String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
byte[] bytes = String.getBytes("US_ASCII");

 
 
Michael Borgwardt
Posted: 2003-9-16 17:22:00

Michael Borgwardt wrote:
> Bill George wrote:
>
>> Any Java code examples for converting Unicode to ASCII?
>
>
> String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
> byte[] bytes = String.getBytes("US_ASCII");

Oops, that should be:

byte[] bytes = unicode.getBytes("US-ASCII");
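
For anyone who wants to try the corrected call, here is a minimal sketch. It assumes Java 7 or later, so that `StandardCharsets.US_ASCII` can stand in for the charset name string (which also avoids the checked `UnsupportedEncodingException`). The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class AsciiEncodeDemo {
    // Encodes a String as US-ASCII bytes. Characters that have no
    // ASCII representation come out as the replacement byte '?'.
    static byte[] toAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
        byte[] bytes = toAscii(unicode);
        // Decode the bytes again only to make the result readable.
        System.out.println(new String(bytes, StandardCharsets.US_ASCII));
    }
}
```

The five katakana characters are each replaced by a single '?', so the conversion is lossy rather than impossible.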


 
 
Steve Horsley
Posted: 2003-9-17 1:59:00

On Tue, 16 Sep 2003 08:22:26 +0000, Bill George wrote:

> Any Java code examples for converting Unicode to ASCII?
>
> Many thanks.

That's a very vague question. What is your input?

If it's a Java String, then yes, it is in Unicode, and
byte[] bytes = myString.getBytes("ASCII");
will return the ASCII. All the Unicode values that aren't
represented in the ASCII character set will be converted to '?'.

If it's a file on disk, then it isn't Unicode, it's Unicode that's been
encoded into bytes somehow and you need to know how before you can recover
and convert the Unicode.
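
The file case can be sketched as two steps: decode the bytes with the encoding they were written in, then re-encode as ASCII. This is only an illustration under assumptions: the input is already in memory as a byte array (for a real file something like `Files.readAllBytes` could supply it), the source encoding is known to be UTF-8, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class TranscodeDemo {
    // Step 1: decode the raw bytes using the encoding they were written in.
    // Step 2: encode the resulting String as US-ASCII ('?' for anything
    // that ASCII cannot represent).
    static byte[] transcodeToAscii(byte[] input, Charset sourceEncoding) {
        String text = new String(input, sourceEncoding);
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Simulated file content: "cafe" with an acute accent, stored as UTF-8.
        byte[] fileBytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] ascii = transcodeToAscii(fileBytes, StandardCharsets.UTF_8);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));
    }
}
```

If the wrong source encoding is passed in, the decode step silently produces the wrong characters, which is why you need to know how the file was encoded first.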

Steve.
 
 
RC
Posted: 2003-9-17 5:09:00

Bill George wrote:
> Any Java code examples for converting Unicode to ASCII?

There are only 255 = 2 to power 8 (from 00 to ff) values in ASCII.
Unicode are from 0000 to ffff (2 to power 16 = ??).
Everything in Unicode from 0000 to 00ff are exactly the same as
ASCII. You can not convert unicode to ASCII which value greater than
00ff, right?!

But you can convert other code (like Chinese, Japanese, Korean) into
unicode.

 
 
John O'Conner
Posted: 2003-9-17 17:50:00

RC wrote:
>
>
> Bill George wrote:
>
>> Any Java code examples for converting Unicode to ASCII?
>
>
> There are only 255 = 2 to power 8 (from 00 to ff) values in ASCII.
> Unicode are from 0000 to ffff (2 to power 16 = ??).
> Everything in Unicode from 0000 to 00ff are exactly the same as
> ASCII. You can not convert unicode to ASCII which value greater than
> 00ff, right?!
>


Almost right... everything from 0x0000 through 0x007F is exactly the
same as ASCII. Everything from 0x0000 through 0x00FF is the same as ISO
Latin 1.

The safest way to create ASCII text is to use the built-in converters, i.e.
byte[] ascii = someString.getBytes("ASCII");
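
The two ranges can be checked from code. The sketch below is an illustration, not part of the original post: it assumes Java 7 or later for `StandardCharsets`, and it uses `CharsetEncoder.canEncode` to ask up front whether a string fits, instead of letting `getBytes` substitute '?' silently. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class RangeCheckDemo {
    // True only if every character lies in 0x0000 through 0x007F.
    static boolean fitsAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    // True only if every character lies in 0x0000 through 0x00FF.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String[] samples = { "Hello", "caf\u00e9", "\u30e6" };
        for (String s : samples) {
            System.out.println(fitsAscii(s) + " " + fitsLatin1(s));
        }
    }
}
```

A plain English word passes both checks, U+00E9 fits Latin 1 but not ASCII, and a katakana character fits neither.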

 
 
Joona I Palaste
Posted: 2003-9-17 18:30:00

John O'Conner <email***@***.com> scribbled the following:
> RC wrote:
>> Bill George wrote:
>>
>>> Any Java code examples for converting Unicode to ASCII?
>>
>> There are only 255 = 2 to power 8 (from 00 to ff) values in ASCII.
>> Unicode are from 0000 to ffff (2 to power 16 = ??).
>> Everything in Unicode from 0000 to 00ff are exactly the same as
>> ASCII. You can not convert unicode to ASCII which value greater than
>> 00ff, right?!

> Almost right...everything from 0x0000 through 0x007F are exactly the
> same as ASCII. Everything from 0x0000 through 0x00FF is the same as ISO
> Latin 1.

> The safest way to create ASCII text is to use the built-in converters, ie
> byte[] ascii = someString.getBytes("ASCII");

Isn't ISO Latin 1 a superset of ASCII? In that case you could simply say
that everything from 0x0000 through 0x00FF is the same as ISO Latin 1.

--
/-- Joona Palaste (email***@***.com) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/
"The truth is out there, man! Way out there!"
- Professor Ashfield
 
 
John O'Conner
Posted: 2003-9-18 0:54:00

Joona I Palaste wrote:

> Isn't ISO Latin 1 a superset of ASCII? In that case you could simply say
> that everything from 0x0000 through 0x00FF is the same as ISO Latin 1.
>


Yes you are correct. However, the original post asked about ASCII. Only
telling them that 0x0000 through 0x00FF is the same as ISO Latin 1 would
not have helped them.

ASCII (0x0000 through 0x007F) is a subset of Latin 1 (0x0000 through
0x00FF). Both are contained within Unicode using the same codepoints.

 
 
Jon Skeet
Posted: 2003-9-18 15:14:00

Roedy Green <email***@***.com> wrote:
> On Tue, 16 Sep 2003 11:16:37 +0200, Michael Borgwardt
> <email***@***.com> wrote or quoted :
>
> >String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
> >byte[] bytes = String.getBytes("US_ASCII");
>
> does anything but Java source code understand \uxxxx sequences?

Property files do - and it's also used in C#.
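
A small sketch of the property-file case, for illustration only. It assumes Java 6 or later for `Properties.load(Reader)`, feeds the properties text from a string instead of a real file, and uses a made-up key name and class name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo class; the key "greeting" is made up.
class PropertiesEscapeDemo {
    static String loadGreeting() throws IOException {
        // The backslashes are doubled so that the Java compiler leaves the
        // escapes alone and the properties parser sees them as written in
        // a real .properties file.
        String fileContent = "greeting=\\u30e6\\u30eb";
        Properties props = new Properties();
        props.load(new StringReader(fileContent));
        return props.getProperty("greeting");
    }

    public static void main(String[] args) throws IOException {
        String value = loadGreeting();
        // Two characters after decoding, not twelve.
        System.out.println(value.length());
    }
}
```

Here it is `Properties.load` that decodes the escapes at run time, not the compiler.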

--
Jon Skeet - <email***@***.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
KC Wong
Posted: 2003-9-18 15:26:00

> > >String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
> > >byte[] bytes = String.getBytes("US_ASCII");
> >
> > does anything but Java source code understand \uxxxx sequences?
>
> Property files do - and it's also used in C#.

Be careful with the escape sequences though - you could break your code if
a Unicode escape is converted into a newline, a quote, and so on.


KC.


 
 
brazil
Posted: 2003-9-18 16:44:00

Roedy Green <email***@***.com> wrote in message news:<email***@***.com>...
> >String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
> >byte[] bytes = String.getBytes("US_ASCII");
>
> does anything but Java source code understand \uxxxx sequences?

C# uses the same syntax for escape sequences. I don't know
if there's anything else.
 
 
Chris Smith
Posted: 2003-9-20 1:24:00

KC Wong wrote:
> > > >String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
> > > >byte[] bytes = String.getBytes("US_ASCII");
> > >
> > > does anything but Java source code understand \uxxxx sequences?
> >
> > Property files do - and it's also used in C#.
>
> Be careful with the escape sequences though - you could break your code if
> the unicode escapes is converted to new lines, quotes and etc.

Right. Unicode escapes are used to avoid typing characters that aren't
on your keyboard, NOT to escape characters with meaning to the language.
They are a whole different ball of wax from the traditional C-style
escape sequences for strings.
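
To illustrate the difference, here is a small sketch (class and method names are hypothetical). The Java compiler translates Unicode escapes before it splits the source into tokens, so an escape can appear anywhere, even inside an identifier. A C-style escape is only interpreted inside a string or character literal.

```java
// Hypothetical demo class; the names are illustrative only.
class EscapeTimingDemo {
    // The variable below is spelled with the Unicode escape for the
    // letter a. Because the escape is translated before tokenizing,
    // this declares an ordinary variable named a.
    static int declareWithEscape() {
        int \u0061 = 5;
        return a;
    }

    // A C-style escape is handled later, inside the literal, and
    // produces a real line feed character in the resulting String.
    static String newlineViaCStyleEscape() {
        return "line1\nline2";
    }

    public static void main(String[] args) {
        System.out.println(declareWithEscape());
        System.out.println(newlineViaCStyleEscape().length());
    }
}
```

For the same reason, writing `\u000a` or `\u0022` inside a string literal does not compile: by the time the compiler looks for the end of the literal, the escape has already become a raw line break or a closing quote.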

Of course, I could also argue that anyone who tries using the Unicode
value of a quote or newline in preference to \n or \" deserves whatever
they get. :)

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation