Discussion:
iconv GB18030 problems?
ольга крыжановская
2012-07-20 09:02:14 UTC
Can anyone say why the following iconv round trip through GB18030 fails
and prints two '?' instead of the Unicode character U+1F000?

printf '\xf0\x9f\x80\x80' | iconv -f 'UTF-8' -t GB18030 | iconv -f GB18030
??

My understanding is that GB18030 supports all Unicode characters with
a GBK-like encoding, right?
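
As a cross-check that does not depend on any iconv implementation, the same round trip can be sketched with Python's built-in `gb18030` codec. This is only an illustration, assuming a Python 3 interpreter is at hand; it shows what a successful round trip looks like and says nothing about the conversion tables a particular operating system ships.

```python
import unicodedata

# The four bytes passed to printf above: the UTF-8 encoding of U+1F000.
utf8_bytes = b"\xf0\x9f\x80\x80"

char = utf8_bytes.decode("utf-8")
gb_bytes = char.encode("gb18030")        # UTF-8 -> GB18030
round_trip = gb_bytes.decode("gb18030")  # GB18030 -> Unicode again

print(f"U+{ord(char):04X}", unicodedata.name(char, "<unnamed>"))
print("GB18030 bytes:", " ".join(f"{b:02x}" for b in gb_bytes))
print("round trip ok:", round_trip == char)
```

If the last line reports `True`, the character survives the conversion in both directions, which is the behaviour the iconv pipeline was expected to show.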

Olga
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska-***@public.gmane.org \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
Cedric Blancher
2012-07-20 13:18:16 UTC
Post by ольга крыжановская
Can anyone say why the following iconv round trip through GB18030 fails
and prints two '?' instead of the Unicode character U+1F000?
printf '\xf0\x9f\x80\x80' | iconv -f 'UTF-8' -t GB18030 | iconv -f GB18030
??
My understanding is that GB18030 supports all Unicode characters with
a GBK-like encoding, right?
Right. My understanding is that GB18030 support is slightly broken (iconv
isn't the only affected part; all of the Tibetan glyphs come up as '?' as well)
and Sun^WORACLE doesn't care. I think a well-tuned email to the PRC
ministry of commerce [english.mofcom.gov.cn] will be the only way to
get that fixed (all software sold in China must conform to GB18030,
and if the software does not it will get banned from gov.cn sales or
even banned from China altogether. And the communists KNOW how to make
ORACLE dance).

Ced
--
Cedric Blancher <***@googlemail.com>
Institute Pasteur
_______________________________________________
opensolaris-discuss mailing
Alan Coopersmith
2012-07-20 19:54:27 UTC
Post by Cedric Blancher
Post by ольга крыжановская
Can anyone say why the following iconv round trip through GB18030 fails
and prints two '?' instead of the Unicode character U+1F000?
printf '\xf0\x9f\x80\x80' | iconv -f 'UTF-8' -t GB18030 | iconv -f GB18030
??
My understanding is that GB18030 supports all Unicode characters with
a GBK-like encoding, right?
Right. My understanding is that GB18030 support is slightly broken (iconv
isn't the only affected part; all of the Tibetan glyphs come up as '?' as well)
and Sun^WORACLE doesn't care. I think a well-tuned email to the PRC
ministry of commerce [english.mofcom.gov.cn] will be the only way to
get that fixed
A customer with a support contract filing an escalation is usually the
easiest way to get a fix and doesn't rely on making vague threats or
trying to involve bureaucrats in other governments.
--
-Alan Coopersmith- ***@oracle.com
Oracle Solaris Engineering - http://blogs.oracle.com/alanc
Jan Hnatek
2012-07-23 07:18:11 UTC
Hi Olga,

I got the following response after forwarding your query:
===
That is because the GB18030<->Unicode conversion code table we're
currently using is NOT the latest one.
The character in your input belongs to CJK unified ideographs extension
B, which is defined only in GB18030-2005 standard.
===
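
For context, GB18030 reaches code points above U+FFFF through four-byte sequences that are assigned arithmetically rather than listed one by one: U+10000 maps to 90 30 81 30 and the range continues linearly up to U+10FFFF at E3 32 9A 35. A minimal sketch of that arithmetic in Python follows; the function name `gb18030_supplementary` is made up for illustration, and this is not the Solaris iconv code.

```python
def gb18030_supplementary(code_point: int) -> bytes:
    """Four-byte GB18030 form of a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points are handled")
    # Linear index from the start of the range, split into mixed-radix
    # digits: the byte ranges are 0x30-0x39 (10 values) and 0x81-0xFE (126).
    index = code_point - 0x10000
    index, b4 = divmod(index, 10)
    index, b3 = divmod(index, 126)
    b1, b2 = divmod(index, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


encoded = gb18030_supplementary(0x1F000)
print(" ".join(f"{b:02x}" for b in encoded))
# Compare against Python's own gb18030 codec as an independent check.
print(encoded == "\U0001F000".encode("gb18030"))
```

Because this part of the mapping is pure arithmetic, a converter that emits '?' for such a code point is most likely missing the range in its tables, not hitting a limit of the encoding itself.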

Regards,
hnhn
--
Jan Hnatek
***@oracle.com
Cedric Blancher
2012-07-25 18:24:30 UTC
Post by Jan Hnatek
Hi Olga,
===
That is because the GB18030<->Unicode conversion code table we're
currently using is NOT the latest one.
So why does every other Unix operating system with GB18030 support
(tested with AIX, FreeBSD, Linux) already have this, with
Solaris as the only one lagging behind (no, I can't file a support
request; we don't have a support contract anymore, we got rid of those
after Oracle was not able to fulfil its support contracts)?

Ced
--
Cedric Blancher <cedric.blancher-gM/Ye1E23mwN+***@public.gmane.org>
Institute Pasteur