Chinese Mandarin and support for Asia-Pacific Languages

I would be interested to hear from you if you are facing the challenge of supporting EMu content written in Chinese Mandarin or other Asia-Pacific languages whose characters fall outside the normal EMu Latin-1 set.

I'd also like to hear from you if you are using Unicode (UTF-8) to support languages other than English or the standard European languages.

What are the search, sort and indexing limitations that you face if you switch to UTF-8/Unicode? How did you resolve them?

Look forward to your responses.

Giselle

Re: Chinese Mandarin and support for Asia-Pacific Languages

Hi Giselle & all,

Apologies--this isn't much of an answer, more of a "we're in the same boat": we're about to switch from Latin-1 to UTF-8 to accommodate datasets with Arabic text. Likewise, just wondering if others out there have noticed (& resolved) any issues with character set transitions.

So far, we haven't found issues in our test environment, which just switched to UTF-8, but we're also not exactly sure what to check for...

Thanks for any wisdom & take care!
-Kate

Kate Webbink
Information Systems Specialist

Re: Chinese Mandarin and support for Asia-Pacific Languages

Hi and thanks for your response,

I'm glad to hear that, so far, you haven't had any problems switching to UTF-8.

From my understanding there are limitations with moving to UTF-8, as KE advised me below:

"There are several limitations in the EMu Unicode

■ You will lose major search capabilities. Searching, by default, is case insensitive for characters A to Z but no other character transformations will be performed. Thus a search for zoe would match zoe, Zoe, ZOE, ZoE, but would not match ZoÉ. A search for zoÉ would match ZoÉ, zOÉ, zoÉ but would not match Zoe, ZOE, zoe, zoé.
■ Sorting of characters will be done in the strict lexical order of the UCS without any ability to map non-Latin characters to an equivalent base Latin character.
■ Any data loaded or imported must be in UTF-8 format."
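
For anyone wanting to see what the first limitation means in practice, here is a minimal Python sketch of the two matching behaviours. It is not EMu's actual implementation; the function names (`ascii_casefold`, `accent_fold`, `matches`) are made up for illustration, and the accent-folding version is just one possible alternative that a search could offer.

```python
import unicodedata


def ascii_casefold(text):
    """Fold only A-Z to a-z and leave every other character untouched.

    This mimics the behaviour described in the quote above (an assumption,
    not EMu code).
    """
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def accent_fold(text):
    """Lower-case the text and strip combining accents (illustrative only)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches(query, value, fold):
    """Return True if query and value are equal after applying fold."""
    return fold(query) == fold(value)
```

With `ascii_casefold`, `matches("zoe", "ZoE", ascii_casefold)` is true but `matches("zoe", "ZoÉ", ascii_casefold)` is false, which is the behaviour KE describes. With `accent_fold`, the second comparison becomes true as well.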

I'm keen to hear from any other KE EMu clients who may be in the same boat and looking for a solution to the above limitations. Or, if you have already switched, are these problems affecting your users or your use of the data?

Regards Giselle

Re: Chinese Mandarin and support for Asia-Pacific Languages

Hi Giselle--Many thanks for the details.

Have you been able to test-run a CSV report of data in EMu that includes Chinese/other Asian-language characters? 

For us, CSVs that come out of EMu replace Arabic script with what look like escaped Unicode code points, e.g.:

\x{063a}\x{0631}\x{0628} \x{0643}\x{064a}\x{0634}

...In which case, maybe the CSVs that EMu is able to report are still Latin-1 encoded (or at least not UTF-8)?
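
If the escapes really are hexadecimal Unicode code points (the `\x{...}` form is the same notation Perl uses, though that is only a guess about where they come from), they can be turned back into characters after export. A minimal Python sketch, with a hypothetical helper name `decode_escapes`:

```python
import re

# Matches escapes of the form \x{063a}: a backslash, "x", and hex digits in braces.
ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def decode_escapes(text):
    """Replace each \\x{HEX} escape with the character at that code point."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# The sample string from the CSV above, kept as a raw string so the
# backslashes are preserved exactly as they appear in the file.
sample = r"\x{063a}\x{0631}\x{0628} \x{0643}\x{064a}\x{0634}"
decoded = decode_escapes(sample)
```

Here `decoded` holds two Arabic words of three letters each. Text without any escapes passes through unchanged, so the function could be run over a whole exported file before saving it as UTF-8.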

It also looks like those 3 points you listed are in fact all issues in our test environment.

- searches are case sensitive for special characters
- sorts don't follow a "mapped to Latin character" order
- when importing data, any special characters/accented letters in a CSV encoded as Latin-1 turn into "?" in EMu records
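
To illustrate the sorting point, here is a small Python sketch comparing strict code point order with an order that maps accented Latin letters to their base letter first. It only shows the difference outside EMu; `base_letter_key` and the sample names are made up for the example, and accent stripping like this does nothing useful for Chinese or Arabic, where a proper collation library would be needed.

```python
import unicodedata


def base_letter_key(text):
    """Sort key: accents stripped and case folded, original text as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (stripped.casefold(), text)


names = ["Zoe", "Émile", "Eve", "zoé", "Adam"]

# Plain sorted() compares code points, so "Émile" lands after every A-Z/a-z name.
by_code_point = sorted(names)

# With the key, "Émile" sorts among the other names starting with E.
by_base_letter = sorted(names, key=base_letter_key)
```

`by_code_point` comes out as Adam, Eve, Zoe, zoé, Émile, while `by_base_letter` comes out as Adam, Émile, Eve, Zoe, zoé.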

...definitely curious to know whether/when any fixes or workarounds are available. (Currently testing/comparing workflows involving Excel, LibreOffice, specifying UTF-8 in the CSV formatting versus not, etc.)

Thanks again
-Kate

Edited by: Kate Webbink - 02-Apr-15 03:14:24

Kate Webbink
Information Systems Specialist

Re: Chinese Mandarin and support for Asia-Pacific Languages

Thanks Kate for confirming that you are experiencing the expected issues with UTF-8 as mentioned above.

I am currently talking with Axiell about possible solutions to these problems in the client. If a sustainable solution is found that may benefit others, I will make sure this information is posted here.

I'm sure many EMu clients will agree that Unicode (UTF-8) encoding is a must for multicultural collections and for artist/collector names that span the Asia-Pacific languages, especially where this information is published in digital formats, such as online.

Cheers Giselle

Re: Chinese Mandarin and support for Asia-Pacific Languages

Thanks for the update, Giselle! 

Not much progress here, but we tested "iconv" (along with Access & Open/LibreOffice) for converting small batches of data reported or imported as CSVs.  Those solve some of the problems we ran into with opening/handling data with special characters in Excel.  Otherwise, though, we still don't have a more efficient method of converting datasets or spot-checking/globally replacing whichever characters would need it.

Hope Axiell can work something out, or offer advice/support for conversions.

----(and in case it's of use for anyone)
To use iconv to encode Excel CSVs as UTF-8:
- install iconv (consider dbaportal.eu/2012/10/24/iconv-for-windows/)
- in a command prompt:
1) cd path/to/unconverted/CSVs
2) iconv -f CP1252 -t UTF-8 infile > outfile

(use an Excel-generated CSV for "infile" & give it a new "outfile" name)
(also: beware of opening the "outfile" with Excel; depending on the version, that can unintentionally mangle UTF-8 characters & convert it right back to Latin-1)
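
If installing iconv isn't an option, roughly the same conversion can be sketched in Python. This is only an illustration of the step above, assuming the input really is CP1252; the function names are hypothetical. The second function is a quick way to check whether a file is already valid UTF-8 before importing it.

```python
from pathlib import Path


def convert_cp1252_to_utf8(infile, outfile):
    """Read infile as CP1252 and write the same text to outfile as UTF-8.

    Works on raw bytes so that line endings are left exactly as they were.
    Raises UnicodeDecodeError if the file contains bytes CP1252 does not define.
    """
    text = Path(infile).read_bytes().decode("cp1252")
    Path(outfile).write_bytes(text.encode("utf-8"))


def is_valid_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

For example, `convert_cp1252_to_utf8("infile.csv", "outfile.csv")` followed by `is_valid_utf8("outfile.csv")` gives a quick sanity check. Note that a file containing only plain ASCII characters passes the UTF-8 check either way, so the check is mainly useful for files with accented or non-Latin characters.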

Kate Webbink
Information Systems Specialist