Dear colleagues,

Just to let you know that I have, in a private capacity, submitted the following comment to the Public Comment on the draft final report from the EWG on Internationalized Registration Data:

Subj.: each form may be derived from the other

I am Co-Chair of the Translation & Transliteration of Contact Information PDP WG, but I am making this comment in a private capacity.

It is a solid and practical report and obviously the result of a huge amount of work.

I would like to make some points about Table 4 on p.11:

• I feel it is important to stress that Table 4 is an ideal of clean data which currently can only be produced manually, and that it contains aspects of both transliteration (e.g. 千代田 -> Chiyoda) and translation (e.g. ビル -> Bldg.).

• Moreover, the relationships between the original and the transformed records are complex, and it is not possible to move automatically in either direction. For example, “first” is usually 第一 /daiichi/, but in this case it is ファースト /faasuto/, from the English (slashes enclose a transliteration). In fact there are other Japanese possibilities, but I’ll omit those in the interests of brevity. Unfortunately, ファースト may be either “first” or “fast”. (One wonders whether the name of the Japanese fast food chain ファースト・キッチン /faasuto kitchin/, also advertised in Japan in the translated form “First Kitchen”, may originally have been a mistranslation of “Fast Kitchen”.)

• This may affect text such as “original data could have been in either form” and, especially, “each form can be derived from the other”.

Scripts where letters can be read in more than one way or which do not use spaces to define word boundaries (Japanese falls into both of these categories) will be the most resistant to automated transliteration/translation.
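
The ambiguity described above can be illustrated with a minimal Python sketch. The two lookup tables are hypothetical and contain only the examples mentioned in this comment; they are not taken from the report or from any real transliteration tool. The sketch simply shows that a round trip from English to Japanese and back does not return a single answer.

```python
# Hypothetical lookup tables holding only the examples from this comment.
# Real data would be far larger and would need many more alternatives.
EN_TO_JA = {
    "first": ["第一", "ファースト"],  # translated form and transliterated form
}

JA_TO_EN = {
    "第一": ["first"],
    "ファースト": ["first", "fast"],  # the katakana form is ambiguous
}


def round_trip(english_word):
    """Return every English word reachable via English -> Japanese -> English."""
    results = set()
    for japanese in EN_TO_JA.get(english_word, []):
        results.update(JA_TO_EN.get(japanese, []))
    return sorted(results)


def is_unambiguous(table, key):
    """True only if the table offers exactly one candidate for the key."""
    return len(table.get(key, [])) == 1


print(round_trip("first"))                     # ['fast', 'first']
print(is_unambiguous(JA_TO_EN, "ファースト"))  # False
```

Because each direction can yield more than one candidate, an automated process would need extra context (or manual review) to pick the right form.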

Chris Dillon.

Research Associate in Linguistic Computing, Centre for Digital Humanities, UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599) www.ucl.ac.uk/dis/people/chrisdillon