[These texts, from the voa-utext@voa.gov mailing list, are archived in the unicode subdirectory. The whole archive is in voa-utext-mails.gz. You can also find a lot of stuff at ftp.voa.gov and gopher://gopher.voa.gov/]

From: wt@voa.gov (Walt Torrance)

The languages of the texts we will be posting include the following:

    Albanian      Amharic       Arabic        Armenian
    Azerbaijani   Bangla        Bulgarian     Burmese
    Creole        Croatian      Czech         Dari
    English       Estonian      Farsi         French
    Georgian      Greek         Hausa         Hindi
    Hungarian     Indonesian    Khmer         Korean
    Kurdish       Lao           Latvian       Lithuanian
    Mandarin      Pashto        Polish        Portuguese
    Romanian      Russian       Serbian(latin) Slovak
    Slovene       Spanish       Swahili       Thai
    Turkish       Ukrainian     Urdu          Uzbek
    Vietnamese

Texts produced by our Tibetan and Cantonese services will not be included for now due to portability issues that may not be resolved in the near future.

Texts in Amharic, Burmese, and Khmer will be mapped to Private Use Area planes. This information will be provided in a followup.

File content will _generally_ be a combination of the primary language, as indicated by the filename, and English.

We'll take a shot at how uuencode deals with these texts.

Filenames will follow the series pattern 'language####.suffix', e.g.: thai0001.uni.uu

Viewing the texts.

We aren't in a position at this time to provide a viewer or fonts for this text. We'd rather let the industry deal with that.

How do we produce our texts? A brief overview.

Since circa 1985 we have used Xerox(*) document systems to produce all of our texts except Tibetan and Cantonese. (These two languages are relatively new additions to our broadcast family and, for reasons I won't go into, are on separate platforms/environments.)

The Xerox(*) XSoft(*) division's GlobalView(*) product provides us with sophisticated document processing capabilities. Speak to your friendly local Xerox(*) salesperson for info. By this statement I am not endorsing this product. Government ethics, don't you know.
We run GlobalView(*) on several platforms, primarily Xerox(*) proprietary hardware (the Star(*) descendants) and Sun(*) SPARC(*) boxes. Not all languages are available on all platforms.

The Xerox Corporate Character Set is our internal character encoding. We use a GlobalView(*) tool to out-convert to Unicode(*).

Xerox(*), XSoft(*), & GlobalView(*) and probably Star(*) are trademarks of Xerox Corporation.
Sun(*) is a trademark of Sun Microsystems, Inc.
SPARC(*) is a trademark of SPARC International, Inc.
Unicode(*) is a trademark of Unicode, Inc.

-------------

Me, I'm one of three network analysts/administrators on a staff of 20, and that's including managers, in-house programmers, user support assistants, & operators. We support about 1500 end-users on 1000 workstations, with about 100 shared network servers (print, mail, file, etc.)... major internal network hubs in Washington, DC, New York, & London... transient connections to our bureaus in most major world capitals. Twenty-four hours a day, seven days a week.

So if I don't get back to you immediately on a request, thanks for understanding.

-Walt
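
[Archive note: the message above gives the filename series pattern 'language####.suffix' with the example thai0001.uni.uu. For anyone sorting the archived files by language, the pattern can be split mechanically. What follows is only a minimal sketch, not anything supplied by VOA. The function name parse_voa_filename and the TextName type are hypothetical, and the sketch assumes a lowercase language name followed by exactly four digits, as in the one example given. How a name such as Serbian(latin) appears in real filenames is not stated in the message, so such files may simply fail to match.]

```python
import re
from typing import NamedTuple, Optional

# Assumption: lowercase letters, exactly four digits, then a dotted suffix,
# modelled on the single example "thai0001.uni.uu" from the message.
_NAME_RE = re.compile(r"^([a-z]+)(\d{4})\.(.+)$")


class TextName(NamedTuple):
    """Hypothetical container for the three parts of a filename."""
    language: str   # e.g. "thai"
    number: int     # series number, e.g. 1
    suffix: str     # everything after the first dot, e.g. "uni.uu"


def parse_voa_filename(name: str) -> Optional[TextName]:
    """Split a 'language####.suffix' filename, or return None if it
    does not fit the assumed pattern."""
    m = _NAME_RE.match(name)
    if m is None:
        return None
    return TextName(m.group(1), int(m.group(2)), m.group(3))


if __name__ == "__main__":
    print(parse_voa_filename("thai0001.uni.uu"))
```

[A filename that does not fit the assumed pattern returns None rather than raising, so a directory listing can be filtered without special-casing stray files.]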