Hello everyone!
Recently I searched for a way to convert Cyrillic to Latin with Caché ObjectScript, but didn't find anything, so I decided to write it myself.
Here is the code:
/// Transliterates Cyrillic text to Latin
ClassMethod convertRussionToEnglish(word As %String = "Пример для Кода") As %String
{
    // Transliteration table: each entry is $LB(cyrillicLetter, latinEquivalent)
    Set convertArray = $LB(
        $LB("а","a"),$LB("б","b"),$LB("в","v"),$LB("г","g"),$LB("д","d"),$LB("е","e"),$LB("ё","e"),$LB("ж","zh"),$LB("з","z"),
        $LB("и","i"),$LB("й","y"),$LB("к","k"),$LB("л","l"),$LB("м","m"),$LB("н","n"),$LB("о","o"),$LB("п","p"),
        $LB("р","r"),$LB("с","s"),$LB("т","t"),$LB("у","u"),$LB("ф","f"),$LB("х","kh"),$LB("ц","ts"),$LB("ч","ch"),
        $LB("ш","sh"),$LB("щ","shch"),$LB("ы","y"),$LB("э","e"),$LB("ю","yu"),$LB("я","ya"),$LB("ъ",""),$LB("ь",""),
        $LB("А","A"),$LB("Б","B"),$LB("В","V"),$LB("Г","G"),$LB("Д","D"),$LB("Е","E"),$LB("Ё","E"),$LB("Ж","ZH"),$LB("З","Z"),
        $LB("И","I"),$LB("Й","Y"),$LB("К","K"),$LB("Л","L"),$LB("М","M"),$LB("Н","N"),$LB("О","O"),$LB("П","P"),
        $LB("Р","R"),$LB("С","S"),$LB("Т","T"),$LB("У","U"),$LB("Ф","F"),$LB("Х","KH"),$LB("Ц","TS"),$LB("Ч","CH"),
        $LB("Ш","SH"),$LB("Щ","SHCH"),$LB("Ы","Y"),$LB("Э","E"),$LB("Ю","YU"),$LB("Я","YA"),$LB("Ъ",""),$LB("Ь","")
    )
    // Convert the argument; the former hard-coded example is now the default value
    Set wordToConvert = word
    Set wordToConvertLength = $L(wordToConvert)
    Set cnt = $ListLength(convertArray)
    Set latinWord = ""
    // Walk the input one character at a time and look each character up in the table
    For i=1:1:wordToConvertLength {
        Set cyrillicWord = $E(wordToConvert,i)
        For j=1:1:cnt {
            Set codes = $ListGet(convertArray,j)
            Set cyrillicLetter = $ListGet(codes,1)
            Set latinLetter = $ListGet(codes,2)
            // On a match, replace the character with its Latin equivalent
            If cyrillicLetter=cyrillicWord {
                Set cyrillicWord = latinLetter
            }
        }
        // Characters that are not in the table are copied unchanged
        Set latinWord = latinWord_cyrillicWord
    }
    // Return the converted string
    Quit latinWord
}
Interesting. Why did you duplicate the lowercase and uppercase letters? I'm also not sure it's a good idea to uppercase every letter of the transliterated variant when only the single source letter was uppercase. For example, Юла -> YUla looks weird. I think the code should check the case of the original word: if it is completely uppercase, the resulting word should be uppercased too, but if only the first letter is uppercase, the resulting string should use $zconvert(word, "W").
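The case rule suggested above could be sketched roughly as follows. This is only an illustration, not code from the thread: the method name matchCase and its two arguments are hypothetical, it assumes `latin` is the lowercase transliteration of the single word `source`, and it assumes a Unicode instance where $zconvert handles Cyrillic case.

```objectscript
/// Hypothetical helper: applies the case pattern of one source word to its
/// lowercase Latin transliteration.
ClassMethod matchCase(source As %String, latin As %String) As %String
{
    set upper = $zconvert(source, "U")
    set lower = $zconvert(source, "L")
    // All lowercase (or no cased letters at all): keep the transliteration as-is
    if source = lower { return latin }
    // Entirely uppercase: uppercase the whole result
    if source = upper { return $zconvert(latin, "U") }
    // Only the first letter is uppercase: word-case the result
    if $extract(source) '= $extract(lower) { return $zconvert(latin, "W") }
    // Any other mixed-case pattern is left untouched in this sketch
    return latin
}
```

With this approach, "Юла" together with the lowercase transliteration "yula" would give "Yula" rather than "YUla". Note that $zconvert(..., "W") capitalizes every word in a string, so the helper is meant to be called one word at a time.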
I was looking for a quick solution for my task and took the letter mapping from the Internet.
Here is a version that does less searching:
ClassMethod getDict(Output dict)
{
    kill dict
    set dict("а")="a", dict("б")="b", dict("в")="v", dict("г")="g", dict("д")="d", dict("е")="e"
    set dict("ж")="zh", dict("з")="z", dict("и")="i", dict("й")="y", dict("к")="k", dict("л")="l"
    set dict("м")="m", dict("н")="n", dict("о")="o", dict("п")="p", dict("р")="r", dict("с")="s"
    set dict("т")="t", dict("у")="u", dict("ф")="f", dict("х")="kh", dict("ц")="ts", dict("ч")="ch"
    set dict("ш")="sh", dict("щ")="shch", dict("ъ")="", dict("ы")="y", dict("ь")="", dict("э")="e"
    set dict("ю")="yu", dict("я")="ya"
}

/// w ##class(Test.Cyr).convertRussionToEnglish()
ClassMethod convertRussionToEnglish(word As %String = "Привет") As %String
{
    do ..getDict(.dict)
    set out = ""
    for i=1:1:$l(word) {
        set letter = $e(word, i)
        set letterL = $zcvt(letter, "l")
        // $get avoids an <UNDEFINED> error for characters missing from the
        // dictionary (spaces, punctuation, "ё"); they are copied unchanged
        set outLetter = $get(dict(letterL), letter)
        set:letter'=letterL outLetter = $zcvt(outLetter, "U")
        set out = out _ outLetter
    }
    quit out
}
There is a way to do something similar in Caché and IRIS. In a Russian locale, you have access to the "KOI8R" I/O translation table. KOI8-R has the funny property that if you mask out the high-order bit, you get a sort of readable transliteration. Here's an example using a Unicode instance in the "rusw" locale:
USER>s koi8=$zcvt("Пример для Кода","O","KOI8R")
USER>s ascii="" f i=1:1:$l(koi8) s ascii=ascii_$c($zb($a(koi8,i),127,1))
USER>zw ascii
ascii="pRIMER DLQ kODA"
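The case comes out inverted because KOI8-R places lowercase Cyrillic letters where uppercase Latin letters land once the high bit is cleared, and vice versa. A minimal sketch of flipping the case back with $translate, using the string from the session above (the variable names are only illustrative):

```objectscript
    set lower = "abcdefghijklmnopqrstuvwxyz"
    set upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    // Map every lowercase letter to uppercase and every uppercase letter to lowercase
    set fixed = $translate("pRIMER DLQ kODA", lower _ upper, upper _ lower)
    write fixed, !    // Primer dlq Koda
```

The result is still KOI8-R's own letter pairing (for example "я" becomes "q"), not one of the usual transliteration schemes.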
Mine is faster ;)
ClassMethod RussianToEnglish(russian = "привет") As %String
{
    set rus="абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭьЬъЪ"
    set eng="abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE"
    set rus("ж")="zh"
    set rus("ц")="ts"
    set rus("ч")="ch"
    set rus("ш")="sh"
    set rus("щ")="shch"
    set rus("ю")="yu"
    set rus("я")="ya"
    set rus("Ж")="Zh"
    set rus("Ц")="Ts"
    set rus("Ч")="Ch"
    set rus("Ш")="Sh"
    set rus("Щ")="Shch"
    set rus("Ю")="Yu"
    set rus("Я")="Ya"
    set english=$tr(russian,rus,eng)
    set wow=$O(rus(""))
    while wow'="" {
        set english=$Replace(english,wow,rus(wow))
        set wow=$O(rus(wow))
    }
    return english
}
USER>w ##class(Example.ObjectScript).RussianToEnglish("Я вас любил: любовь еще, быть может, В душе моей угасла не совсем;")
Ya vas lyubil: lyubov eshche, byt mozhet, V dushe moey ugasla ne sovsem;
USER>
Here's my new one-liner. Now 6 times faster.
ClassMethod convertRussionToEnglish4(russian = "привет") As %String [ CodeMode = expression ]
{
$tr($zcvt($replace($replace($tr(russian, "абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭЖЦЧШЮЯжцчшюяьЬъЪ", "abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE婨味䍨卨奵奡穨瑳捨獨祵祡"),"щ","shch"),"Ш","Sh"),"O","UnicodeBig"),$c(0))
}
Here are the tests:
do ##class(Test.Cyr).Time()
Method: convertRussionToEnglish1, time: .009022 <- original
Method: convertRussionToEnglish2, time: .000689 <- my first idea
Method: convertRussionToEnglish3, time: .000417 <- Evgeny
Method: convertRussionToEnglish4, time: .000072 <- this version
Method: convertRussionToEnglish5, time: .000124 <- Jon
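The Time() method itself is not shown in the thread. A timing harness along these lines would produce similar output; this is only a sketch under assumptions: the name TimeSketch and the iteration count are made up, and it assumes the five variants exist as class methods convertRussionToEnglish1 to convertRussionToEnglish5 in the same class.

```objectscript
/// Hypothetical timing harness; not the Time() method used for the numbers above.
ClassMethod TimeSketch(iterations As %Integer = 1000)
{
    set word = "Пример для Кода"
    for n = 1:1:5 {
        // Assumed naming convention for the variants being compared
        set method = "convertRussionToEnglish" _ n
        set start = $zhorolog
        for i = 1:1:iterations {
            set result = $classmethod($classname(), method, word)
        }
        // Average seconds per call
        set elapsed = ($zhorolog - start) / iterations
        write "Method: ", method, ", time: ", elapsed, !
    }
}
```

Averaging over many iterations helps because a single call is close to the resolution of $zhorolog.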
Eduard,
You just forgot about "Щ" in your awesome one-liner, while the $replace of "Ш" is redundant. So it should look like this:
You're right, I need to change the $replace of Ш to a $replace of Щ. Ш is already handled by $translate anyway.
ClassMethod convertRussionToEnglish4(russian = "привет") As %String [ CodeMode = expression ]
{
$tr($zcvt($replace($replace($tr(russian, "абвгдезийклмнопрстуфхыэАБВГДЕЗИЙКЛМНОПРСТУФХЫЭЖЦЧШЮЯжцчшюяьЬъЪ", "abvgdeziyklmnoprstufhyeABVGDEZIYKLMNOPRSTUFHYE婨味䍨卨奵奡穨瑳捨獨祵祡"),"щ","shch"),"Щ","Shch"),"O","UnicodeBig"),$c(0))
}
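The CJK characters in the one-liner are not garbage. As far as I can tell, each one is a 16-bit character whose high and low bytes are the two Latin letters of a digraph, so the big-endian UTF-16 output conversion splits it into exactly those two letters, and the $c(0) high bytes of the plain ASCII characters are then stripped by the outer $tr. A small sketch of the idea for "Zh", assuming a Unicode instance (the variable names are illustrative):

```objectscript
    // "Z" is $char(90) and "h" is $char(104); pack them into one 16-bit character
    set packed = $char($ascii("Z") * 256 + $ascii("h"))
    // Compare with the character that stands for "Ж" in the one-liner (1 means equal)
    write packed = "婨", !
    // Big-endian UTF-16 output emits two bytes per character:
    // "a" becomes $c(0)_"a", the packed character becomes "Z"_"h"
    set bytes = $zconvert("a" _ packed, "O", "UnicodeBig")
    // Remove the zero padding bytes, leaving only the Latin letters
    write $translate(bytes, $char(0)), !
```

The trick only covers two-letter replacements, which is why "щ" and "Щ" (four letters each) still need their own $replace calls.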
The actual rules used for transliterating names and surnames are more complex, as they can be phonetically dependent. E.g. "Егор" -> "Egor", but "Иеремия" -> "Iyeremiya".
Right.