
Conference thebay::joyoflex

Title:The Joy of Lex
Notice:A Notes File even your grammar could love
Moderator:THEBAY::SYSTEM
Created:Fri Feb 28 1986
Last Modified:Mon Jun 02 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1192
Total number of notes:42769

1041.0. "Alphabetical ordering: marginal cases" by KETJE::HAENTJENS (Beware of Counterfeit) Mon Apr 19 1993 11:35

T.R  Title  User  Personal Name  Date  Lines
1041.1VMSMKT::KENAHThere are no mistakes in Love...Mon Apr 19 1993 12:406
1041.2"1812" par l'duc de la terHorst...TLE::JBISHOPMon Apr 19 1993 12:5214
    It's worse than you think.
    
    I saw a library's ordering rules (in a C.S. article).
    There were rules about sorting books with titles containing
    numbers (spell it out), and books in foreign languages containing
    numbers (spell it out in the foreign language), and titles
    consisting only of numbers, and books in non-Latin scripts,
    and names with prefixes separated by a space (de, von), and 
    names with prefixes not separated by a space (e.g. terHorst)
    and on and on.
    
    If I can remember the article, I'll post a reference.
    
    		-John Bishop
1041.3Was there any life before computers?KETJE::HAENTJENSBeware of CounterfeitTue Apr 20 1993 08:5818
1041.4I'm a rational sortRAGMOP::T_PARMENTERHuman. All too human.Tue Apr 20 1993 10:3221
1041.5SMURF::BINDERDeus tuus tibi sed deus meus mihiTue Apr 20 1993 13:2629
1041.6VMSMKT::KENAHThere are no mistakes in Love...Tue Apr 20 1993 14:2010
1041.7CALS::DESELMSTue Apr 20 1993 15:325
    RE: -1

    Great example...

    - Jim
1041.8Me tooAUSSIE::WHORLOWBushies do it for FREE!Tue Apr 20 1993 19:3210
    G'day,
     Minor rathole ..
    
    There is  'Sans Souci' in Australia... It's a suburb of Sydney...
    
    derek
    PS where would that fit in the Sanssouci / SANSSOUCI /Sans-Souci...
    scheme?
    
     
1041.9JIT081::DIAMONDPardon me? Or must I be a criminal?Tue Apr 20 1993 21:569
    Re .5
    >>A character range can include a multicharacter collating element
    >>enclosed within bracket-period delimiters ([. and .]).
    [...]
    >>When using Spanish collation rules, [[.ch.]] is treated as an RE
    >>matching the sequence ch, while [ch] is treated as an RE matching
    >>c or h.  In addition, [a-[.ch.]] matches a, b, c, and ch.
    
    How do they do it in a character set that doesn't have [ and ] ?
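The range behaviour quoted from .5 can be sketched in a few lines. This is only an illustration, assuming a hand-written table of traditional Spanish collating elements (ELEMENTS, range_members and match_one are hypothetical names, not part of any POSIX or XPG interface):

```python
# Assumed traditional Spanish collating order: "ch" follows "c", "ll" follows "l".
ELEMENTS = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
RANK = {element: index for index, element in enumerate(ELEMENTS)}


def range_members(low, high):
    """Collating elements from low to high inclusive, like [low-[.high.]]."""
    return ELEMENTS[RANK[low]:RANK[high] + 1]


def match_one(text, members):
    """Return the member that text starts with (longest tried first), else None."""
    for element in sorted(members, key=len, reverse=True):
        if text.startswith(element):
            return element
    return None


print(range_members("a", "ch"))                       # ['a', 'b', 'c', 'ch']
print(match_one("chico", range_members("a", "ch")))   # ch
print(match_one("hola", ["c", "h"]))                  # h
```

Trying the longest element first is a design choice of this sketch; a real regular-expression engine may resolve the ambiguity differently.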
1041.10Difference between tilde and squiggly thingKETJE::HAENTJENSBeware of CounterfeitWed Apr 21 1993 08:1021
1041.11My rathole or yours?FORTY2::KNOWLESDECspell snot awl ewe kneedWed Apr 21 1993 09:5020
1041.12KETJE::HAENTJENSBeware of CounterfeitWed Apr 21 1993 11:5210
1041.13VMSMKT::KENAHblah blah blah GINGERWed Apr 21 1993 14:4315
    Is this an accurate synopsis?
    
1.  Different European languages have developed ordering rules that are
    internally consistent.
    
2.  You are trying to develop more general ordering rules, rules that
    incorporate different languages' rules while maintaining internal
    consistency as well as consistency with each individual language.  
    In addition, it sounds like you're trying to sort out the ordering of
    similar but distinct words and word groupings.
    
3.  Finally, the ordering scheme you develop must be implemented on a
    computer, since computers are valuable tools for tasks like ordering.
    
    Do any of the existing standards (ISO, XPG) deal with this topic?
1041.14VMSMKT::KENAHblah blah blah GINGERWed Apr 21 1993 17:114
    I re-read .0 and see that it states POSIX-compliant systems support
    multilevel ordering -- which POSIX standard is it a part of?
    
    					andrew
1041.159945-2.2KETJE::HAENTJENSBeware of CounterfeitThu Apr 22 1993 06:349
1041.16VMSMKT::KENAHblah blah blah GINGERThu Apr 22 1993 10:214
    Thanks for the POSIX and XPG references -- I think I'll check 'em
    out (I believe one of my colleagues has a copy of XPG4).
    
    					andrew
1041.17NOVA::FISHERDEC Rdb/DinosaurThu Apr 22 1993 10:3133
1041.18Ordering with SanscritKETJE::HAENTJENSBeware of CounterfeitThu Apr 22 1993 12:1117
1041.19%^}VMSMKT::KENAHblah blah blah GINGERThu Apr 22 1993 17:474
    SANSCRIT would probably wind up somewhere else in American English -
    that's because the usual transliteration is SANSKRIT.
    
    					andrew
1041.20let those R's ripRAGMOP::T_PARMENTERHuman. All too human.Tue Apr 27 1993 10:2214
1041.21ARR, Matey!CALS::DESELMSTue Apr 27 1993 10:566
    A "flipped R" is just like a trilled R, except that instead of the tongue
    tapping the roof of your mouth a bunch of times, it only taps the roof of
    the mouth once. It is indeed exactly the same as "dd" in "ladder".
    Pronounce Spanish with an American ARR and they'll laugh in your face.

    - Jim
1041.22NOVA::FISHERDEC Rdb/DinosaurThu Apr 29 1993 11:116
    But rr in Spanish also has no special collation rule [that I have
    seen].
    
    Is rr collated after rz?
    
    ed
1041.23NOTIME::SACKSGerald Sacks ZKO2-3/N30 DTN:381-2085Thu Apr 29 1993 17:463
re .20:

It's an alveolar flap.
1041.24Knuth, of courseTLE::JBISHOPFri Aug 06 1993 15:587
    re .2
    
    See Knuth's _Sorting_and_Searching_ (his volume 3), pp 7..9
    for some library sorting rules, e.g. "Ignore initial articles,
    unless not in nominative case...".
    
    		-John Bishop
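A rule such as "ignore initial articles" is usually implemented as a sort key. The following is a minimal sketch, assuming an English-only article list (so the nominative-case exception never arises); ARTICLES and title_key are hypothetical names and are not taken from Knuth:

```python
ARTICLES = ("the", "a", "an")  # assumed list; a real catalogue needs one per language


def title_key(title):
    """Lower-case the title and drop one leading article before comparing."""
    words = title.lower().split()
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)


titles = ["The Joy of Lex", "A Tale of Two Cities",
          "Sorting and Searching", "An Almanac"]
print(sorted(titles, key=title_key))
# ['An Almanac', 'The Joy of Lex', 'Sorting and Searching', 'A Tale of Two Cities']
```

The other rules mentioned in .2 (spelled-out numbers, name prefixes) would need further transformations inside the same key function.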
1041.25VMSMKT::KENAHFri Aug 06 1993 16:295
    A question came up in another conference -- does Digital support
    Cyrillic alphabets?
    
    I'm embarrassed to ask this, because I don't know whether ISO Latin-1
    includes Cyrillic alphabets. (We *do* support ISO Latin-1, don't we?)
1041.26Nope.SMURF::BINDERSapientia Nulla Sine PecuniaFri Aug 06 1993 16:4117
    Re .25
    
    > I'm embarrassed to ask this, because I don't know whether ISO Latin-1
    > includes Cyrillic alphabets.
    
    It doesn't.
    
    Producing International Products -- Software handbook
    (Identification Number A-MN-ELEN467-00-0 Rev B)
    
    ...says this:
    
    The ISO Latin Alphabet No. 1 has been developed by the International
    Organization for Standards (ISO) as the standard character set for the
    Western European languages.  It will eventually supersede the DEC
    Multinational Character Set.  Further ISO character sets are being
    developed to cover European languages not based on the Latin Alphabet.
1041.27VMSMKT::KENAHFri Aug 06 1993 17:107
    Thanks.
    
    So:  does Digital support Cyrillic alphabets?
    
    Also: Does Digital support ISO Latin-1?
    
    				andrew
1041.28REGENT::BROOMHEADDon't panic -- yet.Fri Aug 06 1993 17:218
    ISO Latin-1 is Digital's default character set -- so, yes, we support
    it.
    
    ISO Latin-Cyrillic (ISO 8859-5 (which is not ISO Latin-5)) is provided
    on a few of our printers (dot matrix ones) and can be added via a
    cartridge on our ANSI laser printers.  So, yes, we support it.
    
    							Ann B.
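The difference between the two character sets can be checked with a short sketch. It assumes Python's standard codec names "latin-1" and "iso8859_5"; the helper name encodable is hypothetical:

```python
def encodable(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(encodable("Ж", "latin-1"))    # Cyrillic ZHE is not in ISO Latin-1
print(encodable("Ж", "iso8859_5"))  # but it is in ISO 8859-5
print(encodable("é", "iso8859_5"))  # Western accented letters are not in 8859-5
```

This only tests the repertoire of each character set; it says nothing about which Digital products support them.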
1041.29VMSMKT::KENAHFri Aug 06 1993 18:279
    Thank you, Ann.  I didn't realize ISO Latin-1 was our default,
    although (based on Dick's description) it's obvious.
    
    How about Cyrillic support at the user-interface level?
    
    					andrew
    
    P.S. I'm tracking this question through another path within Digital;
    should I get an expanded answer, I'll post it here.
1041.30NRSTA2::KALIKOWSupplely ChainedFri Aug 06 1993 18:405
    Hey andrew -- Keep us posted on whether you get the answer thru
    "official" or "other" channels faster than this employee-interest
    notesfile...  It'd be great if we could get you out of the BOX
    faster... :-)
    
1041.31ISTWI1::KINACIWalk thru this worldMon Aug 09 1993 09:4716
    I think Cyrillic is ISO-Latin 2 is it not?
    
    I know there is some Cyrillic support out there and there is more to
    come once the Fonts acquired from Monotype go into distribution. 
    I've been informed that we will have a wide scale test for the various
    fonts.  I will be working on testing ISO-Latin 5 for Turkey, for example.
    
    I know that there is a Cyrillic version of DECterm.  Hold on, I am not 
    sure if we are talking full UI localization or if there is just character 
    set support.  But the latter definitely exists.  I know there was work 
    being done to get EPROMs which support Cyrillic for VT420 type terminals.  
    I believe this has been completed.  I also know that the Cyrillic version 
    of ALL-IN-1 V3.0 should be shipping soon.
    
    Suz
    
1041.32VMSMKT::KENAHMon Aug 09 1993 10:036
    So far, the clear winner is through Employee-Interest conferences;
    Of course the informal channels have given me pointers to more 
    formal channels, so the lines are getting blurred.
    
    Of course without the informal channels, I never would have found
    the formal channels...
1041.33Who can answer Andrew's question?REGENT::BROOMHEADDon't panic -- yet.Mon Aug 09 1993 13:4914
    Suz,
    
    Nope, it's ISO Latin-Cyrillic, with no number in sight.
    
    Andrew,
    
    "How about Cyrillic support at the user-interface level?"
    
    I can't answer that.  All I can tell you is I have the Cyrillic fonts
    from Monotype that Suzan mentioned, but I don't know who is to pay
    to make them into cartridges or soft fonts, or even which fonts (type-
    faces) I should concentrate on.
    
    						Ann B.
1041.34ISTWI1::KINACIWalk thru this worldMon Aug 09 1993 15:2420
    Hi Ann!
    
    Nice to run into you here.
    
    RE the fonts.  You probably know that Israel is going to be running a
    Fonts Q.A. Project in early September, where we will all get to test
    our own fonts.  I suspect that will be when we will get a broader picture 
    of what is out there.
    
    As for who pays... well.. I am told by very reliable sources that
    corporate will pay for the internationalization of products deemed
    necessary by the involved subsidiaries, starting in FY '94.  We've 
    submitted a prioritized list of what we need, and as far as I know 
    the funding discussions should be well under way at this time.  Past 
    experience indicates that it will be the beginning of Calendar year 
    1994 before we see much of anything.
    
    I hear all this will change come FY'95.. Keep your fingers crossed! 
    
    Suz
1041.354GL::LASHERWorking...Tue Aug 10 1993 09:504
    While y'all are looking into this, could you also check to see whether
    DECwindows supports Orthodox icons?
    
Lew Lasher
1041.36Spanish Alphabetical Order SimplifiedREGENT::BROOMHEADDon't panic -- yet.Mon May 02 1994 14:2645
         <<< NOTED::DISK$NOTES7:[NOTES$LIBRARY_7OF4]WORLDWIDE.NOTE;2 >>>
                 -< Worldwide -- International Product Issues >-
================================================================================
Note 525.0              Change in Spanish collating rules             No replies
R2ME2::HINXMAN "It's waiting for it that's so tryin" 39 lines   2-MAY-1994 07:58
--------------------------------------------------------------------------------
Days in dictionary numbered for two in Spanish alphabet
=======================================================
Associated Press (Boston Globe 1994-05-01)

        MADRID - The world's more than 300 million Spanish speakers now have
two fewer letters in their alphabet to worry about, a mostly bookkeeping move
that won almost unanimous support but disturbed some traditionalists.
        The Association of Spanish Language Academies, meeting in Madrid for
its 10th annual congress, voted last week to eliminate the "Ch" and "Ll" from
the Spanish alphabet.
        The two letters, which historically have had their own separate 
headings in dictionaries, now will be listed under other letters. Words 
beginning with "Ch", like "chico", will fall under the letter "C", and words
beginning with "Ll", like "llama", will fall under the letter "L".
        The move does not change pronunciation, usage or spelling. It was 
made mainly to simplify dictionaries and make Spanish more computer-
compatible with English.
        Pushing for the change was Spain, a member of the 12-nation European
Union. The EU has urged its members to implement measures that aid 
translation and computer standardization.
        Cuban delegate Luisa Campuzano said she favored the change "because it
means that dictionaries will be easier to use. But arguments related to the
European Union shouldn't be brought up. Our talks are along scientific lines
and nothing more."
        The vote Wednesday was 17 in favor, one opposed and three abstaining.
Ecuador voted "no" and Panama, Nicaragua and Ecuador abstained.
        "It's not that the letters are disappearing, they're just being put
in a different place in the dictionary," said a Madrid artist, Maria Gato.
"I don't think most people are upset."
        Guatemala supported the change, but one Guatemalan delegate, Mario
Alberto Carrera, referred to the simplification as "killing" part of the
language.
        "The two letters have succumbed to the dictates of the market and the 
Anglo-Saxon world," Carrera said.
        Some dictionaries, including the highly respected Maria Moliner, had
already made the change.
        The Spanish alphabet now has 27 letters - the 26 contained in the
English alphabet plus a stylized "n".
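The effect of the change on sorted output can be sketched as follows. This is an illustration only, assuming unaccented lower-case words; BASE, tokens and make_key are hypothetical names, not any vendor's collation tables:

```python
BASE = "abcdefghijklmnñopqrstuvwxyz"  # the 27 single letters


def tokens(word, digraphs):
    """Split a word into collating elements, keeping listed digraphs whole."""
    out, i = [], 0
    w = word.lower()
    while i < len(w):
        pair = w[i:i + 2]
        if pair in digraphs:
            out.append(pair)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out


def make_key(digraphs):
    """Build a sort key in which each digraph ranks just after its first letter."""
    order = []
    for letter in BASE:
        order.append(letter)
        order.extend(d for d in digraphs if d[0] == letter)
    rank = {element: index for index, element in enumerate(order)}
    return lambda word: [rank[e] for e in tokens(word, digraphs)]


WORDS = ["cuna", "chico", "dama", "llama", "luz", "cosa"]
traditional = sorted(WORDS, key=make_key(("ch", "ll")))
revised = sorted(WORDS, key=make_key(()))
print(traditional)  # ['cosa', 'cuna', 'chico', 'dama', 'luz', 'llama']
print(revised)      # ['chico', 'cosa', 'cuna', 'dama', 'llama', 'luz']
```

Under the old rule "chico" files after every other word in "c"; under the new rule it files by its second letter like any other word.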

1041.37NOVA::FISHERTay-unned, rey-usted, rey-adyThu May 05 1994 10:469
    aye, the contrariness of it all....
    
    One of the fun parts of "internationalizing Rdb" was to assure that
    "c*" did not MATCH "chxyz" when Spanish was the collating sequence
    in use.
    
    Drat!
    
    ed
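The behaviour described for "c*" can be sketched by comparing collating elements rather than characters. This is a guess at the idea, not Rdb's actual implementation; DIGRAPHS, elements and prefix_match are hypothetical names:

```python
DIGRAPHS = ("ch", "ll")  # assumed traditional Spanish multicharacter elements


def elements(text):
    """Split text into collating elements, keeping the digraphs whole."""
    out, i = [], 0
    t = text.lower()
    while i < len(t):
        if t[i:i + 2] in DIGRAPHS:
            out.append(t[i:i + 2])
            i += 2
        else:
            out.append(t[i])
            i += 1
    return out


def prefix_match(prefix, text):
    """True if text begins with prefix, element by element (like prefix + '*')."""
    p, t = elements(prefix), elements(text)
    return t[:len(p)] == p


print(prefix_match("c", "chxyz"))   # False: the first element is "ch", not "c"
print(prefix_match("ch", "chxyz"))  # True
print("chxyz".startswith("c"))      # True: plain character comparison
```

With the 1994 rule the digraph list would simply be empty, and the element-wise match collapses to the plain character comparison.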
1041.38JIT081::DIAMOND$ SET MIDNIGHTMon May 16 1994 05:4710
    Re .36
    
    >    "The two letters have succumbed to the dictates of the market and the
    >Anglo-Saxon world," Carrera said.
    
    Cute opinion.  Has the Library of Congress changed their lexicography
    to consider Mc as Mc instead of as Mac?  If they did or will, they're
    succumbing to the dictates of the market and the Spanish world.
    
    -- Norman Diamond
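The "Mc filed as Mac" convention mentioned above is another sort-key transformation. A minimal sketch, assuming surnames only and with filing_key as a hypothetical name (whether any particular library applies this rule is not verified here):

```python
def filing_key(surname):
    """Lower-case the surname and expand a leading 'Mc' to 'Mac' for filing."""
    s = surname.lower()
    if s.startswith("mc"):
        s = "mac" + s[2:]
    return s


names = ["McDonald", "Mabry", "MacArthur", "Madison", "McAdam"]
print(sorted(names, key=filing_key))
# ['Mabry', 'McAdam', 'MacArthur', 'McDonald', 'Madison']
print(sorted(names))
# ['Mabry', 'MacArthur', 'Madison', 'McAdam', 'McDonald']
```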