
Conference vaxuum::document_ft

Title:DOCUMENT T1.0
Notice:**New notesfile (DOCUMENT.NOTE) now available (see note 897)**
Moderator:CLOSET::ADLER
Created:Mon Feb 09 1987
Last Modified:Thu Oct 31 1991
Last Successful Update:Fri Jun 06 1997
Number of topics:897
Total number of notes:4397

372.0. "How to extract formatting tags for translation" by MUNSBE::RUEDEL () Wed May 13 1987 16:36

    My question is relevant for on-line translation:
    
    Does anyone know a procedure to extract all tags from a DOCUMENT
    source file and put them into another file?
    
    The 'empty' format file consisting of tags only will then be used
    as a starting point for a translator to type in the translation.
    The idea is to preserve the format of a text while translating it.
    
    The translator will use two windows on his screen:
    
    - one to READ the source text (e.g. in English)
    - the other with the extracted tags to type in his translation  
      in the target language (e.g. French)
372.1. "" by AUTHOR::WELLCOME (Steve) Fri May 15 1987 13:38 (1 line)

    TECO could probably do it; do you know any TECO wizards?
372.2. "" by MUNSBE::RUEDEL () Fri May 15 1987 17:31 (2 lines)

    No, but I would have thought that there is a nice TPU procedure
    around somewhere. So, if someone knows of something like that...
372.3. "Possibility" by BUNSUP::LITTLE (Todd Little NJCD SWS 323-4475) Fri May 15 1987 18:37 (6 lines)

In the note on DECspell dictionaries in this conference I posted a trivial
SCAN program that will locate all tags in a file and extract just the
tags and place them in another file.  It loses all the arguments, etc. so
I'm not sure how helpful it would be for your translation needs.

-tl
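
For readers without the SCAN program to hand, a tags-only extraction of the
kind described above might look like the following Python sketch. It is not
the SCAN program from the DECspell note. It assumes a tag is simply a name in
angle brackets, and the names TAG_RE and extract_tags are made up for this
illustration. Like the SCAN version, it throws away all arguments.

```python
import re

# Assumption: a tag is an identifier in angle brackets, e.g. <head1>.
# Anything after the tag, including parenthesized arguments, is dropped.
TAG_RE = re.compile(r"<[A-Za-z][A-Za-z0-9_]*>")

def extract_tags(text):
    """Return one string per input line, holding only that line's tags."""
    return [" ".join(TAG_RE.findall(line)) for line in text.splitlines()]

# Hypothetical sample input, invented for this sketch.
sample = "<head1>(Getting Started\\start)\n<p>Press <emphasis>(Return\\bold) now.\n"
skeleton = extract_tags(sample)
print("\n".join(skeleton))
```

Keeping one output line per input line at least lets the translator line the
two windows up against each other.
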
372.4. "translation tools" by VAXUUM::KOHLBRENNER () Mon May 18 1987 13:24 (28 lines)

    Extracting tags from a file is a non-trivial task.
    
    Tags have arguments and the arguments can contain other tags.
    Sometimes the arguments contain text that would need to be
    translated, other times the arguments contain keywords that
    probably should not be translated.
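
    To make the nesting concrete, here is a rough Python sketch of a
    recursive scan that keeps arguments attached to their tags. It is not
    the DOCUMENT parser. It assumes arguments follow the tag name in
    parentheses and are separated by backslashes, and the names TAG_NAME
    and parse are made up for this illustration. It does not attempt to
    decide which arguments are translatable text and which are keywords.

```python
import re

# Assumption: tag names look like <name>; arguments follow in parentheses
# and are separated by backslashes, e.g. <emphasis>(word\bold).
TAG_NAME = re.compile(r"<([A-Za-z][A-Za-z0-9_]*)>")

def parse(text, pos=0, in_args=False):
    """Return (nodes, new_pos). Each node is a str or a (name, args) tuple,
    where args is a list of node lists, one per argument."""
    nodes, buf, depth = [], [], 0
    while pos < len(text):
        m = TAG_NAME.match(text, pos)
        if m:
            if buf:
                nodes.append("".join(buf))
                buf = []
            pos, args = m.end(), []
            if pos < len(text) and text[pos] == "(":
                pos += 1
                while True:
                    arg, pos = parse(text, pos, in_args=True)
                    args.append(arg)
                    if pos < len(text) and text[pos] == "\\":
                        pos += 1      # another argument follows
                        continue
                    break
                if pos < len(text) and text[pos] == ")":
                    pos += 1          # consume the closing parenthesis
            nodes.append((m.group(1), args))
            continue
        ch = text[pos]
        if in_args:
            if ch == "(":
                depth += 1            # plain parenthesis inside an argument
            elif ch == ")":
                if depth == 0:
                    break             # end of the enclosing argument list
                depth -= 1
            elif ch == "\\" and depth == 0:
                break                 # end of this argument
        buf.append(ch)
        pos += 1
    if buf:
        nodes.append("".join(buf))
    return nodes, pos

# Hypothetical input: a heading whose first argument contains another tag.
tree, _ = parse("<head1>(Using <emphasis>(TPU\\bold)\\tpu_sec)")
```

    In the resulting tree the <emphasis> tag sits inside the first argument
    of <head1>, which is exactly the case a flat tag search loses.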
    
    The <comment> tag and the <literal> tags come in two formats,
    and they may or may not contain other tags in the text that
    they contain.  Is this text, with or without its tags, intended
    for translation?
    
    Separating the tags from the text seems to me only half the problem,
    if there is an intention to later merge the tags back into the 
    translated text.  How will the merge be accomplished?  What are
    the "markers" that tell how to put the two pieces back together?
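
    One possible answer to the "markers" question, offered only as a
    sketch: number each run of text, leave a numbered placeholder in the
    tag skeleton, and substitute the translated runs back in by number.
    The [[n]] marker format and the function names below are invented for
    this illustration. The sketch treats tag arguments as ordinary text
    and would be confused by source text that already contains [[n]].

```python
import re

# Assumption: a tag is an identifier in angle brackets, e.g. <p>.
TAG_RE = re.compile(r"(<[A-Za-z][A-Za-z0-9_]*>)")
MARKER_RE = re.compile(r"\[\[(\d+)\]\]")

def split_for_translation(text):
    """Return (skeleton, segments): tags plus [[n]] markers, and a dict
    mapping each n to the text run it stands for."""
    skeleton, segments = [], {}
    for piece in TAG_RE.split(text):
        if TAG_RE.fullmatch(piece) or not piece.strip():
            skeleton.append(piece)          # tag or whitespace: keep as is
        else:
            key = len(segments) + 1
            segments[key] = piece
            skeleton.append("[[%d]]" % key)
    return "".join(skeleton), segments

def merge(skeleton, translated):
    """Put translated text runs back where their markers are."""
    return MARKER_RE.sub(lambda m: translated[int(m.group(1))], skeleton)

# Hypothetical source text and translation, invented for this sketch.
source = "<p>Press Return.\n<p>Wait."
skeleton, segments = split_for_translation(source)
french = {1: "Appuyez sur Return.\n", 2: "Attendez."}
print(merge(skeleton, french))
```

    Merging the untranslated segments back in reproduces the source
    exactly, which gives a cheap check that nothing was lost in the split.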
    
    Won't some tags disappear as part of the translation?  The writer
    of English may want to add emphasis to a word or phrase, using the
    <emphasis> tag.  The translator may find it easy to convey the 
    emphasis in the target language without bolding or italics.
    
    Computerized aids for doing the translation sound like a worthwhile
    project, but it seems that they will require more than a simple TPU
    procedure to be very useful...
    
    bill
    
372.5. "separation of syntax and semantics (form and functions?)" by ATLAST::BOUKNIGHT (Everything has an outline) Mon May 18 1987 14:26 (8 lines)

    What it should really have access to is the exact same parser used
    in GUTENTAG.  With the work to be done in adopting SGML, maybe the
    DOCUMENT folks could consider breaking the front end up into separate
    syntax and semantics handling code, making the syntax handling code
    available to other users such as translation aids, a "pretty" formatting
    program for SGML, etc.
    
    jack