(This tutorial is also available in French)

TreeTagger installation tutorial for TXM

Introduction

To lemmatize your corpora automatically when you import them into TXM, follow the steps of this tutorial. They will allow you to:

  1. Download the TreeTagger software and some language-specific model files
  2. Tell the TXM platform where TreeTagger and its model files are installed on your machine

Tutorial

    A. Using your Internet browser

    While connected to the Internet:

  1. Download the TreeTagger software archive from the TreeTagger web site:

  2. Extract its contents (bin, cmd, doc, FILES, LICENSE and README) into a directory named "treetagger" in your applications directory. Depending on your system, this is:

    - Windows: C:\Program Files\treetagger
    - Mac OS X: /Applications/treetagger
    - Linux: /usr/lib/treetagger

    Check: After extraction, the treetagger directory must contain the following files and directories: bin, cmd, doc, FILES and README.

    Note: This way of installing TreeTagger is specific to TXM. You only need to extract the contents of the TreeTagger archive. You do not need to follow any additional instructions from an INSTALL.txt file that may be present in the archive.

  3. Create a "models" subdirectory in the "treetagger" directory you have just created. It will contain all the language-specific model files.
  4. Download a TreeTagger language model file (compressed file: '*.gz') for each language in which you may need to tag a text:

  5. Extract the downloaded model(s) archive(s) into the "models" directory.

    Under Windows, if you don't know how to extract '*.gz' files, we recommend using the open-source 7-zip software.

  6. Rename each model file according to the 2-letter ISO 639-1 language code standard.
    For instance:

    With Windows and Mac OS X: by default, these systems hide the file extensions they think they can handle. This can mislead you when you rename a file (the displayed name is "fr.par" but the real file name is "fr.par.bin").
    In that case, you need to display and check the real file names in your Explorer/Finder:

    • Under Windows:
      1. Follow the official tutorial: Show or hide file name extensions
      2. You can now choose the appropriate file name.
    • Under Mac OS X:
      1. Right-click on the file icon (Ctrl-click, or two-finger tap on the trackpad)
      2. Select the 'Get Info' menu entry
      3. Edit the 'Name and Extension' field: delete the '.bin' extension.
      4. Close the "Info" window.

    Check: the 'models' directory must contain model files such as 'fr.par' (about 18 MB) or 'en.par' (about 14.4 MB).
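
The two checks of part A, together with the extraction and renaming of a model, can also be scripted. The following Python sketch is only an illustration and is not part of TXM or TreeTagger: the function names (default_install_dir, install_model, check_layout) are hypothetical, and it assumes Python 3.9 or later and the directory layout described in this tutorial.

```python
import gzip
import platform
import re
import shutil
from pathlib import Path

# Entries this tutorial expects in the "treetagger" directory after extraction.
EXPECTED_ENTRIES = ("bin", "cmd", "doc", "FILES", "README")
# A model file named after a two-letter ISO 639-1 code, such as "fr.par".
MODEL_NAME = re.compile(r"[a-z]{2}\.par")


def default_install_dir() -> Path:
    """Return the per-system directory suggested in step 2 of this tutorial."""
    system = platform.system()
    if system == "Windows":
        return Path(r"C:\Program Files\treetagger")
    if system == "Darwin":  # Mac OS X
        return Path("/Applications/treetagger")
    return Path("/usr/lib/treetagger")


def install_model(archive: Path, models_dir: Path, lang: str) -> Path:
    """Decompress a downloaded '*.gz' model into models_dir as '<lang>.par'."""
    if not re.fullmatch(r"[a-z]{2}", lang):
        raise ValueError(f"not a two-letter language code: {lang!r}")
    models_dir.mkdir(parents=True, exist_ok=True)
    target = models_dir / f"{lang}.par"
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def check_layout(treetagger_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the layout looks right."""
    problems = [
        f"missing entry: {name}"
        for name in EXPECTED_ENTRIES
        if not (treetagger_dir / name).exists()
    ]
    models_dir = treetagger_dir / "models"
    if not models_dir.is_dir():
        problems.append("missing 'models' subdirectory")
        return problems
    names = sorted(p.name for p in models_dir.iterdir() if p.is_file())
    if not any(MODEL_NAME.fullmatch(n) for n in names):
        problems.append("no model file named like 'fr.par' in 'models'")
    # Names such as "fr.par.bin" are reported so that they can be renamed.
    problems += [
        f"unexpected model file name: {n}"
        for n in names
        if not MODEL_NAME.fullmatch(n)
    ]
    return problems
```

For example, print(check_layout(default_install_dir())) lists whatever is still missing or misnamed. Because the script reads the real file names, a hidden '.bin' extension shows up in its report even when the Explorer or Finder does not display it.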

    B. In TXM

  7. Go to the TreeTagger preferences page (see Figure 1):
    1. Select the 'Tools/Parameters' entry of the main menu
    2. Go to the 'TXM / Advanced / NLP / TreeTagger' page
    3. In the 'TreeTagger install dir' field, enter the path of the 'treetagger' directory
    4. In the 'TreeTagger models directory' field, enter the path of the 'models' directory
    5. Click the 'OK' button to save the preferences

    Figure 1: TreeTagger preferences in TXM

  8. Check the installation:

    1. Copy the following text:

    Running SearchEngine in memory mode.
    Statistical Engine launched.connected.
    Reloading subcorpora and partitions...Done.
    No update available.
    

    2. In TXM launch the File > Import > Clipboard command

    3. Check in the console that the last lines are:

    pAttrs : [id, lbid, enpos, enlemma]
    sAttrs : [text:+id+path+base+project, s:+n, p:+id, txmcorpus:+lang]
    -- EDITION - Building edition
    .
    Import done:3sec (3265 ms)
    Running SearchEngine in memory mode.
    Statistical Engine launched.connected.
    Reloading subcorpora and partitions...Done.
    TXM is ready.
    
    (Note that the first line above must contain enpos and enlemma. The time reported after "Import done" may of course differ.)
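
If you save the console output to a text file, the same check can be done automatically. The short Python sketch below is an illustration only: the function name has_tagger_attributes is hypothetical, and it assumes the 'pAttrs' line has exactly the form shown above.

```python
import re


def has_tagger_attributes(console_text: str,
                          required=("enpos", "enlemma")) -> bool:
    """Return True if the 'pAttrs' line lists every required property."""
    match = re.search(r"^\s*pAttrs\s*:\s*\[(.*?)\]", console_text,
                      flags=re.MULTILINE)
    if match is None:
        return False  # no 'pAttrs' line at all: the import did not get that far
    attrs = [a.strip() for a in match.group(1).split(",")]
    return all(name in attrs for name in required)
```

A False result means that TreeTagger did not add its properties during the import, in which case the two paths set in the preferences page are the first thing to check.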

In case of difficulty you can find further help in the 'FAQ' (fr).

If you cannot complete the installation, please send your questions to the TXM users mailing list (txm-users@cru.fr) after subscribing at https://listes.cru.fr/sympa/subscribe/txm-users, or contact the TXM team directly.




Note: (*) The TreeTagger licence prohibits distributing TreeTagger embedded in commercial software. As the TXM licence does not prevent anyone from doing business with TXM, we cannot include TreeTagger in the TXM distribution. See the TreeTagger web site.