(This tutorial is also available in French.)
To lemmatize your corpora automatically during the import process into TXM, follow the steps of this tutorial.
While connected to the Internet:
Download the TreeTagger software archive from the TreeTagger web site:
Extract the content (bin, cmd, doc, FILES, LICENSE and README) to a directory named "treetagger" located in your applications directory, depending on your system, in:
- Windows: C:\Program Files\treetagger
- Mac OS X: /Applications/treetagger
Check: after extraction, the treetagger directory must contain the following files and directories: bin, cmd, doc, FILES and README.
Note: This way of installing TreeTagger is specific to TXM. You only need to extract the contents of the TreeTagger archive; you do not need to follow any additional instructions from an INSTALL.txt file that may be in the archive.
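The check on the contents of the treetagger directory can also be scripted. The following is a minimal sketch, not part of TXM or TreeTagger: the function name `missing_entries` is hypothetical, and the list of expected entries is taken from the check step of this tutorial.

```python
from pathlib import Path

# Entries that the check step of this tutorial expects in the treetagger directory.
EXPECTED_ENTRIES = ("bin", "cmd", "doc", "FILES", "README")


def missing_entries(treetagger_dir):
    """Return the expected entries that are absent from treetagger_dir.

    Hypothetical helper: an empty list means the extraction looks complete.
    """
    root = Path(treetagger_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

For example, `missing_entries("/Applications/treetagger")` should return an empty list on Mac OS X once the archive has been extracted as described.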
Download a TreeTagger language model file (compressed file: '*.gz') for each language in which you may need to tag a text:
Extract each downloaded model archive into the "models" directory.
Under Windows, if you don't know how to extract '*.gz' files, we recommend using the open-source 7-Zip software.
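If Python is installed, a '*.gz' file can also be decompressed with the standard `gzip` module. This is only a sketch of an alternative to a graphical tool: the function name `extract_gz` and the example file names are assumptions, not something TXM or TreeTagger provides.

```python
import gzip
import shutil
from pathlib import Path


def extract_gz(archive_path, models_dir):
    """Decompress a '*.gz' archive into models_dir and return the new file path.

    Hypothetical helper: 'french.par.gz' becomes 'french.par' in models_dir.
    """
    archive = Path(archive_path)
    if archive.suffix != ".gz":
        raise ValueError(f"not a .gz archive: {archive.name}")
    target_dir = Path(models_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the final '.gz' suffix.
    target = target_dir / archive.stem
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The function creates the target directory if it does not exist yet and leaves the downloaded archive untouched.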
Rename each model file after the two-letter ISO 639-1 code of its language (for example 'fr.par' for French or 'en.par' for English).
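The renaming step can be sketched as follows. The mapping below covers only a few languages for illustration, and the helper names `txm_model_name` and `rename_model` are hypothetical; the name of the extracted file depends on what you downloaded, so it is passed in explicitly rather than guessed.

```python
from pathlib import Path

# Illustrative subset of ISO 639-1 codes; extend it for the languages you use.
ISO_639_1 = {
    "french": "fr",
    "english": "en",
    "german": "de",
    "spanish": "es",
    "italian": "it",
}


def txm_model_name(language):
    """Return the model file name for a language, e.g. 'fr.par' for French."""
    return ISO_639_1[language.lower()] + ".par"


def rename_model(model_path, language):
    """Rename an extracted model file in place and return its new path."""
    source = Path(model_path)
    target = source.with_name(txm_model_name(language))
    source.rename(target)
    return target
```

Because the script works on the real file name, it is not affected by hidden file extensions in the file manager.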
On Windows and Mac OS X:
By default, these systems hide the file extensions they think they can handle. This can be misleading when you rename a file: the name displayed is "fr.par" but the real file name is "fr.par.bin".
In that case, you need to display and check the real file names in your Explorer/Finder.
Check: the 'models' directory must contain model files such as 'fr.par' (about 18 MB) or 'en.par' (about 14.4 MB).
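A short script can show the real file names and sizes regardless of what the file manager displays. This is a sketch under one assumption: a correctly named model matches the pattern of two lowercase letters followed by '.par', as in the 'fr.par' and 'en.par' examples. The function names are hypothetical.

```python
import re
from pathlib import Path


def list_models(models_dir):
    """Return (real file name, size in MB) pairs for the files in models_dir."""
    entries = []
    for path in sorted(Path(models_dir).iterdir()):
        if path.is_file():
            entries.append((path.name, path.stat().st_size / 1_000_000))
    return entries


def misnamed_models(models_dir):
    """Return file names that do not look like 'xx.par' (assumed naming pattern)."""
    pattern = re.compile(r"[a-z]{2}\.par")
    return [name for name, _ in list_models(models_dir)
            if not pattern.fullmatch(name)]
```

A file still called 'fr.par.bin' would appear in the result of `misnamed_models`, even if the file manager shows it as 'fr.par'.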
1. Copy the following text:
Running SearchEngine in memory mode. Statistical Engine launched.connected. Reloading subcorpora and partitions...Done. No update available.
2. In TXM, run the File > Import > Clipboard command.
3. Check in the console that the last lines are:
pAttrs : [id, lbid, enpos, enlemma]
sAttrs : [text:+id+path+base+project, s:+n, p:+id, txmcorpus:+lang]
-- EDITION - Building edition
.
Import done:3sec (3265 ms)
Running SearchEngine in memory mode.
Statistical Engine launched.connected.
Reloading subcorpora and partitions...Done.
TXM is ready.
(Note that the first line above should contain enpos and enlemma. The time indicated after "Import done" can of course be different.)
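If you save the console text to a file or a string, the check in step 3 can be automated. The sketch below assumes the 'pAttrs : [...]' line format shown above and that the tagger properties are named with the language code followed by 'pos' and 'lemma', as in enpos and enlemma; the function names are hypothetical.

```python
import re


def positional_attributes(console_text):
    """Return the names listed in the 'pAttrs : [...]' part of the console text."""
    match = re.search(r"pAttrs\s*:\s*\[([^\]]*)\]", console_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def is_tagged(console_text, lang="en"):
    """Tell whether both '<lang>pos' and '<lang>lemma' were produced by the import."""
    attrs = positional_attributes(console_text)
    return lang + "pos" in attrs and lang + "lemma" in attrs
```

If `is_tagged` returns False for the console text of your test import, TreeTagger or the language model was probably not found by TXM.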
In case of difficulty you can find further help in the 'FAQ' (fr).
If you can't manage the installation process, please send your enquiries to the TXM users mailing list (firstname.lastname@example.org) after subscription at https://listes.cru.fr/sympa/subscribe/txm-users, or contact the TXM team directly.