This is not currently part of the peer-reviewed material of the project. Do not cite as a research publication.
These pages are designed to assist in the lemmatising of Menota texts, including linking lemmas to ONP's wordlist.
The features and processes outlined here are only available to logged-in users with appropriate permissions.
The process involves importing the Menotic TEI/XML file and then using the database interface to assist in the lemmatising process.
Stage 1: importing XML file and initial modifications
This stage involves selecting a manuscript and opening a form which manages the automated processes. The user uploads a Menotic TEI file and specifies the text to which it is linked.
This ensures that there is a link between the text and the manuscript, and that all <w> elements in the XML file can be uniquely identified when the database processes them.
No further changes to the XML file are made, but lemmatising information is added to the exported XML at the point of use in the final stage.
Stage 2: importing the words into the database
This stage is fully automated.
Stage 3: auto-lemmatising
This stage is fully automated.
Initial trials suggest this process matches around 80% of words, with a very high level of accuracy for these matches.
Stage 4: manual lemmatising
Manual lemmatising is done through a separate form which is linked from the processing form.
The lemmatising interface from the Skaldic Project has been adapted for this process. It is an assisted lemmatiser which remembers the wordforms linked to a particular headword and prompts the user with options based on the previous matches.
The form lists up to 100 words in the text, in order. For longer texts, the already-lemmatised words are not shown.
The first column contains information about the word: the word number, the lookup form (the lemma if it is in the XML file, otherwise the normalised form), and the facsimile and normalised forms. Clicking on the box opens a popup with information about the word, including the surrounding text at the three different levels and any available morphosyntactic analysis.
The second column gives a list of headwords previously linked to either this lemma form or word form, in order of frequency (based on ONP’s citation count). There are sometimes some odd results in this list because of previous errors, but these can be edited in the third column.
The third column provides a lookup facility for finding headwords that are not found by the semi-automated process. As the user types, headwords starting with the typed letters (after some normalisation) are shown in a drop-down list. Some headwords from the recent ONP list may not yet have been imported into the database word list; these appear with an arrow and link to a form, opened in a new window, where the word can be imported. After importing, the user modifies the search term to refresh the list, and the word then appears in the top part of the list for linking.
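The prefix lookup described above can be illustrated with a minimal Python sketch. The actual normalisation rules used by the interface are not documented here, so the accent-stripping in normalise() is only an assumption, and the function names and the sample headword list are hypothetical.

```python
import unicodedata


def normalise(text):
    """Lower-case the text and strip combining accents (assumed rule)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lookup(prefix, headwords, limit=10):
    """Return up to `limit` headwords whose normalised form starts with the prefix."""
    key = normalise(prefix)
    return [h for h in headwords if normalise(h).startswith(key)][:limit]


# Hypothetical sample of headwords, for illustration only.
sample = ["faðir", "fá", "fara", "gefa"]
matches = lookup("fa", sample)
```

With this assumed normalisation, typing "fa" also finds "fá", because the accent is ignored on both sides of the comparison.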
The third column also provides popups for further information and for editing the lemma: deleting the link, reverting to the previous link, viewing information about other words linked to the lemma, editing the lemma, listing other instances of the same wordform linked to the lemma, and looking up the lemma in imported dictionaries. The edit lemma button can be used to create new lemmas, but this should be done with caution: apart from some rarer proper nouns, all Old Norse prose words should already be available in the ONP wordlist.
When the update button is pressed, the database saves the lemma links with the words and updates the index of wordform-lemma links with the (potentially) new word forms. New words are then presented in the form for further lemmatising.
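The way the index of wordform-lemma links feeds the suggestions in the second column can be sketched roughly as follows. This is an illustration in Python, not the database implementation: the class name, its methods, the headword labels and the citation counts are all hypothetical.

```python
from collections import defaultdict


class LemmaIndex:
    """Remembers which headwords each lookup form has been linked to."""

    def __init__(self, citation_counts):
        # headword -> citation count (invented figures standing in for ONP's)
        self.citation_counts = citation_counts
        self.links = defaultdict(set)

    def update(self, form, headword):
        """Record a saved link so that it can be suggested next time."""
        self.links[form].add(headword)

    def suggest(self, form):
        """Previously linked headwords, most frequently cited first."""
        candidates = self.links.get(form, set())
        return sorted(
            candidates,
            key=lambda h: (-self.citation_counts.get(h, 0), h),
        )


index = LemmaIndex({"á prep.": 100, "á f.": 10})
index.update("á", "á f.")
index.update("á", "á prep.")
suggestions = index.suggest("á")
```

An incorrect link saved earlier would keep reappearing among the suggestions in a scheme like this, which is consistent with the odd results mentioned above until the link is corrected.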
Stage 5: exporting the TEI/XML with links
In the final stage, the processing form processes the XML file using regular expression parsing. For each matching xml:id value, it inserts a me:ref attribute with URIs representing the word in the database and the linked lemma. If there is no lemma attribute, the ONP lemma is inserted as this attribute, and if there is no me:msa attribute, relevant information from the wordlist is inserted, namely, word class and, in the case of nouns, gender.
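The attribute insertion can be approximated with the following Python sketch. It is not the project's code: the function name, the example URIs, the format of the me:ref value (two space-separated URIs) and the sample me:msa value are assumptions made for illustration.

```python
import re
from xml.sax.saxutils import quoteattr

# Opening <w> tag: optional attributes, optional self-closing slash.
W_OPEN = re.compile(r"<w(\s[^>]*?)?(\s*/?)>")
XML_ID = re.compile(r'(?<![\w:-])xml:id="([^"]+)"')


def add_links(xml, links):
    """Insert me:ref, and lemma / me:msa where absent, into linked <w> tags.

    links maps xml:id -> (word URI, lemma URI, ONP lemma, msa string).
    """
    def enrich(match):
        attrs = match.group(1) or ""
        found = XML_ID.search(attrs)
        if not found or found.group(1) not in links:
            return match.group(0)  # leave unlinked words untouched
        word_uri, lemma_uri, lemma, msa = links[found.group(1)]
        extra = " me:ref=" + quoteattr(word_uri + " " + lemma_uri)
        if not re.search(r'(?<![\w:-])lemma="', attrs):
            extra += " lemma=" + quoteattr(lemma)
        if not re.search(r'(?<![\w:-])me:msa="', attrs):
            extra += " me:msa=" + quoteattr(msa)
        return "<w" + attrs + extra + match.group(2) + ">"

    return W_OPEN.sub(enrich, xml)


# Hypothetical link data and input, for illustration only.
links = {
    "w1": ("https://example.org/word/1", "https://example.org/lemma/42",
           "maðr", "xNC gM"),
}
source = '<w xml:id="w1" lemma="maðr">maðr</w><w xml:id="w2">ok</w>'
result = add_links(source, links)
```

In this sketch an existing lemma or me:msa attribute is never overwritten, matching the behaviour described above, and words without a link in the database pass through unchanged.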
The resulting XML file, which should be valid Menotic XML, is presented in a text box and can be copied and pasted into a plain text file.
The resulting XML file is not saved to the database, as the additional information derives from the database. Rather, the XML is processed at the point of the user’s request and presented in the web page.
Other features
The imported texts are shown on the main page and can be viewed as the whole text at three levels with most formatting removed. A concordance of the linked lemmas is shown with all word forms in the text. Clicking on one of the words shows all the Menota words linked to the headword.
Using the unofficial ONP interface, you can find headwords and see which Menota words (as well as words from other corpora) are linked to them.
To be developed