The project Ob-Ugric Database: analysed text corpora and dictionaries for less described Ob-Ugric dialects is the successor project of OUL. It was launched on July 1st 2014, runs until June 30th 2017 and is funded by the DFG (German Research Foundation) and the FWF (Austrian Science Fund).
The objective of this project is to continue and expand the work of OUL by systematizing, digitizing, analysing and providing online data on two more Ob-Ugric dialects: Western Mansi (in the Pelym, Northern Vagilsk, Middle Lozva and Lower Lozva varieties) and Yugan Khanty. While the Western Mansi dialect group is already extinct and only text materials from the 19th century are available, Yugan Khanty is endangered and has not yet been described as a separate subdialect of (Eastern) Surgut Khanty. In addition to recent fieldwork materials, the Yugan Khanty team also works with texts from Heikki Paasonen’s collection from the beginning of the 20th century.
Within the three project years, two new database modules for the two less described dialects will be created and implemented in the already established Virtual Research Environment. In line with the OUL parameters, description and analysis will include (a) a phonological analysis in IPA form (instead of traditional idiosyncratic transcription systems) and (b) a definition of morphological categories and their allomorphy in the given dialect. The results will be (c) grammatical descriptions, (d) grammatically analysed texts and (e) translations of these texts into widely used metalanguages (English, partly German); some of the texts are also provided with (f) an annotation of functional, semantic and pragmatic roles.
The fieldwork on the Yugan rivers added visual material to our fieldwork archive.
The systematization of phonological and morphological categories of the dialects in question will extend and adapt the existing description and result in a more exact account of the grammatical categories of each Ob-Ugric dialect; the system of glossing principles and symbols (abbreviations) will be revised and extended. The number of dialectal dictionaries/concordances based on the text corpus will also be expanded, taking into consideration all available lexicographic sources and interviews with informants.
The user interface of the existing database was provided with many more options for filtering the lexicon data and for starting concordance queries in the glossed text data. The value of the database has been increased considerably by the possibility of filtering for parts of speech, morpheme types, complex form types, allomorphs, dialectal variants and writing variants. Such a filtering system is the basis for statistical queries on the corpus, a precondition for corpus linguistics on Ob-Ugric languages.
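As an illustration of how such filtering can feed simple corpus statistics, here is a minimal sketch in Python. It does not show the actual database or its interface: the `Entry` record, its field names, the functions `filter_entries` and `count_by`, and all sample values are hypothetical, and the forms are placeholders rather than real language data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical lexicon record; the field names are assumptions."""
    form: str           # placeholder string, not real language data
    pos: str            # part of speech
    morpheme_type: str  # e.g. "stem" or "suffix"
    dialect: str


# Invented sample entries, for illustration only.
LEXICON = [
    Entry("stem-a", "N", "stem", "Yugan Khanty"),
    Entry("stem-b", "V", "stem", "Yugan Khanty"),
    Entry("suffix-a", "N", "suffix", "Pelym Mansi"),
    Entry("stem-c", "N", "stem", "Pelym Mansi"),
]


def filter_entries(entries, **criteria):
    """Keep the entries whose fields equal every given criterion."""
    return [
        entry for entry in entries
        if all(getattr(entry, field) == value
               for field, value in criteria.items())
    ]


def count_by(entries, field):
    """Count entries per value of one field (a simple corpus statistic)."""
    return Counter(getattr(entry, field) for entry in entries)


noun_stems = filter_entries(LEXICON, pos="N", morpheme_type="stem")
print([entry.form for entry in noun_stems])  # ['stem-a', 'stem-c']
print(count_by(LEXICON, "dialect"))
```

Combining several criteria in one call mirrors the idea of stacking filters (part of speech plus morpheme type, for example), and the counts per field value are the kind of figure a statistical query on the corpus would return.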
For Yugan Khanty, the only dialect among those dealt with in this project that is still spoken, additional texts were recorded during fieldwork. Provided the speakers agree, these audio recordings enrich the Yugan Khanty database through phonetic tagging with the program ELAN. In this way the phonetic characteristics of Yugan Khanty are described more fully and morphonological rules will be determined. A contrastive analysis of Yugan Khanty and Surgut Khanty proper will show whether the classification of the former as a separate subdialect is justified.
Additionally, the functional and pragmatic analysis will provide all-round linguistic information on the principles of text construction. The principles and set of categories for the analysis of syntax, semantics and information structure (IS) were developed in the OUL project and tested on a few selected texts from the corpus. The desideratum in this respect is a technical implementation of this differentiated annotation system in our corpus. The result is a (semi-)automatic annotation of syntactic, semantic and pragmatic roles as well as a tagging of referents. In this way the database can provide information on the finer points of how morphological categories function in the text corpus.