CAsAnOMa: A computer-based tool for the assisted annotation of manuscripts

Working with historical manuscripts requires Digital Humanities skills, such as familiarity with annotation standards (TEI, XML), in order to construct, query and explore digital corpora for linguistic, historical, societal, literary and diversity studies. Creating and annotating conformant documents is often too demanding for students. To facilitate the task, we are developing a graphical tool that allows students to perform all the necessary steps in a user-friendly fashion, from correcting OCR output to adding annotations and querying the documents.

A prototype is currently being designed, based on Transkribus and in close collaboration with the Transkribus developers. Jean-Philippe Goldman is the main programmer. As a first application we are using the genres ‘letters’ and ‘diaries’, constructing a corpus of the letters to and by Mary Hamilton (1756-1816) and of her diaries. The challenges of handwritten text recognition (single hand) and TEI annotation (letters/diaries) offer a good starting set. In the proposed project we would like to extend the tool to more, and more open, domains, to several hands, to different languages, and to further applications, in order to test its versatility and portability.

In particular, a larger and different subset of TEI will be used, and extensions will be added for researchers with partly different questions and needs. For linguists and historians, for example, the ability to aggregate and tabulate query results according to the metadata is important. For semantic analyses, we will add keyword detection.
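As an illustration of what aggregating and tabulating query results according to the metadata could look like, here is a minimal Python sketch. It is not the tool's implementation: the record layout, the field names (`genre`, `year`, `match`), the sample hits and the helper functions `tabulate` and `format_table` are all assumptions invented for this example.

```python
from collections import Counter

# Hypothetical query hits (invented data): each hit carries the
# metadata of the document it was found in.
hits = [
    {"genre": "letter", "year": 1779, "match": "did not know"},
    {"genre": "letter", "year": 1779, "match": "knew not"},
    {"genre": "diary", "year": 1784, "match": "did not see"},
    {"genre": "letter", "year": 1784, "match": "do not think"},
]


def tabulate(hits, *fields):
    """Count hits per combination of the given metadata fields."""
    return Counter(tuple(hit[f] for f in fields) for hit in hits)


def format_table(counts, fields):
    """Render the counts as tab-separated text, sorted by key."""
    lines = ["\t".join(tuple(fields) + ("hits",))]
    for key in sorted(counts):
        lines.append("\t".join(str(v) for v in key) + f"\t{counts[key]}")
    return "\n".join(lines)


by_genre_year = tabulate(hits, "genre", "year")
print(format_table(by_genre_year, ("genre", "year")))
```

Passing a different set of field names, for instance `tabulate(hits, "genre")`, gives a coarser table from the same hits, which is the kind of regrouping a variationist study typically needs.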

Such a tool will allow linguists to perform a variationist analysis of the Mary Hamilton corpus, a domain in which Marianne Hundt, Eric Haeberli, and Gerold Schneider have ample experience. They will investigate syntactic variation (e.g. do-support, word order, PP complementation), morphosyntax (progressives, past tense), and the development of vocabulary, in comparison to corpora of the period such as ARCHER and CLMET. Some of the texts exist in draft versions, which allows us to track the editing process from a cognitive perspective.

Participants

Prof Eric Haeberli, University of Geneva

Dr Gerold Schneider, University of Zurich

Jean-Philippe Goldman, University of Geneva

Prof Marianne Hundt, University of Zurich

Prof Martin Volk, University of Zurich

Prof Gilles Falquet, University of Geneva