Welcome to my professional website.

About myself and this website

I have a Ph.D. in Computational Linguistics, obtained on June 24th, 2013 at the University of Groningen under Prof. Gertjan van Noord, who acted as co-promotor with Prof. Dr. ir. John Nerbonne. My thesis work (download here) centres on the use of rule-based and statistical methods to improve syntactic tree-to-tree alignment, in the context of its eventual application to syntax-based machine translation.

Currently, I am a research fellow at the “Dissident Networks Project” (DISSINET), hosted at Masaryk University’s Centre for the Digital Research of Religion in Brno, Czechia. Here, I work on the representation and data storage of a corpus of records related to the Medieval Inquisition, while applying text mining and other natural language processing techniques for the extraction of useful information for visualization and research.

Previously, I worked on web app development; trained and applied machine translation models; built and trained pipelines for text digitisation; designed, built, extended, filtered, and aligned corpora, parallel corpora, and (parallel) treebanks; and designed, built, or contributed to various databases (lexical, terminology, place names, etc.). I also have experience with various aspects of language practice such as lexicography, translation, editing, and writing. Here you will find more information about me and my work, which includes publications as well as downloadable data such as code and annotated corpora.

Post-PhD career

In February of 2016, I was appointed as a researcher in computational linguistics after my postdoctoral fellowship at the University of South Africa (Unisa) in Pretoria, South Africa, as part of the core research group at the Academy for African Languages and Science (AALS), under Prof Laurette Pretorius (now retired). AALS was a strategic project within the School of Interdisciplinary Research and Graduate Studies (SIRGS) under the College of Graduate Studies (CGS) on the Muckleneuk Campus of Unisa in Pretoria.

My work at Unisa included multilingual corpus design and implementation in TEI XML, digitisation solutions, statistical and neural machine translation, the collecting, collating and cleaning of (parallel) corpora, application of various natural language processing tools, as well as assisting and presenting at workshops where we helped participants to develop content for South African languages in Wikipedia. I also supervised postgraduate students.

In 2020, I developed a software environment that performs various checks, queries and updates on an offline SQL version of the African Wordnet database at the Department of African Languages, Unisa.

Until recently, I worked on the back end of a place name app for the Department of South African Sign Language and Deaf Studies at the University of the Free State, South Africa. The app will contain information on places in both English and South African Sign Language. It is planned to be distributed on Google Play before the end of 2022.

I was also appointed as a research fellow at the University of the Free State, where I am performing research that relates to the mobile app project.

Concurrently and until recently, I also worked on a terminology web app for the Department of Geography at the University of South Africa. This is an adaptation of the Terminator software and hosts various glossaries in English, isiZulu, Sesotho sa Leboa, and Afrikaans. I will post the URL here when it is public, and also link the GitHub repository once it is ready.

Finally, feel free to look around by navigating the links (for example, here is my résumé). If there are any broken links, I would appreciate a message. You can contact me at:

E-mail: kotzegj [at] ufs [dot] ac [dot] za

Note: Given the urgency of the Covid-19 pandemic, .za domain name holders are requested to link here: www.sacoronavirus.co.za