A list of downloads.
My PhD thesis can be downloaded here. The title is "Complementary approaches to tree alignment: Combining statistical and rule-based methods". [Short English abstract] [Short Dutch abstract] [Long Dutch abstract]
You may download my Curriculum Vitae, my Master's Thesis and a list of all my publications in BibTeX format.
Data
Automatically sentence- and word-aligned English-Zulu parallel corpus (2321 sentence pairs)
The download links are currently broken and will be updated soon. For reference, see the following paper:
Kotzé, G. and Wolff, F. 2014. Experiments with syllable-based English-Zulu alignment. Proceedings of the SaLTMiL Workshop on free/open-source language resources for the machine translation of less-resourced languages, at LREC 2014, May 2014, Reykjavík, Iceland. [BibTeX]
@InProceedings{KotzeWolff:2014,
author = {Kotz\'{e}, Gideon and Wolff, Friedel},
title = {Experiments with syllable-based {English-Zulu} alignment},
booktitle = {Proceedings of the SaLTMiL Workshop on free/open-source language resources for the machine translation of less-resourced languages (at LREC 2014)},
year = {2014},
pages = {7--11},
address = {Reykjav\'{i}k, Iceland},
isbn = {978-2-9517408-8-4}
}
Dutch/English and Dutch/French phrase-structure parse trees (448 sentence pairs) from the PaCo-MT project
The tree alignment data sets that were used in the PaCo-MT project (2008-2011) to train the statistical tree aligner Lingua::Align are available for download. The languages involved are Dutch, English and French. I created the constituent alignments manually. For the Dutch-to-English and English-to-Dutch sets, the word alignments were also corrected. Please refer to the included README files for further information.
Here are the alignment sets:
- Dutch to English, with corrected word alignments .zip .tgz (140 sentence pairs)
- English to Dutch, with corrected word alignments .zip .tgz (150 sentence pairs)
- Dutch to French, with uncorrected word alignments .zip .tgz (158 sentence pairs)
Software
TBLign, the transformation-based tree alignment system that I worked on for my doctoral thesis, is available for download. The download also includes all alignment data sets used in the experiments (Dutch-to-English only). I will occasionally fix bugs and update the documentation, but I intend to discontinue Perl development at some point in favour of a Python 3 reimplementation. Download here: .zip, tarball, on GitHub or Bitbucket.
In the coming months I expect to update the GitHub repository that hosts the code for my adaptation of the Terminator software, which I am using for the terminology web application at Unisa. Watch this space.