Some of our data collections (Polish speech corpora, multimodal corpora, software solutions, etc.):

1) PoInt (2002) – most probably the first digitally recorded corpus of (semi-)spontaneous Polish. It includes three types of monologues and three types of task-oriented dialogues, as well as read texts. Ten sessions of transcribed map tasks included! Freely available!

2) Pol’n’Asia (2009) – a (quasi-parallel) corpus of map task dialogues in Korean, Thai, Vietnamese and Polish (the Polish material comes from PoInt), with ten dialogues for each language.

3) DiaGest2 (2011) – most probably the first audiovisual corpus of Polish task-oriented dialogues with extremely rich multi-tier annotation. Ten “origami” dialogues in the mutual visibility condition and ten in the limited visibility condition. There are some limitations on sharing, but please contact us for details if you are interested.

4) Mike (2013) – a unique corpus of studio recordings in various miking conditions and configurations.

5) AnnotationPro (2013; still being developed by Katarzyna Klessa et al.) – free software for speech transcription and annotation. It has a plug-in socket for analytic tools as well as some built-in options for analysing annotations.
