The Welsh National Corpus Portal is an online collection of written Welsh and bilingual corpora in an easily searchable format.
The portal provides an interface for searching for words, terms and phrases, which are displayed in their context. If a corpus is bilingual, you can search in either language, and the translated text is displayed alongside the original search result. Matches are highlighted in bold.
For more information about individual corpora and to search them individually, click on the relevant
Natural Language Processing
The Corpora Portal uses natural language processing components developed by the Language Technologies Unit, such as a Welsh language lemmatizer that recognizes all forms of Welsh words, whether mutated, conjugated or otherwise inflected. However, it is also possible to search for exact string matches in the corpora.
Other Corpora Resources
The Corpora Portal offers links to other corpora of interest to researchers.
The Welsh National Language Technologies Portal has corpora resources that developers and researchers can download, including a Corpus of Welsh Language Tweets and a Corpus of Welsh Language Facebook Texts.