Article
Details
Citation
Bölücü N & Can B (2025) Building a Turkish UCCA dataset. Can Buglalilar B (Supervisor) Natural Language Processing, 31 (1), pp. 111-149. https://doi.org/10.1017/nlp.2024.36
Abstract
it to a logical form that can be processed and understood by machines. It is utilised by many applications in natural language processing (NLP), particularly in tasks relevant to natural language understanding (NLU). Due to the widespread use of semantic parsing in NLP, many semantic representation schemes with different forms have been proposed; Universal Conceptual Cognitive Annotation (UCCA) is one of them. UCCA is a cross-lingual semantic annotation framework that allows easy annotation without requiring substantial linguistic knowledge. UCCA-annotated datasets have been released so far for English, French, German, Russian, and Hebrew. In this paper, we present a UCCA-annotated Turkish dataset of 400 sentences that are obtained from the METU-Sabanci Turkish Treebank. We provide the UCCA annotation specifications defined for the Turkish language so that it can be extended further. We followed a semiautomatic annotation approach, where an external semantic parser is utilised for the initial annotation of the dataset, which is manually revised by two annotators. We used the same semantic parser model to evaluate the dataset with zero-shot and few-shot learning, demonstrating that even a small sample set from the target language in the training data has a notable impact on the performance of the parser (15.6% and 2.5% gain over zero-shot for labelled and unlabelled results, respectively).
Keywords
Universal Conceptual Cognitive Annotation; UCCA; Semantic representation; METU-Sabanci Turkish Treebank; dataset
Journal
Natural Language Processing: Volume 31, Issue 1
Status | Published |
---|---|
Contributor | Dr Burcu Can Buglalilar |
Funders | University of Stirling |
Publication date | 31/01/2025 |
Publication date online | 31/08/2024 |
Date accepted by journal | 04/05/2024 |
URL | http://hdl.handle.net/1893/37466 |
Publisher | Cambridge University Press (CUP) |
ISSN | 2977-0424 |
eISSN | 2977-0424 |
People (1)
Lecturer in Computing Science, Computing Science