SLURP-TN : Resource for Tunisian Dialect Spoken Language Understanding

cs.CL · Haroun Elleuch, Salima Mdhaffar, Yannick Estève, Fethi Bougares · Mar 23, 2026
AI Review
Plain-language introduction

SLURP-TN introduces a Spoken Language Understanding (SLU) dataset for Tunisian Arabic, a low-resource dialect. The authors translate and record six domains from the English SLURP corpus with 55 speakers across 18 geographic regions, emphasizing gender balance and code-switching phenomena. The dataset provides approximately five hours of audio across three acoustic conditions (clean, noisy, headphone) to enable robust benchmarking of ASR and SLU systems for dialectal Arabic.

Critical review
Verdict
Bottom line

The paper presents a solid methodological contribution for Tunisian Arabic SLU, with careful attention to speaker diversity and acoustic variability. However, the dataset covers only six of eighteen original SLURP domains and relies on a single annotator for semantic labels, raising concerns about annotation consistency and limited domain generalization. The benchmarking results convincingly demonstrate that code-switching and dialectal variation remain challenging for current SoTA models, with w2v-BERT 2.0 achieving 33.6% WER and SENSE reaching 59.0% CVER on the test set.

“We limited our coverage to six representative domains from the original 18 in SLURP”
paper · Section 3
“This was realized by a single professional bilingual annotator, a native speaker of the Tunisian dialect”
paper · Section 3.4
What holds up

The dataset design exhibits thoughtful demographic balancing: speakers represent eighteen distinct Tunisian regions with nearly equal gender distribution (50.4% male, 49.6% female in training). The inclusion of code-switching, present in almost 54% of the recorded training sentences, reflects authentic Tunisian speech patterns, and the three acoustic conditions (clean, noisy, headphone) enable robustness testing. The public release via HuggingFace and detailed metadata (age, gender, region) facilitate reproducible research.

“spread across several Tunisian regions from north to south... In total, we recorded speakers from 18 different regions”
paper · Section 4
“CS exists in almost 54% of the recorded training sentences”
paper · Section 4.3
Main concerns

The scope is limited: only six of eighteen SLURP domains are covered, representing roughly one-third of the original corpus, which constrains generalization to unseen intents. More critically, SLU annotation was performed by a single professional annotator without reported inter-annotator agreement (IAA) metrics, making it impossible to assess label reliability or consistency. The training set is modest (2 hours 46 minutes, 2677 segments), potentially insufficient for training robust neural SLU systems from scratch without heavy reliance on pre-trained models.

“During this step, the annotator was provided with the annotated source utterances... They were asked to transfer the original English annotation to the correct position in the target sentences”
paper · Section 3.4
“it still represents only about one-third of the original SLURP corpus in terms of overall scenario coverage”
paper · Section 8
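
To make the missing agreement metric concrete, the following is a minimal sketch of Cohen's kappa, one common IAA measure for two annotators labelling the same items. The paper reports no such figure and has only one annotator, so the second annotator, the function name `cohen_kappa`, and the label names below are all hypothetical and purely illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label
    # if each annotator labelled independently with their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    all_labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[lab] * freq_b[lab] for lab in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical intent labels from two annotators (not taken from the dataset).
annotator_a = ["intent_x", "intent_x", "intent_y", "intent_y"]
annotator_b = ["intent_x", "intent_y", "intent_y", "intent_y"]
print(cohen_kappa(annotator_a, annotator_b))  # 0.5
```

Here the annotators agree on three of four items (0.75 observed) against 0.5 expected by chance, giving a kappa of 0.5. Reporting a figure of this kind on even a small doubly-annotated subset would let readers judge label reliability.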
Evidence and comparison

The experimental evidence supports the claim that semantic enrichment trades off against transcription accuracy: SENSE outperforms w2v-BERT 2.0 on semantic metrics (CoER/CVER) but lags on WER/CER, consistent with findings from the authors' prior work. However, the paper omits direct experimental comparison with TARIC-SLU, the only existing Tunisian SLU corpus, despite citing it as related work and highlighting its single-domain limitation. Comparisons to SpeechMASSIVE's Arabic subset (only 115 training utterances) correctly highlight SLURP-TN's relative scale, though the absolute size remains small compared to English SLU resources.

“the SENSE model consistently outperforms w2v-BERT 2.0 on SLU-oriented metrics (CoER and CVER). However, this trend reverses when considering transcription fidelity metrics (CER and WER)”
paper · Section 6.2
“SpeechMASSIVE (Ar)... #Utt. (train) 115”
paper · Table 1
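
For readers less familiar with the transcription metrics cited above, here is a minimal sketch of how WER and CER are conventionally computed from Levenshtein edit distance. It is not the authors' scoring code: the function names are illustrative, the example sentences are invented English stand-ins, and the choice to ignore spaces when computing CER is an assumption (toolkits differ on this point).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Invented example: one deleted word and one substituted word out of four.
print(wer("turn on the light", "turn the lights"))  # 0.5
```

Because both rates are normalized by the reference length, a single wrong word in a short command moves WER substantially, which is worth keeping in mind when reading error rates on short SLU utterances.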
Reproducibility

The experimental setup is generally reproducible: authors use the open-source SpeechBrain toolkit, provide hyperparameters (Adam with a learning rate of $10^{-5}$ for encoders, Adadelta with a learning rate of 1.0 for downstream layers, batch size 16 with a gradient accumulation factor of 2, 250 epochs), and release both dataset and models on HuggingFace. However, critical details are missing: random seeds are not specified, exact training duration is not reported, and hardware specifications beyond GPU type (H100/A100) are absent. The single-annotator setup also means replication of the semantic annotation phase is impossible without the original annotator's guidelines or IAA metrics.

“All our training pipelines were implemented using the SpeechBrain toolkit”
paper · Section 5
“All models are trained for 250 epochs with a batch size of 16 and a gradient accumulation factor of 2”
paper · Section 5
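
The reported hyperparameters can be collected into a small configuration sketch, which also makes the implied effective batch size explicit. This is plain Python rather than an actual SpeechBrain recipe: the class and field names are illustrative, the `seed` value is a placeholder because the paper does not report one, and the step count assumes fixed-size batches with the final partial batch kept.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values reported in the paper (Section 5).
    encoder_optimizer: str = "Adam"
    encoder_lr: float = 1e-5
    downstream_optimizer: str = "Adadelta"
    downstream_lr: float = 1.0
    batch_size: int = 16
    grad_accumulation: int = 2
    epochs: int = 250
    # NOT reported in the paper: placeholder showing where a seed would be recorded.
    seed: int = 1234

    @property
    def effective_batch_size(self) -> int:
        """Number of segments contributing to each optimizer update."""
        return self.batch_size * self.grad_accumulation

    def optimizer_steps_per_epoch(self, num_segments: int) -> int:
        """Optimizer updates per epoch, assuming fixed-size batches."""
        batches = math.ceil(num_segments / self.batch_size)
        return math.ceil(batches / self.grad_accumulation)


cfg = TrainingConfig()
print(cfg.effective_batch_size)             # 32
print(cfg.optimizer_steps_per_epoch(2677))  # 84
```

With the 2677 training segments cited in this review, these assumptions give 84 optimizer updates per epoch at an effective batch size of 32. Publishing the seed alongside such a configuration would close one of the reproducibility gaps noted above.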
Abstract

Spoken Language Understanding (SLU) aims to extract the semantic information from the speech utterance of user queries. It is a core component in a task-oriented dialogue system. With the spectacular progress of deep neural network models and the evolution of pre-trained language models, SLU has obtained significant breakthroughs. However, only a few high-resource languages have taken advantage of this progress due to the absence of SLU resources. In this paper, we seek to mitigate this obstacle by introducing SLURP-TN. This dataset was created by recording 55 native speakers uttering sentences in Tunisian dialect, manually translated from six SLURP domains. The result is an SLU Tunisian dialect dataset that comprises 4165 sentences recorded into around 5 hours of acoustic material. We also develop a number of Automatic Speech Recognition and SLU models exploiting SLURP-TN. The dataset and baseline models are available at: https://huggingface.co/datasets/Elyadata/SLURP-TN.
