SynSym: A Synthetic Data Generation Framework for Psychiatric Symptom Identification
Psychiatric symptom identification from social media requires expensive expert annotation and suffers from inconsistent labeling across platforms. SynSym addresses this by using GPT-4o to generate synthetic training data across four stages: symptom concept expansion, dual-style (clinical/colloquial) expression generation, clinically-grounded multi-symptom composition, and LLM-based quality filtering. The framework produces 18,254 samples covering 14 DSM-5 symptoms, enabling models to match real-data performance and generalize across diverse social media platforms.
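To make the multi-symptom composition stage concrete, the following is a minimal sketch of sampling a symptom combination from co-occurrence weights. It is not the authors' implementation: the symptom names, the weight values, and the `sample_symptom_set` helper are all hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List

# Hypothetical co-occurrence weights, invented for illustration only.
# They are NOT taken from the paper or from any clinical source.
COOCCURRENCE: Dict[str, Dict[str, float]] = {
    "depressed_mood": {"anhedonia": 0.6, "sleep_disturbance": 0.3, "fatigue": 0.1},
    "anhedonia": {"depressed_mood": 0.7, "fatigue": 0.3},
    "sleep_disturbance": {"fatigue": 0.8, "depressed_mood": 0.2},
    "fatigue": {"sleep_disturbance": 0.5, "depressed_mood": 0.5},
}


def sample_symptom_set(seed_symptom: str, size: int, rng: random.Random) -> List[str]:
    """Grow a symptom set by repeatedly sampling a co-occurring symptom.

    Each new symptom is drawn in proportion to its weight given the most
    recently chosen symptom; symptoms already chosen are excluded.
    """
    chosen = [seed_symptom]
    while len(chosen) < size:
        candidates = {
            s: w for s, w in COOCCURRENCE[chosen[-1]].items() if s not in chosen
        }
        if not candidates:
            break  # no unused co-occurring symptom is left
        nxt = rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
        chosen.append(nxt)
    return chosen


rng = random.Random(0)  # fixed seed so the sketch is repeatable
symptom_set = sample_symptom_set("depressed_mood", 3, rng)
```

A sampled set like this could then be passed to a generation prompt so that the resulting expression carries all of the sampled labels.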
SynSym is a methodologically sound contribution to clinical NLP that convincingly demonstrates synthetic data can substitute for expensive expert annotations in multi-label psychiatric symptom detection. The cross-dataset evaluation and ablation studies are rigorous, though the reliance on proprietary GPT-4o and deliberate exclusion of figurative language limit applicability to indirect symptom expressions common on platforms like Twitter.
The dual-style generation strategy (clinical and colloquial) and the incorporation of clinical co-occurrence patterns are well-motivated design choices that address a real limitation of LLMs: their tendency to avoid sensitive terminology. Table 4 shows models trained solely on SynSym data achieve comparable Macro-F1 to MentalBERT trained on real data (e.g., 0.778 vs 0.811 on PsySym), with further gains when combined with real data. Expert validation by two psychiatrists yielded high scores (4.61/5 for sub-concepts, 4.99/5 for expressions with >94% inter-rater agreement), supporting clinical validity.
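For readers unfamiliar with the metric, Macro-F1 in a multi-label setting is the unweighted mean of per-label F1 scores, so rare symptoms count as much as common ones. The sketch below is a plain-Python illustration with toy labels; it assumes binary indicator vectors per post and scores a label with no positives and no predictions as 0.0, which may differ from the paper's evaluation code.

```python
from typing import List


def macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Unweighted mean of per-label F1 over binary indicator matrices.

    Each row is one post; each column is one symptom label (0 or 1).
    """
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives and no predictions scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy data: three posts, two symptom labels (illustrative only).
y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1]]
score = macro_f1(y_true, y_pred)
```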
The framework deliberately excludes metaphorical and figurative expressions, which constitute significant portions of datasets like D2S, to preserve label reliability. This limits deployment on platforms where users express symptoms indirectly. While the paper claims novelty as the 'first attempt to apply synthetic data to symptom prediction,' prior work (Ghanadian et al., 2024; Vedanta and Rao, 2024) already used LLMs to generate synthetic mental health data; the distinction rests on multi-label granularity, which the paper should state more explicitly. Validation relied on only two psychiatrists reviewing 300 expressions, raising questions about how expert review would scale to larger corpora. Additionally, the evaluation relies on benchmark datasets with known reliability issues: Milintsevich et al. found remarkably low agreement (κ=0.09) between PRIMATE's crowd-sourced labels and professional re-annotations.
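To put the κ=0.09 figure in context, Cohen's kappa corrects raw agreement for the agreement expected by chance, so values near zero mean the two label sources agree little more than chance would predict. The sketch below computes it for two annotators on toy labels; the data are invented, and whether Milintsevich et al. computed kappa exactly this way (for instance, per symptom or pooled) is not stated here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        return math.nan  # undefined when both annotators use a single label
    return (p_o - p_e) / (1 - p_e)


# Toy labels for one symptom across four posts (illustrative only).
crowd = [1, 1, 0, 0]
expert = [1, 0, 0, 0]
kappa = cohens_kappa(crowd, expert)
```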
Comparisons to BERT, DeBERTa, MentalBERT, and GPT-4o prompting baselines are fair and consistently reported with confidence intervals across 5-fold cross-validation. The cross-dataset generalization experiment (Table 5) is particularly compelling: SynSym-trained models outperform multi-source training and achieve strong zero-shot transfer across PsySym, PRIMATE, and D2S, supporting claims of style invariance. The ablation study (Table 6) confirms that removing clinical co-occurrence knowledge (CK), dual-style generation (DU), or symptom expansion (SE) degrades performance. However, the work could benefit from comparison with other LLM-based augmentation techniques beyond back-translation.
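As an illustration of how a confidence interval over five folds can be reported, the sketch below uses a two-sided 95% Student-t interval with 4 degrees of freedom. This is an assumption: the review does not state how the paper derives its intervals, and the fold scores shown are made-up placeholders, not results from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical value of Student's t with 4 degrees of freedom (5 folds).
T_CRIT_95_DF4 = 2.776


def mean_with_ci(fold_scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) of a 95% t-interval over exactly 5 folds."""
    if len(fold_scores) != 5:
        raise ValueError("this sketch hard-codes the t value for 5 folds")
    mean = statistics.mean(fold_scores)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(fold_scores)
    half_width = T_CRIT_95_DF4 * sd / math.sqrt(len(fold_scores))
    return mean, mean - half_width, mean + half_width


# Placeholder Macro-F1 values for five folds (invented for illustration).
fold_scores = [0.70, 0.72, 0.74, 0.76, 0.78]
mean, lower, upper = mean_with_ci(fold_scores)
```

With only five folds the interval is wide, so overlapping intervals between two systems should be read as weak evidence rather than as proof of equivalence.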
The authors commit to releasing code and synthetic datasets, and report detailed hyperparameters (AdamW lr 5e-5/3e-5, batch size 32/64, max length 512) and prompt templates in Appendix A. However, full reproduction is hindered by dependence on GPT-4o, which is proprietary and subject to versioning drift. The paper uses temperature 0.0 for deterministic expansion but 0.8 for generation, so generated expressions will vary between runs even with identical prompts. No API version date is specified, and generating 18,254 samples requires substantial API credits. The synthetic data evaluation relies on benchmark datasets with inconsistent annotation schemes, requiring complex remapping (Appendix C.1) that introduces additional variability.
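One way to reduce the reproducibility gap described above is to ship a small manifest alongside the released data that records the generation settings. The sketch below is a suggestion, not something the paper provides: the `GenerationManifest` class and its field names are hypothetical, and the snapshot field is left as an explicit placeholder because the paper does not report it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationManifest:
    """Hypothetical record of the settings used to generate a synthetic dataset."""

    model: str
    model_snapshot: str  # dated API snapshot; not reported in the paper
    expansion_temperature: float
    generation_temperature: float
    num_samples: int


manifest = GenerationManifest(
    model="gpt-4o",
    model_snapshot="UNSPECIFIED",  # placeholder, deliberately not guessed
    expansion_temperature=0.0,
    generation_temperature=0.8,
    num_samples=18254,
)

# Serialize with sorted keys so the manifest diffs cleanly between releases.
manifest_json = json.dumps(asdict(manifest), sort_keys=True, indent=2)
```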
Psychiatric symptom identification on social media aims to infer fine-grained mental health symptoms from user-generated posts, allowing a detailed understanding of users' mental states. However, the construction of large-scale symptom-level datasets remains challenging due to the resource-intensive nature of expert labeling and the lack of standardized annotation guidelines, which in turn limits the generalizability of models to identify diverse symptom expressions from user-generated text. To address these issues, we propose SynSym, a synthetic data generation framework for constructing generalizable datasets for symptom identification. Leveraging large language models (LLMs), SynSym constructs high-quality training samples by (1) expanding each symptom into sub-concepts to enhance the diversity of generated expressions, (2) producing synthetic expressions that reflect psychiatric symptoms in diverse linguistic styles, and (3) composing realistic multi-symptom expressions, informed by clinical co-occurrence patterns. We validate SynSym on three benchmark datasets covering different styles of depressive symptom expression. Experimental results demonstrate that models trained solely on the synthetic data generated by SynSym perform comparably to those trained on real data, and benefit further from additional fine-tuning with real data. These findings underscore the potential of synthetic data as an alternative resource to real-world annotations in psychiatric symptom modeling, and SynSym serves as a practical framework for generating clinically relevant and realistic symptom expressions.