Speech and signal processing
Why dysarthric speech classification is mostly a data problem
Dysarthric speech classification sounds straightforward at first. In practice, the difficult part is usually the data. The speech is variable, the labels are limited, and augmentation choices can help or hurt depending on how well they preserve the structure of the original signal.
This was the research thread that made me much more careful about low-resource ML. In this setting, dataset structure, augmentation quality, and speaker variability can matter more than swapping one model for another.
Why this is hard
Dysarthria affects articulation in ways that are clinically meaningful but hard to tell apart from ordinary speaker-to-speaker variation when the dataset is small. A classifier can overfit to speaker-specific quirks or recording conditions long before it learns anything useful about severity.
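One common guard against a classifier memorizing speakers is to split by speaker rather than by utterance, so that no voice appears in both the training and the test set. The following is a minimal sketch of that idea, not the pipeline used in this work; the function name `speaker_disjoint_split`, the `(speaker_id, utterance)` tuple layout, and the toy IDs are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(utterances, test_fraction=0.25, seed=0):
    """Split (speaker_id, utterance) pairs so no speaker lands in both sets.

    Hypothetical helper: the tuple layout is an assumption for this sketch.
    """
    # Group utterances by the speaker who produced them.
    by_speaker = defaultdict(list)
    for speaker_id, utt in utterances:
        by_speaker[speaker_id].append(utt)

    # Shuffle speakers (not utterances) with a fixed seed for repeatability.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    # Hold out at least one whole speaker for testing.
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = speakers[:n_test]
    train_speakers = speakers[n_test:]

    train = [(s, u) for s in train_speakers for u in by_speaker[s]]
    test = [(s, u) for s in test_speakers for u in by_speaker[s]]
    return train, test


# Toy data: 4 made-up speakers with 3 utterances each.
toy = [(f"spk{i}", f"utt{i}_{j}") for i in range(4) for j in range(3)]
train, test = speaker_disjoint_split(toy)

# No speaker should appear on both sides of the split.
assert {s for s, _ in train}.isdisjoint({s for s, _ in test})
```

With very few speakers the held-out set may be a single voice, so a score from one split is noisy; repeating the split with different seeds, or holding out each speaker in turn, gives a steadier picture.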
Why augmentation matters so much
In low-resource settings, augmentation is not just a way to make the dataset larger. It changes the shape of the problem. Some transformations preserve the parts of the signal that matter; others make the data look richer while quietly distorting the very patterns the model needs to learn.
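To make the distinction concrete, here is a small sketch comparing two generic augmentations on a synthetic tone. Additive noise at a target SNR leaves the timing of the signal untouched, while a naive speed change shortens it and shifts its pitch. Whether either one is safe depends on what carries severity information in a given dataset; treating timing as one such cue is an assumption of this example, not a claim from the work described here. Both function names are hypothetical.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise at roughly the requested SNR (in dB).

    The number of samples, and therefore the timing, is unchanged.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def naive_speed_change(signal, rate):
    """Resample by linear interpolation; rate > 1 plays the signal faster.

    This changes duration and pitch together, so any timing cue in the
    original recording is altered.
    """
    n_out = int(len(signal) / rate)
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


# Synthetic stand-in for speech: 1000 samples of a sine tone.
tone = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]

noisy = add_noise_at_snr(tone, snr_db=20)
faster = naive_speed_change(tone, rate=1.25)

print(len(tone), len(noisy), len(faster))  # 1000 1000 800
```

The length comparison is only a crude proxy. A more careful check would compare the features the classifier actually consumes before and after each transformation.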
What this changed for me
The useful lesson from this work was that a speech model can look better for the wrong reasons when the data pipeline is not treated as part of the modeling decision. That made me think much more carefully about how to choose augmentations, how to interpret gains in low-resource settings, and how to separate real signal from artifacts of the dataset itself.
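One simple way to check whether a gain is real is to break accuracy down by speaker instead of reporting a single pooled number. The sketch below assumes predictions are available as `(speaker_id, true_label, predicted_label)` tuples; that layout, the function names, and the toy labels are all made up for illustration and are not results from this work.

```python
from collections import defaultdict


def pooled_accuracy(records):
    """Fraction of all utterances classified correctly."""
    records = list(records)
    return sum(y_true == y_pred for _, y_true, y_pred in records) / len(records)


def per_speaker_accuracy(records):
    """Accuracy computed separately for each speaker."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for speaker_id, y_true, y_pred in records:
        total[speaker_id] += 1
        correct[speaker_id] += int(y_true == y_pred)
    return {s: correct[s] / total[s] for s in total}


# Toy predictions: speaker "a" has 8 utterances, speaker "b" has 2.
records = [("a", "mild", "mild")] * 8 + [("b", "severe", "mild")] * 2

print(pooled_accuracy(records))       # 0.8
print(per_speaker_accuracy(records))  # {'a': 1.0, 'b': 0.0}
```

In this toy case the pooled score looks respectable even though the model fails completely on one of the two speakers, which is the kind of artifact a single headline number can hide.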
Transferable principle
In small speech datasets, the first question is often not “which model is stronger?” It is “what does the data actually let me claim?”