AUTOMATIC SPEECH RECOGNITION OF SPANISH: SOCIOLINGUISTIC FACTORS OF ACCURACY AND ERROR TYPOLOGY
DOI:
https://doi.org/10.17721/2663-6530.2025.48.11
Keywords:
automatic speech recognition, Spanish, ASR, WER, CER, accent, sociolinguistics
Abstract
The article investigates the effectiveness of automatic speech recognition (ASR) for Spanish, based on a corpus of 304 audio recordings of speakers of different ages, genders, and accents. The aim of the study is to evaluate the accuracy of Google Speech-to-Text, identify its common errors, and determine how sociolinguistic factors affect transcription quality. The analysis uses the word error rate (WER) and character error rate (CER) metrics, together with counts of substitutions, deletions, and insertions. The results show an average accuracy of 94.7%, with substitutions being the predominant error type. Accuracy was highest for speakers with a northern peninsular accent and lowest for teenagers and speakers of the Argentinian variety of Spanish. The practical value of the study lies in the possibility of improving ASR models by taking into account the dialectal and social characteristics of speakers.
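The metrics named in the abstract can be illustrated with a minimal sketch. It is not the study's own code: the function names (`edit_counts`, `wer`, `cer`), the lowercase-and-split normalization, and the sample sentences are assumptions made for illustration. The sketch aligns a reference transcript with an ASR hypothesis by Levenshtein distance, counts substitutions, deletions, and insertions, and divides their sum by the reference length.

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for one minimal-cost
    Levenshtein alignment of the hypothesis sequence to the reference."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total cost, S, D, I) for ref[:i] aligned with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference token deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis token inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match, no extra cost
                continue
            c, s, d, k = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, k)
            c, s, d, k = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, k)
            c, s, d, k = dp[i][j - 1]
            insertion = (c + 1, s, d, k + 1)
            # Tuples compare by total cost first, so min() keeps a cheapest path.
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[n][m][1:]


def error_rate(ref, hyp):
    """(S + D + I) / N, where N is the length of the reference sequence."""
    if not ref:
        raise ValueError("reference must not be empty")
    return sum(edit_counts(ref, hyp)) / len(ref)


def wer(reference, hypothesis):
    """Word error rate; lowercasing and whitespace splitting are assumed here."""
    return error_rate(reference.lower().split(), hypothesis.lower().split())


def cer(reference, hypothesis):
    """Character error rate computed over lowercased characters."""
    return error_rate(list(reference.lower()), list(hypothesis.lower()))


if __name__ == "__main__":
    reference = "el gato come pescado"   # hypothetical reference transcript
    hypothesis = "el gato comer"         # hypothetical ASR output
    print(edit_counts(reference.split(), hypothesis.split()))
    print(wer(reference, hypothesis), cer(reference, hypothesis))
```

For the hypothetical pair above, the alignment contains one substitution (come, comer) and one deletion (pescado) against four reference words, giving a WER of 0.5. When several minimal alignments exist, the split between error types can depend on tie-breaking, so per-type counts are best compared only when produced by the same tool.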
License
Copyright (c) 2025 Юлія Тарасенко

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.






