INFLUENCE OF THE PRONY MODEL ORDER ON THE KALMAN FILTER COVARIANCE MATRIX FOR NOISE REDUCTION IN SPEECH SIGNALS: EVALUATION USING THE SEGMENTAL SIGNAL-TO-NOISE RATIO AND THE ITAKURA–SAITO DIVERGENCE
DOI:
https://doi.org/10.66104/hegxr115
Keywords:
Kalman filter, Noise suppression, Speech enhancement, Prony
Abstract
This work investigates how the order of the Prony-based prediction model influences the construction and parameterization of the matrices used in the Kalman filter applied to noise reduction in speech signals. Unlike traditional approaches that adopt LPC (an all-pole model), it employs a Prony-based IIR model (numerator + denominator) estimated directly from the noisy signal in short-time windows, which makes it possible to capture spectral structures with both poles and zeros and, consequently, to modify the innovation statistics used in the process covariance. To isolate the effect of the model order, the same noisy signal is held fixed at an input segmental signal-to-noise ratio of segSNR_in ≈ 3 dB, and only the predictor order M ∈ {6, 8, …, 20} is varied. Performance is quantified by the output segSNR and by the Itakura–Saito (IS) divergence between power spectral densities estimated with Welch's method. Results with isolated words show a consistent trend: low orders (typically M = 6 or M = 8) gave a better trade-off between segSNR gain and lower spectral distortion (IS), whereas higher orders progressively degraded spectral fidelity.
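As a rough illustration of the two evaluation metrics named in the abstract, the following Python sketch computes a segmental SNR and an Itakura–Saito divergence between averaged-periodogram (Welch-style) spectra. It is not the authors' code. The function names, the frame length of 256 samples, the clipping limits of -10 dB and 35 dB, the Hann window with 50% overlap, and the choice of the clean signal as the reference argument of the divergence are all assumptions made for this example.

```python
import numpy as np


def segmental_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0, eps=1e-10):
    """Mean of per-frame SNRs in dB (frame_len, lo, hi are assumed values)."""
    n_frames = min(len(clean), len(processed)) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = s - processed[i * frame_len:(i + 1) * frame_len]
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        # Clipping keeps silent or perfectly matched frames from dominating.
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))


def welch_psd(x, nperseg=256):
    """Minimal Welch-style PSD: Hann window, 50% overlap, averaged periodograms."""
    win = np.hanning(nperseg)
    step = nperseg // 2
    frames = np.array([x[i:i + nperseg] * win
                       for i in range(0, len(x) - nperseg + 1, step)])
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return spectra.mean(axis=0) / np.sum(win ** 2)


def itakura_saito(reference, estimate, nperseg=256, eps=1e-12):
    """Mean over frequency bins of P/Q - log(P/Q) - 1 (P: reference, Q: estimate)."""
    p = welch_psd(reference, nperseg) + eps
    q = welch_psd(estimate, nperseg) + eps
    ratio = p / q
    return float(np.mean(ratio - np.log(ratio) - 1.0))
```

With a clean reference `s` and a filter output `y` of equal length, `segmental_snr(s, y)` and `itakura_saito(s, y)` would be evaluated once per predictor order M. The divergence is zero when the two spectra coincide and grows as they diverge, which is why lower values indicate less spectral distortion.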
License
Copyright 2026 Dr. Leandro Aureliano da Silva, Dr. Eduardo Silva Vasconcelos, Dr. Luiz Fernando Ribeiro de Paiva, Dr. Adriano Dawison de Lima, Me. Welington Mrad Joaquim, Dr. Edilberto Pereira Teixeira

This work is licensed under a Creative Commons Attribution 4.0 International License.
