Increasing the accuracy of speech signal categorization in high-noise environments

Document Type: Original Article

Author

Mechanical Engineering and Engineering Science, University of North Carolina at Charlotte, Charlotte, USA

DOI: 10.22034/jbr.2020.232801.1023

Abstract

This paper proposes and evaluates an algorithm that improves the quality and reconstruction of noisy speech by clustering the texture of the spectrogram image. The algorithm first determines the cluster centers using the Grating Compression Transform (GCT), combining image texture extraction with a weighted K-means algorithm in the GCT domain. Once the cluster centers and their variances are known, a Gaussian distribution function (GDF) defines a mask for each cluster in the GCT space, and the masks are applied to the signal in the GCT domain. In future work, the method could be extended to separate two mixed speakers, including in noisy conditions. Evaluation in the presence of a variety of common acoustic noises shows the advantages of the method in improving speech quality.
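The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the GCT is computed as the 2-D Fourier transform of a windowed spectrogram patch, that the K-means weights are the GCT power values, that each cluster gets a single isotropic variance, and that the per-cluster masks are combined by taking their maximum. The function names (`gct`, `weighted_kmeans`, `gaussian_mask`, `enhance_patch`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def gct(patch):
    """2-D Fourier transform of a Hann-windowed spectrogram patch (assumed GCT)."""
    window = np.outer(np.hanning(patch.shape[0]), np.hanning(patch.shape[1]))
    return np.fft.fftshift(np.fft.fft2(patch * window))


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """K-means in which each point pulls its center in proportion to its weight."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are drawn with probability proportional to weight.
    start = rng.choice(len(points), size=k, replace=False, p=weights / weights.sum())
    centers = points[start].astype(float)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            w = weights[labels == j]
            if w.sum() > 0:  # leave the center of an empty cluster unchanged
                centers[j] = (w[:, None] * points[labels == j]).sum(axis=0) / w.sum()
    # Isotropic per-cluster variance: weighted mean squared distance per axis.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    variances = np.full(k, 1e-6)
    for j in range(k):
        w = weights[labels == j]
        if w.sum() > 0:
            variances[j] = max((w * d2[labels == j, j]).sum() / (2 * w.sum()), 1e-6)
    return centers, variances


def gaussian_mask(shape, center, variance):
    """Gaussian-shaped mask over a grid, equal to 1 at `center`."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * variance))


def enhance_patch(patch, k=2):
    """Cluster the GCT of one patch, mask it, and return to the spectrogram domain."""
    G = gct(patch)
    power = np.abs(G) ** 2
    # One (row, col) coordinate per GCT bin, in the same order as power.ravel().
    points = np.indices(G.shape).reshape(2, -1).T.astype(float)
    centers, variances = weighted_kmeans(points, power.ravel(), k)
    masks = [gaussian_mask(G.shape, c, v) for c, v in zip(centers, variances)]
    mask = np.maximum.reduce(masks)  # assumption: combine clusters by maximum
    # The mask need not be symmetric, so only the real part is kept here.
    return np.real(np.fft.ifft2(np.fft.ifftshift(G * mask)))
```

In this sketch a patch would be a small time-frequency region of a magnitude spectrogram, for example a 32 x 32 array. Recovering a full enhanced signal would additionally require overlap-adding the processed patches and inverting the spectrogram, which the sketch leaves out.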
