Design and Implementation of a Speaker-Independent Voice Control System for Car Audio

With the development of modern electronic technology, more and more on-board electrical devices have been added to automotive body electronics. These devices improve vehicle performance, but they also make the driving task more complex and introduce hidden safety risks. With the improvement of speech recognition algorithms and the advent of a new generation of dedicated speech processing chips, voice control can now replace manual control of vehicle-mounted electrical devices, reducing the driver's manual workload and greatly improving driving safety.
At present, in-vehicle voice control in China is concentrated mainly on car navigation systems, and the potential of speech recognition technology in body electronics has not been fully exploited. This paper proposes a design for a speaker-independent voice control system for car audio built around the dedicated speech processing chip UniSpeech-SDA80D51, and describes the development of a system prototype. Voice control experiments were carried out on the JAC Tongyue SL1102C1 car audio unit. The experimental data show that the system's speech recognition rate reaches 95%, laying the foundation for subsequent product development.
1 Car audio voice control system

The block diagram of the speaker-independent car audio voice control system is shown in Figure 1.

The main functions of the system are as follows: the voice acquisition module (a directional pickup) collects the voice command signal issued by the driver; the speech recognition module performs A/D conversion on the signal, carries out speech recognition on the resulting digital signal, and finally outputs the word code corresponding to the voice command. The control module logically analyses the received word code, generates the corresponding control signal, and drives the car audio through the system I/O interface to carry out the driver's voice command.
1.1 Speech recognition module

The speech recognition module consists mainly of the UniSpeech-SDA80D51 chip and its peripheral circuits.
The SDA80D51 is a dedicated chip developed by Infineon of Germany for speech recognition and speech processing applications. It is manufactured in a 0.18 μm semiconductor process and uses a highly integrated SoC architecture. The basic structure of the SDA80D51 is shown in Figure 2.

The SDA80D51 integrates dual-access fast SRAM, a 2-channel ADC and 2-channel DAC, multiple communication interfaces, and general-purpose GPIO. In its working mode, the M8051 core acts as the main controller, handling system configuration, control of the SPI, PWM, I2C, GPIO and other interfaces, and the transfer of voice data, while the OAK DSP core serves as a coprocessor that runs the speech recognition algorithm, the speech codec, and other voice processing tasks.
The speaker-independent voice signal is captured by the directional pickup and digitised by the data acquisition module inside the SDA80D51. The recognition program then performs preprocessing, endpoint detection, feature parameter extraction and template matching, selects the closest entry in the recognition vocabulary, uses that entry's number as the recognition result, and outputs the result through the GPIO port.
1.2 Control module

The control module consists of an MCU and an analog switch circuit. It performs logic analysis on the word code output by the speech recognition module and generates the corresponding function control signal, which drives the audio unit through the analog switch circuit. The MCU chosen is the AT89S51 from ATMEL (USA). Taking into account the output voltage characteristics of the AT89S51 I/O signals and the resistive voltage-divider keypad circuit of the SL1102C1 audio control panel, relays are used to close and open the control panel buttons of the SL1102C1. The schematic of the AT89S51 and relay analog switch circuit is shown in Figure 3.

1.3 Audio module

This design is based on the SL1102C1 car audio unit, a car stereo designed for mid-range vehicles that offers MP3 playback, radio and time display, and is currently fitted in the JAC Tongyue sedan. The front panel of the SL1102C1 has 15 buttons covering functions such as power on/off, mute, sound mode and play/pause, plus a code switch (rotary encoder) for adjusting the volume.
The front-panel buttons of the SL1102C1 are identified by voltage sampling, and each button supports both a short press and a long press. The AT89S51 outputs TTL-level I/O signals, and driving the audio buttons directly with these signals can easily cause misidentification and system misoperation. This paper therefore uses the analog switch circuit shown in Figure 3 to solve the problem: when the AT89S51 receives the word code, it immediately performs logic analysis and outputs the corresponding control signal, driving the corresponding relay to simulate the button press. The short-press and long-press behaviour of each button is realised in software, as sketched below.
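A minimal Keil C51-style sketch of this relay-driven key simulation follows; the P1 bit assignments, the delay calibration and the hold times are assumptions for illustration, not values taken from the actual circuit.

#include <reg51.h>                      /* AT89S51 SFR definitions (Keil C51) */

/* Assumed wiring: each relay that shorts one front-panel button is driven by one P1 bit. */
#define RELAY_PLAY  0x01                /* P1.0 -> relay across the PLAY/PAUSE button */
#define RELAY_MUTE  0x02                /* P1.1 -> relay across the MUTE button       */
#define RELAY_POWER 0x04                /* P1.2 -> relay across the POWER button      */

/* Crude software delay, roughly millisecond scale at 12 MHz; calibrate on real hardware. */
static void delay_ms(unsigned int ms)
{
    unsigned int i, j;
    for (i = 0; i < ms; i++)
        for (j = 0; j < 120; j++)
            ;
}

/* Energise one relay for hold_ms milliseconds, then release it.
   A short hold emulates a short key press, a long hold a long key press. */
static void press_key(unsigned char relay_mask, unsigned int hold_ms)
{
    P1 |= relay_mask;                   /* close the button contact */
    delay_ms(hold_ms);
    P1 &= ~relay_mask;                  /* open it again            */
}

For example, press_key(RELAY_PLAY, 100) would give a short press and press_key(RELAY_POWER, 1500) a long press; the hold times would be tuned to what the SL1102C1 keypad sampling expects.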
The analog switch circuit also applies to the code switch on the front panel of the SL1102C1, which handles volume adjustment; its working principle is shown in Figure 4.
As can be seen from Figure 4, the code switch has three terminals, A, B and C. When the knob is rotated left or right, terminals A and B output corresponding pulse signals. When the MCU receives a voice command that operates the code switch, it drives the relays so that terminals A and B output the appropriate pulse sequence, simulating rotation of the knob.
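The same approach extends to the code switch. Below is a sketch of one simulated knob step, reusing delay_ms() from the sketch above; the pin assignments, pulse timing and phase order are again assumptions that would have to be matched to the real panel.

/* Assumed wiring: two more relays drive the code-switch terminals A and B. */
#define RELAY_ENC_A 0x08                /* P1.3 -> code-switch terminal A */
#define RELAY_ENC_B 0x10                /* P1.4 -> code-switch terminal B */

/* Emit one quadrature step on terminals A and B.  "A leads B" is taken here to
   mean volume up; the actual phase order must be checked against the SL1102C1. */
static void volume_step(unsigned char up)
{
    unsigned char first  = up ? RELAY_ENC_A : RELAY_ENC_B;
    unsigned char second = up ? RELAY_ENC_B : RELAY_ENC_A;

    P1 |= first;    delay_ms(10);       /* first terminal goes active              */
    P1 |= second;   delay_ms(10);       /* second terminal follows, phase-shifted  */
    P1 &= ~first;   delay_ms(10);
    P1 &= ~second;  delay_ms(10);       /* both terminals released: one step done  */
}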


2 System software design

The system software comprises a speaker-independent speech recognition module and a logic control module.
2.1 Speaker-independent speech recognition module

The speaker-independent speech recognition module is based on the HMM algorithm. Statistics are gathered from a large amount of speech data to build a statistical model library for the recognition vocabulary; features are then extracted from the speech to be recognised and matched against the model library, the recognition result is obtained by comparing the matching scores, and the word code corresponding to the result is output through the GPIO port of the SDA80D51. The speech recognition module consists mainly of signal preprocessing, feature parameter extraction, model matching and the Viterbi algorithm. The block diagram of the speaker-independent speech recognition module is shown in Figure 5.

2.1.1 Signal preprocessing

The signal preprocessing stage samples the input speech signal and performs analog-to-digital conversion. The A/D conversion uses the 12-bit A/D converter embedded in the SDA80D51, with the sampling frequency fixed at 8 kHz.
2.1.2 Feature parameter extraction

Feature parameters are extracted frame by frame. The speech signal is first divided into overlapping frames, with adjacent frames overlapping by half a frame (the overlap preserves the correlation between neighbouring frames of data). The frame length is 25 ms, and one set of speech features is extracted per frame, as sketched below.
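With the 8 kHz sampling rate fixed above, these framing parameters correspond to fixed sample counts; a small illustrative sketch (the helper name and types are ours, not taken from the original code):

#define SAMPLE_RATE  8000U                        /* Hz, fixed by the SDA80D51 ADC setting */
#define FRAME_LEN    (SAMPLE_RATE * 25U / 1000U)  /* 25 ms  -> 200 samples per frame       */
#define FRAME_SHIFT  (FRAME_LEN / 2U)             /* half-frame overlap -> 100-sample hop  */

/* Number of complete overlapping frames in a buffer of n samples;
   frame k covers samples [k * FRAME_SHIFT, k * FRAME_SHIFT + FRAME_LEN). */
static unsigned int num_frames(unsigned int n)
{
    if (n < FRAME_LEN)
        return 0;
    return 1U + (n - FRAME_LEN) / FRAME_SHIFT;
}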
The speech signal is the convolution of the vocal tract (channel) response with the glottal excitation signal. Taking the logarithmic frequency responses of the channel transfer function and the glottal excitation separately, the two differ in how quickly their spectra vary: if the frequency axis is treated as a time axis, the frequency response of the glottal excitation falls in the "high-frequency" (rapidly varying) region while the channel transfer function falls in the "low-frequency" (slowly varying) region, so the two components are easy to separate.
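In cepstral terms this separation can be written in the standard textbook form (not reproduced from the original article): with s(n) = e(n) * v(n), where e(n) is the glottal excitation and v(n) the channel impulse response,

$$\hat{s}(n) = \mathrm{IDFT}\bigl\{\log\lvert \mathrm{DFT}\{s(n)\}\rvert\bigr\} = \hat{e}(n) + \hat{v}(n),$$

so the slowly varying channel component \(\hat{v}(n)\) occupies the low-quefrency region and the rapidly varying excitation component \(\hat{e}(n)\) the high-quefrency region.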
The MFCC parameters are cepstral parameters defined on a perceptual frequency scale and reflect the short-time amplitude spectrum of the speech signal. The p-dimensional MFCC parameters are computed as follows:
(1) Compute the linear spectrum of each frame s(n:m) by FFT, and take the squared magnitude of the spectrum as the power spectrum;
(2) Pass the power spectrum through the Mel filter bank to obtain D parameters X(i), where D is the number of triangular filters in the bank;
(3) Take the logarithm of X(i) and then apply the discrete cosine transform. The cosine transform is calculated as follows:
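A standard form of this Mel-cepstral cosine transform, written with the symbols used in this section (the exact normalisation factor used in the original, e.g. a \(\sqrt{2/D}\) scaling, is not preserved here), is:

$$C(n) = \sum_{i=1}^{D} Y(i)\,\cos\!\left(\frac{\pi n\,(i-0.5)}{D}\right),\qquad n = 1, 2, \ldots, p,$$

with \(Y(i) = \log X(i)\).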

Y(i) in the equation is the logarithmic energy output of the i-th Mel filter, i = 1, 2, ..., D.
2.1.3 HMM speech recognition algorithm

A hidden Markov model (HMM) describes the speech signal with a probabilistic statistical model. It is built on a Markov chain, which it uses to model the statistical characteristics of the speech signal. The HMM is a doubly stochastic process: one process is the Markov chain itself, described by (π, A), whose output is the state sequence and which models the state transitions; the other is a stochastic process described by B, which in a statistical sense captures the correspondence between states and observations and whose output is the sequence of observation vectors. The state and time parameters of the Markov chain are discrete, so it is a discrete Markov process.
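In the usual compact notation (standard HMM notation, not taken from the original figures), such a model is written as

$$\lambda = (\pi, A, B),\qquad \pi_i = P(q_1 = S_i),\quad a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i),\quad b_j(o_t) = P(o_t \mid q_t = S_j),$$

where \(q_t\) is the state and \(o_t\) the observation vector at time (frame) \(t\).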
The Viterbi algorithm is a frame-synchronous dynamic programming algorithm. Given an observation sequence and a model, it finds the state sequence with the largest joint probability P(Q, O|λ). The algorithm consists of initialization, recursion, termination, path backtracking, and determination of the best state sequence.
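These steps can be written in their standard textbook form (N is the number of states, T the number of frames; this formulation is not reproduced from the original equations):

$$
\begin{aligned}
\text{Initialisation: } & \delta_1(i) = \pi_i\,b_i(o_1), \quad \psi_1(i) = 0, \quad 1 \le i \le N\\
\text{Recursion: } & \delta_t(j) = \max_{1\le i\le N}\bigl[\delta_{t-1}(i)\,a_{ij}\bigr]\,b_j(o_t), \quad \psi_t(j) = \arg\max_{1\le i\le N}\bigl[\delta_{t-1}(i)\,a_{ij}\bigr], \quad 2 \le t \le T\\
\text{Termination: } & P^{*} = \max_{1\le i\le N}\delta_T(i), \quad q_T^{*} = \arg\max_{1\le i\le N}\delta_T(i)\\
\text{Backtracking: } & q_t^{*} = \psi_{t+1}(q_{t+1}^{*}), \quad t = T-1, T-2, \ldots, 1
\end{aligned}
$$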

In speech processing, P(Q, O|λ) varies over a wide range as Q changes, and its maximum value accounts for a large fraction of the sum of P(Q, O|λ) over all state sequences, so the Viterbi algorithm can be used to approximate P(O|λ).
2.2 Control module

The main function of the control module is as follows: after the AT89S51 detects a voice entry signal, it looks up the entry code in a table, determines from the code whether the corresponding button requires a long press or a short press, and enters the corresponding subroutine. The subroutine outputs the I/O control signal corresponding to the voice command, driving the relay that simulates the button or code-switch action, and then resets the I/O port in time. The control module is also fully compatible with manual control: the panel can still be operated by hand while voice control is active, and manual operation has priority over voice commands, which avoids conflicts between voice and manual control.
The control module part of the program code is as follows:
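A minimal Keil C51-style sketch of such a polling loop is given below, reusing press_key() and volume_step() from the sketches in Section 1; the word-code values, the use of P0 as the code input port, and the press durations are assumptions for illustration, not the original listing.

/* Assumed word-code values for a few voice entries; purely illustrative. */
#define CMD_POWER    0x01
#define CMD_MUTE     0x02
#define CMD_PLAY     0x03
#define CMD_VOL_UP   0x04
#define CMD_VOL_DOWN 0x05

/* P0 is assumed to carry the word code driven by the SDA80D51 GPIO port.
   A real implementation would also latch or debounce the code so that one
   recognition result triggers exactly one action. */
void control_loop(void)
{
    unsigned char word_code;

    while (1) {
        word_code = P0;                                   /* poll the recognition result */
        switch (word_code) {
        case CMD_POWER:    press_key(RELAY_POWER, 1500); break;   /* long press          */
        case CMD_MUTE:     press_key(RELAY_MUTE,  100);  break;   /* short press         */
        case CMD_PLAY:     press_key(RELAY_PLAY,  100);  break;
        case CMD_VOL_UP:   volume_step(1);               break;
        case CMD_VOL_DOWN: volume_step(0);               break;
        default:                                         break;   /* no valid command    */
        }
        P1 = 0x00;       /* defensive reset: release every relay, per the I/O-reset requirement above */
    }
}

A simple polling loop is shown here for clarity; an interrupt-driven design reading the SDA80D51 output would serve the same purpose.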

3 System test results

The system was tested on the JAC Tongyue SL1102C1 car audio for speaker-independent speech recognition rate and for the accuracy of the analog switch actions. Since the car-audio voice entries are 2 to 4 words long, the recognition rate experiment used entries of two, three and four words (18, 12 and 10 entries respectively). Six speakers took part (4 male, 2 female, speaking both Mandarin and dialects), and the tests were run in a laboratory environment. To improve the recognition rate, the system uses an Olympus ME52 directional microphone to extend the pickup range. The test results are shown in Table 1.

As can be seen from Table 1, the recognition rate of the system depends on the number of words in the voice command, the microphone pickup distance, and the speaker's dialect; the recognition rates for male and female voices are similar.
In the control circuit experiment, the analog switch actions achieved an accuracy of 98% or higher; as long as the control program was running normally, every relay closed and opened as programmed to simulate manual switch operation.
Voice control of automotive electrical devices is a future trend in automotive electronics, and more and more solutions are being proposed and verified. This paper applies the SDA80D51 chip to the SL1102C1 car audio to realise voice recognition and control of the audio unit. Because the chip is highly integrated and needs few peripheral modules, the hardware circuit is simple and easy to debug and test. The prototype achieves a high recognition rate, operates stably, offers good scalability, and meets the design goals; the overall design and implementation approach is feasible. However, the speech recognition rate varies with the environment and the speaker: although the HMM algorithm achieves a high recognition rate in low-noise conditions, performance degrades when the test speech or the environment is polluted by noise to varying degrees. Improving the noise immunity and robustness of the system is therefore one of the keys to the practical application of speech recognition systems.
