As the conversational AI market grows rapidly, engineers at Infineon Technologies discuss the technologies needed for humans and machines to interact effectively.
“High SNR MEMS Microphones Are Essential for Conversational AI”
Clear audio capture even in imperfect environments
Improved speech recognition, effective for multimodal systems
■ Conversational AI is growing rapidly, and the elements that enable human-machine interaction matter
Conversational AI is a rapidly evolving field of machine learning that can make human-machine interaction intuitive and natural.
It uses advanced algorithms and technologies to interpret natural language input, enabling machines to respond like humans.
With a conversational AI framework of tools and systems in place, users can interact with machines using natural language commands.
These intelligent systems are designed to understand intent and context, remember users’ preferences, and engage in meaningful conversations.
This article focuses on conversational AI that interprets and responds to spoken words rather than written text.
Voice-enabled applications are becoming increasingly popular in our daily lives. We explore the technological advances that are driving the growth of the conversational AI market and the challenges that must be addressed for voice-enabled assistants to be widely adopted.
One important element in improving the user experience of voice-enabled applications is the voice user interface (VUI). Micro-electro-mechanical system (MEMS) microphones with a high signal-to-noise ratio (SNR) are emerging as important components for accurate speech recognition and better overall audio quality.
These high-performance silicon microphones feature a compact size and high sensitivity, enabling more precise sound capture, background noise filtering, and clearer audio input for conversational AI systems.
This article describes how high SNR MEMS microphones can significantly improve speech recognition accuracy and enable smoother, more natural human-machine interaction in voice-enabled applications.
■ Devices and applications that use conversational AI
Conversational AI is becoming an essential component of many devices and applications, and it is changing the way we interact with technology in many areas.
Familiar applications where conversational AI is already widely used include:
▷Smart speakers: A smart speaker is a standalone speaker that responds to user requests through a built-in voice assistant. Well-known examples on the market include Google Home with Google Assistant, Amazon Echo with Alexa, and Apple HomePod with Siri.
▷Voice-enabled vehicle systems: Cars with built-in voice-controlled assistants allow drivers to keep their hands on the steering wheel and their eyes on the road. Drivers can control music playback, navigation, and climate control without having to find buttons or flip through menus.
▷Smart home systems: Smart home systems let users control their homes conveniently with natural language commands. Everyday devices that can use conversational AI include lighting, thermostats, and security systems.
▷Smart meeting systems: A smart meeting system is a productivity tool that uses conversational AI to generate and translate meeting subtitles. It can also provide a voice-enabled assistant for tasks such as coordinating schedules, checking action items, and writing meeting minutes.
■ Voice recognition, natural language processing, and voice-enabled devices are shaping the future of conversational AI
The market for applications and devices that adopt conversational AI has grown rapidly over the past few years, especially during the COVID-19 pandemic. The voice assistant market is expected to grow at a compound annual growth rate (CAGR) of 33.5% between 2023 and 2030 [1].
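As a rough illustration of what such a growth rate implies, the short Python sketch below compounds a 33.5% CAGR over the forecast period. Treating 2023 to 2030 as seven compounding years is an assumption made here for illustration, and the function name is hypothetical; the cited report may define its base year differently.

```python
def cagr_growth_multiple(cagr: float, years: int) -> float:
    """Return the factor by which a quantity grows at a constant annual rate."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 + cagr) ** years


# Assumption: 2023 -> 2030 is treated as 7 full years of compounding.
multiple = cagr_growth_multiple(0.335, 7)
print(f"Growth multiple after 7 years at 33.5% CAGR: {multiple:.2f}x")
```

Under that assumption, the market would end the period at roughly seven and a half times its starting size.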
The foundation for this growth is advances in conversational AI. The following trends are driving the growth of this technology today:
▷Improved voice recognition algorithms: As conversational AI becomes more widely used, the datasets available for voice recognition grow. As a result, voice recognition algorithms become better at recognizing words and sentences and at understanding what real people are saying. Voice recognition technology is also getting better at handling different languages, accents, and dialects [2].
▷Advances in natural language processing: Natural language processing is the mechanism that conversational AI uses to interpret what the user wants. As natural language processing algorithms become more sophisticated, the accuracy and personalization of conversational AI are improving. This makes conversational AI more intuitive and reliable [3].
▷Increased use of voice-enabled devices: As more devices and applications add voice-enabled functions, demand for conversational AI will increase, which will accelerate progress in the field. As the technology advances, virtual assistants will be able to handle increasingly complex tasks. As the perception spreads that conversational AI improves work efficiency, the number of companies using voice-enabled applications will continue to grow [4].
■ Microphone performance is important for widespread adoption of voice assistants
Speech recognition and natural language processing technologies are advancing rapidly, and there is clear market demand for advanced conversational AI systems. Despite these advances, users remain dissatisfied.
This dissatisfaction is a barrier to widespread adoption of voice assistants. Much of it is related to data privacy. Users are concerned about the security of voice data stored in the cloud and the potential for devices to eavesdrop on and record private conversations.
Another complaint is that interacting with voice assistants does not feel natural. Almost every operating system and device on the market today has a voice assistant, but these assistants still confuse homonyms, misunderstand intonation, and require extremely precise pronunciation.
They work poorly where there is background noise and fail to understand users with speech impairments. These speech recognition problems are related to the performance of the microphone adopted by the device [5].
Voice user interfaces (VUIs) are a key element of conversational AI technologies such as voice-enabled assistants.
Users interact with the assistant by speaking to the VUI. An effective voice-enabled assistant, in other words an effective VUI, must accurately hear and understand voice commands. If the assistant does not understand what users are saying, users become frustrated and the user experience suffers.
■ High SNR MEMS microphones improve the user experience
Users can avoid some misunderstandings by speaking clearly and directly to the voice assistant, avoiding noisy places, or giving only simple commands. However, doing so limits the potential of conversational AI and runs counter to users’ expectations of a natural, conversational interaction with their voice assistant.
The solution to this problem is to improve audio capture in the VUI. High SNR MEMS microphones are designed to capture clear audio even in imperfect environments and are effective for improved speech recognition, far-field voice capture, contextual understanding, and multimodal systems (capable of interpreting both audio and visual input). These capabilities are important for removing the obstacles that hinder the adoption of voice-enabled assistants.
■ Improved voice recognition
High SNR MEMS microphones capture clear and accurate audio signals, which serve as a foundation for improving the performance of speech recognition algorithms. Because these microphones can capture speech even in background noise, voice-enabled assistants can better understand user commands and questions. As the microphone provides a better-quality input signal, the accuracy of the assistant's interpretation also improves [6].
High SNR MEMS microphones improve the overall user experience and efficiency of voice-based interactions because they cope better with the real-world sound environments in which users speak to voice-enabled assistants.
■ Noise reduction and distant voice capture
A high SNR allows MEMS microphones to clearly capture voice commands. SNR is the ratio of the desired sound that the microphone is trying to capture to the noise generated by the microphone itself.
A higher SNR therefore means that more of the captured signal is the desired sound rather than microphone noise. Combining high sensitivity with a high SNR makes long-distance voice capture possible, so users can interact with voice assistants even from a distance or in noisy places [7].
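To make the definition concrete, the following Python sketch computes SNR in decibels from RMS amplitudes and converts a datasheet SNR into an equivalent self-noise level. It assumes the common datasheet convention of specifying microphone SNR against a 94 dB SPL (1 Pa) reference tone at 1 kHz. The function names and the 65 dB comparison value are illustrative and are not taken from Infineon's documentation.

```python
import math

# Assumption: microphone SNR is specified relative to a 94 dB SPL (1 Pa) reference tone.
REFERENCE_SPL_DB = 94.0


def snr_db(signal_rms: float, noise_rms: float) -> float:
    """SNR in dB from signal and noise RMS amplitudes given in the same linear unit."""
    if signal_rms <= 0 or noise_rms <= 0:
        raise ValueError("RMS values must be positive")
    return 20.0 * math.log10(signal_rms / noise_rms)


def self_noise_db_spl(mic_snr_db: float) -> float:
    """Microphone self-noise expressed as an equivalent acoustic level in dB SPL."""
    return REFERENCE_SPL_DB - mic_snr_db


# Illustrative comparison: a hypothetical 65 dB SNR microphone vs. a 75 dB SNR one.
for mic_snr in (65.0, 75.0):
    noise_floor = self_noise_db_spl(mic_snr)
    print(f"SNR {mic_snr:.0f} dB -> self-noise about {noise_floor:.0f} dB SPL")
```

Under this convention, every additional decibel of SNR lowers the microphone's noise floor by one decibel, which leaves more headroom for quiet or distant speech.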

▲Voice signal level and distance from the device for major VUI use cases (Source: Infineon)

▲This chart shows that high SNR microphones perform better in whisper/soft speech scenarios. (Source: Infineon)
Active noise filtering and far-field voice capture increase the usability of voice assistants in a variety of noisy environments, such as smart homes, conference rooms, customer support systems, and public spaces. A study conducted by Infineon showed that MEMS microphones with a 75 dB SNR can capture audio up to 40% better than the standard microphones used in commercial voice-enabled assistants [8].
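Why distance matters can be sketched with the free-field inverse-square law, under which the sound level falls by about 6 dB each time the distance from the talker doubles. The Python sketch below estimates how far speech stays above a microphone's self-noise. The 60 dB SPL speech level at 1 m, the 94 dB SPL reference level, and the 65 dB comparison microphone are illustrative assumptions, and room reverberation and ambient noise are ignored.

```python
import math


def speech_level_db_spl(level_at_1m_db_spl: float, distance_m: float) -> float:
    """Free-field speech level at a given distance (inverse-square law)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return level_at_1m_db_spl - 20.0 * math.log10(distance_m)


def margin_over_self_noise_db(
    level_at_1m_db_spl: float,
    distance_m: float,
    mic_snr_db: float,
    reference_spl_db: float = 94.0,  # assumed datasheet reference level
) -> float:
    """How many dB the speech at the microphone sits above the microphone's self-noise."""
    self_noise_db_spl = reference_spl_db - mic_snr_db
    return speech_level_db_spl(level_at_1m_db_spl, distance_m) - self_noise_db_spl


SPEECH_AT_1M_DB_SPL = 60.0  # assumed conversational speech level, for illustration only

for distance in (1.0, 2.0, 4.0, 8.0):
    standard = margin_over_self_noise_db(SPEECH_AT_1M_DB_SPL, distance, 65.0)
    high_snr = margin_over_self_noise_db(SPEECH_AT_1M_DB_SPL, distance, 75.0)
    print(f"{distance:4.1f} m: {standard:5.1f} dB margin (65 dB SNR), "
          f"{high_snr:5.1f} dB margin (75 dB SNR)")
```

In this simplified model, a 10 dB higher SNR gives a 10 dB larger margin at every distance, which corresponds to roughly three times the capture distance for the same margin.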
■ Understanding context and multimodal interaction
VUIs that employ high SNR MEMS microphones can also capture contextual clues, such as tone and stress, from a user's voice. This contextual understanding allows voice assistants to infer a user's intent and provide more accurate and personalized responses.
These improvements also enable multimodal interaction. For example, by combining a VUI that uses a high SNR MEMS microphone with a facial recognition model, users can interact with their devices using both voice commands and facial expressions. This can improve voice-enabled assistants' understanding of the user’s intent [9].
■ High SNR MEMS microphones: higher voice recognition accuracy, lower noise, long-distance voice capture
High SNR MEMS microphones are a critical component for enhancing the usability of the conversational AI models used in VUIs. They improve speech recognition accuracy, reduce noise, and enable long-distance voice capture, contextual understanding, and multimodal interaction. By capturing clear audio even in noisy environments, they make interactions with virtual assistants more reliable and improve the user experience.
As high SNR MEMS microphone technology advances, the performance and reliability of voice-enabled assistants will continue to improve. Continued advances in microphone sensitivity, signal processing, and noise cancellation techniques will enhance conversational AI systems, significantly advance human-machine interaction, and enable new capabilities in voice-based technologies.
Conversational AI has a bright future. With breakthroughs in speech recognition, contextual awareness, and learning models, voice assistants will be able to handle increasingly complex commands and conversations. Combining advanced algorithms with superior microphones will enable voice assistants to provide more convenient and intuitive user experiences.
■ Infineon's high SNR MEMS microphone
Infineon’s XENSIV™ MEMS microphones feature high SNR, low distortion even at high sound pressure levels, component-to-component phase and sensitivity matching, a flat frequency response with low-frequency roll-off, and extremely low group delay. With selectable power modes and a small package size, Infineon’s XENSIV™ MEMS microphones are an ideal solution for devices employing conversational AI.
www.infineon.com/mems 
▲Infineon’s high-performance digital XENSIV™ MEMS microphone (IM70D122) enables high-quality audio capture in laptop and tablet applications. (Source: Infineon)
※ References
[1] Vantage Market Research. “Voice Assistants Market Size, Share & Trends Analysis Report by 2030.” May 2023. Accessed 7 July 2023 from https://www.linkedin.com/pulse/voice-assistants-market-size-share-trends-analysis-report-hancock/
[2] Murf Resources. “Future of AI in Speech Recognition.” April 2023. Accessed 18 June 2023 from https://murf.ai/resources/future-of-ai-in-speech-recognition/
[3] Schmelzer, Ronald. “Natural language processing drives conversational AI trends.” TechTarget. June 2019. Accessed 18 June 2023 from https://www.techtarget.com/searchenterpriseai/feature/Natural-language-processing-drives-conversational-AI-trends
[4] GlobeNewswire. “Global Conversational AI Market Report 2023: Increasing Demand for AI-Powered Customer Support Services Boosts Growth.” April 2023. Accessed 18 June 2023 from https://www.globenewswire.com/en/news-release/2023/04/17/2648259/28124/en/Global-Conversational-AI-Market-Report-2023-Increasing-Demand-for-AI-Powered-Customer-Support-Services-Boosts-Growth.html…
[5] Zetlin, Minda. “Here’s Why Alexa (and Siri and Google) Still Don’t Understand You as Well as They Should.” Inc. December 2022. Accessed 19 June 2023 from https://www.inc.com/minda-zetlin/heres-why-alexa-and-siri-google-still-dont-understand-you-as-well-as-they-should.html
[6] Infineon. “Why you need high performance, ultra-high SNR MEMS microphones.” Accessed 19 June 2023 from https://www.infineon.com/dgdl/Infineon-AN547_Why+you+need+high+performance+ultra-high+SNR+microphones+-AN-v01_01-EN.pdf?fileId=5546d4626102d35a01612d1e2afd6ad3
[7] Infineon. “Why you need high performance, ultra-high SNR MEMS microphones.” Accessed 19 June 2023 from https://www.infineon.com/dgdl/Infineon-AN547_Why+you+need+high+performance+ultra-high+SNR+microphones+-AN-v01_01-EN.pdf?fileId=5546d4626102d35a01612d1e2afd6ad3
[8] Infineon. “Value of high-SNR microphones in Voice User Interface.” Accessed 19 June 2023 from https://www.infineon.com/dgdl/Infineon-Value+of+high+SNR+microphones+in+Voice+user+Interface-ApplicationNotes-v01_01-EN.pdf?fileId=5546d46269e1c019016a78d976d852fd
[9] Ahmad, Majeed. “How MEMS Microphones Aid Sound Detection and Keyword Recognition in Voice-Activated Designs.” DigiKey. Accessed 19 June 2023 from https://www.digikey.com/en/articles/how-mems-microphones-aid-sound-detection