Speech Recognition Techniques For Robustness In Adverse Environments, E.g., In Noise, Of Stress Induced Speech, Etc. (epo) Patents and Patent Applications (Class 704/E15.039)

Heterogeneous computing for hybrid acoustic echo cancellation

Patent number: 11984110

Abstract: A device operates to perform acoustic echo cancellation. The device includes a speaker to output a far-end signal at the device, a microphone to receive at least a near-end signal and the far-end signal from the speaker to produce a microphone output, and an AI accelerator operative to perform neural network operations according to a first neural network model and a second neural network model to output an echo-suppressed signal. The device further includes a digital signal processing (DSP) unit. The DSP unit is operative to perform adaptive filtering to remove at least a portion of the far-end signal from the microphone output to generate a filtered near-end signal, and perform Fast Fourier Transform (FFT) and inverse FFT (IFFT) to generate input to the first neural network model and the second neural network model, respectively.

Type: Grant

Filed: March 7, 2022

Date of Patent: May 14, 2024

Assignee: MEDIATEK SINGAPORE PTE. LTD.

Inventors: Xiaoxi Yu, Hantao Huang, Ziang Yang, Chia Hsin Yang, Li-Wei Cheng
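
As a rough illustration of the adaptive-filtering step described in the abstract for patent 11984110 above, the sketch below removes a far-end echo from a microphone signal with a normalized LMS (NLMS) filter. The abstract does not name the adaptation algorithm, so NLMS is an assumption chosen for illustration; the function name, tap count, step size, and the simulated echo path are all hypothetical, and the neural network stages are not modeled.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic (NLMS, assumed)."""
    w = [0.0] * taps          # running estimate of the echo path
    history = [0.0] * taps    # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_est      # residual, i.e. the filtered near-end estimate
        norm = sum(h * h for h in history) + eps
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out

# Simulated test signal: the microphone hears only the echo (no near-end speech).
random.seed(0)
far = [random.uniform(-1, 1) for _ in range(2000)]
echo_path = [0.6, 0.3, -0.1]  # hypothetical speaker-to-microphone response
mic = [sum(echo_path[k] * far[n - k] for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(far))]

residual = nlms_echo_cancel(far, mic)
mic_power = sum(m * m for m in mic) / len(mic)
tail_power = sum(e * e for e in residual[-200:]) / 200
```

Once the filter has converged, `tail_power` should be far below `mic_power`, which is the sense in which a portion of the far-end signal has been removed.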

Method of waking a device using spoken voice commands

Patent number: 11917384

Abstract: Disclosed herein are systems and methods for processing speech signals in mixed reality applications. A method may include receiving an audio signal; determining, via first processors, whether the audio signal comprises a voice onset event; in accordance with a determination that the audio signal comprises the voice onset event: waking a second one or more processors; determining, via the second processors, that the audio signal comprises a predetermined trigger signal; in accordance with a determination that the audio signal comprises the predetermined trigger signal: waking third processors; performing, via the third processors, automatic speech recognition based on the audio signal; and in accordance with a determination that the audio signal does not comprise the predetermined trigger signal: forgoing waking the third processors; and in accordance with a determination that the audio signal does not comprise the voice onset event: forgoing waking the second processors.

Type: Grant

Filed: March 26, 2021

Date of Patent: February 27, 2024

Assignee: Magic Leap, Inc.

Inventors: David Thomas Roach, Jean-Marc Jot, Jung-Suk Lee
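
The staged wake-up logic in the abstract for patent 11917384 above can be sketched as a cascade in which each more expensive stage runs only when the cheaper one before it fires. This is a minimal sketch, not the patented implementation: the function name, the callable parameters, and the dictionary-based frame are hypothetical stand-ins for the first, second, and third processors.

```python
def process_audio(frame, detect_onset, detect_trigger, recognize):
    """Run each stage only if the previous, cheaper stage fired."""
    if not detect_onset(frame):        # first processors: always listening
        return {"woke": [], "text": None}
    if not detect_trigger(frame):      # second processors: woken on voice onset
        return {"woke": ["second"], "text": None}
    # third processors: woken only when the trigger signal is present
    return {"woke": ["second", "third"], "text": recognize(frame)}

# Hypothetical stand-ins for the three detectors.
result = process_audio(
    {"energy": 0.9, "phrase": "hey device", "speech": "turn on the lights"},
    detect_onset=lambda f: f["energy"] > 0.5,
    detect_trigger=lambda f: f["phrase"] == "hey device",
    recognize=lambda f: f["speech"],
)
```

Returning early at each stage mirrors the "forgoing waking" branches of the abstract.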

Hearing aid determining turn-taking

Patent number: 11863938

Abstract: The present application relates to a hearing aid adapted to be worn in or at an ear of a hearing aid user and/or to be fully or partially implanted in the head of the hearing aid user.

Type: Grant

Filed: May 27, 2022

Date of Patent: January 2, 2024

Assignee: Oticon A/S

Inventors: Thomas Lunner, Lars Bramsløw

Voice onset detection

Patent number: 11790935

Abstract: In some embodiments, a first audio signal is received via a first microphone, and a first probability of voice activity is determined based on the first audio signal. A second audio signal is received via a second microphone, and a second probability of voice activity is determined based on the first and second audio signals. Whether a first threshold of voice activity is met is determined based on the first and second probabilities of voice activity. In accordance with a determination that a first threshold of voice activity is met, it is determined that a voice onset has occurred, and an alert is transmitted to a processor based on the determination that the voice onset has occurred. In accordance with a determination that a first threshold of voice activity is not met, it is not determined that a voice onset has occurred.

Type: Grant

Filed: April 6, 2022

Date of Patent: October 17, 2023

Assignee: Magic Leap, Inc.

Inventors: Jung-Suk Lee, Jean-Marc Jot
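
The decision step in the abstract for patent 11790935 above combines two voice-activity probabilities and alerts a processor when a threshold is met. The abstract does not say how the probabilities are combined, so the product used below is an assumption, as are the function names and the default threshold.

```python
def voice_onset(p_first, p_second, threshold=0.5):
    """p_first: from microphone 1 alone; p_second: from both microphones."""
    combined = p_first * p_second      # combination rule is an assumption
    return combined >= threshold

def maybe_alert(p_first, p_second, alert, threshold=0.5):
    """Call alert() only when a voice onset is determined to have occurred."""
    if voice_onset(p_first, p_second, threshold):
        alert("voice onset")
        return True
    return False

alerts = []
fired = maybe_alert(0.9, 0.8, alerts.append)       # 0.72 meets the threshold
not_fired = maybe_alert(0.9, 0.3, alerts.append)   # 0.27 does not
```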

Video surveillance system with audio analytics adapted to a particular environment to aid in identifying abnormal events in the particular environment

Patent number: 11765501

Abstract: Methods and systems for identifying abnormal sounds in a particular environment. A normal audio stream obtained in the absence of abnormal sounds may be used as a baseline for subsequently processing an incoming audio stream with a processor to determine whether the incoming audio stream from the microphone in the particular environment includes an abnormal audio event for the particular environment. When it is determined that the incoming audio stream includes an abnormal audio event for the particular environment an electronic database may be accessed to determine a location of the abnormal audio event in the particular environment. A video camera with a field of view that includes the location of the abnormal audio event in the particular environment may be identified and the video stream from the identified video camera retrieved and displayed.

Type: Grant

Filed: March 10, 2021

Date of Patent: September 19, 2023

Assignee: HONEYWELL INTERNATIONAL INC.

Inventors: Lalitha M. Eswara, Syed Omar Khaiyam, Siddharth Sonkamble, Deepak Kaul, K Karthikeyan
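
One way to picture the baseline comparison in the abstract for patent 11765501 above is a simple statistical threshold on sound level followed by a lookup from microphone to location to camera. The abstract does not specify the detection method, so the mean and standard deviation test is an assumption, and the identifiers, table contents, and the factor `k` are hypothetical.

```python
from statistics import mean, stdev

def build_baseline(normal_levels):
    """Summarize a normal audio stream recorded without abnormal sounds."""
    return mean(normal_levels), stdev(normal_levels)

def is_abnormal(level, baseline, k=4.0):
    """Flag a level that deviates from the baseline by more than k sigma."""
    mu, sigma = baseline
    return abs(level - mu) > k * sigma

# Hypothetical stand-ins for the electronic database described in the abstract.
MIC_LOCATION = {"mic-7": "loading dock"}
CAMERAS_BY_LOCATION = {"loading dock": ["cam-3"]}

def cameras_for_event(mic_id):
    """Find cameras whose field of view covers the microphone's location."""
    return CAMERAS_BY_LOCATION.get(MIC_LOCATION.get(mic_id), [])

baseline = build_baseline([50, 51, 49, 50, 52, 48])   # levels in arbitrary units
```

A call such as `is_abnormal(80, baseline)` would flag the event, after which `cameras_for_event("mic-7")` selects the video stream to display.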

Method and apparatus for controlling device located a distance away from a user

Patent number: 11721334

Abstract: A method and apparatus for controlling a device according to an embodiment of the present disclosure may be based on a speech feature of a user reflecting the Lombard effect so as to operate a device located far away from the user, among a plurality of electronic devices. As such, even when the user calls a device located far away from the user without any separate context information, speech recognition neural networks and weight calculation neural networks may be selected and used to operate the device located far away from the user, and reception of a speech signal of the user calling a device located far away from the user may be performed in an Internet of Things (IoT) environment using a 5G network.

Type: Grant

Filed: March 5, 2020

Date of Patent: August 8, 2023

Assignee: LG ELECTRONICS INC.

Inventors: Jong Hoon Chae, Minook Kim, Yongchul Park, Sungmin Han, Siyoung Yang, Sangki Kim, Juyeong Jang

Modular quick-connect A/V system and methods thereof

Patent number: 11683638

Abstract: A modular speaker system, comprising an exoskeleton, configured to mechanically support and quick attach and release at least one functional panel and an electrical interface provided within the exoskeleton, configured to mate with a corresponding electrical connector of the functional panel. An optional endoskeleton is provided to support internal components. The system preferably provides a digital electronic controller, and the electrical interface is a digital data and power bus, with multiplexed communications between the elements of the system. The elements of the system preferably include at least one speaker, and other audiovisual and communications components. Multiple modules may be interconnected, communicating through the electrical interface. A base module may be provided to provide power and typical control, user and audiovisual interface connectors.

Type: Grant

Filed: July 4, 2022

Date of Patent: June 20, 2023

Assignee: Sonic Blocks, Inc.

Inventors: Scott D. Wilker, Jordan D. Wilker

Robust audio identification with interference cancellation

Patent number: 11631404

Abstract: Audio distortion compensation methods to improve accuracy and efficiency of audio content identification are described. The method is also applicable to speech recognition. Methods to detect the interference from speakers and sources, and distortion to audio from environment and devices, are discussed. Additional methods to detect distortion to the content after performing search and correlation are illustrated. The causes of actual distortion at each client are measured and registered and learnt to generate rules for determining likely distortion and interference sources. The learnt rules are applied at the client, and likely distortions that are detected are compensated or heavily distorted sections are ignored at audio level or signature and feature level based on compute resources available. Further methods to subtract the likely distortions in the query at both audio level and after processing at signature and feature level are described.

Type: Grant

Filed: August 12, 2021

Date of Patent: April 18, 2023

Assignee: ROKU, INC.

Inventors: Jose Pio Pereira, Sunil Suresh Kulkarni, Mihailo M. Stojancic, Shashank Merchant, Peter Wendt

Artificial intelligence server

Patent number: 11605379

Abstract: Disclosed is an artificial intelligence server. The artificial intelligence server includes a communicator in communication with at least one electronic device and a processor for receiving input data from a specific electronic device, applying personalized information corresponding to the specific electronic device to a recognition model, inputting the input data into the recognition model to which the personalized information is applied to obtain a final result value, and transmitting the final result value to the specific electronic device.

Type: Grant

Filed: July 11, 2019

Date of Patent: March 14, 2023

Assignee: LG ELECTRONICS INC.

Inventor: Jongwoo Han

Determining input for speech processing engine

Patent number: 11587563

Abstract: A method of presenting a signal to a speech processing engine is disclosed. According to an example of the method, an audio signal is received via a microphone. A portion of the audio signal is identified, and a probability is determined that the portion comprises speech directed by a user of the speech processing engine as input to the speech processing engine. In accordance with a determination that the probability exceeds a threshold, the portion of the audio signal is presented as input to the speech processing engine. In accordance with a determination that the probability does not exceed the threshold, the portion of the audio signal is not presented as input to the speech processing engine.

Type: Grant

Filed: February 28, 2020

Date of Patent: February 21, 2023

Assignee: Magic Leap, Inc.

Inventors: Anthony Robert Sheeder, Colby Nelson Leider

Dual-microphone with wind noise suppression method

Patent number: 11363367

Abstract: A dual-microphone arrangement (300) provides improved voice performance in a wireless headset (12). A vibration sensor (1130) is used for voice pickup and will add low-frequency voice audio content in windy conditions. An equalizer (810) is used to restore low-frequency voice audio content in wind-free conditions. Depending on the measured wind power, the output will derive more signal from the equalizer (810) or more signal from the vibration sensor (1130).

Type: Grant

Filed: November 30, 2020

Date of Patent: June 14, 2022

Assignee: Dopple IP B.V.

Inventors: Jacobus Cornelis Haartsen, Aalbert Stek
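
The wind-dependent mixing in the abstract for patent 11363367 above can be sketched as a crossfade whose weight follows the measured wind power. The linear ramp and the `low` and `high` wind-power limits are assumptions for illustration; the abstract states only that the output draws more from the equalizer or more from the vibration sensor depending on wind power.

```python
def mix_for_wind(eq_sample, vib_sample, wind_power, low=0.1, high=1.0):
    """Blend equalizer and vibration-sensor samples by measured wind power."""
    # alpha is 0 in wind-free conditions and 1 in strong wind (assumed ramp)
    alpha = min(1.0, max(0.0, (wind_power - low) / (high - low)))
    return (1.0 - alpha) * eq_sample + alpha * vib_sample

calm = mix_for_wind(1.0, 0.0, wind_power=0.0)    # equalizer path only
windy = mix_for_wind(1.0, 0.0, wind_power=2.0)   # vibration sensor only
breezy = mix_for_wind(1.0, 0.0, wind_power=0.55) # roughly an even blend
```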

DYNAMIC SELECTION AMONG ACOUSTIC TRANSFORMS

Publication number: 20150149167

Abstract: Aspects of this disclosure are directed to accurately transforming speech data into one or more word strings that represent the speech data. A speech recognition device may receive the speech data from a user device and an indication of the user device. The speech recognition device may execute a speech recognition algorithm using one or more user and acoustic condition specific transforms that are specific to the user device and an acoustic condition of the speech data. The execution of the speech recognition algorithm may transform the speech data into one or more word strings that represent the speech data. The speech recognition device may estimate which one of the one or more word strings more accurately represents the received speech data.

Type: Application

Filed: September 30, 2011

Publication date: May 28, 2015

Applicant: GOOGLE INC.

Inventors: Françoise Beaufays, Johan Schalkwyk, Vincent Olivier Vanhoucke, Petar Stanisa Aleksic

Voice signals improvements in compressed wireless communications systems

Patent number: 8953812

Abstract: Improvements in voice signals transmitted within communication systems are obtained by use of adaptive filters, front and rear microphones, noise cancelling systems and other means and methods. Disclosed embodiments include the use of directional microphones, primary inputs, secondary inputs, adaptive weight generators, canceller outputs to improve signal to noise ratios and other communication attributes.

Type: Grant

Filed: July 20, 2013

Date of Patent: February 10, 2015

Inventor: Alon Konchitsky

THOUGHT RECOLLECTION AND SPEECH ASSISTANCE DEVICE

Publication number: 20140074464

Abstract: Some embodiments of the inventive subject matter may include a method for detecting speech loss and supplying appropriate recollection data to the user. The method can include detecting a speech stream from a user. The method can include converting the speech stream to text. The method can include storing the text. The method can include detecting an interruption to the speech stream, wherein the interruption to the speech stream indicates speech loss by the user. The method can include searching a catalog using the text as a search parameter to find relevant catalog data. The method can include presenting the relevant catalog data to remind the user about the speech stream.

Type: Application

Filed: September 12, 2012

Publication date: March 13, 2014

Applicant: International Business Machines Corporation

Inventor: Scott H. Berens

Utilizing Scalar Operations for Recognizing Utterances During Automatic Speech Recognition in Noisy Environments

Publication number: 20140067387

Abstract: Scalar operations for model adaptation or feature enhancement may be utilized for recognizing an utterance during automatic speech recognition in a noisy environment. An utterance including distorted speech generated from a transmission source for delivery to a receiver, may be received by a computer. The distorted speech may be caused by the noisy environment and channel distortion. Computations using scalar operations in the form of an algorithm may then be performed for recognizing the utterance. As a result of performing all of the computations with scalar operations, computational complexity is very small in comparison to matrix and vector operations. Vector Taylor Series with diagonal Jacobian approximation may also be utilized as a distortion-model-based noise robust algorithm with scalar operations.

Type: Application

Filed: September 5, 2012

Publication date: March 6, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Jinyu Li, Michael Lewis Seltzer, Yifan Gong

SIGNAL PROCESSING APPARATUS HAVING VOICE ACTIVITY DETECTION UNIT AND RELATED SIGNAL PROCESSING METHODS

Publication number: 20140012573

Abstract: A signal processing apparatus includes a speech recognition system and a voice activity detection unit. The voice activity detection unit is coupled to the speech recognition system, and arranged for detecting whether an audio signal is a voice signal and accordingly generating a voice activity detection result to the speech recognition system to control whether the speech recognition system should perform speech recognition upon the audio signal.

Type: Application

Filed: September 13, 2012

Publication date: January 9, 2014

Inventors: Chia-Yu Hung, Tsung-Li Yeh, Yi-Chang Tu

MRI Compatible Headset

Publication number: 20130311176

Abstract: A wireless headset capable of receiving audio signals transmitted wirelessly and compatible for use in an MRI scanner is disclosed. The headset includes a first wireless module connected to the first earphone and a second wireless module connected to the second earphone. Each wireless module is electrically connected to a speaker in the respective earphone. The first wireless module receives the audio signal from a remote source and coordinates transmission of the audio signal to each of the speakers. The compact nature of each earphone minimizes the length of wire runs. In addition, the headset is made of materials having low magnetic susceptibility such that they will not be affected by the magnetic field from the MRI scanner.

Type: Application

Filed: June 8, 2012

Publication date: November 21, 2013

Inventors: Brian Brown, Manuel J. Ferrer Herrera, Richard J. Smaglick

NOISE CANCELLATION METHOD

Publication number: 20130304463

Abstract: An embodiment of the invention provides a noise cancellation method for an electronic device. The method comprises: receiving an audio signal; applying a Fast Fourier Transform operation on the audio signal to generate a sound spectrum; acquiring a first spectrum corresponding to a noise and a second spectrum corresponding to a human voice signal from the sound spectrum; estimating a center frequency according to the first spectrum and the second spectrum; and applying a high pass filtering operation to the sound spectrum according to the center frequency.

Type: Application

Filed: May 14, 2012

Publication date: November 14, 2013

Inventors: Lei Chen, Yu-Chieh Lai, Chun-Ren Hu, Hann-Shi Tong

NON-SPATIAL SPEECH DETECTION SYSTEM AND METHOD OF USING SAME

Publication number: 20130297305

Abstract: A non-spatial speech detection system includes a plurality of microphones whose output is supplied to a fixed beamformer. An adaptive beamformer is used for receiving the output of the plurality of microphones and one or more processors are used for processing an output from the fixed beamformer and identifying speech from noise through the use of an algorithm utilizing a covariance matrix.

Type: Application

Filed: May 2, 2012

Publication date: November 7, 2013

Applicant: GENTEX CORPORATION

Inventors: Robert R. Turnbull, Michael A. Bryson

Adaptive Equalization System

Publication number: 20130297306

Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.

Type: Application

Filed: May 4, 2012

Publication date: November 7, 2013

Applicant: QNX Software Systems Limited

Inventors: Phillip Alan Hetherington, Xueman Li

System and Method for Robust Estimation and Tracking the Fundamental Frequency of Pseudo Periodic Signals in the Presence of Noise

Publication number: 20130246062

Abstract: Method and system for tracking fundamental frequencies of pseudo-periodic signals in the presence of noise that include receiving a time-frequency representation of signals measured in a predefined environment; estimating and tracking a fundamental frequency of a respective pseudo-periodic signal at each time frame of the time-frequency representation by tracking detections of harmonious frequencies in the time-frequency representation over time; and outputting each respective estimated fundamental frequency associated with the pseudo-periodic signal of each respective time frame.

Type: Application

Filed: March 19, 2012

Publication date: September 19, 2013

Applicant: VOCALZOOM SYSTEMS LTD.

Inventors: Yekutiel Avargel, Tal Bakish

COMMUNICATION DEVICE AND METHOD

Publication number: 20130226581

Abstract: A communication method includes: capturing analog sound signals output by the audio output unit and analyzing the captured analog sound signals to obtain corresponding digital audio information; comparing the obtained digital audio information with digital feature information stored in a storage unit to determine whether the obtained digital audio information includes the stored digital feature information; and playing reply information stored in the storage unit if the obtained digital audio information includes the stored digital feature information.

Type: Application

Filed: September 26, 2012

Publication date: August 29, 2013

Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (Shenzhen) CO., LTD.

Inventors: HONG FU JIN PRECISION INDUSTRY (Shenzhen) CO., LTD., HON HAI PRECISION INDUSTRY CO., LTD.

SPEECH SIGNAL PROCESSING RESPONSIVE TO LOW NOISE LEVELS

Publication number: 20130211832

Abstract: A method of speech recognition in a vehicle. Audio including noise and a speech signal representative of an utterance from a user is received via a microphone, and a signal-to-noise ratio (SNR) for the received audio is calculated using a processor. It is determined whether the calculated SNR is greater than a predetermined SNR. If so, then a noise distribution is identified for addition to the received audio, and noise corresponding to the identified noise distribution is injected into the received audio to produce noise-injected audio including the speech signal.

Type: Application

Filed: February 9, 2012

Publication date: August 15, 2013

Applicant: GENERAL MOTORS LLC

Inventors: Gaurav Talwar, Robert D. Sims
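
The flow in the abstract for publication 20130211832 above (compute the SNR, and inject noise only when the audio is cleaner than a predetermined SNR) can be sketched as follows. The Gaussian noise distribution, the 30 dB limit, and the function names are assumptions; the abstract does not say which noise distribution is identified or how.

```python
import math
import random

def snr_db(speech_power, noise_power):
    """Signal-to-noise ratio in decibels from two power estimates."""
    return 10.0 * math.log10(speech_power / noise_power)

def inject_noise_if_clean(audio, snr, snr_limit_db=30.0, noise_std=0.01, seed=0):
    """Add noise only when the measured SNR exceeds the predetermined SNR."""
    if snr <= snr_limit_db:
        return list(audio)             # already noisy enough: leave unchanged
    rng = random.Random(seed)          # seeded for repeatable illustration
    return [s + rng.gauss(0.0, noise_std) for s in audio]

clean = [0.0, 0.1, 0.2, 0.1, 0.0]
unchanged = inject_noise_if_clean(clean, snr=20.0)
noisy = inject_noise_if_clean(clean, snr=40.0)
```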

VOICE ACTIVITY DETECTION IN PRESENCE OF BACKGROUND NOISE

Publication number: 20130191117

Abstract: In speech processing systems, compensation is made for sudden changes in the background noise in the average signal-to-noise ratio (SNR) calculation. SNR outlier filtering may be used, alone or in conjunction with weighting the average SNR. Adaptive weights may be applied on the SNRs per band before computing the average SNR. The weighting function can be a function of noise level, noise type, and/or instantaneous SNR value. Another weighting mechanism applies a null filtering or outlier filtering which sets the weight in a particular band to be zero. This particular band may be characterized as the one that exhibits an SNR that is several times higher than the SNRs in other bands.

Type: Application

Filed: November 6, 2012

Publication date: July 25, 2013

Applicant: Qualcomm Incorporated

Inventor: Qualcomm Incorporated
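
The weighted average SNR with outlier filtering described in the abstract for publication 20130191117 above can be sketched as below. Using the median as the reference level, a factor of 3 for "several times higher", and linear (not decibel) SNR values are all assumptions made for illustration.

```python
from statistics import median

def average_snr(band_snrs, weights=None, outlier_factor=3.0):
    """Weighted mean of per-band SNRs with outlier bands given zero weight."""
    if weights is None:
        weights = [1.0] * len(band_snrs)
    ref = median(band_snrs)            # reference level (assumed choice)
    kept = [(s, 0.0 if (ref > 0 and s > outlier_factor * ref) else w)
            for s, w in zip(band_snrs, weights)]
    total_w = sum(w for _, w in kept)
    if total_w == 0:
        return 0.0
    return sum(s * w for s, w in kept) / total_w

with_outlier = average_snr([2.0, 2.0, 2.0, 20.0])   # the 20.0 band is nulled
no_outlier = average_snr([1.0, 2.0, 3.0])
```

Zeroing the weight of the outlier band keeps a sudden burst in one band from inflating the average.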

Adaptive filters to improve voice signals in communication systems

Patent number: 8494174

Abstract: A clear, high quality voice signal with a high signal-to-noise ratio is achieved by use of an adaptive noise reduction scheme with two microphones in close proximity. The method includes the use of two omnidirectional microphones in a highly directional mode, and then applying an adaptive noise cancellation algorithm to reduce the noise.

Type: Grant

Filed: June 14, 2010

Date of Patent: July 23, 2013

Inventor: Alon Konchitsky

METHOD AND SYSTEM FOR USING SOUND RELATED VEHICLE INFORMATION TO ENHANCE SPEECH RECOGNITION

Publication number: 20130185065

Abstract: An audio signal may be received, in a processor associated with a vehicle. Sound related vehicle information representing one or more sounds may be received by the processor. The sound related vehicle information may or may not include an audio signal. A speech recognition process or system may be modified based on the sound related vehicle information.

Type: Application

Filed: January 17, 2012

Publication date: July 18, 2013

Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC

Inventors: Eli TZIRKEL-HANCOCK, Omer Tsimhoni

METHOD AND SYSTEM FOR USING VEHICLE SOUND INFORMATION TO ENHANCE AUDIO PROMPTING

Publication number: 20130185066

Abstract: Sound related vehicle information representing one or more sounds may be received in a processor associated with a vehicle. The sound related vehicle information may or may not include an audio signal. An audio signal output to a passenger may be modified based on the sound related vehicle information.

Type: Application

Filed: January 17, 2012

Publication date: July 18, 2013

Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC

Inventors: Eli TZIRKEL-HANCOCK, Omer Tsimhoni

IN-CAR COMMUNICATION SYSTEM FOR MULTIPLE ACOUSTIC ZONES

Publication number: 20130179163

Abstract: An In-Car Communication (ICC) system supports the communication paths within a car by receiving the speech signals of a speaking passenger and playing it back for one or more listening passengers. Signal processing tasks are split into a microphone related part and into a loudspeaker related part. A sound processing system suitable for use in a vehicle having multiple acoustic zones includes a plurality of microphone In-Car Communication (Mic-ICC) instances coupled and a plurality of loudspeaker In-Car Communication (Ls-ICC) instances. The system further includes a dynamic audio routing matrix with a controller and coupled to the Mic-ICC instances, a mixer coupled to the plurality of Mic-ICC instances and a distributor coupled to the Ls-ICC instances.

Type: Application

Filed: January 10, 2012

Publication date: July 11, 2013

Inventors: Tobias Herbig, Markus Buck, Meik Pfeffinger

METHODS AND ELECTRONIC DEVICES FOR SPEECH RECOGNITION

Publication number: 20130144618

Abstract: A disclosed embodiment provides a speech recognition method to be performed by an electronic device. The method includes: collecting user-specific information that is specific to a user through the user's usage of the electronic device; recording an utterance made by the user; letting a remote server generate a remote speech recognition result for the recorded utterance; generating rescoring information for the recorded utterance based on the collected user-specific information; and letting the remote speech recognition result be rescored based on the rescoring information.

Type: Application

Filed: March 12, 2012

Publication date: June 6, 2013

Inventors: Liang-Che Sun, Yiou-Wen Cheng, Chao-Ling Hsu, Jyh-Horng Lin
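
The rescoring idea in the abstract for publication 20130144618 above can be illustrated with a list of remote hypotheses whose scores are adjusted using words collected from the user's own device usage. The abstract does not specify the form of the remote result or of the rescoring information, so the n-best list, the per-word bonus, and the function name are assumptions.

```python
def rescore(remote_nbest, user_words, bonus=0.1):
    """Pick the remote hypothesis with the best score after a user-word bonus."""
    def score(hyp):
        text, base = hyp
        hits = sum(1 for w in text.lower().split() if w in user_words)
        return base + bonus * hits     # additive bonus is an assumed scheme
    return max(remote_nbest, key=score)[0]

# Hypothetical remote result: (text, score) pairs from the server.
remote_nbest = [("call ann", 0.60), ("call anne", 0.55)]
best_with_contacts = rescore(remote_nbest, user_words={"anne"})
best_without = rescore(remote_nbest, user_words=set())
```

Here a contact name known only to the device tips the choice toward the hypothesis the server ranked second.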

SPEECH RECOGNITION APPARATUS BASED ON CEPSTRUM FEATURE VECTOR AND METHOD THEREOF

Publication number: 20130138437

Abstract: A speech recognition apparatus includes a reliability estimating unit configured to estimate reliability of a time-frequency segment from an input voice signal; and a reliability reflecting unit configured to reflect the reliability of the time-frequency segment to a normalized cepstrum feature vector extracted from the input speech signal and a cepstrum average vector included for each state of an HMM in decoding. Further, the speech recognition apparatus includes a cepstrum transforming unit configured to transform the cepstrum feature vector and the average vector through a discrete cosine transformation matrix and calculate a transformed cepstrum vector. Furthermore, the speech recognition apparatus includes an output probability calculating unit configured to calculate an output probability value of time-frequency segments of the input speech signal by applying the transformed cepstrum vector to the cepstrum feature vector and the average vector.

Type: Application

Filed: July 25, 2012

Publication date: May 30, 2013

Applicant: Electronics and Telecommunications Research Institute

Inventors: Hoon-Young Cho, Youngik Kim, Sanghun Kim

Semi-Supervised Source Separation Using Non-Negative Techniques

Publication number: 20130132077

Abstract: Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.

Type: Application

Filed: May 27, 2011

Publication date: May 23, 2013

Inventors: Gautham J. Mysore, Paris Smaragdis

SYSTEMS, DEVICES AND METHODS FOR LIST DISPLAY AND MANAGEMENT

Publication number: 20130103397

Abstract: Exemplary embodiments provide systems, devices and methods that allow creation and management of lists of items in an integrated manner on an interactive graphical user interface. A user may speak a plurality of list items in a natural unbroken manner to provide an audio input stream into an audio input device. Exemplary embodiments may automatically process the audio input stream to convert the stream into a text output, and may process the text output into one or more n-grams that may be used as list items to populate a list on a user interface.

Type: Application

Filed: October 21, 2011

Publication date: April 25, 2013

Applicant: WAL-MART STORES, INC.

Inventors: Dion Almaer, Bernard Paul Cousineau, Ben Galbraith

System and Method for Dynamic Noise Adaptation for Robust Automatic Speech Recognition

Publication number: 20130096915

Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.

Type: Application

Filed: October 17, 2011

Publication date: April 18, 2013

Applicant: NUANCE COMMUNICATIONS, INC.

Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek

Hybrid Client/Server Speech Recognition In A Mobile Device

Publication number: 20130085753

Abstract: A computing device is able to use an embedded speech recognizer and a network speech recognizer for speech recognition. In response to detecting speech in the captured audio, the computing device may forward the captured audio to its embedded speech recognizer and to a speech client for the network speech recognizer. The embedded speech recognizer provides an embedded-recognizer result for the captured audio. If a network-recognition criterion is met, the speech client forwards the captured audio to the network speech recognizer and receives a network-recognizer result for the captured audio from the network speech recognizer. A speech recognition result for the captured audio is forwarded to at least one application, wherein the speech recognition result is based on at least one of the embedded-recognizer result and the network-recognizer result.

Type: Application

Filed: August 15, 2012

Publication date: April 4, 2013

Applicant: GOOGLE INC.

Inventors: Bjorn Erik Bringert, Johan Schalkwyk, Michael J. LeBeau, Richard Zarek Cohen, Luca Zanolin, Simon Tickner
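
The control flow in the abstract for publication 20130085753 above (always run the embedded recognizer, and consult the network recognizer only when a criterion is met) can be sketched as follows. Representing results as (text, confidence) pairs and choosing the higher-confidence result are assumptions; the abstract says only that the final result is based on at least one of the two.

```python
def hybrid_recognize(audio, embedded, network, network_criterion):
    """Combine an on-device recognizer with an optional network recognizer."""
    embedded_result = embedded(audio)
    results = [embedded_result]
    if network_criterion(audio, embedded_result):
        results.append(network(audio))          # forwarded by the speech client
    return max(results, key=lambda r: r[1])     # selection rule is assumed

network_calls = []

def fake_network(audio):
    """Hypothetical network recognizer that records when it is used."""
    network_calls.append(audio)
    return ("play jams", 0.9)

low_conf = hybrid_recognize("clip-1", lambda a: ("play jazz", 0.55),
                            fake_network, lambda a, r: r[1] < 0.8)
high_conf = hybrid_recognize("clip-2", lambda a: ("stop", 0.95),
                             fake_network, lambda a, r: r[1] < 0.8)
```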

Front-End Noise Reduction for Speech Recognition Engine

Publication number: 20130060567

Abstract: VoIP phones according to the present invention include a microphone, which may be internal or external, and allow the user to communicate unobtrusively, check voice mail and conduct other activities in an environment which can be noisy in general and extremely noisy sometimes. Speech recognition functionality may also be used to generate and send touch tone or DTMF tones such as in response to call trees or voice recognition functionality used by airlines, credit card companies, voice mail systems, and other applications. A system and method of audio processing which provides enhanced speech recognition is provided. Audio input is received at the microphone which is processed by adaptive noise cancellation to generate an enhanced audio signal. The operation of the speech recognition engine and the adaptive noise canceller may be advantageously controlled based on Voice Activity Detection (VAD).

Type: Application

Filed: October 31, 2012

Publication date: March 7, 2013

Inventor: Alon Konchitsky

METHOD FOR THE DETECTION OF SPEECH SEGMENTS

Publication number: 20130054236

Abstract: A method for the detection of noise and speech segments in a digital audio input signal, the input signal being divided into a plurality of frames including a first stage in which a first classification of a frame as noise is performed if the mean energy value for this frame and the previous N frames is not greater than a first energy threshold, N>1, a second stage in which for each frame that has not been classified as noise in the first stage it is decided if the frame is classified as noise or as speech based on combining at least a first criterion of spectral similarity of the frame with acoustic noise and speech models, a second criterion of analysis of the energy of the frame and a third criterion of duration, and of using a state machine for detecting the beginning of a segment as an accumulation of a determined number of consecutive frames with acoustic similarity greater than a first threshold and for detecting the end of the segment; a third stage in which the classification as speech or as noise

Type: Application

Filed: October 7, 2010

Publication date: February 28, 2013

Applicant: TELEFONICA, S.A.

Inventors: Carlos Garcia Martinez, Helenca Duxans Barrobés, Mauricio Sendra Vicens, David Cadenas Sanchez
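
The first stage described in the abstract above (a frame is classified as noise if the mean energy of that frame and the previous N frames does not exceed an energy threshold) can be sketched as below. The function names, the default values of `n_prev` and `threshold`, and the handling of the first few frames with a shorter window are assumptions for illustration; the later stages (spectral similarity, duration, state machine) are not covered.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def first_stage_noise_flags(frames: Sequence[Sequence[float]],
                            n_prev: int = 2,
                            threshold: float = 0.01) -> List[bool]:
    """Flag a frame as noise when the mean energy over it and the
    previous n_prev frames is not greater than the threshold."""
    energies = [frame_energy(f) for f in frames]
    flags = []
    for i in range(len(energies)):
        # Assumption: early frames use however many previous frames exist.
        window = energies[max(0, i - n_prev): i + 1]
        mean_energy = sum(window) / len(window)
        flags.append(mean_energy <= threshold)
    return flags
```

Frames that are not flagged here would go on to the second stage for the speech/noise decision.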

Method and Apparatus for Performing Song Detection on Audio Signal

Publication number: 20130046536

Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.

Type: Application

Filed: July 26, 2012

Publication date: February 21, 2013

Applicant: DOLBY LABORATORIES LICENSING CORPORATION

Inventors: Lie Lu, Claus Bauer
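
The four section conditions listed in the abstract above can be checked with a short predicate. This is a sketch under stated assumptions: a clip is modeled as an `(is_music, duration_seconds)` pair, a "music segment" is taken to be a run of consecutive music clips, the proportion of music is measured by duration, and the default thresholds are placeholders rather than values from the application.

```python
from typing import Sequence, Tuple

Clip = Tuple[bool, float]  # (is_music, duration in seconds); hypothetical model


def longest_music_run(clips: Sequence[Clip]) -> float:
    """Duration of the longest run of consecutive music clips."""
    best = run = 0.0
    for is_music, duration in clips:
        run = run + duration if is_music else 0.0
        best = max(best, run)
    return best


def is_candidate_section(clips: Sequence[Clip],
                         min_song: float = 60.0,
                         max_song: float = 600.0,
                         min_proportion: float = 0.8) -> bool:
    """Check conditions 1) to 4) for one candidate section."""
    if not clips:
        return False
    total = sum(duration for _, duration in clips)
    music = sum(duration for is_music, duration in clips if is_music)
    return bool(
        longest_music_run(clips) > min_song        # 1) long enough music segment
        and total < max_song                       # 2) shorter than the maximum
        and clips[0][0] and clips[-1][0]           # 3) starts and ends with music
        and music / total > min_proportion         # 4) enough music overall
    )
```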

DEVICE AND METHOD FOR DETERMINING SEPARATION CRITERION OF SOUND SOURCE, AND APPARATUS AND METHOD FOR SEPARATING SOUND SOURCE

Publication number: 20130035935

Abstract: The present invention builds on the way a person uses two ears to recognize the location of a sound source in a three-dimensional space, and applies a method of separating a sound source in a certain orientation to improve the performance of speech-based applications in a noisy environment. The present invention acquires a speech signal using two sensors and determines an orientation angle of a sound source in units of zero-crossing points of a signal that has been separated by frequency with a band-pass filter bank. An object of the present invention is to obtain excellent sound source orientation detection and separation performance, which is difficult to obtain in a noisy environment with a plurality of sound sources using an existing cross-correlation method calculated in units of time frames.

Type: Application

Filed: May 1, 2012

Publication date: February 7, 2013

Applicant: Electronics and Telecommunications Research Institute

Inventors: Young Ik KIM, Hoon Young Cho, Sang Hun Kim

Automatically monitoring for voice input based on context

Patent number: 8359020

Abstract: In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation.

Type: Grant

Filed: August 6, 2010

Date of Patent: January 22, 2013

Assignee: Google Inc.

Inventors: Michael J. LeBeau, John Nicholas Jitkoff, Dave Burke

SOUND SOURCES SEPARATION AND MONITORING USING DIRECTIONAL COHERENT ELECTROMAGNETIC WAVES

Publication number: 20130006624

Abstract: An apparatus and a method that achieve physical separation of sound sources by pointing a beam of coherent electromagnetic waves (i.e., a laser) directly at a sound source. Analyzing the physical properties of a beam reflected from the vibrating sound source enables the reconstruction of the sound signal generated by the sound source, eliminating the noise component added to the original sound signal. In addition, the use of multiple electromagnetic wave beams, or a beam that rapidly skips from one sound source to another, allows the physical separation of these sound sources. Aiming each beam at a different sound source ensures the independence of the sound signal sources and therefore provides full source separation.

Type: Application

Filed: September 12, 2012

Publication date: January 3, 2013

Applicant: AUDIOZOOM LTD

Inventor: Tal Bakish

SPEECH FEATURE EXTRACTION APPARATUS, SPEECH FEATURE EXTRACTION METHOD, AND SPEECH FEATURE EXTRACTION PROGRAM

Publication number: 20120330657

Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: a first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frames, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and a first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature.

Type: Application

Filed: September 6, 2012

Publication date: December 27, 2012

Applicant: International Business Machines Corporation

Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
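
The normalized delta feature described in the abstract above can be illustrated with a few lines of plain Python. This is a sketch only: the delta is assumed to be a simple difference between adjacent frames, the "function of an average spectrum" is assumed to be the average itself, every input frame is assumed to be speech, and `eps` is added to avoid division by zero. None of these choices is stated in the abstract.

```python
from typing import List, Sequence


def normalized_delta(spectrum: Sequence[Sequence[float]],
                     eps: float = 1e-10) -> List[List[float]]:
    """spectrum[t][k] is the spectral value of frame t at frequency bin k.

    Returns one normalized delta vector per frame, starting at frame 1.
    """
    n_frames = len(spectrum)
    n_bins = len(spectrum[0])
    # Average spectrum per bin over all (assumed speech) frames.
    average = [sum(frame[k] for frame in spectrum) / n_frames
               for k in range(n_bins)]
    features = []
    for t in range(1, n_frames):
        # Delta spectrum per bin, divided by the average for that bin.
        features.append([
            (spectrum[t][k] - spectrum[t - 1][k]) / (average[k] + eps)
            for k in range(n_bins)
        ])
    return features
```

Dividing by a per-bin average makes the feature less sensitive to the overall level in each bin, which is the apparent motivation for the normalization step.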

VOICE RECOGNITION DEVICE

Publication number: 20120330655

Abstract: A voice recognition device includes a voice recognition dictionary in which a word which is recognized as a result of voice recognition on an inputted voice is registered, a reply voice data storage unit for storing recorded voice data about words registered in the voice recognition dictionary, a dialog control unit for, when a word registered in the voice recognition dictionary is recognized, acquiring recorded voice data corresponding to the word from the reply voice data storage unit, a reproduction noise reduction unit for carrying out a process of reducing noise included in the recorded voice data, an amplitude adjusting unit for adjusting an amplitude of the recorded voice data in which the noise has been reduced to a predetermined amplitude level, and a voice reproduction unit for reproducing a voice from the amplitude-adjusted recorded voice data.

Type: Application

Filed: June 28, 2010

Publication date: December 27, 2012

Inventors: Masanobu Osawa, Kazuyuki Nogi

VOICE DATA TRANSFERRING DEVICE, TERMINAL DEVICE, VOICE DATA TRANSFERRING METHOD, AND VOICE RECOGNITION SYSTEM

Publication number: 20120330651

Abstract: A voice data transferring device intermediates between an in-vehicle terminal and a voice recognition server. In order to check a change in voice recognition performance of the voice recognition server, the voice data transferring device performs a noise suppression processing on a voice data for evaluation in a noise suppression module; transmits the voice data for evaluation to the voice recognition server; and receives a recognition result thereof. The voice data transferring device sets a value of a noise suppression parameter used for a noise suppression processing or a value of a result integration parameter used for a processing of integrating a plurality of recognition results acquired from the voice recognition server, at an optimum value, based on the recognition result of the voice recognition server. This makes it possible to set a suitable parameter even if the voice recognition performance of the voice recognition server changes.

Type: Application

Filed: June 22, 2012

Publication date: December 27, 2012

Inventors: Yasunari Obuchi, Takeshi Homma
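
The parameter-setting step in the abstract above (run evaluation voice data through noise suppression, send it to the recognition server, and choose the parameter value based on the recognition result) can be sketched as a simple search. The function name, the callable parameters `suppress` and `recognize`, and the use of exact-match accuracy as the selection score are assumptions; the abstract does not specify how the optimum is determined.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Audio = TypeVar("Audio")


def pick_noise_suppression_param(
        candidates: Sequence[float],
        eval_set: Sequence[Tuple[Audio, str]],
        suppress: Callable[[Audio, float], Audio],
        recognize: Callable[[Audio], str]) -> float:
    """Return the candidate parameter with the best recognition accuracy.

    eval_set holds (evaluation audio, expected transcript) pairs;
    recognize stands in for a round trip to the recognition server.
    """
    def accuracy(param: float) -> float:
        hits = sum(recognize(suppress(audio, param)) == expected
                   for audio, expected in eval_set)
        return hits / len(eval_set)

    # max() keeps the first candidate when several share the best score.
    return max(candidates, key=accuracy)
```

Re-running this search periodically is one way to keep the parameter suitable when the server's recognition performance changes.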

VOICE ACTIVITY DETECTION

Publication number: 20120330656

Abstract: Discrimination between two classes comprises receiving a set of frames including an input signal and determining at least two different feature vectors for each of the frames. Discrimination between two classes further comprises classifying the two different feature vectors using sets of preclassifiers trained for at least two classes of events and from that classification, and determining values for at least one weighting factor. Discrimination between two classes still further comprises calculating a combined feature vector for each of the received frames by applying the weighting factor to the feature vectors and classifying the combined feature vector for each of the frames by using a set of classifiers trained for at least two classes of events.

Type: Application

Filed: September 4, 2012

Publication date: December 27, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Zica Valsan

ADAPTIVE ACTIVE NOISE CANCELING FOR HANDSET

Publication number: 20120316872

Abstract: Embodiments of the present invention provide an adaptive noise canceling system. The adaptive noise canceling system may be used in a handset to cancel background noise by generating an anti-noise signal. The adaptive noise canceling system may include a first input to receive a first signal from a feedforward microphone; a second input to receive a second signal from an error microphone; a controller coupled to the inputs, the controller configured to adaptively generate an anti-noise signal according to the received signals, wherein the controller derives a profile of the anti-noise signal from the first signal and derives a magnitude of the anti-noise signal from both the first and second signals; and an output to transmit the anti-noise signal to a speaker.

Type: Application

Filed: June 7, 2011

Publication date: December 13, 2012

Applicant: ANALOG DEVICES, INC.

Inventors: Thomas Stoltz, Kim Spetzler Berthelsen, Robert Adams

Method And Apparatus For Voice Activity Determination

Publication number: 20120310641

Abstract: In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector for making a first voice activity detection decision based at least in part on the voice activity of a first audio signal received from a first microphone. The apparatus also comprises a second voice activity detector for making a second voice activity detection decision based at least in part on an estimate of a direction of the first audio signal and an estimate of a direction of a second audio signal received from a second microphone. The apparatus further comprises a classifier for making a third voice activity detection decision based at least in part on the first and second voice activity detection decisions.

Type: Application

Filed: August 13, 2012

Publication date: December 6, 2012

Inventors: Riitta Elina Niemistö, Päivi Marianna Valve

MIC COVERING DETECTION IN PERSONAL AUDIO DEVICES

Publication number: 20120310640

Abstract: A personal audio device, such as a wireless telephone, includes a noise canceling circuit that adaptively generates an anti-noise signal from a reference microphone signal and injects the anti-noise signal into the speaker or other transducer output to cause cancellation of ambient audio sounds. An error microphone may also be provided proximate the speaker to estimate an electro-acoustical path from the noise canceling circuit through the transducer. A processing circuit uses the reference and/or error microphone, optionally along with a microphone provided for capturing near-end speech, to determine whether one of the reference or error microphones is obstructed by comparing their received signal content and takes action to avoid generation of erroneous anti-noise.

Type: Application

Filed: September 30, 2011

Publication date: December 6, 2012

Inventors: Nitin Kwatra, Jeffrey Alderson, Jon D. Hendrix

Automatically monitoring for voice input based on context

Patent number: 8326328

Abstract: In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation.

Type: Grant

Filed: September 29, 2011

Date of Patent: December 4, 2012

Assignee: Google Inc.

Inventors: Michael J. LeBeau, John Nicholas Jitkoff, Dave Burke

Robust Noise Estimation

Publication number: 20120303367

Abstract: An enhancement system improves the estimate of noise from a received signal. The system includes a spectrum monitor that divides a portion of the signal at more than one frequency resolution. Adaptation logic derives a noise adaptation factor of the received signal. A plurality of devices tracks the characteristics of an estimated noise in the received signal and modifies multiple noise adaptation rates. Weighting logic applies the modified noise adaptation rates derived from the signal divided at a first frequency resolution to the signal divided at a second frequency resolution.

Type: Application

Filed: August 13, 2012

Publication date: November 29, 2012

Applicant: QNX Software Systems Limited

Inventor: Phillip A. Hetherington

SYSTEM FOR DETECTING SPEECH WITH BACKGROUND VOICE ESTIMATES AND NOISE ESTIMATES

Publication number: 20120303366

Abstract: A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a window function that passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.

Type: Application

Filed: August 3, 2012

Publication date: November 29, 2012

Inventors: Phillip Alan Hetherington, Mark Ryan Fallat
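
The final comparison in the abstract above (the strength of the desired speech segment against the maximum of the background voice estimate and the noise estimate) reduces to a one-line decision. The function name, the `margin` parameter, and the assumption that all three values are on the same scale (for example, dB) are illustrative and not taken from the application.

```python
def is_desired_speech(speech_strength: float,
                      background_voice_estimate: float,
                      noise_estimate: float,
                      margin: float = 0.0) -> bool:
    """Detect speech when its strength exceeds the larger of the
    background voice estimate and the noise estimate.

    margin is a hypothetical safety offset; 0.0 gives a plain comparison.
    """
    floor = max(background_voice_estimate, noise_estimate)
    return speech_strength > floor + margin
```

Taking the maximum means a loud background talker raises the detection floor even when the stationary noise is low.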