Speech Recognition Techniques For Robustness In Adverse Environments, E.g., In Noise, Of Stress Induced Speech, Etc. (EPO) Patents (Class 704/E15.039)
  • Patent number: 11984110
    Abstract: A device operates to perform acoustic echo cancellation. The device includes a speaker to output a far-end signal at the device, a microphone to receive at least a near-end signal and the far-end signal from the speaker to produce a microphone output, and an AI accelerator operative to perform neural network operations according to a first neural network model and a second neural network model to output an echo-suppressed signal. The device further includes a digital signal processing (DSP) unit. The DSP unit is operative to perform adaptive filtering to remove at least a portion of the far-end signal from the microphone output to generate a filtered near-end signal, and perform Fast Fourier Transform (FFT) and inverse FFT (IFFT) to generate input to the first neural network model and the second neural network model, respectively.
    Type: Grant
    Filed: March 7, 2022
    Date of Patent: May 14, 2024
    Assignee: MEDIATEK SINGAPORE PTE. LTD.
    Inventors: Xiaoxi Yu, Hantao Huang, Ziang Yang, Chia Hsin Yang, Li-Wei Cheng
  • Patent number: 11917384
    Abstract: Disclosed herein are systems and methods for processing speech signals in mixed reality applications. A method may include receiving an audio signal; determining, via first processors, whether the audio signal comprises a voice onset event; in accordance with a determination that the audio signal comprises the voice onset event: waking a second one or more processors; determining, via the second processors, that the audio signal comprises a predetermined trigger signal; in accordance with a determination that the audio signal comprises the predetermined trigger signal: waking third processors; performing, via the third processors, automatic speech recognition based on the audio signal; and in accordance with a determination that the audio signal does not comprise the predetermined trigger signal: forgoing waking the third processors; and in accordance with a determination that the audio signal does not comprise the voice onset event: forgoing waking the second processors.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: February 27, 2024
    Assignee: Magic Leap, Inc.
    Inventors: David Thomas Roach, Jean-Marc Jot, Jung-Suk Lee
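The staged wake-up logic in the abstract above can be sketched as a simple cascade. This is only an illustrative sketch, not the patented implementation: the class name `StagedWakePipeline`, the three callables, and the `woken` log are hypothetical names introduced here, and the detectors are stand-ins for whatever the first, second, and third processors actually run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StagedWakePipeline:
    """Hypothetical three-stage cascade: onset -> trigger -> recognition."""
    has_voice_onset: Callable[[bytes], bool]  # stage 1: always-on check
    has_trigger: Callable[[bytes], bool]      # stage 2: runs only after an onset
    recognize: Callable[[bytes], str]         # stage 3: runs only after a trigger
    woken: List[str] = field(default_factory=list)  # which later stages were woken

    def process(self, audio: bytes) -> Optional[str]:
        self.woken.clear()
        if not self.has_voice_onset(audio):
            return None                # no onset: forgo waking the second stage
        self.woken.append("second")
        if not self.has_trigger(audio):
            return None                # no trigger: forgo waking the third stage
        self.woken.append("third")
        return self.recognize(audio)   # speech recognition on the third stage
```

Each later stage is only invoked when the cheaper stage before it fires, which is the point of the cascade: the expensive recognizer stays idle for most audio.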
  • Patent number: 11863938
    Abstract: The present application relates to a hearing aid adapted to be worn in or at an ear of a hearing aid user and/or to be fully or partially implanted in the head of the hearing aid user.
    Type: Grant
    Filed: May 27, 2022
    Date of Patent: January 2, 2024
    Assignee: Oticon A/S
    Inventors: Thomas Lunner, Lars Bramsløw
  • Patent number: 11790935
    Abstract: In some embodiments, a first audio signal is received via a first microphone, and a first probability of voice activity is determined based on the first audio signal. A second audio signal is received via a second microphone, and a second probability of voice activity is determined based on the first and second audio signals. Whether a first threshold of voice activity is met is determined based on the first and second probabilities of voice activity. In accordance with a determination that a first threshold of voice activity is met, it is determined that a voice onset has occurred, and an alert is transmitted to a processor based on the determination that the voice onset has occurred. In accordance with a determination that a first threshold of voice activity is not met, it is not determined that a voice onset has occurred.
    Type: Grant
    Filed: April 6, 2022
    Date of Patent: October 17, 2023
    Assignee: Magic Leap, Inc.
    Inventors: Jung-Suk Lee, Jean-Marc Jot
  • Patent number: 11765501
    Abstract: Methods and systems for identifying abnormal sounds in a particular environment. A normal audio stream obtained in the absence of abnormal sounds may be used as a baseline for subsequently processing an incoming audio stream with a processor to determine whether the incoming audio stream from the microphone in the particular environment includes an abnormal audio event for the particular environment. When it is determined that the incoming audio stream includes an abnormal audio event for the particular environment an electronic database may be accessed to determine a location of the abnormal audio event in the particular environment. A video camera with a field of view that includes the location of the abnormal audio event in the particular environment may be identified and the video stream from the identified video camera retrieved and displayed.
    Type: Grant
    Filed: March 10, 2021
    Date of Patent: September 19, 2023
    Assignee: HONEYWELL INTERNATIONAL INC.
    Inventors: Lalitha M. Eswara, Syed Omar Khaiyam, Siddharth Sonkamble, Deepak Kaul, K Karthikeyan
  • Patent number: 11721334
    Abstract: A method and apparatus for controlling a device according to an embodiment of the present disclosure may be based on a speech feature of a user reflecting the Lombard effect so as to operate a device located far away from the user, among a plurality of electronic devices. As such, even when the user calls a device located far away from the user without any separate context information, speech recognition neural networks and weight calculation neural networks may be selected and used to operate the device located far away from the user, and reception of a speech signal of the user calling a device located far away from the user may be performed in an Internet of Things (IoT) environment using a 5G network.
    Type: Grant
    Filed: March 5, 2020
    Date of Patent: August 8, 2023
    Assignee: LG ELECTRONICS INC.
    Inventors: Jong Hoon Chae, Minook Kim, Yongchul Park, Sungmin Han, Siyoung Yang, Sangki Kim, Juyeong Jang
  • Patent number: 11683638
    Abstract: A modular speaker system, comprising an exoskeleton, configured to mechanically support and quick attach and release at least one functional panel and an electrical interface provided within the exoskeleton, configured to mate with a corresponding electrical connector of the functional panel. An optional endoskeleton is provided to support internal components. The system preferably provides a digital electronic controller, and the electrical interface is a digital data and power bus, with multiplexed communications between the elements of the system. The elements of the system preferably include at least one speaker, and other audiovisual and communications components. Multiple modules may be interconnected, communicating through the electrical interface. A base module may be provided to provide power and typical control, user and audiovisual interface connectors.
    Type: Grant
    Filed: July 4, 2022
    Date of Patent: June 20, 2023
    Assignee: Sonic Blocks, Inc.
    Inventors: Scott D. Wilker, Jordan D. Wilker
  • Patent number: 11631404
    Abstract: Audio distortion compensation methods to improve accuracy and efficiency of audio content identification are described. The method is also applicable to speech recognition. Methods to detect the interference from speakers and sources, and distortion to audio from environment and devices, are discussed. Additional methods to detect distortion to the content after performing search and correlation are illustrated. The causes of actual distortion at each client are measured and registered and learnt to generate rules for determining likely distortion and interference sources. The learnt rules are applied at the client, and likely distortions that are detected are compensated or heavily distorted sections are ignored at audio level or signature and feature level based on compute resources available. Further methods to subtract the likely distortions in the query at both audio level and after processing at signature and feature level are described.
    Type: Grant
    Filed: August 12, 2021
    Date of Patent: April 18, 2023
    Assignee: ROKU, INC.
    Inventors: Jose Pio Pereira, Sunil Suresh Kulkarni, Mihailo M. Stojancic, Shashank Merchant, Peter Wendt
  • Patent number: 11605379
    Abstract: Disclosed is an artificial intelligence server. The artificial intelligence server includes a communicator in communication with at least one electronic device and a processor for receiving input data from a specific electronic device, applying personalized information corresponding to the specific electronic device to a recognition model, inputting the input data into the recognition model to which the personalized information is applied to obtain a final result value, and transmitting the final result value to the specific electronic device.
    Type: Grant
    Filed: July 11, 2019
    Date of Patent: March 14, 2023
    Assignee: LG ELECTRONICS INC.
    Inventor: Jongwoo Han
  • Patent number: 11587563
    Abstract: A method of presenting a signal to a speech processing engine is disclosed. According to an example of the method, an audio signal is received via a microphone. A portion of the audio signal is identified, and a probability is determined that the portion comprises speech directed by a user of the speech processing engine as input to the speech processing engine. In accordance with a determination that the probability exceeds a threshold, the portion of the audio signal is presented as input to the speech processing engine. In accordance with a determination that the probability does not exceed the threshold, the portion of the audio signal is not presented as input to the speech processing engine.
    Type: Grant
    Filed: February 28, 2020
    Date of Patent: February 21, 2023
    Assignee: Magic Leap, Inc.
    Inventors: Anthony Robert Sheeder, Colby Nelson Leider
  • Patent number: 11363367
    Abstract: A dual-microphone arrangement (300) provides improved voice performance in a wireless headset (12). A vibration sensor (1130) is used for voice pickup and will add low-frequency voice audio content in windy conditions. An equalizer (810) is used to restore low-frequency voice audio content in wind-free conditions. Depending on the measured wind power, the output will derive more signal from the equalizer (810) or more signal from the vibration sensor (1130).
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: June 14, 2022
    Assignee: Dopple IP B.V.
    Inventors: Jacobus Cornelis Haartsen, Aalbert Stek
  • Publication number: 20150149167
    Abstract: Aspects of this disclosure are directed to accurately transforming speech data into one or more word strings that represent the speech data. A speech recognition device may receive the speech data from a user device and an indication of the user device. The speech recognition device may execute a speech recognition algorithm using one or more user and acoustic condition specific transforms that are specific to the user device and an acoustic condition of the speech data. The execution of the speech recognition algorithm may transform the speech data into one or more word strings that represent the speech data. The speech recognition device may estimate which one of the one or more word strings more accurately represents the received speech data.
    Type: Application
    Filed: September 30, 2011
    Publication date: May 28, 2015
    Applicant: GOOGLE INC.
    Inventors: Françoise Beaufays, Johan Schalkwyk, Vincent Olivier Vanhoucke, Petar Stanisa Aleksic
  • Patent number: 8953812
    Abstract: Improvements in voice signals transmitted within communication systems are obtained by use of adaptive filters, front and rear microphones, noise cancelling systems and other means and methods. Disclosed embodiments include the use of directional microphones, primary inputs, secondary inputs, adaptive weight generators, canceller outputs to improve signal to noise ratios and other communication attributes.
    Type: Grant
    Filed: July 20, 2013
    Date of Patent: February 10, 2015
    Inventor: Alon Konchitsky
  • Publication number: 20140074464
    Abstract: Some embodiments of the inventive subject matter may include a method for detecting speech loss and supplying appropriate recollection data to the user. The method can include detecting a speech stream from a user. The method can include converting the speech stream to text. The method can include storing the text. The method can include detecting an interruption to the speech stream, wherein the interruption to the speech stream indicates speech loss by the user. The method can include searching a catalog using the text as a search parameter to find relevant catalog data. The method can include presenting the relevant catalog data to remind the user about the speech stream.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Applicant: International Business Machines Corporation
    Inventor: Scott H. Berens
  • Publication number: 20140067387
    Abstract: Scalar operations for model adaptation or feature enhancement may be utilized for recognizing an utterance during automatic speech recognition in a noisy environment. An utterance including distorted speech generated from a transmission source for delivery to a receiver, may be received by a computer. The distorted speech may be caused by the noisy environment and channel distortion. Computations using scalar operations in the form of an algorithm may then be performed for recognizing the utterance. As a result of performing all of the computations with scalar operations, computational complexity is very small in comparison to matrix and vector operations. Vector Taylor Series with diagonal Jacobian approximation may also be utilized as a distortion-model-based noise robust algorithm with scalar operations.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Jinyu Li, Michael Lewis Seltzer, Yifan Gong
  • Publication number: 20140012573
    Abstract: A signal processing apparatus includes a speech recognition system and a voice activity detection unit. The voice activity detection unit is coupled to the speech recognition system, and arranged for detecting whether an audio signal is a voice signal and accordingly generating a voice activity detection result to the speech recognition system to control whether the speech recognition system should perform speech recognition upon the audio signal.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 9, 2014
    Inventors: Chia-Yu Hung, Tsung-Li Yeh, Yi-Chang Tu
  • Publication number: 20130311176
    Abstract: A wireless headset capable of receiving audio signals transmitted wirelessly and compatible for use in an MRI scanner is disclosed. The headset includes a first wireless module connected to the first earphone and a second wireless module connected to the second earphone. Each wireless module is electrically connected to a speaker in the respective earphone. The first wireless module receives the audio signal from a remote source and coordinates transmission of the audio signal to each of the speakers. The compact nature of each earphone minimizes the length of wire runs. In addition, the headset is made of materials having low magnetic susceptibility such that they will not be affected by the magnetic field from the MRI scanner.
    Type: Application
    Filed: June 8, 2012
    Publication date: November 21, 2013
    Inventors: Brian Brown, Manuel J. Ferrer Herrera, Richard J. Smaglick
  • Publication number: 20130304463
    Abstract: An embodiment of the invention provides a noise cancellation method for an electronic device. The method comprises: receiving an audio signal; applying a Fast Fourier Transform operation on the audio signal to generate a sound spectrum; acquiring a first spectrum corresponding to a noise and a second spectrum corresponding to a human voice signal from the sound spectrum; estimating a center frequency according to the first spectrum and the second spectrum; and applying a high pass filtering operation to the sound spectrum according to the center frequency.
    Type: Application
    Filed: May 14, 2012
    Publication date: November 14, 2013
    Inventors: Lei Chen, Yu-Chieh Lai, Chun-Ren Hu, Hann-Shi Tong
  • Publication number: 20130297305
    Abstract: A non-spatial speech detection system includes a plurality of microphones whose output is supplied to a fixed beamformer. An adaptive beamformer is used for receiving the output of the plurality of microphones and one or more processors are used for processing an output from the fixed beamformer and identifying speech from noise through the use of an algorithm utilizing a covariance matrix.
    Type: Application
    Filed: May 2, 2012
    Publication date: November 7, 2013
    Applicant: GENTEX CORPORATION
    Inventors: Robert R. Turnbull, Michael A. Bryson
  • Publication number: 20130297306
    Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
    Type: Application
    Filed: May 4, 2012
    Publication date: November 7, 2013
    Applicant: QNX Software Systems Limited
    Inventors: Phillip Alan Hetherington, Xueman Li
  • Publication number: 20130246062
    Abstract: Method and system for tracking fundamental frequencies of pseudo-periodic signals in the presence of noise that include receiving a time-frequency representation of signals measured in a predefined environment; estimating and tracking a fundamental frequency of a respective pseudo-periodic signal at each time frame of the time-frequency representation by tracking detections of harmonious frequencies in the time-frequency representation over time; and outputting each respective estimated fundamental frequency associated with the pseudo-periodic signal of each respective time frame.
    Type: Application
    Filed: March 19, 2012
    Publication date: September 19, 2013
    Applicant: VOCALZOOM SYSTEMS LTD.
    Inventors: Yekutiel Avargel, Tal Bakish
  • Publication number: 20130226581
    Abstract: A communication method includes: capturing analog sound signals output by the audio output unit and analyzing the captured analog sound signals to obtain corresponding digital audio information; comparing the obtained digital audio information with digital feature information stored in a storage unit to determine whether the obtained digital audio information includes the stored digital feature information; and playing reply information stored in the storage unit if the obtained digital audio information includes the stored digital feature information.
    Type: Application
    Filed: September 26, 2012
    Publication date: August 29, 2013
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (Shenzhen) CO., LTD.
    Inventors: HONG FU JIN PRECISION INDUSTRY (Shenzhen) CO., LTD., HON HAI PRECISION INDUSTRY CO., LTD.
  • Publication number: 20130211832
    Abstract: A method of speech recognition in a vehicle. Audio including noise and a speech signal representative of an utterance from a user is received via a microphone, and a signal-to-noise ratio (SNR) for the received audio is calculated using a processor. It is determined whether the calculated SNR is greater than a predetermined SNR. If so, then a noise distribution is identified for addition to the received audio, and noise corresponding to the identified noise distribution is injected into the received audio to produce noise-injected audio including the speech signal.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Robert D. Sims
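The SNR check and noise injection described in the abstract above might look roughly like the following. This is a minimal sketch under several assumptions that are not in the abstract: a separate noise-only segment is available for estimating noise power, the "identified noise distribution" is zero-mean Gaussian, and the function names and the `target_snr_db` parameter are hypothetical.

```python
import math
import random
from typing import List, Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def estimate_snr_db(audio: Sequence[float], noise_only: Sequence[float]) -> float:
    """Estimate SNR in dB, assuming speech power = total power - noise power."""
    p_noise = max(mean_power(noise_only), 1e-12)
    p_speech = max(mean_power(audio) - p_noise, 1e-12)
    return 10.0 * math.log10(p_speech / p_noise)


def inject_noise_if_clean(audio: Sequence[float], noise_only: Sequence[float],
                          threshold_db: float, target_snr_db: float,
                          rng: random.Random) -> List[float]:
    """Add Gaussian noise only when the estimated SNR exceeds threshold_db."""
    if estimate_snr_db(audio, noise_only) <= threshold_db:
        return list(audio)  # SNR not above the threshold: leave audio unchanged
    p_noise = max(mean_power(noise_only), 1e-12)
    p_speech = max(mean_power(audio) - p_noise, 1e-12)
    # Noise power that would bring the signal down to the target SNR.
    wanted_noise = p_speech / (10.0 ** (target_snr_db / 10.0))
    sigma = math.sqrt(max(wanted_noise - p_noise, 0.0))
    return [s + rng.gauss(0.0, sigma) for s in audio]
```

Passing an explicit `random.Random` instance keeps the injected noise reproducible, which helps when comparing recognition results across runs.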
  • Publication number: 20130191117
    Abstract: In speech processing systems, compensation is made for sudden changes in the background noise in the average signal-to-noise ratio (SNR) calculation. SNR outlier filtering may be used, alone or in conjunction with weighting the average SNR. Adaptive weights may be applied on the SNRs per band before computing the average SNR. The weighting function can be a function of noise level, noise type, and/or instantaneous SNR value. Another weighting mechanism applies a null filtering or outlier filtering which sets the weight in a particular band to be zero. This particular band may be characterized as the one that exhibits an SNR that is several times higher than the SNRs in other bands.
    Type: Application
    Filed: November 6, 2012
    Publication date: July 25, 2013
    Applicant: Qualcomm Incorporated
    Inventor: Qualcomm Incorporated
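The null (outlier) filtering step in the abstract above can be illustrated with a short sketch. It covers only the zero-weight case, not the adaptive weighting by noise level or noise type. The function name, the default `outlier_ratio` of 3.0, and the use of the median of the other bands as the reference are assumptions made here, and the per-band SNRs are assumed to be non-negative linear ratios rather than dB values.

```python
from statistics import median
from typing import Sequence


def weighted_average_snr(band_snrs: Sequence[float], outlier_ratio: float = 3.0) -> float:
    """Average per-band SNRs, giving zero weight to outlier bands.

    A band is treated as an outlier when its SNR is more than
    `outlier_ratio` times the median SNR of the other bands.
    """
    weights = []
    for i, snr in enumerate(band_snrs):
        others = [s for j, s in enumerate(band_snrs) if j != i]
        reference = median(others) if others else snr
        weights.append(0.0 if snr > outlier_ratio * reference else 1.0)
    total = sum(weights)
    if total == 0.0:
        return 0.0  # every band was rejected; nothing left to average
    return sum(w * s for w, s in zip(weights, band_snrs)) / total
```

For example, with band SNRs of 2.0, 2.5, 3.0 and 40.0, the last band is zero-weighted and the average is taken over the remaining three.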
  • Patent number: 8494174
    Abstract: A clear, high quality voice signal with a high signal-to-noise ratio is achieved by use of an adaptive noise reduction scheme with two microphones in close proximity. The method includes the use of two omnidirectional microphones in a highly directional mode, and then applying an adaptive noise cancellation algorithm to reduce the noise.
    Type: Grant
    Filed: June 14, 2010
    Date of Patent: July 23, 2013
    Inventor: Alon Konchitsky
  • Publication number: 20130185065
    Abstract: An audio signal may be received, in a processor associated with a vehicle. Sound related vehicle information representing one or more sounds may be received by the processor. The sound related vehicle information may or may not include an audio signal. A speech recognition process or system may be modified based on the sound related vehicle information.
    Type: Application
    Filed: January 17, 2012
    Publication date: July 18, 2013
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Eli TZIRKEL-HANCOCK, Omer Tsimhoni
  • Publication number: 20130185066
    Abstract: Sound related vehicle information representing one or more sounds may be received in a processor associated with a vehicle. The sound related vehicle information may or may not include an audio signal. An audio signal output to a passenger may be modified based on the sound related vehicle information.
    Type: Application
    Filed: January 17, 2012
    Publication date: July 18, 2013
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Eli TZIRKEL-HANCOCK, Omer Tsimhoni
  • Publication number: 20130179163
    Abstract: An In-Car Communication (ICC) system supports the communication paths within a car by receiving the speech signals of a speaking passenger and playing it back for one or more listening passengers. Signal processing tasks are split into a microphone related part and into a loudspeaker related part. A sound processing system suitable for use in a vehicle having multiple acoustic zones includes a plurality of microphone In-Car Communication (Mic-ICC) instances coupled and a plurality of loudspeaker In-Car Communication (Ls-ICC) instances. The system further includes a dynamic audio routing matrix with a controller and coupled to the Mic-ICC instances, a mixer coupled to the plurality of Mic-ICC instances and a distributor coupled to the Ls-ICC instances.
    Type: Application
    Filed: January 10, 2012
    Publication date: July 11, 2013
    Inventors: Tobias Herbig, Markus Buck, Meik Pfeffinger
  • Publication number: 20130144618
    Abstract: A disclosed embodiment provides a speech recognition method to be performed by an electronic device. The method includes: collecting user-specific information that is specific to a user through the user's usage of the electronic device; recording an utterance made by the user; letting a remote server generate a remote speech recognition result for the recorded utterance; generating rescoring information for the recorded utterance based on the collected user-specific information; and letting the remote speech recognition result be rescored based on the rescoring information.
    Type: Application
    Filed: March 12, 2012
    Publication date: June 6, 2013
    Inventors: Liang-Che Sun, Yiou-Wen Cheng, Chao-Ling Hsu, Jyh-Horng Lin
  • Publication number: 20130138437
    Abstract: A speech recognition apparatus, includes a reliability estimating unit configured to estimate reliability of a time-frequency segment from an input voice signal; and a reliability reflecting unit configured to reflect the reliability of the time-frequency segment to a normalized cepstrum feature vector extracted from the input speech signal and a cepstrum average vector included for each state of an HMM in decoding. Further, the speech recognition apparatus includes a cepstrum transforming unit configured to transform the cepstrum feature vector and the average vector through a discrete cosine transformation matrix and calculate a transformed cepstrum vector. Furthermore, the speech recognition apparatus includes an output probability calculating unit configured to calculate an output probability value of time-frequency segments of the input speech signal by applying the transformed cepstrum vector to the cepstrum feature vector and the average vector.
    Type: Application
    Filed: July 25, 2012
    Publication date: May 30, 2013
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Hoon-Young Cho, Youngik Kim, Sanghun Kim
  • Publication number: 20130132077
    Abstract: Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.
    Type: Application
    Filed: May 27, 2011
    Publication date: May 23, 2013
    Inventors: Gautham J. Mysore, Paris Smaragdis
  • Publication number: 20130103397
    Abstract: Exemplary embodiments provide systems, devices and methods that allow creation and management of lists of items in an integrated manner on an interactive graphical user interface. A user may speak a plurality of list items in a natural unbroken manner to provide an audio input stream into an audio input device. Exemplary embodiments may automatically process the audio input stream to convert the stream into a text output, and may process the text output into one or more n-grams that may be used as list items to populate a list on a user interface.
    Type: Application
    Filed: October 21, 2011
    Publication date: April 25, 2013
    Applicant: WAL-MART STORES, INC.
    Inventors: Dion Almaer, Bernard Paul Cousineau, Ben Galbraith
  • Publication number: 20130096915
    Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.
    Type: Application
    Filed: October 17, 2011
    Publication date: April 18, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
  • Publication number: 20130085753
    Abstract: A computing device is able to use an embedded speech recognizer and a network speech recognizer for speech recognition. In response to detecting speech in the captured audio, the computing device may forward the captured audio to its embedded speech recognizer and to a speech client for the network speech recognizer. The embedded speech recognizer provides an embedded-recognizer result for the captured audio. If a network-recognition criterion is met, the speech client forwards the captured audio to the network speech recognizer and receives a network-recognizer result for the captured audio from the network speech recognizer. A speech recognition result for the captured audio is forwarded to at least one application, wherein the speech recognition result is based on at least one of the embedded-recognizer result and the network-recognizer result.
    Type: Application
    Filed: August 15, 2012
    Publication date: April 4, 2013
    Applicant: GOOGLE INC.
    Inventors: Bjorn Erik Bringert, Johan Schalkwyk, Michael J. LeBeau, Richard Zarek Cohen, Luca Zanolin, Simon Tickner
  • Publication number: 20130060567
    Abstract: VoIP phones according to the present invention include a microphone, which may be internal or external, and allow the user to communicate unobtrusively, check voice mail and conduct other activities in an environment which can be noisy in general and extremely noisy sometimes. Speech recognition functionality may also be used to generate and send touch tone or DTMF tones such as in response to call trees or voice recognition functionality used by airlines, credit card companies, voice mail systems, and other applications. A system and method of audio processing which provides enhanced speech recognition is provided. Audio input is received at the microphone which is processed by adaptive noise cancellation to generate an enhanced audio signal. The operation of the speech recognition engine and the adaptive noise canceller may be advantageously controlled based on Voice Activity Detection (VAD).
    Type: Application
    Filed: October 31, 2012
    Publication date: March 7, 2013
    Inventor: Alon Konchitsky
  • Publication number: 20130054236
    Abstract: A method for the detection of noise and speech segments in a digital audio input signal, the input signal being divided into a plurality of frames including a first stage in which a first classification of a frame as noise is performed if the mean energy value for this frame and the previous N frames is not greater than a first energy threshold, N>1, a second stage in which for each frame that has not been classified as noise in the first stage it is decided if the frame is classified as noise or as speech based on combining at least a first criterion of spectral similarity of the frame with acoustic noise and speech models, a second criterion of analysis of the energy of the frame and a third criterion of duration, and of using a state machine for detecting the beginning of a segment as an accumulation of a determined number of consecutive frames with acoustic similarity greater than a first threshold and for detecting the end of the segment; a third stage in which the classification as speech or as noise
    Type: Application
    Filed: October 7, 2010
    Publication date: February 28, 2013
    Applicant: TELEFONICA, S.A.
    Inventors: Carlos Garcia Martinez, Helenca Duxans Barrobés, Mauricio Sendra Vicens, David Cadenas Sanchez
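The first stage described in the abstract above (a frame is classified as noise when the mean energy of that frame and the previous N frames does not exceed a threshold) can be illustrated with a minimal sketch. The function name `first_stage_noise_flags`, its parameters, and the handling of the first few frames (averaging over whatever history is available) are assumptions made for illustration, not details taken from the application.

```python
from collections import deque


def first_stage_noise_flags(frame_energies, n_prev, energy_threshold):
    """Return one flag per frame: True if stage one classifies it as noise.

    frame_energies: per-frame energy values (hypothetical input format).
    n_prev: the number N of previous frames included in the mean.
    energy_threshold: the first energy threshold.
    """
    # Sliding window holding the current frame plus the previous N frames.
    window = deque(maxlen=n_prev + 1)
    flags = []
    for energy in frame_energies:
        window.append(energy)
        # Assumption: early frames average over the history available so far.
        mean_energy = sum(window) / len(window)
        # "Not greater than" the threshold means noise.
        flags.append(mean_energy <= energy_threshold)
    return flags
```

Frames that are not flagged here would go on to the second stage, which the abstract describes as combining spectral similarity, energy, and duration criteria; that stage is not sketched.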
  • Publication number: 20130046536
    Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.
    Type: Application
    Filed: July 26, 2012
    Publication date: February 21, 2013
    Applicant: DOLBY LABORATORIES LICENSING CORPORATION
    Inventors: Lie Lu, Claus Bauer
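The four conditions that each section must meet in the abstract above can be read as a single predicate. The sketch below is one possible reading under stated assumptions: clips are represented as hypothetical `(is_music, duration_seconds)` pairs, a "music segment" is taken to be a run of consecutive music clips, and the proportion of music clips is counted per clip rather than per second.

```python
def is_candidate_section(clips, min_song, max_song, min_music_ratio):
    """Check the four section conditions on a list of (is_music, duration) clips."""
    if not clips:
        return False

    total_duration = sum(duration for _, duration in clips)

    # Longest run of consecutive music clips (assumed meaning of "music segment").
    longest_run = 0.0
    current_run = 0.0
    for is_music, duration in clips:
        current_run = current_run + duration if is_music else 0.0
        longest_run = max(longest_run, current_run)

    # Assumption: proportion is measured by clip count, not by duration.
    music_ratio = sum(1 for is_music, _ in clips if is_music) / len(clips)

    return bool(
        longest_run > min_song             # 1) a music segment longer than the minimum
        and total_duration < max_song      # 2) section shorter than the maximum
        and clips[0][0] and clips[-1][0]   # 3) starts and ends with a music clip
        and music_ratio > min_music_ratio  # 4) enough of the clips are music
    )
```

Applying this predicate to every combination of candidate boundaries would yield the set of possible song partitions the abstract mentions; how those combinations are enumerated is not specified there and is left out.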
  • Publication number: 20130035935
    Abstract: The present invention allows a man to recognize a location of a sound source in a three-dimensional space using two ears and applies a method of separating a sound source in a certain orientation to improve the performance of an application technology using a speech in a noisy environment. The present invention acquires a speech signal using two sensors and determines an orientation angle of a sound source in a zero-crossing point step with respect to a frequency separated signal with a band pass filter bank. An object of the present invention is to obtain excellent sound source orientation detection and division performance, which is difficult to obtain with an existing cross-correlation method calculated in units of time frames in a noisy environment with a plurality of sound sources.
    Type: Application
    Filed: May 1, 2012
    Publication date: February 7, 2013
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Young Ik KIM, Hoon Young Cho, Sang Hun Kim
  • Patent number: 8359020
    Abstract: In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation.
    Type: Grant
    Filed: August 6, 2010
    Date of Patent: January 22, 2013
    Assignee: Google Inc.
    Inventors: Michael J. Lebeau, John Nicholas Jitkoff, Dave Burke
  • Publication number: 20130006624
    Abstract: An apparatus and a method that achieve physical separation of sound sources by pointing a beam of coherent electromagnetic waves (i.e., a laser) directly at a source. Analyzing the physical properties of a beam reflected from the vibrating, sound-generating source enables the reconstruction of the sound signal generated by the sound source, eliminating the noise component added to the original sound signal. In addition, the use of multiple electromagnetic wave beams, or of a beam that rapidly skips from one sound source to another, allows the physical separation of these sound sources. Aiming each beam at a different sound source ensures the independence of the sound signal sources and therefore provides full source separation.
    Type: Application
    Filed: September 12, 2012
    Publication date: January 3, 2013
    Applicant: AUDIOZOOM LTD
    Inventor: Tal Bakish
  • Publication number: 20120330657
    Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: a first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frames, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and a first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature.
    Type: Application
    Filed: September 6, 2012
    Publication date: December 27, 2012
    Applicant: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
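The delta-spectrum normalization in the abstract above can be sketched as follows. Several details are assumptions made for illustration: the delta is taken as a simple first difference between a frame and its predecessor, the "function of an average spectrum" is taken to be the average itself, and every frame passed in is treated as speech. The function name `normalized_delta_spectrum` is hypothetical.

```python
def normalized_delta_spectrum(spectra):
    """Compute a normalized delta feature per frame and frequency bin.

    spectra: list of frames, each a list of non-negative magnitudes per bin.
    """
    n_frames = len(spectra)
    n_bins = len(spectra[0])

    # Average spectrum per bin over all frames (all assumed to be speech).
    average = [sum(frame[k] for frame in spectra) / n_frames for k in range(n_bins)]

    features = []
    for t in range(n_frames):
        # Assumption: first difference, with the first frame compared to itself.
        previous = spectra[t - 1] if t > 0 else spectra[0]
        row = []
        for k in range(n_bins):
            delta = spectra[t][k] - previous[k]
            # Divide by the average spectrum; guard against an all-zero bin.
            row.append(delta / average[k] if average[k] > 0 else 0.0)
        features.append(row)
    return features
```

Dividing a linear-domain difference by the average level of the bin makes the feature insensitive to overall gain in that bin, which is one plausible motivation for the normalization step.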
  • Publication number: 20120330655
    Abstract: A voice recognition device includes a voice recognition dictionary in which a word which is recognized as a result of voice recognition on an inputted voice is registered, a reply voice data storage unit for storing recorded voice data about words registered in the voice recognition dictionary, a dialog control unit for, when a word registered in the voice recognition dictionary is recognized, acquiring recorded voice data corresponding to the word from the reply voice data storage unit, a reproduction noise reduction unit for carrying out a process of reducing noise included in the recorded voice data, an amplitude adjusting unit for adjusting an amplitude of the recorded voice data in which the noise has been reduced to a predetermined amplitude level, and a voice reproduction unit for reproducing a voice from the amplitude-adjusted recorded voice data.
    Type: Application
    Filed: June 28, 2010
    Publication date: December 27, 2012
    Inventors: Masanobu Osawa, Kazuyuki Nogi
  • Publication number: 20120330651
    Abstract: A voice data transferring device intermediates between an in-vehicle terminal and a voice recognition server. In order to check a change in voice recognition performance of the voice recognition server, the voice data transferring device performs noise suppression processing on voice data for evaluation in a noise suppression module; transmits the voice data for evaluation to the voice recognition server; and receives a recognition result thereof. The voice data transferring device sets a value of a noise suppression parameter used for the noise suppression processing, or a value of a result integration parameter used for integrating a plurality of recognition results acquired from the voice recognition server, at an optimum value, based on the recognition result of the voice recognition server. This makes it possible to set a suitable parameter even if the voice recognition performance of the voice recognition server changes.
    Type: Application
    Filed: June 22, 2012
    Publication date: December 27, 2012
    Inventors: Yasunari Obuchi, Takeshi Homma
  • Publication number: 20120330656
    Abstract: Discrimination between two classes comprises receiving a set of frames including an input signal and determining at least two different feature vectors for each of the frames. Discrimination between two classes further comprises classifying the two different feature vectors using sets of preclassifiers trained for at least two classes of events and from that classification, and determining values for at least one weighting factor. Discrimination between two classes still further comprises calculating a combined feature vector for each of the received frames by applying the weighting factor to the feature vectors and classifying the combined feature vector for each of the frames by using a set of classifiers trained for at least two classes of events.
    Type: Application
    Filed: September 4, 2012
    Publication date: December 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Zica Valsan
  • Publication number: 20120316872
    Abstract: Embodiments of the present invention provide an adaptive noise canceling system. The adaptive noise canceling system may be used in a handset to cancel background noise by generating an anti-noise signal. The adaptive noise canceling system may include a first input to receive a first signal from a feedforward microphone; a second input to receive a second signal from an error microphone; a controller coupled to the inputs, the controller configured to adaptively generate an anti-noise signal according to the received signals, wherein the controller derives a profile of the anti-noise signal from the first signal and derives a magnitude of the anti-noise signal from both the first and second signals; and an output to transmit the anti-noise signal to a speaker.
    Type: Application
    Filed: June 7, 2011
    Publication date: December 13, 2012
    Applicant: ANALOG DEVICES, INC.
    Inventors: Thomas Stoltz, Kim Spetzler Berthelsen, Robert Adams
  • Publication number: 20120310641
    Abstract: In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector for making a first voice activity detection decision based at least in part on the voice activity of a first audio signal received from a first microphone. The apparatus also comprises a second voice activity detector for making a second voice activity detection decision based at least in part on an estimate of a direction of the first audio signal and an estimate of a direction of a second audio signal received from a second microphone. The apparatus further comprises a classifier for making a third voice activity detection decision based at least in part on the first and second voice activity detection decisions.
    Type: Application
    Filed: August 13, 2012
    Publication date: December 6, 2012
    Inventors: Riitta Elina Niemistö, Päivi Marianna Valve
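The three-decision structure in the abstract above (a signal-based detector, a direction-based detector, and a classifier that combines them) can be sketched in a few lines. Everything concrete here is an assumption for illustration: the first detector is a plain mean-energy threshold, the second checks that both direction estimates fall near a hypothetical target direction, and the classifier is a logical AND. The abstract does not specify any of these choices.

```python
def energy_vad(frame, threshold):
    """First decision: voice if the mean energy of the frame exceeds a threshold."""
    energy = sum(sample * sample for sample in frame) / len(frame)
    return energy > threshold


def direction_vad(doa_first_deg, doa_second_deg, target_deg, tolerance_deg):
    """Second decision: voice if both direction estimates are near the target."""
    return (abs(doa_first_deg - target_deg) <= tolerance_deg
            and abs(doa_second_deg - target_deg) <= tolerance_deg)


def fused_vad(first_decision, second_decision):
    """Third decision: assumed here to be a simple AND of the first two."""
    return first_decision and second_decision
```

A hypothetical call would be `fused_vad(energy_vad(frame, 0.1), direction_vad(2.0, -3.0, 0.0, 10.0))`. A trained classifier could replace the AND rule without changing the overall structure.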
  • Publication number: 20120310640
    Abstract: A personal audio device, such as a wireless telephone, includes a noise canceling circuit that adaptively generates an anti-noise signal from a reference microphone signal and injects the anti-noise signal into the speaker or other transducer output to cause cancellation of ambient audio sounds. An error microphone may also be provided proximate the speaker to estimate an electro-acoustical path from the noise canceling circuit through the transducer. A processing circuit uses the reference and/or error microphone, optionally along with a microphone provided for capturing near-end speech, to determine whether one of the reference or error microphones is obstructed by comparing their received signal content and takes action to avoid generation of erroneous anti-noise.
    Type: Application
    Filed: September 30, 2011
    Publication date: December 6, 2012
    Inventors: Nitin Kwatra, Jeffrey Alderson, Jon D. Hendrix
  • Patent number: 8326328
    Abstract: In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation.
    Type: Grant
    Filed: September 29, 2011
    Date of Patent: December 4, 2012
    Assignee: Google Inc.
    Inventors: Michael J. LeBeau, John Nicholas Jitkoff, Dave Burke
  • Publication number: 20120303367
    Abstract: An enhancement system improves the estimate of noise from a received signal. The system includes a spectrum monitor that divides a portion of the signal at more than one frequency resolution. Adaptation logic derives a noise adaptation factor of the received signal. A plurality of devices tracks the characteristics of an estimated noise in the received signal and modifies multiple noise adaptation rates. Weighting logic applies the modified noise adaptation rates derived from the signal divided at a first frequency resolution to the signal divided at a second frequency resolution.
    Type: Application
    Filed: August 13, 2012
    Publication date: November 29, 2012
    Applicant: QNX Software Systems Limited
    Inventor: Phillip A. Hetherington
  • Publication number: 20120303366
    Abstract: A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a window function that passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.
    Type: Application
    Filed: August 3, 2012
    Publication date: November 29, 2012
    Inventors: Phillip Alan Hetherington, Mark Ryan Fallat
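The final comparison in the abstract above (the strength of the desired speech segment against the maximum of the background voice detector output and the noise estimator output) reduces to a one-line rule. The sketch below assumes all three quantities are already expressed on the same scale; the function name `is_desired_speech` and the optional `margin` parameter are hypothetical additions, not part of the application.

```python
def is_desired_speech(speech_strength, background_voice_level, noise_level, margin=0.0):
    """Return True if the segment stands out above both competing estimates.

    margin: hypothetical extra headroom required above the larger estimate.
    """
    # The larger of the two estimates sets the bar the segment must clear.
    floor = max(background_voice_level, noise_level)
    return speech_strength > floor + margin
```

Taking the maximum means a segment is rejected whenever either background speech or noise alone could account for its strength.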