Data Science Leader, AIG Science
Dr. Nishant Chandra leads the AIG Science R&D group in India, where he develops natural language, text mining, and machine learning models for the insurance industry. He also directs the development of a natural language platform and its applications, such as privilege text classification, contextual summarization, and conversational sentiment abstraction.
Prior to AIG, Dr. Chandra drove innovation in the BFSI, e-commerce, R&D, and mobile telecom industries in the USA and India. He developed and implemented natural language predictive models that are deployed at top banks and telecom companies, resulting in significant impact across the value chain. For his contributions, Dr. Chandra was recently recognized as one of the top 10 data scientists in India by Analytics India magazine. He has received the prestigious Barrier fellowship and several other awards and recognitions. The United States Department of Homeland Security has classified Dr. Chandra as an outstanding researcher. He was a conference session chair for the GSPx conference in San Jose, California.
He has been a reviewer for IEEE Transactions, served on the editorial board of the Human Language Technology conference, and spoken at several international conferences. He also holds five patents and has several journal and conference publications. Dr. Chandra is a passionate puzzler who invents puzzles and has represented India at the World Puzzle Championship in Stamford, Connecticut. He received his Ph.D. in Electrical and Computer Engineering from Mississippi State University.
The inventive system can automatically annotate the relationship of text and acoustic units for the purposes of: (a) predicting how the text is to be pronounced as expressively synthesized speech, and (b) improving the proportion of expressively uttered speech as correctly identified text representing the speaker’s message. The system can automatically annotate text corpora for relationships of uttered speech for a particular speaking style and for acoustic units in terms of context and content of the text to the utterances. The inventive system can use kinesthetically defined expressive speech production phonetics that are recognizable and controllable according to kinesensic feedback principles. In speech synthesis embodiments of the invention, the text annotations can specify how the text is to be expressively pronounced as synthesized speech. Also, acoustically-identifying features for dialects or mispronunciations can be identified so as to expressively synthesize alternative dialects or stylistic mispronunciations for a speaker from a given text. In speech recognition embodiments of the invention, each text annotation can be uniquely identified from the corresponding acoustic features of a unit of uttered speech to correctly identify the corresponding text. By employing a method of rules-based text annotation, the invention enables expressiveness to be altered to reflect syntactic, semantic, and/or discourse circumstances found in text to be synthesized or in an uttered message.
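The rules-based annotation described above can be illustrated with a minimal sketch. The rule patterns, tag names, and `annotate` function below are hypothetical placeholders, not the patent's actual annotation scheme; they show only the general idea of mapping syntactic and discourse cues in text to expressive-speech annotations.

```python
import re

# Hypothetical rules: each pairs a surface cue in the text with a prosodic
# annotation. Real systems would use far richer syntactic, semantic, and
# discourse analysis; these regexes are purely illustrative.
RULES = [
    (re.compile(r"\?$"), "rising-intonation"),    # questions rise in pitch
    (re.compile(r"!$"), "emphatic-stress"),       # exclamations get stress
    (re.compile(r"^(However|But)\b"), "contrastive-pause"),  # discourse contrast
]

def annotate(sentence: str) -> list[str]:
    """Return the prosodic annotations triggered by surface cues."""
    tags = [tag for pattern, tag in RULES if pattern.search(sentence)]
    return tags or ["neutral"]

print(annotate("Is the claim covered?"))       # ['rising-intonation']
print(annotate("However, the policy lapsed."))
```

In a synthesis pipeline these tags would drive how each unit is pronounced; in recognition, the same tag inventory would be matched against acoustic features of the utterance.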
Disclosed are novel embodiments of a speech synthesizer and speech synthesis method for generating human-like speech wherein a speech signal can be generated by concatenation from phonemes stored in a phoneme database. Wavelet transforms and interpolation between frames can be employed to effect smooth morphological fusion of adjacent phonemes in the output signal. The phonemes may have one prosody or set of prosody characteristics, and one or more alternative prosodies may be created by applying prosody modification parameters to the phonemes from a differential prosody database. Preferred embodiments can provide fast, resource-efficient speech synthesis with an appealing musical or rhythmic output in a desired prosody style such as reportorial or human interest. The invention includes computer-determining a suitable prosody to apply to a portion of the text by reference to the determined semantic meaning of another portion of the text and applying the determined prosody to the text by modification of the digitized phonemes. In this manner, prosodization can effectively be automated.
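The differential-prosody idea above can be sketched as follows: phonemes are stored with one base prosody, and per-style modification parameters transform them into an alternative prosody. The style names, scale factors, and data layout are illustrative assumptions, not values or structures from the patent.

```python
from dataclasses import dataclass

@dataclass
class Phoneme:
    symbol: str
    pitch_hz: float
    duration_ms: float

# Hypothetical "differential prosody database": multiplicative deltas that
# turn the stored base prosody into a target style. The two styles mirror
# the abstract's examples; the numbers are made up for illustration.
PROSODY_DELTAS = {
    "reportorial":    {"pitch": 0.95, "duration": 0.90},  # flatter, brisker
    "human_interest": {"pitch": 1.10, "duration": 1.15},  # warmer, slower
}

def apply_prosody(phonemes, style):
    """Apply a style's modification parameters to base-prosody phonemes."""
    delta = PROSODY_DELTAS[style]
    return [Phoneme(p.symbol,
                    p.pitch_hz * delta["pitch"],
                    p.duration_ms * delta["duration"]) for p in phonemes]

base = [Phoneme("AH", 120.0, 80.0), Phoneme("L", 118.0, 60.0)]
styled = apply_prosody(base, "human_interest")
```

Storing one base prosody plus compact deltas, rather than a full phoneme inventory per style, is what makes this approach resource-efficient: the same database serves every style.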
A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.
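The phase-state analysis described above can be illustrated with a small, dependency-free sketch: recover the pitch of an acoustic unit from the slope of its instantaneous phase. For real speech the quadrature component would come from a Hilbert transform of the recorded waveform; here a synthetic tone is used so both components are known in closed form. The sample rate, frequency, and frame length are arbitrary illustrative choices.

```python
import math

SAMPLE_RATE = 16_000  # Hz, illustrative
FREQ = 200.0          # pitch of the synthetic "acoustic unit"
N = 400               # 25 ms of samples

# Instantaneous phase of the analytic signal cos(wt) + j*sin(wt).
phase = [math.atan2(math.sin(2 * math.pi * FREQ * i / SAMPLE_RATE),
                    math.cos(2 * math.pi * FREQ * i / SAMPLE_RATE))
         for i in range(N)]

def unwrap(p):
    """Remove the 2*pi jumps so the phase grows continuously over time."""
    out = [p[0]]
    for v in p[1:]:
        d = v - out[-1]
        while d > math.pi:
            d -= 2 * math.pi
        while d < -math.pi:
            d += 2 * math.pi
        out.append(out[-1] + d)
    return out

u = unwrap(phase)
# Pitch recovered from the phase slope: (dphi/dt) / (2*pi).
est_freq = (u[-1] - u[0]) * SAMPLE_RATE / (2 * math.pi * (N - 1))
```

Amplitude and duration can be measured against the same phase reference at predetermined time intervals, which is what lets a single phase-state representation serve as the common element for all of the acoustic parameters the method analyzes.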