Back in college I fell in love with the process behind text markup and with SPITBOL, the programming tool we used to analyze texts. Text analysis was just a starting point for me, and I have since outlined a much grander project, which has gone nowhere.
I have this theory that human speech isn’t just a repetition of words but, when done confidently, a subtle musical flow that can be extracted and then applied to texts. By finding that flow of music, which is a combination of the sounds in each English word and the pauses and gaps between words, one could create a text-to-speech system that would sound completely natural to anyone listening to it.
In my spare time at college I would take the code I knew and try to find out everything about how it functioned and what else I could do with it to serve this new idea. SPITBOL was designed so that it would allow me to access external programs and include them in the final compiled code. This gave me the flexibility to write a program that would build my own dictionary of the initial phonetic sounds of each word.
The process at this point is simple. Take a list of all the words in the English language (downloaded from a website as a text file), have a program look up each word on a dictionary website and find the corresponding phonetic breakdown, and then write both the word and the breakdown to a new list.
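A minimal sketch of that step, written in Python rather than SPITBOL. The names `build_phonetic_list`, `format_entries`, and `SAMPLE_DICTIONARY` are made up for illustration, and the dictionary website is replaced by a tiny hand-made table so the example runs offline; a real version would pass in a lookup function that queries whatever site is actually used.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def build_phonetic_list(
    words: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> List[Tuple[str, str]]:
    """Pair each word with its phonetic breakdown, skipping words with none."""
    entries = []
    for raw in words:
        word = raw.strip().lower()
        if not word:
            continue  # ignore blank lines in the word list
        breakdown = lookup(word)
        if breakdown is not None:
            entries.append((word, breakdown))
    return entries


def format_entries(entries: List[Tuple[str, str]]) -> str:
    """Render the new list as one 'word<TAB>breakdown' line per entry."""
    return "\n".join(f"{word}\t{breakdown}" for word, breakdown in entries)


# Assumption: a stand-in for the dictionary website (illustrative data only).
SAMPLE_DICTIONARY = {"cat": "K AE T", "dog": "D AO G"}

entries = build_phonetic_list(["Cat", "", "dog", "xyzzy"], SAMPLE_DICTIONARY.get)
print(format_entries(entries))
```

Words the lookup cannot find (like `xyzzy` here) are simply left out of the new list, which is one possible choice; logging them for a later manual pass would work just as well.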
Another list, of the phonetic pronunciations of letters and letter combinations, would need similar treatment, but instead of just grabbing a corresponding list of data from online, one would need to create a series of musical notes behind those sounds and then apply them to the list of words from earlier. This would then give each word its own musical score.
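One way this mapping could look, again as a rough Python sketch. The note table is entirely invented: each sound is assigned a MIDI note number and a length in beats just to show the shape of the data, and `PHONEME_NOTES`, `score_for_breakdown`, and `build_scores` are hypothetical names.

```python
# Assumption: an invented table mapping each sound to (MIDI note number, beats).
PHONEME_NOTES = {
    "K": (60, 0.25),
    "AE": (64, 0.5),
    "T": (62, 0.25),
    "D": (59, 0.25),
    "AO": (65, 0.5),
    "G": (57, 0.25),
}


def score_for_breakdown(breakdown, notes=PHONEME_NOTES):
    """Turn a space-separated phonetic breakdown into a list of notes.

    Sounds missing from the table are skipped rather than raising an error.
    """
    return [notes[sound] for sound in breakdown.split() if sound in notes]


def build_scores(entries, notes=PHONEME_NOTES):
    """Apply the note table to every (word, breakdown) pair."""
    return {word: score_for_breakdown(breakdown, notes) for word, breakdown in entries}


scores = build_scores([("cat", "K AE T"), ("dog", "D AO G")])
print(scores["cat"])  # [(60, 0.25), (64, 0.5), (62, 0.25)]
```

The hard part of the idea, choosing which notes actually belong behind which sounds, is exactly what this table hides; the code only shows how the choice would be applied once made.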
From that preliminary data then, the main program could read any text and play back the resulting musical representation of it. You would be able to “hear” the words but not the actual pronunciation of the words.
Of course there would need to be a little more code to handle the various punctuation marks to add pauses and inflections as well.
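Putting the reading step and the punctuation handling together might look something like the Python sketch below. It assumes each word already has a score (the small `WORD_SCORES` table is made up), treats a rest as a note with no pitch, and uses invented pause lengths. Inflection is not handled at all, and words without a score are silently skipped.

```python
import re

REST = None  # a rest is represented as a "note" with no pitch

# Assumption: invented pause lengths, in beats, for each punctuation mark.
PAUSES = {",": 0.5, ";": 0.75, ":": 0.75, ".": 1.0, "?": 1.0, "!": 1.0}

# Assumption: a tiny hand-made table of word scores as (MIDI note, beats).
WORD_SCORES = {
    "hello": [(60, 0.25), (64, 0.5)],
    "world": [(62, 0.25), (65, 0.5)],
}


def text_to_score(text, word_scores=WORD_SCORES, pauses=PAUSES):
    """Read a text and return its notes, with rests where punctuation falls."""
    score = []
    # Split the text into words and the punctuation marks we know how to pause on.
    for token in re.findall(r"[A-Za-z']+|[,;:.?!]", text):
        if token in pauses:
            score.append((REST, pauses[token]))
        else:
            score.extend(word_scores.get(token.lower(), []))
    return score


print(text_to_score("Hello, world."))
# [(60, 0.25), (64, 0.5), (None, 0.5), (62, 0.25), (65, 0.5), (None, 1.0)]
```

Actually playing the result back would need an audio or MIDI library, which is left out here to keep the sketch self-contained.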
That was the start of my project, but things in life changed such that I never spent the time I should have on developing it. Now, years later, I think about how this idea could be expanded with some of the new neural network and deep learning tools available. I wonder whether, by combining my musical extractor with some code that learns, I could create a program that could not only read any text and sound completely natural but also listen to others speaking and hear exactly what is being said.
Right now, if you interact with a phone system that is operated by a computer, it listens for keywords and then responds with new menus. This unnatural way of communicating is always awkward to deal with, and you are left feeling a little off because you really wanted to talk to a person.
But if you took my system, you could build something that would not only listen to your entire speech but also know how to pull the key elements out of it and carry on a meaningful, albeit artificial, conversation. I know this sounds dangerous as well, but my intentions for it are a bit more personal. I want a computer at my house to act like Jarvis from the Iron Man movies. Something that has an air of intelligence behind it.
And despite my dreams and my ideas, my lack of progress has let others find the same path and create the solutions ahead of me. Adobe has created a program called VoCo that can listen to a person speak for about twenty minutes and then not only create a transcript of what was said but also give someone the ability to change the speech by just changing the text. With just a good sample of speech, you can make anyone you like (or don’t like) say anything you want them to say.
Who knows what other idea I’ve been sitting on for the last ten years that someone else is about to release? This is why I’ve decided to just write about my ideas. Otherwise they just fester and become nothing, so I’m letting the small group of people I interact with see what is in my treasure chest and at least getting it out there, even if it goes nowhere.