The Backbone of Deep Learning: Neural Networks
Finding extremely complex patterns in vast volumes of data.
Neural networks are essentially algorithms used in deep learning. Deep learning is a subset of machine learning, which is itself a subset of artificial intelligence. In this article, I try to simplify the concept of neural networks, and the more specific neural network architecture used by Google’s Project Euphonia, which I will introduce in a bit.
I first became interested in deep learning and neural networks after watching an episode of the YouTube Original learning series “Age of AI.” This particular episode recounts the story of Tim Shaw, who had dedicated the earlier half of his life to becoming an NFL player. At 23, he achieved his dream of turning professional when he was drafted as an NFL linebacker. But in 2013, six years after his draft, his body began to deteriorate due to Amyotrophic Lateral Sclerosis (ALS, also known as Lou Gehrig’s disease).
Since then, Shaw has become a powerful campaigner for the ALS Therapy Development Institute (ALS TDI). The episode shows how AI researchers from Google’s Project Euphonia are working with ALS TDI and Shaw to improve automated speech recognition (ASR) technology for people with speaking impairments. The technology uses “datasets of audio from both native and non-native English speakers with neurodegenerative diseases and techniques from Parrotron, an AI tool for people with impediments.” (Joel Shor and Dotan Emanuel, Google AI blog) Project Euphonia’s aim is to drastically improve the quality of speech synthesis and generation.
Training Model for Project Euphonia
Their approach uses a two-step training process: a model is first trained on a large, robust initial dataset and is then fine-tuned with a personalized speech dataset. This fine-tuning phase involves much sparser data than the initial phase, since it is difficult, perhaps even impossible, to get the same amount of recorded speech from a single speaker, especially if they have medical conditions.
To explain this a bit more simply, the initial model for the ASR technology is fed lots and lots of recorded speech from various speakers with ALS in the training process. Then, the model is fed limited recordings of Shaw from before his speech impairment, allowing it to adapt to his personalized voice while still using the initial dataset as a sort of foundation.
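To make the two-step idea a bit more concrete, here is a toy sketch in Python. It is not Project Euphonia’s actual model or data: I am assuming a tiny one-variable linear model, made-up numbers, and hypothetical helper names (train, mean_squared_error). The only point is to show a model being trained on a large “general” dataset and then nudged with a handful of “personal” examples.

```python
def train(weight, bias, data, learning_rate, epochs):
    """Fit y = weight * x + bias to (x, y) pairs with plain gradient descent."""
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to weight and bias.
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in data) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in data) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


def mean_squared_error(weight, bias, data):
    """Average squared difference between predictions and targets."""
    return sum((weight * x + bias - y) ** 2 for x, y in data) / len(data)


# Step 1: train from scratch on a large, general dataset (made up: y = 2x + 1).
general_data = [(i / 100, 2 * (i / 100) + 1) for i in range(100)]
base_w, base_b = train(0.0, 0.0, general_data, learning_rate=0.1, epochs=2000)

# Step 2: fine-tune on a much smaller, personal dataset (made up: y = 2x + 1.5),
# starting from the weights learned in step 1 instead of from zero.
personal_data = [(x, 2 * x + 1.5) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]
tuned_w, tuned_b = train(base_w, base_b, personal_data, learning_rate=0.05, epochs=200)

base_error = mean_squared_error(base_w, base_b, personal_data)
tuned_error = mean_squared_error(tuned_w, tuned_b, personal_data)
print("error on personal data before fine-tuning:", base_error)
print("error on personal data after fine-tuning:", tuned_error)
```

In this sketch, the fine-tuned model fits the five personal examples better than the base model does, even though it only ever saw those five points in the second step. The general dataset did most of the work; the personal data just adjusted the result.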
Let’s pause here and go back to the basics. What is a neural network? In fact, what is machine learning and deep learning?
Machine Learning & Deep Learning
Machine learning reminds me of how, in Blade Runner, replicants are implanted with artificial memories so that humans can control them better. Similarly, machine learning is a branch of AI that gives machines pattern-recognition abilities by supplying them with a “memory bank,” a.k.a. datasets. In practice, this means algorithms that learn from data and apply what they have learned to solve new problems.
Then what is deep learning? Deep learning is an evolution of machine learning. The key distinction between machine learning and deep learning is that the latter relies on neural networks, a layered structure of algorithms. The concept is inspired by the biological neural network of the human brain, and it allows machines to learn much more efficiently than standard machine learning models.
Neural networks consist of an input layer, an output layer, and any number of hidden layers. Any neural network with more than one hidden layer is considered a deep learning model.
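At the most basic level, the neurons that make up these neural networks have weights and a bias (or threshold). The algebraic formula would look something like this:
Y-hat = (X1*W1) + (X2*W2) + ... + (Xn*Wn) + bias
Output = 1 if Y-hat is greater than zero; otherwise 0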
If you’re anything like me, this equation looks like pure gibberish at first. But don’t worry, it will make a lot more sense after my example, for which I took inspiration from IBM’s blog post on machine learning.
First, let’s say you’re deciding whether or not to go home for the holidays during the pandemic, and there are three main factors:
- If it will be safe enough for you to travel during a pandemic (Yes: 1; No: 0)
- If it will be safe for the rest of your family (Yes: 1; No: 0)
- If you will save money (Yes: 1; No: 0)
Then, let’s assign the input values according to your specific situation:
- X1 = 1, since you will be driving to the gathering place by yourself in your own car.
- X2 = 1, since you and your family have decided to get tested before the gathering.
- X3 = 0, since it is a long drive and you will need to spend money on gas and also pay for a cat sitter since your cat hates long car rides.
Moving on, we now need to assign some weights to determine importance. Larger weights make a single input’s contribution to the output more significant compared to other inputs.
- W1 = 5, since you don’t want to contract the virus
- W2 = 5, since you value the safety of your family
- W3 = 1, since saving money isn’t really important to you right now.
Finally, we’ll also assume a threshold value of 5, which would translate to a bias value of -5. Now we plug the numbers into the equation above.
Y-hat (our predicted outcome) = Decide to see family for the holidays or not.
Y-hat = (1*5) + (1*5) + (0*1) - 5
Y-hat = 5 + 5 + 0 - 5
Y-hat = 5, which is greater than zero.
Since Y-hat is 5, the output from the activation function will be 1, meaning that you will visit family for the holidays.
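If you would rather see the arithmetic above as code, here is a minimal Python sketch of that single neuron. The function name neuron_output is my own invention for illustration, and I am assuming the simple “greater than zero” rule from this example as the activation function.

```python
def neuron_output(inputs, weights, bias):
    """Return (weighted_sum, decision) for one neuron with a step activation."""
    # Multiply each input by its weight, add everything up, then add the bias.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step activation: the neuron "fires" (1) only if the sum is above zero.
    decision = 1 if weighted_sum > 0 else 0
    return weighted_sum, decision


inputs = [1, 1, 0]   # X1, X2, X3 from the holiday example
weights = [5, 5, 1]  # W1, W2, W3
bias = -5            # a threshold of 5 expressed as a bias

y_hat, decision = neuron_output(inputs, weights, bias)
print(y_hat, decision)  # prints: 5 1
```

Try flipping the inputs to [0, 0, 1] (not safe for you, not safe for your family, but cheap): the weighted sum becomes -4, so the neuron outputs 0 and you stay home.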
If a node’s weighted sum exceeds the threshold (in other words, if the sum plus the bias is greater than zero), that node is activated, and its output is sent on to the next layer of the network. Depending on the number of hidden layers in the deep learning model, data from activated nodes is passed through multiple layers, and the last layer produces the final output of the neural network.
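Here is a rough Python sketch of how outputs flow from one layer to the next. All the weights and biases are numbers I made up for illustration, the function names are hypothetical, and I am sticking with the simple on/off activation from this post (real networks typically use smoother functions such as sigmoid or ReLU).

```python
def step(z):
    """Step activation: 1 if the value is above zero, otherwise 0."""
    return 1 if z > 0 else 0


def layer_forward(inputs, weight_rows, biases):
    """Compute the outputs of every node in one layer."""
    # Each node has its own row of weights and its own bias.
    return [
        step(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weight_rows, biases)
    ]


def network_forward(inputs, layers):
    """Pass the inputs through each layer in turn and return the final outputs."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer_forward(activations, weight_rows, biases)
    return activations


# A made-up network: 3 inputs -> 2 hidden nodes -> 1 output node.
hidden_layer = ([[5, 5, 1], [1, 1, 5]], [-5, -5])
output_layer = ([[1, 1]], [-0.5])
layers = [hidden_layer, output_layer]

result = network_forward([1, 1, 0], layers)
print(result)  # prints: [1]
```

With the holiday inputs, the first hidden node fires (5 + 5 + 0 - 5 = 5) while the second does not (1 + 1 + 0 - 5 = -3), so the hidden layer passes [1, 0] to the output node, which fires because 1 - 0.5 is greater than zero.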
Although this is an extremely simplified version of deep learning algorithms, I think it helps get a sense of the approach behind neural networks.
I’ll end the post for now with this colorful chart of some different types of neural network architectures…
Sources:
Joel Shor and Dotan Emanuel, Research Engineers, Google Research, Tel Aviv, Google AI blog, https://ai.googleblog.com/2019/08/project-euphonias-personalized-speech.html