Long Short-Term Memory


Recurrent Neural Network (RNN)

RNNs are designed for sequential data like time series, text, or audio. They have a looping architecture that allows information to persist across time steps, enabling the network to learn temporal patterns.

Key Features of RNN:

  • Recurrent connections: Information from the previous time step feeds into the current computation.
  • Vanishing gradient problem: RNNs struggle to learn long-term dependencies because gradients shrink exponentially as they are propagated back through many time steps (illustrated in the sketch after this list).
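To make the recurrence concrete, here is a minimal sketch of a single-unit vanilla RNN in NumPy. The weight values are hypothetical, chosen only to show how the hidden state carries information forward.

```python
import numpy as np

# Minimal sketch of a single-unit vanilla RNN, with hypothetical
# weights chosen only to illustrate the recurrence.
W_x, W_h, b = 0.5, 0.8, 0.0   # input weight, recurrent weight, bias

def rnn_step(x_t, h_prev):
    # The recurrent connection: the previous hidden state h_prev
    # feeds back into the current computation.
    return np.tanh(W_x * x_t + W_h * h_prev + b)

h = 0.0
for t, x_t in enumerate([1.0, 2.0, 3.0]):
    h = rnn_step(x_t, h)
    print(f"step {t}: h = {h:.4f}")

# During backpropagation, the gradient of a late hidden state with
# respect to an early input accumulates repeated factors of
# W_h * (1 - h_t**2); when these are below 1 in magnitude, the gradient
# shrinks exponentially with distance -- the vanishing gradient problem.
```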

Long Short-Term Memory (LSTM)

LSTMs are a special type of RNN designed to mitigate the vanishing gradient problem. They achieve this by introducing gates that control the flow of information through the network.

Key Features of LSTM:

  1. Forget Gate: Decides what information to discard from the cell state.
  2. Input Gate: Decides what new information to store in the cell state.
  3. Cell State: Acts as the long-term memory of the LSTM.
  4. Output Gate: Decides what part of the memory to pass as the output.

Components of Long Short-Term Memory (LSTM)
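Putting these components together, the standard gate equations are as follows, where $\sigma$ is the sigmoid function, $x_t$ is the current input, $h_{t-1}$ is the previous hidden state, $C_{t-1}$ is the previous cell state, $\odot$ is element-wise multiplication, and the $W$ and $b$ terms are learned weights and biases:

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) && \text{(candidate values)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state / output)}
\end{aligned}
$$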

Real-Life Example of What to Keep and Discard in LSTM

Let’s take a real-world example of predicting customer churn for a subscription-based service like Netflix. The goal is to determine if a customer will cancel their subscription based on their activity and behavior over time.


Scenario:

We have sequential data about a customer’s activity over the last 6 months. The features include:

  • Number of movies watched (Activity Level)
  • Customer satisfaction score (Feedback)
  • Days since last login (Engagement)

The LSTM must decide which information is important to keep for the prediction and which is irrelevant and can be discarded.
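Before walking through the gates, here is a hedged sketch of how this scenario maps onto an LSTM's input in PyTorch: 6 months of history, 3 features per month. All numbers, the layer sizes, and the sigmoid-on-final-hidden-state readout are hypothetical choices for illustration; an untrained model's output is meaningless beyond showing the data flow.

```python
import torch
import torch.nn as nn

# Hypothetical 6-month customer history: one row per month, with
# (movies watched, satisfaction score, days since last login).
customer_history = torch.tensor([[   # shape: (batch=1, seq_len=6, features=3)
    [12.0, 4.5,  2.0],
    [10.0, 4.0,  3.0],
    [ 0.0, 3.5, 30.0],   # an inactive month
    [ 8.0, 3.8,  5.0],
    [ 9.0, 2.1,  4.0],   # satisfaction drops sharply
    [ 3.0, 1.8, 12.0],
]])

lstm = nn.LSTM(input_size=3, hidden_size=8, batch_first=True)
classifier = nn.Linear(8, 1)

outputs, (h_n, c_n) = lstm(customer_history)
churn_logit = classifier(h_n[-1])        # read out the final hidden state
churn_prob = torch.sigmoid(churn_logit)  # squash to a probability
print(f"predicted churn probability: {churn_prob.item():.3f}")
```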


Steps in the LSTM Process:

1. Forget Gate – What to Discard?

The LSTM looks at the historical data and determines if some older behaviors are no longer relevant to predicting churn. For example:

  • If a customer was inactive three months ago but recently became active again, the “inactivity from 3 months ago” is discarded as it’s no longer useful.

2. Input Gate – What to Add?

The model examines recent activity to decide what new information to store. For example:

  • If the customer’s satisfaction score dropped significantly this month, the LSTM will add this information to the memory because it’s crucial for churn prediction.

3. Cell State – Updated Memory

The forget gate and input gate work together to update the memory:

  • The update removes outdated information (like the old inactivity data) and incorporates new patterns (like recent dissatisfaction or a sudden drop in movie-watching).

4. Output Gate – What to Use?

Finally, the LSTM decides what part of the memory is important for the current prediction. For example:

  • If the recent drop in satisfaction score and increase in days since last login are strong indicators of churn, these will be passed to the next step for the prediction.

To see these mechanics end to end, consider the 1-dimensional sequence x = [1, 2, 3] processed by a single hidden unit across 3 time steps, covering the forward pass, a loss calculation, and backpropagation.
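A hedged sketch of that computation in PyTorch follows, with the LSTM cell written out by hand so each gate is visible. The random initial weights, the target value of 1.0, and the squared-error loss are all hypothetical choices; a trained model would produce different numbers.

```python
import torch

torch.manual_seed(0)

# One weight vector per gate, applied to [h_{t-1}, x_t], plus a bias.
W = {g: torch.randn(2, requires_grad=True) for g in ("f", "i", "c", "o")}
b = {g: torch.zeros(1, requires_grad=True) for g in ("f", "i", "c", "o")}

def lstm_step(x_t, h_prev, c_prev):
    z = torch.stack([h_prev, x_t])             # concatenate [h_{t-1}, x_t]
    f = torch.sigmoid(W["f"] @ z + b["f"])     # forget gate
    i = torch.sigmoid(W["i"] @ z + b["i"])     # input gate
    c_tilde = torch.tanh(W["c"] @ z + b["c"])  # candidate values
    c = f * c_prev + i * c_tilde               # cell state update
    o = torch.sigmoid(W["o"] @ z + b["o"])     # output gate
    h = o * torch.tanh(c)                      # hidden state
    return h.squeeze(), c.squeeze()

# Forward pass over x = [1, 2, 3].
h, c = torch.tensor(0.0), torch.tensor(0.0)
for t, x_t in enumerate(torch.tensor([1.0, 2.0, 3.0])):
    h, c = lstm_step(x_t, h, c)
    print(f"t={t}: h={h.item():.4f}, c={c.item():.4f}")

# Loss calculation against a hypothetical target of 1.0.
target = torch.tensor(1.0)
loss = (h - target) ** 2
print(f"loss = {loss.item():.4f}")

# Backpropagation through time: gradients flow back through all 3 steps.
loss.backward()
print(f"dL/dW_f = {W['f'].grad}")
```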

 
