
Deep Learning Architectures: CNNs, RNNs, and Transformers

November 28, 2024
12 min read
Deep Learning · Neural Networks · PyTorch · Transformers

Deep Learning Architectures

What You Need to Learn

  1. Neural Network Basics: Neurons, layers, backpropagation
  2. CNNs: Convolutional Neural Networks for images
  3. RNNs/LSTMs: Recurrent networks for sequences
  4. Transformers: Attention mechanism, BERT, GPT
  5. Training Techniques: Batch normalization, dropout, learning rate scheduling

ELI5: What are Neural Networks?

Your brain learning to recognize your friend's face:

  1. Input Layer = Your eyes see features (hair color, eye shape, nose)
  2. Hidden Layers = Your brain combines features ("brown hair + blue eyes + small nose")
  3. Output Layer = Recognition! "That's Sarah!"

Artificial Neural Network:

  • Same idea, but with math
  • Each "neuron" is a simple calculation
  • Many layers of neurons = "deep" learning
  • Learns by adjusting connections (weights)

Example: Teaching a network to recognize cats:

  • Show 1000 cat pictures → network adjusts weights
  • Show 1000 non-cat pictures → network adjusts more
  • New picture → network predicts: cat or not cat!
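
In code, a tiny version of this is just a few layers. Here is a minimal PyTorch sketch (the feature count, layer sizes, and random data are illustrative assumptions, not a real cat detector):

# Example: tiny feedforward "cat vs. not cat" classifier over 10 pre-extracted features
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(10, 16),   # input features -> hidden layer
    nn.ReLU(),           # non-linearity lets layers combine features
    nn.Linear(16, 1),    # hidden layer -> single output score
    nn.Sigmoid(),        # squash to a 0..1 "cat probability"
)

x = torch.randn(4, 10)   # a batch of 4 made-up examples
print(net(x).shape)      # torch.Size([4, 1])

Training adjusts the weights inside the two Linear layers, exactly like the "show 1000 pictures, adjust weights" loop above.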

System Design: Neural Network Architecture Types

┌────────────────────────────────────────────┐
│  Feedforward Neural Network (FNN)          │
│  Input → Hidden → Hidden → Output          │
│  Use: Tabular data, simple classification  │
└────────────────────────────────────────────┘

┌────────────────────────────────────────────┐
│  Convolutional Neural Network (CNN)        │
│  Conv → Pool → Conv → Pool → Dense         │
│  Use: Images, spatial data                 │
└────────────────────────────────────────────┘

┌────────────────────────────────────────────┐
│  Recurrent Neural Network (RNN/LSTM)       │
│  Input[t] → Hidden[t] → Output[t]          │
│                 ↓                          │
│  Input[t+1] → Hidden[t+1] → Output[t+1]    │
│  Use: Time series, text sequences          │
└────────────────────────────────────────────┘

┌────────────────────────────────────────────┐
│  Transformer Architecture                  │
│  Input → Embedding → Attention →           │
│       → Feed Forward → Output              │
│  Use: NLP, translation, GPT/BERT           │
└────────────────────────────────────────────┘

CNN: Understanding Convolutional Layers

Why CNNs for Images?

Traditional neural network: 1000x1000 image = 1M pixels = 1M weights per neuron = HUGE!

CNN Solution: Local patterns matter more

Image (cat):
┌─────────────┐
│ ╱\_╱\_       │  ← Ears (pattern)
│ (• . •)     │  ← Eyes (pattern)
│  > ^ <      │  ← Whiskers (pattern)
└─────────────┘

Convolution:
- Small filter slides across image
- Detects edges, then shapes, then objects
- Shares weights (efficient!)

Layers:

  1. Conv Layer: Detect features (edges, textures)
  2. Pooling: Reduce size, keep important info
  3. Conv Layer: Detect higher features (eyes, ears)
  4. Pooling: Reduce more
  5. Dense: Final classification
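
Putting those five steps together in PyTorch, a minimal sketch (the channel counts, 32x32 input size, and two output classes are illustrative assumptions):

# Example: small CNN following the Conv -> Pool -> Conv -> Pool -> Dense pattern
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 1. detect low-level features (edges)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 2. 32x32 -> 16x16, keep strongest responses
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 3. detect higher-level features
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 4. 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 2),                     # 5. final classification: cat / not cat
)

x = torch.randn(1, 3, 32, 32)   # one RGB image, 32x32 pixels
print(cnn(x).shape)             # torch.Size([1, 2])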

LSTM: Handling Sequential Data

Problem with basic RNNs: Forget long-term context

LSTM (Long Short-Term Memory): Remembers important info, forgets irrelevant

# Example: LSTM for time-series forecasting
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        # batch_first=True means inputs are shaped (batch, seq_len, features)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)  # map final hidden state to one value

    def forward(self, x):
        lstm_out, _ = self.lstm(x)                 # (batch, seq_len, hidden_size)
        predictions = self.fc(lstm_out[:, -1, :])  # keep only the last time step
        return predictions
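
Hypothetical usage (the batch size, sequence length, and feature count are illustrative assumptions):

import torch

model = LSTMModel(input_size=8, hidden_size=64, num_layers=2)
x = torch.randn(32, 50, 8)   # (batch=32, seq_len=50, features=8)
print(model(x).shape)        # torch.Size([32, 1]) -- one prediction per sequence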

Use Cases:

  • Stock price prediction
  • Language translation
  • Speech recognition
  • Weather forecasting

Transformer Architecture

Revolutionary idea: Attention is all you need!

Problem: RNNs process sequentially (slow)

Solution: Process all words simultaneously, use "attention" to find relationships

Sentence: "The cat sat on the mat"

Attention Mechanism:
- "sat" pays attention to "cat" (who sat?)
- "sat" pays attention to "mat" (sat where?)
- Learns relationships without sequential processing
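
A minimal sketch of that computation, single-head scaled dot-product self-attention over random embeddings (the sequence length and embedding size are illustrative assumptions):

# Example: scaled dot-product self-attention, the core Transformer operation
import torch
import torch.nn.functional as F

seq_len, d_model = 6, 16            # 6 tokens ("The cat sat on the mat"), 16-dim embeddings
x = torch.randn(seq_len, d_model)   # token embeddings (random stand-ins)

# Learned projections would produce queries, keys, and values; random here
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / d_model ** 0.5     # how strongly each word attends to every other word
weights = F.softmax(scores, dim=-1)   # each row sums to 1: one attention pattern per word
output = weights @ V                  # each word becomes a weighted mix of all words
print(weights.shape, output.shape)    # torch.Size([6, 6]) torch.Size([6, 16])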

Key Components:

  1. Self-Attention: Words relate to other words
  2. Multi-Head Attention: Multiple attention patterns
  3. Positional Encoding: Remember word order
  4. Feed Forward: Standard neural network layers

Famous Transformers:

  • BERT: Bidirectional Encoder Representations from Transformers (understanding)
  • GPT: Generative Pre-trained Transformer (generation)
  • T5: Text-to-Text Transfer Transformer
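
To experiment with these models directly, one common route is the Hugging Face transformers library (an extra dependency, assumed installed here):

# Example: load a pre-trained BERT encoder and run one sentence through it
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, number_of_tokens, 768)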

Training Deep Networks

Challenges

Vanishing Gradients: Deep networks stop learning because gradients shrink toward zero as they flow back through many layers. Solutions:

  • Batch Normalization
  • Residual Connections (ResNet)
  • Better activations (ReLU, GELU)

Overfitting: The network memorizes the training data instead of generalizing. Solutions:

  • Dropout (randomly turn off neurons)
  • Data augmentation
  • Early stopping
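
In code, batch normalization and dropout are just extra layers dropped into the model; a minimal sketch (the layer sizes and 0.5 dropout rate are illustrative assumptions):

# Example: batch normalization + dropout inside a small classifier
import torch.nn as nn

regularized_net = nn.Sequential(
    nn.Linear(64, 128),
    nn.BatchNorm1d(128),   # keeps activations well-scaled so gradients don't vanish
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zeroes neurons during training to fight overfitting
    nn.Linear(128, 10),
)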

Best Practices

# Example training loop
import torch
import torch.nn as nn
import torch.optim as optim

model = MyModel()  # MyModel, dataloader, and num_epochs are defined elsewhere
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(num_epochs):
    for batch in dataloader:
        # Forward pass: compute predictions and loss
        outputs = model(batch['input'])
        loss = criterion(outputs, batch['labels'])

        # Backward pass: clear old gradients, backpropagate, update weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
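
Learning rate scheduling (from the topics list above) plugs into the same loop; a sketch using PyTorch's built-in StepLR with an illustrative decay schedule:

# Example: decay the learning rate by 10x every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(num_epochs):
    for batch in dataloader:
        ...                   # same forward/backward steps as above
    scheduler.step()          # update the learning rate once per epoch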

Real-World Example

At UNT, energy consumption forecasting:

  • LSTM + GRU hybrid architecture
  • Input: Temperature, occupancy, time features
  • 15% better accuracy than traditional models
  • SHAP for interpretability (which features matter?)

Choosing the Right Architecture

Data Type   | Architecture | Example
----------- | ------------ | -----------------
Images      | CNN          | Face recognition
Text        | Transformer  | ChatGPT
Time Series | LSTM/GRU     | Stock prediction
Tabular     | FNN          | Fraud detection
Audio       | CNN + RNN    | Speech-to-text