Large Language Models (LLMs): GPT, BERT, and Beyond
November 20, 2024
11 min read
Tags: LLM, GPT, BERT, NLP, AI
Large Language Models (LLMs)
What You Need to Learn
- Transformer Architecture: Self-attention, encoder-decoder
- Pre-training Methods: Masked language modeling, causal language modeling
- Fine-tuning: Task-specific adaptation
- Prompt Engineering: Zero-shot, few-shot, chain-of-thought
- Deployment: API vs self-hosted, cost optimization
ELI5: What are Large Language Models?
Imagine a student who read the entire internet:
Traditional Program:
- You: "Translate 'hello' to Spanish"
- Computer: "Hola" (looks up in dictionary)
Large Language Model (LLM):
- Computer read billions of webpages
- Learned patterns in language
- You: "Translate 'hello' to Spanish"
- Computer: "Based on patterns I've seen, it's 'Hola'"
Magic: LLM can do tasks it was never explicitly trained for!
How? By understanding language patterns:
- Grammar rules (without being told)
- Context (what comes before/after)
- Relationships (synonyms, antonyms)
- Even reasoning (to some extent!)
System Design: LLM Architecture
┌─────────────────────────────────────────┐
│ Pre-training Phase │
│ Internet Text → Transformer → │
│ → Base Model (billions of params) │
└─────────────┬───────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Fine-tuning Phase │
│ Task-specific data → │
│ → Specialized Model │
└─────────────┬───────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Inference Phase │
│ User Prompt → Model → │
│ → Generated Response │
└─────────────────────────────────────────┘
Model Components:
┌─────────────────────────────────────────┐
│ Input Text │
│ ↓ │
│ Tokenization (words → numbers) │
│ ↓ │
│ Embedding Layer (numbers → vectors) │
│ ↓ │
│ Transformer Blocks (24-96 layers) │
│ - Self-Attention │
│ - Feed Forward │
│ ↓ │
│ Output Layer │
│ ↓ │
│ Generated Text │
└─────────────────────────────────────────┘
How LLMs Generate Text
Example: Complete "The cat sat on the"
1. Tokenization: Break the text into pieces
   - ["The", "cat", "sat", "on", "the", "???"]
2. Embedding: Convert tokens to numbers (vectors)
3. Attention: Look at context
   - "mat" makes sense (cats sit on mats)
   - "moon" doesn't make sense (cats don't sit on moons)
4. Probability Distribution:
   - 60% → "mat"
   - 20% → "floor"
   - 10% → "couch"
   - 10% → other
5. Sampling: Pick "mat" (greedy decoding always takes the highest-probability token; sampling can occasionally pick a lower-ranked one)
Result: "The cat sat on the mat"
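The five steps above can be sketched in a few lines of Python. The probability table is the illustrative one from step 4, not real model output, and the function names are made up for this sketch:

```python
import random

# Toy next-token distribution for "The cat sat on the" —
# the illustrative numbers from step 4, not from a real model.
next_token_probs = {"mat": 0.6, "floor": 0.2, "couch": 0.1, "other": 0.1}

def pick_greedy(probs):
    """Greedy decoding: always take the highest-probability token."""
    return max(probs, key=probs.get)

def pick_sampled(probs, rng=random):
    """Sampling: draw a token in proportion to its probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

print("The cat sat on the", pick_greedy(next_token_probs))  # → mat
```

Greedy decoding always prints "mat"; sampling would usually pick "mat" but sometimes "floor" or "couch", which is what makes LLM output non-deterministic at temperature > 0.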
GPT vs BERT: Key Differences
GPT (Generative Pre-trained Transformer)
Goal: Generate the next word
Training: "The cat sat on the ___" → predict "mat" (causal language modeling: no [MASK] token, just next-word prediction)
GPT learns: Predict what comes next
Use: Text generation, completion, chat
BERT (Bidirectional Encoder Representations)
Goal: Understand context
Training: "The cat [MASK] on the mat" → predict "sat" (masked language modeling)
BERT learns: What word fits here? (looks both ways!)
Use: Classification, Q&A, understanding
Key Difference: GPT = Generator, BERT = Understander
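The generator/understander split comes down to the attention mask. A toy pure-Python sketch (no real model; the function names are illustrative):

```python
def causal_mask(n):
    """GPT-style mask: token i may attend only to positions <= i,
    so the model can never peek at future words."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """BERT-style mask: every token attends to every position,
    including words that come after it."""
    return [[1] * n for _ in range(n)]

for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 0, 0, 0] shape: lower-triangular — left context only
# (BERT's mask is all 1s: full context in both directions)
```

The triangular mask is why GPT can generate text left to right, and the all-ones mask is why BERT can use both directions for understanding tasks.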
Prompt Engineering
The art of asking LLMs the right way
Zero-Shot
Prompt: "Classify sentiment: 'This movie was terrible'"
Response: "Negative"
Few-Shot
Prompt:
"Classify sentiment:
Example 1: 'I loved it' → Positive
Example 2: 'It was okay' → Neutral
Example 3: 'Worst ever' → Negative
New: 'This movie was terrible'"
Response: "Negative"
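Few-shot prompts like the one above are usually assembled from a template. A minimal sketch; the helper name and example list are illustrative, not from any library:

```python
# Hypothetical few-shot examples matching the ones shown above.
EXAMPLES = [
    ("I loved it", "Positive"),
    ("It was okay", "Neutral"),
    ("Worst ever", "Negative"),
]

def build_few_shot_prompt(query, examples=EXAMPLES):
    """Assemble a few-shot sentiment prompt from labeled examples."""
    lines = ["Classify sentiment:"]
    for i, (text, label) in enumerate(examples, start=1):
        lines.append(f"Example {i}: '{text}' -> {label}")
    lines.append(f"New: '{query}' ->")
    return "\n".join(lines)

print(build_few_shot_prompt("This movie was terrible"))
```

Keeping examples in data rather than hardcoded strings makes it easy to swap in better examples later without touching the calling code.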
Chain-of-Thought
Prompt: "What's 15% tip on $82.50? Think step by step."
Response:
"Step 1: Calculate 10% = $8.25
Step 2: Calculate 5% = $4.13
Step 3: Add them: $8.25 + $4.13 = $12.38"
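The chain-of-thought arithmetic checks out. The same three steps as plain Python (the exact value of 5% is $4.125, which rounds to the $4.13 shown above):

```python
bill = 82.50
ten_percent = bill * 0.10          # Step 1: $8.25
five_percent = ten_percent / 2     # Step 2: $4.125 (≈ $4.13)
tip = ten_percent + five_percent   # Step 3: $12.375, i.e. $12.38 rounded
print(f"15% tip on ${bill:.2f} is ${tip:.2f}")
```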
Fine-Tuning vs Prompting
When to Fine-Tune
- Domain-specific language (legal, medical)
- Consistent output format needed
- Privacy concerns (can't send data to API)
- Cost optimization (many queries)
When to Prompt
- Quick prototyping
- Varied tasks
- No labeled data for training
- Using latest model capabilities
LLM in Production
Challenges
- Cost: GPT-4 API = $0.03 (input) to $0.06 (output) per 1K tokens
- Latency: Responses take 2-10 seconds
- Hallucinations: Makes up facts confidently
- Context Limits: Max 4K-128K tokens
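The cost challenge is easy to quantify with a back-of-envelope model. The prices are the illustrative $0.03/$0.06 per 1K token figures above, and the request mix is hypothetical:

```python
def estimate_cost(input_tokens, output_tokens,
                  in_price_per_1k=0.03, out_price_per_1k=0.06):
    """Estimate the dollar cost of one request at per-1K-token prices."""
    return (input_tokens / 1000) * in_price_per_1k \
         + (output_tokens / 1000) * out_price_per_1k

# Hypothetical workload: requests averaging 500 input + 200 output tokens.
per_request = estimate_cost(500, 200)
print(f"${per_request:.3f} per request, "
      f"${per_request * 1_000_000:,.0f} per million requests")
# → $0.027 per request, $27,000 per million requests
```

Numbers like these are why caching, shorter prompts, and cheaper fallback models matter at scale.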
Solutions
# Example: Optimize cost by caching repeated prompts
from functools import lru_cache

import openai

@lru_cache(maxsize=1000)
def get_llm_response(prompt: str) -> str:
    # Only call the API for prompts we haven't seen before
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
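Latency spikes and transient API errors also need graceful handling. A minimal retry sketch; `call_with_retries` and its parameters are illustrative, not part of any SDK:

```python
import random
import time

def call_with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a flaky zero-argument callable with exponential backoff.

    `call` is any function that raises on failure, e.g. a wrapper
    around an LLM API request.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let a fallback path take over
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

In practice the final `raise` would be caught by a fallback (a cheaper model, a cached answer, or a "try again later" message) rather than crashing the request.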
Real-World Example
At Nike, an agentic AI sustainability assistant:
- Base: GPT-4 for reasoning
- RAG: Retrieve carbon metrics from Databricks
- Prompt: Structured for compliance reporting
- Validation: Cross-check with regulatory frameworks
- Result: Accurate, compliant responses for sustainability queries
Key Insight: LLM + Domain Data + Validation = Production-Ready AI
Best Practices
- Prompt Templates: Standardize for consistency
- Temperature Control: 0 = deterministic, higher values = more creative/random
- Validation: Always verify critical outputs
- Cost Monitoring: Track token usage
- Fallbacks: Handle API failures gracefully
- Version Control: Lock model versions for reproducibility
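The temperature knob above works by rescaling the model's logits before the softmax that produces token probabilities. A minimal sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Low temperature sharpens the distribution (near-deterministic);
    high temperature flattens it (more creative/random)."""
    t = max(temperature, 1e-6)              # guard against division by zero
    scaled = [x / t for x in logits]
    m = max(scaled)                         # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                    # illustrative values
print(softmax_with_temperature(logits, 0.1))  # almost all mass on token 0
print(softmax_with_temperature(logits, 2.0))  # probabilities much closer together
```

At temperature near 0 the top token gets essentially all the probability (deterministic output); as temperature rises, lower-ranked tokens become genuinely likely to be sampled.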