Data Science Interview Prep - LLM/AI
Interview Questions & Answers
LLM & AI
Questions from:
LinkedIn Post by Daniel Lee: LLM Interview Questions
LLM
Basic Concepts - LLM
- Architecture & Training
- Transformer Architecture (attention mechanisms)
- Pre-training vs Fine-tuning
- Training Objectives (next token prediction)
- Context Window and Position Embeddings
- Tokenization Strategies
- Model Scaling Laws
- Parameter Efficient Fine-tuning (LoRA, QLoRA, Prefix Tuning)
- Generation & Control
- Temperature and Top-p Sampling
- Prompt Engineering Techniques
- Few-shot Learning
- In-context Learning
- Chain-of-Thought Prompting
- Hallucination Prevention
Answers
Architecture & Training
- Transformer Architecture (Attention Mechanisms):
- Transformers use self-attention to weigh the importance of each word in a sequence, allowing the model to capture dependencies regardless of distance. This architecture has led to breakthroughs in NLP by effectively modeling long-range context.
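The core of self-attention can be sketched in a few lines of NumPy. This is a minimal single-head, scaled dot-product version; the random projection matrices stand in for learned weights:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # pairwise token affinities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # context-weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))       # one embedding per token
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualized vector per token
```

Note that every token attends to every other token in one step, which is how attention captures long-range dependencies without recurrence.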
- Pre-training vs. Fine-tuning:
- Pre-training involves training on vast datasets to develop general language understanding. Fine-tuning adapts the model to specific tasks with a smaller, domain-specific dataset, enhancing performance in target areas.
- Training Objectives (Next Token Prediction):
- For language models, their main goal is to predict the next token in a sequence. This capability fosters the learning of contextual relationships within the text, enabling the model to produce sequences that are both **coherent** and **accurate** in context.
- Context Window and Position Embeddings:
- The context window defines the maximum token length the model can process at once. Position embeddings encode token order, which is necessary because self-attention by itself is permutation-invariant and has no built-in notion of sequence order.
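The sinusoidal scheme from the original Transformer paper is one common way to build position embeddings (many modern LLMs use learned or rotary embeddings instead); a minimal sketch:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal position embeddings: even dims use sin, odd dims use cos,
    with wavelengths increasing geometrically across dimensions."""
    pos = np.arange(seq_len)[:, None]      # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]   # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    emb = np.zeros((seq_len, d_model))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

pe = sinusoidal_positions(seq_len=6, d_model=8)
print(pe.shape)  # (6, 8): added to the token embeddings before attention
```

Each position gets a unique pattern, so the model can distinguish "dog bites man" from "man bites dog" even though attention itself ignores order.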
- Tokenization Strategies:
- Tokenization splits text into smaller units (tokens) for model processing. Common strategies include Byte Pair Encoding (BPE) and WordPiece, which balance vocabulary size and generalization.
- Byte Pair Encoding (BPE) is a tokenization method that compresses text by iteratively merging the most frequent pairs of characters or subwords. This approach creates a manageable vocabulary of subword units, allowing models to handle rare or compound words by breaking them into smaller, reusable tokens.
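The BPE merge loop can be illustrated on a toy corpus (word frequencies and the number of merges here are made up for demonstration):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a word-frequency dict."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# toy corpus: word -> frequency, each word as a tuple of characters
words = {tuple("lower"): 5, tuple("lowest"): 3, tuple("newer"): 4}
for _ in range(3):
    pair = most_frequent_pair(words)
    words = merge_pair(words, pair)
    print("merged", pair)
```

After a few merges, frequent fragments like "wer" become single tokens, while rarer words still decompose into smaller reusable pieces.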
- Model Scaling Laws:
- Scaling laws reveal that larger models (with more parameters and data) typically perform better, but with diminishing returns. These laws guide resource allocation for optimal model size and training.
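The diminishing returns show up directly in the power-law form of these fits. The sketch below uses the parameter-scaling constants reported by Kaplan et al. (2020); treat them as order-of-magnitude illustrations only, since the fitted values depend heavily on the data and training setup:

```python
def power_law_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Illustrative Kaplan-style scaling law: test loss falls as a
    power of parameter count, L(N) = (N_c / N) ** alpha."""
    return (n_c / n_params) ** alpha

# each 10x increase in parameters buys a fixed *ratio* improvement,
# so absolute gains shrink as the model grows
for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:.0e} params -> loss ~ {power_law_loss(n):.3f}")
```

Later work (e.g. the Chinchilla results) refined these laws to jointly account for parameters and training tokens, which shifted the recommended compute allocation toward more data.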
- Parameter Efficient Fine-tuning (LoRA, QLoRA, Prefix Tuning):
- Techniques like LoRA, QLoRA, and Prefix Tuning enable fine-tuning large models efficiently by adjusting only specific layers or adding lightweight adapters, making the process faster and more cost-effective.
- LoRA (Low-Rank Adaptation): LoRA fine-tunes large models by injecting low-rank matrices into specific layers, adjusting only a few parameters. This makes the process more efficient by minimizing memory and computational requirements.
- QLoRA (Quantized LoRA): QLoRA combines low-rank adaptation with quantization, reducing model precision to save memory and computing costs. It’s beneficial for deploying large models on resource-constrained devices.
- Prefix Tuning: Prefix Tuning fine-tunes models by adding a learned "prefix" of virtual tokens to the input, which guides the model in task-specific ways without modifying the core model parameters. This method is lightweight and effective for many NLP tasks.
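The LoRA idea above is easy to see in code: the frozen weight `W` is left untouched, and only a low-rank pair `A`, `B` is trained. A minimal NumPy sketch (dimensions, rank, and the `alpha` scaling are illustrative choices, not values from any particular paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_in, d_out))        # frozen pretrained weight
A = rng.normal(size=(d_in, rank)) * 0.01  # trainable, small random init
B = np.zeros((rank, d_out))               # trainable, zero init so the
                                          # adapter starts as a no-op

def lora_forward(x):
    """Adapted layer: frozen path plus scaled low-rank update."""
    return x @ W + (x @ A @ B) * (alpha / rank)

x = rng.normal(size=(2, d_in))
# with B = 0, the adapted layer matches the frozen layer exactly
assert np.allclose(lora_forward(x), x @ W)

full = d_in * d_out                # params a full fine-tune would touch
lora = rank * (d_in + d_out)       # params LoRA actually trains
print(f"trainable params: {lora} vs full fine-tune: {full}")
```

Because only `A` and `B` receive gradients, the trainable parameter count scales with the rank rather than the full weight matrix, which is where the memory savings come from.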
Generation & Control
- Temperature and Top-p Sampling:
- Temperature controls randomness in generation (lower values make the output more focused, and higher values increase creativity). Top-p sampling (nucleus sampling) selects tokens from the smallest set whose cumulative probability meets a threshold, reducing unlikely outcomes.
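Both knobs operate on the model's output logits before sampling; a minimal sketch of how they compose (the logit values are made up):

```python
import numpy as np

def sample(logits, temperature=1.0, top_p=1.0, rng=None):
    """Temperature scaling followed by nucleus (top-p) filtering."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # keep the smallest set of tokens whose cumulative probability >= top_p
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()  # renormalize over the nucleus
    return rng.choice(keep, p=kept)

logits = [2.0, 1.0, 0.1, -1.0]  # toy next-token logits
token = sample(logits, temperature=0.7, top_p=0.9,
               rng=np.random.default_rng(0))
print(token)
```

Lowering `temperature` sharpens the distribution before the cutoff is applied, and lowering `top_p` shrinks the candidate set, so the two settings interact: a very low temperature can make the nucleus collapse to a single token.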
- Prompt Engineering Techniques:
- Crafting prompts to guide model responses improves task-specific performance. Good prompts often include explicit instructions, context, or examples to direct the model effectively.
- Few-shot Learning:
- Few-shot learning uses a handful of examples within a prompt to enable the model to generalize to similar tasks without extensive re-training, leveraging prior knowledge from pre-training.
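In practice a few-shot prompt is just labeled examples concatenated ahead of the query. A sketch with a hypothetical sentiment-classification task (the examples and labels are invented):

```python
examples = [
    ("The movie was a delight.", "positive"),
    ("I want my money back.", "negative"),
]
query = "The plot dragged on forever."

# instruction, then demonstrations, then the unanswered query
prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"
print(prompt)
```

The model completes the final `Sentiment:` line by pattern-matching against the demonstrations, with no gradient updates involved.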
- In-context Learning:
- In-context learning involves showing the model a context or sequence that influences its response, allowing it to generate relevant output based on preceding examples or information.
- Chain-of-Thought Prompting:
- This method encourages the model to think step-by-step, improving reasoning in complex tasks by breaking down its response into logical, sequential steps, which enhances interpretability and accuracy.
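A common way to elicit this behavior is to include an exemplar whose answer shows its working. A sketch with invented questions (the arithmetic in the exemplar is worked out explicitly so the model imitates the step-by-step style):

```python
exemplar = (
    "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A: 12 pens is 12 / 3 = 4 groups of three. "
    "4 groups x $2 = $8. The answer is $8.\n\n"
)
question = "A train travels 60 km in 1.5 hours. What is its average speed?"

prompt = exemplar + f"Q: {question}\nA:"
print(prompt)
```

A zero-shot variant simply appends a trigger phrase such as "Let's think step by step." after `A:`; both approaches bias the model toward emitting intermediate reasoning before the final answer.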
- Hallucination Prevention:
- Techniques like fact-checking, retrieval augmentation (RAG), and reinforcement learning from human feedback (RLHF) reduce hallucinations, helping keep the model's responses accurate and grounded in verifiable sources.
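The retrieval-augmentation idea can be sketched end to end: fetch the most relevant document, then instruct the model to answer only from it. The word-overlap scorer below is a deliberately naive stand-in for the embedding similarity a real RAG pipeline would use, and the documents are invented:

```python
def retrieve(query, docs, k=1):
    """Rank documents by naive word overlap with the query
    (a stand-in for embedding similarity in a real pipeline)."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "The Eiffel Tower is 330 metres tall.",
    "Mount Everest is the highest mountain on Earth.",
]
question = "How tall is the Eiffel Tower?"

context = retrieve(question, docs)[0]
prompt = (f"Answer using only the context.\n"
          f"Context: {context}\nQuestion: {question}\nAnswer:")
print(prompt)
```

Grounding the answer in retrieved text gives the model something concrete to quote instead of relying on (possibly stale or wrong) parametric memory, which is why RAG is a standard hallucination mitigation.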