Day108 Deep Learning Lecture Review - Advanced Techniques in DL
HW5: Out-of-distribution (OOD) Detection (Maximum Softmax Probability & ODIN) and Continual Learning (SLDA & IID Streaming)
As deep learning models are increasingly deployed in real-world applications, ensuring their reliability and adaptability is crucial. This post focuses on two key areas:
- Out-of-distribution (OOD) Detection – Identifying whether an input sample comes from the distribution the model was trained on.
- Continual Learning (CL) – Developing models that retain past knowledge while learning from new data streams.
In this post, I will cover the methods and insights from my experiments with Maximum Softmax Probability (MSP), ODIN (Out-of-Distribution detector for Neural networks), and Streaming Linear Discriminant Analysis (SLDA).
1. Out-of-distribution (OOD) Detection
Deep learning models are often overconfident in their predictions, even when encountering unfamiliar data. OOD detection aims to flag such inputs, preventing unreliable predictions.
1.1 Maximum Softmax Probability (MSP)
MSP is a baseline approach for OOD detection that relies on the idea that a well-calibrated model should assign low confidence scores to OOD samples. The MSP score is defined as:

$\text{MSP}(x) = \max_y \sigma_y(f(x)/T)$

Where:
- $f(x)$ are the model's logits,
- $\sigma_y$ is the softmax probability for class $y$,
- $T$ is the temperature scaling factor ($T = 1$ in the plain baseline).
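To make the definition concrete, here is a minimal sketch of the score computation from raw logits. This is illustrative only; `msp_score` and the random logits are my own placeholders, not the homework code:

```python
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """Maximum softmax probability per sample; low scores suggest OOD."""
    probs = F.softmax(logits / T, dim=1)  # temperature-scaled softmax
    return probs.max(dim=1).values        # confidence of the top class

# Example: a batch of 4 samples with 10 classes (random logits for illustration)
logits = torch.randn(4, 10)
print(msp_score(logits))
```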
Implementation
- Loaded a pre-trained WideResNet model on CIFAR-10.
- Created a test set combining:
- In-distribution (ID): CIFAR-10 test set.
- Out-of-distribution (OOD): LSUN dataset resized to CIFAR-10 dimensions.
- Computed MSP scores for each sample.
- Generated an ROC curve and evaluated detection performance.
Results

Metric | Value |
---|---|
AUROC ($\uparrow$) | 0.76 |
FPR@95 ($\downarrow$) | 0.41 |
AUPR ($\uparrow$) | 0.52 |
Detection Accuracy ($\uparrow$) | 0.50 |
- An AUROC of 0.76 indicates moderate OOD detection ability.
- A high FPR@95 (41%) means the model falsely classifies many OOD samples as in-distribution.
- MSP is a weak baseline, as it does not actively distinguish ID from OOD samples.
```python
import torch
from sklearn.metrics import roc_curve, precision_recall_curve, auc
from pytorch_ood.detector import MaxSoftmax
from pytorch_ood.utils import OODMetrics

# ... Load Datasets (CIFAR-10 test set + resized LSUN as OOD)

# Step 3: Initialize the MaxSoftmax detector and OOD metrics
detector = MaxSoftmax(model)
metrics = OODMetrics()

# Step 4: Perform OOD detection
scores = []
labels = []
with torch.no_grad():
    for inputs, targets in test_loader:
        inputs = inputs.to(device)
        score = detector(inputs)
        scores.extend(score.cpu().numpy())
        labels.extend(targets.numpy())

# Convert targets to binary labels: 0 for in-distribution (classes 0-9),
# 1 for OOD (targets outside the CIFAR-10 label range, e.g. -1)
labels_binary = [0 if label in range(10) else 1 for label in labels]

# Step 5: Generate the ROC curve
fpr, tpr, thresholds = roc_curve(labels_binary, scores)
roc_auc = auc(fpr, tpr)

# FPR@95: false positive rate at the first threshold where TPR >= 95%
tpr_95_index = next(i for i, x in enumerate(tpr) if x >= 0.95)
fpr_at_95 = fpr[tpr_95_index]

# AUPR: area under the precision-recall curve
precision, recall, _ = precision_recall_curve(labels_binary, scores)
aupr = auc(recall, precision)

# Detection accuracy at a fixed (naive) score threshold of 0.5
threshold = 0.5
predictions = [1 if s > threshold else 0 for s in scores]
detection_accuracy = sum(p == t for p, t in zip(predictions, labels_binary)) / len(labels_binary)
```
1.2 ODIN: Out-of-Distribution Detector
ODIN improves upon MSP by perturbing the input and applying temperature scaling, making OOD samples easier to detect.
- Perturbation - Modifies the input $x$ using a gradient-based perturbation:
$\hat{x} = x - \epsilon \cdot \text{sign}(\nabla_x L(f(x)/T,\hat{y}))$
where $\epsilon$ is a small perturbation magnitude and $T$ is the temperature parameter.
- Temperature Scaling - Rescales the logits by a large $T$, sharpening the separation between ID and OOD confidence scores.
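The two steps can be sketched directly from the equation above. This is a simplified illustration written from the formulas, not the pytorch-ood internals; `model` and the input batch `x` are assumed to exist:

```python
import torch
import torch.nn.functional as F

def odin_score(model, x, eps=0.0014, T=1000.0):
    """ODIN: temperature-scaled max softmax on a perturbed input."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x) / T                 # temperature scaling
    y_hat = logits.argmax(dim=1)          # model's own prediction
    loss = F.cross_entropy(logits, y_hat)
    loss.backward()
    # Step against the loss gradient: confidence rises more for
    # in-distribution inputs than for OOD inputs
    x_hat = x - eps * x.grad.sign()
    with torch.no_grad():
        probs = F.softmax(model(x_hat) / T, dim=1)
    return probs.max(dim=1).values        # higher = more in-distribution
```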
Implementation
- Used ODIN with default and tuned hyperparameters:
  - Default: $\epsilon = 0.0014,\ T = 1000$
  - Tuned: $\epsilon = 0.002,\ T = 1000$
- Generated ROC curves for the different settings.
- Code implementation:
```python
# ...
from pytorch_ood.detector import MaxSoftmax, ODIN
from pytorch_ood.model import WideResNet

# Define ODIN parameters
norm_std = WideResNet.norm_std_for("cifar10-pt")

# Define detectors
detectors = {
    "MSP": MaxSoftmax(model),
    "ODIN_Default": ODIN(model, eps=0.0014, temperature=1000, norm_std=norm_std),
    "ODIN_Tuned": ODIN(model, eps=0.002, temperature=1000, norm_std=norm_std),
}
```
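The detectors can then be compared on the combined ID/OOD loader. A sketch using pytorch-ood's `OODMetrics` utility, which (as far as I know) expects OOD samples to carry negative labels, e.g. via the library's `ToUnknown` target transform:

```python
from pytorch_ood.utils import OODMetrics

# Evaluate every detector on the same ID + OOD test stream
for name, detector in detectors.items():
    metrics = OODMetrics()
    for inputs, targets in test_loader:
        metrics.update(detector(inputs.to(device)).cpu(), targets)
    print(name, metrics.compute())  # AUROC, AUPR, FPR at 95% TPR, ...
```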
Results
- How does the perturbation process in ODIN relate to adversarial examples?
- The perturbation in ODIN adds a small, carefully designed step opposite the direction of the gradient of the loss with respect to the input, which increases the model's confidence. This is essentially an FGSM-style step with the opposite sign: adversarial examples maximize the loss to fool the model, whereas ODIN's perturbation boosts confidence, and it does so more strongly for in-distribution samples than for OOD samples, amplifying the gap between them.
Metric Results

Detector | AUROC ($\uparrow$) | FPR@95 ($\downarrow$) | Comments |
---|---|---|---|
MSP | 0.76 | 0.41 | Baseline method |
ODIN_Default | 0.86 | 0.26 | Default ODIN hyperparameters |
ODIN_Tuned | 0.89 | 0.18 | Tuned hyperparameters |

- ODIN significantly improves OOD detection compared to MSP.
- Lower FPR@95 means fewer OOD samples are misclassified as in-distribution.
- Tuning $\epsilon$ further improves performance.

- Key Highlights
- MSP struggles to separate OOD samples due to softmax overconfidence.
- ODIN successfully amplifies the distinction between ID and OOD distributions.
- Trade-off: ODIN requires extra computation (perturbation step).
2. Continual Learning with Streaming LDA
Traditional deep learning models suffer from catastrophic forgetting – when learning new tasks, they overwrite previously acquired knowledge. Continual learning (CL) tackles this problem.
2.1 Comparing Fine-Tuning vs. Streaming Linear Discriminant Analysis (SLDA)
SLDA updates only class statistics (running class means and a shared covariance matrix) incrementally, preventing catastrophic forgetting.
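For intuition, here is a heavily simplified sketch of streaming LDA over fixed feature vectors, loosely following Hayes & Kanan's Deep SLDA; the exact update rules and the shrinkage value are my simplifications, not the homework code:

```python
import torch

class StreamingLDA:
    """Per-class running means + one shared covariance, updated per sample."""

    def __init__(self, feat_dim, num_classes, shrinkage=1e-4):
        self.mu = torch.zeros(num_classes, feat_dim)  # class means
        self.counts = torch.zeros(num_classes)
        self.sigma = torch.eye(feat_dim)              # shared covariance
        self.n = 0
        self.shrinkage = shrinkage

    def fit_sample(self, z, y):
        # z: 1-D feature vector from a frozen backbone; y: integer class label
        delta = z - self.mu[y]
        self.sigma = (self.n * self.sigma + torch.outer(delta, delta)) / (self.n + 1)
        self.n += 1
        self.counts[y] += 1
        self.mu[y] += delta / self.counts[y]          # running class mean

    def predict(self, z):
        # Linear discriminant weights: W = mu @ Sigma^{-1} (with shrinkage)
        prec = torch.linalg.inv(self.sigma + self.shrinkage * torch.eye(self.sigma.shape[0]))
        W = self.mu @ prec
        b = -0.5 * (W * self.mu).sum(dim=1)
        return (z @ W.T + b).argmax(dim=-1)
```

Because learning only touches these statistics, earlier classes are never overwritten, which is exactly what the results below show.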
Implementation
- Dataset Splitting: CIFAR-10 divided into 5 disjoint class splits.
- Experiments Conducted:
  - Fine-Tuning: standard training on sequential splits.
  - SLDA: incrementally updated class statistics without retraining.
- Code implementation:
```bash
# Fine-tuning sequentially over 5 class splits
python main.py --label_file oxford_pet_split.csv --img_dir "/Users/leahshin/Downloads/E2EDL_HW5_2/P2/images" --num_split 5

# SLDA over the same 5 splits
python main.py --label_file oxford_pet_split.csv --img_dir "/Users/leahshin/Downloads/E2EDL_HW5_2/P2/images" --num_split 5 --use_slda
```
Results
Method | Split 0 Accuracy (After Split 1) | Split 0 Accuracy (After Split 4) |
---|---|---|
Fine-Tuning | 0.0714 | 0.0 |
SLDA | 0.8643 | 0.8429 |
- Fine-tuning exhibits catastrophic forgetting: the model forgets previously learned classes.
- SLDA retains past knowledge while learning new tasks.
2.2 IID vs. Non-IID Data Streams
When training data arrives IID (independent and identically distributed), the model sees all classes interleaved throughout training. In non-IID streams, data arrives in class-specific chunks, one class (or split) at a time.
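The difference between the two regimes is easiest to see in how the stream is constructed. A hypothetical sketch (names are placeholders, not the homework's main.py):

```python
import random

def make_stream(samples, iid=True, seed=0):
    """samples: list of (x, label) pairs.
    IID: globally shuffled, so classes are interleaved.
    Non-IID: sorted by label, so data arrives in class-specific chunks."""
    stream = list(samples)
    if iid:
        random.Random(seed).shuffle(stream)
    else:
        stream.sort(key=lambda pair: pair[1])
    return stream
```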
Method | Data Distribution | Average Accuracy |
---|---|---|
Fine-Tuning | IID | 88.50% |
Fine-Tuning | Non-IID | 43.04% |
SLDA | IID | 90.80% |
SLDA | Non-IID | 90.00% |
- Fine-Tuning collapses under Non-IID conditions, while SLDA remains stable.
- IID training mitigates catastrophic forgetting, but is unrealistic in real-world streaming settings.
- Code implementation:
```bash
# Fine-tuning on a single IID split
python main.py --label_file oxford_pet_split.csv --img_dir "/Users/leahshin/Downloads/E2EDL_HW5_2/P2/images" --num_split 1

# SLDA on the same IID stream
python main.py --label_file oxford_pet_split.csv --img_dir "/Users/leahshin/Downloads/E2EDL_HW5_2/P2/images" --num_split 1 --use_slda
```
3. Final Insights
- OOD detection: ODIN is a strong alternative to MSP
- MSP is unreliable in detecting OOD samples.
- ODIN enhances robustness with perturbation and temperature scaling.
- Tuning ODIN hyperparameters significantly improves detection performance.
- Continual Learning: SLDA mitigates catastrophic forgetting
- Fine-tuning fails in Non-IID settings, making it unsuitable for real-world continual learning.
- SLDA achieves stable accuracy across tasks by preserving class statistics.
This concludes the lecture review and the homework.