Day108 Deep Learning Lecture Review - Advanced Techniques in DL
HW5: Out-of-distribution (OOD) Detection (Maximum Softmax Probability & ODIN) and Continual Learning (SLDA & IID Streaming)
As deep learning models are increasingly deployed in real-world applications, ensuring their reliability and adaptability is crucial. This post focuses on two key areas:
- Out-of-distribution (OOD) Detection – Identifying whether an input sample comes from the distribution the model was trained on.
- Continual Learning (CL) – Developing models that retain past knowledge while learning from new data streams.
In this post, I will cover the methods and insights from my experiments with Maximum Softmax Probability (MSP), ODIN (Out-of-Distribution detector for Neural networks), and Streaming Linear Discriminant Analysis (SLDA).
1. Out-of-distribution (OOD) Detection
Deep learning models are often overconfident in their predictions, even when encountering unfamiliar data. OOD detection aims to flag such inputs, preventing unreliable predictions.
1.1 Maximum Softmax Probability (MSP)
MSP is a baseline approach for OOD detection that relies on the idea that a well-calibrated model should assign low confidence scores to OOD samples. The MSP score is defined as:

$\text{MSP}(x) = \max_y \sigma_y(f(x)/T)$

Where:
- $f(x)$ are the model's logits,
- $\sigma_y$ is the softmax probability for class $y$,
- $T$ is the temperature scaling factor ($T = 1$ in the plain baseline).
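To make the definition concrete, here is a minimal sketch of the score computation from raw logits. This is illustrative only; `msp_score` and the random logits are my own placeholders, not the homework code:

```python
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """Maximum softmax probability per sample; low scores suggest OOD."""
    probs = F.softmax(logits / T, dim=1)  # temperature-scaled softmax
    return probs.max(dim=1).values        # confidence of the top class

# Example: a batch of 4 samples with 10 classes (random logits for illustration)
logits = torch.randn(4, 10)
print(msp_score(logits))
```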
Implementation
- Loaded a pre-trained WideResNet model on CIFAR-10.
- Created a test set combining:
- In-distribution (ID): CIFAR-10 test set.
- Out-of-distribution (OOD): LSUN dataset resized to CIFAR-10 dimensions.
- Computed MSP scores for each sample.
- Generated an ROC curve and evaluated detection performance.
Results

Metric | Value |
---|---|
AUROC ($\uparrow$) | 0.76 |
FPR@95 ($\downarrow$) | 0.41 |
AUPR ($\uparrow$) | 0.52 |
Detection Accuracy ($\uparrow$) | 0.50 |
- An AUROC of 0.76 indicates moderate OOD detection ability.
- A high FPR@95 (41%) means the model falsely classifies many OOD samples as in-distribution.
- MSP is a weak baseline, as it does not actively distinguish ID from OOD samples.
```python
import torch
from sklearn.metrics import roc_curve, precision_recall_curve, auc
from pytorch_ood.detector import MaxSoftmax
from pytorch_ood.utils import OODMetrics

# ... Load Datasets (CIFAR-10 test set + resized LSUN as OOD)

# Step 3: Initialize the MaxSoftmax detector and OOD metrics
detector = MaxSoftmax(model)
metrics = OODMetrics()

# Step 4: Perform OOD detection
scores = []
labels = []
with torch.no_grad():
    for inputs, targets in test_loader:
        inputs = inputs.to(device)
        score = detector(inputs)
        scores.extend(score.cpu().numpy())
        labels.extend(targets.numpy())

# Convert targets to binary labels: 0 for in-distribution (classes 0-9),
# 1 for OOD (targets outside the CIFAR-10 label range, e.g. -1)
labels_binary = [0 if label in range(10) else 1 for label in labels]

# Step 5: Generate the ROC curve
fpr, tpr, thresholds = roc_curve(labels_binary, scores)
roc_auc = auc(fpr, tpr)

# FPR@95: false positive rate at the first threshold where TPR >= 95%
tpr_95_index = next(i for i, x in enumerate(tpr) if x >= 0.95)
fpr_at_95 = fpr[tpr_95_index]

# AUPR: area under the precision-recall curve
precision, recall, _ = precision_recall_curve(labels_binary, scores)
aupr = auc(recall, precision)

# Detection accuracy at a fixed (naive) score threshold of 0.5
threshold = 0.5
predictions = [1 if s > threshold else 0 for s in scores]
detection_accuracy = sum(p == t for p, t in zip(predictions, labels_binary)) / len(labels_binary)
```
1.2 ODIN: Out-of-Distribution Detector
ODIN improves upon MSP by perturbing the input and applying temperature scaling, making OOD samples easier to detect.
- Perturbation - Modifies the input $x$ using a gradient-based perturbation:
$\hat{x} = x - \epsilon \cdot \text{sign}(\nabla_x L(f(x)/T,\hat{y}))$
where $\epsilon$ is a small perturbation magnitude and $T$ is the temperature parameter.
- Temperature Scaling - Rescales the logits by a large $T$, sharpening the separation between ID and OOD confidence scores.
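The two steps can be sketched directly from the equation above. This is a simplified illustration written from the formulas, not the pytorch-ood internals; `model` and the input batch `x` are assumed to exist:

```python
import torch
import torch.nn.functional as F

def odin_score(model, x, eps=0.0014, T=1000.0):
    """ODIN: temperature-scaled max softmax on a perturbed input."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x) / T                 # temperature scaling
    y_hat = logits.argmax(dim=1)          # model's own prediction
    loss = F.cross_entropy(logits, y_hat)
    loss.backward()
    # Step against the loss gradient: confidence rises more for
    # in-distribution inputs than for OOD inputs
    x_hat = x - eps * x.grad.sign()
    with torch.no_grad():
        probs = F.softmax(model(x_hat) / T, dim=1)
    return probs.max(dim=1).values        # higher = more in-distribution
```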
Implementation
- Used ODIN with default and tuned hyperparameters:
  - Default: $\epsilon = 0.0014,\ T = 1000$
  - Tuned: $\epsilon = 0.002,\ T = 1000$
- Generated ROC curves for the different settings.
- Code implementation:
```python
# ...
from pytorch_ood.detector import MaxSoftmax, ODIN
from pytorch_ood.model import WideResNet

# Define ODIN parameters
norm_std = WideResNet.norm_std_for("cifar10-pt")

# Define detectors
detectors = {
    "MSP": MaxSoftmax(model),
    "ODIN_Default": ODIN(model, eps=0.0014, temperature=1000, norm_std=norm_std),
    "ODIN_Tuned": ODIN(model, eps=0.002, temperature=1000, norm_std=norm_std),
}
```
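The detectors can then be compared on the combined ID/OOD loader. A sketch using pytorch-ood's `OODMetrics` utility, which (as far as I know) expects OOD samples to carry negative labels, e.g. via the library's `ToUnknown` target transform:

```python
from pytorch_ood.utils import OODMetrics

# Evaluate every detector on the same ID + OOD test stream
for name, detector in detectors.items():
    metrics = OODMetrics()
    for inputs, targets in test_loader:
        metrics.update(detector(inputs.to(device)).cpu(), targets)
    print(name, metrics.compute())  # AUROC, AUPR, FPR at 95% TPR, ...
```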
Results
- How does the perturbation process in ODIN relate to adversarial examples?
- The perturbation in ODIN adds a small, carefully designed step opposite the direction of the gradient of the loss with respect to the input, which increases the model's confidence. This is essentially an FGSM-style step with the opposite sign: adversarial examples maximize the loss to fool the model, whereas ODIN's perturbation boosts confidence, and it does so more strongly for in-distribution samples than for OOD samples, amplifying the gap between them.
Metric Results

Detector | AUROC ($\uparrow$) | FPR@95 ($\downarrow$) | Comments |
---|---|---|---|
MSP | 0.76 | 0.41 | Baseline method |
ODIN_Default | 0.86 | 0.26 | Default ODIN hyperparameters |
ODIN_Tuned | 0.89 | 0.18 | Tuned hyperparameters |

- ODIN significantly improves OOD detection compared to MSP.
- Lower FPR@95 means fewer OOD samples are misclassified as in-distribution.
- Tuning $\epsilon$ further improves performance.

- Key Highlights
- MSP struggles to separate OOD samples due to softmax overconfidence.
- ODIN successfully amplifies the distinction between ID and OOD distributions.
- Trade-off: ODIN requires extra computation (perturbation step).
2. Continual Learning with Streaming LDA
Traditional deep learning models suffer from catastrophic forgetting – when learning new tasks, they overwrite previously acquired knowledge. Continual learning (CL) tackles this problem.
2.1 Comparing Fine-Tuning vs. Streaming Linear Discriminant Analysis (SLDA)
SLDA updates only class statistics (running class means and a shared covariance matrix) incrementally, preventing catastrophic forgetting.
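For intuition, here is a heavily simplified sketch of streaming LDA over fixed feature vectors, loosely following Hayes & Kanan's Deep SLDA; the exact update rules and the shrinkage value are my simplifications, not the homework code:

```python
import torch

class StreamingLDA:
    """Per-class running means + one shared covariance, updated per sample."""

    def __init__(self, feat_dim, num_classes, shrinkage=1e-4):
        self.mu = torch.zeros(num_classes, feat_dim)  # class means
        self.counts = torch.zeros(num_classes)
        self.sigma = torch.eye(feat_dim)              # shared covariance
        self.n = 0
        self.shrinkage = shrinkage

    def fit_sample(self, z, y):
        # z: 1-D feature vector from a frozen backbone; y: integer class label
        delta = z - self.mu[y]
        self.sigma = (self.n * self.sigma + torch.outer(delta, delta)) / (self.n + 1)
        self.n += 1
        self.counts[y] += 1
        self.mu[y] += delta / self.counts[y]          # running class mean

    def predict(self, z):
        # Linear discriminant weights: W = mu @ Sigma^{-1} (with shrinkage)
        prec = torch.linalg.inv(self.sigma + self.shrinkage * torch.eye(self.sigma.shape[0]))
        W = self.mu @ prec
        b = -0.5 * (W * self.mu).sum(dim=1)
        return (z @ W.T + b).argmax(dim=-1)
```

Because learning only touches these statistics, earlier classes are never overwritten, which is exactly what the results below show.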
Implementation
- Dataset Splitting: CIFAR-10 divided into 5 disjoint class splits.
- Experiments Conducted:
  - Fine-Tuning: standard training on sequential splits.
  - SLDA: incrementally updated class statistics without retraining.
- Code implementation:
```bash
# Fine-tuning sequentially over 5 class splits
python main.py --label_file oxford_pet_split.csv --img_dir "/Users/leahshin/Downloads/E2EDL_HW5_2/P2/images" --num_split 5

# SLDA over the same 5 splits
python main.py --label_file oxford_pet_split.csv --img_dir "/Users/leahshin/Downloads/E2EDL_HW5_2/P2/images" --num_split 5 --use_slda
```
Results
Method | Split 0 Accuracy (After Split 1) | Split 0 Accuracy (After Split 4) |
---|---|---|
Fine-Tuning | 0.0714 | 0.0 |
SLDA | 0.8643 | 0.8429 |
- Fine-tuning exhibits catastrophic forgetting: the model forgets previously learned classes.
- SLDA retains past knowledge while learning new tasks.
2.2 IID vs. Non-IID Data Streams
When training data arrives IID (independent and identically distributed), the model sees all classes interleaved throughout training. In non-IID streams, data arrives in class-specific chunks, one class (or split) at a time.
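The difference between the two regimes is easiest to see in how the stream is constructed. A hypothetical sketch (names are placeholders, not the homework's main.py):

```python
import random

def make_stream(samples, iid=True, seed=0):
    """samples: list of (x, label) pairs.
    IID: globally shuffled, so classes are interleaved.
    Non-IID: sorted by label, so data arrives in class-specific chunks."""
    stream = list(samples)
    if iid:
        random.Random(seed).shuffle(stream)
    else:
        stream.sort(key=lambda pair: pair[1])
    return stream
```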
Method | Data Distribution | Average Accuracy |
---|---|---|
Fine-Tuning | IID | 88.50% |
Fine-Tuning | Non-IID | 43.04% |
SLDA | IID | 90.80% |
SLDA | Non-IID | 90.00% |
- Fine-Tuning collapses under Non-IID conditions, while SLDA remains stable.
- IID training mitigates catastrophic forgetting, but is unrealistic in real-world streaming settings.
- Code implementation:
```bash
# Fine-tuning on a single IID split
python main.py --label_file oxford_pet_split.csv --img_dir "/Users/leahshin/Downloads/E2EDL_HW5_2/P2/images" --num_split 1

# SLDA on the same IID stream
python main.py --label_file oxford_pet_split.csv --img_dir "/Users/leahshin/Downloads/E2EDL_HW5_2/P2/images" --num_split 1 --use_slda
```
3. Final Insights
- OOD detection: ODIN is a strong alternative to MSP
- MSP is unreliable in detecting OOD samples.
- ODIN enhances robustness with perturbation and temperature scaling.
- Tuning ODIN hyperparameters significantly improves detection performance.
- Continual Learning: SLDA mitigates catastrophic forgetting
- Fine-tuning fails in Non-IID settings, making it unsuitable for real-world continual learning.
- SLDA achieves stable accuracy across tasks by preserving class statistics.
This concludes the lecture review and the homework.