Day208 - Leetcode: Python 121 & SQL 175,176 & DL Review
Python 121: Best Time to Buy and Sell Stock / SQL 175, 176: Combine Two Tables & Second Highest Salary / DL Review: Transformers, Self-Attention Mechanism & Positional Encoding

🟩 Python Review
121. Best Time to Buy and Sell Stock
You are given an array prices, where prices[i] represents the price of a stock on the ith day.
You want to maximize your profit by choosing a single day to buy one stock and choosing a different day in the future to sell that stock.
Return the maximum profit you can achieve from this transaction. If you cannot make a profit, return 0.
Solution
We have an array of prices, where prices[i] is the price of a stock on day i.
- We may buy once and sell once (buy before sell).
- Return the maximum profit that can be achieved.
- If no profit is possible, return 0.
Concepts
1. Naïve Approach
Try all (buy, sell) pairs → O(n^2) time, which is too slow for large inputs.
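For contrast, a minimal sketch of this brute-force idea (the function name is illustrative):

def max_profit_bruteforce(prices):
    best = 0
    for buy in range(len(prices)):
        for sell in range(buy + 1, len(prices)):  # sell strictly after buy
            best = max(best, prices[sell] - prices[buy])
    return best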
2. Key Observation
- Profit = sell_price - buy_price.
- To maximize profit, we want to buy at the lowest price before selling at the highest price.
- Track the minimum price so far (the best buying day up to now).
- Track the maximum profit so far if we sell today.
3. Greedy One-Pass Algorithm
Each day:
min_price = min(min_price, price[i])
profit = price[i] - min_price
max_profit = max(max_profit, profit)
Solution Code
from typing import List

class Solution:
    def maxProfit(self, prices: List[int]) -> int:
        min_price = float('inf')
        max_profit = 0
        for price in prices:
            if price < min_price:
                min_price = price  # update buying day
            else:
                max_profit = max(max_profit, price - min_price)
        return max_profit
- Only one pass is needed: O(n) time, O(1) space.
- Greedy works because the best profit depends solely on the lowest price seen before the current day.
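As a quick sanity check, the standard example from the problem statement (assuming the Solution class above):

prices = [7, 1, 5, 3, 6, 4]
print(Solution().maxProfit(prices))  # 5: buy at 1, sell at 6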
🟨 SQL Review
175. Combine Two Tables
-- My Answer
select a.firstName, a.lastName, b.city, b.state
from Person a
outer join Address b on a.personId = b.personId;
-- Solution
SELECT a.firstName, a.lastName, b.city, b.state
FROM Person a
LEFT JOIN Address b
ON a.personId = b.personId;
In PostgreSQL, a plain OUTER JOIN is not valid on its own; the direction must be stated explicitly as LEFT JOIN, RIGHT JOIN, or FULL OUTER JOIN, depending on the requirement. Here, LEFT JOIN keeps every person and fills city and state with NULL when no address exists.
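To see the LEFT JOIN behavior concretely, a minimal sketch using in-memory SQLite with hypothetical sample rows (not the official LeetCode data):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (personId INTEGER, firstName TEXT, lastName TEXT);
CREATE TABLE Address (addressId INTEGER, personId INTEGER, city TEXT, state TEXT);
INSERT INTO Person VALUES (1, 'Allen', 'Wang'), (2, 'Bob', 'Alice');
INSERT INTO Address VALUES (1, 2, 'New York City', 'New York');
""")
rows = conn.execute("""
SELECT a.firstName, a.lastName, b.city, b.state
FROM Person a
LEFT JOIN Address b ON a.personId = b.personId;
""").fetchall()
print(rows)  # both persons appear; missing city/state come back as None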
176. Second-Highest Salary
-- My Answer
select dist(salary) from Employee
Limit 1;
-- Do I have to use GROUP BY? ==> No, you do not need to.
-- Do I have to add any constraint for NULL values? ==> No, PostgreSQL ignores NULLs automatically.
-- Solution
select distinct salary from Employee
order by salary desc
offset 1 limit 1;
From my attempt:
- DIST() is not a valid function in PostgreSQL; I must write SELECT DISTINCT instead.
- I forgot to use OFFSET to get the second-highest salary.
From the solution: OFFSET skips a given number of rows. Here we skip the highest salary and return the second-highest one, as requested.
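To see the OFFSET/LIMIT pattern concretely, a minimal sketch using in-memory SQLite with hypothetical salaries (SQLite writes the clauses as LIMIT ... OFFSET ...):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1, 100), (2, 200), (3, 300), (4, 300)])
row = conn.execute("""
SELECT DISTINCT salary FROM Employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
""").fetchone()
print(row)  # (200,) -- DISTINCT collapses the duplicate 300s first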
🟦 DL Review
1. Transformers
Transformers are sequence models that rely entirely on self-attention mechanisms instead of recurrence or convolutions. They process inputs in parallel and use attention layers to capture dependencies across the entire sequence.
Why It Matters
- Parallelizable → faster training than RNN/LSTM.
- Handle long-range dependencies better.
- Form the foundation of modern NLP (BERT, GPT, T5) and Vision Transformers.
“Transformers replace recurrence with self-attention, enabling parallel processing and strong modeling of long-range dependencies. They have become the dominant architecture in NLP and are expanding into vision and multimodal tasks.”
MLOps Angle
Transformers are large and resource-intensive, requiring:
- Model compression (quantization, distillation).
- Distributed training (data/model parallelism).
- Scalable deployment with GPU/TPU inference optimization.
2. Self-Attention Mechanism
Self-attention computes a weighted representation of input tokens by relating each token to every other token in the sequence:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q, K, V$ are the query, key, and value matrices and $d_k$ is the key dimension.
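For concreteness, a minimal NumPy sketch of this formula (single head, no masking; the function name and toy shapes are illustrative):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4)); K = rng.normal(size=(3, 4)); V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)       # (3, 4)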
Why It Matters
- Captures contextual relationships between all elements of the input.
- Enables parallel computation, unlike RNNs.
- Core building block of Transformers.
“Self-attention enables each element in a sequence to focus on all others, learning contextual relationships directly. It’s computationally efficient, parallelizable, and central to modern deep learning architectures.”
MLOps Angle
- Attention weights can be monitored for interpretability (e.g., which tokens are emphasized). In production, memory usage can be high (quadratic scaling), requiring optimizations like sparse attention or low-rank approximations.
3. Positional Encoding
Since Transformers lack recurrence or convolution, they use positional encodings to inject sequence order information.
- Can be sinusoidal (fixed functions of position).
- Or learned (trainable embeddings).
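For illustration, a minimal NumPy sketch of the sinusoidal variant (function name and dimensions are illustrative; d_model is assumed even):

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model/2), i.e. 2i
    angles = positions / np.power(10000.0, dims / d_model)   # pos / 10000^(2i/d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

print(sinusoidal_positional_encoding(seq_len=50, d_model=16).shape)  # (50, 16)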
Why It Matters
- Essential for sequence tasks (text, speech) where order matters.
- Without positional encodings, Transformers would treat input as a bag of tokens.
“Transformers need positional encodings to preserve sequence order, since self-attention itself is order-agnostic. These encodings can be fixed (sinusoidal) or learned, enabling the model to distinguish between different token positions.”
MLOps Angle
- Different positional encoding strategies impact model generalization. For instance, sinusoidal encodings enable extrapolation to longer sequences, whereas learned embeddings may fail if sequence lengths at inference exceed those used during training.