Amazon ML Summer School Previous Year Questions 2025
The Amazon Machine Learning (ML) Summer School is a premier program aimed at equipping engineering students in India with industry-relevant ML skills. Launched in 2021, its fifth edition in 2025 promises an immersive learning experience led by Amazon’s top scientists. This blog covers what the program is, who can apply, how to apply, and what to expect.
What is the Amazon ML Summer School?
This free four-weekend program (Aug 9–31, 2025) blends theory and practice across key ML topics such as Supervised Learning, Deep Neural Networks, Generative AI, Reinforcement Learning, and more. Sessions include live lectures and Q&As with Amazon scientists.
Key Highlights:
- Curriculum: 8 modules covering ML topics including Generative AI, Causal Inference, and Reinforcement Learning.
- Schedule: Saturdays and Sundays, 9 AM–1 PM IST (sessions) and 2 PM–5 PM IST (Q&A).
- Networking: Includes access to Amazon Research Days (ARD) and mentorship from scientists.
- Internship Pathway: Top performers may be eligible for internships in Data Science or Applied Scientist roles at Amazon.
Over 17,500 students applied in 2022 alone—underscoring the program’s growing popularity.
Eligibility
- Who Can Apply: Engineering students enrolled in Bachelor’s, Master’s, or PhD programs at recognized Indian institutes.
- Graduation Year: 2026 or 2027.
- Background: All engineering disciplines welcome. Familiarity with Python and basic ML concepts is helpful but not mandatory.
Selection Process
Candidates take a 60-minute test assessing math and programming skills:
- Part A: 20 MCQs on probability, statistics, and linear algebra.
- Part B: 2 DSA programming problems (Python recommended).
Top 3,000 scorers will be selected for the program.
How to apply for Amazon ML Summer School- Check now
💻 Part A – Amazon ML Summer School MCQ Questions
Question 1: Variance Calculation
Given a dataset, calculate the variance of the numbers: [2, 4, 6, 8, 10].
a) 4
b) 6
c) 8
d) 10
Show Answer
Answer: c) 8
Explanation: To calculate variance:
- Find the mean: (2+4+6+8+10)/5 = 30/5 = 6
- Calculate squared deviations from mean: (2-6)² + (4-6)² + (6-6)² + (8-6)² + (10-6)² = 16 + 4 + 0 + 4 + 16 = 40
- Divide by n: 40/5 = 8
Question 2: Matrix Eigenvalues
The eigenvalues of a 4x4 square matrix having 0's as the diagonal elements and 1's on the off diagonal elements is:
a) 2, -2, 0, 0
b) 1, -1, 1, -1
c) 3, -1, -1, -1
d) 4, 0, 0, 0
Show Answer
Answer: c) 3, -1, -1, -1
Explanation: The matrix is J - I where J is the all-ones 4×4 matrix and I is the identity matrix. The eigenvalues of the all-ones matrix J are: n=4 (with multiplicity 1) and 0 (with multiplicity 3). For J - I, the eigenvalues become: 4-1=3, 0-1=-1, 0-1=-1, 0-1=-1.
Question 3: Probability (Two-Headed Coin)
You pick a coin at random from a bag containing a fair coin and a two-headed coin. You flip the coin twice and get heads both times. What is the probability that you picked the fair coin?
a) 1/3
b) 4/13
c) 1/5
d) 2/5
Show Answer
Answer: c) 1/5
Explanation: Using Bayes' theorem: P(Fair|HH) = P(HH|Fair) × P(Fair) / P(HH)
- P(HH|Fair) = 1/4, P(HH|Two-headed) = 1
- P(Fair) = P(Two-headed) = 1/2
- P(HH) = P(HH|Fair) × P(Fair) + P(HH|Two-headed) × P(Two-headed) = (1/4)(1/2) + (1)(1/2) = 1/8 + 1/2 = 5/8
- P(Fair|HH) = (1/4)(1/2) / (5/8) = (1/8) / (5/8) = 1/5
Question 4: Medical Screening P-Value
In medical screening, it is sometimes more important to avoid false negatives than false positives. How would this affect the p-value we might use for a medical test?
a) we would choose a higher p-value
b) we would choose a lower p-value
c) we would choose the same p-value
d) false negatives do not depend on p-value
Show Answer
Answer: a) we would choose a higher p-value
Explanation: To avoid false negatives (missing actual cases), we want high sensitivity. This means we're willing to accept more false positives (Type I error). A higher p-value threshold makes it easier to reject the null hypothesis, increasing sensitivity but also increasing Type I error rate.
Question 5: Derivative Calculation
Given y=x^x, what is dy/dx at x = 2?
a) 4(1+log2)
b) 4(2+log2)
c) 4
d) 8
Show Answer
Answer: a) 4(1+log2)
Explanation: For y = x^x, use logarithmic differentiation:
- ln(y) = x ln(x)
- Differentiating: (1/y)(dy/dx) = ln(x) + 1
- Therefore: dy/dx = x^x(ln(x) + 1)
- At x = 2: dy/dx = 2^2(ln(2) + 1) = 4(ln(2) + 1) = 4(1 + log2)
Question 6: Algorithm Selection (Clustering)
In a scenario where you need to group customers based on purchasing behavior without prior labels, which algorithm should you use?
a) Linear regression
b) K-means clustering
c) Logistic regression
d) Random forest
Show Answer
Answer: b) K-means clustering
Explanation: This is an unsupervised learning problem since there are no prior labels. K-means clustering is designed to group similar data points together without requiring labeled training data. Linear regression and logistic regression are supervised learning algorithms that need labeled data.
Question 7: Correlation Calculation
For random variables X and Y, we have Var(X)=1, Var(Y)=4, and Var(2X-3Y)=34, then the correlation between X and Y is:
a) 1/2
b) 1/4
c) 1/3
d) None of the above
Show Answer
Answer: b) 1/4
Explanation:
- Var(2X - 3Y) = 4Var(X) + 9Var(Y) - 12Cov(X,Y)
- 34 = 4(1) + 9(4) - 12Cov(X,Y)
- 34 = 4 + 36 - 12Cov(X,Y)
- 34 = 40 - 12Cov(X,Y)
- 12Cov(X,Y) = 6
- Cov(X,Y) = 1/2
- Correlation = Cov(X,Y)/√(Var(X)×Var(Y)) = (1/2)/√(1×4) = (1/2)/2 = 1/4
Question 8: Probability (Chessboard - Version 1)
Two squares are chosen at random on a chessboard. What is the probability that they have a side in common?
a) 8/13
b) 17/18
c) 5/13
d) 1/18
Show Answer
Answer: d) 1/18
Explanation:
- Total ways to choose 2 squares from 64: C(64,2) = 2016
- Adjacent pairs: 4 corner squares (2 neighbors each) + 24 edge squares (3 neighbors each) + 36 interior squares (4 neighbors each)
- Total adjacent pairs = (4×2 + 24×3 + 36×4)/2 = 112
- Probability = 112/2016 = 1/18
Question 9: Gradient Descent vs Normal Equation
Suppose you have a dataset with m=50 examples and n=200000 features for each example. You want to use multivariate linear regression to fit the parameters Θ. Should you prefer gradient descent or the normal equation?
a) Gradient descent, since inverse(X^T X) will be very slow to compute
b) Gradient descent, since it will always converge to the optimal Θ
c) The normal equation, since it provides an efficient way to directly find the solution
d) The normal equation, since gradient descent might be unable to find the optimal Θ
Show Answer
Answer: a) Gradient descent, since inverse(X^T X) will be very slow to compute
Explanation: With n=200000 features, computing (X^T X)^(-1) requires O(n³) operations, which is computationally expensive. Gradient descent scales much better with the number of features and is preferred for high-dimensional problems. The normal equation becomes impractical when n is very large.
Question 10: Unbiased Estimators
Let M and S² be the mean and variance of a random sample of size > 1 from a normal population with unknown mean μ and unknown finite variance σ²>0. Which statements are true?
- S² is an unbiased estimator of σ², and S is an unbiased estimator of σ
- ((n-1)/n)M is an unbiased estimator of μ, and ((n-1)/n)S² is an unbiased estimator of σ²
a) 1 only
b) 2 only
c) Both 1 and 2
d) Neither 1 nor 2
Show Answer
Answer: d) Neither 1 nor 2
Explanation: Statement 1: S² (sample variance with n-1 denominator) is unbiased for σ², but S (sample standard deviation) is NOT unbiased for σ due to Jensen's inequality. Statement 2: M is unbiased for μ, but ((n-1)/n)M is biased. Also, ((n-1)/n)S² is biased for σ². Therefore, neither statement is completely true.
Question 11: Overfitting/Underfitting Detection
Given a graph of a model's training and validation error, determine if it represents underfitting or overfitting.
a) Underfitting
b) Overfitting
c) Neither
d) Both
Show Answer
Answer: Cannot be determined without the graph
Explanation: This question refers to a graph that is not provided in the document. To determine overfitting vs underfitting:
- Overfitting: Training error much lower than validation error
- Underfitting: Both training and validation errors are high and similar
Question 12: Probability (Biased Coins)
You have two coins. One is fair (1/2 heads probability) and the other is biased (3/4 heads probability). You randomly pick a coin and flip it twice, getting heads both times. What is the probability that you picked the fair coin?
a) 13/32
b) 4/13
c) 2/13
d) 19/32
Show Answer
Answer: b) 4/13
Explanation: Using Bayes' theorem:
- P(HH|fair) = (1/2)² = 1/4
- P(HH|biased) = (3/4)² = 9/16
- P(fair) = P(biased) = 1/2
- P(fair|HH) = P(HH|fair)×P(fair) / [P(HH|fair)×P(fair) + P(HH|biased)×P(biased)]
- = (1/4)(1/2) / [(1/4)(1/2) + (9/16)(1/2)] = (1/8) / (1/8 + 9/32) = (1/8) / (13/32) = 4/13
Question 13: Matrix Rank Relationships
If rank(A) is 2 and rank(AB) is 3, then:
a) rank(B) = 3
b) rank(B) <= 3
c) rank(B) >= 3
d) data insufficient
Show Answer
Answer: c) rank(B) >= 3
Explanation: We know that rank(AB) ≤ min(rank(A), rank(B)). Given rank(AB) = 3 and rank(A) = 2, this seems impossible under normal circumstances. However, if we interpret this as asking what must be true about rank(B), then since rank(AB) = 3, we must have rank(B) ≥ 3.
Question 14: System of Equations (Version 1)
The number of solutions for the system: 2x+y-z=4, x-2y+z=-2, -x+2y-z=-2 is:
a) 0
b) 1
c) Inf
d) Can't be determined
Show Answer
Answer: a) 0
Explanation: Looking at equations 2 and 3:
- x - 2y + z = -2
- -x + 2y - z = -2 Adding these equations: 0 = -4, which is a contradiction. Therefore, the system has no solutions.
Question 15: Logistic Classification Likelihood
When classifying data with logistic classification, what is the upper bound of the likelihood in the maximum likelihood method? Is this value attainable?
a) 1, Yes
b) e, No
c) 1, No
d) 0, Yes
Show Answer
Answer: a) 1, Yes
Explanation: The likelihood is the product of probabilities, each between 0 and 1. The maximum possible value is 1, achieved when all predictions are correct with probability 1. This is theoretically attainable when the data is perfectly separable.
Question 16: Neural Network Output Calculation
A 3-input neuron has weights 1, 4 and 3. The transfer function is linear with constant of proportionality = 3. The inputs are 4, 8 and 5 respectively. What will be the output?
a) 51
b) 153
c) 54
d) 160
Show Answer
Answer: b) 153
Explanation:
- Weighted sum = w₁x₁ + w₂x₂ + w₃x₃ = 1×4 + 4×8 + 3×5 = 4 + 32 + 15 = 51
- Output = transfer function × weighted sum = 3 × 51 = 153
Question 17: PCA Statements (Version 1)
Which statements about PCA are correct?
(i) We must standardize the data before applying
(ii) We should select principal components which explain the highest variance
(iii) We should select principal components which explain the lowest variance
(iv) We can use PCA for visualizing data in lower dimensions
a) (i), (ii) and (iv)
b) (ii) and (iv)
c) (iii) and (iv)
d) (i) and (iii)
Show Answer
Answer: b) (ii) and (iv)
Explanation:
- (i) False: Standardization is recommended but not always necessary
- (ii) True: We select components explaining the highest variance to retain most information
- (iii) False: We want high variance components, not low variance ones
- (iv) True: PCA is commonly used for dimensionality reduction and visualization
Question 18: Algorithm Selection (Regression)
Given a scenario involving a dataset with labeled data and a need to predict a continuous output, which algorithm is most suitable?
a) K-means clustering
b) Linear regression
c) Decision tree classifier
d) Support vector machine (SVM) with RBF kernel
Show Answer
Answer: b) Linear regression
Explanation: For predicting continuous output with labeled data, this is a supervised regression problem. Linear regression is specifically designed for predicting continuous values. K-means is unsupervised, decision tree classifier is for classification, and SVM with RBF can work but linear regression is more direct.
Question 19: Infinite Solutions Parameter
The system x + y + z = 1, ax - ay + 3z = 5, 5x - 3y + az = 6 has infinite solutions if a =?
a) -3
b) 3
c) -4
d) 4
Show Answer
Answer: d) 4
Explanation: For infinite solutions, the system must be consistent and the coefficient matrix must have rank less than the number of variables. This occurs when the equations become dependent, which happens when a = 4. This can be verified by setting up the augmented matrix and finding when rank(A) = rank([A|b]) < 3.
Question 20: Probability (Speed Trap - Version 1)
Police enforce speed limits on routes A, B, C, D operated 40%, 30%, 20%, 30% of time. Biff speeds with route probabilities 0.2, 0.1, 0.5, 0.2. What's the probability he gets a ticket?
a) 0.27
b) 0.93
c) 0.73
d) 0.07
Show Answer
Answer: a) 0.27
Explanation: P(ticket) = Σ P(route i) × P(enforcement on route i) = 0.2×0.4 + 0.1×0.3 + 0.5×0.2 + 0.2×0.3 = 0.08 + 0.03 + 0.10 + 0.06 = 0.27
Question 21: Matrix Eigenvalues (A^19)
Let A be a 2×2 matrix with a11=a12=a21=+1 and a22=-1. The eigenvalues of matrix A^19 are:
a) 1024 and -1024
b) 1024√2 and -1024√2
c) 4√2 and -4√2
d) 512√2 and -512√2
Show Answer
Answer: d) 512√2 and -512√2
Explanation:
- Matrix A = [1 1; 1 -1]
- Characteristic equation: det(A - λI) = (1-λ)(-1-λ) - 1 = λ² - 2 = 0
- Eigenvalues of A: λ₁ = √2, λ₂ = -√2
- Eigenvalues of A^19: (√2)^19 = 2^(19/2) = 2^9.5 = 512√2, (-√2)^19 = -512√2
Question 22: Characteristic Equation and Matrix Inverse
If the characteristic equation of matrix A is t²-t-1=0, then:
a) A^(-1) does not exist
b) A^(-1) exists but cannot be determined
c) A^(-1) = A - I
d) A^(-1) = A + I
Show Answer
Answer: c) A^(-1) = A - I
Explanation:
- From the characteristic equation: A² - A - I = 0
- Rearranging: A² = A + I
- Multiplying both sides by A^(-1): A = I + A^(-1)
- Therefore: A^(-1) = A - I
- Since 0 is not an eigenvalue (from t²-t-1=0), A^(-1) exists.
Question 23: Bias-Variance Tradeoff
Fitting data from a cubic function corrupted by Gaussian noise using linear (M1) and 5th degree polynomial (M5) models:
a) Bias(M1) ≤ Bias(M5), Variance(M1) ≤ Variance(M5)
b) Bias(M1) ≥ Bias(M5), Variance(M1) ≤ Variance(M5)
c) Bias(M1) ≤ Bias(M5), Variance(M1) ≥ Variance(M5)
d) Bias(M1) ≥ Bias(M5), Variance(M1) ≥ Variance(M5)
Show Answer
Answer: b) Bias(M1) ≥ Bias(M5), Variance(M1) ≤ Variance(M5)
Explanation: Linear model (M1) cannot capture the cubic relationship, so it has high bias but low variance (simple model). 5th degree polynomial (M5) can fit the cubic function well, so it has low bias but high variance (complex model). This is the classic bias-variance tradeoff: simpler models have higher bias and lower variance.
Question 24: Probability (Chessboard - Version 2)
Two squares are chosen at random on a chessboard. What is the probability that they have a side in common?
a) 1/36
b) 1/9
c) 2/9
d) 1/18
Show Answer
Answer: d) 1/18
Explanation: Same as Question 8:
- Total ways to choose 2 squares from 64: C(64,2) = 2016
- Adjacent pairs: 4 corner squares (2 neighbors each) + 24 edge squares (3 neighbors each) + 36 interior squares (4 neighbors each)
- Total adjacent pairs = (4×2 + 24×3 + 36×4)/2 = 112
- Probability = 112/2016 = 1/18
Question 25: System of Equations (Version 2)
The system 2x+y-z=1, x-2y+z=-2, -x+2y-z=-1 has how many solutions?
a) 0
b) 1
c) 2
d) Infinitely many
Show Answer
Answer: a) 0
Explanation: Looking at equations 2 and 3:
- x - 2y + z = -2
- -x + 2y - z = -1 Adding these equations: 0 = -3, which is a contradiction. Therefore, the system has no solutions.
💻 Part B – Amazon ML Summer SchoolCoding Questions
Problem 1: Statistical Calculations (Mean, Median, Mode)
Given 'n' integers, calculate their mean, median and mode with specific precision requirements.
Function Signature
function calculateStats(input1: number, input2: number[]): [number, number, number]
Requirements
- Mean and median: Must be correct to six decimal places
- Mode: Return as integer (smallest number in case of ties)
Definitions
Mean
Defined as the average of all numbers in the array:
Mean = (sum of all elements) / (number of elements)
Median
Defined as the middle element of the sorted array:
- If n is even: median = average of two middle elements
- If n is odd: median = middle element
Note: Array must be sorted in ascending order for median calculation.
Mode
Defined as the number with the highest frequency:
- If multiple numbers have same highest frequency, choose the smallest number
Input Specification
input1
: Integer (1 ≤ input1 ≤ 1000) - length of arrayinput2
: Integer array containing 'input1' integers
Output Specification
output1
: Mean (double with 6 decimal places)output2
: Median (double with 6 decimal places)output3
: Mode (integer)
Test Cases
Example 1
Input1: 3
Input2: [1, 2, 3]
Output: 2.000000, 2.000000, 1
Example 2
Input1: 5
Input2: [41, 18467, 6334, 26500, 19169]
Output: 14102.200000, 18467.000000, 41
Calculation for Example 2:
- Mean: (41 + 18467 + 6334 + 26500 + 19169) / 5 = 14102.200000
- Median: Sorted array [41, 6334, 18467, 19169, 26500] → 18467.000000
- Mode: All elements appear once, smallest is 41
Problem 2: Robot String Manipulation
There are three robots named Ray, Ben and Kevin. Initially Ray has a string S of length N, while the other two robots have empty strings. We can make either of the following moves:
- Move 1: Remove the first character from Ray's string and append it to Ben's string
- Move 2: Remove the last character from Ben's string and append it to Kevin's string
You must perform either of the two moves mentioned above in such a way that the strings left with Ray and Ben are empty and the string left with Kevin is lexicographically the smallest.
Task
Return the lexicographically smallest string that Kevin has after completing this activity.
Important Note
For any two given strings, a string is said to be lexicographically smaller than the other if it comes before the other string in the dictionary.
Constraints
- Input: String S of length N
- Output: Lexicographically smallest string that Kevin will have
Approach Hints
- This is essentially a stack-based problem
- Ben's string acts like a stack (LIFO - Last In, First Out)
- Need to find optimal sequence of moves to minimize lexicographic order
Problem 3: Minimum Cost Flight Path
There are N cities in a country. George starts at city 1 (airport) and wants to reach city N. From any city i, he can travel to:
- City (i+1) if it exists, OR
- City (i+3) if it exists
Cost Calculation
Flight cost between cities i and j is calculated as:
Cost = |Cost[i] - Cost[j]|
where Cost[i] represents the cost value for city i in the given array.
Task
Find and return the minimum possible cost to reach city N from city 1.
Constraints
- Number of cities N > 3
- Use 1-based indexing
- Can only move to (i+1) or (i+3) from city i
Input Specification
input1
: Integer N (number of cities)input2
: Integer array A (cost values for each city)
Output Specification
- Return minimum total cost to reach city N
Test Cases
Example 1
Input1: 4
Input2: [1, 4, 5, 2]
Output: 1
Explanation:
- Direct path: City 1 → City 4 (using i+3 rule)
- Cost: |Cost[1] - Cost[4]| = |1 - 2| = 1
Example 2
Input1: 6
Input2: [4, 12, 13, 18, 10, 12]
Output: 10
Explanation: Optimal path: City 1 → City 2 → City 3 → City 6
- City 1 → City 2: |4 - 12| = 8
- City 2 → City 3: |12 - 13| = 1
- City 3 → City 6: |13 - 12| = 1
- Total cost: 8 + 1 + 1 = 10
Algorithm Approach
- This is a Dynamic Programming problem
- Can also be solved using BFS/Dijkstra's algorithm
- State:
dp[i] = minimum cost to reach city i
- Transition:
dp[i] = min(dp[i-1] + cost(i-1,i), dp[i-3] + cost(i-3,i))
Below are problem statements similar to those reportedly asked in previous Amazon ML Summer School coding tests, based on participant experiences:
1. Mean, Median, Mode Calculation
Given an array of integers, write a function to compute the mean, median, and mode.
2. Climbing Stairs Variant
Given a staircase with n
steps, you can climb 1 or 2 steps at a time. Find the number of distinct ways to reach the top.
3. Array Manipulation with Constraints
Given an array of package weights and a maximum weight limit, compute the minimum cost to deliver all packages by choosing consecutive packages in a single trip, where the cost is based on the sum of weights in each trip.
4. Run Length Encoding
Given a string, return its run-length encoding (e.g., "aaaaabb"
→ "a5b2"
).
5. Median in a Stream
Given a stream of integers, write a function to add numbers to the stream and return the median after each addition.
6. Maximum Subarray Sum
Given an array of integers (positive and negative), find the maximum sum of any contiguous subarray.
7. Valid Parentheses
Given a string containing only parentheses characters ('('
, ')'
, '{'
, '}'
, '['
, ']'
), determine if the input string is valid. A string is valid if all open brackets are closed in the correct order.
8. Top K Frequent Elements
Given an array of integers and an integer k
, return the k most frequent elements in the array. You can return the answer in any order.
9. Longest Substring Without Repeating Characters
Given a string, find the length of the longest substring without repeating characters.
10. Minimum Cost to Merge Stones
Given an array of integers representing stone weights and an integer k
, you can merge exactly k consecutive stones into one stone with weight equal to their sum. Find the minimum cost (sum of merged weights) to merge all stones into one.
Apply now if you haven't applied it yet- Check now
Join Telegram group for any doubts & discussions!