Nvidia Coding Interview Questions : OA, Technical Rounds & Onsite

Nvidia is one of the most sought-after tech companies for engineers interested in GPU computing, AI infrastructure, and high-performance systems. Nvidia hires for SDE, GPU Software Engineer, Systems Software Engineer, Compiler Engineer, and ML Infrastructure roles. The interview process is rigorous, with a strong focus on C++, data structures, algorithms, and systems-level thinking.

Hiring Process

Nvidia's interview process typically has 5–6 stages:

Step 1 — Online Application Apply via Nvidia Careers. Strong GPA, relevant internships, and projects in GPU computing, systems programming, or ML infrastructure stand out.

Step 2 — Online Assessment (OA) Hosted on HackerRank or CodeSignal. 2–3 coding problems in 90 minutes. Focus on DSA — arrays, strings, DP, graphs. Some roles include C++ MCQs on memory management and templates.

Step 3 — Phone Screen 45–60 minute technical interview with an engineer. 1–2 medium/hard coding problems. Expect follow-ups on time and space complexity, edge cases, and code correctness.

Step 4 — Technical Onsite Round 1 (Coding) 60–75 minutes. 1–2 coding problems, often involving graphs, trees, or dynamic programming. Interviewer evaluates problem-solving approach and clean code.

Step 5 — Technical Onsite Round 2 (Systems / Architecture) Deep dive into systems design, memory management, parallel computing concepts, or compiler design depending on the role. GPU-specific roles may include CUDA or parallel algorithm questions.

Step 6 — Hiring Manager + HR Round Behavioral questions, team fit, and offer negotiation. Nvidia values ownership, technical depth, and curiosity about GPU/AI tech.

Tips: Nvidia interviewers pay close attention to how you handle edge cases and optimize solutions. Practice writing clean, production-quality C++ code. For GPU/systems roles, understand cache coherency, memory hierarchies, and parallel execution models.

Preparation Resources

Part 1 — Online Assessment (OA) Questions

1. Trapping Rain Water Hard

OA Two Pointers / Stack LeetCode: #42

Given n non-negative integers representing an elevation map where the width of each bar is 1, compute how much water it can trap after raining.

Input: height = [0,1,0,2,1,0,1,3,2,1,2,1]
Output: 6

Approach: Use two pointers from both ends. Track leftMax and rightMax. At each step, process the side with the smaller max — water at that position is max - height[i]. Complexity: Time O(n) · Space O(1) Follow-up: Solve using a monotonic stack approach.

2. Minimum Window Substring Hard

OA Sliding Window LeetCode: #76

Given strings s and t, return the minimum window substring of s that contains all characters of t. Return empty string if no such window exists.

Input: s = "ADOBECODEBANC", t = "ABC"
Output: "BANC"

Approach: Sliding window with two frequency maps. Expand right pointer until all chars of t are covered, then shrink left pointer to minimize window. Track best window seen. Complexity: Time O(|s| + |t|) · Space O(|s| + |t|) Follow-up: What if t has duplicate characters?

3. Word Break Medium

OA Dynamic Programming LeetCode: #139

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.

Input: s = "leetcode", wordDict = ["leet","code"]
Output: true

Approach: DP array dp[i] = true if s[0..i-1] can be segmented. For each position i, check all substrings ending at i — if dp[j] is true and s[j..i] is in the dictionary, set dp[i] = true. Complexity: Time O(n^2) · Space O(n) Follow-up: Return all possible segmentations (Word Break II).

4. Merge Intervals Medium

OA Arrays / Sorting LeetCode: #56

Given an array of intervals, merge all overlapping intervals and return an array of the non-overlapping intervals.

Input: intervals = [[1,3],[2,6],[8,10],[15,18]]
Output: [[1,6],[8,10],[15,18]]

Approach: Sort intervals by start time. Iterate and check if current interval overlaps with the last merged interval. If yes, extend end. Otherwise, add new interval. Complexity: Time O(n log n) · Space O(n) Follow-up: Insert a new interval into a sorted list of non-overlapping intervals.

5. Maximum Subarray Medium

OA Dynamic Programming / Kadane LeetCode: #53

Given an integer array nums, find the subarray with the largest sum, and return its sum.

Input: nums = [-2,1,-3,4,-1,2,1,-5,4]
Output: 6  (subarray [4,-1,2,1])

Approach: Kadane's algorithm. Track currentSum = max(nums[i], currentSum + nums[i]) and update maxSum at each step. Complexity: Time O(n) · Space O(1) Follow-up: Return the actual subarray indices, not just the sum.

6. Number of Islands Medium

OA BFS / DFS / Union-Find LeetCode: #200

Given an m x n 2D binary grid of '1's (land) and '0's (water), return the number of islands.

Input: grid = [["1","1","0"],["1","1","0"],["0","0","1"]]
Output: 2

Approach: DFS/BFS from each unvisited '1'. Mark all connected land cells as visited. Count how many DFS/BFS calls you make — that is the number of islands. Complexity: Time O(m*n) · Space O(m*n) Follow-up: Solve using Union-Find. What about islands in 3D?

Part 2 — Phone Screen Questions

7. Serialize and Deserialize Binary Tree Hard

Phone Screen Trees / BFS LeetCode: #297

Design an algorithm to serialize and deserialize a binary tree. There is no restriction on how your serialization/deserialization algorithm should work.

Input: root = [1,2,3,null,null,4,5]
Output: [1,2,3,null,null,4,5]

Approach: BFS-based serialization — level-order traversal with null markers. For deserialization, use a queue to reconstruct the tree level by level, processing each node's children in order. Complexity: Time O(n) · Space O(n) Follow-up: Serialize using preorder DFS. Which is more compact?

8. Find Median from Data Stream Hard

Phone Screen Heap / Design LeetCode: #295

Design a data structure that supports adding integers and finding the median of all elements seen so far.

Input: addNum(1), addNum(2), findMedian() -> 1.5, addNum(3), findMedian() -> 2.0

Approach: Maintain a max-heap for the lower half and a min-heap for the upper half. Keep them balanced (size diff at most 1). Median is the top of the larger heap or average of both tops. Complexity: Time O(log n) add · O(1) findMedian · Space O(n) Follow-up: What if 99% of numbers are in [0, 100]? How can you optimize memory?

9. Course Schedule II Medium

Phone Screen Topological Sort / DAG LeetCode: #210

There are n courses with prerequisites given as pairs. Return an ordering of courses to finish all courses, or an empty array if it is impossible.

Input: numCourses = 4, prerequisites = [[1,0],[2,0],[3,1],[3,2]]
Output: [0,1,2,3]

Approach: Topological sort using Kahn's algorithm (BFS with in-degree tracking). If all nodes are processed, return order; otherwise, cycle exists — return empty array. Complexity: Time O(V + E) · Space O(V + E) Follow-up: How would you detect which specific courses form the cycle?

10. Longest Palindromic Substring Medium

Phone Screen Expand Around Center / DP LeetCode: #5

Given a string s, return the longest palindromic substring in s.

Input: s = "babad"
Output: "bab"

Approach: Expand Around Center — for each character (and between each pair), expand outward while left and right characters match. Track the longest palindrome found. Complexity: Time O(n^2) · Space O(1) Follow-up: Manacher's algorithm achieves O(n) — explain the idea.

11. LRU Cache Medium

Phone Screen HashMap + Doubly Linked List LeetCode: #146

Design a data structure that follows the Least Recently Used (LRU) cache eviction policy with get and put operations in O(1).

Input: LRUCache(2), put(1,1), put(2,2), get(1)->1, put(3,3), get(2)->-1

Approach: Combine a HashMap (O(1) lookup) with a doubly linked list (O(1) move-to-front and evict-from-tail). Maintain dummy head and tail sentinel nodes. Complexity: Time O(1) · Space O(capacity) Follow-up: Implement LFU (Least Frequently Used) cache.

12. Decode Ways Medium

Phone Screen Dynamic Programming LeetCode: #91

A string of digits can be decoded using the mapping 'A'->1, 'B'->2, ..., 'Z'->26. Given a string s, return the number of ways to decode it.

Input: s = "226"
Output: 3  ("2,2,6" / "22,6" / "2,26")

Approach: DP where dp[i] = number of ways to decode s[0..i-1]. Check single digit decode and two-digit decode (10-26) at each step. Complexity: Time O(n) · Space O(n) (optimizable to O(1)) Follow-up: What if the encoded string has '*' wildcards?

13. Maximum Product Subarray Medium

Phone Screen Dynamic Programming LeetCode: #152

Given an integer array nums, find a subarray that has the largest product, and return the product.

Input: nums = [2,3,-2,4]
Output: 6  (subarray [2,3])

Approach: Track both maxProd and minProd at each position (negative negative = positive). At each step: maxProd = max(nums[i], maxProd*nums[i], minProd\*nums[i]). *Complexity: Time O(n)· SpaceO(1) Follow-up: What if the array contains zeros?

Part 3 — Onsite Interview Questions

14. Binary Tree Maximum Path Sum Hard

Onsite Trees / DFS LeetCode: #124

A path in a binary tree is a sequence of nodes where each pair of adjacent nodes has an edge. Return the maximum path sum where the path can start and end at any node.

Input: root = [-10,9,20,null,null,15,7]
Output: 42  (path: 15 -> 20 -> 7)

Approach: Post-order DFS. For each node, compute max gain from left and right subtrees (clamped to 0). Update global max with node.val + leftGain + rightGain. Return node.val + max(leftGain, rightGain) to parent. Complexity: Time O(n) · Space O(h) Follow-up: What if negative values should never be included?

15. Sliding Window Maximum Hard

Onsite Monotonic Deque LeetCode: #239

Given an array nums and an integer k, return the max value of each sliding window of size k.

Input: nums = [1,3,-1,-3,5,3,6,7], k = 3
Output: [3,3,5,5,6,7]

Approach: Monotonic deque (decreasing). For each element, remove indices outside the window from front. Remove elements smaller than current from back. Front always holds the max index. Complexity: Time O(n) · Space O(k) Follow-up: How would you extend this to a 2D sliding window max?

16. Design GPU Memory Allocator Hard

Onsite Systems Design / Memory LeetCode: #2502

Design a memory allocator for a fixed-size memory space. Implement allocate(size, mID) to find the first free block of size consecutive free units, and free(mID) to free all memory units with the given ID.

Input: allocate(1,1)->0, allocate(1,2)->1, allocate(1,3)->2, free(2)->1, allocate(3,4)->3

Approach: Maintain an array representing memory units. For allocate, scan for size consecutive free units (first-fit). For free, scan and unmark all units with matching mID. For Nvidia roles, discuss fragmentation and best-fit vs first-fit tradeoffs. Complexity: Time O(n) allocate/free · Space O(n) Follow-up: Nvidia follow-up: How does GPU memory management differ from CPU? Discuss VRAM allocation, memory pools, and buddy allocator systems.

17. Implement Trie (Prefix Tree) Medium

Onsite Trie / Design LeetCode: #208

Implement a Trie with insert, search, and startsWith methods. All inputs consist of lowercase English letters.

Input: insert("apple"), search("apple")->true, search("app")->false, startsWith("app")->true

Approach: Each TrieNode has an array of 26 children and a boolean isEnd. Insert by traversing/creating nodes. Search checks isEnd at last character. StartsWith only checks path existence. Complexity: Time O(m) per operation · Space O(26 * m * n) Follow-up: Implement delete operation. How would you compress the trie (Patricia/Radix tree)?

18. Pacific Atlantic Water Flow Medium

Onsite BFS / DFS / Multi-source LeetCode: #417

Given an m x n matrix of heights, find all cells from which water can flow to both the Pacific (top/left) and Atlantic (bottom/right) oceans.

Input: heights = [[1,2,2,3,5],[3,2,3,4,4],[2,4,5,3,1],[6,7,1,4,5],[5,1,1,2,4]]
Output: [[0,4],[1,3],[1,4],[2,2],[3,0],[3,1],[4,0]]

Approach: Reverse BFS — start from ocean borders and flood inward (where height is >= current). Find intersection of Pacific-reachable and Atlantic-reachable cells. Complexity: Time O(m*n) · Space O(m*n) Follow-up: What if water flows to neighboring cells regardless of height difference?

19. Design a Thread-Safe Bounded Queue Hard

Onsite Concurrency / Systems LeetCode: #1188

Implement a thread-safe bounded blocking queue with enqueue (blocks if full) and dequeue (blocks if empty) operations.

Input: BoundedBlockingQueue(2), enqueue(1), enqueue(0), dequeue()->1, dequeue()->0

Approach: Use a deque with a mutex and two condition variables (not_full, not_empty). Enqueue waits on not_full, signals not_empty. Dequeue does the reverse. Critical section protects deque operations. Complexity: Time O(1) amortized · Space O(capacity) Follow-up: Nvidia context: How is this pattern used in GPU command queues and driver-kernel communication?

20. K Closest Points to Origin Medium

Onsite Heap / Quickselect LeetCode: #973

Given an array of points on a 2D plane, return the k closest points to the origin (0, 0), sorted by distance.

Input: points = [[1,3],[-2,2]], k = 1
Output: [[-2,2]]

Approach: Max-heap of size k — for each point, if heap has less than k elements or point is closer than heap top, push it and pop the farthest. Alternatively, use Quickselect for O(n) average. Complexity: Time O(n log k) heap / O(n) Quickselect · Space O(k) Follow-up: What if points are streamed in real-time and you need top-k continuously?

21. Jump Game II Medium

Onsite Greedy LeetCode: #45

Given array nums where nums[i] is the max jump length at position i, return the minimum number of jumps to reach the last index.

Input: nums = [2,3,1,1,4]
Output: 2  (jump to index 1, then to last)

Approach: Greedy BFS. Track the farthest reachable position in the current jump (curEnd) and the overall farthest reachable (farthest). Increment jumps when reaching curEnd. Complexity: Time O(n) · Space O(1) Follow-up: What is the minimum number of jumps if you can also jump backward?

22. Unique Paths in a Grid with Obstacles Medium

Onsite Dynamic Programming LeetCode: #63

A robot on an m x n grid moves only right or down. Some cells have obstacles. Count the number of unique paths from top-left to bottom-right.

Input: obstacleGrid = [[0,0,0],[0,1,0],[0,0,0]]
Output: 2

Approach: DP table where dp[i][j] = dp[i-1][j] + dp[i][j-1] if no obstacle, else 0. Initialize first row and column carefully — block propagation after any obstacle. Complexity: Time O(m*n) · Space O(m*n) -> optimize to O(n) Follow-up: Nvidia context: Model GPU thread block scheduling as a path-counting problem.

23. Reverse Nodes in k-Group Hard

Onsite Linked List LeetCode: #25

Given the head of a linked list, reverse the nodes of the list k at a time, and return the modified list. Nodes that are left with fewer than k remaining should stay in original order.

Input: head = [1,2,3,4,5], k = 2
Output: [2,1,4,3,5]

Approach: Count k nodes ahead. If fewer than k remain, stop. Reverse the group in-place using three pointers, then recursively process the next group. Connect reversed group to previous tail. Complexity: Time O(n) · Space O(n/k) recursive stack Follow-up: Solve iteratively to achieve O(1) space.

24. Design a Parallel Task Scheduler Hard

Onsite Systems Design / Scheduling LeetCode: #621

Given a list of tasks (each labeled A-Z) and a cooldown period n, find the least number of CPU intervals to finish all tasks. CPU can be idle between task executions.

Input: tasks = ["A","A","A","B","B","B"], n = 2
Output: 8

Approach: Count task frequencies. Most frequent task determines the frame size. Formula: max((maxFreq-1)*(n+1) + countOfMaxFreq, tasks.length). Nvidia follow-up: extend to GPU kernel scheduling with warp occupancy constraints. Complexity: Time O(n) · Space O(1) Follow-up: Design for GPU: how would warp divergence and SM occupancy affect scheduling decisions?

25. Alien Dictionary Hard

Onsite Topological Sort / Graph LeetCode: #269

Given a list of words sorted lexicographically in an alien language, derive the order of characters in that language.

Input: words = ["wrt","wrf","er","ett","rftt"]
Output: "wertf"

Approach: Build a directed graph from character order implied by adjacent word comparisons. Run topological sort (BFS/Kahn's). Return empty string if cycle detected. Complexity: Time O(C) (C = total chars) · Space O(1) (at most 26 chars) Follow-up: What if two words violate lexicographic order (prefix before shorter word)?

26. Count of Smaller Numbers After Self Hard

Onsite Merge Sort / BIT / Segment Tree LeetCode: #315

Given an integer array nums, return an integer array counts where counts[i] is the number of smaller elements to the right of nums[i].

Input: nums = [5,2,6,1]
Output: [2,1,1,0]

Approach: Modified merge sort — during the merge step, count inversions. When a right-side element is placed before left-side elements, those left-side elements all have a count increment. Complexity: Time O(n log n) · Space O(n) Follow-up: How would you solve this using a Binary Indexed Tree (Fenwick Tree)?

27. Longest Increasing Subsequence Medium

Onsite Dynamic Programming / Binary Search LeetCode: #300

Given an integer array nums, return the length of the longest strictly increasing subsequence.

Input: nums = [10,9,2,5,3,7,101,18]
Output: 4  (subsequence [2,3,7,101])

Approach: Patience sorting / binary search. Maintain a tails array where tails[i] is the smallest tail element of all increasing subsequences of length i+1. Binary search to update. Complexity: Time O(n log n) · Space O(n) Follow-up: Reconstruct the actual subsequence, not just the length.

28. Basic Calculator II Medium

Onsite Stack / String Parsing LeetCode: #227

Implement a basic calculator to evaluate a string expression containing non-negative integers and operators +, -, *, /.

Input: s = "3+2*2"
Output: 7

Approach: Use a stack. Scan left to right. For +/- push number (with sign) to stack. For *// pop top, compute with current number, push result. Sum the stack at end. Complexity: Time O(n) · Space O(n) Follow-up: Extend to support parentheses (Basic Calculator III).

29. Reconstruct Itinerary Hard

Onsite Hierholzer's Algorithm / Euler Path LeetCode: #332

Given a list of airline tickets as [from, to] pairs, reconstruct the itinerary starting from "JFK" using all tickets exactly once in lexical order.

Input: tickets = [["MUC","LHR"],["JFK","MUC"],["SFO","SJC"],["LHR","SFO"]]
Output: ["JFK","MUC","LHR","SFO","SJC"]

Approach: Hierholzer's algorithm for Eulerian path. Sort adjacency lists lexicographically. DFS — only add airport to result when its adjacency list is exhausted (post-order). Reverse at end. Complexity: Time O(E log E) · Space O(V + E) Follow-up: What if multiple valid itineraries exist and you need the lexicographically smallest?

30. Maximum Frequency Stack Hard

Onsite Design / HashMap LeetCode: #895

Design a stack-like data structure that pushes integers and pops the most frequent element (ties broken by most recently pushed).

Input: push(5),push(7),push(5),push(7),push(4),push(5), pop()->5, pop()->7, pop()->5, pop()->4

Approach: Maintain a freq map and a group map (freq -> stack of elements at that freq). Track maxFreq. On push, increment freq, add to group. On pop, remove from maxFreq group; decrement if group empty. Complexity: Time O(1) · Space O(n) Follow-up: What if you need to pop the least frequent element instead?

31. Remove Invalid Parentheses Hard

Onsite BFS / Backtracking LeetCode: #301

Remove the minimum number of invalid parentheses to make the input string valid. Return all possible results in any order.

Input: s = "()())()"
Output: ["(())()", "()()()"]

Approach: BFS — generate all strings by removing one character at a time. Stop at the first level that produces valid strings. Use a HashSet to avoid duplicate strings at each BFS level. Complexity: Time O(n * 2^n) · Space O(2^n) Follow-up: Can you solve it in O(n) with a two-pass approach if you only need count of removals?

32. Design a Distributed Key-Value Store Hard

Onsite System Design

Design a distributed key-value store that supports get/put operations with high availability, fault tolerance, and eventual consistency.

Key Design Points:

Consistent hashing for key distribution across nodes (virtual nodes for load balance)
Replication factor N (e.g., 3): each key replicated to N nodes on the ring
Read/write quorums: W + R > N for strong consistency, or W=1/R=1 for availability
Vector clocks for conflict resolution in eventual consistency model
Gossip protocol for node membership and failure detection
Merkle trees for anti-entropy and replica synchronization

Nvidia context: Discuss how distributed KV stores are used in ML training (parameter servers, gradient aggregation, checkpoint storage). Follow-up: How does NVLink change assumptions about distributed training vs. traditional network-based approaches?

33. Largest Rectangle in Histogram Hard

Onsite Monotonic Stack LeetCode: #84

Given an array of integers heights representing the histogram bar height, return the area of the largest rectangle in the histogram.

Input: heights = [2,1,5,6,2,3]
Output: 10

Approach: Monotonic increasing stack. Push indices. When a bar shorter than stack top is found, pop and calculate area using the popped bar as the shortest bar — width extends from current stack top to current index. Complexity: Time O(n) · Space O(n) Follow-up: Maximal Rectangle in a binary matrix (LeetCode #85) — apply histogram solution row by row.

34. Implement a Read-Write Lock Hard

Onsite Concurrency / Systems

Design a Read-Write Lock where multiple readers can read simultaneously but writers need exclusive access. Discuss fairness policies.

Key Implementation Points:

Reader-preference: readers never wait for writers unless a writer holds the lock — writer starvation risk
Writer-preference: new readers block if a writer is waiting — reader starvation risk
Fair (FIFO): use a ticket-based or queue-based approach — both readers and writers queued in arrival order
Use a mutex + condition variables: readCount atomic counter, writeLock condition

Nvidia context: GPU driver synchronization — command buffers can be read by many contexts simultaneously but must be written exclusively. Discuss cudaEventRecord and stream synchronization analogies. Follow-up: How do you handle upgrade from read lock to write lock atomically?

35. Sudoku Solver Hard

Onsite Backtracking LeetCode: #37

Write a program to solve a Sudoku puzzle by filling the empty cells. Each row, column, and 3x3 box must contain digits 1-9 exactly once.

Input: board (9x9 with some filled cells)
Output: Solved board (modified in-place)

Approach: Backtracking with constraint propagation. For each empty cell, try digits 1-9 that don't violate constraints. Recurse. If no valid digit exists, backtrack. Optimize with bitmask sets for row/col/box constraints. Complexity: Time O(9^m) (m = empty cells) · Space O(1) extra Follow-up: How would you parallelize Sudoku solving across multiple GPU threads?

36. Integer to English Words Hard

Onsite String / Recursion LeetCode: #273

Convert a non-negative integer num to its English words representation.

Input: num = 1234567
Output: "One Million Two Hundred Thirty Four Thousand Five Hundred Sixty Seven"

Approach: Process in chunks of 3 digits using a helper helper(n) for numbers 0-999. Apply group names (Thousand, Million, Billion) at each chunk level. Handle edge cases: zero, teens (11-19), multiples of 10. Complexity: Time O(1) (bounded by max int) · Space O(1) Follow-up: Generate words in a different language (internationalization). How would you structure this?

37. Smallest Range Covering Elements from K Lists Hard

Onsite Heap / Sliding Window LeetCode: #632

You have k lists of sorted integers. Find the smallest range [a, b] such that there is at least one number from each list within the range.

Input: nums = [[4,10,15,24,26],[0,9,12,20],[5,18,22,30]]
Output: [20,24]

Approach: Min-heap with one element from each list. Track the current maximum. Pop min, update range if [min, curMax] is smaller, push next element from same list. Stop when any list is exhausted. Complexity: Time O(n log k) · Space O(k) Follow-up: How does this problem model finding a common window in distributed log streams?

38. Bitwise AND of Numbers Range Medium

Onsite Bit Manipulation LeetCode: #201

Given two integers left and right, return the bitwise AND of all numbers in the range [left, right].

Input: left = 5, right = 7
Output: 4

Approach: Find the common prefix of left and right in binary — shift both right until equal, tracking shift count. Result is the common prefix shifted back left. Bits that differ across the range become 0. Complexity: Time O(log n) · Space O(1) Follow-up: Nvidia context: How is bit masking used in GPU warp predicate execution and SIMT divergence?

39. Find All Anagrams in a String Medium

Onsite Sliding Window / HashMap LeetCode: #438

Given strings s and p, return an array of all start indices of p's anagrams in s.

Input: s = "cbaebabacd", p = "abc"
Output: [0,6]

Approach: Fixed-size sliding window of size p.length. Maintain a frequency diff counter — when all frequencies match (count == 0), record the window start. Slide right, updating frequencies. Complexity: Time O(n) · Space O(26) = O(1) Follow-up: What if characters in p can be Unicode (not just lowercase ASCII)?

40. Design a Rate Limiter Medium

Onsite System Design / Sliding Window LeetCode: #359

Design a rate limiter that allows at most N requests per second per user. Discuss token bucket, leaky bucket, fixed window, and sliding window approaches.

Approach Comparison:

Fixed window: simple, but burst at window boundary (2x allowed in 1 second)
Sliding window log: precise, but high memory (store timestamps per user)
Sliding window counter: approximate, O(1) memory using two counters
Token bucket: allows controlled bursts; tokens regenerate at fixed rate
Leaky bucket: smooths traffic; useful for Nvidia's GPU job submission queuing

Nvidia context: Rate limiting GPU API calls per process, throttling CUDA kernel launches, managing NVLink bandwidth allocation. Follow-up: Implement distributed rate limiting across multiple servers using Redis.

41. Median of Two Sorted Arrays Hard

Onsite Binary Search LeetCode: #4

Given two sorted arrays nums1 and nums2, return the median of the two sorted arrays. The overall run time complexity should be O(log(m+n)).

Input: nums1 = [1,3], nums2 = [2]
Output: 2.0

Approach: Binary search on the smaller array. Find partition points in both arrays such that elements left of partitions are less than elements right. Use maxLeft and minRight across both partitions to determine median. Complexity: Time O(log(min(m,n))) · Space O(1) Follow-up: Generalize to find the Kth smallest element in two sorted arrays.

42. Interleaving String Hard

Onsite Dynamic Programming LeetCode: #97

Given strings s1, s2, and s3, determine if s3 is formed by an interleaving of s1 and s2.

Input: s1 = "aabcc", s2 = "dbbca", s3 = "aadbbcbcac"
Output: true

Approach: 2D DP where dp[i][j] = true if s3[0..i+j-1] can be formed by interleaving s1[0..i-1] and s2[0..j-1]. Transition: match s1[i-1] or s2[j-1] with current s3 char. Complexity: Time O(m*n) · Space O(m*n) -> optimize to O(n) Follow-up: Interleaving of K strings — how does complexity scale?

43. Minimum Cost to Hire K Workers Hard

Onsite Heap / Greedy LeetCode: #857

There are n workers with given quality and wage values. Hire exactly k workers such that every chosen worker is paid at least their expected wage, and all are paid in proportion to their quality. Return the minimum total wage.

Input: quality = [10,20,5], wage = [70,50,30], k = 2
Output: 105.0

Approach: Sort workers by wage/quality ratio. For each worker as the highest ratio (captain), find cheapest k workers from those with lower ratios using a max-heap of quality. Total cost = captain's ratio * sum of k qualities. Complexity: Time O(n log n) · Space O(k) Follow-up: What if you need to minimize maximum wage disparity rather than total cost?

44. Split Array Largest Sum Hard

Onsite Binary Search / DP LeetCode: #410

Given an integer array nums and an integer k, split nums into k non-empty subarrays to minimize the largest subarray sum.

Input: nums = [7,2,5,10,8], k = 2
Output: 18  (split: [7,2,5] and [10,8])

Approach: Binary search on the answer (range: max(nums) to sum(nums)). For each candidate answer, greedily check if array can be split into at most k parts with each part sum <= candidate. Complexity: Time O(n log(sum)) · Space O(1) Follow-up: Nvidia context: load balancing GPU workloads across streaming multiprocessors.

45. Design a GPU Kernel Launch System Hard

Onsite System Design / GPU Architecture

Design a system for scheduling and executing GPU kernels across multiple CUDA streams on a multi-GPU server. Discuss dependencies, concurrency, and resource management.

Key Design Points:

CUDA Streams: kernels within same stream execute sequentially; across streams can overlap
CUDA Graphs: capture kernel launch sequences, replay with minimal CPU overhead — key for inference
Dependency tracking: use CUDA events (cudaEventRecord, cudaStreamWaitEvent) to enforce cross-stream ordering
Multi-GPU: NVLink for direct GPU-GPU transfers, peer-to-peer memory access
MPS (Multi-Process Service): share GPU across multiple processes with context isolation
Occupancy: balance thread blocks per SM — too few wastes parallelism, too many causes register spilling

Expected discussion: Trade-off between fine-grained stream parallelism and synchronization overhead. How to maximize SM utilization while respecting memory bandwidth limits. Follow-up: How would you extend this for a training cluster with thousands of GPUs?

46. All O`one Data Structure Hard

Onsite Design / Doubly Linked List LeetCode: #432

Design a data structure to store strings with O(1) inc/dec operations and O(1) getMaxKey/getMinKey.

Input: inc("a"), inc("b"), inc("b"), inc("c"), inc("c"), inc("c"), getMaxKey()->"c", getMinKey()->"a"

Approach: Doubly linked list of buckets (each bucket = a frequency level with a set of keys). HashMap from key to its bucket node. Inc: move key to bucket+1 (create if not exists). Dec: move to bucket-1. Head = min freq, tail = max freq. Complexity: Time O(1) all operations · Space O(n) Follow-up: How would you adapt this for tracking GPU kernel execution frequencies?

47. Maximum Gap Hard

Onsite Bucket Sort / Radix Sort LeetCode: #164

Given an unsorted array, return the maximum difference between successive elements in its sorted form. Must run in O(n) time and space.

Input: nums = [3,6,9,1]
Output: 3

Approach: Bucket sort / pigeonhole principle. With n elements, create n-1 buckets between min and max. Each bucket stores only min and max. Maximum gap must span at least one empty bucket — compare adjacent bucket min/max values. Complexity: Time O(n) · Space O(n) Follow-up: How is radix sort used in GPU sorting (e.g., CUB library's DeviceRadixSort)?

48. Parallel Prefix Sum (Scan) Hard

Onsite Parallel Algorithms / GPU

Implement the parallel prefix sum (exclusive scan) algorithm. Given an array A of n elements, compute output[i] = A[0] + A[1] + ... + A[i-1] (exclusive scan) using parallel operations.

Input:  [3, 1, 7, 0, 4, 1, 6, 3]
Output: [0, 3, 4, 11, 11, 15, 16, 22]

Approach (Blelloch Scan):

Up-sweep (reduce): pairwise summation up a binary tree — log n parallel steps
Set root to 0
Down-sweep: distribute prefix sums down the tree — log n more steps

Total: O(n) work, O(log n) depth — optimal for GPU.

Nvidia context: Prefix scan is a fundamental GPU primitive used in stream compaction, sorting (radix sort carries), histogram building, and sparse matrix operations. Understanding it is expected for GPU software roles. Follow-up: How does warp-level scan using __shfl_down_sync differ from shared memory scan?

49. Minimum Number of Refueling Stops Hard

Onsite Greedy / Max-Heap LeetCode: #871

A car starts with startFuel liters and travels to a target distance. Fuel stations are at given positions with given fuel amounts. Return the minimum number of stops, or -1 if impossible.

Input: target = 100, startFuel = 10, stations = [[10,60],[20,30],[30,30],[60,40]]
Output: 2

Approach: Greedy + max-heap. Travel as far as possible. When stuck, retroactively refuel at the station with the most fuel (greedily pick the best past option using a max-heap). This simulates optimal choices without trying all combinations. Complexity: Time O(n log n) · Space O(n) Follow-up: Generalize to a vehicle with limited tank capacity.

50. Design a Compiler Symbol Table Hard

Onsite Design / Compiler Systems

Design a symbol table for a compiler that supports: declaring variables in nested scopes, looking up the nearest enclosing declaration of a variable, and exiting/entering scopes efficiently.

Key Design Points:

Scoped hash table: stack of HashMaps — each scope gets its own map
Enter scope: push a new HashMap onto the stack
Exit scope: pop the top HashMap
Declare(name, type): insert into top map; detect re-declaration in same scope
Lookup(name): search from top to bottom of stack — first match is nearest scope
Optimization: chained hash tables (each scope points to parent) for O(1) push/pop instead of O(depth) lookup

Nvidia context: NVCC (Nvidia's CUDA compiler) maintains symbol tables for host and device code separately. Variable qualifiers (__device__, __shared__, __constant__) add extra scope/namespace dimensions. Follow-up: How would you handle template instantiation in C++ symbol tables? What about CUDA cooperative groups and their scoping rules?

Preparation Tips

Nvidia's interviews are technically deep, especially for GPU and systems roles. Here is how to prepare effectively:

For Coding Rounds:

Master arrays, strings, trees, graphs, DP, and heap problems at Medium-Hard level
Practice writing clean, modular C++ code — Nvidia strongly prefers C++ candidates
Know STL containers: priority_queue, unordered_map, deque, set
Focus on bit manipulation — very common for GPU/embedded roles
Study monotonic stacks and deques — appear frequently in Nvidia onsite rounds

For Systems/Architecture Rounds:

Understand CPU/GPU memory hierarchy: registers, shared memory, L1/L2 cache, DRAM
Learn basic CUDA concepts: thread/block/grid hierarchy, warp execution, memory coalescing
Study parallel algorithms: prefix scan, parallel reduction, parallel sort
Know OS fundamentals: memory management, synchronization primitives, process scheduling
Practice system design: distributed key-value stores, rate limiters, task schedulers

For All Rounds:

Think out loud — Nvidia interviewers value your reasoning process as much as the solution
Always discuss time and space complexity and potential optimizations
Ask clarifying questions about constraints and edge cases upfront
Be prepared to code in C++ — not just Python

Join our Telegram group for Nvidia interview experiences shared by candidates who recently cleared the process.

PYQ's

Nvidia Coding Interview Questions : OA, Technical Rounds & Onsite

Hiring Process

Preparation Resources

Part 1 — Online Assessment (OA) Questions

1. Trapping Rain Water Hard

2. Minimum Window Substring Hard

3. Word Break Medium

4. Merge Intervals Medium

5. Maximum Subarray Medium

6. Number of Islands Medium

Part 2 — Phone Screen Questions

7. Serialize and Deserialize Binary Tree Hard

8. Find Median from Data Stream Hard

9. Course Schedule II Medium

10. Longest Palindromic Substring Medium

11. LRU Cache Medium

12. Decode Ways Medium

13. Maximum Product Subarray Medium

Part 3 — Onsite Interview Questions

14. Binary Tree Maximum Path Sum Hard

15. Sliding Window Maximum Hard

16. Design GPU Memory Allocator Hard

17. Implement Trie (Prefix Tree) Medium

18. Pacific Atlantic Water Flow Medium

19. Design a Thread-Safe Bounded Queue Hard

20. K Closest Points to Origin Medium

21. Jump Game II Medium

22. Unique Paths in a Grid with Obstacles Medium

23. Reverse Nodes in k-Group Hard

24. Design a Parallel Task Scheduler Hard

25. Alien Dictionary Hard

26. Count of Smaller Numbers After Self Hard

27. Longest Increasing Subsequence Medium

28. Basic Calculator II Medium

29. Reconstruct Itinerary Hard

30. Maximum Frequency Stack Hard

31. Remove Invalid Parentheses Hard

32. Design a Distributed Key-Value Store Hard

33. Largest Rectangle in Histogram Hard

34. Implement a Read-Write Lock Hard

35. Sudoku Solver Hard

36. Integer to English Words Hard

37. Smallest Range Covering Elements from K Lists Hard

38. Bitwise AND of Numbers Range Medium

39. Find All Anagrams in a String Medium

40. Design a Rate Limiter Medium

41. Median of Two Sorted Arrays Hard

42. Interleaving String Hard

43. Minimum Cost to Hire K Workers Hard

44. Split Array Largest Sum Hard

45. Design a GPU Kernel Launch System Hard

46. All O`one Data Structure Hard

47. Maximum Gap Hard

48. Parallel Prefix Sum (Scan) Hard

49. Minimum Number of Refueling Stops Hard

50. Design a Compiler Symbol Table Hard

Preparation Tips

Useful Resources