Get into GameDev:
cheat-codes for gamedev interviews
━ Data Structures ━
Introduction
Arrays vs Lists
The most common interview questions on data structures compare arrays and lists. As you can see from the graphic below (Source: LINK), arrays have lightning-fast access speeds, which makes them the top choice for the vast majority of algorithms. What this graphic fails to capture is the benefit of an array's data locality: since an array stores its memory contiguously, you are less likely to cache-miss during most operations. There is no such guarantee for lists, which may store adjacent node data on opposite sides of the heap.
Array Entry Fast Deletion
Arrays are listed in this graphic with a deletion time of O(n), but that is only if we enforce an order on the elements. If we instead overwrite a deleted entry with the last entry of the array, then deletion becomes a constant-time operation. Likewise, if you can insert a new element at the back, then insertion too becomes a constant-time operation. So the graphic below really only tells half of the story.
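If order does not matter, the swap-with-last deletion can be sketched like this (a minimal example on std::vector; the helper name is mine):

```cpp
#include <cassert>
#include <vector>

// O(1) unordered delete: overwrite the doomed entry with the last
// entry, then shrink by one. No shifting, but order is not preserved.
template <typename T>
void swap_remove(std::vector<T>& v, std::size_t index)
{
    v[index] = v.back(); // copy the last entry over the deleted slot
    v.pop_back();        // drop the (now duplicated) last entry
}
```

For example, `swap_remove` on index 0 of `{1, 2, 3, 4}` leaves `{4, 2, 3}`: order changed, but no elements were shifted.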
List Entry Deletion
The graphic below lists the insertion time for lists as O(1), but that assumes you don't need to preserve order; if order matters, insertion becomes O(n) because you have to walk the list to find the correct position. So in summary, maybe don't trust random graphics like this one from the internet 😂. The main benefit of a doubly-linked list is that each element stores pointers to both the next and previous entries (singly-linked lists don't store the previous entry). Singly-linked lists therefore have to walk all the way up the chain to find the entry before the one being deleted before they can perform pointer fixup. That operation is O(n), certainly not O(1) like the graphic below states. Order preservation makes a big difference.
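Given a pointer to the node you want to remove, that pointer fixup really is O(1) for a doubly-linked list; a minimal sketch with an illustrative node type (not std::list):

```cpp
#include <cassert>

struct Node {
    int value = 0;
    Node* prev = nullptr;
    Node* next = nullptr;
};

// O(1) unlink: the node already knows both neighbors, so no walk is
// needed. A singly-linked list would have to walk from the head to
// find 'prev' before it could do this fixup.
void unlink(Node* node)
{
    if (node->prev) node->prev->next = node->next;
    if (node->next) node->next->prev = node->prev;
    node->prev = node->next = nullptr;
}
```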
- "Array lists" are simply arrays with dynamic size. They still have contiguous data.
- Lists can be used to make circular linked lists
- Arrays can be used to make ring buffers
Since arrays are fixed-length, their worst-case scenario is running out of entries and needing to reallocate a new section of memory to move all of their data to. The performance of this 'resize' operation is horrible! It's in those circumstances that lists really shine, because they can freely add elements to the back, or even into the middle (while preserving order), without needing to move data that has already been stored.
I want to further emphasize that it is rarely a toss-up between these data structures. Cache misses are so bad (so very bad! 😈) that we pretty much always want to use arrays in game development. Arrays are nearly always better, and we need to design our games to prevent array resizing rather than elect to use data structures that break the contiguity of the data.
Practice Question: Let's say that I am storing data entries of type Foo and their size is very small (far less than a cache line). I have no idea how many I will need to store, it could be 10 or 10 thousand. Their order matters. Design the optimal data structure for storing these entries.
We have to store all the data values, so there is no space-complexity concern here; we will focus on time-complexity. Arrays are almost always the best option, but since order matters and we may have thousands of entries, a reallocation could become nasty. Our best trade-off is to store as many of the entries as we can in an array and then add additional arrays for additional entries. The fact that these entries are small was a bit of a hint informing us that we can fit multiple into a cache line. Our array size should be structured such that one cache line's worth of entries fits into each array. If entries are 4B each, a 64B cache line has room for 16 entries, so each array should be of length 16. That allows us to load a full array into a cache line for access. Since order matters, and in this case it does, we could have a bit of a problem if we need to insert an element into the middle of an array (since we cannot add it to the end of our last array). However, this approach still gives us the best trade-off; we simply need to split that array into two arrays.
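A rough sketch of that chunked layout (an "unrolled" list of cache-line-sized arrays; the entry type, chunk size, and names are my own illustrative assumptions, and the middle-insert/splitting case is omitted):

```cpp
#include <cassert>
#include <cstddef>

// One cache line of entries per chunk: 64B line / 4B entry = 16.
constexpr std::size_t kEntriesPerChunk = 16;

struct Chunk {
    int data[kEntriesPerChunk]; // contiguous, cache-friendly storage
    std::size_t count = 0;      // how many slots are in use
    Chunk* next = nullptr;      // chain to the next chunk, in order
};

// Append at the end: O(1), and no reallocation of existing chunks.
// (Sketch only: chunks are leaked; a real container would own them.)
void push_back(Chunk*& tail, int value)
{
    if (tail->count == kEntriesPerChunk) {
        tail->next = new Chunk{}; // grow by adding a chunk, not resizing
        tail = tail->next;
    }
    tail->data[tail->count++] = value;
}
```

Appending 20 entries fills the first chunk with 16 and spills the remaining 4 into a second chunk, with no existing data moved.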
Practice Question: How would you detect if a linked list included a loop?
Walk two pointers A & B forward from the start node. Move A 1 step at a time and B 2 steps at a time. End if A and B land on the same node (loop detected), or if B reaches the end (no loop detected).
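This is Floyd's tortoise-and-hare algorithm; a minimal sketch with an illustrative node type:

```cpp
#include <cassert>

struct ListNode {
    ListNode* next = nullptr;
};

// A advances 1 node per step, B advances 2. If there is a loop, B laps
// A and they meet inside it; otherwise B falls off the end first.
bool has_loop(ListNode* head)
{
    ListNode* a = head;
    ListNode* b = head;
    while (b && b->next) {
        a = a->next;
        b = b->next->next;
        if (a == b) return true; // pointers met: loop detected
    }
    return false; // B reached the end: no loop
}
```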
Hashing & HashMaps
Hashing vs Serialization
The main idea behind a hashing algorithm is that it provides a fast uni-directional conversion from one data type to another. It is not serialization. Serialization is bi-directional whereas hashing is uni-directional, meaning that you cannot take a hashed value and apply the hashing algorithm backwards to determine the original value.
How HashMaps use Hashing
Hashing has an important role in data storage. C++'s Unordered Map ("umap") uses hashing to convert a key to an index value. This speeds up the look-up process because when code needs to determine if a key is present in a map, it can jump straight to that index rather than searching every index. (C++'s ordered Map does not hash; it stores keys in a sorted tree, as described below.)
C++'s Map ("omap") vs Unordered Map ("umap")
- omap defines a comparison function and stores elements in order
- omap is good for narrow key domain, umap is good for wide/unknown domains
- Time Complexity
- omaps use Red-Black Trees
- umaps use hash tables
- As you can see from the comparisons below (Source: Link), umaps are faster on average but worse in the worst cases. The access time for hash tables is listed as N/A because the hashing algorithm speed will vary, but hashing is generally a very fast operation, and relative to these other operations it should be much faster. omaps can be slow because of reference stability (when they place an item, they need to keep it there in case it is being referenced).
- I recommend this Avalanche Talk on many hashmap optimization techniques
HashMap Implementations
- Hash tables are usually implemented with arrays that have a prime number of entries. During a typical hashing algorithm, you take the hash function result modulo the number of slots, and prime slot counts lead to better-stratified results.
- Hashing algorithms are best if they can spread the input well.
- Collision Handling:
- Collision Handling refers to the process of how a hashmap deals with multiple keys hashing to the same hashtable index. There are two popular methods for dealing with collisions.
- #1: Open Addressing - This method requires you to ensure that your hash table is large enough to store all entries. Instead of storing the key/value exactly at the hashed key's index, we walk forward through the indices until we find an available index.
- There are a few variations of this approach based on how we determine the order of indices to search:
- Linear addressing (aka Linear Probing): we walk from indices 1⇾2⇾3.
- Quadratic addressing: we jump from indices 1^2 ⇾ 2^2 ⇾ 3^2.
- Double hashing: we use a second hashing algorithm to determine our step size.
- At some point the array will become 'too full'; this is usually not because it is completely full, but because it has exceeded an arbitrary load factor (typically 75%, meaning 75% full). Once it is too full, the array length is extended and all entries are rehashed into the larger array.
- #2: Linked Lists - This method (sometimes called Separate Chaining) requires a linked list at every index of the hash table. Hashing a key determines which linked list you should add your entry to.
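The open addressing approach (#1, with linear probing) can be sketched as follows; int keys, a fixed prime slot count, and the absence of deletion/resizing are all simplifying assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Minimal linear-probing table. Per the text, the table must stay
// large enough that insert/lookup always find an empty slot.
struct ProbingTable {
    static constexpr std::size_t kSlots = 17; // prime number of slots
    std::vector<std::optional<int>> slots =
        std::vector<std::optional<int>>(kSlots);

    void insert(int key)
    {
        std::size_t i = static_cast<std::size_t>(key) % kSlots;
        while (slots[i].has_value() && *slots[i] != key)
            i = (i + 1) % kSlots; // collision: walk to the next slot
        slots[i] = key;
    }

    bool contains(int key) const
    {
        std::size_t i = static_cast<std::size_t>(key) % kSlots;
        while (slots[i].has_value()) {
            if (*slots[i] == key) return true;
            i = (i + 1) % kSlots; // keep probing until an empty slot
        }
        return false; // hit an empty slot: key was never stored
    }
};
```

Inserting 5 and then 22 (22 % 17 == 5) shows the probe: 22 collides at slot 5 and lands in slot 6.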
- String Hashing
- Beware, you may need Unicode support which will make most ASCII algorithms useless.
- Simple example: sum each character's ASCII value multiplied by successive powers of a prime.
- "BED" → ( 66 * 31^0 ) + ( 69 * 31^1 ) + ( 68 * 31^2 ) → 67553
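A sketch of this prime-power scheme in C++, using unsigned 64-bit arithmetic so longer strings simply wrap around (the base 31 is the same prime as in the example):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sum of byte values times increasing powers of 31:
// hash = s[0]*31^0 + s[1]*31^1 + s[2]*31^2 + ...
std::uint64_t hash_string(const std::string& s)
{
    std::uint64_t hash = 0;
    std::uint64_t power = 1; // 31^0, 31^1, 31^2, ...
    for (unsigned char c : s) {
        hash += c * power;
        power *= 31;
    }
    return hash;
}
```

`hash_string("BED")` yields 67553 (ASCII: B = 66, E = 69, D = 68).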
Trees
Tries are useful for string storage.
Binary Search Trees
AVL vs Red-Black
Binary Search Trees
- Binary trees are not necessarily sorted. Binary Search Trees are the sorted version.
- Balanced vs unbalanced explainer video: LINK
- Unbalanced tree lookup is O(n), balanced tree lookup ( AVL or Red-Black ) is O( log n )
- Balanced trees are pretty much always better, but they take up a little more storage.
AVL vs Red-Black
- AVL and RB trees both maintain O(log n) height.
- Rebalancing after a modification costs O(log n) rotations for AVL, yet only O(1) rotations for RB.
- AVL stores an int per node (height / balance factor), Red-Black stores 1 bit (red/black)
- AVL better for lookup (databases), Red-Black better for insert / delete (std::map and std::set)
- AVL has faster lookups than Red Black because they are more strictly balanced.
- Red Black Trees provide faster insertion and removal operations than AVL trees because fewer rotations are done due to relatively relaxed balancing.
QuadTrees:
A quadtree is a structure used to partition space into 2D sections. In the below quadtree, the square is split into 4 squares, and within those four squares, both of the right squares have children. Quadtrees are used all the time in game dev; a common application is collision culling. We do not need to check if E and A are colliding since they are in two different sections and those sections are not touching.
Tree Traversal
Depth First Search (using stack):
add root to stack
while stack not empty:
→ pop and discover first element in stack
→ push undiscovered neighbors of first element
Depth First Search (using recursion):
call DFS on root
for each neighbor:
→ if neighbor undiscovered:
→ → call DFS on neighbor
Breadth First Search (using queue):
add root to queue
while queue not empty:
→ dequeue and discover first element in queue
→ enqueue all undiscovered neighbors of first element
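The queue-based BFS above can be sketched in C++, assuming an adjacency-list graph of int node ids (the discovered set doubles as the "undiscovered" check):

```cpp
#include <cassert>
#include <queue>
#include <unordered_set>
#include <vector>

// Returns nodes in the order BFS discovers them from 'root'.
std::vector<int> bfs(const std::vector<std::vector<int>>& adj, int root)
{
    std::vector<int> order;
    std::unordered_set<int> discovered{root};
    std::queue<int> q;
    q.push(root); // add root to queue
    while (!q.empty()) {
        int node = q.front(); // dequeue and discover the first element
        q.pop();
        order.push_back(node);
        for (int n : adj[node]) {
            if (discovered.insert(n).second) // only undiscovered ones
                q.push(n);                   // enqueue the neighbor
        }
    }
    return order;
}
```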
Binary Tree Traversal
Pre-Order / In-Order / Post-Order
Binary Tree Traversal
- Note: Binary trees are not necessarily sorted. Binary Search Trees are the sorted version.
- No need to verify that neighbors are undiscovered because all trees are DAGs
Pre-Order / In-Order / Post-Order
There are several ways to traverse a tree. The main orders to know are pre-order, in-order, and post-order. These distinctions refer to the order that the root nodes are visited with respect to their children. For example if the root comes before the children we call it "pre"-order.
These different orders are easy to implement in DFS where our choice of when to visit the root determines the order. See below how the root node visitation can be placed in the first spot for pre-order, or we can choose the third arrow's location for post-order.
- ← Visit root here for pre-order
- Recurse on left child
- ← Visit root here for in-order
- Recurse on right child
- ← Visit root here for post-order
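The skeleton above can be sketched in C++; the commented-out lines mark where the visit would move for pre-order and post-order (the node type is illustrative):

```cpp
#include <cassert>
#include <vector>

struct TreeNode {
    int value = 0;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// Only the position of 'out.push_back' changes between the orders.
void in_order(const TreeNode* node, std::vector<int>& out)
{
    if (!node) return;
    // out.push_back(node->value);   // <- here for pre-order
    in_order(node->left, out);       // recurse on left child
    out.push_back(node->value);      // <- here for in-order
    in_order(node->right, out);      // recurse on right child
    // out.push_back(node->value);   // <- here for post-order
}
```

On a binary search tree, the in-order placement visits values in sorted order.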
Practice Question: How would you visit the nodes of a tree in level-order? Level order for the tree above would be 1 - 2 - 3 - 4 - 5 - 6 - 7.
https://www.geeksforgeeks.org/level-order-tree-traversal/
Pathfinding
A*
Dijkstra
Dijkstra vs A*
- Youtube Tutorial
- Each node is given a value f(n) = g(n) + h(n)
- g(n) = cost of the path from the start node to n
- h(n) = heuristic estimate of the cost of the cheapest path from n to the goal node
- Each node should track its predecessor as well.
- Typically implemented with a priority queue where we repeatedly select the minimum cost node to expand until that node is the goal itself
Dijkstra
- Youtube Tutorial
- Dijkstra is blind (slower than A*); it can't handle negative weight edges (unlike Bellman-Ford); it works on directed and undirected graphs
- Algorithm:
- label each node’s distance from the start node as infinity: for all i, dist( i ) = inf
- If node M is connected to the start, then replace dist( M ) with edge( Start, M )
- AKA we replace the distance to M from inf to instead be the cost to go from Start to M.
- While there remain unvisited nodes, visit the one with the least distance, L, and see if it can relax each of its neighbors, N, by replacing dist( N ) with dist( L ) + edge( L, N ) whenever that sum is smaller
- AKA we are replacing the distance to N with the distance to L plus the distance from L to N.
- Once we have calculated the distance to the goal node, we are done.
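The steps above can be sketched with a min-priority queue of (distance, node) pairs; the graph format here (adjacency lists of (neighbor, weight) pairs, non-negative weights, int node ids) is an illustrative assumption, and this version computes distances to every node rather than stopping at a goal:

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Returns dist[i] = shortest distance from 'start' to node i.
std::vector<int> dijkstra(
    const std::vector<std::vector<std::pair<int, int>>>& adj, int start)
{
    const int inf = std::numeric_limits<int>::max();
    std::vector<int> dist(adj.size(), inf); // step 1: all distances = inf
    dist[start] = 0;

    using Entry = std::pair<int, int>; // (distance, node), min-ordered
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    pq.push({0, start});
    while (!pq.empty()) {
        auto [d, l] = pq.top(); // visit the least-distance node L
        pq.pop();
        if (d > dist[l]) continue; // stale entry: L was relaxed already
        for (auto [n, w] : adj[l]) {
            if (dist[l] + w < dist[n]) { // relax neighbor N through L
                dist[n] = dist[l] + w;
                pq.push({dist[n], n});
            }
        }
    }
    return dist;
}
```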
Dijkstra vs A*
- A* uses a heuristic h(n) to calculate f(n) which is smarter than Dijkstra's blind search and usually performs better as a result.
- Note that neither algorithm reliably handles negative weight edges; Bellman-Ford is the standard choice when those are present.
AI & Animation Data Structures
Certainly these data structures can be used for more than just these applications, but that is how I have categorized them for now.
FSMs (Finite State Machines)
Behavior Trees
Blend Shapes
Nav Mesh
FSMs (Finite State Machines)
- Extremely common for tracking animation states such as a state for "Idle" with transitions to "Run" and "Walk" states.
- Sometimes we have a separate FSM for layered body parts, such as an FSM for just the hand animations that operates on top of the full-body animations. This allows us to control the hands while walking.
- Because it can be burdensome to put all logic on transitions between states, some studios use passthrough-states, which let you test conditions in order to enter a state but cannot be ended on; if you fail a condition to enter a passthrough, or fail a condition to exit one, you fall back to the last non-passthrough state you visited. This approach can drastically reduce the number of transitions needed.
- Much easier to debug than Behavior Trees.
- It's a little harder to visualize more complex if-else statements in FSMs.
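A tiny sketch of the Idle/Walk/Run idea above, as an enum plus a transition function (the states and speed thresholds are illustrative, not from any particular engine):

```cpp
#include <cassert>

enum class AnimState { Idle, Walk, Run };

// One transition function: given the current state and the character's
// speed, decide which state to move to (or stay in).
AnimState next_state(AnimState current, float speed)
{
    switch (current) {
    case AnimState::Idle:
        return speed > 0.0f ? AnimState::Walk : current;
    case AnimState::Walk:
        if (speed <= 0.0f) return AnimState::Idle;
        return speed > 4.0f ? AnimState::Run : current;
    case AnimState::Run:
        return speed <= 4.0f ? AnimState::Walk : current;
    }
    return current;
}
```

Note that Idle cannot jump straight to Run here; transitions are only defined between adjacent states, which is exactly the kind of structure that is easy to debug in an FSM.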
Behavior Trees
- Much more modular: components only need to return success, failure, or in-progress.
- Useful for more complex AI that are rapidly changing goals.
- They scale much better than FSMs and it's much easier to have subroutines (nested behavior trees).
Blend Shapes
- Youtube Link
- Stores common poses for facial features, such as a smile and a frown, and allows animators to tune the current appearance as a ratio between these baked poses.
- Vastly reduces animation data.
- Vastly reduces the amount of time it takes to animate (animators can just adjust the sliders).
Nav Mesh
- Bakes environment setup for designers to fine-tune which areas of a level are traversable
- Can be dynamically updated (if for example a door opens), this is usually done by toggling a mesh link versus rebaking everything.
Queues
Queues
Deques (pronounced "Decks") AKA Double Ended Queues
- Can be implemented with arrays or linked lists
- If a queue is backed by an array, it's likely a ring buffer, to make use of space previously occupied by elements that have since been dequeued. If a ring buffer fills up, we have run out of space, which is bad. One method for dealing with this is enqueuing a pointer to a new array as if it were an element.
Deques (pronounced "Decks") AKA Double Ended Queues
- similar to arrays, but with fast insert/delete of elements at both ends (not just the back)
- elements of a deque can be scattered in different chunks of storage, with the container keeping the necessary information internally to find any element in constant time
- push_back, push_front, pop_back, pop_front
- Nice if we only want to store n things, so we can remove from the end while adding to front
- Nice if we want to let privileged clients who already “waited in line” jump to the front of the line
- Unlike a deque, a double-linked list allows insertions/erasures in the middle.
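The four operations listed above are exactly std::deque's push_back, push_front, pop_back, and pop_front; a small sketch of the "only store n things" idea (the helper name and capacity policy are mine):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Keep at most 'capacity' items: push the newest to the front and
// evict the oldest from the back — both O(1) on a deque.
std::deque<int> keep_latest(const std::deque<int>& in, int value,
                            std::size_t capacity)
{
    std::deque<int> d = in;
    d.push_front(value);                      // newest at the front
    while (d.size() > capacity) d.pop_back(); // drop the oldest
    return d;
}
```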
Common Data Structure Algorithms
Sorting
Quick Sort is typically regarded as the fastest sorting algorithm, but it does require a non-constant amount of additional space, so in some limited circumstances Insertion Sort may be preferred since it is more space efficient.
You should understand both of these sorting algorithms to the extent that you would feel comfortable coding one from scratch with guidance from an interviewer. I think that would be a somewhat silly question but I have been asked to do this on interviews.
Randomization: The simplest randomization algorithm is probably Fisher-Yates; here is the code for you to familiarize yourself with.
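A standard Fisher-Yates sketch (using std::mt19937 as the random source, which is my choice here, not prescribed by the text):

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Walk from the back of the vector; swap each element with a
// uniformly random element at or before it. Every permutation is
// equally likely, in a single O(n) pass.
void fisher_yates(std::vector<int>& v, std::mt19937& rng)
{
    for (std::size_t i = v.size(); i > 1; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(v[i - 1], v[pick(rng)]);
    }
}
```

The shuffle is in place: the same elements come back, just reordered.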
String Algorithms
Interviewers at big tech companies are a little obsessed with string algorithms. Make sure you feel very comfortable manipulating strings; be aware of your target language's features for manipulating strings and characters. Here are a few algorithms I suggest you memorize. You can go on LeetCode to practice a bunch of them, but here are just a few that I saw come up particularly often.
- Reverse String (Iteratively): Use two pointers. One will iterate from the start of the string and move up until it is half-way through. The other will start at the end and move down until it is half-way through. As you go, swap the characters pointed to by each pointer. For example, the first step swaps the first and last characters. The last step swaps the middle two characters, or, if the string has an odd number of characters, we don't need to move the middle character. Be mindful not to swap the null terminator.
- Palindrome Detection: Start two pointers at the middle index and iterate out, checking if equal every step.
- Palindrome Substring Detection: Start two pointers at every index and iterate out, checking if equal every step.
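The two-pointer reverse above can be sketched as follows (using std::string, so there is no null terminator to worry about; the "half-way" stop is expressed as the pointers meeting):

```cpp
#include <assert.h>
#include <cstddef>
#include <string>
#include <utility>

// Swap inward from both ends until the two indices meet in the middle.
// For odd lengths, the middle character is left untouched.
void reverse_in_place(std::string& s)
{
    std::size_t front = 0;
    std::size_t back = s.size(); // one past the last character
    while (front + 1 < back) {
        std::swap(s[front], s[back - 1]);
        ++front;
        --back;
    }
}
```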
Memoization Problems AKA Dynamic Programming
You should be familiar with memoization patterns, which are typically "recursion plus book-keeping".
Here is an example for you to use for practice: Youtube Link
Make sure you are able to solve at least one of this kind of problem, they can be a little tricky at first.
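A classic minimal example of "recursion plus book-keeping" is memoized Fibonacci (my illustration, not the problem from the linked video): the naive recursion is O(2^n), but caching each result as it is computed makes it O(n).

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Recursion plus book-keeping: check the cache before recursing,
// and record every result so each n is computed only once.
std::uint64_t fib(int n, std::unordered_map<int, std::uint64_t>& memo)
{
    if (n < 2) return static_cast<std::uint64_t>(n);
    auto it = memo.find(n);
    if (it != memo.end()) return it->second; // reuse a cached result
    std::uint64_t result = fib(n - 1, memo) + fib(n - 2, memo);
    memo[n] = result; // book-keeping: remember it for next time
    return result;
}
```

Without the memo, fib(50) would take minutes; with it, the call returns instantly.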