Generators are one of the most powerful and underestimated features in Python. In interviews, people often ask about yield, but few can explain what happens under the hood. Let's dive deep.
What is a Generator?
A generator is produced by a function that contains yield: calling such a function returns an iterator (a generator object) instead of running its body. Sounds simple, but behind this lies powerful machinery for lazy evaluation and state management.
def simple_generator():
yield 1
yield 2
yield 3
gen = simple_generator()
print(next(gen)) # 1
print(next(gen)) # 2
print(next(gen)) # 3
print(next(gen)) # StopIteration
Key difference from regular functions: a generator function doesn't execute its body immediately. It returns a generator object that can be iterated.
How yield Works
When Python encounters yield, magic happens:
- Execution pause — the function "freezes" in its current state
- Value return — the value from yield is returned to the calling code
- Context preservation — all local variables and the code position are saved
- Resumption — on the next next() call, the function continues from where it stopped
def counter(start=0):
n = start
while True:
print(f"Before yield: n = {n}")
yield n
print(f"After yield: n = {n}")
n += 1
gen = counter(10)
print(next(gen)) # Before yield: n = 10
# 10
print(next(gen)) # After yield: n = 10
# Before yield: n = 11
# 11
Important: code after yield executes only on the next next() call.
Iterator Protocol
Generators automatically implement the iterator protocol. To understand how this works, let's compare a manual iterator implementation with a generator.
Manual Iterator
class Countdown:
def __init__(self, start):
self.current = start
def __iter__(self):
return self
def __next__(self):
if self.current <= 0:
raise StopIteration
# Save current value before decrementing
value = self.current
self.current -= 1 # Decrement for next iteration
return value # Return saved value
# Usage
for num in Countdown(5):
print(num) # 5, 4, 3, 2, 1
Equivalent Generator
def countdown(start):
while start > 0:
yield start
start -= 1
# Usage (identical)
for num in countdown(5):
print(num) # 5, 4, 3, 2, 1
Conclusion: a generator is syntactic sugar for creating iterators, but with less code and automatic state management.
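To make that equivalence tangible, here is a quick check (a small sketch, reusing the same countdown generator defined above) showing that a generator object already satisfies the iterator protocol:
def countdown(start):
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)
print(iter(gen) is gen)          # True: a generator is its own iterator (__iter__ returns self)
print(hasattr(gen, '__next__'))  # True: next() works on it directly
print(next(gen))                 # 3
print(list(gen))                 # [2, 1]: list() keeps calling __next__() until StopIteration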
Generator Advantages
1. Memory Efficiency (Lazy Evaluation)
Classic interview example:
# Bad: creates a list of billion elements in memory
def get_numbers_list(n):
return [i for i in range(n)]
numbers = get_numbers_list(1_000_000_000) # MemoryError!
# Good: generator creates elements on demand
def get_numbers_gen(n):
for i in range(n):
yield i
numbers = get_numbers_gen(1_000_000_000) # Instant!
Memory measurement:
import sys
# List
lst = [i for i in range(1_000_000)]
print(sys.getsizeof(lst)) # ~8 MB
# Generator
gen = (i for i in range(1_000_000))
print(sys.getsizeof(gen)) # ~120 bytes
2. Infinite Sequences
def fibonacci():
a, b = 0, 1
while True:
yield a
a, b = b, a + b
# Can take as many as needed
from itertools import islice
first_10 = list(islice(fibonacci(), 10))
print(first_10) # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
3. Data Processing Pipeline
def read_large_file(path):
with open(path) as f:
for line in f:
yield line.strip()
def filter_comments(lines):
for line in lines:
if not line.startswith('#'):
yield line
def parse_data(lines):
for line in lines:
yield line.split(',')
# Generator composition
pipeline = parse_data(filter_comments(read_large_file('data.csv')))
# Process one line at a time, without loading entire file
for row in pipeline:
process(row)
Advanced Techniques
send() — Two-way Communication
Generators can not only return values but also receive them:
def moving_average():
total = 0
count = 0
average = None
while True:
value = yield average
total += value
count += 1
average = total / count
avg = moving_average()
next(avg) # Prime the generator
print(avg.send(10)) # 10.0
print(avg.send(20)) # 15.0
print(avg.send(30)) # 20.0
How send() works:
- Sends a value into the generator
- That value becomes the result of the yield expression
- The generator continues execution until the next yield
- send() returns the new value from that yield
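A related detail that often comes up: send(None) is equivalent to next(), which is why the very first call must be next() or send(None); there is no suspended yield yet to receive a value. A minimal sketch (running_total is just an illustration):
def running_total():
    total = 0
    while True:
        value = yield total   # the value passed to send() lands here
        total += value

acc = running_total()
print(acc.send(None))  # 0: same as next(acc), runs to the first yield
print(acc.send(5))     # 5
print(acc.send(7))     # 12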
throw() — Exception Handling
The throw() method raises an exception inside the generator at the point where it is suspended at yield.
def resilient_processor():
print("Generator started")
while True:
try:
print("→ Waiting for data...")
data = yield # ← Exception is thrown HERE
print(f"→ Received: {data}")
result = data.upper()
print(f"→ Processed: {result}")
except ValueError as e:
print(f"✗ Error caught: {e}")
# Loop continues — return to while True start
# Step-by-step execution
gen = resilient_processor()
print("\n1. Start generator:")
next(gen)
# Output:
# Generator started
# → Waiting for data...
print("\n2. Send valid data:")
gen.send("hello")
# Output:
# → Received: hello
# → Processed: HELLO
# → Waiting for data...
print("\n3. Throw exception:")
gen.throw(ValueError, "Invalid data!")
# Output:
# ✗ Error caught: Invalid data!
# → Waiting for data...
print("\n4. Continue working:")
gen.send("world")
# Output:
# → Received: world
# → Processed: WORLD
# → Waiting for data...
Key mechanics:
- throw() raises the exception at the point of the last yield
- If the generator catches the exception (try/except), it continues working
- If not caught, the exception propagates outward to the caller
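To illustrate the last point: if nothing inside the generator catches the exception, throw() propagates it back to the caller and the generator is finished. A small sketch (the fragile generator is illustrative):
def fragile():
    while True:
        yield "working"

gen = fragile()
print(next(gen))                   # working
try:
    gen.throw(ValueError("boom"))  # nothing inside catches it
except ValueError as e:
    print(f"Caller got it back: {e}")
print(next(gen, "finished"))       # finished: the generator is already closed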
close() — Generator Termination
The close() method terminates the generator by raising a special GeneratorExit exception inside it at the suspended yield. This is needed for proper resource cleanup.
def resource_handler():
print("→ Opening resource (e.g., file)")
try:
while True:
data = yield
print(f"→ Processing: {data}")
except GeneratorExit:
print("→ Received termination signal (GeneratorExit)")
raise # Re-raise (or simply return) — the generator must not yield again after GeneratorExit
finally:
print("→ Closing resource (cleanup)")
# Usage
gen = resource_handler()
next(gen)
# → Opening resource (e.g., file)
gen.send("data1")
# → Processing: data1
gen.send("data2")
# → Processing: data2
gen.close() # Terminate
# → Received termination signal (GeneratorExit)
# → Closing resource (cleanup)
print("Generator closed")Yield from — Generator Delegation
yield from is especially useful for recursive data structures.
class Node:
def __init__(self, value, children=None):
self.value = value
self.children = children or []
# Create tree:
# 1
# / \
# 2 3
# / \
# 4 5
tree = Node(1, [
Node(2, [
Node(4),
Node(5)
]),
Node(3)
])
Option 1: WITHOUT yield from (verbose)
def traverse_tree_manual(node):
# 1. Return current node value
yield node.value
# 2. Traverse children
for child in node.children:
# Problem: traverse_tree_manual(child) returns GENERATOR
# Need to manually extract all values from it
for value in traverse_tree_manual(child):
yield value # And pass each value outward
# Usage
result = list(traverse_tree_manual(tree))
print(result) # [1, 2, 4, 5, 3]
Option 2: WITH yield from (concise)
def traverse_tree(node):
yield node.value
for child in node.children:
yield from traverse_tree(child) # Delegate everything to subgenerator!
result = list(traverse_tree(tree))
print(result) # [1, 2, 4, 5, 3]
What yield from does:
yield from some_generator()
# Roughly equivalent to:
for item in some_generator():
yield item
For plain iteration the two forms behave the same, but yield from is more than a loop: it also forwards send() and throw() to the subgenerator and captures its return value.
Generator Expressions
Compact form for simple cases:
# Generator expression
squares = (x**2 for x in range(1000000))
# Equivalent generator function
def squares_gen():
for x in range(1000000):
yield x**2
Usage in functions:
# Efficient: doesn't create intermediate list
sum_of_squares = sum(x**2 for x in range(1000))
# Inefficient: creates list, then sums
sum_of_squares = sum([x**2 for x in range(1000)])
Coroutines (pre-Python 3.5)
Before async/await appeared, generators were used for asynchronous programming:
def coroutine(func):
def wrapper(*args, **kwargs):
gen = func(*args, **kwargs)
next(gen) # Prime
return gen
return wrapper
@coroutine
def grep(pattern):
print(f"Searching: {pattern}")
while True:
line = yield
if pattern in line:
print(line)
# Usage
g = grep("python")
g.send("I love python") # Will print
g.send("I love java") # Won't printInterview Questions
Iterator — an object with __iter__() and __next__() methods. Requires explicit state management.
Generator — a special case of iterator created via a function with yield. Automatically manages state.
Iterator example:
class Counter:
def __init__(self, max):
self.max = max
self.current = 0
def __iter__(self):
return self
def __next__(self):
if self.current >= self.max:
raise StopIteration
self.current += 1
return self.current
Equivalent generator:
def counter(max):
current = 0
while current < max:
current += 1
yield current
Question: Can a generator be iterated more than once?
No! Generators are single-use — after exhaustion they don't reset.
gen = (x for x in range(3))
print(list(gen)) # [0, 1, 2]
print(list(gen)) # [] — generator exhausted!
Solution: create a new generator for each iteration.
def make_gen():
return (x for x in range(3))
gen1 = make_gen()
gen2 = make_gen() # Independent generator
print(list(gen1)) # [0, 1, 2]
print(list(gen2)) # [0, 1, 2]
Question: What will this code print, and what do a, b and c contain?
def mystery():
x = yield 1
print(f"x = {x}")
y = yield 2
print(f"y = {y}")
gen = mystery()
a = next(gen)
b = gen.send(10)
c = gen.send(20)
Answer:
- a = 1 (the first yield)
- Prints: x = 10
- b = 2 (the second yield)
- Prints: y = 20
- c raises StopIteration (the generator is exhausted)
Important: the value passed to send() becomes the result of the yield expression.
Question: What does return do inside a generator?
def gen_with_return():
yield 1
yield 2
return "Done"
gen = gen_with_return()
print(next(gen)) # 1
print(next(gen)) # 2
try:
print(next(gen))
except StopIteration as e:
print(e.value) # "Done"Important: return in a generator:
- Stops iteration (raises
StopIteration) - Passes value through
StopIteration.value - Used with
yield fromto return result
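That last point deserves a tiny example: inside another generator, result = yield from sub() captures exactly the value carried by that StopIteration. A minimal sketch (worker/supervisor are illustrative names):
def worker():
    yield 1
    yield 2
    return "Done"                 # becomes StopIteration("Done")

def supervisor():
    result = yield from worker()  # re-yields 1 and 2, then captures "Done"
    yield f"worker said: {result}"

print(list(supervisor()))         # [1, 2, 'worker said: Done']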
| Characteristic | return | yield |
|---|---|---|
| What it returns | Value | Generator |
| How many times | Once | Many times |
| Function state | Destroyed | Preserved |
| Memory | All results at once | One element at a time |
Question: What happens if you call send() on a generator that hasn't been started yet?
Error! A generator must be "primed" with next() before the first send().
def gen():
x = yield
print(f"Received: {x}")
g = gen()
g.send(10) # ❌ TypeError: can't send non-None value to a just-started generator
Correct:
g = gen()
next(g) # ✅ Prime generator (reach first yield)
g.send(10) # ✅ Now can send data
Pattern: decorator for automatic priming
def coroutine(func):
def wrapper(*args, **kwargs):
gen = func(*args, **kwargs)
next(gen) # Automatic priming
return gen
return wrapper
@coroutine
def gen():
x = yield
print(f"Received: {x}")
g = gen() # Already primed!
g.send(10) # ✅ Works immediately
Performance and Patterns
Pattern: Chunked Processing
Problem: sometimes it's more efficient to process data in batches rather than one element at a time, for example when inserting into a database or sending over the network.
Solution: a generator that groups elements into fixed-size chunks. (On Python 3.12+, itertools.batched in the standard library does the same job.)
from itertools import islice
def chunked(iterable, size):
"""Splits iterator into fixed-size chunks"""
iterator = iter(iterable)
while True:
# Take next size elements
chunk = list(islice(iterator, size))
if not chunk: # Iterator exhausted
break
yield chunk
# Usage example
numbers = range(10)
for chunk in chunked(numbers, 3):
print(chunk)
# Output:
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9] ← last chunk may be smaller
Practical applications:
1. Batch database insert
def read_large_csv(path):
"""Reads CSV line by line"""
with open(path) as f:
for line in f:
yield parse_csv_line(line)
# ❌ Bad: one record at a time (slow!)
for record in read_large_csv('data.csv'):
db.insert(record) # 1,000,000 DB queries
# ✅ Good: batches of 1000 (fast!)
for chunk in chunked(read_large_csv('data.csv'), 1000):
db.bulk_insert(chunk) # 1000 queries instead of 1,000,000
Pattern: Tee — Multiple Iterators
Problem: a generator can be iterated only once. What if you need to process the data in two different ways?
Solution: itertools.tee() creates multiple independent iterators from one source.
from itertools import tee
# ❌ Bad: generator exhausts after first pass
def process_data_bad(items):
stats = calculate_stats(items) # First pass
results = transform(items) # Empty! Generator exhausted
return stats, results
# ✅ Good: tee creates independent iterators
def process_data(items):
# Create 2 independent iterator copies
items1, items2 = tee(items, 2)
# Two independent passes
stats = calculate_stats(items1) # First iterator
results = transform(items2) # Second iterator
return stats, results
# Usage
data = (x**2 for x in range(1000)) # Generator
stats, results = process_data(data)
Important: tee caches data in memory, so:
- Don't use it for huge data streams
- Consume the iterators at roughly the same speed
- If only one pass is needed, don't use tee
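The second caveat is easiest to respect by consuming the copies in lockstep, for example with zip(); if one copy runs far ahead, tee has to buffer the whole gap in memory. A rough sketch of the safe pattern (readings() is an illustrative source):
from itertools import tee

def readings():
    yield from [3, 8, 2, 11, 5]

raw, doubled_src = tee(readings(), 2)
doubled = (x * 2 for x in doubled_src)

# Lockstep consumption keeps tee's internal buffer tiny
for original, double in zip(raw, doubled):
    print(original, double)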
Pattern: Pipeline with Transformation
Idea: connect generators in a chain (pipeline) for sequential data processing. Each element goes through all stages, but without creating intermediate lists.
Advantages:
- Lazy evaluation — process one element at a time
- Memory efficient — no intermediate collections
- Composable — easy to add new processing stages
def map_gen(func, iterable):
"""Applies function to each element"""
for item in iterable:
yield func(item)
def filter_gen(predicate, iterable):
"""Keeps only elements matching condition"""
for item in iterable:
if predicate(item):
yield item
# ✅ Good: generator composition (pipeline)
numbers = range(100)
pipeline = map_gen(
lambda x: x * 2, # Stage 2: double
filter_gen(
lambda x: x % 2 == 0, # Stage 1: filter even
numbers # Data source
)
)
result = list(pipeline) # Computed lazily, one element at a time
Pitfalls
1. Generator Executes Lazily (Deferred Execution)
Problem: a generator's body does NOT execute when the generator is created — only when it is iterated. This can lead to unexpected behavior with side effects.
def side_effect_gen():
print("Start") # Side effect 1
yield 1
print("Middle") # Side effect 2
yield 2
print("End") # Side effect 3
# ❌ Common mistake: expecting code to execute
gen = side_effect_gen() # Nothing printed!
print("Generator created")
# Output:
# Generator created ← only this!
Code executes only when calling next():
gen = side_effect_gen()
print("1. Created generator")
# Nothing printed
print("2. First next()")
next(gen)
# Start ← code before first yield executed
# 1
print("3. Second next()")
next(gen)
# Middle ← code between yields executed
# 2
print("4. Third next()")
try:
next(gen)
except StopIteration:
pass
# End ← code after last yield executed
2. Closures and Late Binding
Problem: when creating generators or lambdas in a loop, they may "capture" the wrong variable value.
Example 1: Generator expressions in a loop (same late-binding trap ❌)
# Create 3 generators, each should multiply by its own number
gens = [(x * 2 for _ in range(2)) for x in range(3)]
# Check result
for gen in gens:
    print(list(gen))
# Output:
# [4, 4]
# [4, 4]
# [4, 4] ← all three see the final x=2: only the outermost iterable (range(2)) is evaluated eagerly; the body runs later, after the loop has finished
Example 2: Lambdas in a loop (the same problem ❌)
# Create 3 functions, each should return its number
funcs = [lambda: i for i in range(3)]
# Check result
print([f() for f in funcs])
# [2, 2, 2] ← ALL return 2! Why?Why doesn't work? Lambdas capture reference to variable i, not its value. When loop finishes, i = 2, and all lambdas see this last value.
Solution: Default argument value (classic trick)
# ✅ Correct: capture value via default argument
funcs = [lambda x=i: x for i in range(3)]
print([f() for f in funcs])
# [0, 1, 2] ← correct!
3. StopIteration Inside Generator (PEP 479)
Problem: starting with Python 3.7, if StopIteration is raised inside a generator, it automatically becomes RuntimeError.
Why the change? Previously, a StopIteration raised inside a generator silently terminated it, which hid bugs.
Imagine you're writing a function that should return 5 elements, but due to a bug StopIteration terminates the generator early.
def get_five_numbers():
"""Should return 5 numbers, but has a bug"""
numbers = iter([1, 2, 3]) # List of only 3 elements!
yield next(numbers) # 1
yield next(numbers) # 2
yield next(numbers) # 3
yield next(numbers) # ❌ StopIteration — list ended!
yield next(numbers) # Won't execute
# Python 3.6 and earlier:
result = list(get_five_numbers())
print(result) # [1, 2, 3] ← Silently returned only 3! Bug hidden
# Python 3.7+:
result = list(get_five_numbers()) # RuntimeError: generator raised StopIteration
# ✅ Error is explicit, bug immediately visible!
Solution: Handle StopIteration explicitly
def correct_gen():
iterator = iter([1, 2])
while True:
try:
value = next(iterator)
yield value
except StopIteration:
break # ✅ Explicit generator termination
gen = correct_gen()
print(list(gen)) # [1, 2] — works correctly
Summary: When to Use Generators
✅ Use generators when:
- Processing large volumes of data
- Need lazy evaluation
- Working with data streams (files, APIs, databases)
- Creating infinite sequences
- Need processing pipeline
❌ Don't use generators when:
- Need random access to elements (use lists)
- Multiple iterations are required (cache to a list first; see the sketch after this list)
- Need sequence length upfront
- Logic too simple (list comprehension more readable)
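For the "multiple iterations" point, the usual fix is to materialize the generator once and reuse the list, as in this small sketch:
squares = (x * x for x in range(5))  # a generator allows only one pass
cached = list(squares)               # materialize once
print(sum(cached))  # 30
print(max(cached))  # 16: the second pass works because cached is a list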
Practical Exercise
Try implementing a generator for traversing nested structure of any depth:
def flatten(nested):
"""
Recursively flattens nested iterable structures
>>> list(flatten([1, [2, 3], [[4], 5]]))
[1, 2, 3, 4, 5]
"""
# Your implementation here
pass
Solution
def flatten(nested):
for item in nested:
if isinstance(item, (list, tuple)):
yield from flatten(item)
else:
yield item
Generator Method Reference
| Method | What it does | What it throws | Returns |
|---|---|---|---|
| next(gen) | Starts/continues execution | — | Value from yield |
| gen.send(value) | Sends value to generator | value → result of yield | Next value from yield |
| gen.throw(exc) | Throws exception into generator | Exception at yield point | Next value (if exception handled) |
| gen.close() | Terminates generator | GeneratorExit at yield point | None |
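As a quick self-check against this table, here is one generator driven through all four methods in order (a compact sketch, not taken from the examples above):
def demo():
    received = None
    try:
        while True:
            try:
                received = yield received
            except ValueError:
                received = "recovered"
    finally:
        print("cleanup")

gen = demo()
print(next(gen))              # None: runs to the first yield
print(gen.send("hello"))      # hello: send() resumes and returns the next yielded value
print(gen.throw(ValueError))  # recovered: the exception was handled inside
gen.close()                   # prints "cleanup": GeneratorExit triggers the finally block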
Understanding generators is a must-have for any Python developer. This isn't just a syntactic feature, but a fundamental pattern for efficient data handling. In interviews, generator questions often separate juniors from mids, and mids from seniors.
Practice, experiment with send(), throw(), yield from — and this topic will become your advantage in interviews.