Python Generators and yield: Deep Dive

Константин Потапов
15 min

Complete guide to Python generators: from basic concepts to advanced patterns. Understanding yield mechanics, iterator protocol, coroutines, and practical interview cases.

Generators are one of the most powerful and underestimated features in Python. In interviews, people often ask about yield, but few can explain what happens under the hood. Let's dive deep.

What is a Generator?

A generator is a function that returns an iterator. Sounds simple, but behind this lies powerful machinery for lazy evaluation and state management.

def simple_generator():
    yield 1
    yield 2
    yield 3
 
gen = simple_generator()
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3
print(next(gen))  # raises StopIteration

Key difference from regular functions: a generator doesn't execute immediately. It returns a generator object that can be iterated.
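
You can see this directly (reusing simple_generator from above): calling the function builds a generator object without executing any of its body.

gen = simple_generator()
print(type(gen))  # <class 'generator'>
# Nothing has run yet; execution starts only on the first next(gen)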

How yield Works

When Python encounters yield, magic happens:

  1. Execution pause — the function "freezes" in its current state
  2. Value return — the value from yield is returned to the calling code
  3. Context preservation — all local variables and code position are saved
  4. Resumption — on the next next() call, the function continues from where it stopped

def counter(start=0):
    n = start
    while True:
        print(f"Before yield: n = {n}")
        yield n
        print(f"After yield: n = {n}")
        n += 1
 
gen = counter(10)
print(next(gen))  # Before yield: n = 10
                  # 10
print(next(gen))  # After yield: n = 10
                  # Before yield: n = 11
                  # 11

Important: code after yield executes only on the next next() call.

Iterator Protocol

Generators automatically implement the iterator protocol. To understand how this works, let's compare a manual iterator implementation with a generator.

Manual Iterator

class Countdown:
    def __init__(self, start):
        self.current = start
 
    def __iter__(self):
        return self
 
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
 
        # Save current value before decrementing
        value = self.current
        self.current -= 1  # Decrement for next iteration
        return value       # Return saved value
 
# Usage
for num in Countdown(5):
    print(num)  # 5, 4, 3, 2, 1

Equivalent Generator

def countdown(start):
    while start > 0:
        yield start
        start -= 1
 
# Usage (identical)
for num in countdown(5):
    print(num)  # 5, 4, 3, 2, 1

Conclusion: a generator is syntactic sugar for creating iterators, but with less code and automatic state management.
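
You can verify the protocol on the generator object itself (reusing countdown from above):

gen = countdown(3)
print(iter(gen) is gen)          # True: __iter__ returns the generator itself
print(hasattr(gen, '__next__'))  # True: that's why next(gen) works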

Generator Advantages

1. Memory Efficiency (Lazy Evaluation)

Classic interview example:

# Bad: creates a list of billion elements in memory
def get_numbers_list(n):
    return [i for i in range(n)]
 
numbers = get_numbers_list(1_000_000_000)  # MemoryError!
 
# Good: generator creates elements on demand
def get_numbers_gen(n):
    for i in range(n):
        yield i
 
numbers = get_numbers_gen(1_000_000_000)  # Instant!

Memory measurement:

import sys
 
# List
lst = [i for i in range(1_000_000)]
print(sys.getsizeof(lst))  # ~8 MB
 
# Generator
gen = (i for i in range(1_000_000))
print(sys.getsizeof(gen))  # ~120 bytes

2. Infinite Sequences

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
 
# Can take as many as needed
from itertools import islice
first_10 = list(islice(fibonacci(), 10))
print(first_10)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

3. Data Processing Pipeline

def read_large_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
 
def filter_comments(lines):
    for line in lines:
        if not line.startswith('#'):
            yield line
 
def parse_data(lines):
    for line in lines:
        yield line.split(',')
 
# Generator composition
pipeline = parse_data(filter_comments(read_large_file('data.csv')))
 
# Process one line at a time, without loading entire file
for row in pipeline:
    process(row)

Advanced Techniques

send() — Two-Way Communication

Generators can not only return values but also receive them:

def moving_average():
    total = 0
    count = 0
    average = None
 
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count
 
avg = moving_average()
next(avg)  # Prime the generator
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0

How send() works:

  1. Sends value to the generator
  2. Value becomes the result of the yield expression
  3. Generator continues execution until next yield
  4. Returns new value from yield

throw() — Exception Handling

The throw() method raises an exception inside the generator, at the yield where it is currently paused.

def resilient_processor():
    print("Generator started")
    while True:
        try:
            print("→ Waiting for data...")
            data = yield  # ← Exception is thrown HERE
            print(f"→ Received: {data}")
            result = data.upper()
            print(f"→ Processed: {result}")
        except ValueError as e:
            print(f"✗ Error caught: {e}")
            # Loop continues — return to while True start
 
# Step-by-step execution
gen = resilient_processor()
 
print("\n1. Start generator:")
next(gen)
# Output:
# Generator started
# → Waiting for data...
 
print("\n2. Send valid data:")
gen.send("hello")
# Output:
# → Received: hello
# → Processed: HELLO
# → Waiting for data...
 
print("\n3. Throw exception:")
gen.throw(ValueError, "Invalid data!")
# Output:
# ✗ Error caught: Invalid data!
# → Waiting for data...
 
print("\n4. Continue working:")
gen.send("world")
# Output:
# → Received: world
# → Processed: WORLD
# → Waiting for data...

Key mechanics:

  1. throw() throws exception at the point of last yield
  2. If generator catches exception (try/except) — it continues working
  3. If not caught — exception propagates outward
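
A minimal sketch of the uncaught case: the exception comes straight back out of throw(), and the generator is finished afterwards.

def simple():
    yield 1
    yield 2

gen = simple()
next(gen)  # run to the first yield
try:
    gen.throw(KeyError("boom"))  # nothing inside catches it
except KeyError as e:
    print(f"Propagated out: {e}")

print(next(gen, "exhausted"))  # "exhausted": the generator is closed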

close() — Generator Termination

The close() method terminates the generator by raising a GeneratorExit exception inside it at the paused yield. This is the hook for proper resource cleanup.

def resource_handler():
    print("→ Opening resource (e.g., file)")
    try:
        while True:
            data = yield
            print(f"→ Processing: {data}")
    except GeneratorExit:
        print("→ Received termination signal (GeneratorExit)")
        raise  # Important: never yield after GeneratorExit; re-raise or simply return
    finally:
        print("→ Closing resource (cleanup)")
 
# Usage
gen = resource_handler()
next(gen)
# → Opening resource (e.g., file)
 
gen.send("data1")
# → Processing: data1
 
gen.send("data2")
# → Processing: data2
 
gen.close()  # Terminate
# → Received termination signal (GeneratorExit)
# → Closing resource (cleanup)
 
print("Generator closed")

yield from — Generator Delegation

yield from delegates iteration to a subgenerator. It is especially useful for recursive data structures.

class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []
 
# Create tree:
#       1
#      / \
#     2   3
#    / \
#   4   5
 
tree = Node(1, [
    Node(2, [
        Node(4),
        Node(5)
    ]),
    Node(3)
])

Option 1: WITHOUT yield from (verbose)

def traverse_tree_manual(node):
    # 1. Return current node value
    yield node.value
 
    # 2. Traverse children
    for child in node.children:
        # Problem: traverse_tree_manual(child) returns a generator,
        # so we have to extract every value from it manually
        for value in traverse_tree_manual(child):
            yield value  # And pass each value outward
 
# Usage
result = list(traverse_tree_manual(tree))
print(result)  # [1, 2, 4, 5, 3]

Option 2: WITH yield from (concise)

def traverse_tree(node):
    yield node.value
    for child in node.children:
        yield from traverse_tree(child)  # Delegate everything to subgenerator!
 
result = list(traverse_tree(tree))
print(result)  # [1, 2, 4, 5, 3]

What yield from does:

yield from some_generator()
 
# For plain iteration, equivalent to:
for item in some_generator():
    yield item
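
Worth knowing for interviews: the loop form only covers plain iteration. yield from additionally forwards send() and throw() to the subgenerator and evaluates to the subgenerator's return value. A minimal sketch of the return-value part:

def sub():
    yield 1
    yield 2
    return "sub done"  # travels out via StopIteration.value

def outer():
    result = yield from sub()  # receives "sub done"
    yield f"got: {result}"

print(list(outer()))  # [1, 2, 'got: sub done']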

Generator Expressions

Compact form for simple cases:

# Generator expression
squares = (x**2 for x in range(1000000))
 
# Equivalent generator function
def squares_gen():
    for x in range(1000000):
        yield x**2

Usage in functions:

# Efficient: doesn't create intermediate list
sum_of_squares = sum(x**2 for x in range(1000))
 
# Inefficient: creates list, then sums
sum_of_squares = sum([x**2 for x in range(1000)])
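
The benefit compounds with short-circuiting consumers like any() and all(): fed a generator expression, they stop pulling elements as soon as the answer is known. A small illustration:

# any() stops at the first match; with a generator expression,
# the remaining squares are never computed
print(any(x**2 > 100 for x in range(1_000_000)))  # True, stops at x = 11

# The list-comprehension version would compute all 1,000,000 squares first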

Coroutines (pre-Python 3.5)

Before async/await appeared, generators were used for asynchronous programming:

def coroutine(func):
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # Prime
        return gen
    return wrapper
 
@coroutine
def grep(pattern):
    print(f"Searching: {pattern}")
    while True:
        line = yield
        if pattern in line:
            print(line)
 
# Usage
g = grep("python")
g.send("I love python")  # Will print
g.send("I love java")    # Won't print

Interview Questions

What's the difference between an iterator and a generator?

Iterator — an object with __iter__() and __next__() methods. Requires explicit state management.

Generator — a special case of iterator created via a function with yield. Automatically manages state.

Iterator example:

class Counter:
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current

Equivalent generator:

def counter(limit):
    current = 0
    while current < limit:
        current += 1
        yield current

Can a generator be iterated more than once?

No! Generators are single-use: after exhaustion they don't reset.

gen = (x for x in range(3))
print(list(gen))  # [0, 1, 2]
print(list(gen))  # [] — generator exhausted!

Solution: create a new generator for each iteration.

def make_gen():
    return (x for x in range(3))

gen1 = make_gen()
gen2 = make_gen() # Independent generator

print(list(gen1)) # [0, 1, 2]
print(list(gen2)) # [0, 1, 2]

What will this code output?

def mystery():
    x = yield 1
    print(f"x = {x}")
    y = yield 2
    print(f"y = {y}")

gen = mystery()
a = next(gen)
b = gen.send(10)
c = gen.send(20)

Answer:

  • a = 1 (first yield)
  • Prints: x = 10
  • b = 2 (second yield)
  • Prints: y = 20
  • c raises StopIteration (generator exhausted)

Important: value from send() becomes the result of yield expression.

What does return do inside a generator?

def gen_with_return():
    yield 1
    yield 2
    return "Done"

gen = gen_with_return()
print(next(gen))  # 1
print(next(gen))  # 2
try:
    print(next(gen))
except StopIteration as e:
    print(e.value)  # "Done"

Important: return in a generator:

  • Stops iteration (raises StopIteration)
  • Passes value through StopIteration.value
  • Used with yield from to return result

How does return differ from yield?

Characteristic     return                yield
What it returns    A value               A generator
How many times     Once                  Many times
Function state     Destroyed             Preserved
Memory             All results at once   One element at a time

What happens if you call send() on a just-started generator?

Error! A generator must be "primed" with next() before the first send().

def gen():
    x = yield
    print(f"Received: {x}")

g = gen()
g.send(10) # ❌ TypeError: can't send non-None value to a just-started generator

Correct:

g = gen()
next(g) # ✅ Prime generator (reach first yield)
g.send(10) # ✅ Now can send data

Pattern: decorator for automatic priming

def coroutine(func):
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # Automatic priming
        return gen
    return wrapper

@coroutine
def gen():
    x = yield
    print(f"Received: {x}")

g = gen()   # Already primed!
g.send(10)  # ✅ Works immediately

Performance and Patterns

Pattern: Chunked Processing

Problem: sometimes it's more efficient to process data in batches rather than one element at a time, for example when inserting into a database or sending data over the network.

Solution: a generator that groups elements into fixed-size chunks.

from itertools import islice
 
def chunked(iterable, size):
    """Splits iterator into fixed-size chunks"""
    iterator = iter(iterable)
    while True:
        # Take next size elements
        chunk = list(islice(iterator, size))
        if not chunk:  # Iterator exhausted
            break
        yield chunk
 
# Usage example
numbers = range(10)
for chunk in chunked(numbers, 3):
    print(chunk)
 
# Output:
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]  ← last chunk may be smaller

Practical applications:

1. Batch database insert

def read_large_csv(path):
    """Reads CSV line by line"""
    with open(path) as f:
        for line in f:
            yield parse_csv_line(line)
 
# ❌ Bad: one record at a time (slow!)
for record in read_large_csv('data.csv'):
    db.insert(record)  # 1,000,000 DB queries
 
# ✅ Good: batches of 1000 (fast!)
for chunk in chunked(read_large_csv('data.csv'), 1000):
    db.bulk_insert(chunk)  # 1000 queries instead of 1,000,000

Pattern: Tee — Multiple Iterators

Problem: a generator can be iterated only once. What if you need to process the same data in two different ways?

Solution: itertools.tee() creates multiple independent iterators from one source.

from itertools import tee
 
# ❌ Bad: generator exhausts after first pass
def process_data_bad(items):
    stats = calculate_stats(items)  # First pass
    results = transform(items)       # Empty! Generator exhausted
    return stats, results
 
# ✅ Good: tee creates independent iterators
def process_data(items):
    # Create 2 independent iterator copies
    items1, items2 = tee(items, 2)
 
    # Two independent passes
    stats = calculate_stats(items1)   # First iterator
    results = transform(items2)        # Second iterator
 
    return stats, results
 
# Usage
data = (x**2 for x in range(1000))  # Generator
stats, results = process_data(data)

Important: tee caches data in memory, so:

  • Don't use for huge data streams
  • Consume iterators at roughly the same speed
  • If only one pass needed — don't use tee
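
A small sketch of that buffering: tee keeps every item one copy has consumed until the other copy catches up.

from itertools import tee

a, b = tee(range(5), 2)
print(next(a), next(a), next(a))  # 0 1 2  (b hasn't moved yet)
print(list(b))  # [0, 1, 2, 3, 4]; the buffered 0, 1, 2 are replayed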

Pattern: Pipeline with Transformation

Idea: connect generators in a chain (pipeline) for sequential data processing. Each element goes through all stages, but without creating intermediate lists.

Advantages:

  • Lazy evaluation — process one element at a time
  • Memory efficient — no intermediate collections
  • Composable — easy to add new processing stages

def map_gen(func, iterable):
    """Applies function to each element"""
    for item in iterable:
        yield func(item)
 
def filter_gen(predicate, iterable):
    """Keeps only elements matching condition"""
    for item in iterable:
        if predicate(item):
            yield item
 
# ✅ Good: generator composition (pipeline)
numbers = range(100)
pipeline = map_gen(
    lambda x: x * 2,           # Stage 2: double
    filter_gen(
        lambda x: x % 2 == 0,  # Stage 1: filter even
        numbers                 # Data source
    )
)
 
result = list(pipeline)  # Computed lazily, one element at a time
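
Note that the built-in map() and filter() are already lazy in Python 3, so the same pipeline can be expressed with them directly:

pipeline = map(lambda x: x * 2, filter(lambda x: x % 2 == 0, range(10)))
print(list(pipeline))  # [0, 4, 8, 12, 16]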

Pitfalls

1. Generator Executes Lazily (Deferred Execution)

Problem: a generator does NOT execute when created, only when iterated. This can lead to unexpected behavior with side effects.

def side_effect_gen():
    print("Start")  # Side effect 1
    yield 1
    print("Middle")  # Side effect 2
    yield 2
    print("End")  # Side effect 3
 
# ❌ Common mistake: expecting code to execute
gen = side_effect_gen()  # Nothing printed!
print("Generator created")
 
# Output:
# Generator created  ← only this!

Code executes only when calling next():

gen = side_effect_gen()
 
print("1. Created generator")
# Nothing printed
 
print("2. First next()")
next(gen)
# Start  ← code before first yield executed
# 1
 
print("3. Second next()")
next(gen)
# Middle  ← code between yields executed
# 2
 
print("4. Third next()")
try:
    next(gen)
except StopIteration:
    pass
# End  ← code after last yield executed

2. Closures and Late Binding

Problem: when creating generators or lambdas in a loop, they may "capture" the wrong variable value.

Example 1: Generator expressions in a loop (same trap! ❌)

# Create 3 generators, each intended to multiply its own x by 2
gens = [(x * 2 for _ in range(2)) for x in range(3)]

# Check the result
for gen in gens:
    print(list(gen))

# Output (surprise!):
# [4, 4]  ← every generator sees x = 2
# [4, 4]
# [4, 4]

Example 2: Lambdas in a loop (same problem ❌)

# Create 3 functions, each should return its number
funcs = [lambda: i for i in range(3)]
 
# Check result
print([f() for f in funcs])
# [2, 2, 2]  ← ALL return 2! Why?

Why? Both generator expressions and lambdas capture a reference to the loop variable, not its value. Their bodies run only later (when iterated or called), and by then the loop has finished and the variable holds its last value, 2.

Solution: Default argument value (classic trick)

# ✅ Correct: capture value via default argument
funcs = [lambda x=i: x for i in range(3)]
 
print([f() for f in funcs])
# [0, 1, 2]  ← correct!
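
For generator expressions the analogous trick exploits the fact that the outermost iterable of a genexp is evaluated eagerly, at creation time:

# ✅ Correct: (x, x) is evaluated immediately, freezing x's current value
gens = [(v * 2 for v in (x, x)) for x in range(3)]

print([list(g) for g in gens])
# [[0, 0], [2, 2], [4, 4]]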

3. StopIteration Inside Generator (PEP 479)

Problem: starting with Python 3.7, a StopIteration raised inside a generator is automatically converted into a RuntimeError.

Why the change? Previously, a StopIteration escaping inside a generator silently terminated it, which led to hidden bugs.

Imagine you're writing a function that should yield 5 elements, but because of a bug a StopIteration ends the generator early.

def get_five_numbers():
    """Should return 5 numbers, but has a bug"""
    numbers = iter([1, 2, 3])  # List of only 3 elements!
 
    yield next(numbers)  # 1
    yield next(numbers)  # 2
    yield next(numbers)  # 3
    yield next(numbers)  # ❌ StopIteration — list ended!
    yield next(numbers)  # Won't execute
 
# Python 3.6 and earlier:
result = list(get_five_numbers())
print(result)  # [1, 2, 3] ← Silently returned only 3! Bug hidden
 
# Python 3.7+:
result = list(get_five_numbers())  # RuntimeError: generator raised StopIteration
# ✅ Error is explicit, bug immediately visible!

Solution: Handle StopIteration explicitly

def correct_gen():
    iterator = iter([1, 2])
    while True:
        try:
            value = next(iterator)
            yield value
        except StopIteration:
            break  # ✅ Explicit generator termination
 
gen = correct_gen()
print(list(gen))  # [1, 2] — works correctly
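
An even shorter fix, since yield from terminates cleanly when the sub-iterator is exhausted:

def correct_gen_short():
    yield from iter([1, 2])  # no StopIteration leaks out

print(list(correct_gen_short()))  # [1, 2]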

Summary: When to Use Generators

Use generators when:

  • Processing large volumes of data
  • Need lazy evaluation
  • Working with data streams (files, APIs, databases)
  • Creating infinite sequences
  • Need processing pipeline

Don't use generators when:

  • Need random access to elements (use lists)
  • Multiple iterations required (cache to list)
  • Need sequence length upfront
  • Logic too simple (list comprehension more readable)

Practical Exercise

Try implementing a generator that traverses a nested structure of any depth:

def flatten(nested):
    """
    Recursively flattens nested iterable structures
 
    >>> list(flatten([1, [2, 3], [[4], 5]]))
    [1, 2, 3, 4, 5]
    """
    # Your implementation here
    pass
Solution:

def flatten(nested):
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

Generator Method Reference

Method           What it does                    Effect at the paused yield        Returns
next(gen)        Starts/continues execution      —                                 Value from yield
gen.send(value)  Sends a value to the generator  value becomes the yield result    Next value from yield
gen.throw(exc)   Raises exception in generator   exc is raised at yield            Next value (if exception handled)
gen.close()      Terminates the generator        GeneratorExit is raised at yield  None

Understanding generators is a must-have for any Python developer. This isn't just a syntactic feature, but a fundamental pattern for efficient data handling. In interviews, generator questions often separate juniors from mids, and mids from seniors.

Practice, experiment with send(), throw(), yield from — and this topic will become your advantage in interviews.