Monday, 10:37 AM. Andrew (mid-level) opens a Pull Request. Payment module refactoring, 847 lines of code, 3 days of work.
11:42 AM. First review from David (senior): "LGTM 👍"
Review time: 30 seconds. Yes, thirty seconds for 847 lines.
12:15 PM. Second review from me (tech lead): "Approved ✅"
Review time: 2 minutes. I scrolled through the files, checked that tests were green, and clicked the button.
2:00 PM. Andrew merges the PR to master.
2:47 PM. Production is down. Payments aren't working. Customers can't pay for orders.
3:30 PM. We find the bug. During the refactoring, Andrew accidentally changed the error-handling logic for responses from the payment provider. Previously, the system retried requests 3 times. Now it cancels the transaction immediately.
Result: $127,000 in lost revenue over 4 hours (40% of transactions failed instead of succeeding), 6 hours of team work on a hotfix, 230 support tickets from angry customers.
Question: Why did two experienced developers miss a critical bug in review?
Answer: Because our Code Review was theater. A formality. A checkbox.
This story happened 4 years ago. Since then, I've completely rethought my approach to Code Review. Here's what I learned.
Why Code Review Turns Into Theater
Symptoms of a Broken Process
Check yourself. If at least 3 points describe your team — you have a problem:
❌ "LGTM" syndrome:
- Review takes less than a minute for 200+ lines of code
- Comments are superficial or non-existent
- Approval without actually reading the code
❌ Nitpicking war:
- 90% of comments are about formatting and naming
- Half-hour debates about camelCase vs snake_case
- Architectural problems are ignored
❌ Review blocker:
- PR sits without review for 3+ days
- Developer is blocked, switches to another task
- Context switches kill productivity
❌ Ego battle:
- Comments sound like attacks ("this is terrible", "why did you do this?")
- PR author perceives review as personal criticism
- Defensive reactions instead of constructive dialogue
❌ No metrics:
- You don't know how long average review takes
- You don't know how many bugs are caught in review
- You don't know how many PRs sit for more than 24 hours
SmartBear Statistics (2023): 60% of teams conduct Code Review formally. Main reasons: no time (43%), unclear criteria (38%), lack of feedback culture (29%). Result: 70% of serious bugs pass through review unnoticed.
Root Causes of the Problem
1. CR is perceived as a barrier, not a tool
Developer thinks: "I need to pass review to merge code."
Should think: "I need to get feedback to improve the solution."
2. No clear criteria
What to check? How deep? When to approve, when to request changes?
Without answers to these questions, everyone reviews differently.
3. Wrong incentives
Team metrics: velocity, closed tasks, time to production.
CR metrics: none.
Result: Review is postponed, done superficially, perceived as wasted time.
4. Lack of psychological safety
Developer fears criticism → makes minimal PRs → doesn't experiment → doesn't grow.
Reviewer fears offending → writes "LGTM" → bugs reach production.
The Checklist That Works
I've tried dozens of checklists. Only this one works — tested on 200+ projects.
Review Structure: What to Check and In What Order
Level 1: Automation (30 seconds)
This should be automated, not checked by humans.
- Tests pass (CI is green)
- Linter has no errors
- Code style formatting
- Security scanners (SAST, dependency check)
- Coverage didn't drop (or there's an explanation)
Rule: If a reviewer spends time checking formatting or running tests manually — the process is broken. This is CI's job, not a human's.
Tools:
# .github/workflows/pull-request.yml
name: PR Checks
on: pull_request
jobs:
checks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run linter
run: npm run lint
- name: Run tests
run: npm test
- name: Check coverage
run: npm run test:coverage
# Fail if coverage < 80%
- name: Security scan
uses: snyk/actions/node@master
- name: Check formatting
run: npm run format:check
Result: If CI is red — review doesn't even start.
Level 2: Functionality (5-10 minutes)
Check that code does what it should.
- Requirements are met (check task/PR description)
- Edge cases are handled (what if null, empty array, 0, a very large number?)
- Errors are handled (no potential exceptions/crashes)
- Logic is clear and correct
Example comment:
// ❌ Bad
function calculateDiscount(user, total) {
if (user.isVIP) {
return total * 0.2;
}
return total * 0.1;
}
// ✅ Review comment:
// What happens if total === 0 or user === null?
// Need input validation:
function calculateDiscount(user, total) {
if (!user || total <= 0) {
throw new Error("Invalid input");
}
const discountRate = user.isVIP ? 0.2 : 0.1;
return total * discountRate;
}
Level 3: Architecture and Design (10-15 minutes)
Check that solution fits the system.
- Consistency with existing architecture
- No code duplication (DRY principle)
- Separation of concerns (Single Responsibility)
- Abstractions at the right level (not over-engineered, but not copy-paste)
- Dependency direction is correct (no circular dependencies)
Example comment:
# ❌ Bad (validation logic duplication)
class UserService:
def create_user(self, data):
if not data.get('email'):
raise ValueError('Email required')
if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', data['email']):
raise ValueError('Invalid email')
# ...
def update_user(self, user_id, data):
if not data.get('email'):
raise ValueError('Email required')
if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', data['email']):
raise ValueError('Invalid email')
# ...
# ✅ Review comment:
# Email validation logic is duplicated.
# Suggest extracting to separate validator:
class EmailValidator:
@staticmethod
def validate(email):
if not email:
raise ValueError('Email required')
if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', email):
raise ValueError('Invalid email')
return email.lower()
class UserService:
def create_user(self, data):
email = EmailValidator.validate(data.get('email'))
# ...
Level 4: Performance and Scalability (5-10 minutes)
Check that code won't become a bottleneck.
- No N+1 queries (especially in DB)
- Complex operations not in loops
- Indexes on needed DB fields (if there are migrations)
- No memory leaks (especially for long-running processes)
- Pagination for large lists
Example comment:
# ❌ Bad (N+1 query problem)
def get_users_with_orders():
users = User.objects.all()
result = []
for user in users:
# Each iteration = new DB query
orders = Order.objects.filter(user_id=user.id).count()
result.append({
'user': user,
'orders_count': orders
})
return result
# ✅ Review comment:
# Classic N+1 problem. For 1000 users = 1001 DB queries.
# Use annotate:
def get_users_with_orders():
return User.objects.annotate(
orders_count=Count('orders')
).all()
# Now: 1 query instead of 1001
Level 5: Security (5 minutes)
Check for vulnerabilities.
- No SQL injection (use parameterized queries)
- No XSS (escape user input)
- No hardcoded secrets (API keys, passwords)
- Authorization is checked (not just authentication)
- Rate limiting (if it's an API endpoint)
- Input validation (don't trust user input)
Example comment:
# ❌ Bad (SQL injection)
def get_user_by_email(email):
query = f"SELECT * FROM users WHERE email = '{email}'"
return db.execute(query)
# Attack: email = "' OR '1'='1"
# Query becomes: SELECT * FROM users WHERE email = '' OR '1'='1'
# Result: returns ALL users
# ✅ Review comment:
# SQL injection vulnerability. Use parameterized queries:
def get_user_by_email(email):
query = "SELECT * FROM users WHERE email = ?"
return db.execute(query, [email])
Level 6: Readability and Maintainability (5 minutes)
Check that code will be understood in 6 months.
- Naming is clear (functions, variables, classes)
- Comments where logic is not obvious (but not where code is self-explanatory)
- Functions are short (< 50 lines as a rule)
- Cyclomatic complexity is low (< 10 per function)
- No magic numbers (constants are extracted)
Example comment:
// ❌ Bad
function p(u, a) {
if (u.t === 1 && a > 100) {
return a * 0.15;
} else if (u.t === 2) {
return a * 0.25;
}
return 0;
}
// ✅ Review comment:
// Unclear what function does. Suggest renaming:
enum UserTier {
BASIC = 1,
PREMIUM = 2,
}
const BASIC_DISCOUNT_THRESHOLD = 100;
const BASIC_DISCOUNT_RATE = 0.15;
const PREMIUM_DISCOUNT_RATE = 0.25;
function calculateDiscount(user: User, amount: number): number {
if (user.tier === UserTier.BASIC && amount > BASIC_DISCOUNT_THRESHOLD) {
return amount * BASIC_DISCOUNT_RATE;
}
if (user.tier === UserTier.PREMIUM) {
return amount * PREMIUM_DISCOUNT_RATE;
}
return 0;
}
Level 7: Tests (5-10 minutes)
Check that code is tested properly.
- Tests exist for new functionality
- Happy path is covered
- Edge cases are covered (null, empty, boundary values)
- Error cases are covered (what if everything goes wrong)
- Tests are clear and isolated (don't depend on each other)
- No flaky tests
Example comment:
# ❌ Bad (only happy path)
def test_calculate_discount():
user = User(is_vip=True)
discount = calculate_discount(user, 100)
assert discount == 20
# ✅ Review comment:
# Need tests for edge cases:
def test_calculate_discount_happy_path():
user = User(is_vip=True)
assert calculate_discount(user, 100) == 20
def test_calculate_discount_zero_amount():
user = User(is_vip=True)
with pytest.raises(ValueError):
calculate_discount(user, 0)
def test_calculate_discount_null_user():
with pytest.raises(ValueError):
calculate_discount(None, 100)
def test_calculate_discount_non_vip():
user = User(is_vip=False)
assert calculate_discount(user, 100) == 10
PR Template Checklist
Save to .github/PULL_REQUEST_TEMPLATE.md:
## Description of Changes
<!-- What does this PR do? Why is it needed? -->
## Related Issues
<!-- Closes #123 -->
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Refactoring
- [ ] Documentation
## Author's Checklist
- [ ] Code follows project code style
- [ ] Self-review conducted (read my code as a reviewer)
- [ ] Comments added in complex places
- [ ] Documentation updated (if needed)
- [ ] Tests added/updated
- [ ] All tests pass locally
- [ ] No breaking changes (or they are documented)
## Reviewer's Checklist
- [ ] Functionality works correctly
- [ ] Edge cases are handled
- [ ] No architectural issues
- [ ] No performance problems
- [ ] No security vulnerabilities
- [ ] Code is readable and clear
- [ ] Tests are adequate
## Screenshots (if UI)
<!-- Add before/after screenshots -->
## Questions for Reviewers
<!-- Uncertain about a decision? Ask here -->
How to Give Feedback That Doesn't Demotivate
Main Principle: Criticize Code, Not People
❌ Bad:
"You wrote terrible code."
"How did you even think this would work?"
"This is obviously wrong."
✅ Good:
"This approach might lead to performance issues. Suggest considering an alternative."
"The logic here isn't obvious. Could you add a comment?"
"There's a risk of race condition. What do you think about adding locking?"
"Sandwich" Technique for Difficult Feedback
Structure: positive → criticism → positive + suggestion
Example:
## Review comment:
Great that you added error handling — before, this was a bare request with no checks. 👍
One risk zone: currently retry logic makes 10 attempts without delay.
This can create load on the API during failures.
Suggest adding exponential backoff:
- 1st attempt: immediate
- 2nd attempt: after 1 sec
- 3rd attempt: after 2 sec
- etc.
Here's an implementation example: [link to lib or code]
What do you think?
Why it works:
- Acknowledges the good (person doesn't feel everything is bad)
- Focuses on the problem, not the person
- Concrete suggestion (not just "it's bad")
- Question at the end (dialogue, not orders)
Comment Classification by Importance
Use prefixes so author understands priority:
[CRITICAL] — blocker, must fix
[CRITICAL] SQL injection vulnerability on line 47.
Use parameterized queries instead of f-string.
[MAJOR] — serious issue, should fix
[MAJOR] N+1 query problem. For 10k users this will be 10k DB queries.
Suggest using select_related().
[MINOR] — small improvement, can ignore
[MINOR] Variable `data` is too generic.
Maybe `userData` or `profileData`?
[NITPICK] — nitpick, at your discretion
[NITPICK] I personally prefer guard clauses instead of nested if,
but it's a matter of taste.
[QUESTION] — question for understanding
[QUESTION] Why is setTimeout used here instead of Promise?
Want to understand the logic.
[PRAISE] — positive feedback
[PRAISE] Excellent solution with memoization!
Wouldn't have thought of it myself.
Google Research (2019): Teams that differentiate comment criticality merge PRs 40% faster and have 60% fewer review conflicts.
Examples of Constructive Comments
1. Performance Issue
❌ Bad:
"This will be slow."
✅ Good:
[MAJOR] Potential performance issue:
Loop over all users (lines 23-27) will be O(n*m),
where n = users, m = orders.
For 10k users with 100 orders each = 1M iterations.
Suggest:
1. Use JOIN instead of nested loops
2. Or load only needed users with filter
Example:
```python
# Instead of
for user in all_users:
for order in user.orders:
process(order)
# Do
orders = Order.objects.filter(
user__is_active=True
).select_related('user')
for order in orders:
process(order)
```
What do you think?
2. Architectural Comment
❌ Bad:
"This violates SOLID principles."
✅ Good:
[MAJOR] Architectural concern:
The `UserService` class is now responsible for:
- Creating users
- Sending emails
- Validation
- Database work
This violates Single Responsibility Principle.
Suggest splitting:
- `UserRepository` — database work
- `EmailService` — email sending
- `UserValidator` — validation
Then `UserService` would only orchestrate these services.
Can help with refactoring if you want.
3. Question for Understanding
❌ Bad:
"Why did you do this?"
✅ Good:
[QUESTION] Want to understand the logic:
Why use `setTimeout(..., 0)` instead of direct call?
Maybe I'm missing something, but it seems a direct call would be simpler.
Is there an edge case I'm not seeing?
4. Positive Feedback
❌ Bad:
"LGTM"
✅ Good:
[PRAISE] Excellent work! Especially liked:
1. Added comprehensive tests (even edge cases)
2. Error handling refactoring is much cleaner
3. Comments in complex places — very helpful
Small nit: could simplify line 47 with destructuring,
but that's details.
Approved ✅
How to React to Feedback (for PR Authors)
Rules for PR authors:
1. Don't defend reflexively
The first reaction to criticism is defense. Suppress it. Read the comment a second time.
2. Ask clarifying questions
"Thanks for the comment! To clarify:
- Do you mean extracting all validation logic, or just email?
- Is there an example of the pattern you suggest?"
3. Agree or argue
✅ Agreement:
"Great idea, didn't think of that. Will fix now."
✅ Argument:
"I understand your point. I did it this way because:
1. [reason 1]
2. [reason 2]
Does this change the situation?"
4. Thank the reviewer
"Thanks for the review! Valuable feedback, didn't know about this pattern."
Automating Routine: Pre-commit Hooks and CI
Pre-commit Hooks: First Line of Defense
Goal: Catch problems before code gets into PR.
Installing Pre-commit
# Installation
npm install --save-dev husky lint-staged
# Initialization
npx husky init
Configuring .husky/pre-commit
#!/bin/sh
. "$(dirname "$0")/_/husky.sh"
npx lint-staged
Configuring lint-staged.config.js
module.exports = {
// TypeScript/JavaScript files
"*.{ts,tsx,js,jsx}": [
"eslint --fix", // Auto-fix issues
"prettier --write", // Formatting
"vitest related --run", // Test related files
],
// Python files
"*.py": [
"black", // Formatting
"isort", // Sort imports
"flake8", // Linting
"mypy", // Type checking
"pytest --testmon", // Tests
],
// CSS/SCSS
"*.{css,scss}": ["stylelint --fix", "prettier --write"],
// Markdown
"*.md": ["markdownlint --fix", "prettier --write"],
// JSON/YAML
"*.{json,yaml,yml}": ["prettier --write"],
};
Result: Your code is always clean, formatted, and tested before commit.
CI/CD: Automating Checks in Pull Requests
GitHub Actions Example
# .github/workflows/pull-request.yml
name: Pull Request Checks
on:
pull_request:
types: [opened, synchronize, reopened]
jobs:
# Quick checks (should pass in < 2 min)
quick-checks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- name: Install dependencies
run: npm ci
- name: Lint
run: npm run lint
- name: Type check
run: npm run type-check
- name: Format check
run: npm run format:check
# Tests (can take longer)
tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Needed for coverage diff
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- name: Install dependencies
run: npm ci
- name: Run tests
run: npm run test:coverage
- name: Coverage report
uses: codecov/codecov-action@v3
with:
files: ./coverage/coverage-final.json
fail_ci_if_error: true
- name: Check coverage threshold
run: |
COVERAGE=$(cat coverage/coverage-summary.json | jq '.total.lines.pct')
if (( $(echo "$COVERAGE < 80" | bc -l) )); then
echo "Coverage is below 80%: $COVERAGE%"
exit 1
fi
# Security checks
security:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run Snyk security scan
uses: snyk/actions/node@master
env:
SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
with:
args: --severity-threshold=high
- name: Check for secrets
uses: trufflesecurity/trufflehog@main
with:
path: ./
# Bundle size check
bundle-size:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
- name: Install dependencies
run: npm ci
- name: Build
run: npm run build
- name: Check bundle size
uses: andresz1/size-limit-action@v1
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
# Complexity analysis
complexity:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
- name: Install dependencies
run: npm ci
- name: Analyze complexity
run: |
npx eslint . \
--format json \
--output-file eslint-report.json \
|| true
# Fail if there are functions with complexity > 15
COMPLEX_FUNCS=$(cat eslint-report.json | \
jq '[.[] | .messages[] | select(.ruleId == "complexity")] | length')
if [ "$COMPLEX_FUNCS" -gt 0 ]; then
echo "Found $COMPLEX_FUNCS functions with complexity > 15"
exit 1
fi
# Auto-comments in PR
pr-comments:
runs-on: ubuntu-latest
needs: [quick-checks, tests, security]
if: always()
steps:
- name: Comment PR
uses: actions/github-script@v6
with:
script: |
const status = '${{ needs.tests.result }}' === 'success' ? '✅' : '❌';
const coverage = '${{ needs.tests.outputs.coverage }}';
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `## Code Review Checks ${status}\n\n- Coverage: ${coverage}%\n- All checks passed!`
});Automatic Reviewers (AI Assistants)
1. Danger JS — Automatic Comments
// dangerfile.ts
import { danger, warn, fail, message } from "danger";
import * as fs from "fs";
// PR must have description
if (!danger.github.pr.body || danger.github.pr.body.length < 10) {
fail("Please add a description to the PR");
}
// PR shouldn't be too large
const bigPRThreshold = 500;
if (danger.github.pr.additions + danger.github.pr.deletions > bigPRThreshold) {
warn(
`Large PR (${danger.github.pr.additions + danger.github.pr.deletions} lines). Consider splitting into multiple PRs.`
);
}
// Check that package.json changes are accompanied by package-lock.json changes
const packageChanged = danger.git.modified_files.includes("package.json");
const lockfileChanged = danger.git.modified_files.includes("package-lock.json");
if (packageChanged && !lockfileChanged) {
fail("package.json changed but package-lock.json did not. Run npm install.");
}
// Reminder about tests
const hasAppChanges = danger.git.modified_files.some((f) =>
f.startsWith("src/")
);
const hasTestChanges = danger.git.modified_files.some(
(f) => f.includes(".test.") || f.includes(".spec.")
);
if (hasAppChanges && !hasTestChanges) {
warn("Code changes but no test changes. Add tests?");
}
// Check TODO comments
const todos = danger.git.modified_files
.filter((f) => f.endsWith(".ts") || f.endsWith(".tsx"))
.flatMap((f) => {
const content = fs.readFileSync(f, "utf8");
return content
.split("\n")
.map((line, i) => ({ line: i + 1, text: line }))
.filter((l) => l.text.includes("TODO") || l.text.includes("FIXME"));
});
if (todos.length > 0) {
message(
`Found ${todos.length} TODO/FIXME comments. Don't forget to address them.`
);
}
2. GitHub Copilot for PRs (Beta)
# Automatic PR description generation
name: AI PR Description
on:
pull_request:
types: [opened]
jobs:
generate-description:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Generate PR description
uses: github/copilot-pr-description@v1
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
3. CodeRabbit / CodeGuru — AI Review
CodeRabbit analyzes PRs and leaves comments:
- Finds bugs
- Suggests optimizations
- Checks best practices
- Assesses security risks
Installation:
- Add GitHub App: CodeRabbit
- Configure .coderabbit.yaml:
reviews:
auto_review: true
request_changes_workflow: false
checks:
- name: "Security"
enabled: true
- name: "Performance"
enabled: true
- name: "Best Practices"
enabled: true
- name: "Code Smell"
enabled: true
filters:
- "!**/*.test.*"
- "!**/node_modules/**"
Result: AI leaves comments in PR automatically.
Code Review Automation Tools
Code Review Effectiveness Metrics
You can't manage what you don't measure.
Key Metrics
1. Time to First Review
What it measures: How long a PR sits without review after creation.
Formula:
Time to First Review = Time of first comment - Time of PR creation
How to measure:
# GitHub CLI
gh pr list --state all --json number,createdAt,reviews \
--jq '.[] | select(.reviews | length > 0) |
{pr: .number,
time_to_review_hours: (((.reviews[0].submittedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)) / 3600)}'
Benchmarks:
- < 4 hours: Excellent
- 4-24 hours: Good
- 24-48 hours: Poor (context switches kill productivity)
- > 48 hours: Critical (PR is stale, conflicts, demotivation)
2. Review Throughput
What it measures: Time from PR creation to merge.
Formula:
Review Throughput = Merge time - Creation time
Tool:
// Analysis script (Node.js)
const { Octokit } = require("@octokit/rest");
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
async function analyzeReviewThroughput(owner, repo) {
const prs = await octokit.pulls.list({
owner,
repo,
state: "closed",
per_page: 100,
});
const merged = prs.data.filter((pr) => pr.merged_at);
const throughputs = merged.map((pr) => {
const created = new Date(pr.created_at);
const merged = new Date(pr.merged_at);
const hours = (merged - created) / (1000 * 60 * 60);
return { pr: pr.number, hours };
});
const avg =
throughputs.reduce((sum, t) => sum + t.hours, 0) / throughputs.length;
// Median of an array of numbers
const median = (values) => {
const sorted = [...values].sort((a, b) => a - b);
const mid = Math.floor(sorted.length / 2);
return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
};
console.log(`Average throughput: ${avg.toFixed(1)} hours`);
console.log(
`Median: ${median(throughputs.map((t) => t.hours)).toFixed(1)} hours`
);
}
Benchmarks:
- < 24 hours: Excellent
- 24-72 hours: Good
- > 1 week: Poor
3. Review Coverage
What it measures: What percentage of PRs receive real review (not just LGTM).
Metric:
Review Coverage = (PRs with 2+ comments / Total PRs) × 100%
How to measure:
import requests
def analyze_review_coverage(owner, repo, token):
headers = {'Authorization': f'token {token}'}
# Get all PRs from last month
prs = requests.get(
f'https://api.github.com/repos/{owner}/{repo}/pulls',
params={'state': 'closed', 'per_page': 100},
headers=headers
).json()
reviewed = 0
for pr in prs:
# Get PR comments
reviews = requests.get(
pr['url'] + '/reviews',
headers=headers
).json()
# Count a PR as "reviewed" if it has 2+ review submissions
if len(reviews) >= 2:
reviewed += 1
coverage = (reviewed / len(prs)) * 100
print(f'Review Coverage: {coverage:.1f}%')
Benchmarks:
- > 80%: Excellent
- 60-80%: Good
- < 60%: Poor (many formal reviews)
4. Bugs Found in Review
What it measures: How many serious bugs are caught in review (not production).
How to track:
Use labels in GitHub:
- bug-found-in-review — bug found in review
- bug-in-production — bug in production
Metric:
Review Effectiveness = Bugs Found in Review / (Bugs Found in Review + Bugs in Production)
Example:
Per month:
- Found in review: 15 bugs
- Missed to production: 5 bugs
Effectiveness = 15 / (15 + 5) = 75%
Benchmarks:
- > 80%: Excellent (review works)
- 60-80%: Good
- < 60%: Poor (review is formal)
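To track this ratio without manual counting, here is a minimal sketch that derives it from the two labels above via the GitHub Search API. It assumes the requests library, a token with read access, and that issues are labeled exactly bug-found-in-review and bug-in-production; treat it as a starting point, not a finished tool.
```python
import requests

def review_effectiveness(owner, repo, token):
    """Rough Review Effectiveness estimate based on issue labels."""
    headers = {'Authorization': f'token {token}'}

    def count_issues(label):
        # The Search API returns total_count, so one request per label is enough
        resp = requests.get(
            'https://api.github.com/search/issues',
            params={'q': f'repo:{owner}/{repo} label:"{label}"'},
            headers=headers
        )
        return resp.json().get('total_count', 0)

    found_in_review = count_issues('bug-found-in-review')
    in_production = count_issues('bug-in-production')
    total = found_in_review + in_production
    if total == 0:
        return None  # no labeled bugs yet
    return found_in_review / total * 100

# Example: 15 labeled in review, 5 in production -> 75.0
```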
5. Review Participation
What it measures: How many team members actively review.
Metric:
Participation Rate = (Active reviewers / Team size) × 100%
Active reviewer: Someone who did at least 5 reviews per month.
Benchmarks:
- > 80%: Excellent (everyone participates)
- 60-80%: Good
- < 60%: Poor (only seniors review)
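A rough way to compute this is sketched below, under the same assumptions as the earlier scripts (GitHub REST API via requests, a read token). The team_size parameter is something you supply yourself, and "last month" is approximated by the 100 most recently closed PRs.
```python
from collections import Counter

import requests

def participation_rate(owner, repo, token, team_size):
    """Share of the team that actively reviews (5+ reviews in the sample)."""
    headers = {'Authorization': f'token {token}'}
    prs = requests.get(
        f'https://api.github.com/repos/{owner}/{repo}/pulls',
        params={'state': 'closed', 'per_page': 100},
        headers=headers
    ).json()

    reviews_per_person = Counter()
    for pr in prs:
        for review in requests.get(pr['url'] + '/reviews', headers=headers).json():
            if review.get('user'):  # skip reviews from deleted accounts
                reviews_per_person[review['user']['login']] += 1

    active = sum(1 for count in reviews_per_person.values() if count >= 5)
    return active / team_size * 100
```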
6. PR Size Distribution
What it measures: Whether PRs are reasonably sized.
Metric:
PR Size = Additions + Deletions
Benchmarks:
- < 200 lines: Perfect (easy to review)
- 200-500 lines: Good
- 500-1000 lines: Poor (hard to review)
- > 1000 lines: Critical (nobody reviews seriously)
Google Statistics:
PRs > 400 lines are reviewed 60% more superficially than PRs < 200 lines.
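To see your own distribution, here is a small sketch along the same lines. Note that the PR list endpoint doesn't return additions/deletions, so each PR is fetched individually; that's slow but fine for an occasional report. Owner, repo, and token are placeholders.
```python
import requests

def pr_size_distribution(owner, repo, token):
    """Bucket recently merged PRs by size (additions + deletions)."""
    headers = {'Authorization': f'token {token}'}
    prs = requests.get(
        f'https://api.github.com/repos/{owner}/{repo}/pulls',
        params={'state': 'closed', 'per_page': 100},
        headers=headers
    ).json()

    buckets = {'< 200': 0, '200-500': 0, '500-1000': 0, '> 1000': 0}
    for pr in prs:
        if not pr.get('merged_at'):
            continue
        # The list endpoint omits additions/deletions; fetch the full PR object
        details = requests.get(pr['url'], headers=headers).json()
        size = details.get('additions', 0) + details.get('deletions', 0)
        if size < 200:
            buckets['< 200'] += 1
        elif size <= 500:
            buckets['200-500'] += 1
        elif size <= 1000:
            buckets['500-1000'] += 1
        else:
            buckets['> 1000'] += 1

    total = sum(buckets.values()) or 1
    for bucket, count in buckets.items():
        print(f'{bucket:>10} lines: {count / total * 100:.0f}%')
```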
CR Metrics Dashboard
Example dashboard in Grafana:
┌─────────────────────────────────────────────────────┐
│ Code Review Metrics Dashboard Dec 2025 │
├─────────────────────────────────────────────────────┤
│ Time to First Review: 6.2 hours ✅ │
│ Review Throughput: 42 hours ✅ │
│ Review Coverage: 82% ✅ │
│ Bugs Found in Review: 15/month │
│ Bugs in Production: 3/month │
│ Review Effectiveness: 83% ✅ │
│ Participation Rate: 75% ⚠️ │
│ │
│ PR Size Distribution: │
│ < 200 lines: 45% █████████ │
│ 200-500 lines: 35% ███████ │
│ 500-1000 lines: 15% ███ │
│ > 1000 lines: 5% █ │
└─────────────────────────────────────────────────────┘
Tracking in Jira/Linear
Create custom fields:
- Review Time — review time (hours)
- Review Comments — number of comments
- Bugs Found — bugs found in review
Automation:
# GitHub Webhook → Jira update
@app.route('/webhook', methods=['POST'])
def github_webhook():
event = request.json
if event['action'] == 'closed' and event['pull_request']['merged']:
pr = event['pull_request']
# Calculate review time
# GitHub timestamps end with "Z"; convert to an ISO-8601 offset first
created = datetime.fromisoformat(pr['created_at'].replace('Z', '+00:00'))
merged = datetime.fromisoformat(pr['merged_at'].replace('Z', '+00:00'))
review_time = (merged - created).total_seconds() / 3600
# In the webhook payload, "comments" is already a count, not a list
comments = pr.get('comments', 0)
# Update Jira
jira_issue = extract_jira_issue(pr['title'])
if jira_issue:
jira.update_issue(
jira_issue,
fields={
'customfield_10050': review_time, # Review Time
'customfield_10051': comments # Review Comments
}
)
Case Studies: From Theater to Real CR
Case #1: "LGTM" in 30 Seconds Cost $127k
Problem:
Team of 8. Code Review is formal. Seniors approve with LGTM in seconds without reading code.
Metrics (before):
- Time to First Review: 15 minutes ✅ (seems great?)
- Review Coverage: 12% ❌ (almost no comments)
- Bugs in Production: 8/month ❌
What happened:
PR with payment module refactoring (847 lines). Two seniors approved with LGTM in 30 seconds and 2 minutes respectively.
Bug in retry logic — instead of 3 retry attempts, the system canceled the transaction immediately.
Result: $127k lost revenue over 4 hours.
Solution:
- Implemented review checklist (from this article)
- Minimum review time: for PRs > 200 lines, approval cannot come faster than 30 minutes after opening (you physically can't read that much code in less time); a sketch of such a check is shown after this case
- Review Coverage metric: Target > 80%
- Rule: Minimum 2 reviewers, minimum 3 comments per 100 lines
Metrics (after 3 months):
- Time to First Review: 4 hours ⚠️ (slower, but that's okay)
- Review Coverage: 78% ✅
- Bugs in Production: 2/month ✅ (4x reduction)
- Bugs Found in Review: 12/month ✅
ROI: 1 production bug = 3 days of team work on a hotfix plus reputational damage. CR saves months.
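The minimum review time rule from the solution above can be enforced with a small CI check. Below is a hedged sketch, not a finished tool: it assumes the check runs on a review event with REPO ("owner/name"), PR_NUMBER, and GITHUB_TOKEN provided as environment variables (hypothetical names), and the thresholds mirror the rule (200 lines, 30 minutes).
```python
import os
import sys
from datetime import datetime

import requests

def check_minimum_review_time(repo, pr_number, token):
    """Fail if a PR over 200 changed lines was approved within 30 minutes of opening."""
    headers = {'Authorization': f'token {token}'}
    base = f'https://api.github.com/repos/{repo}/pulls/{pr_number}'
    pr = requests.get(base, headers=headers).json()

    size = pr['additions'] + pr['deletions']
    if size <= 200:
        return True  # small PRs are exempt from the rule

    created = datetime.fromisoformat(pr['created_at'].replace('Z', '+00:00'))
    for review in requests.get(base + '/reviews', headers=headers).json():
        if review['state'] != 'APPROVED':
            continue
        submitted = datetime.fromisoformat(review['submitted_at'].replace('Z', '+00:00'))
        if (submitted - created).total_seconds() < 30 * 60:
            print(f'PR changes {size} lines but was approved in under 30 minutes.')
            return False
    return True

if __name__ == '__main__':
    ok = check_minimum_review_time(
        os.environ['REPO'], os.environ['PR_NUMBER'], os.environ['GITHUB_TOKEN']
    )
    sys.exit(0 if ok else 1)
```
The exit code makes it usable as a required status check, so a rushed approval blocks the merge instead of slipping through.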
Case #2: Ego Battle Destroyed the Team
Problem:
Senior Paul and mid-level Andrew. Every review from Paul turns into a war.
Examples of Paul's comments:
"This is terrible code. Redo it."
"How did you even think this would work?"
"Obviously should have used pattern X."
Result:
- Andrew demotivated, makes minimal PRs
- Afraid to experiment
- Quits after 2 months ("no growth, constant criticism")
Solution:
- 1-on-1 with Paul: Explained that code criticism ≠ person criticism
- Training: "Sandwich" technique, using prefixes [MAJOR], [MINOR]
- Team rule: "criticize code, not people"
- Good comment examples in wiki
Result (after 2 months):
- Paul started writing constructive comments
- New junior on team: "Paul is the best mentor, learning a lot"
- Team atmosphere improved
Lesson: Feedback culture is more important than process.
Case #3: PR Sits for 5 Days — Context Lost
Problem:
PR created Monday. First review — Friday.
What happens:
- Andrew already switched to 2 other tasks
- Context lost
- Reviewer's comments unclear (code already forgotten)
- Merge conflicts (other changes in master)
Solution:
- Review SLA: First review within 4 hours
- Bot reminder: if a PR has had no review for more than 4 hours, a bot pings the team in Slack (see the sketch after this case)
- Reviewer rotation: Someone on duty each day
- Priority: Review is more important than new tasks
Result:
- Time to First Review: 5 days → 3 hours ✅
- Review Throughput: 7 days → 24 hours ✅
- Fewer merge conflicts
- Higher productivity (less context switching)
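The bot reminder from the solution above can be as simple as a scheduled script. Here is a minimal sketch, assuming a Slack incoming webhook and a GitHub read token passed via environment variables (the variable names and repo are placeholders); run it on a cron job or a scheduled GitHub Action.
```python
import os
from datetime import datetime, timezone

import requests

GITHUB_TOKEN = os.environ['GITHUB_TOKEN']        # placeholder: token with repo read access
SLACK_WEBHOOK_URL = os.environ['SLACK_WEBHOOK']  # placeholder: Slack incoming webhook URL
REPO = 'my-org/my-repo'                          # placeholder
SLA_HOURS = 4

def remind_stale_prs():
    headers = {'Authorization': f'token {GITHUB_TOKEN}'}
    prs = requests.get(
        f'https://api.github.com/repos/{REPO}/pulls',
        params={'state': 'open', 'per_page': 100},
        headers=headers
    ).json()

    now = datetime.now(timezone.utc)
    for pr in prs:
        reviews = requests.get(pr['url'] + '/reviews', headers=headers).json()
        if reviews:
            continue  # already has at least one review
        created = datetime.fromisoformat(pr['created_at'].replace('Z', '+00:00'))
        age_hours = (now - created).total_seconds() / 3600
        if age_hours > SLA_HOURS:
            requests.post(SLACK_WEBHOOK_URL, json={
                'text': f':hourglass: PR #{pr["number"]} "{pr["title"]}" has been waiting '
                        f'{age_hours:.0f}h for a first review: {pr["html_url"]}'
            })

if __name__ == '__main__':
    remind_stale_prs()
```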
CR Culture Implementation Checklist
✅ Phase 1: Preparation (1 week)
- Analyze current CR metrics (if any)
- Gather team feedback: What doesn't work in current process
- Adapt checklist from this article to your project
- Create PR template in .github/PULL_REQUEST_TEMPLATE.md
- Write "How to Do Code Review" guide in wiki
✅ Phase 2: Automation (1 week)
- Set up pre-commit hooks (husky + lint-staged)
- Set up CI checks (linter, tests, coverage)
- Connect automatic comments (Danger JS)
- Set up Slack notifications for PRs
- (Optional) Connect AI reviewer (CodeRabbit)
✅ Phase 3: Process (2 weeks)
- Conduct team workshop: "Code Review Culture"
- Why CR is important
- How to give constructive feedback
- How to react to feedback
- Set SLA: First review within 4 hours
- Assign reviewer rotation
- Implement rule: Minimum 2 reviewers for critical modules
✅ Phase 4: Metrics (ongoing)
- Set up metric tracking:
- Time to First Review
- Review Throughput
- Review Coverage
- Bugs Found in Review vs Production
- Create metrics dashboard
- Weekly retro: What can be improved in CR process
✅ Phase 5: Culture (3-6 months)
- Publicly praise quality reviews
- Include CR in developer performance reviews
- Mentorship: Seniors teach mids how to do good reviews
- Collect examples of excellent comments in wiki
- Update guide quarterly based on experience
Key Takeaways
Code Review is not a barrier. It's an investment.
Investment in:
- Code quality (fewer production bugs)
- Team growth (learning through feedback)
- Knowledge (more people know how the system works)
- Culture (openness, trust, constructiveness)
Statistics don't lie:
Remember:
"Code Review isn't about finding mistakes. It's about growing the team. Every PR is an opportunity to learn and teach."
Key Principles:
- Automate routine: formatting, tests, security — CI's job, not a human's
- Criticize code, not people: constructive feedback, not ego battles
- Measure effectiveness: metrics show if your process works
- Move fast: PR > 24 hours without review = lost context
- Teach and learn: CR is the primary learning tool in teams
What to Do Right Now
If you don't have a CR process yet:
- Create .github/PULL_REQUEST_TEMPLATE.md (template above)
- Set up pre-commit hooks (5 minutes)
- Set up CI checks (15 minutes)
- Hold 30-minute sync with team about CR culture
If you have a process but it's formal:
- Measure current metrics (Time to Review, Coverage, Bugs)
- Hold retro: What doesn't work in current CR?
- Implement comment classification ([CRITICAL], [MAJOR], etc.)
- Set SLA: First review within 4 hours
If you have a good process:
- Automate even more (AI reviewers, auto-comments)
- Teach mids how to do quality reviews
- Collect examples of best comments in wiki
- Share experience with other teams
First step (takes 5 minutes): Create PR template right now. This is the simplest change with maximum impact.
Useful Resources
Books:
- "The Art of Readable Code" (Boswell, Foucher) — how to write readable code
- "Code Complete" (Steve McConnell) — classic on code quality
- "Accelerate" (Nicole Forsgren) — DevOps effectiveness metrics
Articles:
- Google Engineering Practices: Code Review
- Atlassian: Code Review Best Practices
- Microsoft: Code Review Guidelines
Tools:
- Husky — Git hooks
- lint-staged — run linters on staged files
- Danger JS — automatic PR comments
- CodeRabbit — AI reviewer
- SonarQube — static code analysis
Metrics:
- DORA Metrics — DevOps effectiveness metrics
- Code Climate — code quality monitoring
Share Your Experience
I'm collecting Code Review culture examples and best practices. If you have an interesting case — share:
- Most useful practice in your CR process?
- What metrics do you track?
- Cases when CR saved you from critical bugs?
Write in comments or Telegram. Let's discuss processes, share experience.
Need help implementing Code Review culture? Contact me — I'll audit your process, help set up automation, train your team. First consultation is free.
Liked the article? Share with a colleague who says "review is a waste of time" or "we don't have time for quality CR". It might save their project from expensive production bugs.
Subscribe to updates on Telegram — I write about development, architecture, and team culture. Practice only, no fluff.



