MySQL Logs: Binlog, Redo Log, and Undo Log
MySQL’s logging mechanisms are fundamental to its reliability, performance, and replication capabilities. Understanding the three primary logs—binary log (binlog), redo log, and undo log—is crucial for database administrators and developers working with MySQL at scale.
Overview and Architecture
MySQL employs a multi-layered logging architecture where each log serves specific purposes:
- Redo Log (InnoDB): Ensures crash recovery and durability (ACID compliance)
- Undo Log (InnoDB): Enables transaction rollback and MVCC (Multi-Version Concurrency Control)
- Binary Log (Server Level): Facilitates replication and point-in-time recovery
graph TB
subgraph "MySQL Server"
subgraph "Server Layer"
SQL[SQL Layer]
BL[Binary Log]
end
subgraph "InnoDB Storage Engine"
BP[Buffer Pool]
RL[Redo Log]
UL[Undo Log]
DF[Data Files]
end
end
Client[Client Application] --> SQL
SQL --> BL
SQL --> BP
BP --> RL
BP --> UL
BP --> DF
BL --> Slave[Replica Server]
RL --> Recovery[Crash Recovery]
UL --> MVCC[MVCC Reads]
style RL fill:#e1f5fe
style UL fill:#f3e5f5
style BL fill:#e8f5e8
These logs work together to provide MySQL’s ACID guarantees while supporting high-availability architectures through replication.
Redo Log: Durability and Crash Recovery
Core Concepts
The redo log is InnoDB’s crash recovery mechanism that ensures committed transactions survive system failures. It operates on the Write-Ahead Logging (WAL) principle, where changes are logged before being written to data files.
Key Characteristics:
- Physical logging of page-level changes
- Circular buffer structure with configurable size
- Synchronous writes for committed transactions
- Critical for MySQL’s durability guarantee
Technical Implementation
The redo log consists of multiple files (typically ib_logfile0
, ib_logfile1
) that form a circular buffer. When InnoDB modifies a page, it first writes the change to the redo log, then marks the page as “dirty” in the buffer pool for eventual flushing to disk.
graph LR
subgraph "Redo Log Circular Buffer"
LF1[ib_logfile0]
LF2[ib_logfile1]
LF1 --> LF2
LF2 --> LF1
end
subgraph "Write Process"
Change[Data Change] --> WAL[Write to Redo Log]
WAL --> Mark[Mark Page Dirty]
Mark --> Flush[Background Flush to Disk]
end
LSN1[LSN: 12345]
LSN2[LSN: 12346]
LSN3[LSN: 12347]
Change --> LSN1
LSN1 --> LSN2
LSN2 --> LSN3
style LF1 fill:#e1f5fe
style LF2 fill:#e1f5fe
Log Sequence Number (LSN): A monotonically increasing number that uniquely identifies each redo log record. LSNs are crucial for recovery operations and determining which changes need to be applied during crash recovery.
Configuration and Monitoring
1 | -- Monitor redo log activity and health |
Key Configuration Parameters:
1 | -- Optimal production settings |
Performance Tuning Guidelines:
Log File Sizing: Size the total redo log space to handle 60-90 minutes of peak write activity. Larger logs reduce checkpoint frequency but increase recovery time.
Flush Strategy: The
innodb_flush_log_at_trx_commit
parameter controls durability vs. performance:1
(default): Full ACID compliance, flush and sync on each commit2
: Flush on commit, sync every second (risk: 1 second of transactions on OS crash)0
: Flush and sync every second (risk: 1 second of transactions on MySQL crash)
Interview Deep Dive: Checkpoint Frequency vs Recovery Time
Common Question: “Explain the relationship between checkpoint frequency and redo log size. How does this impact recovery time?”
graph LR
subgraph "Small Redo Logs"
SRL1[Frequent Checkpoints] --> SRL2[Less Dirty Pages]
SRL2 --> SRL3[Fast Recovery]
SRL1 --> SRL4[More I/O Overhead]
SRL4 --> SRL5[Slower Performance]
end
subgraph "Large Redo Logs"
LRL1[Infrequent Checkpoints] --> LRL2[More Dirty Pages]
LRL2 --> LRL3[Slower Recovery]
LRL1 --> LRL4[Less I/O Overhead]
LRL4 --> LRL5[Better Performance]
end
style SRL3 fill:#e8f5e8
style SRL5 fill:#ffebee
style LRL3 fill:#ffebee
style LRL5 fill:#e8f5e8
Answer Framework:
- Checkpoint frequency is inversely related to redo log size
- Small logs: fast recovery, poor performance during high writes
- Large logs: slow recovery, better steady-state performance
- Sweet spot: size logs for 60-90 minutes of peak write activity
- Monitor
Innodb_log_waits
to detect undersized logs
Undo Log: Transaction Rollback and MVCC
Fundamental Role
Undo logs serve dual purposes: enabling transaction rollback and supporting MySQL’s MVCC implementation for consistent reads. They store the inverse operations needed to undo changes made by transactions.
MVCC Implementation:
When a transaction reads data, InnoDB uses undo logs to reconstruct the appropriate version of the data based on the transaction’s read view, enabling non-blocking reads even while other transactions are modifying the same data.
Undo Log Structure and MVCC Showcase
graph TB
subgraph "Transaction Timeline"
T1[Transaction 1<br/>Read View: LSN 100]
T2[Transaction 2<br/>Read View: LSN 200]
T3[Transaction 3<br/>Read View: LSN 300]
end
subgraph "Data Versions via Undo Chain"
V1[Row Version 1<br/>LSN 100<br/>Value: 'Alice']
V2[Row Version 2<br/>LSN 200<br/>Value: 'Bob']
V3[Row Version 3<br/>LSN 300<br/>Value: 'Charlie']
V3 --> V2
V2 --> V1
end
T1 --> V1
T2 --> V2
T3 --> V3
style V1 fill:#f3e5f5
style V2 fill:#f3e5f5
style V3 fill:#f3e5f5
1 | -- Demonstrate MVCC in action |
Management and Troubleshooting
1 | -- Comprehensive undo log diagnostic script |
Best Practices:
- Transaction Hygiene: Keep transactions short to prevent undo log accumulation
- Undo Tablespace Management: Use dedicated undo tablespaces (
innodb_undo_tablespaces = 4
) - Purge Thread Tuning: Configure
innodb_purge_threads = 4
for better cleanup performance
Binary Log: Replication and Recovery
Architecture and Purpose
The binary log operates at the MySQL server level (above storage engines) and records all statements that modify data. It’s essential for replication and point-in-time recovery operations.
Logging Formats:
- Statement-Based (SBR): Logs SQL statements
- Row-Based (RBR): Logs actual row changes (recommended)
- Mixed: Automatically switches between statement and row-based logging
Replication Mechanics
sequenceDiagram
participant App as Application
participant Master as Master Server
participant BinLog as Binary Log
participant Slave as Slave Server
participant RelayLog as Relay Log
App->>Master: INSERT/UPDATE/DELETE
Master->>BinLog: Write binary log event
Master->>App: Acknowledge transaction
Slave->>BinLog: Request new events (I/O Thread)
BinLog->>Slave: Send binary log events
Slave->>RelayLog: Write to relay log
Note over Slave: SQL Thread processes relay log
Slave->>Slave: Apply changes to slave database
Note over Master,Slave: Asynchronous replication
Configuration and Format Comparison
1 | -- View current binary log files |
Production Configuration:
1 | -- High-availability binary log setup |
Interview Scenario: Replication Lag Analysis
Common Question: “A production database suddenly slowed down with replication lag. How would you diagnose?”
1 | -- Step-by-step diagnostic approach |
Optimization Solutions:
- Enable parallel replication:
slave_parallel_workers = 4
- Optimize slow queries on slave
- Consider read/write splitting
- Network optimization between master and slave
Transaction Commit Flow Integration
Understanding how these logs interact during transaction commits is crucial for troubleshooting and optimization:
flowchart TD
Start([Transaction Begins]) --> Changes[Execute DML Statements]
Changes --> UndoWrite[Write Undo Records]
UndoWrite --> RedoWrite[Write Redo Log Records]
RedoWrite --> Prepare[Prepare Phase]
Prepare --> BinLogCheck{Binary Logging Enabled?}
BinLogCheck -->|Yes| BinLogWrite[Write to Binary Log]
BinLogCheck -->|No| RedoCommit[Write Redo Commit Record]
BinLogWrite --> BinLogSync[Sync Binary Log<br/>「if sync_binlog=1」]
BinLogSync --> RedoCommit
RedoCommit --> RedoSync[Sync Redo Log<br/>「if innodb_flush_log_at_trx_commit=1」]
RedoSync --> Complete([Transaction Complete])
Complete --> UndoPurge[Mark Undo for Purge<br/>「Background Process」]
style UndoWrite fill:#f3e5f5
style RedoWrite fill:#e1f5fe
style BinLogWrite fill:#e8f5e8
style RedoCommit fill:#e1f5fe
Group Commit Optimization
Interview Insight: “How does MySQL’s group commit feature improve performance with binary logging enabled?”
Group commit allows multiple transactions to be fsynced together, reducing I/O overhead:
1 | -- Monitor group commit efficiency |
Crash Recovery and Point-in-Time Recovery
Recovery Process Flow
graph TB
subgraph "Crash Recovery Process"
Crash[System Crash] --> Start[MySQL Restart]
Start --> ScanRedo[Scan Redo Log from<br/>Last Checkpoint]
ScanRedo --> RollForward[Apply Committed<br/>Transactions]
RollForward --> ScanUndo[Scan Undo Logs for<br/>Uncommitted Transactions]
ScanUndo --> RollBack[Rollback Uncommitted<br/>Transactions]
RollBack --> BinLogSync[Synchronize with<br/>Binary Log Position]
BinLogSync --> Ready[Database Ready]
end
style ScanRedo fill:#e1f5fe
style ScanUndo fill:#f3e5f5
graph TB
subgraph "Point-in-Time Recovery"
Backup[Full Backup] --> RestoreData[Restore Data Files]
RestoreData --> ApplyBinLog[Apply Binary Logs<br/>to Target Time]
ApplyBinLog --> Recovered[Database Recovered<br/>to Specific Point]
end
style ApplyBinLog fill:#e8f5e8
Point-in-Time Recovery Example
1 | -- Practical PITR scenario |
Environment-Specific Configurations
Production-Grade Configuration
1 | -- High-Performance Production Template |
Interview Scenario: Financial Application Design
Question: “How would you design a MySQL setup for a financial application that cannot lose any transactions?”
graph TB
subgraph "Financial Grade Setup"
App[Application] --> LB[Load Balancer]
LB --> Master[Master DB]
Master --> Sync1[Synchronous Slave 1]
Master --> Sync2[Synchronous Slave 2]
subgraph "Master Configuration"
MC1[innodb_flush_log_at_trx_commit = 1]
MC2[sync_binlog = 1]
MC3[Large redo logs for performance]
MC4[GTID enabled]
end
subgraph "Monitoring"
Mon1[Transaction timeout < 30s]
Mon2[Undo log size alerts]
Mon3[Replication lag < 1s]
end
end
style Master fill:#e8f5e8
style Sync1 fill:#e1f5fe
style Sync2 fill:#e1f5fe
Answer Framework:
- Durability:
innodb_flush_log_at_trx_commit = 1
andsync_binlog = 1
- Consistency: Row-based binary logging with GTID
- Availability: Semi-synchronous replication
- Performance: Larger redo logs to handle synchronous overhead
- Monitoring: Aggressive alerting on log-related metrics
Monitoring and Alerting
Comprehensive Health Check Script
1 | -- Complete MySQL logs health check |
Key Alert Thresholds
Establish monitoring for:
- Redo log waits > 100/second
- Slave lag > 30 seconds
- Long-running transactions > 1 hour
- Binary log disk usage > 80%
- Undo tablespace growth > 20% per hour
Real-Time Monitoring Dashboard
1 | -- Create monitoring view for continuous observation |
Conclusion
MySQL’s logging architecture provides a robust foundation for transaction processing, crash recovery, and high-availability deployments. Key takeaways:
- Redo logs ensure durability through Write-Ahead Logging - size them for 60-90 minutes of peak writes
- Undo logs enable MVCC and rollbacks - keep transactions short to prevent growth
- Binary logs facilitate replication and PITR - use ROW format with GTID for modern deployments
The key to successful MySQL log management lies in understanding your workload’s specific requirements and balancing durability, consistency, and performance. Regular monitoring of log metrics and proactive tuning ensure these critical systems continue to provide reliable service as your database scales.
Remember: in production environments, always test configuration changes in staging first, and maintain comprehensive monitoring to detect issues before they impact your applications.