SQLite best practices
Comprehensive reference covering schema design, performance tuning, device-to-server sync, and production operations. Research compiled April 2026 from official SQLite documentation, benchmarks, and practitioner sources.
Table of Contents
Part I: Schema Design
- Column Naming Conventions
- Data Types and Type Affinity
- Primary Key Strategies
- Foreign Keys and Referential Integrity
- CHECK Constraints and Data Validation
- Schema Design Patterns
Part II: Performance and Tuning
- Indexes
- Triggers
- WAL Mode and Journal Modes
- Transaction Management
- Query Optimization
- PRAGMA Settings for Production
Part III: Operations and Maintenance
- Migration and Versioning
- Backup and Recovery
- Date and Time Handling
- Blob and Large Data
- Security
- Testing with SQLite
- Common Anti-Patterns
Part IV: Device-to-Server Sync
- Schema Design for Sync
- Conflict Resolution
- Sync Protocols and Patterns
- Offline-First Architecture
- Type Mapping Between SQLite and Server Databases
- SQLite Sync Tools and Extensions
- Real-World Sync Architectures
- Performance Considerations for Sync
Part V: Appendices
Part I: Schema Design
1. Column Naming Conventions
Recommended Practice: snake_case everywhere
Use snake_case for all identifiers -- tables, columns, indexes, constraints, triggers. This is the dominant convention in SQL and the most readable across tools and contexts.
Rationale: SQL is case-insensitive for identifiers. CamelCase creates ambiguity: UnderValue and Undervalue are identical to SQLite, but under_value and undervalue are visually distinct. Underscores also improve readability for non-native English speakers and people with vision difficulties.
-- Good
CREATE TABLE workflow_run (
workflow_run_id INTEGER PRIMARY KEY,
workflow_name TEXT NOT NULL,
creation_date TEXT NOT NULL DEFAULT (datetime('now')),
is_active INTEGER NOT NULL DEFAULT 1
);
-- Avoid
CREATE TABLE WorkflowRun (
WorkflowRunID INTEGER PRIMARY KEY,
WorkflowName TEXT NOT NULL,
CreatedAt TEXT NOT NULL DEFAULT (datetime('now')),
IsActive INTEGER NOT NULL DEFAULT 1
);
Table Naming: Plural vs. Singular
Both conventions have advocates. The key argument for plural is that it avoids collisions with SQL reserved words (user is reserved; users is not). The key argument for singular is that each row represents one entity, and singular nouns compose better in compound names (workflow_run vs. workflows_runs).
Pick one and be consistent. If a project already uses singular, stick with singular.
Primary Key Column Naming
Use table_name_id (or at minimum a descriptive name), not bare id.
-- Good: self-documenting in JOINs
SELECT *
FROM audit_log al
JOIN actors a ON a.actor_id = al.changed_by_actor_id;
-- Bad: ambiguous in JOINs, easy to introduce bugs
SELECT *
FROM audit_log al
JOIN actors a ON a.id = al.changed_by;
When both sides of a JOIN say id, errors are invisible. When they say actor_id and finding_id, mismatches are obvious.
Foreign Key Column Naming
Match the referenced column name when possible. When a table references the same parent table multiple times, add a descriptive qualifier:
-- Single reference: match the parent column name
finding_id INTEGER REFERENCES findings(finding_id)
-- Multiple references to same parent: add qualifier
source_actor_id INTEGER REFERENCES actors(actor_id),
destination_actor_id INTEGER REFERENCES actors(actor_id)
Boolean Columns
Prefix with is_ or has_:
is_active INTEGER NOT NULL DEFAULT 1 CHECK (is_active IN (0, 1)),
has_children INTEGER NOT NULL DEFAULT 0 CHECK (has_children IN (0, 1))
Date/Time Columns
Use descriptive event names, not generic suffixes:
-- Good: describes the event
creation_date TEXT NOT NULL DEFAULT (datetime('now')),
modification_date TEXT,
completion_date TEXT
-- Avoid: vague
created_at TEXT,
updated_at TEXT
Reserved Words to Avoid
SQLite has 147 reserved keywords. Common traps for column/table names:
| Dangerous Name | Problem | Alternative |
|---|---|---|
order |
Reserved keyword | sort_order, display_order |
group |
Reserved keyword | team, grouping |
index |
Reserved keyword | position, sort_index |
key |
Reserved keyword | lookup_key, api_key |
value |
Reserved keyword | setting_value, metric_value |
action |
Reserved keyword | operation, activity |
check |
Reserved keyword | validation, check_result |
column |
Reserved keyword | field, attribute |
default |
Reserved keyword | default_value, fallback |
replace |
Reserved keyword | substitution, replacement |
match |
Reserved keyword | comparison, match_result |
plan |
Reserved keyword | execution_plan |
query |
Reserved keyword | search_query |
row |
Reserved keyword | record, entry |
filter |
Reserved keyword | criterion, filter_expr |
If you must use a reserved word, quote it with double quotes ("order"), but this adds friction to every query. Better to choose a different name.
Future-proofing: SQLite adds new keywords over time. The official docs recommend quoting any English word used as an identifier, even if it is not currently reserved.
Index and Constraint Naming
-- Indexes: ix_tablename_purpose
CREATE INDEX ix_findings_workflow_run_id ON findings(workflow_run_id);
CREATE UNIQUE INDEX ux_actors_email ON actors(email);
-- Triggers: tr_tablename_event_purpose
CREATE TRIGGER tr_documents_after_update_audit ...
-- Check constraints: ck_tablename_column
CONSTRAINT ck_employees_salary CHECK (salary > 0)
Sources
- Database Naming Standards (Ovid) -- snake_case rationale, plural tables, FK naming
- SQL Naming Conventions (bbkane) -- PK naming, index conventions, trigger naming
- SQLite Keywords -- official keyword list, quoting rules
- Baeldung SQL Naming Conventions
- BrainStation SQL Naming Conventions
2. Data Types and Type Affinity
SQLite's Type System is Unique
Most databases use static typing -- the column determines the type. SQLite uses dynamic typing -- the value determines the type. A column's declared type is a preference (called "affinity"), not a constraint.
-- This is legal in SQLite (without STRICT):
CREATE TABLE demo (age INTEGER);
INSERT INTO demo VALUES ('not a number'); -- Stores as TEXT
SELECT typeof(age) FROM demo; -- Returns 'text'
The Five Storage Classes
Every value in SQLite belongs to exactly one storage class:
| Storage Class | Description | Size |
|---|---|---|
NULL |
Null value | 0 bytes |
INTEGER |
Signed integer | 0, 1, 2, 3, 4, 6, or 8 bytes (variable) |
REAL |
IEEE 754 float | 8 bytes |
TEXT |
UTF-8 string | Variable |
BLOB |
Raw bytes | Variable |
Key differences from other databases:
- No
BOOLEANtype. UseINTEGERwith 0/1.TRUEandFALSEkeywords (since 3.23.0) are just aliases for 1 and 0. - No
DATE/DATETIMEtype. Store asTEXT(ISO 8601:'YYYY-MM-DD HH:MM:SS'),REAL(Julian day), orINTEGER(Unix timestamp). TEXT is recommended for readability. INTEGERstorage is variable-width (0-8 bytes), not fixed. Small values use less space than large ones.
Type Affinity: The 5-Rule Algorithm
For non-STRICT tables, SQLite determines column affinity from the declared type name using these rules in order of precedence:
| Rule | If declared type contains... | Affinity | Examples |
|---|---|---|---|
| 1 | "INT" |
INTEGER | INT, INTEGER, TINYINT, SMALLINT, MEDIUMINT, BIGINT, INT2, INT8 |
| 2 | "CHAR", "CLOB", "TEXT" |
TEXT | CHARACTER(20), VARCHAR(255), NCHAR, TEXT, CLOB |
| 3 | "BLOB" or no type |
BLOB | BLOB, (no type specified) |
| 4 | "REAL", "FLOA", "DOUB" |
REAL | REAL, DOUBLE, FLOAT, DOUBLE PRECISION |
| 5 | Otherwise | NUMERIC | NUMERIC, DECIMAL(10,5), BOOLEAN, DATE, DATETIME |
Critical: order matters. "CHARINT" matches both rules 1 and 2, but rule 1 wins -- affinity is INTEGER. "FLOATING POINT" contains "INT" (in "POINT"), so affinity is INTEGER, not REAL.
The STRING Gotcha
Declaring a column as STRING gives it NUMERIC affinity (rule 5 -- "STRING" does not contain "CHAR", "CLOB", or "TEXT"). This means number-like strings get silently converted to integers:
CREATE TABLE demo (val STRING);
INSERT INTO demo VALUES ('007');
SELECT typeof(val), val FROM demo;
-- Returns: integer, 7 (leading zeros lost!)
Fix: TEXT MUST be used, never STRING.
NUMERIC Affinity Behavior
NUMERIC affinity aggressively converts text to numbers:
CREATE TABLE demo (val NUMERIC);
INSERT INTO demo VALUES ('3.0e+5');
SELECT typeof(val), val FROM demo;
-- Returns: integer, 300000 (converted from scientific notation to integer)
Comparison Pitfalls
When comparing values of different storage classes without affinity guidance, SQLite uses this ordering: NULL < INTEGER/REAL < TEXT < BLOB. This produces surprising results:
CREATE TABLE t1 (
a TEXT, -- text affinity
b NUMERIC, -- numeric affinity
c BLOB, -- blob affinity (no type)
d -- blob affinity (no type)
);
INSERT INTO t1 VALUES ('500', '500', '500', 500);
-- Column c has BLOB affinity, d is integer 500.
-- Without affinity guidance: INTEGER < TEXT always
SELECT d < '40' FROM t1;
-- Returns: 1 (integer 500 is "less than" text '40')
The full comparison rules:
- If one operand has INTEGER/REAL/NUMERIC affinity and the other has TEXT/BLOB/no affinity, apply NUMERIC affinity to the other operand before comparing.
- If one operand has TEXT affinity and the other has no affinity, apply TEXT affinity to the other operand.
- Otherwise, compare as-is using the storage class ordering.
STRICT Tables (SQLite 3.37.0+)
STRICT tables enforce rigid typing. Declare with the STRICT keyword:
CREATE TABLE measurements (
measurement_id INTEGER PRIMARY KEY,
sensor_name TEXT NOT NULL,
reading REAL NOT NULL,
raw_data BLOB
) STRICT;
Allowed types in STRICT tables: INT, INTEGER, REAL, TEXT, BLOB, ANY.
Behavior:
- Inserting a wrong type raises
SQLITE_CONSTRAINT_DATATYPE - SQLite attempts type coercion first (like other databases), fails if coercion fails
INTEGER PRIMARY KEYstill aliases rowid (butINT PRIMARY KEYdoes not)ANYtype preserves values exactly as inserted, with no coercion
CREATE TABLE demo (val ANY) STRICT;
INSERT INTO demo VALUES ('007');
SELECT typeof(val), val FROM demo;
-- Returns: text, 007 (preserved exactly -- no conversion)
CREATE TABLE demo2 (val INTEGER) STRICT;
INSERT INTO demo2 VALUES ('not a number');
-- ERROR: SQLITE_CONSTRAINT_DATATYPE
Combining with WITHOUT ROWID:
CREATE TABLE lookups (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
) STRICT, WITHOUT ROWID;
Compatibility: Databases with STRICT tables cannot be opened by SQLite versions before 3.37.0.
Practical Recommendations
- Declare explicit types on every column, even in non-STRICT tables. Use
TEXT,INTEGER,REAL,BLOB-- the four canonical storage classes. STRINGMUST NOT be used -- it gives NUMERIC affinity. UseTEXT.- Use
TEXTfor dates in ISO 8601 format ('YYYY-MM-DD HH:MM:SS'). It sorts correctly and is human-readable. - Use
INTEGERfor booleans with CHECK constraints:CHECK (col IN (0, 1)). - Consider STRICT tables for new schemas where type safety matters. The tradeoff is losing compatibility with SQLite < 3.37.0.
- Be explicit about comparisons -- don't rely on affinity coercion in WHERE clauses. Cast or quote consistently.
Sources
- Datatypes In SQLite -- official type system docs, affinity rules, comparison behavior
- STRICT Tables -- official STRICT table docs
- The Advantages of Flexible Typing -- official rationale for dynamic typing
- SQLite's Flexible Typing (DEV Community)
- Understanding Type Affinity in SQLite
3. Primary Key Strategies
How SQLite's rowid Works
Every SQLite table (unless WITHOUT ROWID) has a hidden 64-bit signed integer rowid. It is:
- The actual key used by the B-tree storage engine
- Accessible via the aliases
rowid,_rowid_, oroid(unless a real column shadows these names) - Automatically assigned on INSERT if not specified
- Not persistent --
VACUUMmay reassign rowids unless aliased byINTEGER PRIMARY KEY
Option 1: INTEGER PRIMARY KEY (Recommended Default)
CREATE TABLE findings (
finding_id INTEGER PRIMARY KEY,
content TEXT NOT NULL,
severity TEXT NOT NULL
);
What happens:
finding_idbecomes an alias for rowid -- no extra storage, no separate index- On INSERT without a value, SQLite assigns
max(finding_id) + 1 - If the max row is deleted, that ID can be reused
- This is the fastest possible primary key in SQLite
Critical detail: Only INTEGER PRIMARY KEY aliases rowid. INT PRIMARY KEY does NOT -- it creates a regular column with a separate unique index, doubling storage overhead.
-- These alias rowid (identical behavior):
id INTEGER PRIMARY KEY
id INTEGER PRIMARY KEY NOT NULL -- NOT NULL is redundant but harmless
-- These do NOT alias rowid:
id INT PRIMARY KEY -- INT != INTEGER for this purpose
id INTEGER PRIMARY KEY DESC -- DESC prevents aliasing (since SQLite 3.45.0 this may change)
id INTEGER UNIQUE -- UNIQUE != PRIMARY KEY for this purpose
Option 2: INTEGER PRIMARY KEY AUTOINCREMENT
CREATE TABLE audit_entries (
entry_id INTEGER PRIMARY KEY AUTOINCREMENT,
action TEXT NOT NULL,
actor_id INTEGER NOT NULL
);
What it adds beyond plain INTEGER PRIMARY KEY:
- Guarantees IDs are strictly monotonically increasing and never reused -- even if the highest row is deleted
- Maintains a counter in the internal
sqlite_sequencetable - If the counter reaches
2^63 - 1(9,223,372,036,854,775,807), further inserts fail withSQLITE_FULL - Without AUTOINCREMENT, SQLite would try random IDs at the max, potentially reusing old ones
Performance cost: Every INSERT requires an additional read/write to sqlite_sequence. The official SQLite docs explicitly warn: "The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed."
When to use it: Audit logs, financial ledgers, event streams -- anywhere ID reuse would be semantically wrong or a security concern.
Option 3: UUID
-- Text form (36 chars, human-readable but large)
CREATE TABLE sync_records (
record_id TEXT PRIMARY KEY,
data TEXT NOT NULL
);
-- Blob form (16 bytes, more compact)
CREATE TABLE sync_records (
record_id BLOB PRIMARY KEY,
data TEXT NOT NULL
);
Advantages:
- Globally unique across databases, servers, devices
- Generated client-side -- ID is known before INSERT
- No coordination required in distributed systems
Significant downsides in SQLite:
- Random UUIDs (v4) destroy B-tree locality. Sequential integer inserts append to the rightmost leaf page; random UUIDs scatter inserts across the entire tree, causing page splits and cache thrashing.
- 36-byte TEXT UUID is 9x larger than an 8-byte integer. This ripples into every foreign key and index.
- Does not alias rowid, so SQLite maintains two data structures (the rowid B-tree and a separate index).
UUIDv7 mitigates the locality problem. UUIDv7 (IETF-approved May 2024) encodes a Unix millisecond timestamp in the first 48 bits, making IDs time-ordered. This preserves B-tree locality while maintaining global uniqueness.
-- Store UUIDv7 as BLOB for maximum efficiency
CREATE TABLE distributed_events (
event_id BLOB PRIMARY KEY, -- 16-byte UUIDv7
payload TEXT NOT NULL
) WITHOUT ROWID;
Hybrid approach: Use INTEGER PRIMARY KEY for internal operations (joins, indexes, foreign keys) and a separate UUID column for external APIs:
CREATE TABLE resources (
resource_id INTEGER PRIMARY KEY,
external_id TEXT NOT NULL UNIQUE, -- UUIDv7 for API exposure
resource_name TEXT NOT NULL
);
WITHOUT ROWID Tables
CREATE TABLE word_counts (
word TEXT PRIMARY KEY,
count INTEGER NOT NULL DEFAULT 0
) WITHOUT ROWID;
What it does: Uses the declared PRIMARY KEY as the clustered index key instead of a hidden rowid. The table is stored as a single B-tree keyed on the PRIMARY KEY columns.
When to use:
- Non-integer or composite primary keys
- Small rows (< ~1/20th of page size, roughly 50-200 bytes)
- Tables that do NOT store large strings or BLOBs
Performance benefit: For the word_counts example, a regular table stores the word text twice (once in the rowid B-tree, once in the unique index). WITHOUT ROWID stores it once -- roughly 50% less disk space and 2x faster for simple lookups.
Restrictions:
- Must have an explicit PRIMARY KEY (error if omitted)
- No AUTOINCREMENT
- NOT NULL enforced on all PRIMARY KEY columns (SQL standard)
sqlite3_last_insert_rowid()does not work- No incremental BLOB I/O
- Requires SQLite 3.8.2+
Decision Table
| Situation | Use |
|---|---|
| Default / general tables | INTEGER PRIMARY KEY |
| Audit log, ledger -- IDs must never reuse | INTEGER PRIMARY KEY AUTOINCREMENT |
| Distributed / multi-device sync | UUIDv7 as BLOB (prefer WITHOUT ROWID) |
| Exposing IDs in a public API | Separate UUID column + integer PK internally |
| Non-integer or composite key, small rows | WITHOUT ROWID |
| Maximum performance, local-only DB | INTEGER PRIMARY KEY |
Sources
- Rowid Tables -- official rowid behavior, INTEGER PRIMARY KEY aliasing
- SQLite Autoincrement -- official AUTOINCREMENT docs, sqlite_sequence, performance warnings
- WITHOUT ROWID Tables -- official WITHOUT ROWID docs, when to use, restrictions
- UUID vs Auto-Increment (Bytebase) -- UUID performance analysis, UUIDv7
- Primary Key Data Types (High Performance SQLite)
4. Foreign Keys and Referential Integrity
The Critical First Step: Enable Foreign Keys
Foreign key constraints are disabled by default in SQLite and must be enabled for each database connection at runtime:
PRAGMA foreign_keys = ON;
This is the single most common SQLite pitfall. Developers create schemas with FOREIGN KEY declarations, test them, and find constraints are silently not enforced. The PRAGMA does not persist in the database file -- it must be set every time a connection opens.
Verification:
PRAGMA foreign_keys; -- Returns 0 (off) or 1 (on)
Cannot be changed mid-transaction: Attempting to enable/disable foreign keys inside a BEGIN...COMMIT block silently does nothing.
Why this design? Foreign keys were added long after SQLite's file format was designed. There is no place in the database file to store the on/off state, and changing the default would break billions of existing applications.
Foreign Key Declaration
-- Inline (column-level)
CREATE TABLE tracks (
track_id INTEGER PRIMARY KEY,
track_name TEXT NOT NULL,
artist_id INTEGER NOT NULL REFERENCES artists(artist_id)
);
-- Table-level (required for composite FKs)
CREATE TABLE songs (
song_id INTEGER PRIMARY KEY,
song_artist TEXT NOT NULL,
song_album TEXT NOT NULL,
FOREIGN KEY (song_artist, song_album)
REFERENCES albums(album_artist, album_name)
);
Requirement: The referenced column(s) must be the PRIMARY KEY or have a UNIQUE index. Otherwise, the table creation fails.
ON DELETE / ON UPDATE Actions
Actions configure what happens to child rows when a referenced parent row is deleted or its key is modified. Default is NO ACTION.
| Action | Behavior |
|---|---|
NO ACTION |
Fail if child rows exist (checked at statement end) |
RESTRICT |
Fail immediately, even with deferred constraints |
SET NULL |
Set child FK column(s) to NULL |
SET DEFAULT |
Set child FK column(s) to their DEFAULT value |
CASCADE |
Delete child rows (ON DELETE) or update child FK values (ON UPDATE) |
CREATE TABLE tracks (
track_id INTEGER PRIMARY KEY,
track_name TEXT NOT NULL,
artist_id INTEGER REFERENCES artists(artist_id)
ON DELETE SET NULL
ON UPDATE CASCADE
);
CASCADE example:
-- Parent table
INSERT INTO artists VALUES (1, 'Dean Martin');
INSERT INTO artists VALUES (2, 'Frank Sinatra');
-- Child rows
INSERT INTO tracks VALUES (11, 'That''s Amore', 1);
INSERT INTO tracks VALUES (12, 'Christmas Blues', 1);
INSERT INTO tracks VALUES (13, 'My Way', 2);
-- Update parent key -- cascades to all children
UPDATE artists SET artist_id = 100 WHERE artist_name = 'Dean Martin';
-- After: tracks 11 and 12 now have artist_id = 100
SET DEFAULT pitfall: If the default value does not exist in the parent table, the action fails:
CREATE TABLE tracks (
track_id INTEGER PRIMARY KEY,
track_name TEXT NOT NULL,
artist_id INTEGER DEFAULT 0 REFERENCES artists(artist_id)
ON DELETE SET DEFAULT
);
-- This FAILS if artist_id=0 doesn't exist in artists
DELETE FROM artists WHERE artist_id = 3;
-- Error: foreign key constraint failed
-- Fix: ensure the default value exists
INSERT INTO artists VALUES (0, 'Unknown Artist');
-- Now the delete succeeds
Deferred Constraints
By default, FK constraints are checked at the end of each statement (immediate). Deferred constraints delay checking until COMMIT:
CREATE TABLE tracks (
track_id INTEGER PRIMARY KEY,
track_name TEXT NOT NULL,
artist_id INTEGER REFERENCES artists(artist_id)
DEFERRABLE INITIALLY DEFERRED
);
-- With deferred constraints, insert order doesn't matter within a transaction:
BEGIN;
INSERT INTO tracks VALUES (1, 'My Song', 5); -- artist 5 doesn't exist yet
INSERT INTO artists VALUES (5, 'New Artist'); -- now it does
COMMIT; -- constraint checked here -- passes
Temporary override for all constraints in a session:
PRAGMA defer_foreign_keys = ON;
This is useful for bulk data imports or schema migrations where insert order is inconvenient.
Common Pitfalls
1. NULL values bypass FK checks:
-- This succeeds even if artist_id=999 doesn't exist
INSERT INTO tracks (track_id, track_name, artist_id) VALUES (1, 'Test', NULL);
-- NULL in any FK column = no parent row required
2. Missing indexes on child FK columns: Without an index on the child's FK column, every parent DELETE/UPDATE requires a full table scan of the child table:
-- Always create indexes on FK columns
CREATE INDEX ix_tracks_artist_id ON tracks(artist_id);
3. Composite FKs must match exactly: The child column count, types, and collation must match the parent's PRIMARY KEY or UNIQUE constraint exactly.
4. ALTER TABLE restrictions: You cannot add a new column with a FK constraint and a non-NULL default:
-- Fails
ALTER TABLE tracks ADD COLUMN genre_id INTEGER NOT NULL DEFAULT 1
REFERENCES genres(genre_id);
-- Works (NULL default)
ALTER TABLE tracks ADD COLUMN genre_id INTEGER REFERENCES genres(genre_id);
5. Cross-schema FKs not supported: Foreign keys cannot reference tables in attached databases.
Sources
- SQLite Foreign Key Support -- official comprehensive FK docs
- SQLite Forum: Why is FK support per-connection? -- design rationale
- SQLite Foreign Keys: Common Pitfalls -- indexed practical guide
5. CHECK Constraints and Data Validation
Syntax
-- Inline (column-level)
CREATE TABLE products (
product_id INTEGER PRIMARY KEY,
product_name TEXT NOT NULL,
quantity INTEGER NOT NULL CHECK (quantity >= 0),
price REAL NOT NULL CHECK (price > 0)
);
-- Table-level (can reference multiple columns)
CREATE TABLE events (
event_id INTEGER PRIMARY KEY,
start_date TEXT NOT NULL,
end_date TEXT NOT NULL,
CHECK (end_date >= start_date)
);
-- Named constraint
CREATE TABLE employees (
employee_id INTEGER PRIMARY KEY,
salary REAL NOT NULL,
CONSTRAINT ck_employees_salary CHECK (salary > 0)
);
There is no functional difference between column-level and table-level CHECK constraints. The only advantage of table-level is the ability to reference multiple columns.
How CHECK Evaluation Works
- The CHECK expression is evaluated on every INSERT and UPDATE
- The result is cast to NUMERIC
- If result is integer 0 or real 0.0 -- constraint violation (
SQLITE_CONSTRAINT_CHECK) - If result is NULL -- no violation (NULL is truthy for CHECK purposes)
- If result is any non-zero value -- no violation
The NULL gotcha: CHECK (status IN ('active', 'inactive')) allows NULL values because NULL IN (...) evaluates to NULL, which is not zero. Add NOT NULL separately if NULL should be prohibited.
Common Validation Patterns
Range Validation
age INTEGER NOT NULL CHECK (age >= 0 AND age <= 150),
score REAL NOT NULL CHECK (score BETWEEN 0.0 AND 100.0),
percent INTEGER NOT NULL CHECK (percent >= 0 AND percent <= 100)
Enum-Like Constraints (Restricting to Known Values)
status TEXT NOT NULL CHECK (status IN ('pending', 'active', 'completed', 'failed')),
priority INTEGER NOT NULL CHECK (priority IN (1, 2, 3, 4, 5)),
direction TEXT NOT NULL CHECK (direction IN ('inbound', 'outbound'))
Boolean Enforcement
is_active INTEGER NOT NULL DEFAULT 1 CHECK (is_active IN (0, 1))
Pattern Matching
email TEXT NOT NULL CHECK (email LIKE '%_@_%.__%'),
phone TEXT CHECK (phone LIKE '+%' OR phone IS NULL),
code TEXT NOT NULL CHECK (length(code) = 6 AND code GLOB '[A-Z][A-Z][0-9][0-9][0-9][0-9]')
Multi-Column Constraints
CREATE TABLE promotions (
promotion_id INTEGER PRIMARY KEY,
start_date TEXT NOT NULL,
end_date TEXT NOT NULL,
discount REAL NOT NULL,
CHECK (end_date > start_date),
CHECK (discount > 0 AND discount <= 1.0)
);
Conditional Logic
CREATE TABLE inventory (
product_id INTEGER PRIMARY KEY,
product_name TEXT NOT NULL,
stock INTEGER NOT NULL,
status TEXT NOT NULL,
CHECK (
(status = 'surplus' AND stock >= 500) OR
(status != 'surplus')
)
);
String Length Validation
username TEXT NOT NULL CHECK (length(username) >= 3 AND length(username) <= 50),
api_key TEXT NOT NULL CHECK (length(api_key) = 32)
What Is NOT Allowed in CHECK Expressions
These are explicitly prohibited and will cause errors:
| Prohibited | Reason |
|---|---|
Subqueries (SELECT ...) |
Cannot reference other rows or tables |
CURRENT_TIME |
Non-deterministic |
CURRENT_DATE |
Non-deterministic |
CURRENT_TIMESTAMP |
Non-deterministic |
Workaround for date validation: You cannot use CHECK (event_date <= CURRENT_DATE) in the schema definition. However, DEFAULT (datetime('now')) works for defaults because defaults are evaluated at INSERT time, not at schema creation time.
For date validation that depends on "now", use triggers or application-level validation instead.
Conflict Resolution
The conflict resolution algorithm for CHECK constraints is always ABORT. The ON CONFLICT clause is parsed for historical compatibility but has no effect:
-- ON CONFLICT is ignored for CHECK constraints
quantity INTEGER NOT NULL CHECK (quantity >= 0) ON CONFLICT REPLACE
-- Still ABORTs on violation, does NOT replace
Disabling CHECK Constraints
For data migrations or imports of potentially dirty data:
PRAGMA ignore_check_constraints = ON;
-- Import data...
PRAGMA ignore_check_constraints = OFF;
After import, verify data integrity:
PRAGMA integrity_check; -- Reports CHECK violations as corruption
Limitations
-
Cannot add via ALTER TABLE. You must recreate the table:
-- 1. Create new table with constraints CREATE TABLE new_table (...constraints...); -- 2. Copy data INSERT INTO new_table SELECT * FROM old_table; -- 3. Drop old table DROP TABLE old_table; -- 4. Rename ALTER TABLE new_table RENAME TO old_table; -
Row-scoped only. Cannot validate against other rows (use triggers for cross-row validation).
-
Not verified on SELECT. Corrupted data (from external file manipulation or disabled checks) can be read even if it violates constraints.
-
Minimal performance impact. CHECK expressions are simple comparisons evaluated in-process. Modern SQLite versions have optimized constraint evaluation. The cost is negligible compared to disk I/O.
Sources
- CREATE TABLE: CHECK constraints -- official docs, evaluation rules, prohibited expressions
- Validating Data with SQLite CHECK Constraints (Sling Academy)
- How to Write Effective CHECK Constraints (Sling Academy)
- SQLite Check Constraints (sql-easy.com)
- SQLite Data Validation: Using CHECK and Alternatives
6. Schema Design Patterns
Polymorphic Foreign Keys
The Problem
A polymorphic foreign key is a column that references one of several different tables. A common example is a changed_by column in an audit log where the actor could be a human, a service, or a bot.
Pattern 1: Generic FK with Discriminator Column
CREATE TABLE audit_log (
audit_log_id INTEGER PRIMARY KEY,
changed_by_id INTEGER NOT NULL,
changed_by_type TEXT NOT NULL CHECK (changed_by_type IN ('human', 'service', 'bot'))
);
Store both the id and a type discriminator. Application code resolves the join. SQLite will not enforce FK referential integrity across multiple tables even with PRAGMA foreign_keys = ON, so the app layer owns that constraint.
Pros: Simple, works everywhere, no schema changes when adding types Cons: No DB-level FK enforcement, easy to get into inconsistent state
Pattern 2: Supertype / Base Table (Recommended)
CREATE TABLE actors (
actor_id INTEGER PRIMARY KEY AUTOINCREMENT,
actor_type TEXT NOT NULL CHECK (actor_type IN ('human', 'service', 'bot')),
display_name TEXT NOT NULL -- denormalized for fast queries
);
CREATE TABLE humans (
actor_id INTEGER PRIMARY KEY REFERENCES actors(actor_id),
name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE
);
CREATE TABLE services (
actor_id INTEGER PRIMARY KEY REFERENCES actors(actor_id),
service_name TEXT NOT NULL,
api_key_hint TEXT
);
CREATE TABLE bots (
actor_id INTEGER PRIMARY KEY REFERENCES actors(actor_id),
bot_name TEXT NOT NULL,
owner_actor_id INTEGER REFERENCES actors(actor_id)
);
CREATE TABLE audit_log (
audit_log_id INTEGER PRIMARY KEY AUTOINCREMENT,
table_name TEXT NOT NULL,
row_id INTEGER NOT NULL,
operation TEXT NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
changed_by INTEGER NOT NULL REFERENCES actors(actor_id),
change_date TEXT NOT NULL DEFAULT (datetime('now'))
);
audit_log.changed_by is a real, enforced FK into actors. Each subtype has its own table with a 1:1 FK back to actors.
Pros: True referential integrity, clean join point, scalable query pattern Cons: Extra join to get subtype-specific data, insert order matters (supertype first)
Pattern 3: Nullable Column Per Type
CREATE TABLE audit_log (
audit_log_id INTEGER PRIMARY KEY,
changed_by_human_id INTEGER REFERENCES humans(actor_id),
changed_by_service_id INTEGER REFERENCES services(actor_id),
changed_by_bot_id INTEGER REFERENCES bots(actor_id),
CHECK (
(changed_by_human_id IS NOT NULL) +
(changed_by_service_id IS NOT NULL) +
(changed_by_bot_id IS NOT NULL) = 1
)
);
Pros: Real FK enforcement on each column, fully declarative Cons: Gets unwieldy fast as types grow, adds nullable columns
Recommendation
Use Pattern 2 (supertype table) when actor types share a common identity concept. It is the most principled design and gives you real referential integrity. Denormalize display_name onto the supertype to avoid subtype joins for common queries (audit feeds, lists).
Use Pattern 1 when moving fast or comfortable owning FK integrity in the application layer.
Normalized vs Denormalized
Recommended practice: Start with a normalized schema (3NF), then selectively denormalize only the hotspots where joins are measurably slow. This hybrid approach gives you clean data with targeted performance boosts.
When to normalize: Transactional systems (banking, inventory, ERP) where accuracy, redundancy control, and storage efficiency matter most.
When to denormalize: Read-heavy workloads (data warehouses, dashboards, BI tools) where retrieval speed is the priority. Denormalization deliberately introduces redundancy to speed up certain queries at the cost of extra storage and increased risk of anomalies.
SQLite-specific consideration: Because SQLite is embedded (zero network latency), the N+1 problem is far less costly than with client/server databases. Multiple simple queries often outperform complex joins, so the pressure to denormalize is lower than in PostgreSQL or MySQL.
Benchmark reference: In one test with 5,000 bars and 10K wines, denormalized queries ran 16x faster for one pattern (569ms vs 9,143ms) and 104x faster for another (83ms vs 8,648ms). The denormalized database was 50% smaller.
Sources:
- Database Schema Design Patterns for SQLite
- Database Design Patterns Every Developer Should Know
- ByteByteGo: Normalization vs Denormalization
- SQLite JSON and Denormalization
JSON Columns
SQLite's JSON functions let you store structured data in TEXT columns while still querying into them.
Core extraction:
-- json_extract: returns SQL type for scalars, JSON text for objects/arrays
SELECT json_extract(data, '$.name') FROM events;
-- For {"name": "alice"} returns: 'alice'
-- ->> operator: always returns SQL type (TEXT, INTEGER, REAL, NULL)
SELECT data ->> '$.name' FROM events;
-- -> operator: always returns JSON text representation
SELECT data -> '$.tags' FROM events;
-- For {"tags": [1,2]} returns: '[1,2]'
Iterating arrays with json_each:
-- Find users with a 704 area code phone number
SELECT DISTINCT user.name
FROM user, json_each(user.phone)
WHERE json_each.value LIKE '704-%';
Modifying JSON:
-- json_set: creates or overwrites
UPDATE events SET data = json_set(data, '$.status', 'processed');
-- json_insert: creates only (won't overwrite)
UPDATE events SET data = json_insert(data, '$.new_field', 42);
-- json_replace: overwrites only (won't create)
UPDATE events SET data = json_replace(data, '$.status', 'done');
-- Append to array (use $[#] for end position)
UPDATE events SET data = json_set(data, '$.tags[#]', 'new-tag');
-- Remove a key
UPDATE events SET data = json_remove(data, '$.temp_field');
Aggregating rows into JSON:
-- Build a JSON array from rows
SELECT json_group_array(json_object('id', id, 'name', name)) FROM users;
-- Returns: [{"id":1,"name":"alice"}, {"id":2,"name":"bob"}]
-- Build a JSON object from rows
SELECT json_group_object(name, score) FROM leaderboard;
-- Returns: {"alice":100, "bob":85}
Validation:
SELECT json_valid('{"x":35}'); -- 1 (valid)
SELECT json_valid('{"x":35'); -- 0 (invalid)
SELECT json_valid('{x:35}', 6); -- 1 (valid JSON5 or JSONB)
Source: SQLite JSON Functions
Generated Columns for JSON Indexing
The most powerful pattern for JSON performance: virtual generated columns with B-tree indexes.
CREATE TABLE documents (
id INTEGER PRIMARY KEY,
body TEXT -- JSON
);
-- Extract fields as virtual generated columns
ALTER TABLE documents ADD COLUMN doc_type TEXT
GENERATED ALWAYS AS (body ->> '$.type') VIRTUAL;
ALTER TABLE documents ADD COLUMN author TEXT
GENERATED ALWAYS AS (body ->> '$.author') VIRTUAL;
-- Index the generated columns for B-tree speed
CREATE INDEX idx_doc_type ON documents(doc_type);
CREATE INDEX idx_author ON documents(author);
-- Queries now use indexes (verify with EXPLAIN QUERY PLAN):
SELECT * FROM documents WHERE doc_type = 'report' AND author = 'alice';
-- SEARCH documents USING INDEX idx_doc_type (doc_type=?)
VIRTUAL vs STORED:
- VIRTUAL: computed on read, no disk space, can be added with ALTER TABLE
- STORED: computed on write, uses disk space, cannot be added with ALTER TABLE
- Use STORED when reads vastly outnumber writes; VIRTUAL otherwise
Key advantages:
- No extra write overhead (VIRTUAL columns are computed on read)
- Can be added with ALTER TABLE without rebuilding the table
- Indexes work exactly like regular column indexes
- Zero back-filling when adding new virtual columns
Sources: SQLite Generated Columns, SQLite JSON Virtual Columns + Indexing
Materialized Views via Triggers
SQLite has no native materialized views. Simulate them with a table + triggers:
-- 1. Create the materialized view table
CREATE TABLE report_summary (
report_id INTEGER PRIMARY KEY,
category TEXT NOT NULL,
item_count INTEGER NOT NULL DEFAULT 0,
last_updated TEXT NOT NULL
);
-- 2. Populate on insert to source table
CREATE TRIGGER trg_report_insert AFTER INSERT ON items
BEGIN
INSERT INTO report_summary (report_id, category, item_count, last_updated)
VALUES (NEW.report_id, NEW.category, 1, datetime('now'))
ON CONFLICT(report_id) DO UPDATE SET
item_count = item_count + 1,
last_updated = datetime('now');
END;
-- 3. Update on delete from source table
CREATE TRIGGER trg_report_delete AFTER DELETE ON items
BEGIN
UPDATE report_summary
SET item_count = item_count - 1,
last_updated = datetime('now')
WHERE report_id = OLD.report_id;
END;
-- 4. Full refresh when triggers are insufficient
DELETE FROM report_summary;
INSERT INTO report_summary (report_id, category, item_count, last_updated)
SELECT report_id, category, COUNT(*), datetime('now')
FROM items GROUP BY report_id, category;
Tradeoffs:
- Trigger-maintained: always current, adds overhead to every write on the source table
- Scheduled refresh: stale between refreshes, no write overhead
- Choose triggers for small-to-medium tables; scheduled refresh for large aggregation tables
Source: SQLite Triggers as Materialized Views
EAV (Entity-Attribute-Value) Pattern
What it is: Three columns -- entity_id, attribute_name, value -- allowing flexible attributes per entity.
CREATE TABLE product_attributes (
entity_id INTEGER REFERENCES products(product_id),
attribute TEXT NOT NULL,
value TEXT,
PRIMARY KEY (entity_id, attribute)
);
When to use: Product catalogs with hundreds of optional attributes, clinical records with thousands of possible fields, or any domain where the set of attributes is not known at schema design time.
Why it often fails:
- Every query retrieving multiple attributes requires self-joins or pivot operations
- Filtering is slow because
valueis always TEXT (no type enforcement) - No foreign key constraints on values
- Query complexity grows multiplicatively with each attribute
Modern alternative for SQLite: Use typed columns for core fields and JSON for variable attributes (requires SQLite 3.38.0+ for ->> operator):
CREATE TABLE products (
product_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
price REAL NOT NULL,
attributes TEXT DEFAULT '{}' -- JSON
);
-- Query with JSON:
SELECT product_id, name, attributes ->> '$.color' AS color
FROM products
WHERE attributes ->> '$.color' = 'red';
-- Index on JSON field:
CREATE INDEX ix_products_color ON products(attributes ->> '$.color');
Sources:
Single-Table Inheritance
All subtypes stored in one table with a type discriminator column. Subtype-specific columns are NULL for rows of other types.
CREATE TABLE vehicles (
vehicle_id INTEGER PRIMARY KEY,
vehicle_type TEXT NOT NULL CHECK (vehicle_type IN ('car', 'truck', 'motorcycle')),
make TEXT NOT NULL,
model TEXT NOT NULL,
-- car-specific
num_doors INTEGER,
-- truck-specific
payload_tons REAL,
-- motorcycle-specific
engine_cc INTEGER
);
When to use: Few subtypes (2-4), subtypes share most columns, and you query across all types frequently. Simple to implement, single-table scans, no joins needed.
When to avoid: Many subtypes, subtypes have few shared columns (table becomes mostly NULL), or you need strict NOT NULL constraints on subtype-specific fields.
SQLite-specific consideration: Without STRICT mode, SQLite will happily store any type in any column, so the CHECK constraint on vehicle_type is your main safety net. Consider STRICT tables (SQLite 3.37.0+) for type enforcement.
Adjacency Lists and Tree Hierarchies
Three primary patterns for hierarchical data:
Adjacency List (simplest)
CREATE TABLE categories (
category_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
parent_id INTEGER REFERENCES categories(category_id)
);
Simple to implement. Finding immediate children: WHERE parent_id = ?. Finding all descendants requires recursive CTEs (SQLite 3.8.3+):
WITH RECURSIVE descendants AS (
SELECT category_id, name, parent_id FROM categories WHERE category_id = ?
UNION ALL
SELECT c.category_id, c.name, c.parent_id
FROM categories c
JOIN descendants d ON c.parent_id = d.category_id
)
SELECT * FROM descendants;
Trade-offs: Minimal storage, simple writes, but recursive reads.
Closure Table (best for read-heavy hierarchies)
CREATE TABLE category_closure (
ancestor_id INTEGER NOT NULL REFERENCES categories(category_id),
descendant_id INTEGER NOT NULL REFERENCES categories(category_id),
depth INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (ancestor_id, descendant_id)
);
Every path in the hierarchy is stored as a row. Efficient queries without recursion:
-- All descendants of node 1:
SELECT descendant_id FROM category_closure WHERE ancestor_id = 1;
-- All ancestors of node 4:
SELECT ancestor_id FROM category_closure WHERE descendant_id = 4;
-- Direct children only:
SELECT descendant_id FROM category_closure WHERE ancestor_id = 1 AND depth = 1;
Trade-offs: Excellent read performance (indexed flat queries, no recursion), but O(n^2) worst-case storage and complex insert/delete maintenance.
SQLite-specific: SQLite has a transitive_closure extension that auto-maintains the closure table from an adjacency list.
Nested Sets (best for static hierarchies)
CREATE TABLE categories (
category_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
left_boundary INTEGER NOT NULL,
right_boundary INTEGER NOT NULL
);
-- All descendants of node with left=1, right=10:
SELECT * FROM categories
WHERE left_boundary > 1 AND right_boundary < 10;
Trade-offs: Excellent read performance without recursion, low storage, but adding/deleting/moving nodes requires renumbering boundaries -- expensive for frequently-modified trees.
Decision Table
| Pattern | Read Performance | Write Complexity | Storage | Best For |
|---|---|---|---|---|
| Adjacency List | Moderate (recursive) | Simple | Minimal | Dynamic trees with few depth queries |
| Closure Table | Excellent | Moderate | High | Read-heavy, deep hierarchies |
| Nested Sets | Excellent | High (renumbering) | Low | Static/rarely-modified hierarchies |
Sources:
- Mastering SQL Trees: Adjacency Lists to Nested Sets and Closure Tables
- Querying Tree Structures in SQLite
- Hierarchical Data in SQL: The Ultimate Guide
Part II: Performance and Tuning
7. Indexes
B-Tree Architecture
SQLite stores all data in B-trees (specifically B+ trees). Each table is a B-tree keyed by rowid; each index is a separate B-tree keyed by the indexed columns with rowid appended. A lookup in a B-tree is O(log N), versus O(N) for a full table scan.
When a query uses an index, SQLite performs two binary searches: one on the index B-tree to find the rowid, then one on the table B-tree to retrieve the row. This is why covering indexes matter -- they eliminate the second lookup.
Single-Column Indexes
CREATE INDEX idx_fruit ON fruitsforsale(fruit);
Reduces lookup from O(N) to O(log N). Still requires two binary searches (index + table).
Multi-Column Indexes
CREATE INDEX idx_fruit_state ON fruitsforsale(fruit, state);
Rows are ordered by first column, with subsequent columns as tie-breakers. The query planner can use a multi-column index for any left-prefix of the indexed columns.
Column ordering rules:
- Equality columns (
=,IN,IS) go first - The rightmost used column can use inequalities (
<,>,<=,>=,BETWEEN) - No gaps allowed -- if columns are (a, b, c), you cannot use a and c without b
- Columns to the right of an inequality constraint are not used for indexing
-- Given index on (a, b, c):
WHERE a = 1 AND b = 2 AND c > 3 -- uses all 3 columns
WHERE a = 1 AND b > 2 -- uses a and b
WHERE a = 1 -- uses only a
WHERE b = 2 -- CANNOT use this index (no left-prefix)
Rule of thumb: "Your database schema should never contain two indices where one index is a prefix of the other." If you have an index on (a, b, c), you do not need a separate index on (a) or (a, b).
Source: SQLite Query Planning
Covering Indexes
A covering index includes all columns needed by a query, eliminating the table lookup entirely. This cuts the number of binary searches in half, roughly doubling query speed.
-- Query needs fruit, state, and price
SELECT price FROM fruitsforsale WHERE fruit = 'Orange' AND state = 'CA';
-- Covering index: includes the output column (price)
CREATE INDEX idx_fruit_state_price ON fruitsforsale(fruit, state, price);
EXPLAIN QUERY PLAN shows USING COVERING INDEX when this optimization applies:
QUERY PLAN
`--SEARCH fruitsforsale USING COVERING INDEX idx_fruit_state_price (fruit=? AND state=?)
Source: SQLite Query Planning
Partial Indexes
Index only a subset of rows by adding a WHERE clause. Reduces index size, speeds up writes, and can enforce conditional uniqueness.
-- Only index non-NULL values (useful when most rows are NULL)
CREATE INDEX idx_parent_po ON purchaseorder(parent_po)
WHERE parent_po IS NOT NULL;
-- Enforce "only one team leader per team"
CREATE UNIQUE INDEX idx_team_leader ON person(team_id)
WHERE is_team_leader;
Query planner usage rules:
- The partial index WHERE clause terms must appear (exactly or by implication) in the query WHERE clause
IS NOT NULLin the index is satisfied by any comparison operator (=,<,>,<>,IN,LIKE,GLOB) on that column- Expression matching is literal --
b=6matches6=bbut NOTb=3+3
CREATE INDEX idx_active ON orders(customer_id) WHERE status = 'active';
-- Uses the partial index:
SELECT * FROM orders WHERE customer_id = 42 AND status = 'active';
-- Does NOT use the partial index (status term missing):
SELECT * FROM orders WHERE customer_id = 42;
Available since SQLite 3.8.0. Databases with partial indexes are unreadable by older versions.
Source: SQLite Partial Indexes
Expression Indexes
Index the result of an expression rather than a raw column value.
CREATE INDEX idx_upper_last ON employees(UPPER(last_name));
CREATE INDEX idx_abs_amount ON account_change(acct_no, abs(amt));
CREATE INDEX idx_length_company ON customers(LENGTH(company));
Critical constraint: exact expression matching. The query planner does not do algebra. The expression in the query must match the index definition exactly:
-- Given: CREATE INDEX idx_xy ON t(x + y);
WHERE x + y > 10 -- USES the index
WHERE y + x > 10 -- does NOT use the index (different operand order)
Restrictions:
- Only deterministic functions allowed (no
random(),sqlite_version()) - Can only reference columns from the indexed table
- No subqueries
- Only usable in CREATE INDEX (not UNIQUE or PRIMARY KEY constraints)
Available since SQLite 3.9.0.
Source: SQLite Indexes on Expressions
Generated Columns + JSON Indexing
Generated columns (3.31.0+) let you extract values from JSON and index them at B-tree speed:
CREATE TABLE events (
id INTEGER PRIMARY KEY,
data TEXT -- JSON blob
);
-- Virtual generated columns (computed on read, no storage cost)
ALTER TABLE events ADD COLUMN event_type TEXT
GENERATED ALWAYS AS (json_extract(data, '$.type')) VIRTUAL;
ALTER TABLE events ADD COLUMN event_date TEXT
GENERATED ALWAYS AS (json_extract(data, '$.date')) VIRTUAL;
-- Index the generated columns
CREATE INDEX idx_event_type ON events(event_type);
CREATE INDEX idx_event_date ON events(event_date);
-- Now this query uses B-tree index speed:
SELECT * FROM events WHERE event_type = 'click';
VIRTUAL vs STORED:
- VIRTUAL: computed on read, no disk space, can be added with ALTER TABLE
- STORED: computed on write, uses disk space, cannot be added with ALTER TABLE
- Use STORED when reads vastly outnumber writes; VIRTUAL otherwise
Source: SQLite Generated Columns
EXPLAIN QUERY PLAN
The essential tool for understanding index usage.
EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42;
Key output terms:
SCAN table-- full table scan (bad for large tables)SEARCH table USING INDEX idx (col=?)-- index lookup (good)SEARCH table USING COVERING INDEX idx (col=?)-- no table lookup needed (best)USE TEMP B-TREE FOR ORDER BY-- sorting required (index could eliminate this)CORRELATED SCALAR SUBQUERY-- runs once per outer row (expensive)MULTI-INDEX OR-- separate index lookups combined for OR conditionsAUTOMATIC INDEX-- SQLite created a temporary index (permanent index would help)
-- Enabling automatic EXPLAIN QUERY PLAN in the CLI:
.eqp on
-- Now every query automatically shows its plan before results
Source: SQLite EXPLAIN QUERY PLAN
Over-Indexing Pitfalls
Every index must be maintained on every INSERT, UPDATE, and DELETE. The number of indexes on a table is the dominant factor for insert performance.
Guidance:
- Before creating an index, ask: "Will queries WHERE, JOIN, or ORDER BY this column?"
- Remove indexes that EXPLAIN QUERY PLAN never references
- Never have two indexes where one is a prefix of the other
- Run
PRAGMA optimizeperiodically so the query planner has current statistics - Monitor with:
SELECT * FROM sqlite_stat1;(populated by ANALYZE)
Benchmark reference: With secondary indexes present, insert performance may reduce by up to 5x compared to a table with no secondary indexes.
Sources: Use The Index, Luke - Insert, Common Indexing Mistakes
8. Triggers
Syntax
CREATE [TEMP | TEMPORARY] TRIGGER [IF NOT EXISTS] trigger_name
{BEFORE | AFTER | INSTEAD OF} {DELETE | INSERT | UPDATE [OF column_name, ...]}
ON table_name
[FOR EACH ROW]
[WHEN expression]
BEGIN
-- one or more statements
END;
Trigger Types
| Type | Works On | When It Fires |
|---|---|---|
BEFORE |
Tables only | Before the row modification |
AFTER |
Tables only | After the row modification |
INSTEAD OF |
Views only | Replaces the triggering operation entirely |
SQLite supports only FOR EACH ROW triggers (not FOR EACH STATEMENT like PostgreSQL).
NEW and OLD References
| Event | NEW.column |
OLD.column |
|---|---|---|
| INSERT | Available | Not available |
| UPDATE | Available (new values) | Available (old values) |
| DELETE | Not available | Available |
Common Use Cases
1. Audit Trail
CREATE TABLE audit_log (
audit_log_id INTEGER PRIMARY KEY,
table_name TEXT NOT NULL,
record_id INTEGER NOT NULL,
operation TEXT NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
old_values TEXT, -- JSON of old row
new_values TEXT, -- JSON of new row
change_date TEXT NOT NULL DEFAULT (datetime('now'))
);
-- After INSERT: log the new row
CREATE TRIGGER tr_documents_after_insert_audit
AFTER INSERT ON documents
FOR EACH ROW
BEGIN
INSERT INTO audit_log (table_name, record_id, operation, new_values)
VALUES (
'documents',
NEW.document_id,
'INSERT',
json_object('title', NEW.title, 'body', NEW.body)
);
END;
-- After UPDATE: log old and new values
CREATE TRIGGER tr_documents_after_update_audit
AFTER UPDATE ON documents
FOR EACH ROW
BEGIN
INSERT INTO audit_log (table_name, record_id, operation, old_values, new_values)
VALUES (
'documents',
NEW.document_id,
'UPDATE',
json_object('title', OLD.title, 'body', OLD.body),
json_object('title', NEW.title, 'body', NEW.body)
);
END;
-- Before DELETE: log the deleted row
CREATE TRIGGER tr_documents_before_delete_audit
BEFORE DELETE ON documents
FOR EACH ROW
BEGIN
INSERT INTO audit_log (table_name, record_id, operation, old_values)
VALUES (
'documents',
OLD.document_id,
'DELETE',
json_object('title', OLD.title, 'body', OLD.body)
);
END;
2. Automatic Timestamp Updates
CREATE TRIGGER tr_documents_after_update_timestamp
AFTER UPDATE ON documents
FOR EACH ROW
WHEN NEW.modification_date = OLD.modification_date OR NEW.modification_date IS NULL
BEGIN
UPDATE documents
SET modification_date = datetime('now')
WHERE document_id = NEW.document_id;
END;
The WHEN clause prevents infinite recursion -- the trigger only fires when the timestamp was not explicitly set by the UPDATE statement.
3. Business Rule Validation
CREATE TRIGGER tr_sales_before_insert_validate
BEFORE INSERT ON sales
FOR EACH ROW
BEGIN
SELECT CASE
WHEN NEW.sale_price < NEW.purchase_price THEN
RAISE(ABORT, 'Sale price must not be less than purchase price')
END;
END;
The RAISE() function is trigger-specific and provides error handling:
RAISE(ROLLBACK, 'message')-- rolls back the entire transactionRAISE(ABORT, 'message')-- aborts the current statement, undoes its changes, but preserves prior statements in the transactionRAISE(FAIL, 'message')-- fails the current statement but keeps changes already made by itRAISE(IGNORE)-- silently skips the rest of the trigger and the triggering statement
4. Maintaining Denormalized Data
-- Keep a cached count in the parent table
CREATE TRIGGER tr_line_items_after_insert_count
AFTER INSERT ON line_items
FOR EACH ROW
BEGIN
UPDATE orders
SET item_count = (SELECT count(*) FROM line_items WHERE order_id = NEW.order_id)
WHERE order_id = NEW.order_id;
END;
CREATE TRIGGER tr_line_items_after_delete_count
AFTER DELETE ON line_items
FOR EACH ROW
BEGIN
UPDATE orders
SET item_count = (SELECT count(*) FROM line_items WHERE order_id = OLD.order_id)
WHERE order_id = OLD.order_id;
END;
5. INSTEAD OF Triggers on Views
CREATE VIEW active_customers AS
SELECT customer_id, customer_name, email
FROM customers
WHERE is_active = 1;
CREATE TRIGGER tr_active_customers_instead_of_update
INSTEAD OF UPDATE ON active_customers
FOR EACH ROW
BEGIN
UPDATE customers
SET customer_name = NEW.customer_name,
email = NEW.email
WHERE customer_id = NEW.customer_id;
END;
-- Now you can UPDATE the view directly:
UPDATE active_customers SET email = 'new@example.com' WHERE customer_id = 42;
Performance Implications
Overhead sources:
- SQLite opens a statement journal for any statement that fires triggers, adding file I/O even for simple operations.
- Each trigger body is a separate program that gets compiled and executed per affected row.
- Triggers that perform additional writes (INSERT into audit table, UPDATE a counter) multiply the I/O.
- Using
PRAGMA temp_store = MEMORYreduces statement journal overhead by keeping it in memory.
Practical advice:
- Prefer
AFTERtriggers overBEFOREtriggers. BEFORE triggers have undefined behavior if they modify or delete the row being processed. - Keep trigger logic simple -- complex business logic belongs in application code where it can be versioned, tested, and debugged.
- Triggers are invisible at the SQL level. Document them heavily. Developers debugging slow INSERTs may not realize triggers are firing.
- Audit triggers on high-write tables can become a bottleneck. Consider batched/async logging for high-throughput scenarios.
- Test empirically -- one trigger on a moderate-volume table is usually fine; dozens of triggers on hot tables compound overhead.
Restrictions Within Trigger Bodies
- Table names must be unqualified (no
schema.tablesyntax) - Non-TEMP triggers can only reference tables in the same database
- TEMP triggers can access any attached database
- No
INSERT INTO table DEFAULT VALUES - No
INDEXED BY/NOT INDEXEDclauses - No
ORDER BY/LIMITclauses - No CTEs directly (but CTEs work inside subselects)
Sources
- CREATE TRIGGER -- official trigger docs, syntax, restrictions
- Creating Audit Tables with SQLite Triggers (Medium)
- SQLite Triggers (sql-easy.com)
- Measuring and Reducing CPU Usage in SQLite -- performance measurement
- SQLite Optimizations for Ultra High Performance (PowerSync)
9. WAL Mode and Journal Modes
Journal Mode Comparison
| Mode | Mechanism | Concurrent Reads | Write Speed | Durability |
|---|---|---|---|---|
| DELETE (default) | Rollback journal, deleted after txn | Blocked during writes | Slow (2x write) | Full |
| TRUNCATE | Rollback journal, truncated (not deleted) | Blocked during writes | Slightly faster than DELETE | Full |
| PERSIST | Rollback journal header zeroed | Blocked during writes | Slightly faster than DELETE | Full |
| WAL | Write-ahead log | Yes, concurrent with writes | Fast (1x write, sequential) | Full with synchronous=FULL |
| MEMORY | Journal in RAM only | Blocked during writes | Fast | None (crash = corruption) |
| OFF | No journal | Blocked during writes | Fastest | None (crash = corruption) |
WAL Mode Details
PRAGMA journal_mode = WAL;
How it works: Changes are appended to a separate WAL file instead of modifying the database directly. The original database file stays intact. A COMMIT is just appending a commit record to the WAL -- no fsync of the database file required.
Concurrency model:
- Unlimited simultaneous readers
- One writer at a time
- Readers do not block writers; writers do not block readers
- Each reader sees a consistent snapshot from when its transaction started
Performance advantages:
- Writes are sequential (append-only to WAL), not random I/O
- Fewer fsync() calls than rollback journal
- Per-transaction overhead drops from 30ms+ to <1ms (with synchronous=NORMAL)
Limitations:
- All processes must be on the same machine (shared memory requirement)
- Cannot change page_size after entering WAL mode
- Very large transactions (multi-GB) may perform worse than rollback mode
- Creates additional -wal and -shm files alongside the database
Checkpointing
Checkpointing transfers WAL content back to the main database file. Types:
PRAGMA wal_checkpoint(PASSIVE); -- Non-blocking, does what it can
PRAGMA wal_checkpoint(FULL); -- Blocks new writers until complete
PRAGMA wal_checkpoint(RESTART); -- Blocks writers, resets WAL to beginning
PRAGMA wal_checkpoint(TRUNCATE); -- Blocks writers, truncates WAL to zero bytes
Automatic checkpointing: By default, SQLite checkpoints when the WAL reaches 1000 pages.
-- Increase threshold for better write throughput (at cost of slower reads):
PRAGMA wal_autocheckpoint = 2000;
-- Disable automatic checkpointing (manual control only):
PRAGMA wal_autocheckpoint = 0;
WAL growth prevention: Three causes of unbounded WAL growth:
- Automatic checkpointing disabled without manual replacement
- Checkpoint starvation -- long-running readers prevent checkpoint from completing
- Very large write transactions that block checkpointing
-- Limit WAL file size on disk (bytes, reclaimed after checkpoint):
PRAGMA journal_size_limit = 6144000; -- 6MB
When to Use Each Mode
- WAL: Default choice for most applications. Use when you have concurrent readers, need good write performance, and all access is from the same machine.
- DELETE: Use for maximum compatibility, network file systems, or when WAL limitations apply.
- TRUNCATE: Marginal speed improvement over DELETE on some filesystems.
- OFF/MEMORY: Only for ephemeral/rebuildable data where crash safety does not matter.
Sources: SQLite WAL Documentation, Fly.io SQLite WAL Internals, SQLite Performance Tuning (phiresky)
10. Transaction Management
Implicit vs Explicit Transactions
Every SQL statement in SQLite runs inside a transaction. Without an explicit BEGIN, each statement gets its own implicit transaction with automatic COMMIT. This means individual INSERT statements each pay the full fsync cost.
-- Slow: each INSERT is its own transaction with fsync
INSERT INTO t VALUES (1); -- implicit BEGIN, COMMIT, fsync
INSERT INTO t VALUES (2); -- implicit BEGIN, COMMIT, fsync
INSERT INTO t VALUES (3); -- implicit BEGIN, COMMIT, fsync
-- Fast: one transaction, one fsync
BEGIN;
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (2);
INSERT INTO t VALUES (3);
COMMIT; -- single fsync
Transaction Types
BEGIN; -- same as BEGIN DEFERRED
BEGIN DEFERRED; -- default: acquire locks lazily
BEGIN IMMEDIATE; -- acquire write lock immediately
BEGIN EXCLUSIVE; -- acquire exclusive lock immediately
DEFERRED (default):
- No lock acquired until first database access
- First SELECT acquires a read lock
- First write statement attempts to upgrade to write lock
- If upgrade fails (another writer active), returns SQLITE_BUSY immediately
- busy_timeout does NOT apply to lock upgrades in DEFERRED mode
IMMEDIATE:
- Acquires write lock at BEGIN time
- If another writer is active, waits up to busy_timeout, then SQLITE_BUSY
- Benchmarks show approximately 2x better performance than DEFERRED for write-heavy workloads
- Recommended for any transaction that will write
EXCLUSIVE:
- Same as IMMEDIATE in WAL mode
- In rollback journal mode, also blocks readers
- Use only in rollback mode when you need total isolation
Recommendation: Use BEGIN IMMEDIATE for any transaction that will write. It fails fast at BEGIN time instead of failing mid-transaction after work has been done.
Sources: SQLite Transaction Documentation, SQLite Transactions (reorchestrate)
Batch Insert Performance
Transaction wrapping is the single most impactful optimization for inserts.
Benchmarks (100M rows, Rust):
| Technique | Time | Notes |
|---|---|---|
| Naive single-row inserts (autocommit) | Minutes per million | Each row = separate transaction + fsync |
| Transaction-wrapped batches + prepared stmts | 34.3 seconds (100M rows) | Single connection |
| Threaded producer + single writer | 32.37 seconds (100M rows) | 4 worker threads, 1 writer |
| In-memory database | 29 seconds (100M rows) | ~2 seconds of disk I/O overhead |
Impact of transaction wrapping by language (100M rows):
| Language | Batched | Naive |
|---|---|---|
| Rust (prepared + batched) | 34 seconds | N/A |
| PyPy (batched) | 2.5 minutes | N/A |
| CPython (batched) | 8.5 minutes | 10 minutes |
Source: Fast SQLite Inserts (avi.im)
Impact by optimization technique:
| Technique | Impact |
|---|---|
| WAL + synchronous=NORMAL | Per-transaction overhead from 30ms+ to <1ms |
| Transaction wrapping | Write throughput 2-20x improvement |
| Prepared statements | Per-statement throughput up to 1.5x |
| Background WAL checkpoints | Eliminates occasional 30-100ms fsync spikes |
Source: PowerSync SQLite Optimizations
PRAGMA settings for bulk loading (maximum speed, reduced safety):
PRAGMA journal_mode = OFF;
PRAGMA synchronous = 0;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY;
Safe bulk loading (maintains crash safety):
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -64000; -- 64MB
PRAGMA temp_store = MEMORY;
BEGIN IMMEDIATE;
-- ... batch of inserts (100-10000 rows per transaction) ...
COMMIT;
Optimal batch size: 100-1,000 rows per transaction for general use. For bulk loading, larger batches (10K-100K) are better.
Transaction Size Considerations
- Keep transactions as short as possible to minimize lock contention
- Very large transactions (multi-GB) can cause WAL growth and performance degradation
- In WAL mode, long-running read transactions prevent checkpointing, causing WAL bloat
- There is no hard limit on transaction size, but practical limits come from disk space for the WAL/journal
11. Query Optimization
Reading EXPLAIN QUERY PLAN
EXPLAIN QUERY PLAN
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'active'
ORDER BY o.created_date DESC;
What to look for:
SCAN= full table scan (often bad, check if index would help)SEARCH= index-assisted lookup (good)COVERING INDEX= all data from index, no table lookup (best)USE TEMP B-TREE FOR ORDER BY= sort step needed (index on ORDER BY columns could eliminate)AUTOMATIC INDEX= SQLite created a temporary index (permanent index would help)CORRELATED SCALAR SUBQUERY= runs for each outer row (rewrite as JOIN if possible)MULTI-INDEX OR= separate index lookups combined for OR conditionsCO-ROUTINE= subquery evaluated in parallel, yielding single rows on demandMATERIALIZE= subquery result stored in temporary table
Common Anti-Patterns
1. Correlated subqueries instead of JOINs:
-- BAD: subquery runs once per order row
SELECT o.id,
(SELECT name FROM customers WHERE id = o.customer_id) AS customer_name
FROM orders o;
-- GOOD: single join operation
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;
2. UNION when UNION ALL suffices:
-- BAD: sorts and deduplicates (unnecessary when sets are disjoint)
SELECT id, name FROM active_users
UNION
SELECT id, name FROM archived_users;
-- GOOD: just concatenates result sets
SELECT id, name FROM active_users
UNION ALL
SELECT id, name FROM archived_users;
UNION requires sorting all rows and comparing for duplicates. UNION ALL simply appends. For large datasets, UNION ALL can be 60%+ faster. Use UNION only when you genuinely need deduplication.
3. Functions on indexed columns in WHERE:
-- BAD: cannot use index on created_date
WHERE date(created_date) = '2024-01-15'
-- GOOD: preserves index usage
WHERE created_date >= '2024-01-15' AND created_date < '2024-01-16'
-- ALTERNATIVE: create an expression index
CREATE INDEX idx_date ON orders(date(created_date));
4. OR conditions without supporting indexes:
-- Potentially slow: needs indexes on BOTH columns
WHERE status = 'active' OR priority > 5
-- SQLite handles this with MULTI-INDEX OR if both columns are indexed
-- Without indexes on both, falls back to full table scan
5. SELECT * when you only need specific columns:
-- BAD: fetches all columns, prevents covering index optimization
SELECT * FROM orders WHERE status = 'active';
-- GOOD: may use covering index
SELECT id, customer_id FROM orders WHERE status = 'active';
6. NOT IN with subqueries (NULL hazard):
-- DANGEROUS: if subquery returns any NULL, entire NOT IN is NULL (returns no rows)
SELECT * FROM orders WHERE customer_id NOT IN (SELECT id FROM inactive_customers);
-- SAFE: NOT EXISTS handles NULLs correctly
SELECT * FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM inactive_customers ic WHERE ic.id = o.customer_id);
Query Planner Optimizations to Know
Automatic index creation: When no persistent index helps and the lookup will run more than log(N) times during a statement, SQLite creates a temporary index. Construction cost is O(N log N). Watch for AUTOMATIC INDEX in EXPLAIN QUERY PLAN -- it means a permanent index would help.
Subquery flattening: SQLite attempts to merge subqueries in the FROM clause into the outer query, enabling index usage on the underlying tables instead of scanning a temporary result.
Skip-scan: When the leftmost index column has few distinct values but a later column is constrained, SQLite can skip-scan the index. Requires ANALYZE to have been run (needs statistics showing 18+ duplicates in the leftmost column).
MIN/MAX optimization: SELECT MIN(col) FROM t or SELECT MAX(col) FROM t on the leftmost column of an index executes as a single index lookup, not a full scan.
Predicate push-down: WHERE conditions from outer queries are pushed into subqueries to reduce the size of intermediate results.
Constant propagation: WHERE a = b AND b = 5 implies a = 5, enabling SQLite to use an index on a.
OR-to-IN conversion: Multiple equality conditions on the same column separated by OR are rewritten as IN operators for index use: WHERE x=1 OR x=2 OR x=3 becomes WHERE x IN (1,2,3).
LIKE/GLOB optimization: When the pattern does not start with a wildcard and the column has BINARY collation, SQLite converts LIKE 'prefix%' to a range scan: col >= 'prefix' AND col < 'prefiy'.
Running ANALYZE
-- Collect statistics for all tables:
ANALYZE;
-- Collect for a specific table:
ANALYZE orders;
-- Limit analysis time (rows examined per index):
PRAGMA analysis_limit = 1000;
ANALYZE;
-- View collected statistics:
SELECT * FROM sqlite_stat1;
Statistics are stored in sqlite_stat1 (and optionally sqlite_stat4). The query planner uses these to estimate row counts and choose between competing index strategies. Without ANALYZE, the planner uses rough heuristics that may choose suboptimal plans.
Sources: SQLite EXPLAIN QUERY PLAN, SQLite Query Optimizer, Deep Dive into SQLite Query Optimizer
12. PRAGMA Settings for Production
Recommended Production Configuration
Run these on every new connection:
-- Use write-ahead logging for concurrency and write speed
PRAGMA journal_mode = WAL;
-- NORMAL is safe in WAL mode; only checkpoints need fsync
PRAGMA synchronous = NORMAL;
-- 64MB page cache (negative value = kilobytes)
PRAGMA cache_size = -64000;
-- Memory-mapped I/O: let OS manage page caching (256MB)
PRAGMA mmap_size = 268435456;
-- Keep temp tables and indexes in memory
PRAGMA temp_store = MEMORY;
-- Wait 5 seconds on lock contention before returning SQLITE_BUSY
PRAGMA busy_timeout = 5000;
-- Enforce foreign key constraints (off by default)
PRAGMA foreign_keys = ON;
PRAGMA Reference
journal_mode = WAL
- Enables concurrent readers + single writer
- Sequential writes, fewer fsync calls
- Persists across connections (set once, stored in database header)
- Cannot use on network file systems
synchronous = NORMAL
- Default is FULL (fsync every commit)
- NORMAL: only checkpoint fsyncs; safe in WAL mode against corruption
- Risk: committed transaction could roll back on power loss (not application crash)
- OFF: no fsync at all; corruption risk on any crash
cache_size = -64000
- Negative value = kilobytes; positive value = pages
- Default is -2000 (approximately 2MB)
- More cache = fewer disk reads, but may duplicate OS page cache
- Session-only; resets on each new connection
mmap_size = 268435456
- Enables memory-mapped I/O (fewer syscalls)
- Set to 0 to disable, or to expected database size
- On 64-bit systems, can set very large (e.g., 30GB) -- reserves virtual address space, not physical RAM
- Beneficial for read-heavy workloads
temp_store = MEMORY
- Temp tables, indexes, and intermediate results stored in RAM
- Faster than disk-based temp storage
- Value 2 = memory; value 1 = file; value 0 = compile-time default
busy_timeout = 5000
- Milliseconds to wait on lock contention before returning SQLITE_BUSY
- Without this, SQLITE_BUSY returns immediately
- Essential for any multi-connection setup
foreign_keys = ON
- Off by default (historical reasons)
- Negligible performance impact
- Must be set per-connection (not stored in database)
For New Databases (SQLite 3.37.0+)
CREATE TABLE items (
item_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
price REAL NOT NULL
) STRICT;
STRICT enforces column types at insert/update time, raising SQLITE_CONSTRAINT_DATATYPE on type mismatches.
Maintenance PRAGMAs
-- For long-lived connections, run on open:
PRAGMA optimize = 0x10002;
-- Run periodically (hourly for long-lived connections):
PRAGMA optimize;
-- Run before closing short-lived connections:
PRAGMA optimize;
PRAGMA wal_checkpoint(PASSIVE);
-- Limit ANALYZE time (set before PRAGMA optimize):
PRAGMA analysis_limit = 400;
-- Limit WAL file size on disk:
PRAGMA journal_size_limit = 6144000; -- 6MB
PRAGMA optimize collects statistics for tables where the query planner would have benefited from better data. The 0x10002 mask checks all tables and limits runtime. Run it:
- On connection open for long-lived apps (with
0x10002) - Before connection close for short-lived apps (plain
optimize) - After schema changes
auto_vacuum:
-- Must be set before creating any tables:
PRAGMA auto_vacuum = INCREMENTAL;
-- Then periodically:
PRAGMA incremental_vacuum;
- NONE (default): unused pages stay allocated; requires manual VACUUM
- FULL: automatic but can worsen fragmentation
- INCREMENTAL: you control when space is reclaimed
VACUUM rewrites the entire database. Avoid for databases over 100MB due to the time and temporary disk space required.
Configuration Summary Table
| PRAGMA | Production Value | Default | Impact |
|---|---|---|---|
| journal_mode | WAL | DELETE | Concurrency, write speed |
| synchronous | NORMAL | FULL | Write speed (2-50x) |
| cache_size | -64000 | -2000 | Read speed for large DBs |
| mmap_size | 268435456 | 0 | Read speed, fewer syscalls |
| temp_store | MEMORY | DEFAULT | Temp operation speed |
| busy_timeout | 5000 | 0 | Prevents immediate BUSY errors |
| foreign_keys | ON | OFF | Data integrity |
| wal_autocheckpoint | 1000 | 1000 | WAL size vs checkpoint frequency |
| journal_size_limit | 6144000 | -1 (unlimited) | Disk space control |
| analysis_limit | 400 | 0 (unlimited) | ANALYZE/optimize runtime |
Quick Reference: Connection Setup Template
-- Run on every new connection:
PRAGMA journal_mode = WAL; -- persists in DB, but safe to re-set
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -64000;
PRAGMA mmap_size = 268435456;
PRAGMA temp_store = MEMORY;
PRAGMA busy_timeout = 5000;
PRAGMA foreign_keys = ON;
PRAGMA optimize = 0x10002; -- for long-lived connections
-- Run periodically (hourly for long-lived connections):
PRAGMA optimize;
-- Run before closing short-lived connections:
PRAGMA optimize;
PRAGMA wal_checkpoint(PASSIVE);
Sources: SQLite PRAGMA Documentation, SQLite PRAGMA Cheatsheet, SQLite Performance Tuning (phiresky), High Performance SQLite Recommended PRAGMAs, SQLite Optimizations For Ultra High-Performance, Write-Ahead Logging
Part III: Operations and Maintenance
13. Migration and Versioning
Schema Versioning with PRAGMA user_version
Use SQLite's built-in PRAGMA user_version integer to track schema version. It is simpler and more efficient than maintaining a version table -- the integer is available immediately without searching the database file.
-- Read current version
PRAGMA user_version;
-- Set version after migration
PRAGMA user_version = 3;
Migration file structure:
migrations/
0001_initial_schema.sql
0002_add_indexes.sql
0003_add_fts.sql
Each migration file ends with PRAGMA user_version = N;
Python implementation:
current = db.execute('PRAGMA user_version').fetchone()[0]
for migration_file in sorted(migration_files):
version = int(migration_file.split('_')[0])
if version > current:
db.executescript(open(migration_file).read())
Shell script implementation:
current_version=$(sqlite3 "$DB" "PRAGMA user_version;")
for migration in migrations/*.sql; do
version=$(basename "$migration" | cut -d_ -f1 | sed 's/^0*//')
if [ "$version" -gt "$current_version" ]; then
sqlite3 "$DB" < "$migration"
fi
done
Best practices:
- Keep migration SQL scripts in version control
- Wrap each migration in a transaction (BEGIN/COMMIT)
- Design scripts to be safely re-runnable (idempotent where possible)
- Number migrations sequentially to ensure ordering
- Include both the DDL changes and the
PRAGMA user_version = Nin each file
Sources:
ALTER TABLE Limitations
SQLite's ALTER TABLE is severely limited. It supports:
ALTER TABLE x RENAME TO yALTER TABLE x ADD COLUMN y(column must have a default value or allow NULL)ALTER TABLE x RENAME COLUMN old TO new(SQLite 3.25.0+)ALTER TABLE x DROP COLUMN y(SQLite 3.35.0+)
It does not support changing column types, adding/removing constraints, changing default values, or reordering columns.
The 12-step migration procedure for structural changes:
-- Example: changing a column type and adding a constraint
BEGIN TRANSACTION;
PRAGMA foreign_keys = OFF;
CREATE TABLE items_new (
item_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
price REAL NOT NULL CHECK (price >= 0) -- was TEXT, now REAL with constraint
);
INSERT INTO items_new (item_id, name, price)
SELECT item_id, name, CAST(price AS REAL) FROM items;
DROP TABLE items;
ALTER TABLE items_new RENAME TO items;
PRAGMA foreign_key_check; -- verify no broken references
PRAGMA foreign_keys = ON;
COMMIT;
Critical: The sequence (create new, copy, drop old, rename new) is important to avoid breaking foreign key references.
Declarative migration approach: Compare the actual database against a "pristine" in-memory database created from the schema definition. The migrator queries sqlite_schema to identify differences and applies changes automatically. Works well for adding new tables, modifying indexes, and adding nullable columns. Requires manual SQL for data migrations.
Sources:
14. Backup and Recovery
Backup Methods
.backup Command (built-in, recommended default)
sqlite3 mydb.db ".backup backup.db"
Creates a page-by-page replica. Other connections can write during the backup, but those changes will not appear in the backup.
VACUUM INTO (backup + optimize)
VACUUM INTO '/path/to/backup.db';
Creates a vacuumed (compacted) copy. More CPU-intensive than .backup but produces a smaller, defragmented file.
Online Backup API (programmatic, incremental)
The C API copies pages incrementally without locking the source for the entire duration:
sqlite3_backup_init()-- creates backup objectsqlite3_backup_step(N)-- copies N pages per iterationsqlite3_backup_finish()-- releases resources
The source is read-locked only while pages are being read. Progress monitored with sqlite3_backup_remaining() and sqlite3_backup_pagecount().
Litestream (continuous replication to S3)
Streams WAL changes to S3-compatible storage. Provides point-in-time recovery. Requires additional software.
Copy-on-Write (filesystem-level)
On Btrfs/XFS, cp --reflink=always within a deferred transaction creates near-instant backups (~2ms for 440MB+).
Comparison
| Method | Durability | Space | Restore Speed | Complexity |
|---|---|---|---|---|
.backup |
High | Moderate | Very Fast | Low |
VACUUM INTO |
High | Small (compacted) | Very Fast | Low |
| Online Backup API | High | Moderate | Very Fast | Medium |
| Litestream | Very High | Low (incremental) | Moderate | Medium |
CoW cp |
High | Very Low | Very Fast | Low |
.dump (SQL) |
High | Large | Slow | Low |
Critical WAL Consideration
When restoring a backup, always delete any existing *-wal and *-shm files at the destination before copying. A stale/mismatched WAL file can corrupt the restored database.
Corruption Prevention
SQLite is highly resistant to corruption -- partial transactions from crashes are automatically rolled back on next access. Corruption can occur from:
- Rogue process overwrites -- other processes writing directly to the database file
- Broken file locking -- especially on network filesystems (NFS, CIFS). SQLite MUST NOT be used on network storage.
- Sync failures -- disk drives reporting writes complete before reaching persistent media. Use
PRAGMA synchronous = FULL(orNORMALwith WAL mode). - Deleting journal files -- removing
*-journalor*-walfiles prevents crash recovery - Memory corruption -- stray pointers, especially with memory-mapped I/O
- Mismatched database + journal -- renaming or moving the database without its journal
Integrity checking:
PRAGMA integrity_check; -- thorough check (slow on large DBs)
PRAGMA quick_check; -- faster, less thorough
Recovery from corruption:
-- SQLite 3.29.0+
.recover
-- Or manually:
.mode insert
.output recovery.sql
.dump
.output stdout
Configuration rules to prevent corruption:
PRAGMA synchronous = OFFMUST NOT be usedPRAGMA journal_mode = OFForMEMORYMUST NOT be usedPRAGMA schema_versionMUST NOT be modified with active connections- Use
PRAGMA writable_schema = ONwith extreme caution
Sources:
- SQLite Backup API
- How To Corrupt An SQLite Database File
- Recovering Data From A Corrupt SQLite Database
- Backup strategies for SQLite in production
15. Date and Time Handling
Storage Format Options
SQLite has no native datetime type. Three storage approaches:
TEXT -- ISO-8601 strings (recommended default)
CREATE TABLE events (
event_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Stores: '2026-04-03 14:30:00'
Pros: Human-readable, built-in function support (datetime(), date(), time()), supports timezone info, millisecond precision.
Cons: 20-27 bytes per timestamp (vs 8 for INTEGER), string comparison slightly slower.
INTEGER -- Unix timestamps
CREATE TABLE events (
event_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
created_at INTEGER NOT NULL DEFAULT (unixepoch('now'))
);
-- Stores: 1775403000
Pros: 8 bytes, fastest comparisons, simplest arithmetic, timezone-neutral (always UTC), efficient range queries.
Cons: Not human-readable, requires conversion for display. Caution: Timestamps from the first 63 days of 1970 may be misinterpreted as Julian day numbers by the auto modifier.
REAL -- Julian day numbers
SELECT julianday('now'); -- 2460737.10417
Pros: Most precise for day-based calculations. Cons: Rarely used, unfamiliar to most developers.
Core Date Functions
-- Current UTC timestamp
SELECT datetime('now'); -- '2026-04-03 14:30:00'
SELECT unixepoch('now'); -- 1775403000
-- Convert between formats
SELECT datetime(1775403000, 'unixepoch'); -- INTEGER to TEXT
SELECT strftime('%s', '2026-04-03 14:30:00'); -- TEXT to INTEGER
-- Date arithmetic
SELECT datetime('now', '+7 days'); -- 7 days from now
SELECT datetime('now', '-1 month'); -- 1 month ago
SELECT date('now', 'start of month', '+1 month', '-1 day'); -- end of current month
-- Day-based calculations
SELECT julianday('now') - julianday('2026-01-01') AS days_elapsed;
-- Validation (returns NULL for invalid dates)
SELECT datetime('2026-13-45'); -- NULL
Critical Best Practices
Always store UTC. Converting to local time is a display concern:
-- Store in UTC
INSERT INTO events (name, created_at) VALUES ('test', datetime('now'));
-- Display in local time
SELECT datetime(created_at, 'localtime') AS local_time FROM events;
Never mix formats in the same column. Pick TEXT or INTEGER and use it consistently. Mixed formats break comparisons and indexing.
Be consistent with precision. If you store milliseconds in some rows and seconds in others, comparisons break.
Month arithmetic can surprise:
SELECT date('2026-01-31', '+1 month'); -- result may vary
Choosing a Format
| Criterion | TEXT (ISO-8601) | INTEGER (Unix) |
|---|---|---|
| Human readability | Excellent | Poor |
| Storage size | 20-27 bytes | 8 bytes |
| Comparison speed | Good | Best |
| Date arithmetic | Via functions | Simple addition |
| Range queries | Good | Best |
| Timezone clarity | Can include offset | Always UTC |
| Function support | Native | Requires 'unixepoch' modifier |
For most applications: TEXT ISO-8601 is the safer default -- easier to debug and works naturally with SQLite's date functions.
For high-volume logging or time-series data: INTEGER unix timestamps for storage efficiency and comparison speed.
Sources:
16. Blob and Large Data
The 100KB Threshold
SQLite's own benchmarks established a clear guideline:
- Under 100KB: Reading BLOBs from the database is faster than from separate files. At 10KB, SQLite is 35% faster than filesystem I/O and uses 20% less disk space.
- Over 100KB: Reading from separate files is faster. The crossover varies by hardware (between 100KB and 1MB depending on page size).
When to Store BLOBs in the Database
Do store internally:
- Small files under 100KB (thumbnails, icons, small config files)
- When ACID guarantees on the data matter
- When atomic updates of both metadata and content are needed
Store externally (file path in DB):
- Files over 100KB (videos, PDFs, large images)
- When files are served directly via a web server
- When files are accessed independently of their metadata
Hybrid approach: Store small BLOBs internally, large BLOBs externally with the path in the database.
Page Size Optimization
-- Set before creating the database (cannot change after)
PRAGMA page_size = 8192; -- or 16384 for large BLOB I/O
A page size of 8192 or 16384 gives the best performance for large BLOB I/O. The default 4096 is fine for non-BLOB workloads.
Maximum BLOB Size
- Default maximum: 1,000,000,000 bytes (1 GB)
- Absolute maximum: 2,147,483,647 bytes (~2 GB)
- Configurable via
SQLITE_MAX_LENGTHcompile-time option
ZEROBLOB for Incremental Writing
zeroblob() allocates space filled with zeros, then you overwrite in chunks via the blob I/O API:
INSERT INTO files (name, content) VALUES ('large.bin', zeroblob(1048576));
-- Then use sqlite3_blob_open() / sqlite3_blob_write() to write in chunks
This avoids loading the entire BLOB into memory at once.
ACID vs Performance Trade-off
Storing files in the database gives you ACID properties (atomic updates, crash recovery). External files lose ACID but gain direct filesystem access, CDN compatibility, no database size bloat, and independent backup.
Sources:
17. Security
SQL Injection Prevention
The rule: Use prepared statements with bind parameters. Do not try to play games attempting to outthink your attacker. Prepared statements separate SQL code from data -- the database engine treats bound parameters as data, never as executable code.
In Python:
# DANGEROUS -- string concatenation
cursor.execute(f"SELECT * FROM users WHERE name = '{user_input}'")
# SAFE -- parameterized query
cursor.execute("SELECT * FROM users WHERE name = ?", (user_input,))
In shell scripts: The sqlite3 CLI does not natively support parameterized queries. This is the biggest security risk for shell script database access.
Option 1 -- .parameter bind (sqlite3 3.38.0+):
sqlite3 "$DB" <<EOF
.parameter set :name '$sanitized_name'
SELECT * FROM users WHERE name = :name;
EOF
Option 2 -- Validate and escape in the shell script:
# Minimal escaping: double all single quotes
safe_input="${user_input//\'/\'\'}"
sqlite3 "$DB" "SELECT * FROM users WHERE name = '${safe_input}'"
This is inferior to prepared statements but may be the only option in pure shell.
Option 3 -- Delegate to a helper program:
Write a small Python/Go/Rust wrapper that accepts arguments and uses proper parameterized queries. The shell script calls the wrapper. This is the most secure approach for shell-based architectures.
Option 4 -- sqlite3_mprintf format specifiers (%q, %Q, %w):
In C code, %q doubles single quotes, %Q wraps in quotes and handles NULL, %w is for identifiers.
Database File Permissions
SQLite delegates all access control to the operating system. It does not implement GRANT/REVOKE or user authentication.
# Database file: owner read/write only
chmod 600 mydb.db
# Set umask before creating databases
umask 077
sqlite3 newdb.db "CREATE TABLE test (id INTEGER PRIMARY KEY);"
Important considerations:
- The database file, WAL file, and journal file must all have consistent permissions
- The directory containing the database must be writable (SQLite creates temporary files)
- On WAL mode, read-only connections still need write permission to the WAL and SHM files
- Never make database files world-readable if they contain sensitive data
Additional Security Measures
- Principle of least privilege: Run database-accessing processes under a restricted user account
- Encryption at rest: SQLite has no built-in encryption. Use SQLCipher or SEE (SQLite Encryption Extension) for encrypted databases
- Input validation: Beyond SQL injection, validate that inputs conform to expected formats before they reach the query layer
- LIKE clause escaping: User input in LIKE patterns needs the ESCAPE clause:
SELECT * FROM items WHERE name LIKE '%' || :search || '%' ESCAPE '\';
Sources:
- SQLite Forum: Characters to escape to prevent SQL Injection
- OWASP SQL Injection Prevention Cheat Sheet
- Basic Security Practices for SQLite
18. Testing with SQLite
In-Memory Databases
The foundation of SQLite testing is :memory:, which creates a RAM-only database destroyed when the connection closes.
import sqlite3
conn = sqlite3.connect(':memory:')
conn.executescript(open('schema.sql').read())
# ... run tests ...
conn.close() # database destroyed
Key advantages: Speed (no disk I/O), isolation (each connection independent), simplicity (no files to manage), reproducibility (known blank state).
Test Isolation Strategies
Strategy 1 -- Fresh Database Per Test (strongest isolation)
import pytest, sqlite3
@pytest.fixture
def db():
conn = sqlite3.connect(':memory:')
conn.executescript(open('schema.sql').read())
yield conn
conn.close()
def test_insert_user(db):
db.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1
def test_empty_users(db):
# Guaranteed empty -- no cross-test contamination
assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
Pros: Perfect isolation, simple to reason about. Cons: Schema setup cost per test (usually negligible).
Strategy 2 -- Transaction Rollback (faster for large schemas)
@pytest.fixture
def db(shared_db):
shared_db.execute("BEGIN")
yield shared_db
shared_db.execute("ROLLBACK")
Schema created once. Tests must not COMMIT; nested transactions need SAVEPOINTs.
Strategy 3 -- Template Database with Backup API
@pytest.fixture(scope='session')
def template_db():
conn = sqlite3.connect(':memory:')
conn.executescript(open('schema.sql').read())
conn.executescript(open('test_seeds.sql').read())
return conn
@pytest.fixture
def db(template_db):
conn = sqlite3.connect(':memory:')
template_db.backup(conn)
return conn
Combines pre-populated data with per-test isolation using SQLite's backup API.
Migration Testing
def test_migrations_apply_cleanly():
conn = sqlite3.connect(':memory:')
for migration_file in sorted(glob.glob('migrations/*.sql')):
conn.executescript(open(migration_file).read())
tables = conn.execute(
"SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
assert ('users',) in tables
def test_migration_idempotency():
conn = sqlite3.connect(':memory:')
for _ in range(2):
for migration_file in sorted(glob.glob('migrations/*.sql')):
conn.executescript(open(migration_file).read())
Cross-Database Compatibility Caveats
When SQLite substitutes for a production database in tests:
| Behavior | SQLite | PostgreSQL | MySQL |
|---|---|---|---|
| String comparison | Case-sensitive | Case-sensitive (default) | Case-insensitive |
| Type enforcement | Flexible (unless STRICT) | Strict | Strict |
| Boolean type | INTEGER 0/1 | Native BOOLEAN | TINYINT |
| LIKE | Case-sensitive for ASCII | Case-insensitive (ILIKE) | Case-insensitive |
Test against the actual production database for integration tests. Use SQLite only for unit tests where dialect differences do not matter.
Test Performance Tips
- Use function-scoped fixtures (one database per test) for isolation
- Use module/session-scoped fixtures for performance if tests are read-only
- Use
PRAGMA journal_mode = OFFandPRAGMA synchronous = OFFin test databases for maximum speed (safe because test data is disposable) - Pre-populate fixtures with representative data rather than building it per-test
Sources:
- How to Use SQLite in Testing
- How SQLite Is Tested
- How to test SQLite in-memory databases using pytest
19. Common Anti-Patterns
Storing CSV/Lists in Columns
-- ANTI-PATTERN:
CREATE TABLE posts (
post_id INTEGER PRIMARY KEY,
tags TEXT -- 'python,sqlite,database'
);
Cannot index individual values, cannot join, cannot enforce referential integrity, requires LIKE '%tag%' (full table scan with false positives).
Fix -- junction table:
CREATE TABLE tags (
tag_id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE post_tags (
post_id INTEGER REFERENCES posts(post_id),
tag_id INTEGER REFERENCES tags(tag_id),
PRIMARY KEY (post_id, tag_id)
);
Or for simpler cases, use JSON:
CREATE TABLE posts (
post_id INTEGER PRIMARY KEY,
tags TEXT DEFAULT '[]' -- JSON array
);
SELECT * FROM posts, json_each(posts.tags)
WHERE json_each.value = 'python';
Not Using Transactions for Batch Operations
-- ANTI-PATTERN: each statement is an implicit transaction with fsync
INSERT INTO data VALUES (1, 'a');
INSERT INTO data VALUES (2, 'b');
-- ... 10,000 more
-- FIX: explicit transaction (2-20x faster)
BEGIN TRANSACTION;
INSERT INTO data VALUES (1, 'a');
INSERT INTO data VALUES (2, 'b');
-- ... 10,000 more
COMMIT;
The filesystem sync happens once at COMMIT instead of per-statement. Even read performance improves (fewer lock operations).
Ignoring EXPLAIN QUERY PLAN
-- Always check query execution:
EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'test@example.com';
-- SCAN users <-- full table scan! needs an index
-- SEARCH users USING INDEX idx_email (email=?) <-- good
Adopt a repeatable workflow: capture the query, EXPLAIN it, test with the fix, deploy.
Using SQLite as a Message Queue
SQLite allows only one writer at a time. Queue patterns require frequent writes (enqueue) and deletes (dequeue) from competing processes, creating lock contention. Use a purpose-built queue instead.
Exception: A single-process queue (one producer, one consumer in the same application) works fine because there is no write contention.
SELECT * in Production
Returns unnecessary columns, prevents index-only scans, breaks when schema changes, wastes memory. List only the columns you need.
Functions on Indexed Columns
-- ANTI-PATTERN: cannot use index
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- FIX: expression index (SQLite 3.9.0+)
CREATE INDEX ix_users_email_lower ON users(LOWER(email));
N+1 Query Pattern
-- ANTI-PATTERN:
SELECT customer_id FROM customers;
-- Then for each: SELECT COUNT(*) FROM orders WHERE customer_id = ?;
-- FIX:
SELECT c.customer_id, c.name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
SQLite caveat: Because SQLite is embedded (no network hop), the N+1 penalty is smaller than with client/server databases. Sometimes N+1 with simple indexed lookups is actually faster than a complex join. Benchmark both.
Long Transactions
In SQLite, an open write transaction blocks all other writers. A transaction held for 30 seconds blocks all writes for 30 seconds. Keep transactions short. Use PRAGMA busy_timeout for transient contention:
PRAGMA busy_timeout = 5000; -- wait up to 5 seconds for locks
SQLite vs Other Databases
What SQLite Excels At
SQLite "competes with fopen()" -- it is an embedded database engine, not a client/server system.
Zero administration: No server to install, configure, monitor, or restart. The database is a single file.
Zero latency: No network round-trip. Queries execute in-process.
Ideal use cases:
- Application file format: Desktop apps, mobile apps, Electron apps
- Embedded/IoT devices: Cellphones, cameras, drones, medical devices
- Websites with < 100K hits/day
- Data analysis: Import CSV/JSON, query with SQL, share the single file
- Local caching and temporary databases
- Data transfer format: Richer than CSV, simpler than a server
Where SQLite Falls Short
- Write concurrency: Unlimited readers but only one writer at a time
- Network access: Cannot connect from a different machine
- Very large datasets: Performance degrades above a few GB without careful tuning
- Advanced SQL features: No RIGHT JOIN, no FULL OUTER JOIN, limited ALTER TABLE, no stored procedures
- Enterprise features: No replication, no role-based access control, no audit logging
Decision Checklist
| Question | If Yes |
|---|---|
| Data on a separate server from the app? | Use client/server |
| Many concurrent writers needed? | Use client/server |
| Data exceeds a few GB? | Consider client/server |
| Multiple application servers sharing data? | Use client/server |
| Need user authentication at DB level? | Use client/server |
| Otherwise? | SQLite is likely the best choice |
Modern Performance Reality (2025-2026)
Recent benchmarks challenge traditional assumptions:
- A Rails app with 10 Puma workers achieved 2,730 write requests/second using SQLite
- This supports ~1 million daily active users performing 35 writes/day each
- Key insight: match database connections to application workers, not excessive connection pools
- WAL mode with
busy_timeouteliminates most "database locked" errors
Migration Away from SQLite
When migrating to PostgreSQL/MySQL:
- Tools like
sqlite3-to-mysqlhandle encoding, type conversion, and bulk transfer - SQLite's flexible typing may have allowed data that stricter databases reject -- validate data types first
- Date formats may need conversion (SQLite TEXT dates to native TIMESTAMP types)
- NULL handling differences may surface
Sources:
- SQL Anti-Patterns and How to Fix Them
- SQLite Optimizations For Ultra High-Performance
- Database Design Patterns Every Developer Should Know
- Appropriate Uses For SQLite
- Why you should probably be using SQLite
- SQLite vs MySQL vs PostgreSQL
Part IV: Device-to-Server Sync
20. Schema Design for Sync
20.1 UUID vs Integer Primary Keys
Why UUIDs are necessary for offline-first sync:
Auto-incrementing integers are generated locally by each device. When two devices create records offline, their local databases produce identical IDs. On sync, these duplicates cause conflicts, data corruption, or sync failures. UUIDs solve this by enabling client-side ID generation without server coordination.
SQLite-specific advantage: Unlike MySQL or PostgreSQL where random UUIDs fragment the clustered B-tree index, SQLite uses the internal rowid as its clustered index. A TEXT UUID primary key creates a separate B-tree index, so UUID randomness does not cause the same page-split fragmentation problems.
Recommended: UUIDv7 (time-ordered)
UUIDv7 includes a timestamp prefix, making IDs roughly ordered by creation time. This preserves the insert-order performance benefits of integers while maintaining global uniqueness. UUIDv7 is supported natively in PostgreSQL 17+ via uuid_generate_v7().
-- SQLite: sync-ready table with TEXT UUID primary key
CREATE TABLE tasks (
id TEXT PRIMARY KEY NOT NULL, -- UUIDv7, generated client-side
title TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TEXT NOT NULL, -- ISO-8601 UTC
updated_at TEXT NOT NULL, -- ISO-8601 UTC
version INTEGER NOT NULL DEFAULT 1,
is_deleted INTEGER NOT NULL DEFAULT 0,
last_synced_at TEXT -- NULL until first sync
);
-- PostgreSQL: corresponding server table
CREATE TABLE tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
version INTEGER NOT NULL DEFAULT 1,
is_deleted BOOLEAN NOT NULL DEFAULT FALSE,
last_synced_at TIMESTAMPTZ
);
Pitfalls to avoid:
- Never use
INTEGER PRIMARY KEY AUTOINCREMENTfor synced tables -- IDs will collide across devices - Never use
datetime('now')as an ID substitute -- insufficient precision for uniqueness - If using the sqlite-sync extension, use
cloudsync_uuid()which generates UUIDv7 natively
Sources:
- UUID vs Auto Increment for Primary Keys (Bytebase)
- Primary Key Data Types (High Performance SQLite)
- Android Room UUID Primary Key (CodeStudy)
20.2 Timestamps for Change Tracking
Every synced table needs timestamps to detect what changed since the last sync.
Required columns:
| Column | Purpose | Format |
|---|---|---|
created_at |
Record creation time | ISO-8601 UTC |
updated_at |
Last modification time | ISO-8601 UTC |
last_synced_at |
Last successful server sync | ISO-8601 UTC, NULL until synced |
Detecting unsynced changes:
-- Find all records that changed since last sync
SELECT * FROM tasks
WHERE last_synced_at IS NULL
OR updated_at > last_synced_at;
Automatic timestamp updates via triggers:
CREATE TRIGGER tasks_update_timestamp
AFTER UPDATE ON tasks
FOR EACH ROW
WHEN NEW.updated_at = OLD.updated_at -- prevent infinite recursion
BEGIN
UPDATE tasks SET updated_at = strftime('%Y-%m-%dT%H:%M:%fZ', 'now')
WHERE id = NEW.id;
END;
Physical clocks vs logical clocks:
Physical timestamps (datetime('now')) are simple but vulnerable to clock skew between devices. For systems where ordering correctness matters, consider:
- Lamport timestamps: Simple incrementing counter. Guarantees
e happened before f => L(e) < L(f), but the converse is not true. Low overhead but no concurrency detection. - Vector clocks: Array of counters, one per device. Can distinguish "happened-before" from "concurrent" events. Space grows with O(n) where n = number of devices. Impractical for many-device scenarios.
- Hybrid Logical Clocks (HLC): Combine physical wall-clock time with a logical counter in a single 64-bit value. Remain close to wall-clock time while guaranteeing causal ordering. Strictly monotonic per-node. Self-stabilizing against NTP corrections. Recommended for most sync systems.
HLC timestamp = [48-bit physical time] + [16-bit logical counter]
Sources:
- Handling Timestamps in SQLite (sqlite.ai)
- Hybrid Logical Clocks (Sergei Turukin)
- Vector Clocks (Wikipedia)
- Evolving Clock Sync in Distributed Databases (YugabyteDB)
20.3 Soft Deletes vs Hard Deletes (Tombstone Patterns)
Hard deleting a record on one device makes it impossible to propagate that deletion to other devices or the server -- there is no record left to sync. Sync systems require soft deletes.
Simple soft delete:
-- Mark as deleted instead of DELETE FROM
UPDATE tasks
SET is_deleted = 1,
updated_at = strftime('%Y-%m-%dT%H:%M:%fZ', 'now')
WHERE id = ?;
-- All normal queries exclude deleted records
SELECT * FROM tasks WHERE is_deleted = 0;
-- Sync queries include deleted records
SELECT * FROM tasks
WHERE last_synced_at IS NULL OR updated_at > last_synced_at;
Index for filtering deleted records:
CREATE INDEX idx_tasks_is_deleted ON tasks(is_deleted);
Tombstone table pattern (alternative):
Instead of a flag column, move deleted records to a separate tombstone table. This keeps the live table clean and fast while preserving deletion history for sync.
CREATE TABLE tasks_tombstones (
id TEXT PRIMARY KEY NOT NULL,
table_name TEXT NOT NULL,
deleted_at TEXT NOT NULL,
synced INTEGER NOT NULL DEFAULT 0
);
-- Trigger to capture deletes
CREATE TRIGGER tasks_soft_delete
BEFORE DELETE ON tasks
BEGIN
INSERT INTO tasks_tombstones (id, table_name, deleted_at)
VALUES (OLD.id, 'tasks', strftime('%Y-%m-%dT%H:%M:%fZ', 'now'));
END;
Purging tombstones:
Tombstones accumulate forever unless pruned. Only purge after confirming all devices have synced past the deletion:
-- Purge tombstones older than 90 days that have been synced
DELETE FROM tasks_tombstones
WHERE synced = 1
AND deleted_at < strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-90 days');
Pitfalls:
- Never hard-delete records that other devices might still reference
- Always include
is_deletedin indexes used by sync queries - Plan a tombstone purge strategy from the start -- unbounded tombstones degrade performance
Sources:
20.4 Schema Compatibility Between SQLite and Server DB
SQLite's dynamic type system differs fundamentally from PostgreSQL/MySQL's rigid type system. Design schemas for compatibility.
Key differences:
| Concept | SQLite | PostgreSQL |
|---|---|---|
| Type enforcement | Column affinity (recommended, not enforced) | Strict type enforcement |
| Boolean | INTEGER (0/1) | BOOLEAN (true/false) |
| Date/Time | TEXT (ISO-8601), INTEGER (unix), or REAL (julian) | TIMESTAMP, TIMESTAMPTZ, DATE, TIME |
| UUID | TEXT | UUID (native type) |
| JSON | TEXT with json1 functions | JSONB (binary, indexed) |
| BLOB | BLOB | BYTEA |
| Auto-increment | INTEGER PRIMARY KEY (alias for rowid) | SERIAL / GENERATED ALWAYS AS IDENTITY |
Compatibility rules for sync schemas:
- Store booleans as INTEGER 0/1 in SQLite; map to BOOLEAN on server
- Store dates as ISO-8601 TEXT in SQLite; map to TIMESTAMPTZ on server
- Store UUIDs as TEXT in SQLite; map to UUID type on server
- Store JSON as TEXT in SQLite; map to JSONB on server
- Always use UTC for all timestamps on both sides
- Define column constraints identically on both sides (NOT NULL, DEFAULT, CHECK)
Sources:
20.5 Version Columns for Optimistic Concurrency
A version column enables detecting concurrent modifications without timestamps.
How it works:
- Client reads record with
version = 3 - Client modifies record locally, increments to
version = 4 - Client sends update to server:
UPDATE tasks SET ... WHERE id = ? AND version = 3 - If another client already incremented to
version = 4, the WHERE clause matches zero rows - Server responds with conflict; client must reconcile
-- SQLite: version-based update
UPDATE tasks
SET title = ?,
status = ?,
version = version + 1,
updated_at = strftime('%Y-%m-%dT%H:%M:%fZ', 'now')
WHERE id = ? AND version = ?;
-- Check changes() == 1; if 0, conflict occurred
SQLite-specific rowversion simulation:
SQLite lacks a native auto-updating rowversion type. Simulate with a trigger:
-- Randomized version token (alternative to incrementing integer)
CREATE TABLE tasks (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
row_version BLOB NOT NULL DEFAULT (randomblob(8)),
-- ... other columns
);
CREATE TRIGGER tasks_rowversion
AFTER UPDATE ON tasks
BEGIN
UPDATE tasks SET row_version = randomblob(8) WHERE id = NEW.id;
END;
Best practice: Use both version (integer) and updated_at (timestamp). The version integer is authoritative for conflict detection; the timestamp is useful for debugging and human-readable ordering.
Sources:
- Optimistic Concurrency (ServiceStack)
- Handling Concurrency Conflicts (EF Core)
- Entity Framework Core: SQLite Concurrency Checks
21. Conflict Resolution
21.1 Last-Write-Wins (LWW)
The simplest strategy: the most recent modification (by timestamp or version) overwrites older versions.
Implementation:
-- Server-side: accept the newer version
INSERT INTO tasks (id, title, status, updated_at, version)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
title = EXCLUDED.title,
status = EXCLUDED.status,
updated_at = EXCLUDED.updated_at,
version = EXCLUDED.version
WHERE EXCLUDED.updated_at > tasks.updated_at;
When to use: Low-conflict domains (analytics, logging, single-user-per-record apps), settings/preferences, any case where the latest value is "correct enough."
When NOT to use: Collaborative editing, inventory management, financial records, or anywhere losing a concurrent edit is unacceptable.
Pitfalls:
- Clock skew between devices can cause the "wrong" write to win
- Silent data loss -- the user whose write is overwritten gets no notification
- Use HLC timestamps instead of wall-clock time to mitigate ordering issues
21.2 Server-Wins vs Client-Wins Policies
Server-wins: The server's current value always takes precedence. Client changes are silently discarded on conflict.
# Server-wins: ignore client version if server is newer
if server_record.version >= client_record.version:
return server_record # client change discarded
else:
apply(client_record)
Client-wins: The client's value always overwrites the server. Equivalent to LWW where client timestamp always "wins."
# Client-wins: always accept client change
apply(client_record)
When to use:
- Server-wins: settings pushed from admin, read-only sync (server is source of truth)
- Client-wins: draft documents, personal notes, any data "owned" by one user
Turso's conflict strategies demonstrate the spectrum:
FAIL_ON_CONFLICT-- reject sync, require explicit handlingDISCARD_LOCAL-- server-winsREBASE_LOCAL-- replay local changes on top of server state (like git rebase)MANUAL_RESOLUTION-- callback with both versions for custom logic
21.3 Field-Level Merge
Instead of replacing entire rows, merge individual columns. If Device A changes title and Device B changes status, both changes are preserved.
Implementation:
def field_level_merge(client_record, server_record, base_record):
"""Merge at column level using three-way comparison."""
merged = {}
for field in all_fields:
client_changed = client_record[field] != base_record[field]
server_changed = server_record[field] != base_record[field]
if client_changed and not server_changed:
merged[field] = client_record[field] # client wins this field
elif server_changed and not client_changed:
merged[field] = server_record[field] # server wins this field
elif client_changed and server_changed:
if client_record[field] == server_record[field]:
merged[field] = client_record[field] # both agree
else:
# True conflict on this field -- apply policy
merged[field] = resolve_field_conflict(
field, client_record, server_record
)
else:
merged[field] = base_record[field] # neither changed
return merged
cr-sqlite uses per-column CRDTs for this: Each column is independently tracked with its own version, so conflicting edits to different columns on the same row merge automatically. Only same-column, same-row conflicts require LWW fallback.
When to use: Any application where concurrent users typically edit different fields of the same record (task trackers, CRMs, project management).
Pitfalls:
- Requires storing the "base" version (the last-synced state) to do three-way comparison
- Field-level merge can produce semantically invalid combinations (e.g., changing
quantityandunit_priceindependently can produce wrongtotal) - Consider which fields should merge independently vs which should be treated as an atomic group
21.4 Operational Transformation (OT)
OT transforms concurrent operations against each other so both can be applied in any order and converge to the same result. Used primarily for text and sequence editing.
How it works:
- User A inserts "X" at position 5
- User B inserts "Y" at position 3
- Server receives both operations
- Server transforms A's operation: since B inserted before position 5, A's position shifts to 6
- Both clients apply the transformed operations, converging to the same document
When to use: Collaborative text editing (Google Docs uses OT), sequential data where position matters.
When NOT to use: Simple key-value record sync, CRUD apps, scenarios without a central server (OT requires a server for ordering).
Trade-offs vs CRDTs:
- OT is simpler for text editing with a central server
- CRDTs work peer-to-peer without a central server
- OT requires all operations to pass through the server
- CRDTs have higher metadata overhead but work offline indefinitely
Sources:
- OT vs CRDT Comparison (thom.ee)
- Real-Time Collaboration OT vs CRDT (TinyMCE)
- Why Fiberplane Uses OT (Fiberplane)
21.5 CRDTs (Conflict-Free Replicated Data Types)
CRDTs are data structures designed to automatically converge across replicas without coordination. If no more updates are made, all replicas reach the same state -- guaranteed by mathematical properties.
CRDT types relevant to SQLite sync:
| Type | Behavior | Use Case |
|---|---|---|
| LWW-Register | Last write wins per field, using timestamp | Individual record fields |
| G-Counter | Grow-only counter, each replica tracks its own count | Page views, like counts |
| PN-Counter | Positive-negative counter (two G-Counters) | Inventory, resource pools |
| G-Set | Grow-only set (add only, no remove) | Event logs, tags |
| OR-Set | Observed-Remove set (add and remove, add-wins) | Shopping carts, selections |
| LWW-Element-Set | Add/remove with timestamps | Feature flags, preferences |
| MV-Register | Multi-value register (preserves all concurrent writes) | Conflict-aware fields |
| RGA | Replicated Growable Array | Collaborative text, lists |
cr-sqlite CRDT usage:
-- Load the extension
.load crsqlite
-- Create a normal table
CREATE TABLE tasks (
id TEXT PRIMARY KEY NOT NULL,
title TEXT,
status TEXT
);
-- Convert to a conflict-free replicated relation (CRR)
SELECT crsql_as_crr('tasks');
-- Normal INSERT/UPDATE/DELETE operations work as usual
INSERT INTO tasks (id, title, status) VALUES ('task-1', 'Buy groceries', 'pending');
UPDATE tasks SET status = 'done' WHERE id = 'task-1';
-- Export changes since version X for sync
SELECT "table", "pk", "cid", "val", "col_version", "db_version",
"site_id", "cl", "seq"
FROM crsql_changes
WHERE db_version > ?;
-- Import changes from another device
INSERT INTO crsql_changes
("table", "pk", "cid", "val", "col_version", "db_version",
"site_id", "cl", "seq")
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?);
-- Clean up before closing
SELECT crsql_finalize();
Performance characteristics of cr-sqlite:
- Inserts into CRR tables: approximately 2.5x slower than regular SQLite
- Reads: identical speed to standard SQLite
- Space overhead: additional metadata for version tracking per column
When to use CRDTs: Peer-to-peer sync, multi-device apps with extended offline periods, collaborative editing, any scenario where a central server cannot always be reached.
When NOT to use CRDTs: Simple client-server sync where the server is always authoritative, apps where business logic validation must happen before accepting writes.
Sources:
- cr-sqlite (GitHub)
- CRDT Dictionary (Ian Duncan)
- CRDT Implementations (crdt.tech)
- Conflict-Free Replicated Data Types (Wikipedia)
21.6 Conflict Queues for Manual Resolution
When automated resolution is insufficient, queue conflicts for human review.
Conflict queue table:
CREATE TABLE sync_conflicts (
id TEXT PRIMARY KEY NOT NULL,
table_name TEXT NOT NULL,
record_id TEXT NOT NULL,
client_data TEXT NOT NULL, -- JSON of client version
server_data TEXT NOT NULL, -- JSON of server version
base_data TEXT, -- JSON of last-synced version (for 3-way merge)
conflict_type TEXT NOT NULL, -- 'update_update', 'update_delete', 'delete_update'
detected_at TEXT NOT NULL,
resolved_at TEXT,
resolution TEXT, -- 'client', 'server', 'merged', 'discarded'
resolved_data TEXT -- JSON of final version if merged
);
CREATE INDEX idx_conflicts_unresolved
ON sync_conflicts(resolved_at) WHERE resolved_at IS NULL;
Conflict detection during sync:
def sync_record(client_record, server_record):
if server_record.version == client_record.base_version:
# No conflict -- server hasn't changed since client last synced
apply_to_server(client_record)
else:
# Conflict -- both changed since last sync
conflict = {
"table_name": "tasks",
"record_id": client_record.id,
"client_data": json.dumps(client_record),
"server_data": json.dumps(server_record),
"conflict_type": "update_update",
"detected_at": now_utc()
}
insert_conflict(conflict)
# Optionally: apply server version as interim, flag for review
When to use: Medical records, legal documents, financial transactions, any domain where silent data loss is unacceptable and a human must choose the correct resolution.
22. Sync Protocols and Patterns
22.1 Full Sync vs Incremental/Delta Sync
Full sync: Transfer the entire dataset on every sync. Simple but expensive.
-- Full sync: client sends everything
SELECT * FROM tasks;
-- Server replaces all client data
DELETE FROM tasks;
INSERT INTO tasks SELECT * FROM incoming_data;
Incremental (delta) sync: Transfer only records that changed since the last sync.
-- Client tracks last successful sync timestamp
-- or last sync version number
SELECT * FROM tasks
WHERE updated_at > ? -- last_sync_timestamp
ORDER BY updated_at ASC;
Delta sync with version numbers (more reliable than timestamps):
-- Server maintains a global sync version counter
-- Each change increments it
-- Client stores the last version it received
-- Client pull: "give me everything after version 42"
SELECT * FROM tasks WHERE sync_version > 42 ORDER BY sync_version ASC;
-- Client push: send changes with their local version
-- Server assigns sequential sync_version on acceptance
Best practice: Always use delta sync in production. Full sync only for initial bootstrap or recovery after corruption.
22.2 Change Tracking Approaches
Approach 1: Flag columns on each table
-- updated_at + last_synced_at comparison
SELECT * FROM tasks
WHERE last_synced_at IS NULL OR updated_at > last_synced_at;
Pros: Simple. Cons: Requires adding columns to every synced table.
Approach 2: Change log table with triggers
CREATE TABLE change_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
table_name TEXT NOT NULL,
record_id TEXT NOT NULL,
operation TEXT NOT NULL, -- 'INSERT', 'UPDATE', 'DELETE'
changed_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
synced INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_changelog_unsynced ON change_log(synced, changed_at);
-- INSERT trigger
CREATE TRIGGER tasks_after_insert
AFTER INSERT ON tasks
BEGIN
INSERT INTO change_log (table_name, record_id, operation)
VALUES ('tasks', NEW.id, 'INSERT');
END;
-- UPDATE trigger
CREATE TRIGGER tasks_after_update
AFTER UPDATE ON tasks
BEGIN
INSERT INTO change_log (table_name, record_id, operation)
VALUES ('tasks', NEW.id, 'UPDATE');
END;
-- DELETE trigger
CREATE TRIGGER tasks_after_delete
AFTER DELETE ON tasks
BEGIN
INSERT INTO change_log (table_name, record_id, operation)
VALUES ('tasks', OLD.id, 'DELETE');
END;
Pros: Centralized change tracking, captures operation type. Cons: Table grows continuously, needs periodic purging.
Approach 3: SQLite Session Extension
The SQLite Session Extension is a built-in mechanism for recording changes to rowid tables and packaging them as binary changesets.
// Create a session monitoring all tables
sqlite3_session *pSession;
sqlite3session_create(db, "main", &pSession);
sqlite3session_attach(pSession, NULL); // NULL = all tables
// ... application makes changes ...
// Generate a binary changeset
void *pChangeset;
int nChangeset;
sqlite3session_changeset(pSession, &nChangeset, &pChangeset);
// Apply changeset to another database
sqlite3changeset_apply(db2, nChangeset, pChangeset, NULL, conflict_handler, NULL);
// Clean up
sqlite3session_delete(pSession);
Changeset contents per operation:
- INSERT: Values for all columns of the new row
- DELETE: Primary key + original values for all columns
- UPDATE: Primary key + original values + new values for changed columns
Changeset vs patchset:
- Changeset: Full old + new values; enables complete conflict detection
- Patchset: Compact format; DELETE carries only PK, UPDATE carries only new values; limited conflict detection
Build requirement: compile SQLite with -DSQLITE_ENABLE_SESSION -DSQLITE_ENABLE_PREUPDATE_HOOK
Sources:
- SQLite Session Extension (sqlite.org)
- SQLiteChangesetSync (GitHub)
- sqlite3session changeset example (GitHub Gist)
22.3 Push vs Pull vs Bidirectional Sync
Pull sync (server to client):
- Client periodically requests new data from server
- Server is source of truth
- Simple to implement; polling introduces latency
- Use for read-heavy apps (news, documentation, catalog data)
Push sync (client to server):
- Client sends local changes to server when connectivity returns
- Server validates and applies
- Use for write-heavy offline scenarios (field data collection, surveys)
Bidirectional sync:
- Both push and pull in a single sync cycle
- Most complex; requires conflict resolution
- Standard pattern for collaborative apps
Recommended sync cycle for bidirectional:
1. Push local changes to server
2. Server validates, resolves conflicts, returns results
3. Pull server changes (including changes from other devices)
4. Apply server changes to local database
5. Update last_synced_at / sync version
Push-based optimization via "shoulder tap": Instead of polling, the server sends a lightweight notification (push notification, WebSocket message, SSE event) that new data is available. The client then pulls the actual data. This combines low latency with simple pull-based data transfer.
22.4 Batch Sync with Pagination
Never sync unbounded result sets. Paginate using the sync version or timestamp.
-- Server: paginated sync endpoint
-- Client requests: GET /sync?since_version=42&limit=100
SELECT id, title, status, updated_at, version, is_deleted, sync_version
FROM tasks
WHERE sync_version > :since_version
ORDER BY sync_version ASC
LIMIT :limit;
-- Response includes the max sync_version in the batch
-- Client uses that as since_version for the next page
Batch size recommendations:
- Start with 100-500 records per batch
- Adjust based on average record size and network conditions
- Mobile: smaller batches (50-100) for unreliable connections
- Desktop: larger batches (500-1000) for stable connections
- Never exceed the SQLite parameter limit (999 for older versions, 32766 for newer)
22.5 Idempotent Operations for Retry Safety
Network failures during sync mean the same batch may be sent multiple times. Every sync operation must be safe to replay.
Making operations idempotent:
-- UPSERT pattern: safe to replay
INSERT INTO tasks (id, title, status, updated_at, version)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
title = EXCLUDED.title,
status = EXCLUDED.status,
updated_at = EXCLUDED.updated_at,
version = EXCLUDED.version
WHERE EXCLUDED.version > tasks.version;
-- Soft delete: safe to replay
UPDATE tasks SET is_deleted = 1, updated_at = ?
WHERE id = ? AND is_deleted = 0;
-- If already deleted, this is a no-op (0 rows affected) -- safe
Idempotency keys in the outbox:
CREATE TABLE outbox (
id TEXT PRIMARY KEY,
type TEXT NOT NULL,
payload TEXT NOT NULL,
idempotency_key TEXT NOT NULL UNIQUE, -- critical
status TEXT NOT NULL DEFAULT 'pending',
attempt_count INTEGER NOT NULL DEFAULT 0,
created_at INTEGER NOT NULL
);
-- INSERT OR IGNORE prevents duplicate entries on retry
INSERT OR IGNORE INTO outbox (id, type, payload, idempotency_key, created_at)
VALUES (?, ?, ?, ?, ?);
Server-side deduplication:
The server must also track processed idempotency keys and return success (not error) for duplicates:
def process_sync_batch(changes):
for change in changes:
if already_processed(change.idempotency_key):
continue # idempotent: return success, don't reapply
apply_change(change)
mark_processed(change.idempotency_key)
22.6 Queue-Based Sync with Outbox Pattern
The outbox pattern ensures that local data writes and sync queue entries are always consistent -- either both happen or neither does.
Outbox table schema:
CREATE TABLE sync_queue (
id INTEGER PRIMARY KEY AUTOINCREMENT,
operation TEXT NOT NULL, -- 'INSERT', 'UPDATE', 'DELETE'
table_name TEXT NOT NULL,
record_id TEXT NOT NULL,
payload TEXT NOT NULL, -- JSON of the record
idempotency_key TEXT NOT NULL UNIQUE,
created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
synced INTEGER NOT NULL DEFAULT 0,
attempt_count INTEGER NOT NULL DEFAULT 0,
next_attempt_at INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_sync_unsynced ON sync_queue(synced, next_attempt_at);
Transactional dual-write (critical pattern):
# BOTH the data write and queue entry in ONE transaction
db.execute("BEGIN TRANSACTION")
db.execute("""
INSERT INTO tasks (id, title, status, created_at, updated_at)
VALUES (?, ?, ?, ?, ?)
""", [task_id, title, status, now, now])
db.execute("""
INSERT INTO sync_queue (operation, table_name, record_id, payload, idempotency_key)
VALUES ('INSERT', 'tasks', ?, ?, ?)
""", [task_id, json.dumps(task_data), f"insert:tasks:{task_id}"])
db.execute("COMMIT")
# If it's on screen, it's in the outbox.
Sync worker (processes queue on reconnect):
def process_sync_queue():
rows = db.execute("""
SELECT id, operation, table_name, record_id, payload, attempt_count
FROM sync_queue
WHERE synced = 0 AND next_attempt_at <= ?
ORDER BY id ASC
LIMIT 50
""", [now_ms()]).fetchall()
for row in rows:
try:
response = send_to_server(row)
if response.status_code in (200, 201, 204):
db.execute("UPDATE sync_queue SET synced = 1 WHERE id = ?", [row.id])
elif response.status_code == 409: # conflict
handle_conflict(row, response.json())
except NetworkError:
# Exponential backoff, capped at 15 minutes
next_attempt = now_ms() + min(
15 * 60_000,
30_000 * (row.attempt_count + 1)
)
db.execute("""
UPDATE sync_queue
SET attempt_count = attempt_count + 1,
next_attempt_at = ?
WHERE id = ?
""", [next_attempt, row.id])
Inbound sync (applying server changes locally):
def apply_server_changes(changes):
db.execute("BEGIN TRANSACTION")
for change in changes:
if change["operation"] == "INSERT":
db.execute("""
INSERT OR REPLACE INTO tasks (id, title, status, updated_at, version)
VALUES (?, ?, ?, ?, ?)
""", [change["id"], change["title"], change["status"],
change["updated_at"], change["version"]])
elif change["operation"] == "DELETE":
db.execute("""
UPDATE tasks SET is_deleted = 1, updated_at = ?
WHERE id = ?
""", [change["updated_at"], change["id"]])
db.execute("COMMIT")
Sources:
- Outbox Pattern (Milan Jovanovic)
- Transactional Outbox Pattern (microservices.io)
- React Native Offline Sync with SQLite Queue (DEV)
- Offline-First Apps with SQLite Sync Queues (SQLite Forum)
23. Offline-First Architecture
23.1 Write-Ahead Log for Offline Operations
SQLite's WAL mode is foundational for offline-first apps.
Enable WAL mode:
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL; -- safe for app crashes, not power loss
Why WAL matters for sync:
- Readers never block writers and writers never block readers
- Write transactions are fast (sequential log append, no fsync per commit in NORMAL mode)
- Background sync reads can proceed while the user writes data
- WAL can be replicated (Litestream, Turso use this)
Connection strategy for sync:
- One writer connection (serialized writes via application-level queue)
- Multiple reader connections (parallel reads for UI and sync)
- Never share a single connection between UI thread and sync thread
23.2 Operation Queue Pattern
Queue every mutation as a discrete operation. Replay the queue when connectivity returns.
Queue table:
CREATE TABLE outbox (
id TEXT PRIMARY KEY,
type TEXT NOT NULL, -- 'create_task', 'update_task', 'delete_task'
payload TEXT NOT NULL, -- JSON with all data needed to replay
idempotency_key TEXT NOT NULL UNIQUE,
status TEXT NOT NULL DEFAULT 'pending', -- 'pending', 'sending', 'done', 'failed'
attempt_count INTEGER NOT NULL DEFAULT 0,
next_attempt_at INTEGER NOT NULL DEFAULT 0,
created_at INTEGER NOT NULL
);
CREATE INDEX idx_outbox_pending
ON outbox(status, next_attempt_at)
WHERE status = 'pending';
Critical rule: The UI write and the outbox entry must be in the same transaction. If it is on screen, it is in the outbox.
Event-sourcing variant: Instead of storing record snapshots, store operations:
{"type": "set_logged", "set_id": "abc", "reps": 10, "weight": 135}
{"type": "set_deleted", "set_id": "abc"}
{"type": "workout_renamed", "workout_id": "xyz", "name": "Leg Day"}
The server replays these events to reconstruct state. This makes sync equivalent to event replay and naturally supports undo/redo.
23.3 Optimistic UI Updates
The UI always reads from the local SQLite database, never from the network. Changes appear instantly.
Pattern:
User Action
|
v
Write to local SQLite + Enqueue to outbox (one transaction)
|
v
UI reads from SQLite (instant update)
|
v (background, asynchronous)
Sync worker sends outbox to server
|
v
Server confirms or rejects
|
v
If rejected: roll back local change, notify user
If confirmed: mark outbox entry as done
Rollback on server rejection:
def handle_server_rejection(outbox_entry, server_response):
db.execute("BEGIN TRANSACTION")
# Revert the local change
if outbox_entry.type == 'create_task':
db.execute("DELETE FROM tasks WHERE id = ?", [outbox_entry.record_id])
elif outbox_entry.type == 'update_task':
# Restore from server's version
apply_server_version(server_response.current_record)
# Remove from outbox
db.execute("UPDATE outbox SET status = 'failed' WHERE id = ?", [outbox_entry.id])
db.execute("COMMIT")
notify_user("Your change could not be saved: " + server_response.reason)
23.4 Schema Migrations When Device Is Offline
Devices may be offline when a new app version with schema changes is released.
Migration strategy using user_version pragma:
def migrate_database(db):
db.execute("PRAGMA user_version")
current_version = db.fetchone()[0]
if current_version < 1:
db.execute("ALTER TABLE tasks ADD COLUMN priority TEXT DEFAULT 'medium'")
db.execute("CREATE INDEX idx_tasks_priority ON tasks(priority)")
db.execute("PRAGMA user_version = 1")
if current_version < 2:
db.execute("""
CREATE TABLE task_tags (
task_id TEXT NOT NULL REFERENCES tasks(id),
tag TEXT NOT NULL,
PRIMARY KEY (task_id, tag)
)
""")
db.execute("PRAGMA user_version = 2")
if current_version < 3:
# Complex migration: rename column (SQLite < 3.25 workaround)
db.execute("BEGIN TRANSACTION")
db.execute("ALTER TABLE tasks RENAME TO tasks_old")
db.execute("""
CREATE TABLE tasks (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
description TEXT, -- renamed from 'content'
-- ... all other columns
)
""")
db.execute("""
INSERT INTO tasks (id, title, description, ...)
SELECT id, title, content, ... FROM tasks_old
""")
db.execute("DROP TABLE tasks_old")
db.execute("COMMIT")
db.execute("PRAGMA user_version = 3")
Migration tracking table (alternative to pragma):
CREATE TABLE IF NOT EXISTS schema_migrations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
applied_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);
Rules for sync-compatible migrations:
- Only add columns -- never remove or rename without a migration path
- New columns must have DEFAULT values (CRDTs require this)
- Migrations must be idempotent (safe to run twice)
- Wrap each migration in a transaction
- Always back up before migrating
- Test migrations against databases at every previous version
Sources:
- SQLite Versioning and Migration Strategies (SQLite Forum)
- Android SQLite Database Migration (Medium)
23.5 Data Expiry and Cache Invalidation
Local SQLite databases grow unbounded without maintenance.
Pruning synced data:
-- Remove successfully synced outbox entries older than 30 days
DELETE FROM sync_queue
WHERE synced = 1
AND created_at < strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-30 days');
-- Remove old change log entries
DELETE FROM change_log
WHERE synced = 1
AND changed_at < strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-30 days');
-- Purge tombstones older than 90 days (confirmed synced)
DELETE FROM tasks WHERE is_deleted = 1
AND updated_at < strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-90 days')
AND last_synced_at IS NOT NULL;
VACUUM after pruning:
-- Reclaim disk space (locks database, copies and rebuilds)
VACUUM;
-- Check space usage before deciding to vacuum
SELECT page_count * page_size AS total_size,
freelist_count * page_size AS free_space
FROM pragma_page_count(), pragma_page_size(), pragma_freelist_count();
Incremental vacuum (less disruptive alternative):
PRAGMA auto_vacuum = INCREMENTAL;
PRAGMA incremental_vacuum(100); -- free up to 100 pages
Cache invalidation strategy:
- Track
last_full_synctimestamp - If
last_full_syncis older than threshold (e.g., 7 days), do a full sync instead of delta - If server sends a "schema changed" signal, drop and rebuild local tables
- Monitor storage quota (especially important in browsers):
const estimate = await navigator.storage.estimate();
const percentUsed = (estimate.usage / estimate.quota) * 100;
if (percentUsed > 80) {
// Trigger aggressive pruning
await pruneOldSyncData();
}
24. Type Mapping Between SQLite and Server Databases
24.1 Complete Type Mapping Reference
| Data Concept | SQLite Storage | SQLite DDL | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|---|---|
| UUID | TEXT | id TEXT PRIMARY KEY |
UUID |
CHAR(36) |
UNIQUEIDENTIFIER |
| Boolean | INTEGER | is_active INTEGER DEFAULT 0 |
BOOLEAN |
TINYINT(1) |
BIT |
| Timestamp (UTC) | TEXT | created_at TEXT |
TIMESTAMPTZ |
DATETIME |
DATETIMEOFFSET |
| Date only | TEXT | birth_date TEXT |
DATE |
DATE |
DATE |
| Integer | INTEGER | count INTEGER |
INTEGER / BIGINT |
INT / BIGINT |
INT / BIGINT |
| Decimal | TEXT or REAL | price TEXT |
NUMERIC(10,2) |
DECIMAL(10,2) |
DECIMAL(10,2) |
| Float | REAL | latitude REAL |
DOUBLE PRECISION |
DOUBLE |
FLOAT |
| Short text | TEXT | name TEXT |
VARCHAR(255) |
VARCHAR(255) |
NVARCHAR(255) |
| Long text | TEXT | description TEXT |
TEXT |
TEXT |
NVARCHAR(MAX) |
| JSON | TEXT | metadata TEXT |
JSONB |
JSON |
NVARCHAR(MAX) |
| Binary data | BLOB | avatar BLOB |
BYTEA |
LONGBLOB |
VARBINARY(MAX) |
| Enum | TEXT | status TEXT CHECK(...) |
VARCHAR + CHECK |
ENUM(...) |
VARCHAR + CHECK |
24.2 Date/Time Format Alignment
Critical rule: Always store and transmit dates as ISO-8601 UTC strings.
-- SQLite: store as ISO-8601 TEXT
INSERT INTO events (id, event_date)
VALUES ('evt-1', strftime('%Y-%m-%dT%H:%M:%fZ', 'now'));
-- Result: '2026-04-03T14:30:45.123Z'
-- PostgreSQL: parse from ISO-8601
INSERT INTO events (id, event_date)
VALUES ('evt-1', '2026-04-03T14:30:45.123Z'::timestamptz);
Conversion helpers:
-- SQLite: ISO-8601 TEXT to Unix timestamp
SELECT strftime('%s', '2026-04-03T14:30:45Z'); -- 1775148645
-- SQLite: Unix timestamp to ISO-8601 TEXT
SELECT strftime('%Y-%m-%dT%H:%M:%fZ', 1775148645, 'unixepoch');
-- SQLite: Compare dates stored as TEXT (works because ISO-8601 is lexicographically sortable)
SELECT * FROM events WHERE event_date > '2026-01-01T00:00:00Z';
Pitfalls:
- SQLite has no TIMESTAMP type -- it stores dates as TEXT, REAL, or INTEGER
- ISO-8601 TEXT comparison requires consistent formatting (always use leading zeros, always include 'Z' suffix)
- Never mix timestamp formats in the same column
- Always convert to UTC before storing; convert to local time only at the display layer
- PostgreSQL
TIMESTAMP(without time zone) andTIMESTAMPTZbehave differently -- always useTIMESTAMPTZfor synced data
24.3 JSON Handling Differences
-- SQLite: JSON stored as TEXT, queried with json1 functions
SELECT json_extract(metadata, '$.color') FROM tasks WHERE id = ?;
SELECT metadata ->> 'color' FROM tasks WHERE id = ?; -- SQLite 3.38+
-- PostgreSQL: JSON stored as JSONB (binary, indexed)
SELECT metadata->>'color' FROM tasks WHERE id = ?;
-- SQLite: index on JSON field
CREATE INDEX idx_tasks_color ON tasks(json_extract(metadata, '$.color'));
-- PostgreSQL: GIN index on JSONB
CREATE INDEX idx_tasks_metadata ON tasks USING GIN (metadata);
Compatibility notes:
- SQLite's
->and->>operators are designed to be compatible with both MySQL and PostgreSQL syntax - SQLite JSONB format is NOT binary-compatible with PostgreSQL JSONB -- it is a different on-disk format
- Always validate JSON on both sides -- SQLite's json1 returns NULL for invalid JSON, PostgreSQL raises an error
24.4 Boolean Handling
-- SQLite: booleans are integers
INSERT INTO tasks (id, is_complete) VALUES ('task-1', 0);
SELECT * FROM tasks WHERE is_complete = 1;
-- PostgreSQL: native boolean
INSERT INTO tasks (id, is_complete) VALUES ('task-1', FALSE);
SELECT * FROM tasks WHERE is_complete = TRUE;
Sync conversion layer:
def sqlite_to_postgres(record):
"""Convert SQLite record to PostgreSQL-compatible values."""
converted = dict(record)
for col in boolean_columns:
converted[col] = bool(converted[col]) # 0/1 -> False/True
for col in timestamp_columns:
# TEXT -> datetime object (PostgreSQL driver handles conversion)
converted[col] = datetime.fromisoformat(converted[col])
return converted
def postgres_to_sqlite(record):
"""Convert PostgreSQL record to SQLite-compatible values."""
converted = dict(record)
for col in boolean_columns:
converted[col] = int(converted[col]) # True/False -> 1/0
for col in timestamp_columns:
converted[col] = record[col].isoformat() # datetime -> TEXT
return converted
Sources:
- SQLite Data Types (w3resource)
- Datatypes in SQLite (sqlite.org)
- Handling Timestamps in SQLite (sqlite.ai)
- Drizzle ORM SQLite Column Types
- Drizzle ORM PostgreSQL Column Types
25. SQLite Sync Tools and Extensions
25.1 SQLite Session Extension (sqlite3session)
Built into SQLite. Records changes to attached tables and packages them as binary changesets.
Key capabilities:
- Captures INSERT, UPDATE, DELETE as binary blobs
- Changesets can be applied to other databases with the same schema
- Built-in conflict handler callback
- Supports changeset inversion (undo)
- Supports changeset concatenation (batch multiple sessions)
Conflict handler callback types:
| Conflict Type | Trigger Condition |
|---|---|
| SQLITE_CHANGESET_DATA | UPDATE/DELETE: row exists but non-PK values don't match |
| SQLITE_CHANGESET_NOTFOUND | UPDATE/DELETE: no row with matching PK |
| SQLITE_CHANGESET_CONFLICT | INSERT: row with matching PK already exists |
| SQLITE_CHANGESET_CONSTRAINT | Change violates UNIQUE or CHECK constraint |
Limitations:
- Tables must have a declared PRIMARY KEY
- Virtual tables not supported
- NULL values in PK columns are ignored (rows not captured)
- Requires compile-time flags:
-DSQLITE_ENABLE_SESSION -DSQLITE_ENABLE_PREUPDATE_HOOK
Source: SQLite Session Extension (sqlite.org)
25.2 CR-SQLite (CRDT-Based Merge)
Run-time loadable extension that adds multi-master replication via CRDTs.
Setup:
.load crsqlite
CREATE TABLE documents (id TEXT PRIMARY KEY NOT NULL, title TEXT, content TEXT);
SELECT crsql_as_crr('documents');
Sync between two databases:
-- On Device A: export changes
SELECT * FROM crsql_changes WHERE db_version > ?;
-- On Device B: import Device A's changes
INSERT INTO crsql_changes VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?);
Schema changes on CRR tables:
SELECT crsql_begin_alter('documents');
ALTER TABLE documents ADD COLUMN status TEXT;
SELECT crsql_commit_alter('documents');
CRDT algorithms per column: LWW (last-write-wins), fractional indices (list ordering), observe-remove sets (row presence). Counter and rich-text CRDTs in development.
Source: cr-sqlite (GitHub)
25.3 Litestream
Streaming WAL-based replication for disaster recovery. Not a sync tool -- it replicates a single SQLite database to cloud storage (S3, Azure Blob, SFTP).
How it works:
- Takes over SQLite's WAL checkpointing process
- Starts a long-running read transaction to prevent other processes from checkpointing
- Continuously copies new WAL pages to a "shadow WAL" directory
- Streams shadow WAL to configured storage backends
- Periodically takes full snapshots; prunes old WAL files
Key configuration:
dbs:
- path: /path/to/app.db
replicas:
- type: s3
bucket: my-backup-bucket
path: app.db
retention: 72h
Live read replicas: Litestream can stream changes to read-only replicas on other servers, applying changes in a transactionally-safe manner.
When to use: Server-side SQLite backup and disaster recovery. Not designed for multi-writer sync or device-to-server sync.
Source: Litestream (litestream.io)
25.4 ElectricSQL
Syncs Postgres with client-side SQLite using the Postgres logical replication WAL.
Architecture:
- Electric service sits between Postgres and clients
- Reads Postgres WAL via logical replication
- Streams changes to client-side SQLite (browser via WASM, mobile via native SQLite)
- Client writes go through Electric back to Postgres
- Conflict resolution via CRDTs (last-writer-wins semantics)
Key characteristic: "Direct-to-Postgres" -- writes bypass your application backend. Validation must happen via Postgres constraints and DDLX rules.
Trade-offs:
- Pro: No backend code needed for sync
- Con: Cannot inject custom business logic on the write path
- Con: Modifies Postgres schema (adds shadow tables, triggers)
- Con: Requires SUPERUSER database privileges
Source: ElectricSQL Alternatives (electric-sql.com)
25.5 PowerSync
Postgres-to-SQLite sync engine with server-authoritative write path.
Architecture:
- PowerSync Service connects to Postgres via logical replication (read-only)
- Streams data to client-side SQLite based on configurable "Sync Rules"
- Client writes go to a local upload queue, then through YOUR backend
- Your backend applies business logic, validation, authorization
- Changes committed to Postgres flow back to all clients via PowerSync
Key differentiator: You control the write path. The server can reject, transform, or merge client writes with custom logic.
Sync Rules (dynamic partial replication):
# Only sync tasks belonging to the current user
- table: tasks
filter: "user_id = token_parameters.user_id"
Consistency model: Causal+ consistency via checkpoint-based synchronization. Clients update state atomically when receiving all data matching a checkpoint.
Supported backends: Postgres (GA), MongoDB (GA), MySQL (planned)
Sources:
25.6 Turso / libSQL
Fork of SQLite with built-in replication and embedded replicas.
Embedded replicas architecture:
- Source of truth is the remote Turso database
- Local SQLite copy on each device for zero-latency reads
- Sync uses frame-based WAL replication (1 frame = 4KB page)
- Guarantees read-your-writes semantics
Sync strategies:
- Manual sync: call
client.sync()when desired - Periodic sync: configure
syncIntervalfor automatic polling - Offline writes: write to local WAL, push when connected, pull to reconcile
Conflict resolution options:
FAIL_ON_CONFLICT-- reject and require explicit handlingDISCARD_LOCAL-- server-wins (discard local changes)REBASE_LOCAL-- replay local changes on top of server stateMANUAL_RESOLUTION-- callback withlocalDataandremoteData
const client = createClient({
url: 'local.db',
syncUrl: 'libsql://remote.turso.io',
authToken: '...',
});
await client.execute('INSERT INTO tasks VALUES (?)', ['task-1']);
await client.sync({ strategy: SyncStrategy.REBASE_LOCAL });
Source: Turso Offline Writes (turso.tech)
25.7 SQLite Sync (sqlite.ai)
CRDT-based extension that syncs SQLite with SQLite Cloud, PostgreSQL, and Supabase.
Setup:
.load cloudsync
-- Enable CRDT sync on a table
SELECT cloudsync_init('tasks');
-- Use UUIDv7 for primary keys
INSERT INTO tasks (id, title) VALUES (cloudsync_uuid(), 'New task');
-- Connect and sync
SELECT cloudsync_network_init('your-database-id');
SELECT cloudsync_network_set_apikey('your-api-key');
SELECT cloudsync_network_sync();
CRDT algorithm options:
cls(Causal-Length Set) -- defaultdws(Delete-Wins Set)aws(Add-Wins Set)gos(Grow-Only Set)
Block-level LWW for text columns:
-- Enable per-line conflict resolution on a text column
SELECT cloudsync_set_column('notes', 'body', 'algo', 'block');
SELECT cloudsync_set_column('notes', 'body', 'delimiter', '\n');
-- After sync, materialize the merged text
SELECT cloudsync_text_materialize('notes', 'body', 'note-001');
Schema requirements:
- All NOT NULL columns must have DEFAULT values
- TEXT primary keys with UUIDv7 recommended
- Must call
cloudsync_begin_alter/cloudsync_commit_alterbefore/after ALTER TABLE
Source: sqlite-sync API (GitHub)
Tool Comparison Matrix
| Feature | Session Extension | cr-sqlite | Litestream | ElectricSQL | PowerSync | Turso | sqlite-sync |
|---|---|---|---|---|---|---|---|
| Sync direction | Manual | Bidirectional | One-way (backup) | Bidirectional | Bidirectional | Bidirectional | Bidirectional |
| Conflict resolution | Callback | CRDT (automatic) | N/A | CRDT (LWW) | Custom (your backend) | Multiple strategies | CRDT (automatic) |
| Server DB | Any | Any | N/A (storage) | Postgres only | Postgres, MongoDB | Turso Cloud | SQLite Cloud, PG, Supabase |
| Offline writes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Custom write logic | Yes | No | N/A | No (PG constraints) | Yes (your backend) | Partial | No |
| Setup complexity | Low (C API) | Low (extension) | Low (config) | Medium | Medium | Low | Low (extension) |
| Maturity | Stable (part of SQLite) | Beta | Stable | Production | Production | Beta (offline) | Beta |
26. Real-World Sync Architectures
26.1 Notion
Architecture: Notion built their entire client-side data layer on SQLite compiled to WebAssembly.
Key design decisions:
- Uses OPFS SyncAccessHandle Pool VFS for browser persistence
- SharedWorker coordinates SQLite access across browser tabs -- only one tab writes at a time
- Web Locks API detects closed tabs
- The local SQLite database serves as a read cache; the server is source of truth
- Every server update writes to the local cache
- Navigation queries race SQLite and API requests on slower devices
Performance results:
- 20% improvement in page navigation times across all browsers
- 28-33% faster for users in high-latency regions (Australia, China, India)
Notable trade-off: Initial page load deliberately skips SQLite caching because downloading the WASM library is slower than the first API call. SQLite only accelerates subsequent navigations.
Source: How We Sped Up Notion with WASM SQLite (Notion Blog)
26.2 Linear
Architecture: Custom sync engine with centralized server authority.
Key design decisions:
- All operations pass through the server, which assigns sequential sync IDs (monotonically increasing integers)
- Sync IDs serve as the global version number of the database
- Clients send transactions to the server; server broadcasts delta packets to all clients
- Delta packets may differ from the original transaction (server can add side-effects)
- Uses MobX for reactive UI updates from local store
- IndexedDB for browser-side persistence
Data model:
- Models defined with TypeScript decorators (
@ClientModel) - Seven property types: property, ephemeralProperty, reference, referenceModel, referenceCollection, backReference, referenceArray
- Lazy loading for properties not needed at bootstrap
- Schema hash for instant detection of schema mismatches
Offline support:
- Transactions cached in IndexedDB during disconnection
- Automatically resent on reconnection
- Reversible transactions enable client-side rollback if server rejects
Conflict resolution: Last-writer-wins for simple fields; server-authoritative ordering via sync IDs eliminates most conflicts by serializing all operations.
Sources:
26.3 Figma
Architecture: Client/server with custom CRDT-inspired approach.
Key design decisions:
- Clients connect to servers via WebSockets
- Each document gets a separate server process
- Inspired by CRDTs but not a pure CRDT implementation
- Uses server authority for operation ordering (closer to OT than full CRDT)
- CRDTs provide eventual consistency guarantees: if no more updates, all clients converge
Collaboration model:
- Server receives operations, validates against authoritative state
- Transforms operations to resolve concurrent conflicts
- Broadcasts transformed operations to all connected clients
- Client-side operations are applied optimistically (instant UI)
Source: How Figma's Multiplayer Technology Works (Figma Blog)
26.4 Mobile Apps (iOS/Android)
iOS pattern (Core Data + CloudKit):
NSPersistentCloudKitContainerbridges Core Data (backed by SQLite) with iCloud- Only SQLite-type persistent stores can be synchronized
- Requires Persistent History Tracking enabled
- Supports three database tiers: private, shared, public
- Multiple SQLite stores with separate configurations control what syncs
Android pattern (Room + Sync Adapter):
- Room provides a type-safe abstraction over SQLite
- SyncAdapter framework handles background sync with system-managed scheduling
- ContentProvider mediates between SyncAdapter and the private SQLite database
- System batches sync operations for battery efficiency
Cross-platform pattern (React Native / Flutter):
- SQLite via
expo-sqliteorsqflitepackages - Outbox queue table for pending mutations
- NetInfo API to detect connectivity changes
- Background sync on reconnect
- PowerSync or ElectricSQL for managed sync infrastructure
26.5 Desktop Apps (Electron / Tauri)
Electron:
- better-sqlite3 or sql.js for SQLite access
- RxDB provides reactive queries with IndexedDB or SQLite adapters
- Sync via REST APIs or WebSocket connections
- Multiple windows share one SQLite database via main process IPC
Tauri:
- Rust backend with
sqlxcrate for direct SQLite access - Tauri SQL plugin for cross-platform SQLite
- Drizzle ORM: frontend computes SQL queries, sends to Rust backend for execution
- Turso embedded replicas for managed sync
Shared desktop pattern:
- SQLite database in the app's data directory
- WAL mode enabled for concurrent read/write
- Sync worker runs in a background thread
- Outbox pattern for offline writes
- Delta sync for efficient bandwidth usage
Sources:
- Expo SQLite Guide (Expo Documentation)
- Electron Database (RxDB)
- Drizzle + SQLite in Tauri (DEV)
- Android Room Database (Android Developers)
27. Performance Considerations for Sync
27.1 Batch Size for Sync Operations
Recommendations:
| Context | Batch Size | Rationale |
|---|---|---|
| Mobile (unstable network) | 50-100 records | Smaller batches survive connection drops |
| Desktop (stable network) | 500-1000 records | Larger batches reduce HTTP overhead |
| Initial bootstrap | 1000-5000 records | Fast initial sync is critical for UX |
| Background sync | 100-500 records | Balance throughput with UI responsiveness |
JSON bulk operations (PowerSync pattern):
-- Bulk insert via JSON: single statement, no parameter limit issues
INSERT INTO tasks (id, title, status)
SELECT e->>'id', e->>'title', e->>'status'
FROM json_each(?) e;
-- Bulk update via JSON
WITH data AS (
SELECT e->>'id' AS id, e->>'title' AS title, e->>'status' AS status
FROM json_each(?) e
)
UPDATE tasks
SET title = data.title, status = data.status
FROM data
WHERE tasks.id = data.id;
-- Bulk delete via JSON
DELETE FROM tasks
WHERE id IN (SELECT e.value FROM json_each(?) e);
27.2 Transaction Management During Sync
Rule: Wrap each sync batch in a single transaction.
def apply_sync_batch(changes):
db.execute("BEGIN IMMEDIATE") # IMMEDIATE to acquire write lock upfront
try:
for change in changes:
apply_change(db, change)
db.execute("COMMIT")
except Exception:
db.execute("ROLLBACK")
raise
Why IMMEDIATE for sync transactions: BEGIN IMMEDIATE acquires the write lock at the start of the transaction, not at the first write statement. This prevents SQLITE_BUSY errors mid-transaction, which would require rolling back and retrying the entire batch.
Connection strategy:
- Single write connection with an application-level queue (prevents SQLITE_BUSY)
- Multiple read connections for UI queries
- Never hold a write transaction open while waiting for network I/O
Transaction batching benchmark:
- Without transaction: each INSERT triggers an fsync (30ms+ per operation)
- With transaction wrapping: 2-20x throughput improvement
- WAL mode + synchronous=NORMAL: reduces per-transaction overhead from 30ms+ to under 1ms
27.3 Index Strategy for Sync Metadata Columns
Essential indexes for sync:
-- Find unsynced records (most important sync query)
CREATE INDEX idx_tasks_sync_status
ON tasks(last_synced_at, updated_at)
WHERE last_synced_at IS NULL OR updated_at > last_synced_at;
-- Outbox: find pending entries
CREATE INDEX idx_outbox_pending
ON outbox(status, next_attempt_at)
WHERE status = 'pending';
-- Change log: find unsynced changes
CREATE INDEX idx_changelog_unsynced
ON change_log(synced, changed_at)
WHERE synced = 0;
-- Soft-deleted records
CREATE INDEX idx_tasks_deleted ON tasks(is_deleted);
-- Version-based sync
CREATE INDEX idx_tasks_version ON tasks(version);
Partial indexes (SQLite 3.8.0+) reduce index size: Only index the rows that matter for sync queries.
Use EXPLAIN QUERY PLAN to verify:
EXPLAIN QUERY PLAN
SELECT * FROM tasks
WHERE last_synced_at IS NULL OR updated_at > last_synced_at;
-- Should show "SEARCH ... USING INDEX" not "SCAN"
27.4 WAL Mode Benefits for Concurrent Sync Reads/Writes
Essential PRAGMA configuration for sync:
PRAGMA journal_mode = WAL; -- enable write-ahead logging
PRAGMA synchronous = NORMAL; -- safe for app crashes, fast commits
PRAGMA busy_timeout = 5000; -- wait 5s instead of failing immediately
PRAGMA journal_size_limit = 6144000; -- limit WAL file to ~6MB
PRAGMA cache_size = -2000; -- 2MB page cache (negative = KB)
PRAGMA foreign_keys = ON; -- enforce referential integrity
WAL concurrency model:
- Multiple readers can proceed simultaneously
- One writer at a time (SQLITE_BUSY if contended)
- Readers do not block writers
- Writers do not block readers
- Readers see a consistent snapshot from when they started
WAL checkpoint management:
-- Default: auto-checkpoint every 1000 pages
PRAGMA wal_autocheckpoint = 1000;
-- Manual checkpoint (run periodically or after large sync batches)
PRAGMA wal_checkpoint(PASSIVE); -- doesn't block, checkpoints what it can
PRAGMA wal_checkpoint(TRUNCATE); -- blocks briefly, resets WAL to zero size
Checkpoint strategy for sync:
- Use PASSIVE checkpoints during normal operation
- Use TRUNCATE checkpoint after large sync batches (reduces WAL file size)
- Run checkpoints in a separate thread/connection to avoid blocking UI reads
27.5 Database Size Management
Monitoring database size:
-- Total size and free space
SELECT page_count * page_size AS total_bytes,
freelist_count * page_size AS free_bytes
FROM pragma_page_count(), pragma_page_size(), pragma_freelist_count();
Pruning strategy:
-- 1. Purge synced outbox entries (keep 7 days for debugging)
DELETE FROM outbox WHERE status = 'done'
AND created_at < strftime('%s', 'now', '-7 days') * 1000;
-- 2. Purge old change log entries
DELETE FROM change_log WHERE synced = 1
AND changed_at < strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-30 days');
-- 3. Hard-delete confirmed tombstones
DELETE FROM tasks WHERE is_deleted = 1
AND last_synced_at IS NOT NULL
AND updated_at < strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-90 days');
-- 4. Reclaim space (only if significant free space exists)
PRAGMA incremental_vacuum(500); -- free up to 500 pages
When to VACUUM:
- After deleting 25%+ of database content
- Run during app idle time or on app launch
- VACUUM locks the database and requires 2x the database size in free disk space
- Prefer
PRAGMA incremental_vacuumfor gradual, non-blocking reclamation
Sources:
- SQLite Optimizations for Ultra High-Performance (PowerSync)
- Best Practices for SQLite Performance (Android Developers)
- Write-Ahead Logging (sqlite.org)
- SQLite Performance Tuning (phiresky)
- SQLite WAL Mode for Mobile Apps (DEV)
Part V: Appendices
28. Collected References
Schema Design
- Datatypes In SQLite
- STRICT Tables
- The Advantages of Flexible Typing
- Rowid Tables
- SQLite Autoincrement
- WITHOUT ROWID Tables
- SQLite Foreign Key Support
- CREATE TABLE: CHECK constraints
- SQLite Keywords
- Database Naming Standards (Ovid)
- SQL Naming Conventions (bbkane)
- UUID vs Auto-Increment (Bytebase)
- Database Design Patterns (Bytebase)
- Database Schema Design Patterns for SQLite
Performance
- SQLite Query Planning
- SQLite Query Optimizer
- SQLite EXPLAIN QUERY PLAN
- SQLite Partial Indexes
- SQLite Indexes on Expressions
- SQLite Generated Columns
- SQLite WAL Documentation
- SQLite PRAGMA Documentation
- SQLite Transaction Documentation
- SQLite JSON Functions
- CREATE TRIGGER
- Fast SQLite Inserts (avi.im)
- SQLite Optimizations for Ultra High-Performance (PowerSync)
- SQLite Performance Tuning (phiresky)
- SQLite PRAGMA Cheatsheet
- High Performance SQLite Recommended PRAGMAs
- Fly.io SQLite WAL Internals
- Use The Index, Luke - Insert
- Deep Dive into SQLite Query Optimizer
- SQLite JSON and Denormalization
- SQLite JSON Virtual Columns + Indexing
Operations
- SQLite Backup API
- SQLite ALTER TABLE documentation
- Internal Versus External BLOBs
- 35% Faster Than The Filesystem
- How To Corrupt An SQLite Database File
- Recovering Data From A Corrupt SQLite Database
- SQLite Date & Time Functions
- How SQLite Is Tested
- Appropriate Uses For SQLite
- Backup strategies for SQLite in production
- SQLite DB Migrations with PRAGMA user_version
- Simple declarative schema migration for SQLite
- OWASP SQL Injection Prevention Cheat Sheet
Sync
- SQLite Session Extension (sqlite.org)
- cr-sqlite (GitHub)
- Litestream (litestream.io)
- ElectricSQL (electric-sql.com)
- PowerSync v1.0 (powersync.com)
- Turso Offline Writes (turso.tech)
- sqlite-sync API (GitHub)
- Handling Timestamps in SQLite (sqlite.ai)
- Hybrid Logical Clocks (Sergei Turukin)
- Outbox Pattern (Milan Jovanovic)
- Transactional Outbox Pattern (microservices.io)
- CRDT Dictionary (Ian Duncan)
- CRDT Implementations (crdt.tech)
- OT vs CRDT Comparison (thom.ee)
- How We Sped Up Notion with WASM SQLite (Notion Blog)
- Reverse Engineering Linear's Sync Engine (GitHub)
- How Figma's Multiplayer Technology Works (Figma Blog)
- Expo SQLite Guide (Expo Documentation)
- Electron Database (RxDB)
- Android Room Database (Android Developers)
- Best Practices for SQLite Performance (Android Developers)
29. Decision Frameworks
Choosing a Sync Strategy
Is the server always the source of truth?
├── YES: Pull-only sync (server-wins)
│ └── Tools: PowerSync, Turso embedded replicas
└── NO: Bidirectional sync needed
│
Can you tolerate silent data loss on conflict?
├── YES: Last-Write-Wins
│ └── Simple timestamp-based resolution
└── NO:
│
Do users edit the same fields concurrently?
├── RARELY: Field-level merge
│ └── Three-way merge with base version
├── OFTEN: CRDTs or OT
│ └── Tools: cr-sqlite, ElectricSQL, sqlite-sync
└── NEVER (single-user-per-record): Version-based optimistic concurrency
└── Conflict queue for edge cases
Choosing a Sync Tool
Do you need custom business logic on writes?
├── YES: PowerSync (your backend handles writes)
└── NO:
│
Is your server database Postgres?
├── YES:
│ ├── Want CRDTs? → ElectricSQL
│ ├── Want control? → PowerSync
│ └── Want simplicity? → Turso + embedded replicas
└── NO / Multiple DBs:
├── SQLite-to-SQLite: cr-sqlite or sqlite-sync
├── Need backup only: Litestream
└── Custom: SQLite Session Extension + your own sync layer
Compiled from official SQLite documentation, benchmarks, and practitioner sources. Last updated April 2026.