Imagine your web app lives on Azure Kubernetes Services (AKS). It needs fast, persistent data reads and writes, but your persistence layer is Snowflake — not exactly built for speed.
Back in 2023, the answer was open-source Redis v6.x with RedisGears handling write-behind to Snowflake. It worked — but AKS security constraints made the setup complicated, fragile, and painful to touch. Then Redis dropped its open-source license in 2024. Suddenly there are no security patches. RedisGears support dries up. By mid-2025, a Snowflake table autoincrement bug takes down the write-behind process entirely.
The solution is quietly falling apart.
Today the answer is Redis OSS v8.x. When open-source Redis returned in 2025, you rebuilt and modernized: Stream data type instead of String, configurable write-behind, and automatic restore from Snowflake for when a persistent volume goes missing.
You can preserve all user data through the migration. Utilities populate Snowflake from Redis dump files; historical data was loaded before cutover to keep downtime minimal. And you can transfer ten thousand records in under six minutes using temporary tables and bulk loading.
What You're Building
A Redis OSS cache that asynchronously moves user activity from Redis into Snowflake, running inside a Kubernetes pod with two containers:
- Redis (main) — stores data in a stream.
- Consumer (sidecar) — reads the stream and writes rows to Snowflake.
How It Flows
- Your UI or API sends a JSON batch with
XADDtouserdata-stream. - Redis stores each batch as a stream entry.
- The consumer reads entries, builds MERGE / UPDATE SQL, and writes to Snowflake.
- A freshness window skips stale records unless you tell it otherwise.
- On an empty stream, an optional restore step seeds Redis from Snowflake.
flowchart LR
UI["APP<br>UI"] <--> UH["Userdata<br>Handler<br>API"]
UH <-->|Synchronous<br>CRUD| REDIS[("<br>REDIS<br>CACHE<br>(Stream)")]
REDIS -->|Asynchronous<br>CRUD| CONSUMER[CONSUMER]
CONSUMER --> DB[(<br>SNOWFLAKE<br>TABLE)]
DB -.->|Restore on demand| REDIS
Terminology
| Term | Meaning |
|---|---|
| Visited | View / open action |
| Saved | Bookmark / saved search |
| Deleted | Saved turned off (soft delete) |
| Fresh | Inside freshness window |
| Stale | Outside freshness window (skip unless override) |
Stream Shapes
Create User:
{"user_email":"user.name@example.com"}
Track Visited / Saved batch:
{
"key": "userdata-events:user.name-abc123/visited",
"records": [
{
"id": "user.name-abc123",
"page_name": "search",
"query_row_count": "25",
"track_id": "item-01",
"track_title": "Report 01",
"track_type": "REPORT",
"track_url": "/search?q=item-01",
"lastplus": "2025-11-01 10:20:30",
"action": "v"
}
]
}
Visited items use key segment visited and action v. Saved uses /saved and action s. Delete uses action d and sets saved=0 in Snowflake.
If track_id is blank and HASH_NULL_TRACK_ID=1, the consumer builds a stable auto_<md5prefix> from title|url|id. Leave it disabled and you'll store an empty string instead.
Key Rules
| Part | Rule |
|---|---|
| Key path | /visited or /saved decides logic |
| Page suffix | If KEY_APPEND_PAGE=1 visited restore keys include /visited/ |
| Action | v visited, s saved, d delete |
| Timestamp | lastplus UTC YYYY-MM-DD HH:MM:SS |
| Stale | Outside freshness window (skip unless override) |
Environment Variables
| Var | Default | Purpose |
|---|---|---|
| ACK_INVALID_JSON | 0 | Ack invalid JSON messages |
| ALLOW_NO_SNOWFLAKE | 0 | Force dry run if Snowflake secrets incomplete |
| ALWAYS_PROCESS_STALE_FIRST | 0 | Process first stale record in a batch |
| AUTO_QUOTE_LASTPLUS | 0 | Quote bare lastplus tokens / datetimes |
| CONFIG_MAX_ATTEMPTS | 12 | Snowflake config retry attempts |
| CONFIG_SLEEP_SECONDS | 5 | Sleep between config retries |
| CONSUMER | consumer-1 | Redis consumer name |
| DRY_RUN | 0 | Log SQL only, no execute |
| ENABLE_FALLBACK_RECORD_LIST | 1 | Treat pure list of tracked dicts as batch |
| ENABLE_FALLBACK_SINGLE_RECORD | 1 | Treat lone tracked dict as single-record batch |
| FALLBACK_JSON_FIELDS | value,payload,data | Additional JSON field names to inspect |
| GROUP | snowflake-writer | Redis consumer group |
| HASH_NULL_TRACK_ID | 0 | Hash blank track_id into auto_<hash> |
| KEY_APPEND_PAGE | 1 | Append page segment to visited restore keys |
| KEY_PREFIX | userdata-events | Prefix for stream batch keys |
| LENIENT_JSON_REPAIR | 0 | Attempt brace/bracket repair on truncated JSON |
| LIST_STREAM_KEYS_ON_START | 0 | List existing stream keys at startup |
| LOG_LEVEL | INFO | Logging level |
| LOG_RAW | 0 | Log raw Redis messages |
| LOG_SQL | 0 | Log SQL statements |
| LOG_TRACKED_EACH | 0 | Log every tracked record execution |
| LOG_TRACKED_HASH_GEN | 0 | Log generated hash track_ids |
| MAX_JSON_REPAIR_APPEND | 10 | Max appended chars during repair |
| METRICS_INTERVAL_SECONDS | 60 | Metrics logging interval |
| NEWER_RECORD_TIME_LIMIT_SECONDS | 30 | Freshness window seconds |
| PRESERVE_TRACK_TYPE_CASE | 0 | Keep original case of track_type |
| PROCESS_STALE_SAVED | 1 | Allow stale saved records |
| REDIS_DB | 0 | Redis DB index |
| REDIS_HOST | localhost | Redis host |
| REDIS_PORT | 6379 | Redis port |
| REDIS_STREAM | userdata-stream | Stream name |
| RESTORE_BATCH_SIZE | 100 | Records per emitted restore message |
| RESTORE_DAYS | 7 | Look-back days for restore |
| RESTORE_FORCE | 0 | Force restore ignoring length/marker |
| RESTORE_INCLUDE_SAVED | 1 | Include saved rows in restore |
| RESTORE_INCLUDE_VISITED | 1 | Include visited rows in restore |
| RESTORE_LIMIT_TOTAL | 2000 | Max rows queried/emitted during restore |
| RESTORE_LOG_SAMPLE | 5 | Sample count logged (size only) |
| RESTORE_MARK_KEY | userdata_stream_restored | Redis marker key to prevent repeat |
| RESTORE_MIN_ROWS_TRIGGER | 0 | Skip restore if rows < threshold (unless force) |
| RESTORE_ON_EMPTY | 1 | Enable restore logic when stream empty |
| SKIP_FRESHNESS | 0 | Ignore freshness window (process all) |
| SNOWFLAKE_AUTO_RECONNECT | 1 | Reconnect on token expiration |
| SNOWFLAKE_PING_INTERVAL_SECONDS | 300 | Ping interval for Snowflake keep-alive |
| STREAM | userdata-stream | Alias for REDIS_STREAM (if used) |
| TARGET_LOG_IDS | (blank) | IDs that force detailed logging |
| TRACE_SQL | 0 | Extra SQL trace lines |
| UNIT_TEST | 0 | Use in-memory Redis stub |
| USERNAME_SPLIT_HYPHEN | 1 | Trim at first hyphen when deriving username |
| USERNAME_USE_RECORD_ID | 0 | Use record id as username for batches |
| VERBOSE_SNOWFLAKE_TEST | 0 | Log Snowflake context on connect |
| VERIFY_TRACKED_AFTER_WRITE | 0 | Run SELECT after write for sample rows |
| VERIFY_TRACKED_LIMIT | 20 | Max verify selects |
Restore Logic
Restore runs once at startup when all three are true:
RESTORE_ON_EMPTY=1- Stream length is 0 (unless
RESTORE_FORCE=1) - The marker key is absent (unless force)
It fetches recent rows grouped by user and kind (visited / saved), then emits batches, respecting KEY_APPEND_PAGE for visited. Set RESTORE_ON_EMPTY=0 to disable it entirely.
Freshness
Records older than NEWER_RECORD_TIME_LIMIT_SECONDS are skipped unless one of these applies:
- The record is saved and
PROCESS_STALE_SAVED=1 - It's the first stale record in a batch and
ALWAYS_PROCESS_STALE_FIRST=1 - You've set
SKIP_FRESHNESS=1to turn off the window completely
Track ID
Empty track_id is stored as an empty string by default. Turn on HASH_NULL_TRACK_ID=1 and the consumer computes a stable auto_<12 hex> prefix to keep grouping consistent.
Quick Dry Run
Spin up a virtual environment and run without touching Snowflake:
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
export DRY_RUN=1 REDIS_STREAM=userdata-stream HASH_NULL_TRACK_ID=1
python app/stream_consumer.py &
redis-cli XADD userdata-stream * value '{"user_email":"alice@example.com"}'
redis-cli XADD userdata-stream * value '{"key":"userdata-events:alice-1/visited","records":[{"id":"alice-1","page_name":"search","query_row_count":"5","track_id":"","track_title":"Sample","track_type":"QUERY","track_url":"/search?q=Sample","lastplus":"2025-11-03 12:00:00","action":"v"}]}'
Watch the logs for RECORD_EXEC and any auto_ hash lines.
Unit Tests
pytest -q
Tests cover JSON field selection, parsing and repair, username derivation, fresh vs. stale skipping, stale-saved overrides, null track_id hashing, string key import, and failover paths.
Common Checks
| Task | Command | Expect |
|---|---|---|
| Ping Redis | redis-cli PING |
PONG |
| Group info | redis-cli XINFO GROUPS userdata-stream |
Group row |
| Last 1 entry | redis-cli XREVRANGE userdata-stream + - COUNT 1 |
Array |
| Snowflake write | Look for SQL (TRACKED) in logs |
MERGE lines |
| Failover event | Search FAILOVER in logs |
Failover lines |
Troubleshooting
| Issue | Fix |
|---|---|
| No stream growth | Check UI or bridge for XADD. |
| Consumer silent | Confirm REDIS_HOST and network. |
Null track_id rows missing |
Set HASH_NULL_TRACK_ID=1. |
| Saved queries skipped | Set PROCESS_STALE_SAVED=1 or fix timestamps. |
| Import not running | Set EXTRA_BRIDGE_KEYS before start. |
| Bad JSON repeats | Set ACK_INVALID_JSON=1 or fix source. |
| Failover noisy | Set STREAM_FAILOVER_TO_SNOWFLAKE=0. |
Minimal Deployment Env List
Apply these alongside your Snowflake secrets:
AUTO_QUOTE_LASTPLUS=1
DRY_RUN=0
HASH_NULL_TRACK_ID=1
PROCESS_STALE_SAVED=1
REDIS_CONSUMER=consumer-1
REDIS_GROUP=snowflake-writer
REDIS_STREAM=userdata-stream
SKIP_FRESHNESS=0
STREAM_FAILOVER_TO_SNOWFLAKE=1
Snowflake Verification
SELECT tracked_id, track_id, saved, visited_count, deleted_dts
FROM <DB>.<SCHEMA>.TRACKED
WHERE tracked_id = 'alice-1'
ORDER BY visited_dts DESC
LIMIT 5;
No rows? Check consumer logs for skip or failover markers.
Safe Manual Test in Pod
kubectl exec <pod> -- redis-cli XADD userdata-stream * value '{"diag":"ping","ts":"2025-11-01 10:22:00"}'
kubectl exec <pod> -- redis-cli XLEN userdata-stream
Clean Exit
Stop the consumer with Ctrl+C. Redis keeps its data.
Persistence
Your data needs to survive a pod restart, even after a persistent volume failure.
Redis writes to /data inside the container, which is where the Kubernetes Persistent Volume mounts. Two files do the work:
appendonly.aof— every write is appended here; stream entries survive restarts.dump.rdb— periodic snapshots based on yoursaverules.
redis.conf minimum settings:
appendfilename appendonly.aof
appendfsync everysec
appendonly yes
dbfilename dump.rdb
dir /data
save 300 10
save 60 10000
save 900 1
appendfsync everysec— flush AOF once per second (small data loss window if a crash hits between flushes).appendonly yes— log eachXADDto the AOF file.dir /data— write files to the mounted volume.- The
saverules create snapshots on: 1 write in 900s, 10 writes in 300s, or 10,000 writes in 60s.
Testing Persistence
Basic check:
TS=$(date -u +"%Y-%m-%d %H:%M:%S")
redis-cli XADD userdata-stream * value '{"diag":"persist","ts":"'$TS'"}'
redis-cli XLEN userdata-stream
ls -lh /data
tail -n 5 /data/appendonly.aof
Restart test (in cluster):
- Record the current stream length:
redis-cli XLEN userdata-stream. - Delete the pod — the deployment recreates it.
- Run the length check again:
redis-cli XLEN userdata-stream. - It should match or be larger.
If the length resets to 0, check whether the PV is mounted in the Redis container, whether redis.conf has the right path, whether /data has permissions issues, and whether you're reading from the right DB index (redis-cli -n 1 XLEN userdata-stream targets DB 1).
To force a snapshot immediately:
redis-cli SAVE
ls -lh /data/dump.rdb
For near-zero data loss, switch to appendfsync always — at the cost of higher disk I/O. For local dev where persistence doesn't matter, leave defaults and skip the PV.
Restore: Snowflake → Redis Stream on Empty
After a pod restart with an empty /data volume, your Redis stream will be blank while Snowflake still holds the authoritative history. The restore step seeds the stream so any downstream consumers that rely on recent backlog have context to work with.
Trigger conditions — all must be true:
RESTORE_ON_EMPTY=1(default)- Stream length is 0
- The Redis marker key (
_userdata_stream_restoredby default) is absent
What happens:
- The consumer queries Snowflake's
TRACKEDtable joined toUSERfor rows within the lastRESTORE_DAYSdays (default 7). - Filters apply for saved / visited based on your flags.
- Total rows are capped at
RESTORE_LIMIT_TOTAL. - Rows are grouped by user + kind (visited / saved) and emitted as normal stream messages in batches of
RESTORE_BATCH_SIZE. - The marker key is set so the restore doesn't run again.
In dry run mode (DRY_RUN=1), the Snowflake query still runs but the XADD calls are only logged, not executed.
One thing to remember: restore seeds the stream only. It does not rewrite Snowflake. If restore runs after a true Redis data-loss event, it replays recent history for observability — Snowflake remains the source of truth.
| Var | Default | Description |
|---|---|---|
KEY_APPEND_PAGE |
1 | Append page segment to visited keys (/visited/<page>). |
RESTORE_BATCH_SIZE |
100 | Records per emitted message. |
RESTORE_DAYS |
7 | Look-back window (days). |
RESTORE_FORCE |
0 | Force restore even if stream not empty or marker exists. |
RESTORE_INCLUDE_SAVED |
1 | Include saved rows. |
RESTORE_INCLUDE_VISITED |
1 | Include visited rows. |
RESTORE_LIMIT_TOTAL |
2000 | Hard cap on rows queried/emitted. |
RESTORE_LOG_SAMPLE |
5 | Count of sample rows to mention in logs (size only; rows themselves not logged). |
RESTORE_MARK_KEY |
_userdata_stream_restored |
Redis string key used as idempotent marker. |
RESTORE_MIN_ROWS_TRIGGER |
0 | If >0 skip restore when candidate row count is below this threshold. |
RESTORE_ON_EMPTY |
1 | Enable feature. |
