Imagine your web app lives on Azure Kubernetes Services (AKS). It needs fast, persistent data reads and writes, but your persistence layer is Snowflake — not exactly built for speed.

Back in 2023, the answer was open-source Redis v6.x with RedisGears handling write-behind to Snowflake. It worked — but AKS security constraints made the setup complicated, fragile, and painful to touch. Then Redis dropped its open-source license in 2024. Suddenly there are no security patches. RedisGears support dries up. By mid-2025, a Snowflake table autoincrement bug takes down the write-behind process entirely.

The solution is quietly falling apart.

Today the answer is Redis OSS v8.x. When open-source Redis returned in 2025, you rebuilt and modernized: Stream data type instead of String, configurable write-behind, and automatic restore from Snowflake for when a persistent volume goes missing.

You can preserve all user data through the migration. Utilities populate Snowflake from Redis dump files; historical data was loaded before cutover to keep downtime minimal. And you can transfer ten thousand records in under six minutes using temporary tables and bulk loading.

What You're Building

A Redis OSS cache that asynchronously moves user activity from Redis into Snowflake, running inside a Kubernetes pod with two containers:

  1. Redis (main) — stores data in a stream.
  2. Consumer (sidecar) — reads the stream and writes rows to Snowflake.

How It Flows

  1. Your UI or API sends a JSON batch with XADD to userdata-stream.
  2. Redis stores each batch as a stream entry.
  3. The consumer reads entries, builds MERGE / UPDATE SQL, and writes to Snowflake.
  4. A freshness window skips stale records unless you tell it otherwise.
  5. On an empty stream, an optional restore step seeds Redis from Snowflake.
flowchart LR
UI["APP<br>UI"] <--> UH["Userdata<br>Handler<br>API"]
UH <-->|Synchronous<br>CRUD| REDIS[("<br>REDIS<br>CACHE<br>(Stream)")]
REDIS -->|Asynchronous<br>CRUD| CONSUMER[CONSUMER]
CONSUMER --> DB[(<br>SNOWFLAKE<br>TABLE)]
DB -.->|Restore on demand| REDIS

Terminology

Term Meaning
Visited View / open action
Saved Bookmark / saved search
Deleted Saved turned off (soft delete)
Fresh Inside freshness window
Stale Outside freshness window (skip unless override)

Stream Shapes

Create User:

{"user_email":"user.name@example.com"}

Track Visited / Saved batch:

{
  "key": "userdata-events:user.name-abc123/visited",
  "records": [
    {
      "id": "user.name-abc123",
      "page_name": "search",
      "query_row_count": "25",
      "track_id": "item-01",
      "track_title": "Report 01",
      "track_type": "REPORT",
      "track_url": "/search?q=item-01",
      "lastplus": "2025-11-01 10:20:30",
      "action": "v"
    }
  ]
}

Visited items use key segment visited and action v. Saved uses /saved and action s. Delete uses action d and sets saved=0 in Snowflake.

If track_id is blank and HASH_NULL_TRACK_ID=1, the consumer builds a stable auto_<md5prefix> from title|url|id. Leave it disabled and you'll store an empty string instead.

Key Rules

Part Rule
Key path /visited or /saved decides logic
Page suffix If KEY_APPEND_PAGE=1 visited restore keys include /visited/
Action v visited, s saved, d delete
Timestamp lastplus UTC YYYY-MM-DD HH:MM:SS
Stale Outside freshness window (skip unless override)

Environment Variables

Var Default Purpose
ACK_INVALID_JSON 0 Ack invalid JSON messages
ALLOW_NO_SNOWFLAKE 0 Force dry run if Snowflake secrets incomplete
ALWAYS_PROCESS_STALE_FIRST 0 Process first stale record in a batch
AUTO_QUOTE_LASTPLUS 0 Quote bare lastplus tokens / datetimes
CONFIG_MAX_ATTEMPTS 12 Snowflake config retry attempts
CONFIG_SLEEP_SECONDS 5 Sleep between config retries
CONSUMER consumer-1 Redis consumer name
DRY_RUN 0 Log SQL only, no execute
ENABLE_FALLBACK_RECORD_LIST 1 Treat pure list of tracked dicts as batch
ENABLE_FALLBACK_SINGLE_RECORD 1 Treat lone tracked dict as single-record batch
FALLBACK_JSON_FIELDS value,payload,data Additional JSON field names to inspect
GROUP snowflake-writer Redis consumer group
HASH_NULL_TRACK_ID 0 Hash blank track_id into auto_<hash>
KEY_APPEND_PAGE 1 Append page segment to visited restore keys
KEY_PREFIX userdata-events Prefix for stream batch keys
LENIENT_JSON_REPAIR 0 Attempt brace/bracket repair on truncated JSON
LIST_STREAM_KEYS_ON_START 0 List existing stream keys at startup
LOG_LEVEL INFO Logging level
LOG_RAW 0 Log raw Redis messages
LOG_SQL 0 Log SQL statements
LOG_TRACKED_EACH 0 Log every tracked record execution
LOG_TRACKED_HASH_GEN 0 Log generated hash track_ids
MAX_JSON_REPAIR_APPEND 10 Max appended chars during repair
METRICS_INTERVAL_SECONDS 60 Metrics logging interval
NEWER_RECORD_TIME_LIMIT_SECONDS 30 Freshness window seconds
PRESERVE_TRACK_TYPE_CASE 0 Keep original case of track_type
PROCESS_STALE_SAVED 1 Allow stale saved records
REDIS_DB 0 Redis DB index
REDIS_HOST localhost Redis host
REDIS_PORT 6379 Redis port
REDIS_STREAM userdata-stream Stream name
RESTORE_BATCH_SIZE 100 Records per emitted restore message
RESTORE_DAYS 7 Look-back days for restore
RESTORE_FORCE 0 Force restore ignoring length/marker
RESTORE_INCLUDE_SAVED 1 Include saved rows in restore
RESTORE_INCLUDE_VISITED 1 Include visited rows in restore
RESTORE_LIMIT_TOTAL 2000 Max rows queried/emitted during restore
RESTORE_LOG_SAMPLE 5 Sample count logged (size only)
RESTORE_MARK_KEY userdata_stream_restored Redis marker key to prevent repeat
RESTORE_MIN_ROWS_TRIGGER 0 Skip restore if rows < threshold (unless force)
RESTORE_ON_EMPTY 1 Enable restore logic when stream empty
SKIP_FRESHNESS 0 Ignore freshness window (process all)
SNOWFLAKE_AUTO_RECONNECT 1 Reconnect on token expiration
SNOWFLAKE_PING_INTERVAL_SECONDS 300 Ping interval for Snowflake keep-alive
STREAM userdata-stream Alias for REDIS_STREAM (if used)
TARGET_LOG_IDS (blank) IDs that force detailed logging
TRACE_SQL 0 Extra SQL trace lines
UNIT_TEST 0 Use in-memory Redis stub
USERNAME_SPLIT_HYPHEN 1 Trim at first hyphen when deriving username
USERNAME_USE_RECORD_ID 0 Use record id as username for batches
VERBOSE_SNOWFLAKE_TEST 0 Log Snowflake context on connect
VERIFY_TRACKED_AFTER_WRITE 0 Run SELECT after write for sample rows
VERIFY_TRACKED_LIMIT 20 Max verify selects

Restore Logic

Restore runs once at startup when all three are true:

  • RESTORE_ON_EMPTY=1
  • Stream length is 0 (unless RESTORE_FORCE=1)
  • The marker key is absent (unless force)

It fetches recent rows grouped by user and kind (visited / saved), then emits batches, respecting KEY_APPEND_PAGE for visited. Set RESTORE_ON_EMPTY=0 to disable it entirely.

Freshness

Records older than NEWER_RECORD_TIME_LIMIT_SECONDS are skipped unless one of these applies:

  • The record is saved and PROCESS_STALE_SAVED=1
  • It's the first stale record in a batch and ALWAYS_PROCESS_STALE_FIRST=1
  • You've set SKIP_FRESHNESS=1 to turn off the window completely

Track ID

Empty track_id is stored as an empty string by default. Turn on HASH_NULL_TRACK_ID=1 and the consumer computes a stable auto_<12 hex> prefix to keep grouping consistent.

Quick Dry Run

Spin up a virtual environment and run without touching Snowflake:

python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
export DRY_RUN=1 REDIS_STREAM=userdata-stream HASH_NULL_TRACK_ID=1
python app/stream_consumer.py &
redis-cli XADD userdata-stream * value '{"user_email":"alice@example.com"}'
redis-cli XADD userdata-stream * value '{"key":"userdata-events:alice-1/visited","records":[{"id":"alice-1","page_name":"search","query_row_count":"5","track_id":"","track_title":"Sample","track_type":"QUERY","track_url":"/search?q=Sample","lastplus":"2025-11-03 12:00:00","action":"v"}]}'

Watch the logs for RECORD_EXEC and any auto_ hash lines.

Unit Tests

pytest -q

Tests cover JSON field selection, parsing and repair, username derivation, fresh vs. stale skipping, stale-saved overrides, null track_id hashing, string key import, and failover paths.

Common Checks

Task Command Expect
Ping Redis redis-cli PING PONG
Group info redis-cli XINFO GROUPS userdata-stream Group row
Last 1 entry redis-cli XREVRANGE userdata-stream + - COUNT 1 Array
Snowflake write Look for SQL (TRACKED) in logs MERGE lines
Failover event Search FAILOVER in logs Failover lines

Troubleshooting

Issue Fix
No stream growth Check UI or bridge for XADD.
Consumer silent Confirm REDIS_HOST and network.
Null track_id rows missing Set HASH_NULL_TRACK_ID=1.
Saved queries skipped Set PROCESS_STALE_SAVED=1 or fix timestamps.
Import not running Set EXTRA_BRIDGE_KEYS before start.
Bad JSON repeats Set ACK_INVALID_JSON=1 or fix source.
Failover noisy Set STREAM_FAILOVER_TO_SNOWFLAKE=0.

Minimal Deployment Env List

Apply these alongside your Snowflake secrets:

AUTO_QUOTE_LASTPLUS=1
DRY_RUN=0
HASH_NULL_TRACK_ID=1
PROCESS_STALE_SAVED=1
REDIS_CONSUMER=consumer-1
REDIS_GROUP=snowflake-writer
REDIS_STREAM=userdata-stream
SKIP_FRESHNESS=0
STREAM_FAILOVER_TO_SNOWFLAKE=1

Snowflake Verification

SELECT tracked_id, track_id, saved, visited_count, deleted_dts
FROM <DB>.<SCHEMA>.TRACKED
WHERE tracked_id = 'alice-1'
ORDER BY visited_dts DESC
LIMIT 5;

No rows? Check consumer logs for skip or failover markers.

Safe Manual Test in Pod

kubectl exec <pod> -- redis-cli XADD userdata-stream * value '{"diag":"ping","ts":"2025-11-01 10:22:00"}'
kubectl exec <pod> -- redis-cli XLEN userdata-stream

Clean Exit

Stop the consumer with Ctrl+C. Redis keeps its data.

Persistence

Your data needs to survive a pod restart, even after a persistent volume failure.

Redis writes to /data inside the container, which is where the Kubernetes Persistent Volume mounts. Two files do the work:

  • appendonly.aof — every write is appended here; stream entries survive restarts.
  • dump.rdb — periodic snapshots based on your save rules.

redis.conf minimum settings:

appendfilename appendonly.aof
appendfsync everysec
appendonly yes
dbfilename dump.rdb
dir /data
save 300 10
save 60 10000
save 900 1
  • appendfsync everysec — flush AOF once per second (small data loss window if a crash hits between flushes).
  • appendonly yes — log each XADD to the AOF file.
  • dir /data — write files to the mounted volume.
  • The save rules create snapshots on: 1 write in 900s, 10 writes in 300s, or 10,000 writes in 60s.

Testing Persistence

Basic check:

TS=$(date -u +"%Y-%m-%d %H:%M:%S")
redis-cli XADD userdata-stream * value '{"diag":"persist","ts":"'$TS'"}'
redis-cli XLEN userdata-stream
ls -lh /data
tail -n 5 /data/appendonly.aof

Restart test (in cluster):

  1. Record the current stream length: redis-cli XLEN userdata-stream.
  2. Delete the pod — the deployment recreates it.
  3. Run the length check again: redis-cli XLEN userdata-stream.
  4. It should match or be larger.

If the length resets to 0, check whether the PV is mounted in the Redis container, whether redis.conf has the right path, whether /data has permissions issues, and whether you're reading from the right DB index (redis-cli -n 1 XLEN userdata-stream targets DB 1).

To force a snapshot immediately:

redis-cli SAVE
ls -lh /data/dump.rdb

For near-zero data loss, switch to appendfsync always — at the cost of higher disk I/O. For local dev where persistence doesn't matter, leave defaults and skip the PV.

Restore: Snowflake → Redis Stream on Empty

After a pod restart with an empty /data volume, your Redis stream will be blank while Snowflake still holds the authoritative history. The restore step seeds the stream so any downstream consumers that rely on recent backlog have context to work with.

Trigger conditions — all must be true:

  • RESTORE_ON_EMPTY=1 (default)
  • Stream length is 0
  • The Redis marker key (_userdata_stream_restored by default) is absent

What happens:

  1. The consumer queries Snowflake's TRACKED table joined to USER for rows within the last RESTORE_DAYS days (default 7).
  2. Filters apply for saved / visited based on your flags.
  3. Total rows are capped at RESTORE_LIMIT_TOTAL.
  4. Rows are grouped by user + kind (visited / saved) and emitted as normal stream messages in batches of RESTORE_BATCH_SIZE.
  5. The marker key is set so the restore doesn't run again.

In dry run mode (DRY_RUN=1), the Snowflake query still runs but the XADD calls are only logged, not executed.

One thing to remember: restore seeds the stream only. It does not rewrite Snowflake. If restore runs after a true Redis data-loss event, it replays recent history for observability — Snowflake remains the source of truth.

Var Default Description
KEY_APPEND_PAGE 1 Append page segment to visited keys (/visited/<page>).
RESTORE_BATCH_SIZE 100 Records per emitted message.
RESTORE_DAYS 7 Look-back window (days).
RESTORE_FORCE 0 Force restore even if stream not empty or marker exists.
RESTORE_INCLUDE_SAVED 1 Include saved rows.
RESTORE_INCLUDE_VISITED 1 Include visited rows.
RESTORE_LIMIT_TOTAL 2000 Hard cap on rows queried/emitted.
RESTORE_LOG_SAMPLE 5 Count of sample rows to mention in logs (size only; rows themselves not logged).
RESTORE_MARK_KEY _userdata_stream_restored Redis string key used as idempotent marker.
RESTORE_MIN_ROWS_TRIGGER 0 If >0 skip restore when candidate row count is below this threshold.
RESTORE_ON_EMPTY 1 Enable feature.