Malcolm 'Max' DeRungs

Imagine your web app lives on Azure Kubernetes Services (AKS). It needs fast, persistent data reads and writes, but your persistence layer is Snowflake — not exactly built for speed.

Back in 2023, the answer was open-source Redis v6.x with RedisGears handling write-behind to Snowflake. It worked — but AKS security constraints made the setup complicated, fragile, and painful to touch. Then Redis dropped its open-source license in 2024. Suddenly there are no security patches. RedisGears support dries up. By mid-2025, a Snowflake table autoincrement bug takes down the write-behind process entirely.

The solution is quietly falling apart.

Today the answer is Redis OSS v8.x. When open-source Redis returned in 2025, you rebuilt and modernized: Stream data type instead of String, configurable write-behind, and automatic restore from Snowflake for when a persistent volume goes missing.

You can preserve all user data through the migration. Utilities populate Snowflake from Redis dump files; historical data was loaded before cutover to keep downtime minimal. And you can transfer ten thousand records in under six minutes using temporary tables and bulk loading.

What You're Building

A Redis OSS cache that asynchronously moves user activity from Redis into Snowflake, running inside a Kubernetes pod with two containers:

Redis (main) — stores data in a stream.
Consumer (sidecar) — reads the stream and writes rows to Snowflake.

How It Flows

Your UI or API sends a JSON batch with XADD to userdata-stream.
Redis stores each batch as a stream entry.
The consumer reads entries, builds MERGE / UPDATE SQL, and writes to Snowflake.
A freshness window skips stale records unless you tell it otherwise.
On an empty stream, an optional restore step seeds Redis from Snowflake.

flowchart LR
UI["APP<br>UI"] <--> UH["Userdata<br>Handler<br>API"]
UH <-->|Synchronous<br>CRUD| REDIS[("<br>REDIS<br>CACHE<br>(Stream)")]
REDIS -->|Asynchronous<br>CRUD| CONSUMER[CONSUMER]
CONSUMER --> DB[(<br>SNOWFLAKE<br>TABLE)]
DB -.->|Restore on demand| REDIS

Terminology

Term	Meaning
Visited	View / open action
Saved	Bookmark / saved search
Deleted	Saved turned off (soft delete)
Fresh	Inside freshness window
Stale	Outside freshness window (skip unless override)

Stream Shapes

Create User:

{"user_email":"user.name@example.com"}

Track Visited / Saved batch:

{
  "key": "userdata-events:user.name-abc123/visited",
  "records": [
    {
      "id": "user.name-abc123",
      "page_name": "search",
      "query_row_count": "25",
      "track_id": "item-01",
      "track_title": "Report 01",
      "track_type": "REPORT",
      "track_url": "/search?q=item-01",
      "lastplus": "2025-11-01 10:20:30",
      "action": "v"
    }
  ]
}

Visited items use key segment visited and action v. Saved uses /saved and action s. Delete uses action d and sets saved=0 in Snowflake.

If track_id is blank and HASH_NULL_TRACK_ID=1, the consumer builds a stable auto_<md5prefix> from title|url|id. Leave it disabled and you'll store an empty string instead.

Key Rules

Part	Rule
Key path	`/visited` or `/saved` decides logic
Page suffix	If `KEY_APPEND_PAGE=1` visited restore keys include `/visited/`
Action	`v` visited, `s` saved, `d` delete
Timestamp	`lastplus` UTC `YYYY-MM-DD HH:MM:SS`
Stale	Outside freshness window (skip unless override)

Environment Variables

Var	Default	Purpose
ACK_INVALID_JSON	0	Ack invalid JSON messages
ALLOW_NO_SNOWFLAKE	0	Force dry run if Snowflake secrets incomplete
ALWAYS_PROCESS_STALE_FIRST	0	Process first stale record in a batch
AUTO_QUOTE_LASTPLUS	0	Quote bare `lastplus` tokens / datetimes
CONFIG_MAX_ATTEMPTS	12	Snowflake config retry attempts
CONFIG_SLEEP_SECONDS	5	Sleep between config retries
CONSUMER	consumer-1	Redis consumer name
DRY_RUN	0	Log SQL only, no execute
ENABLE_FALLBACK_RECORD_LIST	1	Treat pure list of tracked dicts as batch
ENABLE_FALLBACK_SINGLE_RECORD	1	Treat lone tracked dict as single-record batch
FALLBACK_JSON_FIELDS	value,payload,data	Additional JSON field names to inspect
GROUP	snowflake-writer	Redis consumer group
HASH_NULL_TRACK_ID	0	Hash blank track_id into `auto_<hash>`
KEY_APPEND_PAGE	1	Append page segment to visited restore keys
KEY_PREFIX	userdata-events	Prefix for stream batch keys
LENIENT_JSON_REPAIR	0	Attempt brace/bracket repair on truncated JSON
LIST_STREAM_KEYS_ON_START	0	List existing stream keys at startup
LOG_LEVEL	INFO	Logging level
LOG_RAW	0	Log raw Redis messages
LOG_SQL	0	Log SQL statements
LOG_TRACKED_EACH	0	Log every tracked record execution
LOG_TRACKED_HASH_GEN	0	Log generated hash track_ids
MAX_JSON_REPAIR_APPEND	10	Max appended chars during repair
METRICS_INTERVAL_SECONDS	60	Metrics logging interval
NEWER_RECORD_TIME_LIMIT_SECONDS	30	Freshness window seconds
PRESERVE_TRACK_TYPE_CASE	0	Keep original case of track_type
PROCESS_STALE_SAVED	1	Allow stale saved records
REDIS_DB	0	Redis DB index
REDIS_HOST	localhost	Redis host
REDIS_PORT	6379	Redis port
REDIS_STREAM	userdata-stream	Stream name
RESTORE_BATCH_SIZE	100	Records per emitted restore message
RESTORE_DAYS	7	Look-back days for restore
RESTORE_FORCE	0	Force restore ignoring length/marker
RESTORE_INCLUDE_SAVED	1	Include saved rows in restore
RESTORE_INCLUDE_VISITED	1	Include visited rows in restore
RESTORE_LIMIT_TOTAL	2000	Max rows queried/emitted during restore
RESTORE_LOG_SAMPLE	5	Sample count logged (size only)
RESTORE_MARK_KEY	userdata_stream_restored	Redis marker key to prevent repeat
RESTORE_MIN_ROWS_TRIGGER	0	Skip restore if rows < threshold (unless force)
RESTORE_ON_EMPTY	1	Enable restore logic when stream empty
SKIP_FRESHNESS	0	Ignore freshness window (process all)
SNOWFLAKE_AUTO_RECONNECT	1	Reconnect on token expiration
SNOWFLAKE_PING_INTERVAL_SECONDS	300	Ping interval for Snowflake keep-alive
STREAM	userdata-stream	Alias for REDIS_STREAM (if used)
TARGET_LOG_IDS	(blank)	IDs that force detailed logging
TRACE_SQL	0	Extra SQL trace lines
UNIT_TEST	0	Use in-memory Redis stub
USERNAME_SPLIT_HYPHEN	1	Trim at first hyphen when deriving username
USERNAME_USE_RECORD_ID	0	Use record id as username for batches
VERBOSE_SNOWFLAKE_TEST	0	Log Snowflake context on connect
VERIFY_TRACKED_AFTER_WRITE	0	Run SELECT after write for sample rows
VERIFY_TRACKED_LIMIT	20	Max verify selects

Restore Logic

Restore runs once at startup when all three are true:

RESTORE_ON_EMPTY=1
Stream length is 0 (unless RESTORE_FORCE=1)
The marker key is absent (unless force)

It fetches recent rows grouped by user and kind (visited / saved), then emits batches, respecting KEY_APPEND_PAGE for visited. Set RESTORE_ON_EMPTY=0 to disable it entirely.

Freshness

Records older than NEWER_RECORD_TIME_LIMIT_SECONDS are skipped unless one of these applies:

The record is saved and PROCESS_STALE_SAVED=1
It's the first stale record in a batch and ALWAYS_PROCESS_STALE_FIRST=1
You've set SKIP_FRESHNESS=1 to turn off the window completely

Track ID

Empty track_id is stored as an empty string by default. Turn on HASH_NULL_TRACK_ID=1 and the consumer computes a stable auto_<12 hex> prefix to keep grouping consistent.

Quick Dry Run

Spin up a virtual environment and run without touching Snowflake:

python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
export DRY_RUN=1 REDIS_STREAM=userdata-stream HASH_NULL_TRACK_ID=1
python app/stream_consumer.py &
redis-cli XADD userdata-stream * value '{"user_email":"alice@example.com"}'
redis-cli XADD userdata-stream * value '{"key":"userdata-events:alice-1/visited","records":[{"id":"alice-1","page_name":"search","query_row_count":"5","track_id":"","track_title":"Sample","track_type":"QUERY","track_url":"/search?q=Sample","lastplus":"2025-11-03 12:00:00","action":"v"}]}'

Watch the logs for RECORD_EXEC and any auto_ hash lines.

Unit Tests

pytest -q

Tests cover JSON field selection, parsing and repair, username derivation, fresh vs. stale skipping, stale-saved overrides, null track_id hashing, string key import, and failover paths.

Common Checks

Task	Command	Expect
Ping Redis	`redis-cli PING`	`PONG`
Group info	`redis-cli XINFO GROUPS userdata-stream`	Group row
Last 1 entry	`redis-cli XREVRANGE userdata-stream + - COUNT 1`	Array
Snowflake write	Look for `SQL (TRACKED)` in logs	MERGE lines
Failover event	Search `FAILOVER` in logs	Failover lines

Troubleshooting

Issue	Fix
No stream growth	Check UI or bridge for XADD.
Consumer silent	Confirm `REDIS_HOST` and network.
Null `track_id` rows missing	Set `HASH_NULL_TRACK_ID=1`.
Saved queries skipped	Set `PROCESS_STALE_SAVED=1` or fix timestamps.
Import not running	Set `EXTRA_BRIDGE_KEYS` before start.
Bad JSON repeats	Set `ACK_INVALID_JSON=1` or fix source.
Failover noisy	Set `STREAM_FAILOVER_TO_SNOWFLAKE=0`.

Minimal Deployment Env List

Apply these alongside your Snowflake secrets:

AUTO_QUOTE_LASTPLUS=1
DRY_RUN=0
HASH_NULL_TRACK_ID=1
PROCESS_STALE_SAVED=1
REDIS_CONSUMER=consumer-1
REDIS_GROUP=snowflake-writer
REDIS_STREAM=userdata-stream
SKIP_FRESHNESS=0
STREAM_FAILOVER_TO_SNOWFLAKE=1

Snowflake Verification

SELECT tracked_id, track_id, saved, visited_count, deleted_dts
FROM <DB>.<SCHEMA>.TRACKED
WHERE tracked_id = 'alice-1'
ORDER BY visited_dts DESC
LIMIT 5;

No rows? Check consumer logs for skip or failover markers.

Safe Manual Test in Pod

kubectl exec <pod> -- redis-cli XADD userdata-stream * value '{"diag":"ping","ts":"2025-11-01 10:22:00"}'
kubectl exec <pod> -- redis-cli XLEN userdata-stream

Clean Exit

Stop the consumer with Ctrl+C. Redis keeps its data.

Persistence

Your data needs to survive a pod restart, even after a persistent volume failure.

Redis writes to /data inside the container, which is where the Kubernetes Persistent Volume mounts. Two files do the work:

appendonly.aof — every write is appended here; stream entries survive restarts.
dump.rdb — periodic snapshots based on your save rules.

redis.conf minimum settings:

appendfilename appendonly.aof
appendfsync everysec
appendonly yes
dbfilename dump.rdb
dir /data
save 300 10
save 60 10000
save 900 1

appendfsync everysec — flush AOF once per second (small data loss window if a crash hits between flushes).
appendonly yes — log each XADD to the AOF file.
dir /data — write files to the mounted volume.
The save rules create snapshots on: 1 write in 900s, 10 writes in 300s, or 10,000 writes in 60s.

Testing Persistence

Basic check:

TS=$(date -u +"%Y-%m-%d %H:%M:%S")
redis-cli XADD userdata-stream * value '{"diag":"persist","ts":"'$TS'"}'
redis-cli XLEN userdata-stream
ls -lh /data
tail -n 5 /data/appendonly.aof

Restart test (in cluster):

Record the current stream length: redis-cli XLEN userdata-stream.
Delete the pod — the deployment recreates it.
Run the length check again: redis-cli XLEN userdata-stream.
It should match or be larger.

If the length resets to 0, check whether the PV is mounted in the Redis container, whether redis.conf has the right path, whether /data has permissions issues, and whether you're reading from the right DB index (redis-cli -n 1 XLEN userdata-stream targets DB 1).

To force a snapshot immediately:

redis-cli SAVE
ls -lh /data/dump.rdb

For near-zero data loss, switch to appendfsync always — at the cost of higher disk I/O. For local dev where persistence doesn't matter, leave defaults and skip the PV.

Restore: Snowflake → Redis Stream on Empty

After a pod restart with an empty /data volume, your Redis stream will be blank while Snowflake still holds the authoritative history. The restore step seeds the stream so any downstream consumers that rely on recent backlog have context to work with.

Trigger conditions — all must be true:

RESTORE_ON_EMPTY=1 (default)
Stream length is 0
The Redis marker key (_userdata_stream_restored by default) is absent

What happens:

The consumer queries Snowflake's TRACKED table joined to USER for rows within the last RESTORE_DAYS days (default 7).
Filters apply for saved / visited based on your flags.
Total rows are capped at RESTORE_LIMIT_TOTAL.
Rows are grouped by user + kind (visited / saved) and emitted as normal stream messages in batches of RESTORE_BATCH_SIZE.
The marker key is set so the restore doesn't run again.

In dry run mode (DRY_RUN=1), the Snowflake query still runs but the XADD calls are only logged, not executed.

One thing to remember: restore seeds the stream only. It does not rewrite Snowflake. If restore runs after a true Redis data-loss event, it replays recent history for observability — Snowflake remains the source of truth.

Var	Default	Description
`KEY_APPEND_PAGE`	1	Append page segment to visited keys (`/visited/<page>`).
`RESTORE_BATCH_SIZE`	100	Records per emitted message.
`RESTORE_DAYS`	7	Look-back window (days).
`RESTORE_FORCE`	0	Force restore even if stream not empty or marker exists.
`RESTORE_INCLUDE_SAVED`	1	Include saved rows.
`RESTORE_INCLUDE_VISITED`	1	Include visited rows.
`RESTORE_LIMIT_TOTAL`	2000	Hard cap on rows queried/emitted.
`RESTORE_LOG_SAMPLE`	5	Count of sample rows to mention in logs (size only; rows themselves not logged).
`RESTORE_MARK_KEY`	`_userdata_stream_restored`	Redis string key used as idempotent marker.
`RESTORE_MIN_ROWS_TRIGGER`	0	If >0 skip restore when candidate row count is below this threshold.
`RESTORE_ON_EMPTY`	1	Enable feature.

Redis with write-behind