Tableau ShadowDB on Azure Database for PostgreSQL

Background

Numerous Report Developers requested admin access to the Tableau production server database for running daily reports.

To avoid granting privileged access to non-admins and reduce database loads on the production database, we built a copy of the transaction database on Azure Database for PostgreSQL that updates automatically on a schedule.

Criteria

  • Day-old data is OK.
  • The size of the production database backup file is about 14GB.
  • The replica server cannot not be used for this - permissions needed to be managed separately from the master Tableau server.
  • The update process must run automatically, take less than three hours, and cause the least amount of disruption possible. It also must be located where network latency will be the least (i.e. avoid Expressroute traffic).

Givens

The following are out of scope for this post, but need to be already in place:

  • Tableau server - most recent Windows server (cloud VM in my case).
  • Tableau server backup - saved to directory on Tableau server (non-OS drive).
  • Azure Database for PostgreSQL service.

Setup Tableau Server

Get Windows sign-in credentials and access the Tableau server through a Remote Desktop connection.

Install PGTools

Download PostgreSQL v13 and install ONLY the pgAdmin4 and Command Line Tools.

Create Working Directories

Create a folder on the OS drive like c:\_tableau-shadow-database for a PowerShell script.

Create a folder on the Data drive like e:\_tableau-shadow-database for managing the updates.

Create PowerShell script

Create a c:\_tableau-shadow-database\tableau-shadow-database.ps1 file, and test each command in the script below until all commands work together.

The script performs the following tasks:

  1. Start logging to a file
  2. Unzip the backup to a working directory, and map where the pg_dump file is located (20 minutes).
  3. Connect to the Azure Database for PSQL service, restore the backup to a temporary database (2 hrs).
  4. Close connections to the old database, remove the old database, rename the temp database to the name of the working database, and set readonly user permissions on the temp database (1 min).
  5. Close the log file.
# Clear previous files
Remove-Item E:\_tableau-shadow-database\* -Recurse

# Turn logging ON
$log = "E:\_tableau-shadow-database\restore_log.txt";
Write-Host "";
Start-Transcript -Path $log;

# Unzip Tableau dump files
Write-Host "";
Write-Host $(get-date -f "yyyy/MM/dd hh:mm:ss tt") "--- Unzip Tableau dump files";

Expand-Archive -Force "E:\Program Files\Tableau\Tableau Server\path-and-backup-file.zip" "E:\_tableau-shadow-database"

# Set variables
Write-Host "";
Write-Host $(get-date -f "yyyy/MM/dd hh:mm:ss tt") "--- Set variables";

Set-Location 'C:\Program Files\PostgreSQL\13\bin\';
$env:PGPASSWORD = 'the-pg-password';
$h = 'the-database-name.postgres.database.azure.com';
$u = 'the-database-admin-name';
$p = 5432;
$maintenancedb = 'postgres';
$dumpfile = (Get-Childitem -Recurse -Path e:\_tableau-shadow-database -Include workgroup.pg_dump).FullName;

# Create empty Azure workgroup database
Write-Host "";
Write-Host $(get-date -f "yyyy/MM/dd hh:mm:ss tt") "--- Create empty Azure database"

.\createdb -h $h -p 5432 -U $u workgrouptemp

# Restore Tableau dump to temp database
Write-Host "";
Write-Host $(get-date -f "yyyy/MM/dd hh:mm:ss tt") "--- Restore Tableau dump to Azure database"

.\pg_restore -h $h -p 5432 -U $u -Fc -v -x -O -d workgrouptemp $dumpfile

# Terminate existing Azure database connections
Write-Host "";
Write-Host $(get-date -f "yyyy/MM/dd hh:mm:ss tt") "--- Terminate existing connections to Azure database";

.\psql -h $h -p $p -U $u -d $maintenancedb -c --% "SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'workgroup' AND pid <> pg_backend_pid();"

# Drop existing Azure database
Write-Host "";
Write-Host $(get-date -f "yyyy/MM/dd hh:mm:ss tt") "--- Drop existing Azure database"

.\psql -h $h -p 5432 -U $u -d $maintenancedb -c --% "DROP DATABASE workgroup;"

# Rename temp database
Write-Host "";
Write-Host $(get-date -f "yyyy/MM/dd hh:mm:ss tt") "--- Rename temp Azure database"

.\psql -h $h -p 5432 -U $u -d $maintenancedb -c --% "ALTER DATABASE workgrouptemp RENAME TO workgroup;"

# Restore Permissions on workgroup database
.\psql -h $h -p $p -U $u -d $maintenancedb -c --% "GRANT CONNECT ON DATABASE workgroup TO readonly;"
.\psql -h $h -p $p -U $u -d 'workgroup' -c --% "GRANT USAGE ON SCHEMA public TO readonly;"
.\psql -h $h -p $p -U $u -d 'workgroup' -c --% "GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;"
.\psql -h $h -p $p -U $u -d 'workgroup' -c --% "GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO readonly;"
.\psql -h $h -p $p -U $u -d 'workgroup' -c --% "ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES to readonly;"
.\psql -h $h -p $p -U $u -d 'workgroup' -c --% "ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO readonly;"
.\psql -h $h -p $p -U $u -d 'workgroup' -c --% "CREATE USER reporting_user WITH LOGIN NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLICATION PASSWORD 'the-readonly-user-password';"
.\psql -h $h -p $p -U $u -d 'workgroup' -c --% "GRANT readonly TO reporting_user;"

# Turn logging OFF
Write-Host "";
Write-Host $(get-date -f "yyyy/MM/dd hh:mm:ss tt") "--- Job ended"
Write-Host "";
Write-Host "";
Stop-Transcript

Schedule Task (Windows)

Go to Start -> Settings -> Apps -> Find a setting -> "Task Scheduler" => Task Scheduler Library

Actions -> Create Task…

  • General
    • Name: Run Tableau ShadowDB script
    • Description: Runs Tableau Shadow Database PowerShell script
    • Security Options: Run whether user is logged on or not
  • Triggers -> New
    • Begin the task -> On a schedule, Weekly, Start: 12/4/2020 9:01PM, Recur every 1 week on Friday, Stop task if it runs longer than 1 day
  • Actions
    • Start a program -> Program/script: powershell, Add arguments -File C:_tableau-shadow-database\restore-tableau-db-to-shadow-db.ps1
  • Conditions
    • Defaults
  • Settings
    • Allow task to be run on demand
    • Stop the task if it runs longer than 1 day
    • If the running task does not end when requested, force it to stop

DB Access Permissions

See https://aws.amazon.com/blogs/database/managing-postgresql-users-and-roles/
See https://www.postgresql.org/docs/current/sql-createrole.html

Reporting Users

Define what the default "readonly" role has access to and then create a reporting user that assumes this role.

Grant connect on database
GRANT CONNECT ON DATABASE workgroup TO readonly;

Connect to [databasename] on local database cluster
\c workgroup

Grant Usage to schemas
GRANT USAGE ON SCHEMA public TO readonly;

Grant SELECT access to all current tables
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO readonly;

Grant SELECT access to all future tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES to readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO readonly;

Create new user
CREATE USER reporting_user WITH LOGIN NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLICATION PASSWORD '0nCeUp0n@Th^me';
GRANT readonly TO reporting_user;

Test user access by exiting current psql connection and logging in as reporting_user
"c:\Program Files\PostgreSQL\13\bin\psql" -h the-shadow-database.postgres.database.azure.com -p 5432 -U reporting_user@the-shadow-database -d postgres

App Users

Define a new role that has a bit more than "readonly" access to and then create an app user that assumes this role.