Notes on using Python for working with Tableau, Solr and Lucidworks Fusion server data.
Environment
Install Python (Windows) choco install python3
Install PyCharm choco install pycharm-community
Install Tableau Server Client (TSC) pip install tableauserverclient
Notes:
The Tableau REST API does not allow substring filtering (the has operator) on anything other than Query_Jobs. In other words, you cannot return only the workbooks with the string "foo" in the title. The in operator is available on workbooks, data sources, and views, but is only meant to work on tags. Otherwise, the equals operator takes an exact string. See reference.
Choice! Return ALL workbooks from the server, store the results in memory or a file, then iterate over the results in Python to return items by substring.
The Tableau Server Client is currently missing a get-view-by-id method for views, and a for loop using get-workbook-by-id is slow.
Choice! Use Python REST calls instead of the Tableau Server Client for tagging.
Projects don't have a site_id. Need to authenticate to each site individually and find what projects the site contains.
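The first workaround above can be sketched in isolation. The helper and the hard-coded cache below are illustrative only; in the real scripts the (id, name) pairs would come from TSC.Pager(server.workbooks):

```python
# Sketch of "fetch everything, filter client-side": since the REST API
# can't filter workbooks by substring, cache all (id, name) pairs first,
# then match in Python.

def filter_by_substring(workbooks, needle):
    """Return (id, name) pairs whose name contains needle, case-insensitively."""
    needle = needle.lower()
    return [(wb_id, name) for wb_id, name in workbooks if needle in name.lower()]

# Hard-coded sample data standing in for TSC.Pager(server.workbooks)
cached = [('luid-1', 'Foo Sales'), ('luid-2', 'Ops Dashboard'), ('luid-3', 'foobar')]
print(filter_by_substring(cached, 'foo'))  # matches 'Foo Sales' and 'foobar'
```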
Code Stuff
Code - Tag Tableau Workbooks and Views
Individual Tableau report developers can easily make their products more findable on a Tableau server by manually adding tags to their views, workbooks, and data sources. But let's say non-techy business users want to scan 20,000 Tableau data assets and conditionally apply something like a "Confidential" tag to many assets at once without involving IT. This might call for a Python script that works with the Tableau REST API and follows the values specified in a CSV file.
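The exact CSV layout isn't shown in these notes, but based on the column positions the script reads, it might look something like this (the column names and sample row are assumptions):

```python
import csv
import io

# Hypothetical CSV matching the positions the script reads:
# row[1] = server url, row[2] = site, row[3] = project,
# row[4] = tag, row[5] = action
sample = io.StringIO(
    "note,url,site,project,tag,action\n"
    "x,https://tableau.example.com,Default,Finance,Confidential,add\n"
)
# Skip the header row and keep only rows that specify an action
rows = [row for i, row in enumerate(csv.reader(sample)) if i > 0 and len(row[5]) > 0]
print(rows[0][4])  # the tag column
```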
Declare imports
The Tableau Server Client is used as a fast way to get all Tableau workbooks and views. The REST module is used to add tags to the Tableau server by LUID. The XML parsing module is used for POSTing data to Tableau (JSON could be used, but the environment I'm working with here prefers XML). The CSV parser gets input from a CSV file.
# -------------------
# Imports
# -------------------
# Import the Tableau Server-Client module
import tableauserverclient as TSC
# Import the logging modules
import logging, logging.config
# Import the REST module
import requests
# Import the XML parsing module
import untangle
# Import the csv parser
import csv
Log activities
During development, the logger can be set to stream feedback to the IDE. For implementation, the logger can be set to log to a file. The file is useful for validating and documenting the work that's been done. The code includes a statement for filtering out some of the non-root Tableau logging feedback.
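A minimal version of that logger setup (the same pattern appears in full in the Solr script later in these notes):

```python
import logging, logging.config

logger = logging.getLogger()
# Stream to the IDE console during development...
handler = logging.StreamHandler()
# ...or switch to a file when running the real job:
# handler = logging.FileHandler('app.log', mode='w')
formatter = logging.Formatter('%(asctime)s %(name)-2s %(levelname)-2s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# Filter out the chatty non-root Tableau loggers
logging.config.dictConfig({'version': 1, 'disable_existing_loggers': True})
```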
Locate the CSV file in the current directory and get the Tableau sites and projects that need to be processed. This is the precursor to getting the CSV file from a service.
# -------------------
# Data
# -------------------
# Get a list of items to process from CSV file
dataset = []
with open('tableau_security_tags_list.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile)
    count = 0
    for row in csvReader:
        if len(row[5]) > 0:
            if count > 0:
                dataset.append([row[0], row[1], row[2], row[3], row[4], row[5]])
            count = count + 1
# logging.info('Process the following data:')
# logging.info(dataset)

# Loop through the items list and build list of unique URLs
logging.info('Process the following urls:')
listOfUrls = []
listOfUrlsTest = []
for item in dataset:
    u = item[1].strip()
    s = item[2].strip()
    id = u + '-' + s.replace(' ', '-')
    if id not in listOfUrlsTest:
        listOfUrlsTest.append(id)
        # Add url to list
        listOfUrls.append([id, u])
        logging.info(u)

# Loop through the items list and build list of unique sites
logging.info("*************************")
logging.info('Process the following sites:')
listOfSites = []
listOfSitesTest = []
for item in dataset:
    u = item[1].strip()
    s = item[2].strip()
    id = u + '-' + s.replace(' ', '-')
    if id not in listOfSitesTest:
        listOfSitesTest.append(id)
        # Add site to list
        listOfSites.append([id, u, s])
        logging.info(s)

# Loop through the items list and build list of projects
logging.info("*************************")
logging.info('Process the following projects:')
listOfProjects = []
for item in dataset:
    u = item[1].strip()
    s = item[2].strip()
    p = item[3].strip()
    t = item[4].strip()
    a = item[5].strip()
    id = u + '-' + s.replace(' ', '-') + p.replace(' ', '-')
    # Add project to list
    listOfProjects.append([id, s, p, t, a])
    logging.info(p)
Set variables, cache and test Tableau assets
This is the meat and potatoes work of applying the tags. Tableau requires a separate login for each site. There's a bit of trickiness around the "default" site that does not apply to named sites.
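The default-site trickiness boils down to one rule, sketched here with a hypothetical helper (the site names are examples from this environment; in the real code below the check is done inline):

```python
# The "Default" site signs in with an empty content URL; so does this
# environment's renamed default site ('SJHS Tableau Site'). Any other
# named site uses its name with the spaces stripped out.
DEFAULT_SITE_NAMES = {'default', 'sjhs tableau site'}

def content_url(site_name):
    if site_name.lower() in DEFAULT_SITE_NAMES:
        return ""
    return site_name.replace(' ', '')

print(repr(content_url('SJHS Tableau Site')))  # the renamed default site -> ''
print(content_url('Finance Reporting'))        # FinanceReporting
```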
# -------------------
# Process the tags
# -------------------
# Process data for each url separately
for url in listOfUrls:
    logging.info("*************************")
    # Server admin credentials
    tableauSignIn = 'AD Service Account username'
    tableauPassword = 'AD Service Account password'
    # Set the Tableau server-client url
    server = TSC.Server(url[1])
    # Set the API version
    server.use_server_version()
    api_version = server.version
    # Set the Tableau Python API sign-in authorization (needs to be server admin)
    initialTableauAuth = TSC.TableauAuth(tableauSignIn, tableauPassword)
    # Cache a list of site LUIDs
    site_cache = {}
    with server.auth.sign_in(initialTableauAuth):
        for luid in TSC.Pager(server.sites):
            site_cache[luid.id] = luid.name
        logging.info("site luid is {" + "\n".join("{!r}: {!r},".format(k, v) for k, v in site_cache.items()) + "}")
    response = requests.request("POST", url[1] + '/api/' + api_version + '/auth/signout')
    logging.info(listOfSites)
    # Get a list of sites related to the url
    for site in listOfSites:
        if site[0] == url[0]:
            # Set the site LUID used for the REST calls
            siteId = ''
            for key, value in site_cache.items():
                if value.lower() == site[2].lower():
                    siteId = key
                    # Set the site name
                    siteName = ''
                    if value.lower() == 'default':
                        siteName = ""
                    else:
                        siteName = value.replace(' ', '')
            # Set the authorization string = TSC.TableauAuth('USERNAME', 'PASSWORD', site_id='CONTENTURL')
            # Note: CONTENTURL is "url_namespace" for the site in the Postgres Sites table
            # Note: CONTENTURL has no spaces
            # Note: CONTENTURL for the "Default" site is an empty string
            # Note: CONTENTURL "Default" site is not always named "Default"
            # Note: CONTENTURL does not handle being a variable with empty quotes, so create with quotes below
            if (site[2].lower() == 'default') or (site[2].lower() == 'sjhs tableau site'):
                tableauAuth = TSC.TableauAuth(tableauSignIn, tableauPassword, site_id="")
            else:
                tableauAuth = TSC.TableauAuth(tableauSignIn, tableauPassword, site_id=site[2].replace(' ', ''))
            # Sign in to a specific site
            with server.auth.sign_in(tableauAuth):
                logging.info("")
                logging.info('Signing in to the "' + site[2] + '" site')
                # Set the Tableau REST sign-in token
                auth_url = url[1] + '/api/' + server.version + '/auth/signin'
                auth_payload = '<tsRequest><credentials name="' + tableauSignIn + '" password="' + tableauPassword + '"><site contentUrl="' + siteName + '" /></credentials></tsRequest>'
                auth_headers = {'Content-Type': 'application/xml'}
                auth_response = requests.request("POST", auth_url, headers=auth_headers, data=auth_payload)
                auth_response_obj = untangle.parse(auth_response.text)
                auth_token = auth_response_obj.tsResponse.credentials['token']
                # Set the XML REST headers with the auth token
                xml_headers = {'X-Tableau-Auth': auth_token, 'Content-Type': 'text/plain'}
            # Cache assets locally
            with server.auth.sign_in(tableauAuth):
                logging.info('')
                logging.info('Caching projects, workbooks, and views data in "' + site[2] + '" site')
                # Cache all projects
                # -------------------
                project_counter, project_cache = 0, {}
                for project in TSC.Pager(server.projects):
                    project_cache[project_counter] = (project.id, project.name)
                    project_counter = project_counter + 1
                # logging.info("{" + "\n".join("{!r}: {!r},".format(k, v) for k, v in project_cache.items()) + "}")
                # Cache all workbooks
                # -------------------
                workbook_counter, workbook_cache = 0, {}
                for workbook in TSC.Pager(server.workbooks):
                    workbook_cache[workbook_counter] = (workbook.id, workbook.name, workbook.project_id, workbook.tags)
                    workbook_counter = workbook_counter + 1
                # logging.info("{" + "\n".join("{!r}: {!r},".format(k, v) for k, v in workbook_cache.items()) + "}")
                # Cache all views
                # -------------------
                view_counter, view_cache = 0, {}
                for view in TSC.Pager(server.views):
                    view_cache[view_counter] = (view.id, view.name, view.project_id, view.workbook_id, view.tags)
                    view_counter = view_counter + 1
                # logging.info("{" + "\n".join("{!r}: {!r},".format(k, v) for k, v in view_cache.items()) + "}")
                for project in listOfProjects:
                    u = project[0]
                    s = project[1]
                    projectName = project[2]
                    if u == url[0] + projectName.replace(' ', '-'):
                        logging.info('')
                        # Set the tag label
                        newTagLabel = project[3]
                        # Set the action to take
                        actionToTake = project[4]
                        # Build list of workbooks and views needing tags
                        # ------------------
                        for project_value in project_cache.values():
                            cachedProjectId = project_value[0]
                            cachedProjectName = project_value[1]
                            # Enable partial match
                            # if re.search(x, project.name):
                            # Enable exact match
                            if cachedProjectName == projectName:
                                logging.info('Project - ' + projectName)
                                for workbook_value in workbook_cache.values():
                                    workbookId = workbook_value[0]
                                    workbookName = workbook_value[1]
                                    workbookProjectId = workbook_value[2]
                                    workbookTags = workbook_value[3]
                                    if workbookProjectId == cachedProjectId:
                                        # create the REST URL
                                        workbook_Url = url[1] + '/api/' + api_version + '/sites/' + siteId + '/workbooks/' + workbookId + '/tags'
                                        # add or delete tags
                                        tag_item('Workbook', workbookName, workbook_Url, workbookTags, newTagLabel, actionToTake)
                                        for view_value in view_cache.values():
                                            viewId = view_value[0]
                                            viewName = view_value[1]
                                            viewWorkbookId = view_value[3]
                                            viewTags = view_value[4]
                                            if viewWorkbookId == workbookId:
                                                # create the REST URL
                                                view_Url = url[1] + '/api/' + api_version + '/sites/' + siteId + '/views/' + viewId + '/tags'
                                                # add or delete tags
                                                tag_item('View', viewName, view_Url, viewTags, newTagLabel, actionToTake)
            # Log out of tableau server
            response = requests.request("POST", url[1] + '/api/' + api_version + '/auth/signout')
            # Sign out
            logger.info('')
            logger.info('Signing out of ' + url[1])

# Job complete
logger.info('')
logger.info('Job is complete')
Apply tags
Actual tagging is done with a function that composes each POST request. The function looks for a setting in the CSV file that defines whether to add or remove a tag, or merely report what would happen (test/verify). This is useful for collecting separate log files that record the pre- and post-job states of tags on the Tableau server.
# -------------------
# Define the tagging process
# -------------------
def tag_item(type, name, itmUrl, existingTags, newTagLabel, actionToTake):
    xml_payload = '<tsRequest><tags><tag label="' + newTagLabel + '"/></tags></tsRequest>'
    # Not already tagged
    if newTagLabel not in existingTags:
        # add label
        if actionToTake == 'add':
            logging.info('___ Add "' + newTagLabel + '" tag to ' + type + ' - ' + name + '"')
            response = requests.request("PUT", url=itmUrl, headers=xml_headers, data=xml_payload)
            logging.info(response.text.encode('utf8'))
        # announce that label does not exist (test/verify)
        else:
            logging.info('___ Would have added "' + newTagLabel + '" tag to ' + type + ' - ' + name + '"')
    # Already tagged
    else:
        # remove label
        if actionToTake == 'remove':
            logging.info('___ Remove ' + newTagLabel + ' from "' + type + ' - ' + name + '"')
            response = requests.request("DELETE", url=itmUrl + '/' + newTagLabel, headers=xml_headers)
            logging.info(response.text.encode('utf8'))
        # announce that label already exists (test/verify)
        else:
            logging.info('___ ' + type + ' - ' + name + ' is already tagged "' + newTagLabel + '"')
Send Tableau Tags to Solr Index
Declare imports
The Tableau Server/Client is used as a fast way to get all Tableau Workbooks and Views, the REST module is used to add tags to the Tableau server by LUID, and the pysolr module gets the data into the Solr index.
# -------------------
# Imports
# -------------------
# Import the Tableau Server-Client module
import tableauserverclient as TSC
# Import the logging modules
import logging, logging.config
# Import the REST module
import requests
# Import the pysolr module to make Solr requests
import pysolr
Log activities
During development, the logger can be set to stream feedback to the IDE. For implementation, the logger can be set to log to a file. The file is useful for validating and documenting the work that's been done. The code includes a statement for filtering out some of the non-root Tableau logging feedback.
# -------------------
# Logging
# -------------------
# Set the logger to stream or file
logger = logging.getLogger()
# log to console (stream) or file
handler = logging.StreamHandler()
# handler = logging.FileHandler('app.log', mode='w')
# settings
formatter = logging.Formatter('%(asctime)s %(name)-2s %(levelname)-2s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# filter out Tableau (non-root) logging
logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': True,
})
Set variables
The job type allows for testing ahead of implementation. The other variables prepare credentials and server addresses.
# -------------------
# Settings
# -------------------
# Job type
# jobType = 'test'
jobType = 'apply'
# Set the Tableau server URL
tableau_url = 'http://whatever-this-is'
# Set the Tableau server-client
server = TSC.Server(tableau_url)
# Set the Tableau API version
server.use_server_version()
api_version = server.version
# Set the Tableau sign-in user (needs to be server admin)
tableauSignIn = 'AD Service Account username'
tableauPassword = 'AD Service Account password'
# Set the Tableau Python API sign-in authorization (needs to be server admin)
initialTableauAuth = TSC.TableauAuth(tableauSignIn, tableauPassword)
# Set the Fusion server Index Profile URL (we're actually using Lucidworks Fusion here)
fusion_url = 'http://fusion-index-profile'
# Set the Fusion API headers
headers = {'content-type': 'application/json', 'Accept-Charset': 'UTF-8', 'Authorization': 'Basic abc123DEFgiaa=='}
# Set the Solr server
host = 'http://solr server address'
querystring = '*:*'
fields = 'id,title'
qt = 'select'
filterquery_views = '_lw_data_source_s:("Tableau-Azure-Views")'
filterquery_workbooks = '_lw_data_source_s:("Tableau-Azure-Workbooks")'
rows = '100000'
Cache data
Get all Tableau workbook and view IDs already in Solr.
# -------------------
# Get all items from Solr Index
# -------------------
# Solr Core
logging.info('Caching Solr Data')
solr = pysolr.Solr(host, search_handler=qt, always_commit=True)
results_workbooks = solr.search(querystring, **{
    'fl': fields,
    'fq': filterquery_workbooks,
    'rows': rows,
    # 'hl': 'true',
    # 'hl.fragsize': 10,
})
solr = pysolr.Solr(host, search_handler=qt, always_commit=True)
results_views = solr.search(querystring, **{
    'fl': fields,
    'fq': filterquery_views,
    'rows': rows,
    # 'hl': 'true',
    # 'hl.fragsize': 10,
})
solr_cache_workbooks = {}
for row_workbook in results_workbooks:
    solr_cache_workbooks[row_workbook['id']] = (row_workbook['id'])
logging.info('Returned {0} workbooks from Solr'.format(len(results_workbooks)))
logging.info('Cached {0} workbooks from Solr'.format(len(solr_cache_workbooks)))
solr_cache_views = {}
for row_view in results_views:
    solr_cache_views[row_view['id']] = (row_view['id'])
logging.info('Cached {0} views from Solr'.format(len(solr_cache_views)))
logging.info('Returned {0} views from Solr'.format(len(results_views)))
logging.info('')
Cache more data
Cache all workbook and view assets from the Tableau server, then send tags to Solr only for assets that already exist in the Solr index.
# -------------------
# Identify the sites
# -------------------
# Cache a list of site LUIDs
site_cache = {}
with server.auth.sign_in(initialTableauAuth):
    for luid in TSC.Pager(server.sites):
        site_cache[luid.id] = luid.name
# logging.info("site luid is {" + "\n".join("{!r}: {!r},".format(k, v) for k, v in site_cache.items()) + "}")

# -------------------
# Process the tags
# -------------------
# Set the site LUID used for the REST calls, e.g. '87cfedbd-70c5-4cf6-9cbf-a63e80cce7a2'
for key, value in site_cache.items():
    # Set the authorization string = TSC.TableauAuth('USERNAME', 'PASSWORD', site_id='CONTENTURL')
    # Note: CONTENTURL is "url_namespace" for the site in the Postgres Sites table
    # Note: CONTENTURL has no spaces
    # Note: CONTENTURL for the "Default" site is an empty string
    # Note: CONTENTURL "Default" site is not always named "Default"
    # Note: CONTENTURL does not handle being a variable with empty quotes, so create with quotes below
    if (value.lower() == 'default') or (value.lower() == 'sjhs tableau site'):
        tableauAuth = TSC.TableauAuth(tableauSignIn, tableauPassword, site_id="")
    else:
        tableauAuth = TSC.TableauAuth(tableauSignIn, tableauPassword, site_id=value.replace(' ', ''))
    # Sign in to site
    with server.auth.sign_in(tableauAuth):
        logging.info('Caching Tableau')
        # Cache all workbooks
        # -------------------
        workbook_cache = {}
        for workbook in TSC.Pager(server.workbooks):
            workbook_cache[workbook.id] = (workbook.name, workbook.tags)
        # logging.info("{" + "\n".join("{!r}: {!r},".format(k, v) for k, v in workbook_cache.items()) + "}")
        logging.info('Cached {0} workbooks from Tableau'.format(len(workbook_cache)))
        # Cache all views
        # -------------------
        view_cache = {}
        for view in TSC.Pager(server.views):
            view_cache[view.id] = (view.name, view.tags)
        # logging.info("{" + "\n".join("{!r}: {!r},".format(k, v) for k, v in view_cache.items()) + "}")
        logging.info('Cached {0} views from Tableau'.format(len(view_cache)))
        # Build workbook datasets
        # -------------------
        logging.info('')
        logging.info('Build workbooks dataset')
        workbook_set = {x: workbook_cache[x] for x in workbook_cache if x in solr_cache_workbooks}
        logging.info('Built dataset with {0} workbooks'.format(len(workbook_set)))
        # Build view datasets
        # -------------------
        logging.info('')
        logging.info('Build views dataset')
        view_set = {x: view_cache[x] for x in view_cache if x in solr_cache_views}
        logging.info('Built dataset with {0} views'.format(len(view_set)))
        # POST all workbook tags to Fusion
        # -------------------
        logging.info('')
        logging.info('Sending all workbook tags to Fusion')
        for w_key, w_value in workbook_set.items():
            id = w_key
            name = w_value[0]
            tag = ', '.join(w_value[1])
            type = 'Workbook'
            if len(tag) > 1:
                data = '{"id": "' + id + '", "tag":"' + tag + '"}'
                runjob(name, tag, type, data)
        # POST view tags to Fusion
        # -------------------
        logging.info('')
        logging.info('Sending all view tags to Fusion')
        for v_key, v_value in view_set.items():
            id = v_key
            name = v_value[0]
            tag = ', '.join(v_value[1])
            type = 'View'
            if len(tag) > 1:
                data = '{"id": "' + id + '", "tag":"' + tag + '"}'
                runjob(name, tag, type, data)

# Sign out
response = requests.request("POST", tableau_url + '/api/' + api_version + '/auth/signout')
logger.info('All done and signed out')
Add the tags to Solr
Use a function to test or apply each POST to a Fusion Index Profile.
# -------------------
# Rinse/Repeat Function
# -------------------
# Run job of each type
def runjob(name, tag, type, data):
    if jobType == 'test':
        # Run as TEST
        logging.info('Would have added tag "' + tag + '" to ' + type + ' "' + name + '"')
    else:
        # Run for REAL LIKE
        response = requests.post(fusion_url, headers=headers, data=data)
        logging.info(response.text.encode('utf8'))