Lucidworks Fusion signals data can reveal a lot about how people use a web application. Strategies for storing and analyzing signals data are therefore key to platform maintenance. Here are a few methods and insights along those lines.

Analyzing application usage - Tableau dashboard

Tableau Professional is useful for combining and analyzing live data from the Fusion signals collection with historical Fusion log and signals data stored on a SQL server. Apache Zeppelin is a free alternative worth looking into if Tableau is too costly.

A direct connection from Tableau Professional through the Fusion SQL service to a Fusion 3.x server's logs and signals collections typically returns all relevant data fields except the count_i, raw_content, root, text, version, weight_d, and id fields. Any other missing field may simply have no data in the given query date range.

Since the Solr id field is essential for combining live data with exported data in a Tableau report, here's how to ensure the logs and signals collections return it.

From a shell on any Fusion server, standalone or clustered, list all the tables that Fusion SQL exposes:
Fusion 3.x - curl -u admin:password -X GET "http://servername:8764/api/apollo/catalog/fusion/assets"
Fusion 4.x - curl -u admin:password -X GET "http://servername:8764/api/catalog/fusion/assets"

The response lists one asset per exposed table; the entry for a signals collection looks like this (note that id appears in exclude_fields):

{
    "projectId": "fusion",
    "name": "collection_signals",
    "assetType": "table",
    "description": "Fusion collection collection_signals",
    "format": "solr",
    "options": [
        "collection -> collection_signals",
        "solr.params -> sort=id asc",
        "exclude_fields -> id,_lw_*,*_\\d_coordinate,_raw_content_"
    ],
    "cacheOnLoad": false,
    "createdOn": "2018-12-04T16:14:18.179Z",
    "id": "fusion.collection_signals",
    "updatedOn": "2018-12-04T16:14:18.00Z",
    "version": 1618938490498580480
}
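
If you'd rather script this check than eyeball the curl output, the same listing can be fetched and filtered in a few lines of Python. This is a sketch: the host, credentials, and Fusion 4.x path below are the same placeholders used in the curl examples above, and only stdlib modules are used.

```python
import base64
import json
import urllib.request


def table_names(assets):
    """Return the names of all table assets in a Fusion catalog listing."""
    return [a["name"] for a in assets if a.get("assetType") == "table"]


def list_fusion_tables(base_url="http://servername:8764",
                       user="admin", password="password"):
    """Fetch the catalog asset listing (Fusion 4.x path) and
    return the exposed table names."""
    req = urllib.request.Request(base_url + "/api/catalog/fusion/assets")
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + creds)
    with urllib.request.urlopen(req) as resp:
        return table_names(json.load(resp))
```

Calling list_fusion_tables() against a live server returns a list such as ["collection_signals", ...], which is handy for auditing which collections a cluster exposes.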

GET the definition of the specific collection's log or signals table:
Fusion 3.x - curl -u admin:password -X GET "http://servername:8764/api/apollo/catalog/fusion/assets/collectionname_signals" > ./tableDef.json
Fusion 4.x - curl -u admin:password -X GET "http://servername:8764/api/catalog/fusion/assets/collectionname_signals" > ./tableDef.json

Remove "id," from the "exclude_fields -> id,_lw_*,*_\\d_coordinate,_raw_content_" option in the downloaded JSON file.
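
You can make that one-character-class edit by hand, or script it so it is repeatable across collections. The sketch below (stdlib only; the tableDef.json filename matches the curl examples) strips the leading "id," from the exclude_fields option and leaves every other option untouched:

```python
import json


def drop_id_from_exclude_fields(asset):
    """Strip the leading "id," from the exclude_fields option of a parsed
    table definition; options without that prefix are unchanged."""
    asset["options"] = [
        opt.replace("exclude_fields -> id,", "exclude_fields -> ", 1)
        for opt in asset["options"]
    ]
    return asset


def fix_table_def(path="tableDef.json"):
    """Rewrite the downloaded definition in place, ready to PUT back."""
    with open(path) as f:
        asset = json.load(f)
    with open(path, "w") as f:
        json.dump(drop_id_from_exclude_fields(asset), f, indent=4)
```

After running fix_table_def(), PUT the file back with the curl command that follows.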

Save and PUT the file back to the server. This request may take time to process on a large collection.
Fusion 3.x - curl -u admin:password -X PUT -H "Content-type:application/json" --data-binary @tableDef.json "http://servername:8764/api/apollo/catalog/fusion/assets/collectionname_signals"
Fusion 4.x - curl -u admin:password -X PUT -H "Content-type:application/json" --data-binary @tableDef.json "http://servername:8764/api/catalog/fusion/assets/collectionname_signals"

Once the PUT succeeds, the stored table definition no longer lists id in exclude_fields:

{
    "projectId": "fusion",
    "name": "collection_signals",
    "assetType": "table",
    "description": "Fusion collection collection_signals",
    "format": "solr",
    "options": [
        "collection -> collection_signals",
        "solr.params -> sort=id asc",
        "exclude_fields -> _lw_*,*_\\d_coordinate,_raw_content_"
    ],
    "cacheOnLoad": false,
    "createdOn": "2018-12-04T16:14:18.179Z",
    "id": "fusion.collection_signals",
    "updatedOn": "2018-12-04T16:14:18.00Z",
    "version": 1618938490498580480
}

After the Fusion service is restarted on all servers in the cluster, the Solr id field will be available to the Fusion SQL connection, and the new definition will survive future server restarts.

Export historical signals data to SQL server

Beyond the several months of data kept in Fusion Search for training ML models, historical log and signals data should be exported and then deleted from the Fusion Search server or cluster. Offloading the data makes Fusion log and signals schema upgrades easier to survive, and it avoids service hangs caused by analytics requests that pull everything from years ago to today. Here's how to export the log and signals data to CSV or SQL storage.
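
As a rough illustration of the CSV route - and this is a sketch, not the documented procedure: it assumes direct access to the underlying Solr endpoint rather than the Fusion SQL service, and the host, collection name, and field list are placeholders - deep paging a signals collection with Solr's cursorMark (sorted on the same id field made visible above) could look like this:

```python
import csv
import json
import urllib.request
from urllib.parse import urlencode


def docs_to_rows(docs, fields):
    """Flatten Solr documents into CSV rows, blank where a field is absent."""
    return [[doc.get(f, "") for f in fields] for doc in docs]


def export_signals_csv(solr_url, collection, fields, out_path):
    """Deep-page a signals collection with cursorMark and write it to CSV.
    solr_url is assumed to be a plain Solr endpoint, e.g. http://host:8983/solr.
    cursorMark requires a sort on the uniqueKey field, hence 'id asc'."""
    cursor = "*"
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(fields)
        while True:
            params = urlencode({"q": "*:*", "rows": 1000, "sort": "id asc",
                                "fl": ",".join(fields), "cursorMark": cursor,
                                "wt": "json"})
            url = f"{solr_url}/{collection}/select?{params}"
            with urllib.request.urlopen(url) as resp:
                data = json.load(resp)
            writer.writerows(docs_to_rows(data["response"]["docs"], fields))
            if data["nextCursorMark"] == cursor:  # cursor unchanged: no more pages
                break
            cursor = data["nextCursorMark"]
```

The resulting CSV can be bulk-loaded into a SQL server with that database's own import tooling, after which the exported date range can be deleted from Fusion.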