Sisense Performance: A Billion Records in a Single Server

Whether you’ve browsed our site, spoken with our team, or read about us in analyst reviews, you might have noticed that we dig technology here at Sisense. That’s why, when clients and prospects want to push the limits of both data complexity and data volume, we happily oblige.
When one newly signed client (a prospect at the time) asked how much data we would recommend hosting on a single Sisense server, they handed us one billion transactional records and three million dimensional records to host on a single Sisense node. That’s 500 GB of data, tested with 100 concurrent users logging in and banging around on the server. We used a cloud machine with 32 CPU cores and 244 GB of RAM for the job, in line with our straightforward specs. We’ll cut to the chase and share the details from Load Impact below.
Tested Setup
  • AWS instance: r4.8xlarge (32 vCPUs, 244 GB RAM)
  • 100 Concurrent Users
  • 120 minutes
  • 38 max concurrent queries
    • Sisense defines concurrency as two or more queries initiated within the same millisecond
  • 2 usage scenarios
    • 50% of the users queried the entire billion-record dataset
    • 50% of the users viewed a subset of the data, simulating clients who see only their own data
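The millisecond-based definition of concurrency above can be made concrete with a small calculation. The sketch below is a minimal illustration, assuming each query’s start time is available as an integer epoch timestamp in milliseconds (a hypothetical input format, not Sisense’s log schema); it derives a "max concurrent queries" figure by counting how many queries started in each millisecond.

```python
from collections import Counter
from typing import Iterable


def max_concurrent_queries(start_times_ms: Iterable[int]) -> int:
    """Largest number of queries initiated within the same millisecond.

    Assumes integer epoch-millisecond start times (hypothetical input).
    """
    # Bucket query starts by millisecond and count each bucket.
    starts_per_ms = Counter(start_times_ms)
    # An empty log has no queries, so report 0 rather than raising.
    return max(starts_per_ms.values(), default=0)


# Illustrative timestamps only, not data from the load test.
sample_starts = [1_000, 1_000, 1_000, 1_001, 1_005, 1_005]
print(max_concurrent_queries(sample_starts))  # 3
```

Under this definition, queries that start one millisecond apart are not counted as concurrent, which is why the concurrency figure can be much lower than the number of active users.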
Conclusions
  • Query response time averaged 0.1 seconds, with a maximum of 3.1 seconds. Response time is measured from when Sisense receives a query from the web application to when it returns the result set to the client application.
  • The Sisense Elasticube RAM consumption remained stable at approximately 100 GB, even though more than 500 GB of data was stored on the Elasticube server’s disk.
  • Average CPU usage during the load test was approximately 10-20%, measured across all CPU cores.
Performance Details 
We used a tool called logz.io to aggregate the server logs into KPIs during the load test, which let us measure the impact on the server and estimate the impact in production.
Here’s what the query performance results looked like over the course of the test. In short, no query took longer than 3.1 seconds to return results to the web front end.
[Figure: query response times during the load test]
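Aggregating logs into response-time KPIs boils down to summary statistics over per-query timings. As a minimal sketch, assuming each log record carries a query ID and a response time in milliseconds (a hypothetical record shape; this is not the logz.io API or log format), the average and maximum response time could be computed like this. The sample values are made up and are not the test’s results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryLog:
    # Hypothetical log record; field names are illustrative only.
    query_id: str
    response_ms: float  # time from query receipt to result set returned


def response_time_kpis(logs: list[QueryLog]) -> dict[str, float]:
    """Summarize query response times in seconds (average and maximum)."""
    if not logs:
        raise ValueError("no query logs to summarize")
    seconds = [rec.response_ms / 1000.0 for rec in logs]
    return {"avg_s": mean(seconds), "max_s": max(seconds)}


sample = [QueryLog("q1", 80.0), QueryLog("q2", 120.0), QueryLog("q3", 3100.0)]
print(response_time_kpis(sample))
```

Reporting the maximum alongside the average matters here: a low average can hide a long tail, which is why the post quotes both numbers.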
When it comes to server usage, we passed the test with flying colors as well. Our amazing in-chip technology was on full display: we hosted 500 GB of data without using more than 128 GB of RAM. CPU utilization never rose above 75% throughout the load test, and it averaged less than 20%.
[Figure: server RAM and CPU utilization during the load test]
Methodology
We used a tool called Load Impact to create virtual users that log in and interact with dashboards, mimicking production usage. Each virtual user performed the following types of actions in Sisense:
  • Loading a dashboard with nine widgets
  • Changing the filter from one account to another, and the time range from one year to two years
  • Filtering by clicking on an element in one chart to control the others
  • Drilling down from country-level to region-level data
  • Downloading a .csv of the information in a Sisense widget
  • Switching dashboards and repeating all of the steps above
The two user types (the two usage scenarios listed under Tested Setup) performed the same steps. One group, however, had a WHERE clause appended to all of its queries, limiting its view to one of the seven customer accounts. This simulates the external, OEM use case, in which you deploy dashboards for your own clients to view.
Here is a visualization of the usage pattern over the test. The x-axis spans the two hours, and the y-axis shows the number of virtual users (VUs). The number of users ramped up over 50 minutes, held steady for 10 minutes, and then repeated the same pattern during the second hour.
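Restricting an external user to one account is essentially row-level filtering. The sketch below is a minimal illustration using Python’s built-in sqlite3 module rather than Sisense itself; the `sales` table, `account_id` column, and `scoped_source` helper are hypothetical. It applies the account filter before any aggregation, so an external user’s totals cover only their own account.

```python
import sqlite3
from typing import Optional


def scoped_source(table: str, account_id: Optional[str]) -> tuple[str, tuple]:
    """Return a FROM-clause source limited to one account, plus its parameters.

    `table` must be a trusted identifier (it is interpolated); the account
    value is passed as a bound parameter. account_id=None means no restriction.
    """
    if account_id is None:
        return table, ()
    # Filter rows before any aggregation so totals only cover this account.
    return f"(SELECT * FROM {table} WHERE account_id = ?)", (account_id,)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (account_id TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("acct_a", 10.0), ("acct_b", 20.0), ("acct_a", 5.0)],
)


def total_revenue(account_id: Optional[str]) -> float:
    source, params = scoped_source("sales", account_id)
    sql = f"SELECT COALESCE(SUM(revenue), 0) FROM {source} AS s"
    return conn.execute(sql, params).fetchone()[0]


print(total_revenue(None))      # 35.0 (internal user: all accounts)
print(total_revenue("acct_a"))  # 15.0 (external user: one account)
```

Wrapping the table in a filtered subquery, rather than string-appending a WHERE clause to finished SQL, avoids clashes with any WHERE or GROUP BY the dashboard query already has.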
[Figure: virtual users over the two-hour test]
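To make the load profile concrete, here is a minimal sketch of the virtual-user schedule as a function of time. It assumes a linear ramp from zero to the 100-user peak and that the second hour restarts from zero; the chart is the authority on the actual curve, and `virtual_users` is a hypothetical helper, not a Load Impact setting.

```python
def virtual_users(minute: float, peak: int = 100, ramp_min: float = 50.0,
                  hold_min: float = 10.0, cycles: int = 2) -> int:
    """Approximate number of virtual users `minute` minutes into the test.

    Assumed profile (hypothetical): linear ramp from 0 to `peak` over
    `ramp_min` minutes, hold at `peak` for `hold_min` minutes, and repeat
    the cycle `cycles` times, starting again from 0 each cycle.
    """
    cycle_len = ramp_min + hold_min  # 60 minutes with the defaults
    if not 0 <= minute < cycles * cycle_len:
        raise ValueError("minute is outside the test window")
    t = minute % cycle_len  # position within the current cycle
    if t < ramp_min:
        return round(peak * t / ramp_min)
    return peak


for m in (0, 25, 55, 60, 85, 119):
    print(m, virtual_users(m))
```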
The number of concurrent queries increased over the course of the two-hour test, as shown below. In Sisense, concurrency means two or more users initiating a query within the same millisecond.
[Figure: concurrent queries over the two-hour test]
Data Details
The data represented one billion purchases on a website, each with its own unique transaction ID. The purchases were split into three categories: planes, trains, and automobiles. The analysts also wanted to kick the tires on Sisense’s ability to join large tables on demand. On user request, a three-million-record dimension table was joined with the one-billion-record fact table to return revenues from the fact table grouped by the origin/destination combinations stored in the dimension table.
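The on-demand join described above follows the classic star-schema pattern: index the small dimension table, then stream the large fact table past it once. The sketch below is a minimal in-memory hash join in Python, not how the Elasticube engine actually executes the query; the `Route` and `Purchase` schemas and the `route_id` join key are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Route:  # dimension row (hypothetical schema)
    route_id: int
    origin: str
    destination: str


@dataclass(frozen=True)
class Purchase:  # fact row (hypothetical schema)
    transaction_id: int
    route_id: int
    category: str  # "plane", "train", or "automobile"
    revenue: float


def revenue_by_route(facts: Iterable[Purchase],
                     dims: Iterable[Route]) -> dict[tuple[str, str], float]:
    # Build phase: index the small dimension table by its join key.
    route_by_id = {d.route_id: (d.origin, d.destination) for d in dims}
    totals: dict[tuple[str, str], float] = defaultdict(float)
    # Probe phase: stream the large fact table once, looking up each key.
    for f in facts:
        od = route_by_id.get(f.route_id)
        if od is not None:  # inner-join semantics: drop unmatched facts
            totals[od] += f.revenue
    return dict(totals)


dims = [Route(1, "NYC", "BOS"), Route(2, "SFO", "LAX")]
facts = [
    Purchase(101, 1, "train", 50.0),
    Purchase(102, 1, "plane", 150.0),
    Purchase(103, 2, "automobile", 40.0),
    Purchase(104, 99, "plane", 10.0),  # no matching route
]
print(revenue_by_route(facts, dims))
```

Building the lookup on the three-million-row side keeps the memory cost proportional to the dimension table, while the billion-row fact table is only scanned, never held in a hash table.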
The Elasticube looked like this:
[Figure: Elasticube data model]
Dashboard Details
At the end of the day, the client wanted dashboards that tracked revenues, bookings, and average revenue per booking over time, by client type, and by fee type.
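The three dashboard measures are related by a simple ratio: average revenue per booking is total revenue divided by the number of bookings in each group. As a minimal sketch, assuming one input row per booking shaped as (period, client_type, fee_type, revenue) (a hypothetical stand-in for the client’s real schema), the measures could be computed per group like this:

```python
from collections import defaultdict
from typing import Iterable


def booking_kpis(bookings: Iterable[tuple[str, str, str, float]]) -> dict:
    """Revenue, bookings, and average revenue per booking per group.

    Each input row is (period, client_type, fee_type, revenue) for one
    booking; this row shape is a hypothetical stand-in for the real data.
    """
    revenue: dict = defaultdict(float)
    count: dict = defaultdict(int)
    for period, client_type, fee_type, amount in bookings:
        key = (period, client_type, fee_type)
        revenue[key] += amount
        count[key] += 1
    return {
        key: {
            "revenue": revenue[key],
            "bookings": count[key],
            # Total revenue divided by booking count for the group.
            "avg_revenue_per_booking": revenue[key] / count[key],
        }
        for key in revenue
    }


rows = [
    ("Q1", "corporate", "service_fee", 100.0),
    ("Q1", "corporate", "service_fee", 50.0),
    ("Q1", "retail", "booking_fee", 30.0),
]
print(booking_kpis(rows)[("Q1", "corporate", "service_fee")])
```

Computing the average as a ratio of group totals, rather than averaging pre-computed averages, keeps the number correct when groups are rolled up to coarser levels.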
Here's one of the dashboards used during the testing:
[Figure: sample dashboard used during testing]