A query is a request for information from the database, expressed in a structured query language. You can use various components to build a query, such as named expressions (2.x), knowledge objects (3.0), and functions.
The types and number of components you use to structure a query can affect how quickly you receive results. Because query complexity and focus vary greatly from company to company, and from user to user within a company, overly specific rules for constructing queries would be limiting.
This topic provides general guidelines for creating efficient queries that optimize performance.
If you are new to creating queries, we suggest that you walk through the Tourist's Guide and Explorer Guide for a solid basis in how to construct queries, and then try out some of the examples in the Query Cookbook for good measure.
General guidelines for efficient queries
Follow these basic guidelines for optimum query performance:
- Run sampled queries whenever possible for optimum performance. Unsampled queries always take longer than sampled queries due to the amount of data that must be scanned.
- Use the minimum number of measures. The more measures there are in a query, the slower the query performance.
- Use the minimum number of filters possible. The more filters there are in a query, the longer it takes, due to the number of computations that must be performed.
- Keep the number of referenced columns to a minimum. The more columns that are referenced, the slower the query will be due to the amount of data that must be scanned.
- Keep query date ranges to a minimum. The longer the date range, the slower the query, due to the amount of data that must be scanned.
- If you use a group by, in general you can expect slower results. If you use multiple group bys, query performance will be even slower due to the added results that must be returned.
- Use a minimum number of named expressions, metrics, cohorts, and funnels in queries for optimum performance. Each of these objects requires individual computations, so the more you use in a query the slower the response time will be.
- Named expressions require more computations than simple filters, and therefore are slower to return results.
- Overuse of cohorts and metrics slows query performance, due to the computations they require. Sometimes a simple filter is all that's needed.
- A text contains filter consumes more resources, and therefore returns results more slowly, than an is one of or is not one of filter.
- In most cases, to improve performance when using a funnel, click Reset funnel if event matches Step 1 to ensure the funnel is cacheable.
- Table View queries are more efficient than Time View queries, because a Time View query requires computations for every point on the graph, while a Table View query requires only one computation.
- Queries involving sets contain more data and therefore will have a slower response time than queries that don't contain sets.
- The more data you have, the slower your query response will be due to the amount of data that must be scanned.
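To make the filter-cost point above concrete, here is a toy Python model (an illustration only, not the product's query engine): an is one of filter amounts to one hash-set lookup per row, while a contains filter must scan the characters of every value.

```python
# Toy cost model, not the actual query engine. Compares an
# "is one of" filter (hash-set membership) with a text
# "contains" filter (substring scan) over the same rows.
rows = ["checkout", "add_to_cart", "page_view"] * 100_000

# "is one of": a single hash lookup per row.
allowed = {"checkout", "add_to_cart"}
hits_is_one_of = sum(1 for r in rows if r in allowed)

# "contains": each value's characters may be scanned for the substring.
hits_contains = sum(1 for r in rows if "cart" in r)

print(hits_is_one_of, hits_contains)
```

Both filters do work proportional to the number of rows scanned, which is also why fewer referenced columns, shorter date ranges, and sampled queries all speed up response time: each reduces the volume of data the scan touches.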
Cluster performance tips
Use the following basic guidelines for optimum performance when running queries:
- Delete dashboards that are not being used, so resources are not used to update them.
- Moderate the number of external API queries that are running while UI queries are running. The more external API queries that are running, the slower the UI queries will be.
- Delete unused columns in your data to enhance performance.
- If you run queries during a high import volume, your query response will be slower due to the resources being consumed on the data and string tiers.
Simpler is usually faster
In most cases when building queries, the simpler the construction, the more efficient the query, because fewer computations are needed. This is especially true of filters inside query cohorts and metrics: cohorts and metrics require extra computations that slow query response time, and they are often used where a simple filter would suffice.
- In this example, we have a metric shoe_purchases_per_user.
- Now we want to view the median shoe_purchases_per_user for all active users in the time period.
- Next, we want to view this median calculation, but only for the groups of users that also had at least one accessories purchase, and decide to use an existing cohort or metric to filter to this group of users.
This produces the correct results, but not as quickly as we would've liked.
- We realize that since we are filtering on a per-actor metric, we can use a simple filter to produce the same result with a much quicker response time. The simple filter is more efficient here, because the additional scan to calculate the users in the cohort is not required.
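The walkthrough above can be sketched with a toy Python model. The data shape and field names here are illustrative assumptions, not the product's API; the point is only that the cohort approach needs an extra pass over the data, while the simple filter does not.

```python
# Illustrative sketch only; the data model and field names are
# assumptions, not the product's API.
purchases = [
    {"user": "a", "shoes": 3, "accessories": 1},
    {"user": "b", "shoes": 2, "accessories": 0},
    {"user": "c", "shoes": 5, "accessories": 2},
]

# Cohort-style filter: one pass to compute the set of qualifying
# users, then a second pass to filter the metric rows against it.
cohort = {p["user"] for p in purchases if p["accessories"] >= 1}
shoes_via_cohort = [p["shoes"] for p in purchases if p["user"] in cohort]

# Simple filter: a single pass, filtering directly on the
# per-actor column.
shoes_via_filter = [p["shoes"] for p in purchases if p["accessories"] >= 1]

assert shoes_via_cohort == shoes_via_filter  # same answer, one fewer scan
```

Because the per-actor condition can be evaluated row by row, the extra scan that materializes the cohort adds cost without changing the result.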