Add a pre-filter to your query
Sometimes when you're building queries in Scuba, some data is less useful for you. For example, you might have backend events with names that mirror frontend events, or device heartbeat data that you don't care about.
To simplify your query building, you can create a pre-filter. In general, pre-filters are used for query performance optimization, as a pre-filter is applied after retrieving data from disk, before any other calculations are performed.
Create a pre-filter
To define a pre-filter, do the following:
In the Scuba UI, navigate to Explore or any of the apps.
Define a query.
Below the query definition, but above the GO button, look for pre-filter. Click the + icon, then click all events to define your pre-filter.
A pre-filter must be an event property that uses only other event properties (including columns from a dataset).
About default pre-filters and board filters
You might notice a pre-filter on your query even before you add one. If so, this means that your admin has defined a default pre-filter. A default pre-filter runs on all queries, but not on boards or in the flow builder.
Remove a default pre-filter by clicking its adjacent trash icon. To reapply a default pre-filter, click Clear all in the query builder.
A board can also supply a pre-filter to queries that are pinned to it. To remove a board filter from your query, open the query in Explore. See Save variants of a board with board filters for more information.
Keep in mind: when a pre-filter is applied, Scuba still initially scans all events. As a result, a pre-filter might not significantly improve performance, depending on your query definition.
Also note: using a pre-filter can affect your results. Because a pre-filter will cause the query engine to ignore events that do not match your filter(s), in certain circumstances this can cause different results than intended.
What’s the difference between a pre-filter and a measure filter?
Let’s say you want to create a table showing the number of confirmed purchases with at least one item, split by the number of confirmed purchases per user. You can do this using the filter IS_NOT_EMPTY(item) and actor property purchase_confirmed (count of purchase events per user). You can then choose to put the IS_NOT_EMPTY(item) filter into your “count events” measure, or put it into the pre-filter. Both would show a count of events, use a filter of IS_NOT_EMPTY(item), and split by purchase_confirmed, but each would return different results.
Why?
When you use a pre-filter, it’s as if we’re completely discarding any events that don’t match IS_NOT_EMPTY(item). When we then go to count purchase_confirmed events per user, there are none because they don't have a value for item and are therefore not queryable.
On the other hand, when we put the filter in the measure, we’re not “discarding” events like the pre-filter does. We’re instead looking at events that have a value for item, then looking at the associated user, and then asking how many events that user has that match purchase_confirmed, and splitting by that number. Thus, even though purchase_confirmed events may not pass the item filter, they're still available for aggregation.
Here’s how the query engine would proceed using each method of filtering:
Using a pre-filter
1. Remove all events without a value for item.
2. Count how many purchase_confirmed events each user has in the filtered dataset. Since no purchase_confirmed events have a value for item, and we removed all the events without an item value, this will be 0.
3. Split the data by the unique values produced in step 2. Since the value is 0 for every user, there is only one row (of 0) in the returned table chart.
Using a measure filter
1. Look at every event that has a value for item.
2. Count how many purchase_confirmed events each associated user has. Since the dataset itself isn’t filtered, all of the purchase_confirmed events are still countable, so this step returns nonzero counts.
3. Split the data by the unique values produced in step 2. Since each user with purchase_confirmed events also has item events, we would see multiple rows in our returned table chart.
The crucial difference is that with the filter in the measure, all events are still there to be aggregated/queried on. With a pre-filter, it’s as if those events don’t exist at all.
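To make the difference concrete, here is a minimal Python sketch of the two step sequences above. It is not Scuba's query engine or API. The sample events, the field names (user, name, item), and the helpers has_item, purchases_per_user, and count_split are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical sample data: purchase_confirmed events carry no item value.
events = [
    {"user": "u1", "name": "add_to_cart", "item": "book"},
    {"user": "u1", "name": "purchase_confirmed", "item": None},
    {"user": "u1", "name": "purchase_confirmed", "item": None},
    {"user": "u2", "name": "add_to_cart", "item": "pen"},
    {"user": "u2", "name": "purchase_confirmed", "item": None},
    {"user": "u3", "name": "add_to_cart", "item": "cup"},
]


def has_item(event):
    """Stand-in for IS_NOT_EMPTY(item)."""
    return bool(event.get("item"))


def purchases_per_user(visible_events):
    """Stand-in for the actor property: purchase_confirmed events per user."""
    return Counter(
        e["user"] for e in visible_events if e["name"] == "purchase_confirmed"
    )


def count_split(visible_events, measure_filter):
    """Count events passing measure_filter, split by the per-user purchase count.

    Both the measure and the actor property see only visible_events.
    """
    per_user = purchases_per_user(visible_events)
    return dict(
        Counter(per_user[e["user"]] for e in visible_events if measure_filter(e))
    )


# Pre-filter: events without an item are discarded before anything else runs.
with_pre_filter = count_split([e for e in events if has_item(e)], lambda e: True)

# Measure filter: every event stays visible; only the count is restricted.
with_measure_filter = count_split(events, has_item)

print(with_pre_filter)      # {0: 3}
print(with_measure_filter)  # {2: 1, 1: 1, 0: 1}
```

In this sketch the pre-filtered run collapses into a single row of 0, because the per-user count never sees a purchase_confirmed event. The measure-filtered run splits into one row per distinct purchase count.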