When you build queries in Scuba, some of your data might not be useful to you. For example, you might have backend events with names that mirror frontend events, or device heartbeat data that you don't care about.
To simplify query building, you can create a pre-filter. Pre-filters are generally used to optimize query performance: a pre-filter is applied after data is retrieved from disk, but before any other calculations are performed.
Create a pre-filter
To define a pre-filter, do the following:
In the Scuba UI, navigate to Explore or any of the apps.
Define a query.
Below the query definition, but above the GO button, look for pre-filter. Click the + icon, then click all events to define your pre-filter.
A pre-filter must be an event property that uses only other event properties (including columns from a dataset).
About default pre-filters and board filters
You might notice a pre-filter on your query even before you add one. If so, this means that your admin has defined a default pre-filter. A default pre-filter runs on all queries, but not on boards or in the flow builder.
Remove a default pre-filter by clicking its adjacent trash icon. To reapply a default pre-filter, click Clear all in the query builder.
A board can also supply a pre-filter to queries that are pinned to it. To remove a board filter from your query, open the query in Explore. See Save variants of a board with board filters for more information.
Keep in mind: when a pre-filter is applied, Scuba still initially scans all events. As a result, a pre-filter might not significantly improve performance, depending on your query definition.
Also note: using a pre-filter can affect your results. Because a pre-filter causes the query engine to ignore events that do not match your filter(s), in certain circumstances the results can differ from what you intended.
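The point about scanning can be illustrated with a toy model. The following is a minimal sketch in Python, not Scuba's actual engine: the function `run_query`, the event fields, and the sample data are all hypothetical. It only shows that a filter applied after retrieval still has to read every stored event, even though fewer events reach the later calculations.

```python
# Toy model (an assumption, not Scuba's implementation): the pre-filter runs
# after each event is read, so the scan still touches every stored event.

def run_query(stored_events, pre_filter):
    scanned = 0
    kept = []
    for event in stored_events:
        scanned += 1              # every stored event is read
        if pre_filter(event):     # the pre-filter is applied after retrieval
            kept.append(event)
    # Only the kept events would be passed on to later calculations.
    return {"scanned": scanned, "aggregated": len(kept)}


# Hypothetical sample data: device heartbeats mixed with user events.
STORED = [
    {"name": "heartbeat"},
    {"name": "page_view"},
    {"name": "heartbeat"},
    {"name": "heartbeat"},
    {"name": "purchase"},
]

result = run_query(STORED, lambda e: e["name"] != "heartbeat")
print(result)
```

In this model, all five events are scanned but only two are aggregated, which is why the performance gain depends on how expensive the remaining calculations are.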
What’s the difference between a pre-filter and a measure filter?
Let’s say you want to create a table showing the number of confirmed purchases with at least one item, split by the number of confirmed purchases per user. You can do this using the filter IS_NOT_EMPTY(item) and the actor property purchase_confirmed (count of purchase events per user). You can then choose to put the IS_NOT_EMPTY(item) filter into your “count events” measure, or put it into the pre-filter. Both would show a count of events, use a filter of IS_NOT_EMPTY(item), and split by purchase_confirmed, but each would return different results.
When you use a pre-filter, it’s as if we’re completely discarding any events that don’t match IS_NOT_EMPTY(item). When we then go to count purchase_confirmed events per user, there are none because they don't have a value for item and are therefore not queryable.
On the other hand, when we put the filter in the measure, we’re not “discarding” events like the pre-filter does. We’re instead looking at events that have a value for item, then looking at the associated user, and then asking how many events that user has that match purchase_confirmed, and splitting by that number. Thus, even though purchase_confirmed events may not pass the item filter, they're still available for aggregation.
Here’s how the query engine would proceed using each method of filtering:
Using a pre-filter
1. Remove all events without a value for item.
2. Count how many purchase_confirmed events each user has in the filtered dataset. Because no purchase_confirmed events have a value for item, and we removed all the events without an item value, this will be 0.
3. Split the data by the unique values produced in step 2. Since the value is 0 for every user, there is only one row (of 0) in the returned table chart.
Using a measure filter
1. Look at every event that has a value for item.
2. Count how many purchase_confirmed events each associated user has. Since the dataset itself isn’t filtered, all of the purchase_confirmed events are still countable, so the count returns results.
3. Split the data by the unique values produced in step 2. Since each user with purchase_confirmed events also has item events, we would see multiple rows in our returned table chart.
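The two procedures above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Scuba's query engine is implemented: the sample events, the user names, and the helper functions (`is_not_empty_item`, `confirmed_per_user`, `count_split`) are all hypothetical, and purchase_confirmed events are assumed to carry no item value, as in the example.

```python
from collections import Counter

# Hypothetical sample data: purchase_confirmed events have no `item` value.
EVENTS = [
    {"user": "ann", "name": "purchase", "item": "book"},
    {"user": "ann", "name": "purchase_confirmed", "item": None},
    {"user": "ann", "name": "purchase_confirmed", "item": None},
    {"user": "bob", "name": "purchase", "item": "lamp"},
    {"user": "bob", "name": "purchase", "item": "desk"},
    {"user": "bob", "name": "purchase_confirmed", "item": None},
    {"user": "cat", "name": "purchase", "item": "pen"},
]


def is_not_empty_item(event):
    """Stand-in for the IS_NOT_EMPTY(item) filter."""
    return event["item"] not in (None, "")


def confirmed_per_user(events):
    """Stand-in for the actor property: purchase_confirmed events per user."""
    return Counter(e["user"] for e in events if e["name"] == "purchase_confirmed")


def count_split(events_to_count, per_user):
    """Count events, split by each event's user's purchase_confirmed count."""
    # A Counter returns 0 for users with no purchase_confirmed events.
    return dict(Counter(per_user[e["user"]] for e in events_to_count))


def with_pre_filter(events):
    kept = [e for e in events if is_not_empty_item(e)]   # step 1: discard events
    per_user = confirmed_per_user(kept)                  # step 2: counted on filtered data
    return count_split(kept, per_user)                   # step 3: split


def with_measure_filter(events):
    matching = [e for e in events if is_not_empty_item(e)]  # step 1: events with an item
    per_user = confirmed_per_user(events)                   # step 2: counted on all events
    return count_split(matching, per_user)                  # step 3: split


print(with_pre_filter(EVENTS))      # {0: 4}
print(with_measure_filter(EVENTS))  # {2: 1, 1: 2, 0: 1}
```

Both functions count the same four events that have an item value. Only the split differs: with the pre-filter every user's purchase_confirmed count collapses to 0, so the table has a single row, while the measure filter keeps the per-user counts and produces several rows.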
The crucial difference is that with the filter in the measure, all events are still there to be aggregated/queried on. With a pre-filter, it’s as if those events don’t exist at all.