Add a pre-filter to your query

When you build queries in Scuba, some of your data may not be useful to you. For example, you might have backend events with names that mirror frontend events, or device heartbeat data that you don't care about.

To simplify query building, you can create a pre-filter that excludes this data. Pre-filters are also commonly used to optimize query performance: a pre-filter is applied after data is retrieved from disk but before any other calculations are performed.
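The order of operations can be sketched as a toy pipeline. This is a simplified, hypothetical model, not Scuba's engine or API: `run_query`, the sample events, and the heartbeat filter are illustrative assumptions. It shows that every event is still read, but only events that pass the pre-filter reach the later calculations.

```python
# Hypothetical model of where a pre-filter sits in query execution.
# This is not Scuba's actual engine or API.
from typing import Callable, Optional

Event = dict


def run_query(
    events_on_disk: list[Event],
    pre_filter: Optional[Callable[[Event], bool]],
    measure: Callable[[list[Event]], int],
) -> tuple[int, int]:
    scanned = 0
    kept: list[Event] = []
    for event in events_on_disk:
        scanned += 1  # every event is still read, with or without a pre-filter
        if pre_filter is None or pre_filter(event):
            kept.append(event)  # only matching events survive
    # All later calculations (measures, splits) see only the kept events.
    return scanned, measure(kept)


# Hypothetical sample data: frontend events plus device heartbeats.
events = [
    {"name": "login", "source": "frontend"},
    {"name": "heartbeat", "source": "device"},
    {"name": "heartbeat", "source": "device"},
    {"name": "purchase", "source": "frontend"},
]

# Pre-filter out heartbeat events, then count what remains.
scanned, count = run_query(events, lambda e: e["name"] != "heartbeat", len)
print(scanned, count)  # 4 2
```

The scan count is the same with or without the pre-filter, which is why a pre-filter does not always make a query noticeably faster.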

Create a pre-filter

To define a pre-filter, do the following:

1. In the Scuba UI, navigate to Explore or any of the apps.
2. Define a query.
3. Below the query definition, but above the GO button, look for pre-filter. Click the + icon, then click all events to define your pre-filter.

A pre-filter must be an event property that uses only other event properties (including columns from a dataset).
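The rule can be modeled as a check over a property's dependencies. This is a sketch under stated assumptions: the `Property` type, the `"column"`/`"event"`/`"actor"` labels, and the property names are hypothetical, and we assume the rule applies transitively (an event property built on an actor property does not qualify).

```python
# Hypothetical model of the pre-filter rule; the Property type and kind
# labels are illustrative assumptions, not Scuba's schema.
from dataclasses import dataclass, field


@dataclass
class Property:
    name: str
    kind: str  # "column", "event", or "actor"
    depends_on: list["Property"] = field(default_factory=list)


def uses_only_event_data(prop: Property) -> bool:
    """True if prop is a column, or an event property built only from event data."""
    if prop.kind == "column":
        return True
    if prop.kind != "event":
        return False  # e.g. an actor property aggregated per user
    # Assumption: every dependency, direct or indirect, must be event-level.
    return all(uses_only_event_data(dep) for dep in prop.depends_on)


def is_valid_prefilter(prop: Property) -> bool:
    # A pre-filter must itself be an event property.
    return prop.kind == "event" and uses_only_event_data(prop)


item = Property("item", "column")
event_name = Property("name", "column")
has_item = Property("has_item", "event", [item])
purchase_confirmed = Property("purchase_confirmed", "actor", [event_name])
frequent_buyer_event = Property("frequent_buyer_event", "event", [purchase_confirmed])

print(is_valid_prefilter(has_item))              # True
print(is_valid_prefilter(purchase_confirmed))    # False
print(is_valid_prefilter(frequent_buyer_event))  # False
```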

About default pre-filters and board filters

You might notice a pre-filter on your query even before you add one. If so, this means that your admin has defined a default pre-filter. A default pre-filter runs on all queries, but not on boards or in the flow builder.

To remove a default pre-filter, click the trash icon next to it. To reapply a default pre-filter, click Clear all in the query builder.

A board can also supply a pre-filter to queries that are pinned to it. To remove a board filter from your query, open the query in Explore. See Save variants of a board with board filters for more information.

Keep in mind: when a pre-filter is applied, Scuba still initially scans all events. As a result, a pre-filter might not significantly improve performance, depending on your query definition.

Also note that a pre-filter can change your results. Because the query engine ignores every event that does not match your pre-filter, calculations that depend on those events (such as per-user counts) can return different results than you intended.

What’s the difference between a pre-filter and a measure filter?

Let’s say you want to create a table that counts events with at least one item, split by the number of confirmed purchases per user. You can build this with the filter IS_NOT_EMPTY(item) and the actor property purchase_confirmed (the count of purchase events per user). You can put the IS_NOT_EMPTY(item) filter either in your “count events” measure or in the pre-filter. Both queries count events, filter on IS_NOT_EMPTY(item), and split by purchase_confirmed, but each returns different results.

Why?

When you use a pre-filter, it’s as if we completely discard every event that doesn’t match IS_NOT_EMPTY(item). When we then count purchase_confirmed events per user, there are none left: purchase_confirmed events have no value for item, so the pre-filter already discarded them.

On the other hand, when we put the filter in the measure, we’re not “discarding” events the way the pre-filter does. Instead, we look at events that have a value for item, find the associated user, ask how many purchase_confirmed events that user has, and split by that number. So even though purchase_confirmed events don’t pass the item filter, they’re still available for aggregation.

Here’s how the query engine would proceed using each method of filtering:

Using a pre-filter

1. Remove all events without a value for item.
2. Count how many purchase_confirmed events each user has in the filtered dataset. Since no purchase_confirmed events have a value for item, and we removed all the events without an item value, this count is 0 for every user.
3. Split the data by the unique values produced in step 2. Since the value is 0 for every user, there is only one row (of 0) in the returned table chart.
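The three steps above can be mimicked on a toy dataset. This is a minimal sketch, not Scuba's query engine: the events, the `add_to_cart` event name, and the helper functions are hypothetical, and, as in the example, purchase_confirmed events never carry an item.

```python
# Toy model of the pre-filter version of the query. The events, the
# add_to_cart event name, and the helper names are hypothetical.
from collections import Counter

events = [
    {"user": "alice", "name": "add_to_cart", "item": "shoes"},
    {"user": "alice", "name": "add_to_cart", "item": "hat"},
    {"user": "alice", "name": "purchase_confirmed", "item": None},
    {"user": "bob", "name": "add_to_cart", "item": "socks"},
    {"user": "bob", "name": "purchase_confirmed", "item": None},
    {"user": "bob", "name": "purchase_confirmed", "item": None},
]


def is_not_empty_item(event: dict) -> bool:
    return event.get("item") not in (None, "")


def prefilter_query(events: list[dict]) -> dict[int, int]:
    # Step 1: drop every event without an item; later steps never see them.
    kept = [e for e in events if is_not_empty_item(e)]
    # Step 2: count purchase_confirmed events per user in the filtered data.
    purchases = Counter(e["user"] for e in kept if e["name"] == "purchase_confirmed")
    # Step 3: count events, split by each event's user's purchase count
    # (a Counter returns 0 for users it has never seen).
    return dict(Counter(purchases[e["user"]] for e in kept))


print(prefilter_query(events))  # {0: 3}
```

All three remaining events land in the single split value 0, matching the one-row table described above.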

Using a measure filter

1. Look at every event that has a value for item.
2. Count how many purchase_confirmed events each associated user has. Since the dataset itself isn’t filtered, all of the purchase_confirmed events are still countable, so users who made purchases get nonzero counts.
3. Split the data by the unique values produced in step 2. Since each user with purchase_confirmed events also has item events, we would see multiple rows in the returned table chart, one for each distinct purchase count.
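The same toy dataset run through the measure-filter steps shows the contrast. Again a minimal sketch with hypothetical events and helper names, not Scuba's implementation: the only change is that the per-user purchase count is computed over the full dataset, and the item filter applies only to the events being counted.

```python
# Toy model of the measure-filter version of the query. The events, the
# add_to_cart event name, and the helper names are hypothetical.
from collections import Counter

events = [
    {"user": "alice", "name": "add_to_cart", "item": "shoes"},
    {"user": "alice", "name": "add_to_cart", "item": "hat"},
    {"user": "alice", "name": "purchase_confirmed", "item": None},
    {"user": "bob", "name": "add_to_cart", "item": "socks"},
    {"user": "bob", "name": "purchase_confirmed", "item": None},
    {"user": "bob", "name": "purchase_confirmed", "item": None},
]


def is_not_empty_item(event: dict) -> bool:
    return event.get("item") not in (None, "")


def measure_filter_query(events: list[dict]) -> dict[int, int]:
    # Step 2 runs on the full, unfiltered dataset, so every
    # purchase_confirmed event is still countable.
    purchases = Counter(e["user"] for e in events if e["name"] == "purchase_confirmed")
    # Steps 1 and 3: count only events that have an item, split by
    # the associated user's purchase count.
    return dict(Counter(purchases[e["user"]] for e in events if is_not_empty_item(e)))


print(measure_filter_query(events))  # {1: 2, 2: 1}
```

Alice (one purchase, two item events) and Bob (two purchases, one item event) fall into different split values, so the table has two rows instead of one.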

The crucial difference is that with the filter in the measure, all events are still available to aggregate and query. With a pre-filter, it’s as if those events don’t exist at all.