You are viewing the documentation for Interana version 2. For documentation on the most recent version of Interana, go to



How it Works

Christina Noren
This applies to v2.25

Interana is full-stack behavioral analytics software. Behavioral analytics allows users to explore the activity of any digital service. Full stack means that Interana includes both its own web-based visual interface and a highly scalable distributed back-end database to store the data and process queries. Interana is deployed as Managed Edition, where Interana deploys and manages a dedicated Interana cluster on the customer's behalf in the customer's own AWS or Azure environment.

Interana is designed for interactive, exploratory, daily use by many users in roles across all kinds of digital businesses. These range from “non-technical” roles like product and growth, to content, support and sales, in addition to the engineers and data scientists you might expect. The Visual Explorer encourages rapid iteration with point-and-click query building and interactive visualizations - no SQL or other query syntax to learn. Living Dashboards allow any user to go from any dashboard chart to explore its underlying data - where they can change parameters and drill down to explore and understand what’s behind the summary. In most organizations using Interana, users that were expected to be passive consumers of dashboards have become active ad hoc explorers using dashboards as jumping-off points.

Interana’s unparalleled speed in processing queries against raw event data at massive scale enables users to iterate rapidly. Interana typically returns results in seconds against trillions of rows of raw data. All calculations are performed at query time. Interana does not rely on pre-calculated summaries for anything. Any pre-calculation would limit flexibility and create the dead ends common to virtually any other self-service analytics product. Pre-calculation also places a burden on data engineers, who must maintain models of the data and service one-off requests to support new analyses.

Why does this matter? With Interana, you can conceive of a new flow that may be occurring and immediately identify which users completed that flow, and ask questions about how those users’ activity differs from other users. With any other product, you would have to wait for someone with data modeling privileges to model a new flow and likely have to wait for that flow to be processed against historical data before running new queries about users who completed that flow.

Interana was built for behavior. Any kind of behavior. Interana allows not only for simple “slice-and-dice” type summaries of raw events (which pretty much any reporting interface would give you), but also true behavioral analytics - questions about the sequences of events, or journeys, that actors take through digital services. Interana is also designed to be flexible about what kinds of actors and what kinds of digital services it can support. There aren’t assumptions built into it that are specific to consumer-facing mobile or web services. (Although it is certainly frequently applied to those types of services.)

Interana’s behavioral analysis is based on several types of behavioral “building blocks.” These are not pre-computed but are user definable and re-usable - and can build on one another. Metrics calculate summary statistics on any values in sets of events for filters, time periods, actors’ journeys, sessions and more. Sessions divide and filter actor journeys into sub-series of events. Interana then computes statistics on session counts, durations and numbers of events. Funnels specify steps to match actor journeys against any expected flow and calculate statistics on how many actors completed each step. Funnels can also be used to find paths to, from and between particular steps. Cohorts segment users based on their behavior and attributes within a time period, enabling both behavioral and demographic segmentation.
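To make the cohort building block concrete, here is a minimal Python sketch of behavioral segmentation within a time period. It is an illustration only and not Interana's implementation: the function name `cohort`, the field names `user`, `ts` and `action`, and the "at least N matching events" rule are all assumptions made for this example.

```python
def cohort(events, predicate, start, end, min_events=1, actor="user", ts="ts"):
    """Return the set of actors with at least `min_events` events that
    satisfy `predicate` inside the half-open time window [start, end).

    Hypothetical helper: field names and the threshold rule are assumptions.
    """
    counts = {}
    for event in events:
        if start <= event[ts] < end and predicate(event):
            counts[event[actor]] = counts.get(event[actor], 0) + 1
    return {a for a, n in counts.items() if n >= min_events}


def count_events(events, actors, actor="user"):
    """A simple metric: number of events that belong to the given actors."""
    return sum(1 for event in events if event[actor] in actors)


if __name__ == "__main__":
    sample = [
        {"user": "a", "ts": 1, "action": "purchase"},
        {"user": "a", "ts": 5, "action": "purchase"},
        {"user": "b", "ts": 2, "action": "purchase"},
        {"user": "c", "ts": 3, "action": "view"},
    ]
    repeat_buyers = cohort(
        sample, lambda e: e["action"] == "purchase", 0, 100, min_events=2
    )
    print(repeat_buyers, count_events(sample, repeat_buyers))
```

Because the cohort is just a set of actors computed from raw events, it can be recomputed with a different predicate or time window without any remodeling, which is the property the paragraph above describes.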

Interana analyzes the behavior of actors of any kind, not just “users.” Actors on digital services may be people who access your service directly - who may be more than one kind of user. For example, Interana easily analyzes the behavior of users on different sides of two-sided marketplaces - like riders and drivers in a ridesharing service. Actors can also be real or virtual things - like devices, topics, accounts or “bots” that can also be thought to “behave” and about whom you may have the same kinds of journey, metrics, funnel and behavioral segmentation questions that you have about people. You have the ability to define multiple actors in the same data so that you can look at behavior from different points of view.

Interana ingests event data as soon as it sees it. Generally administrators tell it to watch S3 or other cloud file locations and say which Interana dataset those locations should populate. It’s typical to make a location that Interana is watching a data sink for a modern data pipeline. (Soon it will have the ability to listen on an HTTP port for data sent by other systems or by client side code. Interana ninjas often set up additional clever ways for it to see new data.) Interana likes to eat JSON and needs at least two fields identified - a timestamp and at least one actor to use as a “shard key.” Interana is just about as real time as your data pipelines allow - data is often queryable within seconds of arrival.
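The two required fields can be checked before events are written to a watched location. The sketch below is a hypothetical pre-flight check, not part of Interana: the function name `check_event` and the field names `ts` and `user_id` are assumptions chosen for illustration.

```python
import json


def check_event(line, time_field="ts", actor_fields=("user_id",)):
    """Parse one JSON event and confirm it has a timestamp field and at
    least one actor field that could serve as a shard key.

    Hypothetical helper: the field names are assumptions for this example.
    """
    event = json.loads(line)
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")
    if time_field not in event:
        raise ValueError(f"missing timestamp field {time_field!r}")
    if not any(field in event for field in actor_fields):
        raise ValueError(f"missing actor field, expected one of {actor_fields}")
    return event


if __name__ == "__main__":
    good = '{"ts": 1500000000, "user_id": "u42", "action": "checkout"}'
    print(check_event(good)["action"])
```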

Interana stores raw event data in named datasets with a wide, flat schema. Interana automatically creates the schema for its datasets on the fields it sees in the JSON you send it. If new fields show up, it will just add them. It automatically decides whether the fields are integers, strings, or arrays of integers or strings - which determines the kinds of filtering and metrics you can do using those fields. If you want to analyze actor journeys across different data sources, just send them all to the same dataset and ensure that there are common timestamp and actor fields (shard keys).
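The schema behavior described above can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than Interana's actual logic: the type labels (`int`, `string`, `int_array`, `string_array`, `other`) and the rule that the first observed value fixes a column's type are invented for this example.

```python
def infer_type(value):
    """Classify a JSON value into one of the four column types named in the
    text, or "other" for anything else. Labels are assumptions."""
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return "other"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list) and value:
        if all(isinstance(v, int) and not isinstance(v, bool) for v in value):
            return "int_array"
        if all(isinstance(v, str) for v in value):
            return "string_array"
    return "other"


def infer_schema(events):
    """Build a wide, flat schema: every field seen in any event becomes a
    column, and fields that appear later are simply added."""
    schema = {}
    for event in events:
        for field, value in event.items():
            schema.setdefault(field, infer_type(value))
    return schema


if __name__ == "__main__":
    print(infer_schema([
        {"ts": 1, "user": "a"},
        {"ts": 2, "user": "b", "tags": ["new", "mobile"]},
    ]))
```

Events from different sources can share one schema this way as long as they agree on the timestamp and actor fields. Columns that only some sources populate are simply sparse.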

You can have multiple datasets, but that’s to segregate completely independent data - we don’t support or need any joins or unions. Under the hood Interana uses a columnar storage engine - so there’s no performance penalty for having hundreds of sparse columns.

Lookups and derived columns make additional columns available at query time. Admins can join other external datasources to a dataset on any actor (or shard key) as lookups. This can be used to bring in metadata such as user account information or SKU details into events. Derived columns may also be defined to output additional columns based on code. At query time, lookup values and derived columns become available as additional columns for analysis as if they were additional fields in every event. This is effectively denormalizing, or flattening, the metadata.
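As a rough illustration of this flattening, the sketch below attaches lookup values and a derived column to each event when a query reads it. The function names `apply_lookup` and `apply_derived` and all field names are hypothetical. Interana's own lookup and derived-column definitions are not shown here.

```python
def apply_lookup(events, lookup, key):
    """Return new events with lookup metadata merged in, matched on `key`.
    Events whose key has no entry in the lookup are returned unchanged."""
    return [{**event, **lookup.get(event.get(key), {})} for event in events]


def apply_derived(events, name, fn):
    """Return new events with an extra column `name` computed by `fn`."""
    return [{**event, name: fn(event)} for event in events]


if __name__ == "__main__":
    events = [
        {"user": "a", "price_cents": 250, "qty": 2},
        {"user": "b", "price_cents": 100, "qty": 1},
    ]
    accounts = {"a": {"plan": "pro"}}  # external metadata keyed by actor
    rows = apply_lookup(events, accounts, "user")
    rows = apply_derived(rows, "total_cents", lambda e: e["price_cents"] * e["qty"])
    print(rows)
```

The stored events are never modified. The extra columns exist only in the rows produced for the query, which mirrors the query-time behavior described above.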

User-defined named expressions like sessions also make new query-time only columns. This is possibly the most surprising thing for new users, but is immensely powerful. For example, if you have a session defined for a dataset, Interana makes a session id, session duration and event count available specific to that session definition. Each event gets the session id, duration and event count for the entire session of which it is part. You can then operate on the id, duration and event count that are determined at query time just as if they were fields or columns in your original events.

So, for example, you can find all checkout actions that were part of a session lasting less than 5 minutes. Or you can count sessions for an actor by counting unique session ids for events associated with them. Effectively, named expressions create additional state and context for individual events based on related or surrounding events.
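The two examples above can be sketched in Python. This is an illustration only: it assumes a session is a run of one actor's events separated by no more than a fixed inactivity gap (the text does not say how sessions are defined), and the column names `session_id`, `session_duration` and `session_event_count`, like the event field names, are assumptions.

```python
from collections import defaultdict


def add_session_columns(events, actor="user", ts="ts", gap=1800):
    """Return new events carrying session id, duration and event count.

    Assumed rule: a new session starts when an actor has been inactive for
    more than `gap` seconds. Timestamps are plain numbers of seconds.
    """
    by_actor = defaultdict(list)
    for event in events:
        by_actor[event[actor]].append(event)

    out = []
    for actor_id, actor_events in by_actor.items():
        actor_events.sort(key=lambda e: e[ts])
        sessions = []
        for event in actor_events:
            if sessions and event[ts] - sessions[-1][-1][ts] <= gap:
                sessions[-1].append(event)
            else:
                sessions.append([event])
        for number, session in enumerate(sessions):
            duration = session[-1][ts] - session[0][ts]
            for event in session:
                out.append({
                    **event,
                    "session_id": f"{actor_id}:{number}",
                    "session_duration": duration,
                    "session_event_count": len(session),
                })
    return out


def short_session_checkouts(rows, max_seconds=300):
    """Checkout events that were part of a session shorter than `max_seconds`."""
    return [r for r in rows
            if r["action"] == "checkout" and r["session_duration"] < max_seconds]


def session_count(rows, actor_id, actor="user"):
    """Count an actor's sessions by counting unique session ids."""
    return len({r["session_id"] for r in rows if r[actor] == actor_id})
```

Every event carries the statistics of its whole session, so filtering on `session_duration` works exactly like filtering on a field that was present in the original event.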

Metrics, funnels and cohorts similarly add new query-time only columns. For example, you can find the events that matched step 2 of a given funnel by querying on a current state column. You can also find the step 2 events of funnel instances that did not proceed to step 3 by querying on both the current and terminal state columns.
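A simplified Python sketch of the funnel case follows. It assumes one funnel instance per actor, steps matched strictly in order by an `action` field, and a state of 0 for events that did not advance the funnel. The column names `current_state` and `terminal_state` follow the wording above, but their exact form in Interana is an assumption.

```python
def add_funnel_columns(events, steps, actor="user", ts="ts", action="action"):
    """Return new events tagged with funnel state columns.

    current_state:  the step number this event completed (0 if none).
    terminal_state: the last step the actor's funnel instance reached.
    Simplification: a single funnel instance per actor.
    """
    by_actor = {}
    for event in sorted(events, key=lambda e: e[ts]):
        by_actor.setdefault(event[actor], []).append(event)

    out = []
    for actor_events in by_actor.values():
        reached = 0
        tagged = []
        for event in actor_events:
            state = 0
            if reached < len(steps) and event[action] == steps[reached]:
                reached += 1
                state = reached
            tagged.append({**event, "current_state": state})
        # The terminal state is only known after the whole journey is read.
        out.extend({**event, "terminal_state": reached} for event in tagged)
    return out


def stalled_at(rows, step):
    """Events that completed `step` in a funnel instance that went no further."""
    return [r for r in rows
            if r["current_state"] == step and r["terminal_state"] == step]
```

With steps `["view", "cart", "checkout"]`, `stalled_at(rows, 2)` returns the `cart` events of actors who never checked out.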

This is happening via lots of fast passes through the same raw data. Each pass builds up more state and summarization for every event. Interana does all of this in seconds, even on very large datasets. Because it is done at query time, lots of different views of sessions, funnels, cohorts and metrics can be defined by different users to apply to the same dataset and only the ones referenced in a given query are processed.

Sound a little like voodoo? Don’t worry. Start with the Tourist Guide, proceed onto the Explorer Guide, and you’ll be an expert Interanian in no time!
