This how-to will illustrate when to use a colocated column, and how to create one in Interana.
When to use a colocated column
Colocated columns are an alternative to creating new table copies when adding shard keys. When you want to establish a new shard key, you usually need to create a new table copy. Unfortunately as the name implies, each new table copy creates a complete copy of the underlying dataset, linearly increasing the storage required in your cluster. Colocated columns allow you to avoid creating new table copies when adding shard keys, if the shard key fulfills certain criteria.
If the shard key you want to add has a one-to-one or many-to-one relationship with an existing shard key, you can create a colocated column instead of creating a new table copy. The important point is that each instance of the new shard key must map to exactly one instance of an existing shard key.
Here's an example:
Suppose we have a dataset with shard key user_id that also contains column session_id, a unique token generated when each user logs into the site. If we wanted to study the properties of sessions, would we need a separate table copy sharded by session_id?
No additional table copy is needed if all events for a given session_id can be found within the same user_id shard. When that's the case, each session_id value uniquely identifies a corresponding user_id value. You can verify this relationship within Interana, by running a query selecting Count Unique of user_id as the Measure, and session_id as the Compare.
Creating a colocated column
To set up a colocated column in Interana, you must update the configDB with information about the related shard keys. The following sample commands will continue referring to our example dataset with user_id and session_id as colocated columns.
The first step is to find the column_id for both the existing shard key column, and the colocated column:
select column_id from table_columns where name="session_id"; # result SESSION_COLUMN_ID
select column_id from table_columns where name="user_id"; # result USER_COLUMN_ID
Then you must set the colocated_with_column_id field of the colocated column database entry to the column_id of the existing shard key:
update table_columns set colocated_with_column_id=USER_COLUMN_ID where column_id=SESSION_COLUMN_ID limit 1;
Once this configuration is set, the session_id column will function as a shard key in the UI, enabling all types of behavioral analysis on session_id — without requiring another table copy.
Determining whether a colocated column is appropriate requires understanding your data and how attributes of events relate to each other. But mastering the science of when and how to use colocated columns has the potential to provide increased flexibility, unlock new insights, and reduce resource usage in Interana.