What is Event Data?
By definition, event data is data from “Any identifiable occurrence that has significance for system hardware or software. User-generated events include keystrokes and mouse clicks, among a wide variety of other possibilities.” Events describe an action performed by or associated with an entity at a certain time. Event data is a continuous stream of actions that reveals patterns in what people, products, and machines do over time. It helps describe when and how things happen. Event data is the foundation for behavioral analytics, enabling an understanding of how customers behave and how products are used.
Event data is simply any data point that has a timestamp, entity, and attributes of an action. As simple as that sounds, events are at the heart of many businesses. Clickstreams, logs, data from IoT devices, sensor data, and more are all event data. A mouse click is an event; it happens at a point in time and its context includes attributes such as where the entity clicked and what was clicked.
Analysis of event data is based on key concepts about chronologically ordered data and its relationship to the world. For example, event data is generated by an entity who follows a path through a conversion flow, taking action at certain points along the way. If we examine the events of all entities that went through the conversion flow, we can understand their behavior and start to answer questions such as:
- What are the characteristics of entities that converted or dropped off? 
- Which entities took longer to convert, and why?
- What happened between each step of the conversion flow? 
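The questions above can be approached with a small funnel computation. The following is a minimal sketch, not a definitive method: the funnel steps, the event records, and the `funnel_progress` helper are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical funnel steps and events, invented for illustration.
FUNNEL = ["product_tour", "request_demo", "signup"]

events = [
    {"timestamp": "2015-06-01T10:00:00", "entity": "a", "page": "product_tour"},
    {"timestamp": "2015-06-01T10:05:00", "entity": "a", "page": "request_demo"},
    {"timestamp": "2015-06-01T10:30:00", "entity": "a", "page": "signup"},
    {"timestamp": "2015-06-01T11:00:00", "entity": "b", "page": "product_tour"},
    {"timestamp": "2015-06-01T11:02:00", "entity": "b", "page": "request_demo"},
    {"timestamp": "2015-06-02T09:00:00", "entity": "c", "page": "product_tour"},
]


def funnel_progress(events, funnel):
    """Return {entity: [(step, time), ...]} for funnel steps reached in order."""
    by_entity = defaultdict(list)
    # ISO 8601 strings in one format and time zone sort chronologically.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        by_entity[event["entity"]].append(event)

    progress = {}
    for entity, entity_events in by_entity.items():
        reached = []
        for event in entity_events:
            # Only count an event if it is the next expected funnel step.
            if len(reached) < len(funnel) and event["page"] == funnel[len(reached)]:
                reached.append((event["page"], datetime.fromisoformat(event["timestamp"])))
        progress[entity] = reached
    return progress


progress = funnel_progress(events, FUNNEL)

# Entities that completed every step.
converted = sorted(e for e, steps in progress.items() if len(steps) == len(FUNNEL))
# Last step reached by entities that started but did not finish.
dropped_at = {e: steps[-1][0] for e, steps in progress.items() if 0 < len(steps) < len(FUNNEL)}
# Elapsed time from first to last funnel step for converted entities.
time_to_convert = {e: progress[e][-1][1] - progress[e][0][1] for e in converted}
```

Because each event keeps its own timestamp, the same records answer who converted, where the others dropped off, and how long conversion took.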
What does event data look like?
Each piece of event data has three key pieces of information: a timestamp, one or more entities, and attributes.
- Timestamp: Just like it sounds, it records at what point in time the action took place. 
- Entity: Who took the action. This could be a person, machine, sensor, etc. 
- Attributes: These are inherent characteristics that describe what happened, like a click or a call. The more attributes captured here, the richer the data.
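One way to picture these three parts is as a small record type. This is only a sketch: the `Event` class and its field names are assumptions made for illustration, not a format prescribed by any particular event store.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict


@dataclass(frozen=True)
class Event:
    """Hypothetical event record with the three parts described above."""
    timestamp: datetime                 # when the action took place
    entity: str                         # who or what took the action
    attributes: Dict[str, Any] = field(default_factory=dict)  # what happened


# Illustrative values only.
click = Event(
    timestamp=datetime(2015, 6, 30, 13, 50),
    entity="05632",
    attributes={"type": "click", "page": "request_demo"},
)
```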
Here is a simple example of an event captured on a website in JSON:
{
  "timestamp": "2015-06-30T13:50:00-0600",
  "id": "05632",
  "attributes": {
    "type": "click",
    "page": "request_demo",
    "previous_page": "product_tour",
    "session_length": "1060",
    "browser": "chrome",
    "ip_address": "10.0.0.1",
    "ip_region": "united states",
    "ip_state": "california",
    "ip_city": "san francisco"
  }
}
What Makes Event Data Different?
Event Data is Attribute-Rich
Event data can have hundreds of attributes that describe each event. Because we use event data to discover behavior patterns, we want the full context for every event. Every attribute we store is context we can analyze; this is what makes event data rich. For a hypothetical “Shopper D” on an e-commerce site, we can store attributes like first and last names, birth date, gender, favorite color, home town, and preferred payment method. Then we could define a cohort of shoppers who are over 50 and whose home town is New York, and follow their behavior over time. Another reason events can have hundreds of attributes is that they may describe not just one entity, but multiple entities involved in a single event. The attributes of each entity become part of the event data. For every transaction on an e-commerce site there may be a supplier, a vendor, a shopper, and a third-party payer (credit card company, PayPal), any of whom may participate in a given event during the transaction.
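The cohort described above (shoppers over 50 whose home town is New York) could be selected along these lines. This is a minimal sketch; the event records, the attribute names `birth_date` and `home_town`, and the helper functions are hypothetical.

```python
from datetime import date

# Hypothetical events; each one carries the shopper's attributes.
events = [
    {"timestamp": "2015-06-01T09:00:00", "entity": "shopper_a", "type": "purchase",
     "birth_date": date(1960, 3, 14), "home_town": "New York"},
    {"timestamp": "2015-06-01T09:30:00", "entity": "shopper_b", "type": "click",
     "birth_date": date(1990, 7, 2), "home_town": "New York"},
    {"timestamp": "2015-06-02T14:00:00", "entity": "shopper_c", "type": "click",
     "birth_date": date(1955, 11, 30), "home_town": "Boston"},
    {"timestamp": "2015-06-03T08:15:00", "entity": "shopper_a", "type": "click",
     "birth_date": date(1960, 3, 14), "home_town": "New York"},
]


def age_on(born, day):
    """Age in whole years on the given day."""
    return day.year - born.year - ((day.month, day.day) < (born.month, born.day))


def in_cohort(event):
    """True if the shopper was over 50 at event time and is from New York."""
    event_day = date.fromisoformat(event["timestamp"][:10])
    return age_on(event["birth_date"], event_day) > 50 and event["home_town"] == "New York"


# The cohort's behavior over time: its events in chronological order.
cohort_events = sorted((e for e in events if in_cohort(e)), key=lambda e: e["timestamp"])
```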
Event Data is Massive
For most companies, event data is their fastest-growing type of data. But why is it so big? Event data captures the actions that an entity takes over time, so a single entity could account for tens of thousands of actions. Imagine a popular wearables company with hundreds of thousands of devices in the market. Each wearable device could generate thousands of rows of event data daily, quickly adding up to billions of events in a short time.
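A back-of-the-envelope calculation shows how quickly this adds up. The two input figures below are assumptions chosen only to match "hundreds of thousands" of devices and "thousands" of rows per day; they are not measurements.

```python
# Assumed figures for illustration only.
devices = 200_000
events_per_device_per_day = 2_000

events_per_day = devices * events_per_device_per_day

# Ceiling division: whole days needed to pass one billion events.
days_to_one_billion = -(-1_000_000_000 // events_per_day)
```

Under these assumptions the fleet produces 400 million events per day and passes one billion on the third day.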
Event Data is Denormalized
In an event data store, data is structured but never normalized. This is unlike a relational database, in which redundant data is normalized and referenced from a single location in a single table. In a relational database, every time a value changes, the previous value is overwritten and only the last update is available. But when we analyze event data, we want to know the state of the world at the moment of the event. For example, imagine storing data from an anemometer, which measures windspeed. The meter takes a reading every 30 seconds, and in a relational design the windspeed value is updated in place in the weather database. In this case, we will always know how fast the wind was blowing at the most recent reading, but we will never know how the windspeed has changed over the last hour. This is why, in an event data store, data is always appended and never updated. Every “windspeed” event is stored permanently. For a weather station that measures not just windspeed but also temperature, humidity, barometric pressure, and precipitation, every attribute is stored for every sensor reading. Only when event data is denormalized can we use it to find patterns and gain insight into change over time.
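The difference between updating in place and appending can be sketched with the anemometer example. The readings, the sensor name `anemometer_1`, and the units are hypothetical.

```python
# Hypothetical readings: (timestamp, windspeed in km/h), one every 30 seconds.
readings = [
    ("2015-06-01T12:00:00", 12.0),
    ("2015-06-01T12:00:30", 15.5),
    ("2015-06-01T12:01:00", 9.0),
]

current_state = {}  # update in place: one row per sensor, overwritten each time
event_log = []      # append only: one row per reading, never modified

for timestamp, windspeed in readings:
    current_state["anemometer_1"] = {"timestamp": timestamp, "windspeed": windspeed}
    event_log.append({"timestamp": timestamp, "entity": "anemometer_1", "windspeed": windspeed})

# The overwritten row only knows the latest value.
latest = current_state["anemometer_1"]["windspeed"]

# The append-only log keeps the full history, so change over time is recoverable.
history = [e["windspeed"] for e in event_log]
change = history[-1] - history[0]
```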
Event Data can be Schema-less
Different types of events, and even individual events of the same type, may have different numbers of attributes. In other words, the data does not necessarily follow a particular schema. Since event data may be schema-less or adhere only loosely to a schema, an event data store does not require a declared schema and accepts any number of attributes per event. A time attribute and an entity attribute are required for each event; any other attributes can be arbitrary. For example, while a group is running, their activity trackers could record four attributes: distance, stride length, heart rate, and speed. But when they start to walk, their activity trackers may capture only two attributes: heart rate and stride length.
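A store with this behavior only needs to check for the two required attributes and can pass everything else through. The sketch below assumes the required keys are named `timestamp` and `entity`; the `accept` function and the tracker values are hypothetical.

```python
# Assumed names for the two required attributes.
REQUIRED = {"timestamp", "entity"}


def accept(event):
    """Accept any event that has the required keys; other keys are arbitrary."""
    missing = REQUIRED - set(event)
    if missing:
        raise ValueError(f"missing required attributes: {sorted(missing)}")
    return event


# Running: four activity attributes in addition to the required two.
running = accept({"timestamp": "2015-06-01T07:00:00", "entity": "tracker_1",
                  "distance": 1.2, "stride_length": 1.1, "heart_rate": 150, "speed": 10.5})

# Walking: only two activity attributes, accepted by the same store.
walking = accept({"timestamp": "2015-06-01T07:20:00", "entity": "tracker_1",
                  "heart_rate": 105, "stride_length": 0.7})
```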
Event Data is Connected by Time
Event data has a native concept of time and illustrates the connections between related events in a specified time period. This makes it easy to combine multiple data streams, because they all have time in common. For example, three separate data streams from mobile logs, web logs, and purchase history have time as a common reference and can thus be merged into a single source for even richer insights.
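Merging streams on their shared time reference can be sketched with the standard library's `heapq.merge`. The three streams and their contents are hypothetical, and the sketch assumes each stream is already sorted and that all timestamps use the same ISO 8601 format and time zone, so string comparison matches chronological order.

```python
import heapq

# Hypothetical streams, each already in chronological order.
mobile_log = [
    {"timestamp": "2015-06-01T09:00:00", "entity": "user_1", "source": "mobile", "type": "app_open"},
    {"timestamp": "2015-06-01T09:10:00", "entity": "user_1", "source": "mobile", "type": "search"},
]
web_log = [
    {"timestamp": "2015-06-01T09:05:00", "entity": "user_1", "source": "web", "type": "page_view"},
]
purchases = [
    {"timestamp": "2015-06-01T09:12:00", "entity": "user_1", "source": "purchases", "type": "order"},
]

# heapq.merge lazily interleaves sorted inputs by the given key.
timeline = list(heapq.merge(mobile_log, web_log, purchases, key=lambda e: e["timestamp"]))

sources = [e["source"] for e in timeline]
```

The merged timeline interleaves the mobile, web, and purchase events into one ordered sequence per the timestamps alone; no other shared key is needed.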