Multi-node cluster
A multi-node cluster consists of a set of connected systems (nodes) that work together, and in many ways can be viewed as a single system. The nodes of a cluster are usually connected through local area networks, with each node running its own instance of the same operating system.
A Scuba multi-node cluster consists of the following nodes that can each be deployed on a separate device, or grouped (as stacked services) on two or more devices:
config node — Node from which you administer the cluster. The MySQL database (DB) is installed only on this node, where it stores Scuba metadata. Configure this node first.
api node — Serves the Scuba application, merges query results from the data and string nodes, and then presents those results to the user. Nginx is installed only on the api node.
ingest node — Connects to data repositories (S3, Azure, local file system), downloads new files, processes the data, and then sends it to the data and string tiers, as appropriate.
data node — Data storage. Must have enough space to accommodate all events and to stream simultaneous query results.
string node — String storage for the active strings in the dataset, stored in compressed format. Requires sufficient memory to hold the working set of strings accessed during queries.
listener node — Streams live data from the web or cloud (also known as streaming ingest). This node is optional during installation.
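When planning which roles to stack on which devices, it can help to write the layout down and check it against the role list above. The following is a minimal sketch of such a check, not part of Scuba or its installer: the `Host` class, the `validate_layout` function, and the host names are hypothetical, and the single-config-node rule is an assumption inferred from MySQL being installed only on that node.

```python
from dataclasses import dataclass, field

# Role names come from the node list above; only the listener is optional.
REQUIRED_ROLES = {"config", "api", "ingest", "data", "string"}
OPTIONAL_ROLES = {"listener"}


@dataclass
class Host:
    """A hypothetical device and the node roles stacked on it."""
    name: str
    roles: set = field(default_factory=set)


def validate_layout(hosts):
    """Return a list of problems in a proposed layout (empty if none found)."""
    problems = []
    if len(hosts) < 2:
        problems.append("a multi-node layout needs at least two devices")

    # Map each role to the hosts that carry it.
    assigned = {}
    for host in hosts:
        for role in host.roles:
            assigned.setdefault(role, []).append(host.name)

    known = REQUIRED_ROLES | OPTIONAL_ROLES
    for role in sorted(set(assigned) - known):
        problems.append(f"unknown role: {role}")
    for role in sorted(REQUIRED_ROLES - set(assigned)):
        problems.append(f"missing required role: {role}")

    # Assumption for this sketch: exactly one config node per cluster.
    if len(assigned.get("config", [])) > 1:
        problems.append("more than one config node")
    return problems


if __name__ == "__main__":
    # Two devices with stacked services; host names are placeholders.
    layout = [
        Host("host-a", {"config", "api", "ingest"}),
        Host("host-b", {"data", "string"}),
    ]
    print(validate_layout(layout) or "layout looks complete")
```

The check covers only role coverage; it says nothing about sizing, such as the disk space a data node needs or the memory a string node needs.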