EraSearch is a log-management platform for storing, exploring, and managing large amounts of data. Use it to simplify your log-management setup, lower costs, and get the most out of your data.
This page describes EraSearch and ways to use it. If you want to start using EraSearch now, visit the getting started series.
What EraSearch is¶
EraSearch is a petabyte-scale platform for log management. With all EraSearch instances, you can:
- Store and manage lots of data with fewer resources.
- Use the REST API to interact with data.
- Explore data with an Elasticsearch-like language.
- Write and act on data with several integrations.
Store and manage lots of data with fewer resources¶
EraSearch stores data in cold storage and keeps a hot cache for faster queries. It separates storage and compute to help you:
- Use fewer resources.
- Handle more data.
- Lower your costs.
Use the REST API to interact with data¶
Connect and work with EraSearch using the EraSearch REST API. With the API, you can write and query data with HTTP requests using your preferred language or framework.
Here's an example of using the API with cURL to write data to an EraSearch instance on EraCloud:
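The same kind of write can be sketched in Python. This is a minimal illustration, not the documented request: it assumes an Elasticsearch-style bulk endpoint and a bearer-token API key, and the URL, index name, header format, and document fields are all hypothetical placeholders. Check the EraSearch API reference for the real values. The request is built but not sent.

```python
import json
import urllib.request

# All values below are hypothetical placeholders, not documented EraSearch values.
ERASEARCH_URL = "https://example.invalid"  # assumed instance URL
API_KEY = "YOUR_API_KEY"                   # assumed EraCloud API key
INDEX = "hiking-index"                     # assumed index name


def build_bulk_body(index, docs):
    """Encode documents as Elasticsearch-style bulk NDJSON (assumed format)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    return ("\n".join(lines) + "\n").encode("utf-8")


def build_write_request(docs):
    """Build, but do not send, an HTTP POST carrying the documents."""
    return urllib.request.Request(
        url=f"{ERASEARCH_URL}/_bulk",  # assumed endpoint path
        data=build_bulk_body(INDEX, docs),
        method="POST",
        headers={
            "Content-Type": "application/x-ndjson",
            "Authorization": f"Bearer {API_KEY}",  # assumed header format
        },
    )


request = build_write_request(
    [{"_line": "trail opened", "trail": "ridge", "elevation": 2}]
)
# To actually send it: urllib.request.urlopen(request)
```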
Explore data with an Elasticsearch-like language¶
EraSearch supports parts of Elasticsearch's Query Domain Specific Language (DSL) and several Elasticsearch aggregations. You can use the same syntax to specify queries and learn about your data.
Here's an example of a query to EraSearch using Elasticsearch's query-string syntax. It requests all data where the _line field contains a given word, the trail field exists, and the elevation field is greater than one.
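A query with those three conditions could be built as in the sketch below. The field operators (`field:value`, `_exists_:field`, `field:>n`, `AND`) are standard Elasticsearch query-string syntax. The search word `hiking` is a hypothetical stand-in, and wrapping the string in a `query_string` request body is an assumption about how the query is submitted.

```python
import json


def build_query_string(word):
    """Combine the three conditions with Elasticsearch query-string operators."""
    return f"_line:{word} AND _exists_:trail AND elevation:>1"


def build_search_body(word):
    """Wrap the query string in an Elasticsearch-style request body (assumed shape)."""
    return {"query": {"query_string": {"query": build_query_string(word)}}}


# "hiking" is a placeholder search word, not taken from the original page.
body = build_search_body("hiking")
print(json.dumps(body, indent=2))
```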
Write and act on data with several integrations¶
EraSearch integrates with several popular tools, helping you fit EraSearch into your existing setup. Some of those integrations include Grafana, Vector, Telegraf, Logstash, Fluentd, and Cloudflare.
Ways to use EraSearch¶
There are two ways to use EraSearch: EraSearch on EraCloud and self-hosted EraSearch.
EraSearch on EraCloud¶
This is Era Software’s SaaS product. It offers hosted EraSearch fully managed by Era Software. To start using EraSearch on EraCloud, visit the getting started series.
Self-hosted EraSearch¶
With self-hosted EraSearch, you run and manage EraSearch on your own cloud infrastructure. Reach out to us at Era Software to get started with self-hosted EraSearch.
EraCloud vs. self-hosted EraSearch¶
This table outlines some of the differences between EraCloud and self-hosted EraSearch:
| | EraSearch on EraCloud | Self-hosted EraSearch |
|---|---|---|
| Setup and installation | Era Software's sign-up UI | Self-install in your environment |
| Managing and hosting | Era Software on Amazon Web Services (AWS) | You, on your own cloud provider (AWS, Google Cloud Platform (GCP), or Microsoft Azure) |
| Authentication | API key | Basic auth |
| Authorization | Role-based access control (RBAC) for EraCloud-specific features | RBAC for database features |
| Data exploration and alerting | EraSearch UI or integrations | Integrations |
Era Software is working towards feature parity across EraCloud and self-hosted EraSearch. Here are some of the items we're focusing on:
- Cloud providers - In addition to AWS, future EraCloud versions will support GCP and Microsoft Azure.
- EraSearch UI - Future self-hosted EraSearch versions will support the EraSearch UI. Until then, you can use Grafana to visualize and interact with your self-hosted EraSearch data.
- RBAC - In addition to RBAC for EraCloud-specific features, future EraCloud versions will support RBAC for database features.
Here's a common workflow for EraSearch on EraCloud, and some documentation to get you started:
- Collecting real-time data with Vector
- Storing data in EraSearch on EraCloud
- Viewing and querying data in the EraSearch UI
- Alerting on data in the EraSearch UI
Here's a common workflow for self-hosted EraSearch, and some documentation to get you started:
- Collecting real-time data with Telegraf
- Storing data in self-hosted EraSearch
- Querying and visualizing data in Grafana
- Managing users and roles with EraSearch RBAC
How EraSearch works¶
This section describes EraSearch's database architecture. If you're a self-hosted user, this information helps you install and manage EraSearch. If you're an EraCloud user, you don't need this information to work with EraSearch on EraCloud, but feel free to continue reading if you want to learn more.
EraSearch is made up of internal services. With this service-based architecture, you can customize EraSearch to meet your needs by:
- Configuring, managing, and updating services without impacting other services.
- Adding resources to specific services without having to scale the whole database.
There are four internal EraSearch services, and every service has a specific role. The sections below list the services and what they do.
The API Service receives and handles all client requests, including writes and queries. You can run several API Services to scale EraSearch, increasing how many write and query requests you can make to the database.
The API Service delegates incoming queries to other services and then responds with a single combined result. The diagram section below goes into more detail about that workflow.
The Cache Service handles the in-database hot cache, including:
- Writing data to local storage.
- Compacting data to maximize query performance.
- Servicing query results from local storage.
You can run several Cache Services to scale your database.
The Coordinator Service generates and works with object IDs (OIDs). OIDs are unique identifiers that EraSearch uses to store and retrieve data.
The Coordinator Service stores OIDs in Redis.
The Storage Service works with object storage (for example, S3 in AWS). The service's main roles are to:
- Batch data for long-term storage.
- Help you manage object storage costs.
- Optimize object-storage communication with minimal networking overhead.
This diagram shows how EraSearch's services work together to form the database. It also outlines how writes and queries flow through the system. The sections below go into more detail about the write and query flows.
Here's what happens when you send a write request to EraSearch:
- Clients send a write request to the API Service. Write requests can have one or more documents.
- The API Service requests OIDs from the Coordinator Service. The Coordinator Service creates one OID for each document.
- The API Service sends the OIDs and documents to the Storage Service, which batches the data.
- The Storage Service sends the batched data to object storage for long-term storage.
- The Cache Service receives the OIDs and documents, and it compacts the data for future queries.
- The API Service sends a response to the client, acknowledging the write.
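The write steps above can be sketched as a toy in-memory model. This only illustrates the order of operations: the class and method names are hypothetical, OIDs come from a simple counter rather than Redis, batches are flushed immediately, and compaction is not modeled.

```python
import itertools


class CoordinatorService:
    """Hands out one unique OID per document (a counter stands in for Redis)."""

    def __init__(self):
        self._counter = itertools.count(1)

    def new_oids(self, n):
        return [next(self._counter) for _ in range(n)]


class StorageService:
    """Batches records, then ships each batch to a pretend object store."""

    def __init__(self):
        self.pending = []
        self.object_store = []

    def batch(self, records):
        self.pending.extend(records)

    def flush(self):
        if self.pending:
            self.object_store.append(list(self.pending))
            self.pending.clear()


class CacheService:
    """Keeps OID -> document in the hot cache."""

    def __init__(self):
        self.roots = {}

    def receive(self, records):
        self.roots.update(records)


class ApiService:
    def __init__(self, coordinator, storage, cache):
        self.coordinator, self.storage, self.cache = coordinator, storage, cache

    def write(self, docs):
        oids = self.coordinator.new_oids(len(docs))  # one OID per document
        records = list(zip(oids, docs))
        self.storage.batch(records)                  # batch for long-term storage
        self.storage.flush()                         # send the batch to object storage
        self.cache.receive(records)                  # make the data queryable
        return {"acknowledged": True, "oids": oids}  # acknowledge the write


storage, cache = StorageService(), CacheService()
api = ApiService(CoordinatorService(), storage, cache)
ack = api.write([{"trail": "ridge"}, {"trail": "lake"}])
```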
Here's what happens when you send a query to EraSearch:
- Clients send a query using Elasticsearch's query-string syntax to the API Service.
- The API Service computes the query results by:
    - Sending the query to all Cache Services.
    - Merging the results it gets from the Cache Services into one response.
- The API Service returns the query results to the client.
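The query flow is a scatter-gather pattern: fan the query out to every Cache Service, then merge the partial results. Here is a minimal sketch under stated assumptions; the `CacheNode` class, the predicate-based `search` method, and sorting by an `oid` field are illustrative choices, not EraSearch internals.

```python
from concurrent.futures import ThreadPoolExecutor


class CacheNode:
    """Hypothetical stand-in for one Cache Service holding some documents."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, predicate):
        return [doc for doc in self.docs if predicate(doc)]


def scatter_gather(cache_nodes, predicate):
    """Send the query to all nodes in parallel and merge into one result list."""
    with ThreadPoolExecutor(max_workers=max(1, len(cache_nodes))) as pool:
        partials = list(pool.map(lambda node: node.search(predicate), cache_nodes))
    merged = [doc for partial in partials for doc in partial]
    return sorted(merged, key=lambda doc: doc["oid"])  # deterministic order


nodes = [
    CacheNode([{"oid": 3, "elevation": 5}, {"oid": 1, "elevation": 0}]),
    CacheNode([{"oid": 2, "elevation": 9}]),
]
results = scatter_gather(nodes, lambda doc: doc["elevation"] > 1)
```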
This section introduces compactions and cache eviction. Knowing about these automatic processes can help you monitor and troubleshoot EraSearch. The content below is likely most useful for database administrators.
A compaction occurs when a Cache Service pod consolidates data (or roots) on disk. Compactions make your queries faster.
When you write data to EraSearch, the data is stored in a format optimized for write performance. That format ensures EraSearch can quickly index and store incoming data. However, this write-optimized format takes up more space on disk and reduces query performance. The compacted file formats are optimized for read performance and reduce the number of files, ensuring reads are as fast as possible.
Compactions have levels, which are defined as:
- Level 0 (or L0) is the initial write-optimized format containing fewer than 1,500 documents.
- Level 1 (or L1) is the first level of compaction. L1s are created by combining 1,500 to 150,000 documents.
- Level 2 (or L2) is the second and final level of compaction. L2s are created by combining more than 150,000 documents.
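The level thresholds above can be expressed as a small lookup. This is only a sketch of the document-count ranges: the function name is hypothetical, and treating exactly 1,500 and exactly 150,000 documents as L1 is an interpretation of "1,500 to 150,000".

```python
L1_MIN_DOCS = 1_500    # below this, data stays in the write-optimized L0 format
L1_MAX_DOCS = 150_000  # above this, data is compacted to L2


def compaction_level(doc_count):
    """Return the compaction level (0, 1, or 2) for a given document count."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    if doc_count < L1_MIN_DOCS:
        return 0
    if doc_count <= L1_MAX_DOCS:
        return 1
    return 2


for count in (200, 1_500, 150_000, 150_001):
    print(count, "-> L" + str(compaction_level(count)))
```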
EraSearch is designed so that failed compactions are safe and expected. When a compaction fails:
- Queries for the data in the failed compaction use a less-efficient data format, making them slower.
- Resource backpressure builds slightly, as the system will eventually retry the failed compaction.
Review the Metrics Reference for ways to identify when compactions are failing and when it might be a sign to take action.
Cache eviction is when a Cache Service pod discards data (or roots) older than a configured Time-To-Live (TTL) to keep the underlying database storage at a healthy level.
Note that cache eviction doesn't mean EraSearch removes the data from object storage as well. If you query data that's no longer in the cache, EraSearch moves the data back into the cache from object storage (a process referred to as rehydration).
Roughly once a minute, each Cache Service pod identifies cache data that is older than the TTL and asynchronously discards it. For example, if you configure EraSearch with a seven-day cache TTL, then EraSearch discards data older than seven days roughly once a minute.
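One eviction pass can be sketched as a TTL comparison against each root's timestamp. The data layout (a dict of root IDs to write times) and the function name are assumptions for illustration. In EraSearch the pass runs asynchronously about once a minute, while this sketch runs a single pass on demand.

```python
from datetime import datetime, timedelta, timezone


def eviction_pass(roots, ttl, now):
    """Split roots into (kept, evicted) based on a cache TTL.

    roots maps a root ID to the time it was written (hypothetical layout).
    """
    cutoff = now - ttl
    kept = {rid: ts for rid, ts in roots.items() if ts >= cutoff}
    evicted = {rid: ts for rid, ts in roots.items() if ts < cutoff}
    return kept, evicted


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
roots = {
    "root-a": now - timedelta(days=8),   # older than the TTL: evicted
    "root-b": now - timedelta(days=6),   # within the TTL: kept
    "root-c": now - timedelta(hours=1),  # within the TTL: kept
}
kept, evicted = eviction_pass(roots, ttl=timedelta(days=7), now=now)
```

Evicted roots are only dropped from the cache in this model; the copies in object storage are untouched, which is what makes rehydration possible.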
Review the Metrics Reference for ways to monitor cache eviction and disk utilization.
To learn more about getting data into EraSearch, visit the Era Software blog and these documents:
- Writing bulk data
- Writing data with Cloudflare
- Writing data with Fluentd
- Writing data with Fluent Bit
- Writing data with Logstash
- Writing data from Node.js
- Writing data with Telegraf
- Writing data with Vector
To learn more about exploring data in EraSearch, visit:
- Alerting with PagerDuty
- Alerting with Slack
- Connecting EraSearch to Grafana
- Exploring data with the EraSearch UI