EraSearch overview

EraSearch is a log-management platform for storing, exploring, and managing large amounts of data. Use it to simplify your log-management setup, lower costs, and get the most out of your data.

This page describes EraSearch and ways to use it. If you want to start using EraSearch now, visit the getting started series.

What EraSearch is

EraSearch is a petabyte-scale platform for log management. With all EraSearch instances, you can:

  • Store and manage lots of data with fewer resources.
  • Use the REST API to interact with data.
  • Explore data with an Elasticsearch-like language.
  • Write and act on data with several integrations.

Store and manage lots of data with fewer resources

EraSearch stores data in cold storage and keeps a hot cache for faster queries. It separates storage and compute to help you:

  • Use fewer resources.
  • Handle more data.
  • Lower your costs.

Use the REST API to interact with data

Connect and work with EraSearch using the EraSearch REST API. With the API, you can write and query data with HTTP requests using your preferred language or framework.

Here's an example of using the API with cURL to write data to an EraSearch instance on EraCloud:

$ curl -XPOST 'https://db-eraeraera123123123eraera.e12.eradb.com/_bulk' \
  -H 'Authorization: Bearer abcdefghijklmnop12345678910' \
  -d '{"index":{"_index":"my_era_logs"}}
  {"_line": "my first log line"}'
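You can build the same request from a script. Here's a minimal Python sketch, using only the standard library, that assembles the newline-delimited body from the cURL example and prepares the request without sending it. The hostname and token are the placeholder values from the example above. The helper name `build_bulk_request` is hypothetical, and the trailing newline follows the Elasticsearch bulk convention, which is an assumption here rather than a documented EraSearch requirement.

```python
import json
import urllib.request


def build_bulk_request(base_url, token, index, lines):
    """Build (but do not send) a POST request for the _bulk endpoint.

    Hypothetical helper for illustration; not part of any EraSearch client.
    """
    rows = []
    for line in lines:
        rows.append(json.dumps({"index": {"_index": index}}))  # action row
        rows.append(json.dumps({"_line": line}))                # document row
    # Assumption: end the body with a newline, as Elasticsearch's bulk API expects.
    body = ("\n".join(rows) + "\n").encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_bulk",
        data=body,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


request = build_bulk_request(
    "https://db-eraeraera123123123eraera.e12.eradb.com",  # placeholder host
    "abcdefghijklmnop12345678910",                        # placeholder API key
    "my_era_logs",
    ["my first log line"],
)
# To send the request: urllib.request.urlopen(request)
```

Each document takes two rows in the body: an action row naming the target index, then the document itself.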

Explore data with an Elasticsearch-like language

EraSearch supports most of Elasticsearch's query string syntax. Use that syntax to write queries and explore data by keyword, range, Boolean operator, and wildcard.

Here's an example of a query to EraSearch. It requests all data where _line contains the word verified, trail exists, and elevation is greater than one.

_line:verified AND _exists_:trail AND elevation:>1
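To make the three clauses concrete, here's a small Python sketch that applies the same conditions to a few sample documents. It only illustrates what the query asks for. The function name `matches`, the sample documents, and the whitespace-based word matching are assumptions for this example, and EraSearch's actual tokenization may differ.

```python
def matches(doc):
    """Return True if doc satisfies the example query (illustrative only)."""
    # _line:verified -> the _line field contains the word "verified"
    has_word = "verified" in str(doc.get("_line", "")).lower().split()
    # _exists_:trail -> the trail field is present
    has_trail = doc.get("trail") is not None
    # elevation:>1 -> the elevation field is a number greater than one
    elevation = doc.get("elevation")
    is_high = isinstance(elevation, (int, float)) and elevation > 1
    return has_word and has_trail and is_high


# Hypothetical sample documents.
docs = [
    {"_line": "user verified at gate", "trail": "north", "elevation": 3},
    {"_line": "user verified at gate", "elevation": 3},                    # no trail
    {"_line": "user verified at gate", "trail": "north", "elevation": 1},  # not > 1
    {"_line": "unverified user", "trail": "north", "elevation": 3},        # different word
]
results = [matches(doc) for doc in docs]
```

Only the first sample document satisfies all three clauses, because `AND` requires every condition to hold.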

Write and act on data with several integrations

EraSearch integrates with several popular tools, helping you fit EraSearch into your existing setup. Those integrations include Grafana, Vector, Telegraf, Logstash, Fluentd, and Cloudflare.

To see all EraSearch integrations, visit the lists of write integrations and explore integrations.

Ways to use EraSearch

There are two ways to use EraSearch: EraSearch on EraCloud and self-hosted EraSearch.

EraSearch on EraCloud

EraSearch on EraCloud is Era Software’s SaaS product: a hosted EraSearch instance fully managed by Era Software. To start using EraSearch on EraCloud, visit the getting started series.

Self-hosted EraSearch

With self-hosted EraSearch, you run and manage EraSearch on your own cloud infrastructure. Reach out to us at Era Software to get started with self-hosted EraSearch.

EraCloud vs. self-hosted EraSearch

This table outlines some of the differences between EraCloud and self-hosted EraSearch:

| Feature | EraSearch on EraCloud | Self-hosted EraSearch |
|---|---|---|
| Setup and installation | Era Software's sign-up UI | Self-install in your environment |
| Managing and hosting | Era Software on Amazon Web Services (AWS) | You, on your own cloud provider (AWS, Google Cloud Platform (GCP), or Microsoft Azure) |
| Authentication | API key | Basic auth |
| Authorization | Role-based access control (RBAC) for EraCloud-specific features | RBAC for database features |
| Data exploration and alerting | EraSearch UI or integrations | Integrations |

Era Software is working towards feature parity across EraCloud and self-hosted EraSearch. Here are some of the items we're focusing on:

  • Cloud providers - In addition to AWS, future EraCloud versions will support GCP and Microsoft Azure.
  • EraSearch UI - Future self-hosted EraSearch versions will support the EraSearch UI. Until then, you can use Grafana to visualize and interact with your self-hosted EraSearch data.
  • RBAC - In addition to RBAC for EraCloud-specific features, future EraCloud versions will support RBAC for database features.

Common workflows

Here's a common workflow for EraSearch on EraCloud, and some documentation to get you started:

Here's a common workflow for self-hosted EraSearch, and some documentation to get you started:

How EraSearch works

This section describes EraSearch's database architecture. If you're a self-hosted user, this information helps you install and manage EraSearch. If you're an EraCloud user, you don't need this information to work with EraSearch on EraCloud, but feel free to keep reading if you want to learn more!

EraSearch is made up of internal services. With this service-based architecture, you can customize EraSearch to meet your needs by:

  • Configuring, managing, and updating services without impacting other services.
  • Adding resources to specific services without having to scale the whole database.

Key services

There are four internal EraSearch services, and every service has a specific role. The sections below list the services and what they do.

API Service

The API Service receives and handles all client requests, including writes and queries. You can run several API Services to scale EraSearch, increasing how many write and query requests you can make to the database.

The API Service delegates incoming queries to other services and then responds with a single combined result. The diagram section below goes into more detail about that workflow.

Cache Service

The Cache Service handles the in-database hot cache, including:

  • Writing data to local storage.
  • Compacting data to maximize query performance.
  • Servicing query results from local storage.

You can run several Cache Services to scale your database.

Coordinator Service

The Coordinator Service generates and works with object IDs (OIDs). OIDs are unique identifiers that EraSearch uses to store and retrieve data.

The Coordinator Service stores OIDs in Redis.

Storage Service

The Storage Service works with object storage (for example, S3 in AWS). The service's main roles are to:

  • Batch data for long-term storage.
  • Help you manage object storage costs.
  • Optimize object-storage communication with minimal networking overhead.

Architecture diagram

This diagram shows how EraSearch's services work together to form the database. It also outlines how writes and queries flow through the system. The sections below go into more detail about the write and query flows.

[Figure: EraSearch architecture diagram]

Write flows

Here's what happens when you send a write request to EraSearch:

  1. Clients send a write request to the API Service. Write requests can have one or more documents.
  2. The API Service requests OIDs from the Coordinator Service. The Coordinator Service creates one OID for each document.
  3. The API Service sends the OIDs and documents to the Storage Service, which batches the data.
  4. The Storage Service sends the batched data to object storage for long-term storage.
  5. The Cache Service receives the OIDs and documents, and it compacts the data for future queries.
  6. The API Service sends a response to the client, acknowledging the write.
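The write steps above can be sketched as a toy, in-memory model in Python. This is not EraSearch code. The class names, method names, integer OIDs, and response shape are all assumptions chosen to mirror the numbered steps.

```python
import itertools


class Coordinator:
    """Toy stand-in for the Coordinator Service: hands out unique OIDs."""

    def __init__(self):
        self._counter = itertools.count(1)  # assumption: integer OIDs

    def new_oids(self, count):
        return [next(self._counter) for _ in range(count)]


class Storage:
    """Toy stand-in for the Storage Service plus object storage."""

    def __init__(self):
        self.object_store = []  # each entry is one stored batch

    def store(self, batch):
        self.object_store.append(dict(batch))


class Cache:
    """Toy stand-in for a Cache Service with local storage."""

    def __init__(self):
        self.local = {}

    def receive(self, batch):
        self.local.update(batch)


def handle_write(documents, coordinator, storage, cache):
    """Toy API Service: follows steps 2 through 6 of the write flow."""
    oids = coordinator.new_oids(len(documents))  # step 2: one OID per document
    batch = dict(zip(oids, documents))
    storage.store(batch)                         # steps 3-4: batch, then object storage
    cache.receive(batch)                         # step 5: cache gets OIDs and documents
    return {"acknowledged": True, "oids": oids}  # step 6: acknowledge the write


coordinator, storage, cache = Coordinator(), Storage(), Cache()
response = handle_write(
    [{"_line": "first"}, {"_line": "second"}], coordinator, storage, cache
)
```

After the call, the same batch exists in both the toy object store and the toy cache, which is why later queries can be served from the cache while object storage keeps the long-term copy.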

Query flows

Here's what happens when you send a query to EraSearch:

  1. Clients send a query using Elasticsearch's query string syntax to the API Service.
  2. The API Service computes the query results by:
       • Sending the query to all Cache Services.
       • Merging the results it gets from the Cache Services into one response.
  3. The API Service returns the query results to the client.
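The query flow above is a scatter-gather pattern: send the query to every cache, then merge the partial results. Here's a minimal Python sketch of that idea. The function names are hypothetical, each Cache Service is modeled as a plain list of documents, and the query is modeled as a Python predicate rather than a parsed query string.

```python
def query_cache(cache_docs, predicate):
    """Toy Cache Service: return the local documents that match the query."""
    return [doc for doc in cache_docs if predicate(doc)]


def handle_query(predicate, caches):
    """Toy API Service: fan the query out to every cache, then merge."""
    partial_results = [query_cache(cache, predicate) for cache in caches]  # scatter
    merged = [doc for part in partial_results for doc in part]             # gather
    return merged


# Two hypothetical Cache Services, each holding part of the data.
caches = [
    [{"_line": "verified a"}, {"_line": "other"}],
    [{"_line": "verified b"}],
]
result = handle_query(lambda doc: "verified" in doc["_line"], caches)
```

The client sees one combined response and never needs to know how many Cache Services took part.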

Advanced topics

This section introduces compactions and cache eviction. Knowing about these automatic processes can help you monitor and troubleshoot EraSearch. The content below is likely most useful for database administrators.

Compactions

A compaction occurs when a Cache Service pod consolidates data (or roots) on disk. Compactions make your queries faster.

When you write data to EraSearch, the data format is optimized for write performance. That format ensures EraSearch can quickly index and store incoming data. However, this write-optimized format takes up more space on disk and reduces query performance. The compacted file formats are optimized for read performance and reduce the number of files, ensuring reads are as fast as possible.

Compactions have levels, which are defined as:

  • Level 0 (or L0) is the initial write-optimized format containing fewer than 1,500 documents.
  • Level 1 (or L1) is the first level of compaction. L1s are created by combining 1,500 to 150,000 documents.
  • Level 2 (or L2) is the second and final level of compaction. L2s are created by combining more than 150,000 documents.
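The thresholds above can be read as a simple mapping from document count to level. Here's a short Python sketch of that mapping. The function name `compaction_level` is hypothetical, and treating exactly 150,000 documents as L1 is an interpretation of "1,500 to 150,000", not a documented boundary rule.

```python
def compaction_level(document_count):
    """Map a document count to the compaction level described above."""
    if document_count < 1_500:
        return "L0"  # initial write-optimized format
    if document_count <= 150_000:
        return "L1"  # first level of compaction
    return "L2"      # second and final level of compaction


levels = [compaction_level(n) for n in (100, 1_500, 150_000, 150_001)]
```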

EraSearch is designed so that failed compactions are safe and expected. When a compaction fails:

  • Queries for the data in the failed compaction use a less-efficient data format, making them slower.
  • Resource backpressure builds slightly, because the system eventually retries the failed compaction.

Review the Metrics Reference for ways to identify when compactions are failing and when it might be a sign to take action.

Cache eviction

Cache eviction occurs when a Cache Service pod discards data (or roots) older than a configured Time-To-Live (TTL) to keep the underlying database storage at a healthy level.

Cache eviction doesn't remove the data from object storage. If you query data that's no longer in the cache, EraSearch moves the data back into the cache from object storage (a process referred to as rehydration).

Roughly once a minute, each Cache Service pod identifies cache data that is older than the TTL and asynchronously discards it. For example, if you configure EraSearch with a seven-day cache TTL, then EraSearch discards data older than seven days roughly once a minute.
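Here's a minimal Python sketch of a single eviction pass like the one described above. It isn't EraSearch's implementation. The function name `evict_expired`, the root names, and the choice to track one write timestamp per root are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def evict_expired(roots, ttl, now):
    """Return the roots that survive one eviction pass.

    roots maps a root name to the time it was written. Roots older than
    the TTL are dropped; object storage is not touched.
    """
    cutoff = now - ttl
    return {name: written for name, written in roots.items() if written >= cutoff}


now = datetime(2022, 9, 27, 12, 0, tzinfo=timezone.utc)
roots = {
    "root-a": now - timedelta(days=8),  # older than the TTL: evicted
    "root-b": now - timedelta(days=6),  # within the TTL: kept
    "root-c": now - timedelta(days=7),  # exactly at the TTL: kept
}
kept = evict_expired(roots, ttl=timedelta(days=7), now=now)
```

In this sketch, running the pass roughly once a minute would mean calling `evict_expired` on a timer with the current time.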

Review the Metrics Reference for ways to monitor cache eviction and disk utilization.

Next steps

To start using EraSearch on EraCloud, visit the getting started series. If you're interested in self-hosted EraSearch, reach out to us at Era Software.

To learn more about getting data into EraSearch, visit the Era Software blog and these documents:

To learn more about exploring data in EraSearch, visit:


Last update: September 27, 2022