Getting started with EraStreams

Estimated time to read: 5 minutes

Info

EraStreams is beta software. To learn more about EraStreams or to try it, sign up for beta access and we'll get back to you.

EraStreams is a lightweight and performant pipeline for observability data. Use EraStreams to collect data from several sources and write the data to EraSearch.

With EraStreams, you can:

  • Collect observability data from several sources, including Azure Event Hubs, Kafka, Kubernetes, and log files.
  • Edit, transform, and deduplicate incoming data.
  • Push data to EraSearch and other sinks.

This guide shows how to install and configure EraStreams. By the end of this guide, you'll be using EraStreams to collect data from Azure Event Hubs and write data to EraSearch.

Before you begin

The steps below are intended for existing self-hosted EraSearch users working with Azure Event Hubs. To start using self-hosted EraSearch, contact us at Era Software.

You also need:

  • Access to your EraSearch Helm values file (values-eradb.yaml)
  • EraSearch version 1.22+

Setting up EraStreams

Follow these steps to install and configure EraStreams:

  1. Add this streams section to your EraSearch Helm values file (values-eradb.yaml):
    streams:
      enabled: true
      image:
        repository: us.gcr.io/eradb-registry/streams
        tag: latest
        pullSecrets:
          - name: eradb-registry
      customConfig:
        sources: {} 
        sinks: {} 
    
  2. Create the Azure Event Hubs data source.

    In the same file, add the content below to customConfig.sources, setting:

    • bootstrap_servers to a comma-separated list of host:port pairs for the Event Hubs Service Bus.
    • group_id to the Event Hubs consumer group.
    • topics to a list of Event Hubs topic names to read events from.
    • librdkafka_options."sasl.password" to the raw connection string of the Event Hubs shared access policy that EraStreams uses to pull data.
    Tip: Passing in secrets for librdkafka_options."sasl.password"

    For additional security, follow these steps to use a secret reference for librdkafka_options."sasl.password" instead of hardcoding it:

    1. Create the secret with these commands, replacing CONNECTION_STRING with the rest of your Event Hubs connection string (everything after Endpoint=sb://), and THIS_SECRET_NAME and NAMESPACE with your own values:

      $ YOUR_EH_ENDPOINT="Endpoint=sb://CONNECTION_STRING" 
      
      $ echo "apiVersion: v1
      kind: Secret
      type: Opaque
      metadata:
        name: \"THIS_SECRET_NAME\" 
      stringData:
        \"eh-endpoint\": \"${YOUR_EH_ENDPOINT}\"" | kubectl apply -n "NAMESPACE" -f -
      
    2. In values-eradb.yaml, add this env block to the streams configuration, replacing THIS_SECRET_NAME with the name you used above:

      streams:
        env:
          - name: "EH_ENDPOINT"
            valueFrom:
              secretKeyRef:
                name: "THIS_SECRET_NAME" 
                key: "eh-endpoint"
      
    3. In values-eradb.yaml, set streams.customConfig.sources.event_hub_in.librdkafka_options."sasl.password" to "${EH_ENDPOINT}", where event_hub_in is the data source name used in the example below.

    streams:
      [...]
      customConfig:
        sources:
          event_hub_in:
            acknowledgements:
              enabled: true
            type: kafka
            bootstrap_servers: XXXXX.servicebus.windows.net:9093
            group_id: "XXXXX"
            topics:
              - "^my_topic.+"
              - "streams-test"
              - "streams-test-2"
            decoding:
              codec: "json"
            tls:
              enabled: true
            librdkafka_options:
              "security.protocol": "sasl_ssl"
              "sasl.mechanism": "PLAIN"
              "sasl.username": "$$ConnectionString" # Keep this setting as the raw string '$$ConnectionString'
              "sasl.password": "Endpoint=sb://XXXXX.servicebus.windows.net/;SharedAccessKeyName=XXXXX;SharedAccessKey=XXXXX"
              "receive.message.max.bytes": "300000000" # This setting is optional; it lets the EraStreams Kafka consumer accept larger messages
    

    Note

    event_hub_in is the name of the Azure Event Hubs data source. Data source names must be unique, and you use data source names in transformation and sink configurations.
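
For step 2, the bootstrap_servers value can usually be read off the connection string itself. The sketch below is a hypothetical helper (eh_bootstrap_server is not part of EraStreams) that assumes the connection string starts with Endpoint=sb://<host>/ and that the Kafka endpoint listens on port 9093, as in the sample configuration above.

```shell
# Hypothetical helper, not part of EraStreams: derive the bootstrap server
# from an Event Hubs connection string of the form "Endpoint=sb://<host>/;...".
eh_bootstrap_server() {
  conn="$1"
  host="${conn#Endpoint=sb://}"   # drop the "Endpoint=sb://" prefix
  host="${host%%/*}"              # keep everything before the first "/"
  host="${host%%;*}"              # tolerate a missing trailing "/"
  printf '%s:9093\n' "$host"      # 9093 is the port used in the sample config
}

eh_bootstrap_server "Endpoint=sb://XXXXX.servicebus.windows.net/;SharedAccessKeyName=XXXXX;SharedAccessKey=XXXXX"
# prints XXXXX.servicebus.windows.net:9093
```

If the printed value does not match the host you expect, check the connection string before deploying.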

  3. Create the EraSearch data sink.

    In the same file, add the content below to customConfig.sinks, replacing:

    • YOUR_ERASEARCH_URL with your EraSearch URL.
      • Example: http://localhost:9200.
    • YOUR_INPUT with the name of the data source or transformation to pull data from. You can use * to pull data from all sources.
      • Example: event_hub_in.
    • YOUR_INDEX_NAME with the target EraSearch index. EraSearch creates the index for you.
    streams:
      [...]
      customConfig:
        sinks:
          erasearch:
            acknowledgements:
              enabled: true
            type: elasticsearch
            healthcheck: false
            endpoint: "YOUR_ERASEARCH_URL"
            inputs:
              - "YOUR_INPUT"
            bulk:
              index: "YOUR_INDEX_NAME"
    

    Tip

    For Kafka-configured streams (including Azure Event Hubs), you can add the topic to the target index name with this configuration:

    streams:
      [...]
      customConfig:
        sinks:
          erasearch:
            [...]
            bulk:
              index: era-{{"{{"}} topic {{"}}"}}-%F
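
As a rough illustration of the tip above, the sketch below mimics how the index template is assumed to resolve: era-, then the topic name, then the date in %F form (YYYY-MM-DD). The function render_index_name is hypothetical and only exists to show the resulting names; EraStreams does the actual substitution.

```shell
# Illustration only: reproduce the assumed shape of the rendered index name.
# "topic" is the Kafka/Event Hubs topic; "day" is a date already in %F form.
render_index_name() {
  topic="$1"
  day="$2"
  printf 'era-%s-%s\n' "$topic" "$day"
}

render_index_name "streams-test" "2022-10-06"      # prints era-streams-test-2022-10-06
render_index_name "streams-test" "$(date -u +%F)"  # same pattern with today's UTC date
```

Under this assumption, each topic gets one index per day, which is worth keeping in mind when you grant write permissions per index.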
    

You've configured an EraStreams data source and sink. If you're not using EraSearch RBAC, you're all set. You can now upgrade your EraSearch release and query the Azure Event Hubs data in the index you set above.
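
The upgrade itself is a standard Helm operation. The following is a minimal sketch, not the official procedure: upgrade_erasearch is a hypothetical wrapper, and RELEASE_NAME, CHART_REFERENCE, and NAMESPACE are placeholders for values from your own deployment that this guide does not define.

```shell
# Sketch only: wraps "helm upgrade" so the values file from this guide is applied.
# The release name, chart reference, and namespace are assumptions you must supply.
upgrade_erasearch() {
  release="$1"; chart="$2"; namespace="$3"
  helm upgrade "$release" "$chart" \
    --namespace "$namespace" \
    --values values-eradb.yaml
}

# Example call (uncomment after replacing the placeholders):
# upgrade_erasearch "RELEASE_NAME" "CHART_REFERENCE" "NAMESPACE"
```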

If you're using EraSearch RBAC, continue to the section below to finish setting up EraStreams.

Setting up RBAC with EraStreams

Follow the steps below to let EraStreams write to EraSearch. You'll need an API key with write permissions for the relevant indexes. Follow steps 1 and 2 in Giving RBAC write permissions to tools to create that key.

  1. Create the Kubernetes secret.

    Enter the command below in your terminal, replacing:

    • THIS_SECRET_NAME with the name of the Kubernetes secret.
    • ERASEARCH_API_KEY with the API key to use for EraStreams requests to EraSearch.
    • NAMESPACE with EraSearch's Kubernetes namespace.
    echo 'apiVersion: v1
    kind: Secret
    type: Opaque
    metadata:
      name: "THIS_SECRET_NAME"
    stringData:
      "streams-api-key": "ERASEARCH_API_KEY"' | kubectl apply -n "NAMESPACE" -f -
    
  2. Add the secret to your EraSearch Helm values file (values-eradb.yaml).

    Under streams, add this env section, replacing THIS_SECRET_NAME with the name you used above:

    streams:
      [...]
      env:
        - name: "ERASEARCH_API_KEY"
          valueFrom:
            secretKeyRef:
              name: "THIS_SECRET_NAME"  
              key: "streams-api-key"    
      customConfig:
        sinks:
          [...]
    
  3. In the same file, add the following request.headers.Authorization setting to all sink configurations:

    streams:
      [...]   
      customConfig:
        sinks:
          erasearch:
            [...]
            request:
              headers:
                Authorization: "Bearer ${ERASEARCH_API_KEY}"
    

You're all set. You can now upgrade your EraSearch release and query the Azure Event Hubs data in the index you set above.
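
To check that data is arriving, you can send a small search request to the index. The sketch below assumes EraSearch answers Elasticsearch-style _search requests at the same URL the sink writes to; that endpoint, the size parameter, and the search_index helper are assumptions for illustration and are not confirmed by this guide. Pass the API key as the third argument only if you use RBAC.

```shell
# Sketch only: assumes an Elasticsearch-compatible _search endpoint.
# url and index are your own values; api_key is optional (RBAC only).
search_index() {
  url="$1"; index="$2"; api_key="${3:-}"
  if [ -n "$api_key" ]; then
    curl -sS -H "Authorization: Bearer $api_key" "$url/$index/_search?size=1"
  else
    curl -sS "$url/$index/_search?size=1"
  fi
}

# Example calls (uncomment after replacing the placeholders):
# search_index "http://localhost:9200" "YOUR_INDEX_NAME"
# search_index "http://localhost:9200" "YOUR_INDEX_NAME" "$ERASEARCH_API_KEY"
```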

If you have any issues setting up EraStreams, contact us at Era Software. Also, let us know if you want EraStreams to support additional data sources or sinks.


Last update: October 6, 2022