Step 5: Build an aggregation pipeline

In the previous step we used SQL to transform a single field that held an HTTP log message into a set of fields. Let’s now take the output of that pipeline and build another one.

In this next pipeline, we’re going to create a new stream that shows how many log events occurred for each type of http_method in a 30 second time frame. This functionality has real-world uses, such as spotting anomalies in application behavior.

Select a field to use as a watermark

To start with, we’ll configure some essential metadata. Watermarks are a core component of the processing logic in time-based aggregations. Although we won’t delve into the details here, you can explore Windowing in SQL and watermarks at your convenience.

  1. From the Streams page, select the http_events stream and select the Schema tab.

    An image showing the stream monitoring overview. align="left"
  2. Next to the event_ts field, select the Watermark icon. Ignore the Max Lateness field; we won’t need it for this tutorial.

    An image showing the icon for selecting a field as a watermark. width="900"
  3. Click Save to apply the watermark.

Create a Pipeline

With the watermark configured, we’ll build the pipeline itself now.

  1. Navigate to the Pipelines page, and select New Pipeline.

  2. In the Pipeline Editor, paste the following SQL statement.

    INSERT INTO http_events_count
    SELECT window_start, window_end, http_method, COUNT(*) AS `count`
    FROM TABLE(
        TUMBLE(TABLE http_events, descriptor(`event_ts`), INTERVAL '30' SECONDS))
    GROUP BY window_start, window_end, http_method

    This does the following:

    • Every 30 seconds, returns a count of HTTP method invocations.

    • Groups the results by the window start time, window end time, and the http_method used.

  3. Click Start from earliest in the preview pane. You should see a count of how many times a specific method was invoked, within a 30-second time window.

    An image showing SQL pipeline preview output for the given query. width="900"
  4. Select Next to open the output stream configuration dialog. Click Create Stream and then select Next.

    An image showing the output stream configuration dialog. width="900"
  5. Then provide a name and an optional description for this pipeline and continue by clicking Create and start.

  6. In the start up dialog, choose "S" as task size and select Earliest before confirming these settings by clicking Start.

After a minute or two, your pipeline is running and processing data from the http_events input stream.

An image showing the running pipeline. width="900"

If you click the blue metrics icon in the lower right area, you should see some metrics for this pipeline.

An image showing input and output metrics for the running pipeline. width="900"

Note that metrics are reset upon each activation. Stopping and restarting a pipeline resets metrics back to zero.