Esper

Esper 2017-11-03T20:31:57+00:00

Esper is software for complex event processing (CEP) and streaming analytics, available for .NET as NEsper.

Esper and NEsper enable rapid development of applications that process large volumes of incoming messages or events, regardless of whether incoming messages are historical or real-time in nature. Esper and NEsper filter and analyze events in various ways, and respond to conditions of interest.

Together, Esper and its Event Processing Language (EPL) provide a highly scalable, memory-efficient, in-memory, SQL-standard-based processing engine with minimal latency. It handles real-time streaming Big Data of any velocity and high variety as it arrives online, and it also supports historical event analysis.

Esper is not limited to running on a single machine and runs well inside a distributed stream processing framework. It can run in any architecture and any container because it has no dependencies on external services, does not require any particular threading model or model of how time advances, and does not require any external storage. Esper works well with event-time and watermark-based time management.

Esper has a horizontal scale-out architecture for linear horizontal scalability, elastic scaling, load distribution, balancing and re-balancing, fault tolerance, dynamic discovery of nodes through seed nodes, replication and multi-datacenter support. Esper's horizontal scale-out architecture builds on Apache Kafka and Apache ZooKeeper (see Esper Enterprise Edition).

The design priorities for Esper are:

  1. Low latency and high throughput.
  2. Expressiveness, conciseness and extensibility of EPL.
  3. Compliance with standards and best practices.
  4. Lightweight in terms of memory, CPU and I/O usage.

Esper provides an online application for EPL:
Esper EPL Online

Technology Introduction

Complex event processing (CEP) delivers high-speed processing of many events across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time (source: Wikipedia).

Esper offers a domain-specific language (DSL) for processing events. The Event Processing Language (EPL) is a declarative language for dealing with high-frequency, time-based event data. EPL complies with the SQL-92 standard and extends it for analyzing series of events and with respect to time.

Some typical examples of applications are:

  • Business process management and automation (process monitoring, BAM, reporting exceptions, operational intelligence)
  • Finance (algorithmic trading, fraud detection, risk management)
  • Network and application monitoring (intrusion detection, SLA monitoring)
  • Sensor network applications (RFID reading, scheduling and control of fabrication lines, air traffic)

Commonly Asked Questions

How does it work? How does it compare with other CEP products? And to stream processing? How has this been tested? What is the performance? What is the license? Can I get support?

Esper Feature Summary

Byte code generation is a technique that blends state-of-the-art methods from modern compilers and MPP databases.
Esper generates JVM byte code from EPL statements. This is a new feature in Esper version 7.x.

Byte code generation can significantly speed up processing as it eliminates many virtual calls, casts and branching.

Byte code generation allows the virtual machine to optimize the generated code further and lets the hardware execute it faster.

Esper implements the best architecture for performance engineering in data processing by performing byte code generation.

Not all workloads can benefit from byte code generation to the same degree.

Data windows manage fine-grained event expiry. They instruct the engine how long to retain relevant events or under what conditions events can be discarded. Data windows operate at the level of individual queries, streams and subqueries.

Sliding windows: time, length, sorted, ranked, accumulating, time-ordering, externally-timed (value-based windowing), expiry-expression-based with aggregations

Tumbling windows: time, length and multi-policy; first-event; expiry-expression-based with aggregations

Combine windows with intersection and union semantics.

Partitioned windows. Dynamically shrinking or expanding windows.

For example, use a time window to keep arriving events for N seconds. The engine releases (expires) events that are older than N seconds.

Having a good variety of configurable and combinable data windows available lets you address more analysis requirements and express common requirements concisely.
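The expiry behavior of a sliding time window, and the batch behavior of a tumbling length window, can be sketched in Python as follows. This illustrates the semantics only and is not Esper's implementation or API: the class names are hypothetical, and time is passed in explicitly as a number of seconds.

```python
from collections import deque


class TimeWindow:
    """Sliding time window: keeps each event for `seconds`, then expires it."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()  # (arrival_time, event) pairs, oldest first

    def insert(self, now, event):
        """Add an event arriving at time `now`; return any events that expired."""
        self.events.append((now, event))
        return self.advance(now)

    def advance(self, now):
        """Expire events that are `seconds` or more old at time `now`."""
        expired = []
        while self.events and self.events[0][0] <= now - self.seconds:
            expired.append(self.events.popleft()[1])
        return expired

    def contents(self):
        return [event for _, event in self.events]


class TumblingLengthWindow:
    """Tumbling length window: collects `size` events, then releases them together."""

    def __init__(self, size):
        self.size = size
        self.batch = []

    def insert(self, event):
        """Return the full batch when it reaches `size` events, otherwise None."""
        self.batch.append(event)
        if len(self.batch) == self.size:
            full, self.batch = self.batch, []
            return full
        return None
```

With a 10-second window, an event inserted at time 0 is returned by `advance(10)`, while an event inserted at time 4 stays in the window until `advance(14)`.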

Named windows are globally visible data windows that allow sharing sets of events between queries efficiently, removing the need to keep the same events in multiple places.

Define custom criteria for entering events and for expiring events.

Esper supports fire-and-forget (on-demand) queries against named windows including joins.

Esper supports explicit indexes (hash and btree).

Esper supports update-insert-delete (aka. merge or upsert) and select-and-delete in a single atomic operation.

Allows defining event expiry once and applying it across multiple queries.

On-demand (fire-and-forget, execute-once, non-continuous) queries are useful for getting current state once and upon request.

Explicit indexes help with reusing indexes between queries, with performance, and with query planning (we use the terms query and statement interchangeably).

Atomic operations allow more concise EPL and can help performance.
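The merge (upsert) and select-and-delete operations mentioned above can be modeled with a keyed store guarded by a lock, so that each operation is atomic with respect to other threads. The class `NamedWindowSketch` and its methods are hypothetical; Esper expresses these operations declaratively in EPL rather than through an API like this.

```python
import threading


class NamedWindowSketch:
    """Hypothetical keyed event store shared by several queries."""

    def __init__(self, key):
        self.key = key            # name of the key property, e.g. "symbol"
        self.rows = {}            # key value -> event dict (acts as a hash index)
        self.lock = threading.Lock()

    def merge(self, event, delete_if=None):
        """Update-insert-delete in one atomic step.

        `delete_if` is an optional predicate (current_row, event) -> bool that
        turns a match into a delete instead of an update.
        """
        k = event[self.key]
        with self.lock:
            current = self.rows.get(k)
            if current is not None and delete_if and delete_if(current, event):
                del self.rows[k]
                return "deleted"
            if current is not None:
                current.update(event)    # matched: update in place
                return "updated"
            self.rows[k] = dict(event)   # not matched: insert a copy
            return "inserted"

    def select_and_delete(self, predicate):
        """Return and remove all matching rows in one atomic step."""
        with self.lock:
            hits = [k for k, row in self.rows.items() if predicate(row)]
            return [self.rows.pop(k) for k in hits]
```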

Tables are global data structures that can hold aggregation state alongside data and events, and allow update-in-place.

Table columns can be of type aggregation; table rows can hold the aggregation state itself.

Tables allow statements to co-locate aggregation state, update-in-place data and event data conveniently.

Esper supports fire-and-forget (on-demand) queries against tables including joins.
Esper supports explicit indexes (hash and btree).
Esper supports update-insert-delete (aka. merge or upsert) and select-and-delete in a single atomic operation.

Allows multiple statements to aggregate into the same state (co-aggregation).

Co-locating and updating-in-place can have significant performance advantages and can reduce memory use.

See Named Windows above for advantages to using fire-and-forget, explicit indexes and merge/upsert.
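To make co-aggregation concrete, the sketch below keeps aggregation state (count and sum) directly in table rows and lets two independent handlers, standing in for two EPL statements, aggregate into the same rows in place. `AggTable`, `on_trade` and `on_refund` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AggRow:
    """One table row whose columns hold aggregation state directly."""
    count: int = 0
    total: float = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    @property
    def avg(self):
        return self.total / self.count if self.count else None


class AggTable:
    """Hypothetical global table keyed by group; rows are updated in place."""

    def __init__(self):
        self.rows = {}

    def aggregate(self, key, value):
        self.rows.setdefault(key, AggRow()).add(value)

    def get(self, key):
        return self.rows.get(key)


table = AggTable()


def on_trade(trade):
    # Stand-in for statement 1: aggregates trade amounts into the shared table.
    table.aggregate(trade["account"], trade["amount"])


def on_refund(refund):
    # Stand-in for statement 2: co-aggregates negative amounts into the same rows.
    table.aggregate(refund["account"], -refund["amount"])
```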

This category is event series analysis: analyzing a series or stream of events, including historical events.
Match-recognize is a query model for pattern matching based on regular expressions. Regular expressions are often easy to understand. Many patterns can be expressed concisely with match-recognize.
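The idea behind match-recognize can be illustrated by mapping each event to a symbol and then running an ordinary regular expression over the symbol sequence. The sketch below uses Python's `re` module and invented symbols (U for a price rise, D for a fall, S for no change); it shows the concept only and is not Esper's match-recognize engine or syntax.

```python
import re


def classify(prices):
    """Map each move between consecutive prices to U (up), D (down) or S (same)."""
    symbols = []
    for prev, cur in zip(prices, prices[1:]):
        symbols.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(symbols)


def match_recognize(prices, pattern):
    """Return (start, end) price-index ranges whose moves match `pattern`.

    Symbol i describes the move from prices[i] to prices[i + 1], so a match
    over symbols [s, e) covers prices s through e inclusive.
    """
    text = classify(prices)
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]
```

For example, the pattern `D{2,}U` finds two or more consecutive price drops followed by a rise.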

The pattern language provides logical and temporal event correlation.

Timer-control is part of patterns and includes a crontab-like 'at' operator.

The lifecycle of patterns can be controlled by timers and by operators such as repeat-number, repeat-until, every-distinct and while.

Patterns offer an expressive way to specify more complex time and/or correlation relationships.

Patterns, for example time-repeating patterns that trigger based on time passing, are often combined in the from-clause with other streams or used as triggers.
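A followed-by pattern with a time limit (every A followed by a matching B within N seconds) can be sketched as a list of pending partial matches that are completed or discarded as events arrive. The function, the event-type names and the tuple layout are assumptions for illustration; Esper's pattern language expresses this declaratively.

```python
def followed_by(events, first, second, within, key):
    """Sketch of 'every first -> second within `within` seconds', correlated on `key`.

    `events` are (timestamp, type, payload) tuples in timestamp order. Each
    `first` event opens its own pending match (the 'every' part). A pending
    match completes at the next `second` event with the same key, or is
    discarded once more than `within` seconds have passed.
    """
    pending = []   # (start_time, payload) for each open `first` event
    matches = []
    for ts, etype, payload in events:
        # Drop pending matches whose time limit has passed.
        pending = [(t0, p) for t0, p in pending if ts - t0 <= within]
        if etype == first:
            pending.append((ts, payload))
        elif etype == second:
            still_open = []
            for t0, p in pending:
                if p[key] == payload[key]:
                    matches.append((p, payload))
                else:
                    still_open.append((t0, p))
            pending = still_open
    return matches
```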

Grouping, aggregation, rollup, cube, sorting, filtering, transforming, merging, splitting or duplicating of event series or streams.

These typical operations on a series of events build the foundation of many analysis solutions.

A stream by itself has near-zero cost in terms of memory or CPU use.

Individual aggregation functions can specify their own level of grouping in addition to or separate from the group-by clause.
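Aggregating at two levels of grouping in one result can be sketched as computing a per-group sum together with an ungrouped total, as an aggregation with its own grouping level would. The function and field names are hypothetical, and the sketch processes a finite batch rather than a continuous stream.

```python
from collections import defaultdict


def aggregate(events):
    """Per-symbol volume alongside the total volume over all symbols.

    The per-symbol sum follows the group-by; the total ignores it, like an
    aggregation that specifies its own (empty) level of grouping.
    """
    per_symbol = defaultdict(int)
    total = 0
    for e in events:
        per_symbol[e["symbol"]] += e["volume"]
        total += e["volume"]
    return {
        sym: {"volume": vol, "total": total, "share": vol / total}
        for sym, vol in sorted(per_symbol.items())
    }
```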

Context declarations let you provide the context information for situation detection. Contexts can control detection lifetime and concurrency aspects.

Context dimensions can, for example, be based on consistent hashes, keys, categories, or overlapping and non-overlapping periods.

Context partitions can be initiated by event arrivals or patterns and terminated, for example, based on a correlation.

Context declarations can be nested to provide finer-grained control.

This allows framing the situation to be detected.

The engine processes context partitions concurrently allowing effective use of multiple threads and fine-grained locking.
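A keyed context can be pictured as one independent partition of state per key value, created when the first event for that key arrives and terminated by an ending condition. `KeyedContext` and the `ends` predicate are hypothetical, and the sketch is single-threaded, whereas the engine can process partitions concurrently.

```python
class KeyedContext:
    """Hypothetical keyed context: one independent partition per key value.

    A partition is initiated by the first event for its key and terminated by
    an event for which `ends(event)` is true. Each partition keeps its own
    state, so logic inside one partition never sees other keys' events.
    """

    def __init__(self, key, ends):
        self.key = key
        self.ends = ends
        self.partitions = {}   # key value -> per-partition state

    def on_event(self, event):
        k = event[self.key]
        state = self.partitions.setdefault(k, {"count": 0})
        state["count"] += 1
        if self.ends(event):
            # Terminate the partition and report its final state.
            return k, self.partitions.pop(k)
        return None
```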

Output rate limiting and stabilizing, and snapshot output. This provides fine-grained control over output frequency and content.

Event consumption is control over non-consuming (the event remains available for further matching) and consuming (the event is no longer available for further matching) operation. Some use cases require consumption and others don't.

Enumeration methods execute lambda expressions. They are useful for analyzing a collection of values or events. They are stateless. The analysis function is passed as a parameter into the enumeration method.

Date-time methods provide common date-time operations. This helps when performing date-time arithmetic, for example.

Allen's interval algebra, with support for point-in-time events and events with duration. This helps when you want to compare events in terms of their interval and time relationships.

Declared expressions allow reusing common expressions within and across queries, so you don't need to duplicate them.

Script integration for calling scripts in external scripting languages, such as JavaScript, MVEL or other JSR 223 scripts, right within EPL. This allows specifying code as part of the EPL query.
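The interval relationships of Allen's interval algebra can be illustrated with a minimal classifier for two intervals given as (start, end) pairs. The relation names are Allen's, not necessarily the names of Esper's date-time methods, and treating a point-in-time event as a zero-length interval is an assumption of this sketch.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval `a` to interval `b`.

    Intervals are (start, end) pairs with start <= end; a point-in-time event
    is an interval with start == end. For proper intervals exactly one
    relation holds; for zero-length intervals several may hold, and the first
    one in the order checked below is returned.
    """
    a_s, a_e = a
    b_s, b_e = b
    if a_s == b_s and a_e == b_e:
        return "equals"
    if a_e < b_s:
        return "before"
    if b_e < a_s:
        return "after"
    if a_e == b_s:
        return "meets"
    if b_e == a_s:
        return "met_by"
    if a_s == b_s:
        return "starts" if a_e < b_e else "started_by"
    if a_e == b_e:
        return "finishes" if a_s > b_s else "finished_by"
    if b_s < a_s and a_e < b_e:
        return "during"
    if a_s < b_s and b_e < a_e:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if a_s < b_s else "overlapped_by"
```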

Joining external data enables easy integration with external data sources such as web services.

Relational database access via SQL-query joins with event streams:

  • LRU (least-recently used) and expiry-time query result caches
  • Keyed cache entries for fast cache lookup
  • The engine indexes cached rows for fast filtering within a large number of SQL-query result rows
  • Multiple SQL queries in one query transparently integrate multiple autonomous database systems

Method invocation joins.

This provides one common means for integrating relational and non-relational external data.
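The LRU and expiry-time result caching described for SQL-query joins can be sketched as follows. `fetch` is a hypothetical stand-in for executing the SQL query, cache entries are keyed by the lookup parameters, and an injectable clock makes expiry testable; none of this is Esper's actual cache API.

```python
import time
from collections import OrderedDict


class QueryResultCache:
    """LRU cache with a maximum entry age, keyed by query parameters."""

    def __init__(self, fetch, max_size, max_age_sec, clock=time.monotonic):
        self.fetch = fetch            # hypothetical: runs the SQL query
        self.max_size = max_size
        self.max_age = max_age_sec
        self.clock = clock
        self.entries = OrderedDict()  # params -> (loaded_at, rows), LRU first

    def lookup(self, params):
        now = self.clock()
        hit = self.entries.get(params)
        if hit is not None and now - hit[0] < self.max_age:
            self.entries.move_to_end(params)   # mark as most recently used
            return hit[1]
        rows = self.fetch(params)              # miss or expired: query again
        self.entries[params] = (now, rows)
        self.entries.move_to_end(params)
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)   # evict least recently used
        return rows
```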
Variables and constants with guarantees of consistency and atomicity of variable updates within and across queries.

These can occur in any expression and can make EPL dynamically controllable and easier to maintain.

Declaring variables that never change as constants makes query optimizations for constant values possible.

Approximation Algorithms for summarizing data in streams.

The Count-min sketch (or CM sketch) is a probabilistic sub-linear space streaming algorithm which can be used to summarize a data stream in many different ways (Source: Wikipedia).

Count-min sketch computes approximate frequencies and top-k without retaining distinct values in memory.
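A count-min sketch keeps a small fixed-size table of counters, hashing each item into one counter per row and reporting the minimum across rows as its estimated frequency. The Python version below adds a simple bounded top-k candidate list. Width, depth and the salted BLAKE2 hashing are arbitrary choices for illustration, not Esper's parameters.

```python
import hashlib


class CountMinSketch:
    """Approximate item frequencies in fixed memory (depth rows x width counters)."""

    def __init__(self, width=1024, depth=4, k=3):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.k = k
        self.top = {}  # at most k heavy-hitter candidates -> last estimated count

    def _buckets(self, item):
        # Derive one hash per row by salting the item with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count
        est = self.estimate(item)
        # Keep a bounded candidate set instead of every distinct item seen.
        if item in self.top or len(self.top) < self.k:
            self.top[item] = est
        else:
            weakest = min(self.top, key=self.top.get)
            if est > self.top[weakest]:
                del self.top[weakest]
                self.top[item] = est

    def estimate(self, item):
        # Collisions can only inflate counters, so the row minimum is the
        # tightest estimate and never undercounts.
        return min(self.table[row][col] for row, col in self._buckets(item))

    def top_k(self):
        """Candidates sorted by estimated count, highest first."""
        return sorted(self.top.items(), key=lambda kv: kv[1], reverse=True)
```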

Event representation covers event typing, event objects and event type relationships.

Events can be Java objects, Map interface implementations, object arrays (Object[]), Avro/JSON, or XML documents. You are free to choose the best object type(s) for your use case, weighing the trade-offs between types, without requiring transformation on either the incoming or the outgoing side. You can use dynamic types that are not predefined classes, and you can use existing objects when they are already available.

Esper supports event-type inheritance and polymorphism for all event types, including Map and object-array representations. This allows modeling event type hierarchies, extension and event behavior.

Event properties can be simple, indexed, mapped or nested. By supporting nesting as well as key-value and multi-value properties, the event information model can be richer and more useful.

Esper allows querying deep event object graphs and XML structures, also for Map and object-array event representations. Relationships between events and related data structures such as reference data can be modeled naturally.

Esper supports dynamic typing of properties, further supported by the cast, instanceof and exists functions. This is useful when, at the time of continuous query creation, it is not known whether properties will be present or what type a property may have for any given event that arrives.

Esper supports a create-schema syntax to declare event types from a column-and-type list, from existing classes or from other types (by means of templating, for example), with declarative inheritance.
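Resolving simple, indexed, mapped and nested properties on a Map-style event (here a Python dict) can be sketched as walking a property path step by step. The path notation used here (`legs[1]`, `tags('region')`, dots for nesting) is an assumption modeled on the property kinds named above; check the Esper documentation for the exact EPL grammar. Returning None for missing steps loosely mirrors the idea of dynamic properties.

```python
import re

# One path segment: a name, optionally followed by [index] or ('key').
_SEGMENT = re.compile(r"(\w+)(?:\[(\d+)\]|\('([^']*)'\))?$")


def get_property(event, path):
    """Resolve a simple, indexed, mapped or nested property on a dict event.

    Example paths: "symbol", "legs[1]", "tags('region')", "order.legs[0].price".
    Mapped keys must not contain dots. Returns None when any step is missing.
    """
    value = event
    for part in path.split("."):
        m = _SEGMENT.match(part)
        if m is None:
            raise ValueError(f"bad property segment: {part!r}")
        name, index, key = m.groups()
        if not isinstance(value, dict) or name not in value:
            return None
        value = value[name]
        if index is not None:
            if not isinstance(value, list) or int(index) >= len(value):
                return None
            value = value[int(index)]
        elif key is not None:
            if not isinstance(value, dict):
                return None
            value = value.get(key)
    return value
```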

Types can be defined explicitly or implicitly (insert-into).

When undeploying a module of EPL queries, the engine can drop associated types that are no longer used anywhere.

Variant event-typed streams allow treating disparate types of events as the same type, such as when the event type can only be known at runtime, when the event type is expected to vary, or when optional properties are desired. This suits use cases that have multiple unreconciled event types.

Versioned events that update, provide a new version of, or revise an existing event. These allow expressing more concisely that events are actually newer versions of previous events.
EPL syntax for contained events.

Contained-event select syntax for easy handling of coarse-grained, business-level events which themselves contain events or that need to be broken down into rows.

Allows writing EPL directly against "unpacked", packaged-within or inner-repeated data.
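The contained-event idea of breaking a coarse-grained event into rows can be illustrated with a small function that emits one row per contained item and carries selected parent properties along. The field names (`items`, `order_id`, `customer`) are hypothetical.

```python
def contained_rows(order, field="items", keep=("order_id", "customer")):
    """Break a coarse-grained event into one row per contained event.

    Each row merges the selected parent properties (`keep`) with the
    properties of one contained item; on a name clash the item's value wins.
    """
    parent = {k: order[k] for k in keep if k in order}
    return [{**parent, **item} for item in order.get(field, [])]
```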

SQL-standard based and other applicable standards

Familiar SQL-standard-based continuous query language using insert into, select, from, where, group-by, having, order-by, limit and distinct clauses.

Inner-joins and outer joins (left, right, full) of an unlimited number of streams or windows.

Sub-queries including "exists" and "in".

Rollup and Cube with grouping set definition.

SQL provides well-defined semantics, is standardized and can help flatten the learning curve.

The design of EPL is as close as feasible to SQL and extends SQL.

Support for relevant parts of ISO 8601 for the exchange of date and time-related data.

Execution characteristics
Scalability in the face of large numbers of continuous queries. Suppose you have 10,000 queries that all read from the same input stream and that either check whether a specific event attribute (namely, price) lies inside a given random interval or test some event attributes for equality. Esper detects that many queries have conditions on the same attribute(s) and builds a decision tree, so the cost of evaluating an event is only O(log N), and O(N) only in the worst case.
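The benefit of sharing condition evaluation across many queries can be illustrated with a simplified index: queries are grouped by their equality condition and, within each group, sorted by the lower bound of their price range. This is a stand-in written for illustration, not Esper's decision tree; its cost per event is a hash lookup, a binary search and a check of the remaining candidate ranges, which degrades to O(N) in the worst case.

```python
import bisect
from collections import defaultdict


class FilterIndex:
    """Shares condition evaluation across many queries on the same stream.

    Each hypothetical query is (query_id, symbol, low, high) and matches
    events with that symbol and low <= price <= high.
    """

    def __init__(self, queries):
        by_symbol = defaultdict(list)
        for qid, symbol, low, high in queries:
            by_symbol[symbol].append((low, high, qid))
        self.index = {}
        for symbol, ranges in by_symbol.items():
            ranges.sort()  # ordered by lower bound
            self.index[symbol] = ([r[0] for r in ranges], ranges)

    def matches(self, event):
        # Equality condition: one hash lookup instead of testing every query.
        lows, ranges = self.index.get(event["symbol"], ([], []))
        # Ranges [0, cut) start at or below the price; the rest cannot match.
        cut = bisect.bisect_right(lows, event["price"])
        return sorted(qid for low, high, qid in ranges[:cut] if event["price"] <= high)
```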

Allows a high degree of parallelism, both when processing the same query and when processing multiple queries. Stateless queries process lock-free.

The Esper design can help maximize throughput under multithreading while protecting state from concurrent modification. Esper can execute stateless and lock-free where possible.

Thereby Esper can achieve data parallelism and component parallelism.

Multithread-safe.

Create, start and stop queries during operation.

Applications can retain full control over threading: inbound, outbound and execution threading are configurable, and none is provided by default.

Since Esper doesn't have a strong opinion on what threads exist and since Esper doesn't have to queue events, it is suitable to run in any container or process.

Esper can achieve optimal performance by not requiring thread handoffs, context switches or queue synchronization.

API

Full control over the concept of time.

Supports externally-provided time as well as current system time, allowing applications full control over the concept of time within an engine and over which thread(s) evaluate the timer schedule for queries.

This can be useful for replaying historical data.

Allows using a more precise or accurate time than the JVM may provide.

Allows control over time passing.
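Application-controlled time can be sketched as a clock that advances only when the application calls it, firing any timers that have come due, in order, on the calling thread. `ExternalClock` is hypothetical and not Esper's time API; in a replay, the application would advance the clock to each historical event's timestamp before processing that event.

```python
import heapq


class ExternalClock:
    """Time advances only when the application says so (e.g. during replay).

    Scheduled callbacks fire in due-time order on the thread that calls
    `advance_to`, so the application decides which thread evaluates timers.
    """

    def __init__(self, start=0):
        self.now = start
        self._timers = []   # heap of (due_time, sequence, callback)
        self._seq = 0       # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def advance_to(self, t):
        if t < self.now:
            raise ValueError("time cannot move backwards")
        while self._timers and self._timers[0][0] <= t:
            due, _, callback = heapq.heappop(self._timers)
            self.now = due          # callbacks observe their own due time
            callback(due)
        self.now = t
```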

Multiple independent engines per process. This is useful when you want separation but want to operate in the same JVM instance.

Add and remove queries at runtime. New queries can be created at runtime without stopping processing.

Enable/disable continuous queries and/or partitions without losing state.

Control event visibility and the concept of time on a query level.

Esper can disable and enable any query and context partition without removing and adding it again, even at runtime.

This allows loading queries from historical data and then letting the same continuous queries receive online event streams after the historical load has completed.

Push and pull

Support for both push-based (subscription-based) delivery to listeners, subscribers or observers and pull-based (receive-based) querying of current results.
Concurrency-safe and read-write locked for multiple readers.

Sometimes it is more convenient for an application to ask for current results than to constantly receive data.

Pulling can increase performance, since the engine can skip pushing output.

Mature API

Module parsing and deployment API.

Esper provides a small test framework for unit or regression testing EPL-based applications.

API maturity helps you because you can expect few or no code changes between releases, and compatibility across releases.

Esper offers an organization of EPL into modules for convenient deployment management.

Statement (Query) Object Model

A set of classes providing an object-oriented representation of an EPL query.

Complete specification of a query via the object model.

Round-trip from object model to query text and back to object model.

Build, change or interrogate EPL queries beyond the textual representation.

This feature can make tool development easier. It also makes otherwise opaque EPL strings useful.

Prepared queries and substitution parameters (named and number-indexed).

Precompile a query with substitution parameters and efficiently execute or start the parameterized query multiple times, similar to JDBC prepared statements.

Reduces execution time for fire-and-forget queries and creation time for continuous queries.

JSON and XML output event rendering. Easy output formatting for common formats without custom code.

Data flow-type invocation of EPL operators. For highest-performance use cases, custom flows or I/O, the data flow declaration offers lower-level access to and control over EPL select and event bus operations.

Extensibility

Pluggable architecture for event pattern and event stream analysis via user-defined functions, plug-in views, plug-in aggregation functions, plug-in pattern guards, plug-in pattern event observers and event instance methods. Virtual data windows transparently back named windows with an external store. Applications can plug in their own event representation and dynamic type resolution. Extending the EPL grammar allows application-provided features to integrate seamlessly.

Input and Output Adapters

CSV input adapter that reads comma-separated value formats, simulates multiple event streams with timed, coordinated playback via a timestamp column, and supports load generation and preloading of reference data

JMS input+output adapter based on Spring JMS templates

HTTP input+output adapter

Kafka input+output adapter

DB output adapter for running DML and for keyed update-insert (aka. upsert)

Socket input adapter

See Apache Camel and other ESBs for more adapters.

JMX metrics exposure.

Most applications don't need much input or output adapter code, as the API makes feeding and receiving events easy.
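The timed, coordinated playback that the CSV input adapter performs via a timestamp column can be approximated by merging several timestamp-sorted CSV streams into a single ordered stream. The function name and the `timestamp` column name are assumptions, and the sketch only orders events; it does not pace them in real time.

```python
import csv
import heapq


def coordinated_playback(sources, ts_column="timestamp"):
    """Merge several CSV event streams into one stream ordered by timestamp.

    `sources` maps a stream name to a file-like object with a header row;
    each file must already be sorted by `ts_column`. Yields tuples of
    (timestamp, stream_name, row_dict); ties are broken by stream name.
    """
    def rows(name, handle):
        for row in csv.DictReader(handle):
            yield int(row[ts_column]), name, row

    yield from heapq.merge(
        *(rows(name, handle) for name, handle in sources.items()),
        key=lambda r: (r[0], r[1]),
    )
```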

Examples

Numerous examples and an online solution patterns page, to get started and for self-help.

Benchmark kit. The benchmark kit is a possible foundation for performing your own measurements. Note the documentation chapter on additional performance tips, which the benchmark does not necessarily implement.