EsperTech-BlogAuthor

About EsperTech Inc.

So far EsperTech Inc. has created 16 blog entries.

Esper-8 Migrating Configuration From Older Versions

Introduction

Configuration now has 3 sections:

  1. Common section: All configuration common to the compiler and the runtime
  2. Compiler section: Configuration that applies to the compiler only
  3. Runtime section: Configuration that applies to the runtime only

The compiler ignores the runtime section of the configuration, and the runtime ignores the compiler section of the configuration.

Configuration Object Class and Package Name

Esper-8 moved the configuration classes to the com.espertech.esper.common.client.configuration package and its common, compiler and runtime sub-packages.

Esper-8 renamed configuration classes. A table of renamed classes can be found below. Certain configuration inner classes have been moved to com.espertech.esper.common.client.util.

Migrating XML files

The new schema is esper-configuration-8-0.xsd.

NOTE: The new schema is not compatible with the version 7 schema.

The new schema moves all configuration to <common>, <compiler> and <runtime> elements.

EsperTech provides an online cloud-based application that can read an existing Esper-7 XML configuration and produce an Esper-8 XML configuration. The link is http://esper-config-upgrade7to8.appspot.com.
The class that provides the functionality is com.espertech.esper.common.client.configuration.ConfigurationSchema7To8Upgrade.

Table of Updates

The table below is structured by configuration topic. Within each topic the table describes into which section (common, compiler, runtime) the configuration moves.

In some cases, specifically for engine settings, multiple sections may apply. In this case the table indicates which subset of properties belongs to which section.

Category Move to Esper-7 Esper-8
Event Types common <event-type> <common><event-type></common>
ConfigurationEventTypeLegacy ConfigurationCommonEventTypeBean
ConfigurationEventTypeMap ConfigurationCommonEventTypeMap
ConfigurationEventTypeObjectArray ConfigurationCommonEventTypeObjectArray
ConfigurationEventTypeXMLDOM ConfigurationCommonEventTypeXMLDOM
ConfigurationEventTypeAvro ConfigurationCommonEventTypeAvro
Event Type Auto Name common <event-type-auto-name> <common><event-type-auto-name></common>
Imports common <auto-import> <common><auto-import></common>
Annotation Imports common <auto-import-annotations> <common><auto-import-annotations></common>
Database References common <database-reference> <common><database-reference></common>
ConfigurationDBRef ConfigurationCommonDBRef
Method Join References common <method-reference> <common><method-reference></common>
ConfigurationMethodRef ConfigurationCommonMethodRef
Variables common <variable> <common><variable></common>
ConfigurationVariable ConfigurationCommonVariable
Variant Stream common <variant-stream> <common><variant-stream></common>
ConfigurationVariantStream ConfigurationCommonVariantStream
Byte Code Generation compiler <bytecodegen> <compiler><bytecode></compiler> (Note: Element name rename. Note: Configuration items changed significantly.)
ConfigurationEngineDefaults.ByteCodeGeneration ConfigurationCompilerByteCode (Note: Configuration items changed significantly.)
Plug-In View compiler <plugin-view> <compiler><plugin-view></compiler> (NOTE: requires configuring a forge class)
ConfigurationPlugInView ConfigurationCompilerPlugInView
Plug-In Virtual Data Window compiler <plugin-virtualdw> <compiler><plugin-virtualdw></compiler> (NOTE: requires configuring a forge class)
ConfigurationPlugInVirtualDataWindow ConfigurationCompilerPlugInVirtualDataWindow
Plug-In Aggregation Function compiler <plugin-aggregation-function> <compiler><plugin-aggregation-function></compiler> (NOTE: requires configuring a forge class)
ConfigurationPlugInAggregationFunction ConfigurationCompilerPlugInAggregationFunction
Plug-In Aggregation Multi-Function compiler <plugin-aggregation-multifunction> <compiler><plugin-aggregation-multifunction></compiler> (NOTE: requires configuring a forge class)
ConfigurationPlugInAggregationMultiFunction ConfigurationCompilerPlugInAggregationMultiFunction (NOTE: requires configuring a forge class)
Plug-In Single-Row Function compiler <plugin-singlerow-function> <compiler><plugin-singlerow-function></compiler>
ConfigurationPlugInSingleRowFunction ConfigurationCompilerPlugInSingleRowFunction
Plug-In Pattern Guard compiler <plugin-pattern-guard> <compiler><plugin-pattern-guard></compiler> (NOTE: requires configuring a forge class)
ConfigurationPlugInPatternObject ConfigurationCompilerPlugInPatternObject
Plug-In Pattern Observer compiler <plugin-pattern-observer> <compiler><plugin-pattern-observer></compiler> (NOTE: requires configuring a forge class)
ConfigurationPlugInPatternObject ConfigurationCompilerPlugInPatternObject
Plug-In Loader runtime <plugin-loader> <runtime><plugin-loader></runtime>
ConfigurationPluginLoader ConfigurationRuntimePluginLoader
Engine Settings - Threading runtime <engine-settings><threading></engine-settings> <runtime><threading></runtime>
ConfigurationEngineDefaults.Threading ConfigurationRuntimeThreading
Engine Settings - Event Type Meta Information common <engine-settings><event-meta></engine-settings> <common><event-meta></common>
ConfigurationEngineDefaults.EventMeta ConfigurationCommonEventTypeMeta
Engine Settings - View Resources compiler <engine-settings><view-resources></engine-settings> <compiler><view-resources></compiler>
ConfigurationEngineDefaults.ViewResources ConfigurationCompilerViewResources
Engine Settings - Logging common, compiler and runtime <engine-settings><logging></engine-settings> <common><logging></common> (see below)
<compiler><logging></compiler> (see below)
<runtime><logging></runtime> (see below)
ConfigurationEngineDefaults.Logging ConfigurationCommonLogging, ConfigurationCompilerLogging, ConfigurationRuntimeLogging
common Query-plan and jdbc
compiler Code logging
runtime Audit pattern, execution path, timer debug
Engine Settings - Variables runtime <engine-settings><variables></engine-settings> <runtime><variables></runtime>
ConfigurationEngineDefaults.Variables ConfigurationRuntimeVariables
Engine Settings - Patterns runtime <engine-settings><patterns></engine-settings> <runtime><patterns></runtime>
ConfigurationEngineDefaults.Patterns ConfigurationRuntimePatterns
Engine Settings - Match-Recognize runtime <engine-settings><match-recognize></engine-settings> <runtime><match-recognize></runtime>
ConfigurationEngineDefaults.MatchRecognize ConfigurationRuntimeMatchRecognize
Engine Settings - Stream Selection compiler <engine-settings><stream-selection></engine-settings> <compiler><stream-selection></compiler>
ConfigurationEngineDefaults.StreamSelection ConfigurationCompilerStreamSelection
Engine Settings - Time Source common and runtime <engine-settings><time-source></engine-settings> <common><time-source></common> for the "timeUnit" setting and <runtime><time-source></runtime> for the "timeSourceType" setting
ConfigurationEngineDefaults.TimeSource ConfigurationCommonTimeSource and ConfigurationRuntimeTimeSource
Engine Settings - Metrics Reporting runtime <engine-settings><metrics-reporting></engine-settings> <runtime><metrics-reporting></runtime>
ConfigurationMetricsReporting ConfigurationRuntimeMetricsReporting
Engine Settings - Language compiler <engine-settings><language></engine-settings> <compiler><language></compiler>
ConfigurationEngineDefaults.Language ConfigurationCompilerLanguage
Engine Settings - Expression compiler and runtime <engine-settings><expression></engine-settings> <compiler><expression></compiler> and <runtime><expression></runtime>
ConfigurationEngineDefaults.Expression ConfigurationCompilerExpression and ConfigurationRuntimeExpression
compiler For all expression settings except "selfSubselectPreeval" and "timezone"
runtime For expression settings "selfSubselectPreeval" and "timezone"
Engine Settings - Execution common, compiler and runtime <engine-settings><execution></engine-settings> <common><execution></common> and <compiler><execution></compiler> and <runtime><execution></runtime>
ConfigurationEngineDefaults.Execution ConfigurationCommonExecution and ConfigurationCompilerExecution and ConfigurationRuntimeExecution
common For "threadingProfile" setting
compiler For "filterServiceMaxFilterWidth" setting
runtime For all other settings
Engine Settings - Exception Handling runtime <engine-settings><exceptionHandling></engine-settings> <runtime><exceptionHandling></runtime>
ConfigurationEngineDefaults.ExceptionHandling ConfigurationRuntimeExceptionHandling
Engine Settings - Condition Handling runtime <engine-settings><conditionHandling></engine-settings> <runtime><conditionHandling></runtime>
ConfigurationEngineDefaults.ConditionHandling ConfigurationRuntimeConditionHandling
Engine Settings - Scripts compiler <engine-settings><scripts></engine-settings> <compiler><scripts></compiler>
ConfigurationEngineDefaults.Scripts ConfigurationCompilerScripts
Revision Event Type <revision-event-type> Not supported by Esper 8
Plug-in Event Representation <plugin-event-representation> Not supported by Esper 8
<plugin-event-type> Not supported by Esper 8
<plugin-event-type-name-resolution> Not supported by Esper 8
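
For the engine settings that split across sections, the table rows above can be read as a small lookup. The sketch below is only an illustration of that reading: EngineSettingRouter and its route method are hypothetical names and are not part of Esper. It covers the time-source, expression and execution rows only; logging is left out because the table describes its split by topic rather than by setting name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Esper: maps an Esper-7 engine setting to the
// Esper-8 section that the table lists for three of the split categories.
final class EngineSettingRouter {

    enum Section { COMMON, COMPILER, RUNTIME }

    // Expression settings that the table assigns to the runtime section.
    private static final Set<String> RUNTIME_EXPRESSION_SETTINGS =
            new HashSet<>(Arrays.asList("selfSubselectPreeval", "timezone"));

    static Section route(String category, String setting) {
        switch (category) {
            case "time-source":
                if ("timeUnit".equals(setting)) return Section.COMMON;
                if ("timeSourceType".equals(setting)) return Section.RUNTIME;
                throw new IllegalArgumentException("Setting not covered by the table: " + setting);
            case "expression":
                // All expression settings go to the compiler except the two listed above.
                return RUNTIME_EXPRESSION_SETTINGS.contains(setting) ? Section.RUNTIME : Section.COMPILER;
            case "execution":
                if ("threadingProfile".equals(setting)) return Section.COMMON;
                if ("filterServiceMaxFilterWidth".equals(setting)) return Section.COMPILER;
                return Section.RUNTIME; // all other execution settings
            default:
                throw new IllegalArgumentException("Category not handled by this sketch: " + category);
        }
    }

    public static void main(String[] args) {
        System.out.println(route("expression", "timezone"));
        System.out.println(route("execution", "threadingProfile"));
        System.out.println(route("time-source", "timeUnit"));
    }
}
```

A lookup like this may be handy when migrating code that sets engine defaults programmatically, since each setter call then has to move to the common, compiler or runtime configuration object.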

Removed and Renamed Configuration

  • The view resources configuration removes the share-views setting and the allow-multiple-expiry-policy setting.
  • The execution configuration removes the allow-isolated-service setting.
  • The runtime threading engine-fairlock is now runtime-fairlock.
  • The runtime metrics reporting engine-interval is now runtime-interval.
  • The runtime metrics reporting jmx-engine-metrics is now jmx-runtime-metrics.
  • The HA-configuration enginelock-settings is now runtimelock-settings.
2019-06-28T17:17:58+00:00Esper-8|

Esper-8 Conceptual Differences to Older Versions

This section gives an overview of conceptual changes and is not an all-inclusive statement of changes. It is intended to provide background information.

Removed CGLIB Dependency

As the compiler generates all the byte code to access event properties and call methods, we removed the CGLIB library as a dependency entirely.

ANTLR Jar Dependency Only at Compile-Time

Only the compiler needs the ANTLR library and the runtime does not need ANTLR anymore. The only dependency of the runtime is the SLF4J library.

Public API Names

In Esper 8 the compiler and runtime replace the service provider of Esper 7. The terms "service provider" and "engine" were thus replaced by "runtime". The resulting name changes in the public API are substantial. Some of the renamed classes are listed below. Please consult the API-migration doc for a table of impacted classes.

  • EPServiceProviderManager is now EPRuntimeProvider
  • EPServiceProvider is now EPRuntime
  • EPRuntime is now EPEventService
  • EPAdministrator is now EPDeploymentService

Names for Statements, Event Types, Named Windows, Tables, Variables, Contexts, Expressions, Scripts and Indexes

The compiler does not have access to deployment-time names and therefore cannot determine whether a name is already used or not.

For example, an EPL module defines a statement by name "A" using "@name('A')". The statement name may already be used by another statement in the runtime environment. In Esper-7 a statement name was unique across the runtime and Esper-7 simply generated a new statement name. Generating a new name however is not an option for variable names, context names, named window names and others.

Therefore in Esper-8 a name is unique in combination with the deployment-id. This is true for all names such as statement name, context name, variable name and others.

For example, an EPL module may define a statement by name "A". The compiler compiles the module to byte code. The runtime deploys the byte code using deployment id 'D1'. The statement is uniquely named as a combination of deployment id and statement name. The combined key is {deploymentId='D1', statementName='A'}.

This change meant that all APIs that handle names must take the deployment id as an additional parameter or must return the deployment id as part of the result. Your application must iterate and find statements using deployment ids.
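
To make the combined key concrete, here is a minimal sketch in plain Java. It does not use the Esper API: StatementRegistry, Key, register and deploymentsDefining are hypothetical names chosen for this illustration. It only shows why two deployments can both define a statement named "A" without conflict, and why a lookup by name alone has to go through the deployment ids.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical registry, not an Esper class: names are unique per deployment id.
final class StatementRegistry {

    // The combined key {deploymentId, statementName} described in the text.
    static final class Key {
        final String deploymentId;
        final String statementName;

        Key(String deploymentId, String statementName) {
            this.deploymentId = Objects.requireNonNull(deploymentId);
            this.statementName = Objects.requireNonNull(statementName);
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return deploymentId.equals(other.deploymentId) && statementName.equals(other.statementName);
        }

        @Override public int hashCode() {
            return Objects.hash(deploymentId, statementName);
        }

        @Override public String toString() {
            return "{deploymentId='" + deploymentId + "', statementName='" + statementName + "'}";
        }
    }

    private final Map<Key, String> eplByKey = new LinkedHashMap<>();

    // Returns false when the same deployment already defines that statement name.
    boolean register(String deploymentId, String statementName, String epl) {
        return eplByKey.putIfAbsent(new Key(deploymentId, statementName), epl) == null;
    }

    // Finding a statement by name means iterating over deployments.
    List<String> deploymentsDefining(String statementName) {
        List<String> ids = new ArrayList<>();
        for (Key key : eplByKey.keySet()) {
            if (key.statementName.equals(statementName)) ids.add(key.deploymentId);
        }
        return ids;
    }

    public static void main(String[] args) {
        StatementRegistry registry = new StatementRegistry();
        registry.register("D1", "A", "select * from OrderEvent");
        registry.register("D2", "A", "select * from AccountEvent");
        System.out.println(registry.deploymentsDefining("A")); // prints [D1, D2]
    }
}
```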

EPL Module as the Source Code Unit

EPL modules consist of EPL statements. The compiler compiles EPL modules to byte code and you can deploy that byte code to the runtime. Any single EPL statement must be part of a module for compiling and deploying.

Esper-8 thus removes the single EPL statement as a unit and removes the "createEPL" methods for creating a single statement. It also removes the statement lifecycle, therefore the statement "stop", "start" and "destroy" methods are removed. They are replaced by "deploy" and "undeploy" for byte code and deployment lifecycle.

Configuration

The compiler and runtime are now completely separated. This meant that the configuration that applies to the compiler needed to be separated from the configuration that applies to the runtime. There are some common parts to configuration that can apply at compile-time and at runtime.

Keeping one configuration object that contains a common, a compiler and a runtime configuration is easiest to use. It however means that existing configuration must all migrate to one of the 3 sections.

In effect this means the configuration XML now has a <common>, <compiler> and <runtime>. In turn the configuration class offers Configuration#getCommon, Configuration#getCompiler and Configuration#getRuntime.

Bean Event Types

One of the issues that users of Esper-7 had trouble with was that Esper-7 identified Bean-style event types by class, which disallowed using the same class to represent multiple event types without subclassing. Esper-8 identifies a Bean-style event type by name, which is consistent with all other event types.

In Esper-7 the from-clause could specify a fully-qualified or otherwise resolvable class name and Esper-7 would dynamically allocate a bean event type for the same fully-qualified class name. With the compiler architecture this behavior would cause implicit type references.

In Esper-8 all event types are either preconfigured or allocated using create schema or insert into.

Duplicate Listener Interfaces

In Esper-7 there were two types of listeners. Esper-8 simplifies the listener interface into a single UpdateListener interface.

Subscribers

As the compiler produces byte code for delivering results to subscribers, that byte code is unnecessary for applications that don't use subscribers. In Esper-8 there is a new setting to control whether the compiler produces byte code for subscribers.

Substitution Parameters

In many cases the compiler must know the type of the substitution parameter to ensure type-safety. Therefore Esper-8 supports a type name for substitution parameters, using "?:name:type". The compiler must assume Object-type in case the EPL does not provide a type.

2019-06-28T17:18:05+00:00Esper-8|

Esper-8 Compiler and Runtime

Beginning with version 8 Esper has a compiler and runtime architecture. With version 8 Esper is a language, a language compiler and a runtime environment.

The Esper language is the Event Processing Language (EPL). It is a declarative, data-oriented language for dealing with high frequency time-based event data. EPL is compliant with the SQL-92 standard and extended for analyzing series of events and with respect to time.

The Esper compiler compiles EPL source code into Java Virtual Machine (JVM) bytecode so that the resulting executable code runs on a JVM within the Esper runtime environment.

The Esper runtime runs on top of a JVM. You can run byte code produced by the Esper compiler using the Esper runtime.

The Esper architecture is similar to that of other programming languages that are compiled to JVM bytecode, such as Scala, Clojure and Kotlin for example. Esper EPL however is not an imperative (procedural) programming language.

Runtime - Better performance

The Esper-8 compiler produces byte code that has better performance executing queries in the runtime. The compiler produces byte code for select-clauses, result-set-building, all expressions and event properties and much more. This removes virtual calls as interface calls are replaced by just byte code. The Esper compiler also removes certain down casting and branching that would otherwise need to be done.

Since the Esper-8 compiler produces byte code, this gives the JVM a chance to use JIT (just-in-time) compilation to produce native machine code, remove null checks, inline methods, predict branches etc.

Runtime - Uses less heap memory

For aggregations the Esper-8 compiler produces a custom aggregation row class whose fields represent the aggregation state. An aggregation row therefore needs no additional objects to represent things like averages or sums; in Esper-8 these are simply fields on the same object. Since aggregations are often used with group-by, and since the runtime may be tracking a lot of rows, it helps that Esper-8 always needs exactly one object per row. Esper-7 needed one object per row plus one object for each value in the row.
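
As a rough picture of the one-object-per-row idea, the sketch below hand-writes a row class for a sum and an average per group. It is an assumption-laden stand-in: the class that the Esper compiler actually generates is not shown here, and GroupedAggregation, Row, enter and leave are names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written illustration, not generated Esper code: one row object per group,
// with the aggregation state held directly in primitive fields.
final class GroupedAggregation {

    static final class Row {
        private long count;  // shared by sum and avg, no separate aggregator objects
        private double sum;

        void enter(double value) { count++; sum += value; }
        void leave(double value) { count--; sum -= value; }

        double sum() { return sum; }
        Double avg() { return count == 0 ? null : sum / count; }
    }

    private final Map<String, Row> rowsByGroup = new HashMap<>();

    Row row(String groupKey) {
        return rowsByGroup.computeIfAbsent(groupKey, key -> new Row());
    }

    // An event enters the data window for the given group.
    void enter(String groupKey, double value) { row(groupKey).enter(value); }

    // An event leaves the data window for the given group.
    void leave(String groupKey, double value) { row(groupKey).leave(value); }

    int groupCount() { return rowsByGroup.size(); }

    public static void main(String[] args) {
        GroupedAggregation agg = new GroupedAggregation();
        agg.enter("10.0.0.1", 10.0);
        agg.enter("10.0.0.1", 20.0);
        System.out.println(agg.row("10.0.0.1").avg()); // prints 15.0
    }
}
```

With a layout like this, tracking a million groups means a million row objects, regardless of how many aggregated values each row carries.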

When the Esper-8 compiler produces byte code it discards any intermediary representation of the EPL. The compiler discards all things like the abstract syntax tree (AST) that resulted from parsing EPL. It discards all the expression trees which represent the different expressions. It discards any statement specification objects and the query plan. All this intermediary information now results in byte code.

The JVM loads the byte code that the EPL compiler produced into a separate "classes" space and does not use heap. In Esper-8 there is only a small amount of EPL metadata that uses heap. JVM byte code is compact and space-efficient.

Runtime - Removes runtime library dependencies

The Esper-8 runtime only needs the SLF4J logging framework and has no other library dependencies. Keeping the runtime small, clean and embeddable is the advantage.

Runtime - Recovery performance

The Esper-8 compiler produces byte code that can be saved to a jar file. When recovering a system the byte code is already available and ready for loading into a runtime. This cuts recovery time dramatically because it eliminates the compiler steps of parsing, walking, validating and query planning. On recovery the Esper-8 runtime can simply load existing classes and initialize. This works well together with EsperHA, which also loads only metadata at recovery time and loads state lazily when needed after a recovery.

Compiler performance

The Esper-8 compiler is a stateless service that can compile in parallel and with multiple threads. The Esper-7 architecture did not allow parallel operations in respect to compiling. The Esper-8 compiler can compile any number of EPL modules in parallel (as long as they are not interdependent).

Separation of concerns

The Esper-8 compiler can compile EPL in a process or system that is separate from the runtime thus eliminating the possibility of impacting a running system.
Before Esper-8 there was no way to fully validate EPL without deploying. With Esper-8 the validation/compilation and deployment are well separated.

Artifact management

The Esper-8 compiler produces byte code and jar files. This enables managing and deploying jar files using build tools and repositories.

Tools - Improved use with JVM profiler tools

As JVM profiler tools can instrument byte code the profiler tool reporting is more relevant.

Multi-tenancy related functionality

The compiler architecture provides a clean strategy for name management and managing dependencies between EPL modules as compiling and deploying are steps that can be separated in time and on different systems. The Esper-8 compiler and runtime allow modules to have their own namespace thus preventing name conflicts. With Esper-8 there is now explicit management of inter-module dependencies on EPL-objects such as event types, named window and others.

2019-06-28T17:18:12+00:00Esper-8|

On Lateness, Event Time vs. Processing Time and Watermarks, Order-Of-Arrival, Retraction

EPL is powerful, but unfortunately this also means that when you design EPL you need to know how time passes and what event ordering is available or not. You can run Esper in an environment that doesn't pass time and that has completely disordered events.

There are many facilities that EPL offers that do not at all depend on event order or time passing. Some examples are...

  • A join select * from OrderEvent#unique(orderId) as ord, AccountEvent#unique(accountId) as act where ord.accountId = act.accountId  joins the last order event per order id with the last account event per account id
  • An aggregation select count(*), ip_address from PortScanEvent group by ip_address counts the port scans per ip address
  • A complex filter create context DynamicFilterContext initiated by CriteriaEvent as criteria and context DynamicFilterContext select * from News(language IN (context.criteria.languageIds) AND companies.anyOf(x => x.company.id IN (context.criteria.companyIds))) allows you to dynamically filter news based on a criteria event (this sort of thing makes use of Esper filter indexes to fast-match)

Many facilities that EPL offers however do depend on time passing and do depend on event ordering. For example...

  • A pattern timer:interval(1 minute) and not SensorEvent alerts when an interval of one minute passes during which Esper receives no sensor event
  • A pattern match-recognize ( pattern (A B) ) matches when there is an event A that is immediately followed by an event B

When you run Esper, you need to look at your requirements and decide for each requirement whether you need to wait for late arriving events, how long to wait, and what to do with events that arrive later than that. You need to determine if and how you want to advance time. You need to determine whether to reorder events. Esper has a time-order window that can help with all that; however, you still need to define these parameters.

The pattern timer:interval(1 minute) and not SensorEvent detects the absence of the event with near-zero latency by itself. The actual latency depends on how time moves forward. For example your design may use a watermark and move its time based on allowed lateness. The 1 minute interval can span any amount of actual natural time, and so can the latency of detection when using watermarks.
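
The dependence on how time moves forward can be sketched in a few lines of plain Java. This is a simplified model and not Esper code: AbsenceDetector, advanceTime and watermark are hypothetical names, and the model assumes the caller feeds events and time advances in order. The detector itself adds no delay; it fires on the first time advance at or past the deadline, so the detection latency is whatever delay the time source introduces.

```java
// Simplified model, not Esper code, of "timer:interval(1 minute) and not SensorEvent"
// with time advanced externally by the application.
final class AbsenceDetector {

    private final long deadlineMillis;
    private boolean active = true;

    AbsenceDetector(long startMillis, long intervalMillis) {
        this.deadlineMillis = startMillis + intervalMillis;
    }

    // A sensor event before the deadline ends the pattern without a match.
    void onSensorEvent() {
        active = false;
    }

    // Returns true exactly once, on the first time advance at or past the deadline,
    // provided no sensor event arrived in the meantime.
    boolean advanceTime(long nowMillis) {
        if (active && nowMillis >= deadlineMillis) {
            active = false;
            return true;
        }
        return false;
    }

    // One possible (assumed) way to derive time: the highest event time seen so far
    // minus the allowed lateness.
    static long watermark(long maxEventTimeMillis, long allowedLatenessMillis) {
        return maxEventTimeMillis - allowedLatenessMillis;
    }

    public static void main(String[] args) {
        AbsenceDetector detector = new AbsenceDetector(0L, 60_000L);
        // With 15 seconds of allowed lateness the watermark reaches the deadline only
        // once an event with event time 75 seconds has been seen.
        System.out.println(detector.advanceTime(watermark(70_000L, 15_000L))); // prints false
        System.out.println(detector.advanceTime(watermark(75_000L, 15_000L))); // prints true
    }
}
```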

The pattern timer:interval(1 minute) and not SensorEvent outputs a result after 1 minute without a sensor event. Assume that at a later time we do discover that a sensor event has come in. This would mean the output needs to be retracted. Retraction is not something that Esper handles currently.

Architecture Diagram

This diagram organizes the various components that are needed to fulfill event and time analysis requirements. The diagram displays the components in levels. The positioning of a component helps describe the role that the component plays in the logical architecture. For example, Filter Service is a component that is used by the EPL pattern engine, and in turn relies on Expression Evaluation and the Type System.

As with most diagrams, don't take this one literally. It's merely meant as a kind of overview.

2019-06-28T17:18:39+00:00Architecture|

On Windows

This post explores what the term Window means.

Wikipedia under the topic of Data Stream Management System lists a nice definition of Window:

Instead of using synopses to compress the characteristics of the whole data streams, window techniques only look on a portion of the data. This approach is motivated by the idea that only the most recent data are relevant. Therefore, a window continuously cuts out a part of the data stream, e.g. the last ten data stream elements, and only considers these elements during the processing. There are different kinds of such windows like sliding windows that are similar to FIFO lists or tumbling windows that cut out disjoint parts. Furthermore, the windows can also be differentiated into element-based windows, e.g., to consider the last ten elements, or time-based windows, e.g., to consider the last ten seconds of data.

In Esper we have...

  • Data is the events that are arriving into Esper
  • A window that considers the last 10 elements is #length(10), a.k.a. the length window
  • A window that considers the last 10 seconds of data is #time(10), a.k.a. the time window

The smallest unit of change to a window is an individual event. A new event goes inside the window. The old event escapes the window. And thus events come and go. When such a change happens Esper determines if this change is meaningful. It does that by incrementally updating aggregations and match-recognize patterns each time an event comes and goes. When for example a query compares an aggregation against a threshold value it indicates this meaningful change to the application.

---> Esper evaluates windows continuously and incrementally on the level of individual events.
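
Here is a minimal sketch of what "continuously and incrementally" can look like for a length window with a sum. It is plain Java written for this post's argument and not Esper internals; LengthWindowSum, insert and exceeds are hypothetical names. Each arriving event adjusts the running sum, each departing event adjusts it back, and the threshold comparison can happen after every single change.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model, not Esper internals: a length window whose sum is
// maintained incrementally as individual events come and go.
final class LengthWindowSum {

    private final int length;
    private final Deque<Long> window = new ArrayDeque<>();
    private long sum;

    LengthWindowSum(int length) {
        if (length < 1) throw new IllegalArgumentException("length must be positive");
        this.length = length;
    }

    // A new event goes inside the window; if the window is full the oldest
    // event leaves. Returns the sum after the change.
    long insert(long value) {
        window.addLast(value);
        sum += value;
        if (window.size() > length) {
            sum -= window.removeFirst();
        }
        return sum;
    }

    // The check an application might run after every change.
    boolean exceeds(long threshold) {
        return sum > threshold;
    }

    public static void main(String[] args) {
        LengthWindowSum window = new LengthWindowSum(3);
        window.insert(1);
        window.insert(2);
        window.insert(3);
        System.out.println(window.insert(10)); // prints 15, the sum of 2, 3 and 10
    }
}
```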

I have looked up Apache Flink, which has a write-up on windows in its documentation under Application Development->Streaming->Operators->Windows. According to that write-up, windows are at the heart of processing infinite streams: windows split the stream into “buckets” of finite size, over which we can apply computations. Flink forms a window and only when the window is completely formed does it apply a computation and then form a new window, unless I'm mistaken. This sounds a lot like batch processing to me. But batch != window.

In Esper a window can be an arbitrary subset of events. Here are two examples.

  • #length(10)#time(10) considers the last 10 elements that are not older than 10 seconds
  • create window MyWindow#keepall with on-merge, to insert and remove events according to any criteria
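
The first example, the combination of a length and a time window, can be modelled as a single queue with two eviction rules. The sketch below is plain Java and not Esper code; LengthTimeWindow and its methods are hypothetical names. It assumes timestamps arrive in non-decreasing order, and it treats an event as expired once its age reaches the full interval, which is a modelling choice for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative model, not Esper code: events stay only while they are among the
// last maxLength events AND younger than maxAgeMillis.
final class LengthTimeWindow {

    private static final class Entry {
        final long timestampMillis;
        final String event;

        Entry(long timestampMillis, String event) {
            this.timestampMillis = timestampMillis;
            this.event = event;
        }
    }

    private final int maxLength;
    private final long maxAgeMillis;
    private final Deque<Entry> entries = new ArrayDeque<>();

    LengthTimeWindow(int maxLength, long maxAgeMillis) {
        this.maxLength = maxLength;
        this.maxAgeMillis = maxAgeMillis;
    }

    void insert(long timestampMillis, String event) {
        advanceTime(timestampMillis);              // time rule first
        entries.addLast(new Entry(timestampMillis, event));
        if (entries.size() > maxLength) {          // then the length rule
            entries.removeFirst();
        }
    }

    // Removes events whose age has reached the interval.
    void advanceTime(long nowMillis) {
        while (!entries.isEmpty()
                && nowMillis - entries.peekFirst().timestampMillis >= maxAgeMillis) {
            entries.removeFirst();
        }
    }

    List<String> contents() {
        List<String> events = new ArrayList<>();
        for (Entry entry : entries) events.add(entry.event);
        return events;
    }

    public static void main(String[] args) {
        LengthTimeWindow window = new LengthTimeWindow(2, 10_000L);
        window.insert(0L, "a");
        window.insert(1_000L, "b");
        window.insert(2_000L, "c");
        System.out.println(window.contents()); // prints [b, c]
        window.advanceTime(11_000L);
        System.out.println(window.contents()); // prints [c]
    }
}
```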

Other systems seem to form a window only from the data that arrives next to each other. It seems impossible in other systems to form a window across arbitrary data. They seem to require inserting into a table of some kind as I understand.

The term window in Esper means a subset of events, whereas in some other systems it means batch delineation.
