ESexp

ESexp is the s-expression evaluator used by Evolution. It is used by Evolution/Camel.Search for evaluating search and filtering expressions, and in other parts of Evolution for similar tasks.

There are two operations ESexp provides, Parsing converts the s-expression into a syntax tree, and Evaluation which executes the expression.

Parsing

Parsing is entirely implemented in ESexp and internally implemented using a GScanner, although it could easily be implemented using a much simpler tokeniser.

The parsing process converts a text string into a syntax tree, represented by a tree of ESexpTerm values.

 enum _ESExpTermType {
        ESEXP_TERM_INT  = 0,    /* integer literal */
        ESEXP_TERM_BOOL,        /* boolean literal */
        ESEXP_TERM_STRING,      /* string literal */
        ESEXP_TERM_TIME,        /* time_t literal (number of seconds past the epoch) */
        ESEXP_TERM_FUNC,        /* normal function, arguments are evaluated before calling */
        ESEXP_TERM_IFUNC,       /* immediate function, raw terms are arguments */
        ESEXP_TERM_VAR,         /* variable reference */
 };
 
 struct _ESExpSymbol {
        int type;               /* ESEXP_TERM_FUNC or ESEXP_TERM_VAR */
        char *name;
        void *data;
        union {
                ESExpFunc *func;
                ESExpIFunc *ifunc;
        } f;
 };
 
 struct _ESExpTerm {
        enum _ESExpTermType type;
        union {
                char *string;
                int number;
                int bool;
                time_t time;
                struct {
                        struct _ESExpSymbol *sym;
                        struct _ESExpTerm **terms;
                        int termcount;
                } func;
                struct _ESExpSymbol *var;
        } value;
 };

This structure is all pretty straightforward. The parser performs syntax checking and will fail if syntax is incorrect, or an unknown function is invoked.

Memory management is handled internally for all expressions.

Syntax

The s-expression syntax is very simple. The grammar is basically:

 expression = '(' command argument * ')'
 
 command = IDENTIFIER
 
 argument = expression | string | integer | boolean | symbol
 
 string = '"' ( CHAR | '\' '"' | '\' "'" | '\' '\' ) * '"'
 
 integer = '-' [0-9] * | [0-9]*
 
 boolean = '#t' | '#f'
 
 symbol = IDENTIFIER

Where IDENTIFIER is similar to a C identifier (see the source for exact specification). symbol isn't properly implemented, but is intended to be a variable interface.

commands include those defined by the base class, and ones that the application adds.

Evaluation

Evaluation then walks the syntax tree, invoking all of the called functions. Although there are some built-in functions for basic control flow and arithmetic, they may all be overriden by the application for different types (even the control-flow ones).

Results

All functions return a single result type. In addition to the basic integer, string, and boolean types, there is also support for a time_t type, and an array type.

 enum _ESExpResultType {
        ESEXP_RES_ARRAY_PTR=0,
        ESEXP_RES_INT,
        ESEXP_RES_STRING,
        ESEXP_RES_BOOL,
        ESEXP_RES_TIME,
        ESEXP_RES_UNDEFINED
 };
 
 struct _ESExpResult {
        enum _ESExpResultType type;
        union {
                GPtrArray *ptrarray;
                int number;
                char *string;
                int bool;
                time_t time;
        } value;
 };

Results need to be allocated and freed using the appropriate api functions; they will all be automatically freed if any are leaked, although they should be freed when evaluating intermedate results.

An implementation will normally have to do special processing for array types - although some basic set operations on const string arrays are available in the base class. Note in particular that array types will not have their content freed by ESexp.

Function types

Functions are either immediate or not. Immediate functions have their arguments evaluated to simple types (i.e. functions are executed), whereas non-immediate functions do not. This allows non-immediate functions to implement short-cut evaluation and control flow functions, including application-supplied functions.

All functions return a single value and take a variable number of arguments; application control performs and type conversion or checking itself.

 typedef struct _ESExpResult *(ESExpFunc)(struct _ESExp *sexp, int argc,
                                         struct _ESExpResult **argv,
                                         void *data);
 
 typedef struct _ESExpResult *(ESExpIFunc)(struct _ESExp *sexp, int argc,
                                          struct _ESExpTerm **argv,
                                          void *data);
 
 void e_sexp_add_function(ESExp *f, int scope, char *name, ESExpFunc *func, void *data);
 void e_sexp_add_ifunction(ESExp *f, int scope, char *name, ESExpIFunc *func, void *data);

As can be seen above, immediate functions are given ESexpTerms as their arguments, this allows them to perform their own evaluation of the arguments, perhaps optionally. A non-immediate function only gets given ESexpResults which have all already been evaluated.

Builtin functions

ESexp includes some basic functions which operate on the supplied types where appropriate.

Set functions

The set functions work on both arrays of const strings and scalar values.

'(' 'and' expression* ')' '(' 'or' expression* ')'

Their result depends on the context they are used in. If the first argument is an array type, then all other arguments must be array types, and an array is the result.

If the first argument is not an array type, then all arguments must be boolean types. They both perform short-cut evaluation in the scalar context, that is, and will stop and return #f as soon as any argument is #f, and or will return #t as soon as any argument is #t.

Comparision functions

The comparison functions return boolean values and allow functions to perform decision making.

'(' '<' expression expression ')' '(' '>' expression expression ')' '(' '=' expression expression ')'

Note that these only binary operations, although they could be extended to any number of arguments.

In all cases each expression must evaluate to the same type. If evaluated on strings, then a case-insensitive strcmp is called on the two strings.

In addition there is the unary not function, which only works in a scalar context, it only works on it's first argument, and is an immediate function.

'(' 'not' expression *1 ')'

Arithmetic functions

Arithmetic functions are also provided to work on basic scalar types. They are all immediate.

'(' '+' expression * ')' '(' '-' expression * ')'

These perform the expected job on integral types. + performs string concatenation on string types.

There are also two related functions for casting to different types.

'(' 'cast-string' expression ')' '(' 'cast-int' expression ')'

Cast integer, boolean or string types (only) to string or integer types respectively.

Control flow functions

Control flow functions let the expression take different courses or are used to concatenate multiple functions.

'(' 'if' expression expression expression ? ')'

In this case, if the first expression returns a boolean value of #t, then the second expression will be evaluated. If not, and it is supplied, then the third expression will be evaluated. The result will be the result of the expression evaluated, or undefined if no else clause was supplied.

'(' 'begin' expression * ')'

begin will evaluate multiple expressions in turn. The result will be the result of the last expression evaluated.

Although there are no loop functions provided, it would be very simple to implement one if it was required - however the code is intended to be used for expression evaluation, not a general purpose language.

Examples

This will just cover the basic functions and syntax, more advanced examples should be provided for a given implementation. Without functions for external input these examples are rather contrived.

Example: simple arithmetic

 (+
   40
   (-
     30
     12
     11)
   7)

This will return an integer type, with a value of:

 30 - 12 - 11 = 7
 40 + 7 + 7 = 54

Example: control flow

 (begin
  (if (< 10 (+ 5 5))
      "smaller"
      (begin (+ 5 5) "bigger"))
  (+ "a" "b"))

This will return a string "ab", and the internal if will only evaluate the string "smaller".

Notes

The disksummary-branch of ESexp contains some new enhancements which lets application re-use expressions, it separates the expression parsing and evaluating stages. Application code can parse expressions and save the syntax tree, then evaluate that separately any number of times.

Apps/Evolution/EDS.ESexp (last edited 2013-08-08 22:50:04 by WilliamJonMcCann)