Parsing Expressions

So far, the rules we have set up just define basic tokens. We have not created any rules that control how those tokens can be arranged. Parsing mathematical expressions is going to cure that!

Here is a rule set that defines an expression:

scadparser/ebnf/scad.ebnf
expression
	=
	| addition
	| subtraction
	| term
	;

addition
	=
	left:expression op:'+' ~ right:term
	;

subtraction
	=
	left:expression op:'-' ~ right:term
	;

term
	=
	| multiplication
	| division
	| factor
	;

multiplication
	=
	left:term op:'*' ~ right:factor
	;

division
	=
	left:term op:'/' ~ right:factor
	;

factor
	=
	| '(' ~ expression ')'
	| number
	| identifier
	;

This set of rules shows some new features of TatSu. The left:expression notation names the matched element, so TatSu stores it under that name in something called an Abstract Syntax Tree (AST), which the parser builds as it parses. Basically, this AST details the rules followed while accepting the input chunk of code we process.

There is one more rule in this set, one that is important:

scadparser/ebnf/scad.ebnf
number
    =
    | float:real
    | int:integer
    ;

The order of these options is important. We need to try to process a real number first, so that parsing captures the decimal point. If we tried the integer option first on an input such as 1.0, the leading 1 would be accepted as an integer and the rest (.0) would not be recognized as a valid construct.
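The effect of option order can be sketched in plain Python, without TatSu. The two regular expressions below are hypothetical stand-ins for the real and integer rules, which are defined elsewhere in the grammar and may differ; the point is only that the first alternative that matches wins.

```python
import re

# Hypothetical token patterns standing in for the grammar's
# `real` and `integer` rules (assumptions, not the actual rules).
REAL = re.compile(r'\d+\.\d+')
INTEGER = re.compile(r'\d+')


def first_match(text, options):
    """Try each (name, pattern) in order; return the first that matches at the start."""
    for name, pattern in options:
        match = pattern.match(text)
        if match:
            return name, match.group()
    return None


# Real first: the whole literal, decimal point included, is consumed.
real_first = first_match('1.0', [('float', REAL), ('int', INTEGER)])

# Integer first: only the leading digit is consumed, and '.0' is left over.
int_first = first_match('1.0', [('int', INTEGER), ('float', REAL)])

print(real_first)  # ('float', '1.0')
print(int_first)   # ('int', '1')
```

With the integer option first, the leftover .0 has no rule that can accept it, which is why the grammar lists the real option first.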

For our expression testing, all we need to know is that TatSu is going to produce a Python dictionary showing the expression structure. Standard things like operator precedence are handled by these rules.
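To see why the nesting of that dictionary encodes precedence, here is a minimal sketch that walks such a structure and computes its value. It uses a plain dict in the shape the grammar's left, op, right, int and float names suggest; the evaluate function and the OPS table are illustrative names, not part of TatSu or of this project, and identifiers and parenthesized factors are not handled.

```python
import operator

# Map each operator symbol used in the grammar to a Python function.
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}


def evaluate(node):
    """Recursively evaluate a nested dict shaped like the parser's output."""
    if 'int' in node:
        return int(node['int'])
    if 'float' in node:
        return float(node['float'])
    # Otherwise assume a binary operation with 'left', 'op' and 'right' keys.
    return OPS[node['op']](evaluate(node['left']), evaluate(node['right']))


# Structure for '1.0+2*3': the multiplication sits deeper in the tree,
# so it is evaluated before the addition.
ast = {
    'left': {'float': '1.0'},
    'op': '+',
    'right': {'left': {'int': '2'}, 'op': '*', 'right': {'int': '3'}},
}

result = evaluate(ast)
print(result)  # 7.0
```

Because term appears inside expression in the grammar, the multiplication ends up as the right operand of the addition, and any walker that evaluates children first respects precedence automatically.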

tests/test_expressions.py
import pytest
import tatsu


@pytest.mark.parametrize('t, e', [
    ('1+2', "{'left': {'int': '1'}, 'op': '+', 'right': {'int': '2'}}"),
    ('1*2', "{'left': {'int': '1'}, 'op': '*', 'right': {'int': '2'}}"),
    ('1.0+2*3', "{'left': {'float': '1.0'}, 'op': '+', 'right': {'left': {'int': '2'}, 'op': '*', 'right': {'int': '3'}}}")
])
def test_expressions(scadparser, t, e):
    ast = scadparser.parse(t, start='expression')
    assert str(ast) == e

This test is complicated by the need to specify the Python dictionary the parser will produce.
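One way to make those expected strings less error-prone is to build them from small helper functions instead of typing them by hand. This is only a sketch: num and binop are hypothetical helpers, not part of the project, and they assume the key names and key order shown in the test above.

```python
def num(text):
    """Build the expected node for a numeric literal such as '2' or '1.0'."""
    key = 'float' if '.' in text else 'int'
    return {key: text}


def binop(left, op, right):
    """Build the expected node for a binary operation, keys in grammar order."""
    return {'left': left, 'op': op, 'right': right}


# Expected string for '1.0+2*3': the product is the right operand of the sum.
expected = str(binop(num('1.0'), '+', binop(num('2'), '*', num('3'))))
print(expected)
```

The resulting string could then be used as the e value in the parametrize list, so the structure of each expected result is visible in the nesting of the helper calls.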