Embedded Scripting Languages
Table of Contents
- 1. Embedded Scripting Languages
- 1.1. Overview
- 1.2. Embedded Scripting Languages Selection
- 1.3. MuParser - Math expression parser
- 1.4. Exprtk - Math expression parser
- 1.5. Lua scripting engine
- 1.6. Squirrel Scripting Language
- 1.7. Duktape - Embeddable Javascript Engine
- 1.8. QuickJS - ES20 Javascript Engine
- 1.9. Chaiscript
- 1.10. Python Engine via Pybind11
- 1.11. Scheme-like lisp interpreter
- 1.12. Expression interpreter
- 1.13. Operator Precedence Climbing Algorithm
1 Embedded Scripting Languages
1.1 Overview
Embedded scripting languages, sometimes called extension languages, are programming languages available as self-contained software libraries and designed to be embedded in a host programming language, often a compiled statically typed language. Embedded scripting languages allow adding new features, modifying the behavior of a program at runtime without recompilation and source code modification. Since embedded scripting languages are available as libraries, the host language can inspect variables defined by a scripting language program and call functions defined in the scripting language for possibly allowing end-users to customize and configure the application. Functions, also called subroutines or procedure, or performance-intensive functions, that would not be suitable to implement in a scripting language, registered by the host language can also be called by the embedded scripting language.
Embedded scripting is useful for a wide range of user-case, including, video games, interactivity, repl, configuration languages, sandboxing and DSL - domain-specific programming languages. In video game programming, languages such as Lua, Moonscript or GDScript (Godot game engine) allow non-programmers such as game designers or game modders to configure, extend and add new functionality to games written in C++ at runtime without any recompilation and dealing with low level programming. These programming languages can also be useful for adding interactivity, faster iteration cycle and feedback. An example of this case is the beloved text editor Emacs that uses Elisp as its embedded scripting language. End-users can inspect, modify and customize Emacs behavior at runtime with instantaneous feedback by just evaluating Elisp functions or modifying variables at the scratch buffer or Elisp interactive REPL, the IELM repl. Another case where embedded scripting can be found is web browsers, that are usually written in C++ and use JavaScript, also known by the official name of ECMascript, and html (non-turing complete) as its embedded scripting languages. Html defines how the user interface look like and JavaScript modifies the browser behavior and the html interface at runtime allowing curious users to evaluate JavaScript, in the development tools console for inspecting and interacting with html DOM (Domain Object Model) objects with instantaneous feedback and synchronization between a modified DOM element and its graphical representation.
Common API Functionalities
Some common API functionalities of a embedded scripting language API are evaluation of source code provided as string, file or input stream; ability to register functions of the host language and call them from the scripting language and possiblity insptecting variables defined in the scripting language. Another common feature is the ability to call functions defined in the scripting language from the host language. The following code block provide some examples about theses features.
Engine engine; // Ability to evaluate source code at runtime value* result = engine.eval("10 * 3 - 5 / 100 * x3 + 2.51 * if(x > 0, 1/x, 0/0) ;") if( result == nullptr){ printf("There was an error => shutdown the application."); exit(-1); } printf(" [INFO] Result = %f ", result->to_float()); // Interactive Python-like REPL for(;;){ std::string user_input = getline_from_console(): if(user_input == "quit" || user_input == "exit"){ break; } result = engine.eval(line); assert ( result == nullptr ); printf(" RESULT>> %s ", result->to_string().to_cstr()); } // ---- Ability to register functions of the host language -------// value* udf_user_defined_function(size_t size, value* args) { // .... ... do computations here... return result; } engine.add_function("user_defined_function, "&udf_user_defined_function); result = egine.eval(r"( x = 10.2512; y = user_defined_function('hello', x); z = x + y; return z; )"; // ------- Ability to call functions defined in the scripting language -------// engine.eval(" def myfunc(x, y) = 3 * x + 5.1 * y; ") FuncObject* = engine.getFunction("myfunc"); assert FuncObject != nullptr; auto args = engine.make_args(); args->add_int(10); args->add_float(20.0); result = FuncObject->call(args);
Reasonable Features of Embedded Scripting Languages
This section lsits some reasonable features that a language designed to be embedded should have:
- Ligthweight - Small footprint, small size and memory requirements.
- Scripting Engine as a library:
- Must be available as C or C++ library which exposes the interpreter API allowing the client code to evaluate scripting code from strings or files and also to retrieve objects from the virtual machine memory.
- Sandboxing or capability limitations
- A reasonable requirement for an embedded scripting language is limiting the capabilities and APIs which can be used possibly allowing the execution of non-trusted scripts. Example: the Javascript engines of most web browsers do not allow the script to interact with file systems or operating system APIs.
- Permissive License for static linking
- Use Cases
- DSL - Domain Specific Languages
- Game Engines
- Configuration files => Data Description language.
- User content
- Allow application runtime changes without recompilation.
- User extension without modification of source code.
Examples of embedded scripting languages usage and use-cases
- TCL - Tool Command Language => Used by many EDA - Electronic Design Automation Sofware in electronic engineering.
- JavaScript => Used in Web Browsers, which are written mostly in C++, for controlling user interaction, animation and etc.
- TinyScheme => Used by GNU GIMP drawing application ans Apple's MacOSX sandbox configuration.
- Lua language => Used by many game engines and also by applications such as: Nginx web server; Linux Conky; Geany Editor; NMap editor and so on.
- Squirrel Language => Used in game engines
- Python
- => Python is used as embedded scripting language by GDB - GNU Debugger ; WinDBG - Windows Debugger; IDA - Debugger for Reverse Engineering.
- Disadvantages: Python has a large footprint; it was not designed as an embedded scripting language and it is not possible to forbid the interpreter from calling file system and process creation APIs.
- AutoLisp [proprietary] => Lisp-like language used in Autocad.
- SQL (Structured Query Language) => Many databases are implemented in C or C++ uses SQL as scripting language and domain specific language (DSL) for querying and storing data. Example: SQLite, Postgres SQL, …
- Emacs Lisp => Emacs core, including the LISP engine/interpreter, is written in C and other parts are written in Emacs Lisp. This lisp dialect allows extending Emacs at with custom extensions (plugins) and also modifying the application behavior at runtime.
- VBA - Visual Basic for Application [proprietary] => Used in Microsft Office Suite, specially in Microsft Excel.
Considerations for choosing embedded scripting languages
- Small footprint and small overhead
- Stability of C or C++ API
- C++ API bindings
- Permissive license for static linking or option of dual license for static linking.
- Documentation and of C or C++ binding APIs, examples about binding code
- Garbage collector implementation
- Heavy duty computations with significant overhead should be performed on the C++-side, specially loops.
- JIT - Just-In-Time compiler => can increase the performance by
translating bytecodes into machine code.
- Example: Lua JIT, Javascript V8 engine used by Chrome browser.
- Features for running untrusted scripts or configuration:
- Ability for restricting capabilities such as creating process, accessing the file system and so on.
- Non-turing complete => better for configuration
- Familiarity of the targe audience
- Lua is pervasive in Games
- TCL is pervasive on Electronic Design Automation Software
- Javascript is widely used on the Web, NodeJS and also as embedded scripting language of web browsers.
Further Reading
1.2 Embedded Scripting Languages Selection
Selection of embedded scripting languages and engines available as libraries:
Non categorized:
- TCL - Tool Command Language
- License: akin to BSD
- Implementation: C
- Syntax Type: N/A
- Note: pervasive in EDA - Electronic Design Automation software.
- See:
- mruby (Embeddable Ruby implementation)
- License: BSD
- Implementation: C
- Syntax type: Ruby
- Wren
- License: MIT
- Implementation: C
- Syntax Type: N/A
- Jinx
- License: MIT
- Implementation: C++17
- Syntax Type: N/A
- Gravity
- License: -
- Implementation: C
- Syntax type: Apple's SWIFT language
- DaScript
- License: -
- Implementation: C++14
- Syntax type: Akin to Python
Lua or similar to Lua (mimics Lua syntax)
- Lua
- License: MIT
- Implementation: C
- Syntax Type: syntax inspired by scheme.
- LuaJIT (Lua with JIT - Just-in-Time compiler which translates
bytecodes to native machine code for better performance.)
- License: MIT
- Implementation: C
- Syntax type: lua
- GameMonkey Script
- License: MIT
- Implementation: C++
- Syntax type: Akin to Lua
- Squirrel
- License: MIT
- Implementation: C++
- Syntax type: Lua-like
- Note: Despite be implemented in C++, does not expose a C++ API, it exposes a C API.
- Squilu (Squirrel fork)
- License: MIT
- Implementation: C++
- Syntax type: Lua-like
Similar to C++ (syntax that mimics C++)
- AngelScript (Note: statically typed)
- License: Zlib
- Implementation: C++
- Syntax type: C++-like
Smiliar to Javascript or subset of Javascript (ECMAScript)
- SpiderMonkey - Java Script Engine (VM - Virtual Machine)
used by Mozzila Firefox.
- License: MPL 2.0
- Implementation: C and C++
- Syntax tyope: JavScript (ECMAscript)
- Used by:
- Firefox Web Browser
- MongoDB Document Database
- CoucheDB
- Adobe Acrobat Reader
- See:
- Repository:
- Duktape
- License: MIT
- Implementation: C
- Syntax type: Javascript (aka ECMAScript) E5/E5.1
- Espruino
- License: MPL 2
- Implementation: C
- Syntax type: ES5 - Javascript (ECMaScript)
- JerryScript
- License: Apache v2
- Implementation: C
- Syntax type: Javascript (ECMAScript)
- ChaiScript
- License: BSD
- Implementation: C++
- Syntax type: Javascript-like
Similar to Lisp or Scheme
- ECL - Embeddable Common Lisp
- License: LGPL - 2
- Implementation: C
- Syntax type: Lisp, Common Lisp
- TinyScheme
- License: BSD
- Implementation: C
- Syntax tyope: scheme, lisp
- Note: Used in GNU GIMP as scripting language and Apple MacOSX's sandbox as configuration language.
- S7 Scheme (Variant of TinyScheme)
- License: BSD
- Implementation: C
- Syntax type: C
- Chibi Scheme
- Implementation: C
- Syntax type: Scheme, Lisp
- Janet Language
- License: MIT
- Implementation: C
- Syntax type: Clojure, Lisp
- ArkScript
- License: MPL license
- Implementation: C
- Syntax type: Lisp-like, more clojure-like
License obligations and requirements:
- LGPL allows dynamically linking of closed source applications, but static linking requires source code disclosure and release under the same license. Some LGPL libraries, such as QT, allows static linking via commercial license.
- MIT, BSD, APACHE and so on => Add a copy of the license; Give credit.
1.3 MuParser - Math expression parser
MuParser is a non-turing complete embedded scripting engine for evaluating math expressions.
Web Site:
Repository:
Conan Reference:
Sample Project
File: CMakeLists.txt
cmake_minimum_required(VERSION 3.9) project(cppexperiments) set(CMAKE_CXX_STANDARD 17) set(CMAKE_VERBOSE_MAKEFILE ON) # ============= Conan Bootstrap =============================# # Download automatically, you can also just copy the conan.cmake file if(NOT EXISTS "${CMAKE_BINARY_DIR}/conan.cmake") message(STATUS "Downloading conan.cmake from https://github.com/conan-io/cmake-conan") file(DOWNLOAD "https://github.com/conan-io/cmake-conan/raw/v0.13/conan.cmake" "${CMAKE_BINARY_DIR}/conan.cmake") endif() include(${CMAKE_BINARY_DIR}/conan.cmake) # Possible values "default" and "llvm8" set(CONAN_PROFILE default) conan_cmake_run(REQUIRES muparser/2.2.6@conan/stable BASIC_SETUP BUILD missing) #======= Targets Settings ===============+# add_executable(muparser1_formula muparser1_formula.cpp) target_link_libraries(muparser1_formula muparser)
File: muparser1_formula.cpp
#include <iostream> #include <string> #include <cmath> #include <muParser.h> // Defined for UNIX-like: Linux, Mac OSX, QNX, ... #if __unix__ #include <unistd.h> #endif bool isTTY_terminal() { #if __unix__ return ::ttyname(STDIN_FILENO) != nullptr; #else return false; #endif } double future_value(double PV, double rate, double nper) { return PV * std::pow(1 + rate / 100.0, nper); } int main() { mu::Parser p1; std::puts("\n====== EXPERIMENT 1 - Simple math expression ======="); p1.SetExpr("2^1 + 2^3 + 2^4"); std::cout << " [*] p1-A value: " << p1.GetExpr() << " = " << p1.Eval() << std::endl; p1.SetExpr("3.5 * 10 - sin(3.1415) * 2.0 + sqrt(10) / 100.0 + 2^3"); std::cout << " [*] p1-B value: " << p1.GetExpr() << " = " << p1.Eval() << std::endl; std::puts("\n====== EXPERIMENT 2 - Expression with variables ======="); p1.DefineConst("pi", 3.1415); double x = 10.0, y = 4.5; p1.DefineVar("x", &x); p1.DefineVar("y", &y); p1.SetExpr(" 3 * pi + sin(pi) + 4 * x + y - 5"); std::cout << " [*] p1-C value: " << p1.GetExpr() << " = " << p1.Eval() << std::endl; std::puts("\n====== EXPERIMENT 3 - Expression with custom function ======="); p1.DefineFun("fv", &future_value); p1.DefineFun("payoff", [](double S, double K){ return std::max(S - K, 0.0); }); p1.SetExpr("fv(100, 2, 8) + payoff(100, 90)"); std::cout << " [*] p1-D value: " << p1.GetExpr() << " = " << p1.Eval() << std::endl; std::puts("\n====== EXPERIMENT 4 - Parser error handling ======="); // When an error happens it throws an exception: mu::ParserError try{ p1.SetExpr("10.2 * 5.0 + a * 3"); double value = p1.Eval(); std::cout << " [*] p1-E value: " << value << std::endl; } catch (mu::ParserError const& ex) { std::cerr << " [ERROR] p1-C Parser error: " << ex.GetMsg() << std::endl; } std::puts("\n====== EXPERIMENT 5 - Calutor Interactive Shell ======"); if(isTTY_terminal()) { std::cout << " === Calculator Started OK. =====" << std::endl; mu::Parser p2; double ans = 0.0; p2.DefineVar("ans", &ans); std::string line; while(std::cin.good()) { std::cout << " EXPR => "; std::getline(std::cin, line); if(line == "") continue; if(line == "quit") break; p2.SetExpr(line); try { ans = p2.Eval(); std::cout << " => ans = " << ans << "\n\n"; } catch(mu::ParserError const& ex) { std::cerr << " [ERROR] Parser error " << ex.GetMsg() << std::endl; } } } #if _WIN32 std::cout << "Enter RETURN to exit. " << std::endl; std::cin.get(); #endif return 0; }
Program output:
$ ./muparser1_formula ====== EXPERIMENT 1 - Simple math expression ======= [*] p1-A value: 2^1 + 2^3 + 2^4 = 26 [*] p1-B value: 3.5 * 10 - sin(3.1415) * 2.0 + sqrt(10) / 100.0 + 2^3 = 43.0314 ====== EXPERIMENT 2 - Expression with variables ======= [*] p1-C value: 3 * pi + sin(pi) + 4 * x + y - 5 = 48.9246 ====== EXPERIMENT 3 - Expression with custom function ======= [*] p1-D value: fv(100, 2, 8) + payoff(100, 90) = 127.166 ====== EXPERIMENT 4 - Parser error handling ======= [ERROR] p1-C Parser error: Unexpected token "a" found at position 13. ====== EXPERIMENT 5 - Calutor Interactive Shell ====== === Calculator Started OK. ===== EXPR => 9.81 * sin(3.1415 / 2.0) + 100 * sqrt(285.6) + exp(3.65) => ans = 1738.26 EXPR => ans / 100.0 - 80.0 => ans = -62.6174 EXPR => ans * ans => ans = 3920.94 EXPR =>
1.4 Exprtk - Math expression parser
Exprtk is a MIT-license single-header, header-only and turing-complete math expression parsing engine.
Official Web Site:
Official repository:
Features:
- Header-only library in a single file.
- Turing-complete
- Sandboxed
- Supports binding to defined-functions in C++-side.
- Functions can operate on vectors
- Control structures: for-loop; if-else; ternary operator.
Drawbacks:
- The library exprtk.hpp has one megabyte (1 mb) in a single header file which significantly slows down the compile-time. Header-only design should only be used for small libraries.
Sample Project
- GIST: https://gist.github.com/2ff870b653b2ec1519bf0423165db1c5
- Note: The static library target 'mparser' makes the compile-time of the client code 'main.cpp' faster by using the PIMPL (Pointer-To-Implementation) design pattern and not exposing exprtk.hpp in the library header file.
File: CMakeLists.txt
cmake_minimum_required(VERSION 3.9) project(exprk-parser) #========== Global Configurations =============# #----------------------------------------------# set(CMAKE_CXX_STANDARD 17) set(CMAKE_VERBOSE_MAKEFILE ON) set(CMAKE_CXX_EXTENSIONS OFF) #----------- Add dependencies --------------------------# #============= Functions and macros ===========================# macro(Download_Single_Headerlib FILE URL) file(DOWNLOAD ${URL} ${CMAKE_BINARY_DIR}/include/${FILE}) IF(NOT Download_Single_Headerlib_flag) include_directories(${CMAKE_BINARY_DIR}/include) set(Download_Single_Headerlib_flag TRUE) ENDIF() endmacro() Download_Single_Headerlib( exprtk.hpp https://github.com/ArashPartow/exprtk/raw/d81ac1a2ddd9877a7981d32c731fd9a75544ec68/exprtk.hpp ) #----------- Target settings -------------------------------# add_library( mparser mparser.cpp mparser.hpp ) add_executable( main main.cpp ) target_link_libraries( main mparser )
File: mparser.hpp (cmake target: mparser static library)
#ifndef _MPARSER_HPP_ #define _MPARSER_HPP_ #include <string> #include <memory> #include <functional> // Unique-ownership smart pointer for implementing PimPl // // Note: It was created because std::uniqe_ptr does // not support 'incomplete types'. // // Note: PimPl (Pointer-To-Implementation) idiom // is a technique for reducing compile-time and // maintaining ABI stability which mitigates the // fragile-ABI problem. // template<typename T, void (*disposer) (T*)> class pimpl_ptr { T* m_hnd; public: pimpl_ptr() : m_hnd(nullptr) //, m_disp(disp) { } pimpl_ptr(T* hnd) : m_hnd(hnd) { } ~pimpl_ptr() { this->release(); } // Disable copy constructor pimpl_ptr(pimpl_ptr const&) = delete; // Disable copy- pimpl_ptr& operator=(pimpl_ptr const&) = delete; // Move Ctor pimpl_ptr(pimpl_ptr&& rhs) { std::swap(m_hnd, rhs.m_hnd); } // Move assignment operator pimpl_ptr& operator=(pimpl_ptr&& rhs) { std::swap(m_hnd, rhs.m_hnd); } T* get() const { return m_hnd; } void release() { // Note: it is not possible to delete incomplete type // in this way: 'delete m_hnd;' disposer(m_hnd); m_hnd = nullptr; } void reset(T* hnd) { this->release(); m_hnd = hnd; } bool empty() const { return m_hnd == nullptr; } T& operator* () const { return *m_hnd; } T* operator-> () const { return m_hnd; } }; class MathEvaluator { struct impl; static void dispose(impl* p); pimpl_ptr<impl, MathEvaluator::dispose> m_pimpl; public: MathEvaluator(); ~MathEvaluator() = default; MathEvaluator& add_var(std::string name, double& ref); // Register function pointer callback or non-capture lambda MathEvaluator& add_function(std::string, double fptr (double) ); // Register function of two variables MathEvaluator& add_function(std::string, double fptr (double, double) ); double eval_code(std::string code); bool compile(std::string code); double value(); void repl(); }; #endif
File: mparser.cpp (cmake target: mparser static library)
#include "mparser.hpp" #include <iostream> #include <string> #include <exprtk.hpp> // --------------------------------------------------// struct MathEvaluator::impl { exprtk::expression<double> expr; exprtk::symbol_table<double> symbol_table; exprtk::parser<double> parser; std::string code; // Println function exprtk::rtl::io::println<double> println{}; }; // static method void MathEvaluator::dispose(impl* p) { delete p; } MathEvaluator::MathEvaluator(): m_pimpl(new impl) { m_pimpl->symbol_table.add_constants(); m_pimpl->expr.register_symbol_table(m_pimpl->symbol_table); m_pimpl->symbol_table.add_function("println", m_pimpl->println); } MathEvaluator& MathEvaluator::add_var(std::string name, double& ref) { m_pimpl->symbol_table.add_variable(name, ref); return *this; } MathEvaluator& MathEvaluator::add_function(std::string name, double fptr (double) ) { m_pimpl->symbol_table.add_function(name, fptr); return *this; } MathEvaluator& MathEvaluator::add_function(std::string name, double fptr (double, double) ) { m_pimpl->symbol_table.add_function(name, fptr); return *this; } bool MathEvaluator::compile(std::string code) { bool r = m_pimpl->parser.compile(code, m_pimpl->expr); if(!r){ std::string err = "Error: "; err = err + m_pimpl->parser.error(); std::cerr << " Error: " << err << "\n"; throw std::runtime_error(" [PARSER] Unable to parse expression."); } return r; } double MathEvaluator::value() { return m_pimpl->expr.value(); } void MathEvaluator::repl() { std::string line; double result; while( std::cin.good() ) { std::cout << " EXPRTK $ >> "; std::getline(std::cin, line); if(line.empty()) continue; if(line == "exit") return; try { this->compile(line); std::cout << this->value() << '\n'; } catch (std::runtime_error& ex) { std::cerr << "Error: " << ex.what() << '\n'; } } }
File: main.cpp (cmake target: main)
#include <iostream> #include <string> #include <cassert> #include "mparser.hpp" double myfun(double a, double b); void test_engine(MathEvaluator& engine, double& x); int main(int argc, char** argv) { std::puts(" [TRACE] I am up and running Ok. "); MathEvaluator engine; double x = 1.0, y = 2.0, z = 3.0; engine.add_var("x", x) .add_var("y", y) .add_var("z", z) .add_function("myfun", &myfun); assert(argc == 2); auto command = std::string(argv[1]); if(command == "test" ){ test_engine(engine, x); } if(command == "repl" ){ engine.repl(); } std::cerr << " [TRACE] Shutdown engine Ok. " << '\n'; return 0; } // -----------------------------------------// double myfun(double a, double b) { std::cerr << " [TRACE] a = " << a << "\n"; std::cerr << " [TRACE] b = " << b << "\n"; double r = 3.0 * a + 5.0 * b; std::cerr << " [TRACE] result = " << r << "\n"; std::cerr << "---------------------------------\n"; return r; } void test_engine(MathEvaluator& engine, double& x) { std::string code = R"( // Define local variables var a := 2.0 / exp(x) * x^2 + y; var b := 10.0 * sqrt(x) + z; // println('\n => x = ', x); // println('\n => y = ', y); // Call custom function var k := myfun(x, y); // Comment: the last expression is returned 4.0 * a + 3 * b + 10 * z + k; )"; engine.compile(code); x = 3.0; std::cout << " => x = " << x << " ; engine = " << engine.value() << "\n"; x = 5.0; std::cout << " => x = " << x << " ; engine = " << engine.value() << "\n"; x = 15.0; std::cout << " => x = " << x << " ; engine = " << engine.value() << "\n"; x = -15.0; std::cout << " => x = " << x << " ; engine = " << engine.value() << "\n"; x = 20.0; std::cout << " => x = " << x << " ; engine = " << engine.value() << "\n"; std::string code2 = R"( // Vector/array variable var xs [6] := { 2.0, 10.2, -2.50, 9.256, 100.0, 25.0 }; var ys [6] := { -2.0, 1.225, -5.56, 19.000, 125.0, 125.0 }; println(' => xs =', xs); println(' => ys = ', ys); println(' => 3 * xs + 4 * ys = ', 3 * xs + 4 * ys); println(' => sum(xs) = ', sum(ys) ); println(' => sum(ys) = ', sum(xs) ); )"; engine.compile(code2); engine.value(); }
Building
$ git clone https://gist.github.com/2ff870b653b2ec1519bf0423165db1c5 gist && cd gist
$ cmake --config Debug -H. -B_build
$ cmake --build _build --target
Running
Running 'test' subcommand.
# Redirect std::cerr or stderr to file $ _build/main test 2> out.log [TRACE] I am up and running Ok. => x = 3 ; engine = 121.546 => x = 5 ; engine = 140.43 => x = 15 ; engine = 218.19 => x = -15 ; engine = -nan => x = 20 ; engine = 251.164 => xs = 2.00000 10.20000 -2.50000 9.25600 100.00000 25.00000 => ys = -2.00000 1.22500 -5.56000 19.00000 125.00000 125.00000 => 3 * xs + 4 * ys = -2.00000 35.50000 -29.74000 103.76800 800.00000 575.00000 => sum(xs) = 262.66500 => sum(ys) = 143.95600 # View log file $ _build/main test 2> out.log [TRACE] I am up and running Ok. => x = 3 ; engine = 121.546 => x = 5 ; engine = 140.43 => x = 15 ; engine = 218.19 => x = -15 ; engine = -nan => x = 20 ; engine = 251.164 => xs = 2.00000 10.20000 -2.50000 9.25600 100.00000 25.00000 => ys = -2.00000 1.22500 -5.56000 19.00000 125.00000 125.00000 => 3 * xs + 4 * ys = -2.00000 35.50000 -29.74000 103.76800 800.00000 575.00000 => sum(xs) = 262.66500 => sum(ys) = 143.95600
Running 'repl' subcommand for interactive shell.
rlwrap _build/main repl
[TRACE] I am up and running Ok.
EXPRTK $ >> sin(pi) * 2.0 + 5.3 * cos(3/2 * pi)
-7.28665e-16
EXPRTK $ >> exp(3.1)
22.198
EXPRTK $ >> var a := exp(2.5); var b := a * 3.0 + 10.0; 3 * a + b
83.095
EXPRTK $ >>
EXPRTK $ >> println(' x = ', x)
x = 1.00000
0
EXPRTK $ >> myfun(5.1, 2.56)
[TRACE] a = 5.1
[TRACE] b = 2.56
[TRACE] result = 28.1
---------------------------------
28.1
EXPRTK $ >>
EXPRTK $ >> myfun(5.1, 2.0 * pi + 10.0)
[TRACE] a = 5.1
[TRACE] b = 16.2832
[TRACE] result = 96.7159
---------------------------------
96.7159
EXPRTK $ >>
EXPRTK $ >> exit
[TRACE] Shutdown engine Ok.
1.5 Lua scripting engine
1.5.1 Overview
Lua (moon in Portuguese) is a ligthweight multi paradigm scripting language written in C. Due to Lua be available as small footprint library, this language is widely used as embedded scripting by many applications for scripting and as configuration or data description language.
Use cases:
- Scripting for C or C++ applications
- Extension language => Add new functionality and updates without recompilation.
- Provide interactive REPL or shell to C or C++ applications.
- Program configuration (settings)
- Data description language
Some applications using Lua:
- Nmap network scanner
- MediaWiki (engine used by Wikipedia)
- Nginx Web Server
- Redis Database
- Linux Conky
- LuaTex
- NetBSD
- Cheat Engine
- Geany text editor
- Xmake building system for C, C++, Objective-C, Swift, Assembly, Golang, Rust, Dlang and Cuda
- Lots of games use lua for scripting and allow non-programmers, such as end-users and game artists to contribute to the game development, create animations, movements, finite state machines and so on.
- … …
More at:
Repository and C++ Binding Libraries
- Official Repository Mirror
- Lua Bind
- Sol2 (means 'sun' 2 in Portuguese)
1.5.2 Further Reading
Documentation
- Lua 5.1 Reference Manual - Documentation
- Lua (programming language) - Wikipedia
Lua C API
- Programming in Lua : 24.1
- lua-users wiki: Binding Code To Lua
- lua-users wiki: Simple Lua Api Example
- Programming in Lua : 24
- Exposing C functions to Lua
Lua embedded in Geany Text Editor
- https://plugins.geany.org/geanylua/geanylua-intro.html
- https://github.com/geany/geany-plugins/tree/master/geanylua/examples
- https://github.com/DGivney/geany-lua-scripts
- Collection of Lua scripts for Geany text editor
- https://github.com/geany/geany-plugins/tree/master/geanylua
Lua scripting in NMap Network Scanner
- Script Language | Nmap Network Scanning
- "The core of the Nmap Scripting Engine is an embeddable Lua interpreter. Lua is a lightweight language designed for extensibility. It offers a powerful and well-documented API for interfacing with other software such as Nmap. The second part of the Nmap Scripting Engine is the NSE Library, which connects Lua and Nmap. This layer handles issues such as initialization of the Lua interpreter, scheduling of parallel script execution, script retrieval and more. It is also the heart of the NSE network I/O framework and the exception handling mechanism. It also includes utility libraries to make scripts more powerful and convenient. The utility library modules and extensions are described in the section called 'NSE Libraries'."
- Nmap Scripting Engine Source Code / GITHUB
- Extending Nmap With Lua - DEV
Lua Scripting in Wireshark - Network Capture Application
Lua Scripting in XMake Building System
Lua scripting on NetBSD Kernel
- Lua in teh NetBSD Kernel
- Lua Kernel Scripting in NetBSD - Fast, Flexible Packet Filtering
- Towards a Lua scripted operating system
- NPF scripting with Lua by Lourival Vieira Neto
Lua scripting on Redis Database
Lua scripting on Nginx Web Server
- Lua on NGINX web server
- Pushing Nginx to its limit with Lua - CloudFlare
- Implementing a Fileserver with Nginx and Lua
Lua scripting on Unreal Engine / game engine
- Learn Lua in 15 Minutes
- Lua: The Little Language That Could
- Roberto De Ioris - Scriptiamo Unreal Engine con Lua - Codemotion Rome…
- GitHub - rdeioris/LuaMachine: Unreal Engine 4 Plugin for Lua APIs implementation
- LuaMachine by Roberto De Ioris in Code Plugins - UE4 Marketplace
- "Unreal Engine 4 Plugin for adding Lua scripting to your projects: If you want modders to customize your game/project, or you need to allow game designers to script parts of the logic, this plugin is for you; Contrary to the other Unreal Engine 4 Lua plugins, this one does not try to expose the Unreal Engine 4 api, but completely hides it exposing to the user/scripter only the features the developer decided to include (via Blueprints or C++); Currently Windows 64bit, Mac, Linux x86_64 (both Runtime and Editor), Linux AArch64, Android 32bit, Android 64bit, iOS (Runtime only) are supported."
Lua for DSL - Domain Specific Language
- lua-users wiki: Lua Data Formats
- "Lua can be used as a language to represent data, not just as a general programming language. Different languages have been devised for different types of data representation in text format."
- Writing a DSL in Lua
- A visual DSL toolkit in Lua: Past, present and future
- Choosing Lua as the data description and configuration language
- Declarative Internal DSLs in Lua / A Game-Changing Experience
- Lua Programming/Tables - Wikibooks, open books for an open world
- Lua/Tables - Wikiversity
- Lua declarative programming basics / Sudo Null IT News
- The basics and design of lua table
Videos
- CppCon 2017: Andreas Weis "Howling at the Moon: Lua for C++ Programmers"
- "C++ is a great tool for solving complex problems in a thorough way. But every once in a while, the desire for a simpler language emerges. For those parts of our code where performance is of secondary concern, but the ability to perform rapid iterations over the code is paramount, a scripting language might be a tempting choice. But integrating a second language besides C++ and managing the interaction between the two is also scary. Lua is a lightweight, dynamic language that was designed to be used as an embedded language within existing applications. It is easy to learn, has very reasonable runtime performance, and a memory footprint small enough that it is usable even on embedded systems. Furthermore, it is almost trivial to integrate with C++. This talk will give a brief introduction to the Lua scripting language, highlighting specifically how it can complement C++'s language features to enrich a developer's toolbox. In the second part of the talk, we will look at Lua's C API and give suggestions how to integrate it with a modern C++17 codebase. In particular we will focus on how to interface with the dynamic language Lua without compromising the benefits of C++'s strong type system."
- CppCon 2018: JeanHeyd Meneide “Scripting at the Speed of Thought: Lua and C++ with sol3”
- "A big part of accelerating development and promoting collaboration often translates to deferring a lot of the typical programmer work to a scripting language, to allow for those with more design-oriented ideas and experience to handle some of the workload. What happens, then, when you have to bind a scripting language like Lua into C++ to allow for this workflow? This session is going to be all about how you enable non-developers and developers alike to rapidly increase their development productivity by turning routines and algorithms into data that can be shipped alongside your product in Lua. We will talk primarily how you can use the library sol2 to abstract away the muck of working with the Lua C API and instead focus on using and describing both Lua and C++ with each other in a simple manner. We will demonstrate some of the more interesting properties of sol such as Overloading Support, Container Support, Usertypes – C++ classes made available with automatic support for unique/shared pointers – and Tables. By the time this session is over, you will have a firm grasp over what a good Lua Binding can offer you, and how you can accelerate your C++ development with it."
General
1.5.3 Example project with Sol2 C++ binding library
This sample project builds a C++ statically linked executable embedding the Lua scripting engine using the Sol2 binding library, which is header only. Neither Sol2 nor Lua libraries need to be installed before building this sample project as the CMake scripts take care of downloading and building all dependencies.
Lua repostiory mirror:
Sol2 library repository:
Sol2 library documentation:
Sample Project
GIST:
File: CMakeLists.txt
cmake_minimum_required(VERSION 3.0) project(duktape-cc-trial) set(CMAKE_CXX_STANDARD 17) set(CMAKE_VERBOSE_MAKEFILE ON) include(lua-lib.cmake) #----- Target Definitions ----------------------------# add_executable( embed-lua-sol embed-lua-sol.cpp) target_link_libraries( embed-lua-sol lua::lualib ) # Lua REPL executable built from static library liblua.a (Linux) # Note: the main() function is in the file main.c in the lua sources directory add_executable( lua-repl $<TARGET_OBJECTS:lua::lualib> ) target_link_libraries( lua-repl m pthread )
File: lua-lib.cmake
- CMake Script for downloading sol2 binding library and lua library sources.
include(FetchContent) # Note: the 'add_subriectory' line was commented becuyase # library that will be downloaded does not have # a CMakeListst.txt file at the root directory. macro(Download_Library_Git NAME TAG REPOSITORY_URL) FetchContent_Declare( ${NAME} GIT_REPOSITORY ${REPOSITORY_URL} GIT_TAG ${TAG} ) FetchContent_GetProperties(${NAME}) if(NOT cpputest_POPULATED) FetchContent_Populate(${NAME}) message("${NAME}_SOURCE_DIR} = ${${NAME}_SOURCE_DIR}") # => Disable following line: the library does not have a CMakeLists.txt # at the root directory. # add_subdirectory(${${NAME}_SOURCE_DIR} ${${NAME}_BINARY_DIR}) endif() endmacro() # ====>> Download Lua library <<==========================# Download_Library_Git( lua v5.3.5 https://github.com/lua/lua ) file(GLOB_RECURSE lua_sources "${lua_SOURCE_DIR}/*.c") file(GLOB_RECURSE lua_headers" ${lua_SOURCE_DIR}/*.h") message( [TRACE] " lua_SOURCE_DIR = ${lua_SOURCE_DIR} ") add_library( lua STATIC ${lua_sources} ${lua_headers} ) target_include_directories( lua PUBLIC ${lua_SOURCE_DIR} ) add_library( lua::lualib ALIAS lua) # ====>> Download Sol C++ binding library <<====================# FetchContent_Declare( sol2 GIT_REPOSITORY https://github.com/ThePhD/sol2 GIT_TAG v3.2.0 ) FetchContent_MakeAvailable( sol2 ) include_directories( ${sol2_SOURCE_DIR}/include )
File: xmake.lua
- Building script for XMake building system, which uses lua as embedded scripting language and as a building system DSL (Domain Specific Language) .
- Xmake packages: sol2, lua, luajit
add_rules("mode.debug", "mode.release") includes_lua = false add_requires("sol2 v3.2.1") target("embed-lua-sol") set_kind("binary") set_languages("c++17") add_files("./embed-lua-sol.cpp") add_packages("sol2")
File: embed-lua-sol.cpp
#include <iostream> #include <string> #include <vector> #include <algorithm> #include <sol/sol.hpp> class Counter { private: std::string m_name; int m_counter; public: // Ctor [1] => Default ctor Counter(): Counter("untitled", 0) { } // Ctor [2] Counter(std::string name, int counter) : m_name{std::move(name)}, m_counter{counter} { std::cout << " [TRACE] Counter created with => { " << " name = " << m_name << " ; counter = " << m_counter << " } \n"; } int getCounter() const { return m_counter; } void setCounter(int n) { m_counter = n; std::cout << " [TRACE] I was set to value " << n << std::endl; } void increment() { m_counter++; std::cout << " [TRACE] increment event =>> counter = { " << m_name << " ; " << m_counter << " } " << std::endl; } }; double add_fun(double a, double b) { std::cout << " [TRACE] addfun() => a = " << a << " ; b = " << b << std::endl; return a + b; } void sol_eval(sol::state& ctx, std::string code) { try { ctx.script( std::move(code) ); } catch ( sol::error const& ex) { std::cerr << " [SOL ERROR] " << ex.what() << "\n"; } } int main() { // Create an instance of lua Engine (aka virtual Machine) sol::state ctx{}; // Load basic libraries (such as print) ctx.open_libraries(sol::lib::base, sol::lib::coroutine, sol::lib::string, sol::lib::io); // Register function pointer ctx.set_function("add_fun", &add_fun); // Register lambda object ctx.set_function("make_vector", [](int n ) -> std::vector<int> { auto vec = std::vector<int>(n); for(int i = 0; i < n; i++ ) vec[i]= i * 3; return vec; }); // Set variables in the Lua engine ctx["points"] = 2000; ctx.set("character", "<Marcus Tulius Cicero>"); /* ===========>>> E X P E R I M E N T / 1 <<=========*/ std::puts("\n [EXPERIMENT 1] ==>> Evaluating string as code "); { // ===>>> Eval code as string <<=== // Throws exception sol::error ctx.script(R"( print(" [LUA] Hello world lua "); x = add_fun(10, 20); print("\n [LUA] result = " .. x); v = make_vector(5); print("\n [LUA] Printing a vector "); for i = 1, 5 do print(" -> v[" .. i .. " ] = " .. v[i] ); end print(" [LUA] VAR points = " .. points ); print(" [LUA] VAR character = " .. character ); )"); } /* ===========>>> E X P E R I M E N T / 2 <<=========*/ std::puts("\n\n [EXPERIMENT 2] ==>> Reading configuration "); { ctx.script(R"( -- Simulation of user configuration from script asset_path = "C:\\\\Users\\myuser\\data\\files\\"; user_credits = 2000; width = 200.561; )"); auto asset_path = ctx.get<std::string>("asset_path"); int user_credits = ctx["user_credits"]; std::cout << " [*] => asset_path = " << asset_path << "\n"; std::cout << " [*] => user_credits = " << user_credits << "\n"; } /* ===========>>> E X P E R I M E N T / 3 <<=========*/ std::puts("\n\n [EXPERIMENT 3] ==>> Register C++ classes "); struct StatefulFunctor { int state = 0; StatefulFunctor(int state): state(state){ } int operator()(){ std::cout << " *=>> [StatefulFunctor] My new state is = " << this->state << "\n"; return state++; } }; auto stateful = StatefulFunctor(10); ctx.set_function("stateful", stateful); ctx.script(R"( stateful(); stateful(); stateful(); )"); ctx.new_usertype<Counter>( // Class name "Counter" // --- Register methods ------ // ,"getCounter", &Counter::getCounter ,"setCounter", &Counter::setCounter ,"increment", &Counter::increment // --- Register properties ---- // , "value", sol::property( &Counter::getCounter , &Counter::setCounter) ); sol_eval(ctx, R"( print("\n ----->>> Calling C++ classes from Lua <----- "); -- Create new instance (object) of C++ class Counter counter = Counter.new(); counter:increment(); counter:increment(); counter:increment(); x = counter:getCounter(); print(" [*] value of counter is equal to = " .. x); counter.value = 2000; print(" [*] Counter value is equal to = " .. counter.value ); )"); Counter* ptr = ctx.get<Counter*>("counter"); std::cout << " [FROM C++] counter value = " << ptr->getCounter() << "\n"; return 0; }
Building => Debug build:
$ git clone https://gist.github.com/17a37d905d3d71c0ae66661a189481b5 lua-sol && cd lua-sol
$ cmake --config Debug -H. -B_build
$ cmake --build _build --target
Building => Release build:
$ git clone https://gist.github.com/17a37d905d3d71c0ae66661a189481b5 lua-sol && cd lua-sol
$ cmake --config Release -H. -B_build
$ cmake --build _build --target
Check executable:
# Confirm whether the executable is statically linked against LuaLib $ ldd _build/embed-lua-sol linux-vdso.so.1 (0x00007ffdf0fc4000) libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe4490e6000) libm.so.6 => /lib64/libm.so.6 (0x00007fe448fa0000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe448f85000) libc.so.6 => /lib64/libc.so.6 (0x00007fe448dbb000) /lib64/ld-linux-x86-64.so.2 (0x00007fe4492fa000) # Static library $ file _build/liblua.a _build/liblua.a: current ar archive
Program output:
$ _build/embed-lua-sol [EXPERIMENT 1] ==>> Evaluating string as code [LUA] Hello world lua [TRACE] addfun() => a = 10 ; b = 20 [LUA] result = 30.0 [LUA] Printing a vector -> v[1 ] = 0 -> v[2 ] = 3 -> v[3 ] = 6 -> v[4 ] = 9 -> v[5 ] = 12 [LUA] VAR points = 2000 [LUA] VAR character = <Marcus Tulius Cicero> [EXPERIMENT 2] ==>> Reading configuration [*] => asset_path = C:\\Users\myuser\data\files\ [*] => user_credits = 2000 [EXPERIMENT 3] ==>> Register C++ classes *=>> [StatefulFunctor] My new state is = 10 *=>> [StatefulFunctor] My new state is = 11 *=>> [StatefulFunctor] My new state is = 12 ----->>> Calling C++ classes from Lua <----- [TRACE] Counter created with => { name = untitled ; counter = 0 } [TRACE] increment event =>> counter = { untitled ; 1 } [TRACE] increment event =>> counter = { untitled ; 2 } [TRACE] increment event =>> counter = { untitled ; 3 } [*] value of counter is equal to = 3 [TRACE] I was set to value 2000 [*] Counter value is equal to = 2000 [FROM C++] counter value = 2000
Run Lua repl executable (defined in CMake):
$ rlwrap _build/lua-repl Lua 5.3.5 Copyright (C) 1994-2018 Lua.org, PUC-Rio > > > print(" Hello world Lua / Luna / Moon REPL ") Hello world Lua / Luna / Moon REPL > for i = 1, 5 do print(" i = " .. i ) end i = 1 i = 2 i = 3 i = 4 i = 5 function myfunction(a, b) return math.sin(a) * math.exp(b) / a - a * b end > myfunction(3.5, 2.0) -7.7405591279893 function add (a) local sum = 0 for i,v in ipairs(a) do sum = sum + v end return sum end > add({ 2.5, 10.2, -2.51, 8.251, 10.56}) 29.001 function add (a) local sum = 0 for i,v in ipairs(a) do sum = sum + v end return su
Building and running with Xmake
Building with Xmake:
$ >> xmake build -P . -v [ 25%]: ccache compiling.release embed-lua-sol.cpp /usr/bin/gcc -c -m64 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -std=c++17 \ -isystem /home/user/.xmake/packages/s/sol2/v3.2.1/5fa1980e381b40ed942748bdef09488a/include \ -isystem /home/user/.xmake/packages/l/lua/v5.4.4/4de5c31814a64c0d9242d1e524082873/include/lua \ -DNDEBUG -o build/.objs/embed-lua-sol/linux/x86_64/release/embed-lua-sol.cpp.o embed-lua-sol.cpp [ 50%]: linking.release embed-lua-sol /usr/bin/g++ -o build/linux/x86_64/release/embed-lua-sol build/.objs/embed-lua-sol/linux/x86_64/release/embed-lua-sol.cpp.o \ -m64 -L/home/user/.xmake/packages/l/lua/v5.4.4/4de5c31814a64c0d9242d1e524082873/lib -s -llua -ldl -lm [100%]: build ok!
Generating a CMakeLists.txt file:
$ >> xmake project -v -k cmake -P . # Overrides CMakeLists.txt if it already exists configure { plat = linux buildir = build arch = x86_64 mode = release ccache = true ndk_stdcxx = true host = linux kind = static } create ok!
Running the application:
$ >> xmake run -P . [EXPERIMENT 1] ==>> Evaluating string as code [LUA] Hello world lua [TRACE] addfun() => a = 10 ; b = 20 [LUA] result = 30.0 [LUA] Printing a vector -> v[1 ] = 0 -> v[2 ] = 3 -> v[3 ] = 6 -> v[4 ] = 9 -> v[5 ] = 12 [LUA] VAR points = 2000 [LUA] VAR character = <Marcus Tulius Cicero> [EXPERIMENT 2] ==>> Reading configuration [*] => asset_path = C:\\Users\myuser\data\files\ [*] => user_credits = 2000 [EXPERIMENT 3] ==>> Register C++ classes *=>> [StatefulFunctor] My new state is = 10 *=>> [StatefulFunctor] My new state is = 11 *=>> [StatefulFunctor] My new state is = 12 ----->>> Calling C++ classes from Lua <----- [TRACE] Counter created with => { name = untitled ; counter = 0 } [TRACE] increment event =>> counter = { untitled ; 1 } [TRACE] increment event =>> counter = { untitled ; 2 } [TRACE] increment event =>> counter = { untitled ; 3 } [*] value of counter is equal to = 3 [TRACE] I was set to value 2000 [*] Counter value is equal to = 2000 [FROM C++] counter value = 2000
1.6 Squirrel Scripting Language
1.6.1 Overview
- Squirrel is a embedded scripting language, similar to Lua, but with C-like syntax, designed to be embedded in larger C or C++ applications such as game engines. Squirrel is written in C++, but it only exposes a C API, which makes binding C++ code cumbersome. However, there are many libraries which simplifies the embedding of squirrel in C++ codebases.
Official Web Site
Official Repository
Squirrel fork with a more C++-like syntax
Articles about squirrel language
- Squirrel Programming Guide | Dev Center
- Speaking UNIX: The Squirrel portable shell and scripting language
- http://wiki.ogre3d.org/Squirrel+Scripting+Language
Applications using Squirrel
- CodeBlocks IDE for C and C++
- OpenTTD Game - http://www.openttd.org/en/
Libraries for simplifying embedding squirrel in C++ code
Libraries for simplifying squirrel embedding in C++ code (binding C++ code):
1.6.2 Building Squirrel standalone REPL interpreter
Download and build:
$ mkdir ~/build && cd build
$ git clone https://github.com/albertodemichelis/squirrel
$ cmake -H. -B_build -DCMAKE_BUILD_TYPE=Debug
$ cmake --build _build --target
Play with squirrel interactive shell (REPL):
$ _build/bin/sq Squirrel 3.1 stable Copyright (C) 2003-2017 Alberto Demichelis (64 bits) sq> print(" === Hello world Squirrel === ") === Hello world Squirrel === sq> function add_to_10(x){ return x + 10; } sq>print(add_to_10(25)) 35 sq> x <- cos(3.1415 / 2) + 10 sq>print(" x = " + x.tostring()) x = 10 sq> for(local i = 0; i < 5; i++) print(" \n [TRACE] i = " + i.tostring()); [TRACE] i = 0 [TRACE] i = 1 [TRACE] i = 2 [TRACE] i = 3 [TRACE] i = 4
1.6.3 Example - embedding Squirrel with Squall Library
This example demonstrates how to embed the Squirrel programming language in a C++ application using the Squall header-only library.
- Squal Repository: https://github.com/jonigata/squall
- Note: This project is self-contained, no library needs to be installed on the system as Squall automatically fetches Squirrel sources using Cmake FetchContent.
Sample Project
File: CMakeLists.txt
cmake_minimum_required (VERSION 3.11) project(squirrel_squall_test) set(CMAKE_CXX_STANDARD 17) set(CMAKE_CXX_STANDARD_REQUIRED on) set(BUILD_EXAMPLES off) # -----------------------------------------------# include(FetchContent) FetchContent_Declare( squall URL https://github.com/jonigata/squall/archive/master.zip ) FetchContent_MakeAvailable(squall) #-------- TARGET DEFINITIONS --------------------# message([TRACE] " squall_SOURCE_DIR = ${squall_SOURCE_DIR}/squall ") add_executable ( squirrel-test squirrel_test.cpp) target_link_libraries ( squirrel-test squirrel_static sqstdlib_static) target_include_directories ( squirrel-test PUBLIC ${squirrel_SOURCE_DIR}/include ${squall_SOURCE_DIR} )
File: squirrel_test.cpp
#include <iostream> #include <string> #include <algorithm> #include <vector> #include <squall/squall_vmstd.hpp> #include <squall/squall_klass.hpp> void some_cpp_fun(int n ) { for(int i = 0; i < n; i++) std::printf("\n [some_cpp_function] => i = %d ", i); } class ChartXY { int m_width; int m_height; public: ChartXY(): m_width(20), m_height(50) { std::cout << " [ChartXY] Ctor() - I was created!. OK. " << std::endl; } void set_width(int x){ m_width = x; } void set_height(int x){ m_height = x; } void draw() const { std::printf(" [ChartXY] draw() => Draw chart with: width = %d ; height = %d" , m_width, m_height); } }; int main() { // Create a virtual-machine object for Squirrel language // Note: throws squall::squirrel_error squall::VMStd vm; // ------------------------------------------------------------------------// // [EXPERIMENT 1] Evaluate scripts provided as strings // //-------------------------------------------------------------------------// std::puts(" =>> [EXPERIMENT] 1 - Evaluating code as string. "); std::puts(" ---------------------------------------------\n"); try { vm.dostring(R"( // --- Squirrel Comment ----- // print(" <SQUIRREL> =>> Hello world squirrel!"); function myfunc(x) { local a = x + 5; local b = 7 * a + x; return b - a + 10; } function myfunc2() { print(" \n <SQUIRREL> I was called by the C++ code "); } print("\n <SQUIRREL> =>> myfunc(4) = " + myfunc(4).tostring() ); print("\n\n <SQUIRREL> --- For Loop test ---- "); for(local i = 0; i < 5; i++) { print("\n i = " + i.tostring() ); } )"); } catch( squall::squirrel_error const& ex ) { std::cerr << "\n [SQUIRREL ERROR] Error => " << ex.what() << std::endl; } // ------------------------------------------------------------------------// // [EXPERIMENT 2] Evaluate scripts provided as strings // //-------------------------------------------------------------------------// std::puts("\n =>> [EXPERIMENT] 2 - Getting variables defined in the code."); std::puts(" -----------------------------------------------------------\n"); { vm.dostring(R"( // ---- Global varibles for configuration ------ // ::myvar_width <- 100; ::myvar_float <- 122.56161; ::myvar_string <- "/path/to/interpreter.exe"; )"); squall::TableBase table = vm.root_table(); auto myvar_float = table.get<float>("myvar_float"); auto myvar_width = table.get<int>("myvar_width"); auto myvar_string = table.get<std::string>("myvar_string"); std::cout << " =>> myvar_width = " << myvar_width << std::endl; std::cout << " =>> myvar_float = " << myvar_float << std::endl; std::cout << " =>> myvar_string = " << myvar_string << std::endl; } // ------------------------------------------------------------------------// // [EXPERIMENT 3] Call functions defined in the script (Virtual Machine ) // //-------------------------------------------------------------------------// std::puts("\n\n =>> [EXPERIMENT] 3 - Calling functions defined in the script (VM)"); std::puts(" ------------------------------------------------------------------\n"); // Throws: 'squall::squirrel_error' int result = vm.call<int>("myfunc", 10); std::cout << " =>>> myfunc(4) = " << result << std::endl; vm.call<void>("myfunc2"); // ------------------------------------------------------------------------// // [EXPERIMENT 4] Call C++ functions from the script // //-------------------------------------------------------------------------// std::puts("\n\n =>> [EXPERIMENT] 4 - Calling functions defined in the script (VM)"); std::puts(" ------------------------------------------------------------------\n"); // Register C++ function pointer vm.defun("some_cpp_fun", &some_cpp_fun); vm.dostring(R"( print(" \n [SQUIRREL] => Call C++ function some_cpp_fun() "); some_cpp_fun(5); )"); // Register C++ lambda object vm.defun("call_me", [=](std::string const& param) { std::cout << "\n [TRACE] call_me() Parameter = " << param << "\n"; return " name = " + param; }); vm.dostring(R"( local x = call_me("<SQUIRREL-INTERPRETER>"); print(" [SQUIRREL] \n x <- " + x); )"); // ------------------------------------------------------------------------// // [EXPERIMENT 5] Call C++ classes from Squirrel-side // //-------------------------------------------------------------------------// std::puts("\n\n =>> [EXPERIMENT] 5 - Calling C++ classes from Squirrel "); std::puts(" ------------------------------------------------------------------\n"); // Create metaobject 'k' that describes ChartXY class squall::Klass<ChartXY> k(vm, "ChartXY"); k.func("set_width", &ChartXY::set_width); k.func("set_height", &ChartXY::set_height); k.func("draw", &ChartXY::draw); vm.dostring(R"( function manipulate_chart(ch){ ch.set_width(25); ch.set_height(10); ch.draw(); } function draw_with(ch, w, h) { print(" \n [SQUIRREL LOG] Function draw_with called. OK. \n"); ch.set_width(w); ch.set_height(h); ch.draw(); } )"); ChartXY mychart; vm.call<void>("manipulate_chart", &mychart); vm.call<void>("draw_with", &mychart, 100, 200); std::cout << "\n\n"; squall::TableBase table = vm.root_table(); // Pass object to Squirrel side table.set("mychart", mychart); vm.dostring(R"( mychart.set_width(250); mychart.set_width(600); mychart.draw(); )"); #if 0 // Segmentation Falt Coredump if the C++ object // is created on the Squirrel-side. vm.dostring(R"( local c = ChartXY(); c.set_width(150); c.set_height(175); c.draw(); )"); #endif return 0; }
Build:
$ cmake -H. -B_build -DCMAKE_BUILD_TYPE=Debug $ cmake --build _build --target
Check executable dependencies:
$ ldd _build/squirrel-test linux-vdso.so.1 (0x00007ffe34dd6000) libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd3ebd2a000) libm.so.6 => /lib64/libm.so.6 (0x00007fd3ebbe4000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd3ebbc9000) libc.so.6 => /lib64/libc.so.6 (0x00007fd3eb9ff000) /lib64/ld-linux-x86-64.so.2 (0x00007fd3ebf3e000)
Run application:
$ _build/squirrel-test =>> [EXPERIMENT] 1 - Evaluating code as string. --------------------------------------------- <SQUIRREL> =>> Hello world squirrel! <SQUIRREL> =>> myfunc(4) = 68 <SQUIRREL> --- For Loop test ---- i = 0 i = 1 i = 2 i = 3 i = 4 =>> [EXPERIMENT] 2 - Getting variables defined in the code. ----------------------------------------------------------- =>> myvar_width = 100 =>> myvar_float = 122.562 =>> myvar_string = /path/to/interpreter.exe =>> [EXPERIMENT] 3 - Calling functions defined in the script (VM) ------------------------------------------------------------------ =>>> myfunc(4) = 110 <SQUIRREL> I was called by the C++ code =>> [EXPERIMENT] 4 - Calling functions defined in the script (VM) ------------------------------------------------------------------ [SQUIRREL] => Call C++ function some_cpp_fun() [some_cpp_function] => i = 0 [some_cpp_function] => i = 1 [some_cpp_function] => i = 2 [some_cpp_function] => i = 3 [some_cpp_function] => i = 4 [TRACE] call_me() Parameter = <SQUIRREL-INTERPRETER> [SQUIRREL] x <- name = <SQUIRREL-INTERPRETER> =>> [EXPERIMENT] 5 - Calling C++ classes from Squirrel ------------------------------------------------------------------ [ChartXY] Ctor() - I was created!. OK. [ChartXY] draw() => Draw chart with: width = 25 ; height = 10 [SQUIRREL LOG] Function draw_with called. OK. [ChartXY] draw() => Draw chart with: width = 100 ; height = 200 [ChartXY] draw() => Draw chart with: width = 600 ; height = 200
1.7 Duktape - Embeddable Javascript Engine
1.7.1 Overview
- Duktape is a small footprint embeddable Javascript (ECMAScript) engine, written in C, which can be used for providing scripting capabilities for C or C++ applications.
- License: MIT
- Possible Use Cases:
- Configuration
- Data description language
- User plugins
- User extensions
- Scripting for games
- Features:
- Embeddable, portable, compact: can run on platforms with 160kB flash and 64kB RAM
- Built-in debugger
- Built-in regular expression engine
- Minimal, retargetable platform dependencies
- Combined reference counting and mark-and-sweep garbage collection with finalization
- Bytecode dump/load for caching compiled functions
- Distributable includes an optional logging framework, CommonJS-based module loading implementations, etc
Official Website
Official Repository
C++ Binding Libraries
1.7.2 Example project with DukGlue C++ binding library
This following project CMakeLists.txt automatically downloads dukglue binding library and duktape engine sources and builds a C++ demonstration code embedding duktape JavaScript engine.
- DukGlue Binding Library: https://github.com/Aloshi/dukglue
- Advantage:
- Easy to use and lots of examples.
- Drawbacks:
- Lack of namespaces which enhances API discoverability
- Lack of C++ wrappers to some Duktape C-types
- Lack of a CMakeLists.txt at the top directory.
- Advantage:
File: CMakeLists.txt
cmake_minimum_required(VERSION 3.0) project(duktap-embed) include(FetchContent) # Download library archive (zip, *.tar.gz, ...) from URL macro(Download_Library_Url NAME URL) FetchContent_Declare(${NAME} URL ${URL}) FetchContent_GetProperties(${NAME}) if(NOT ${NAME}_POPULATED) FetchContent_Populate(${NAME}) # add_subdirectory(${${NAME}_SOURCE_DIR} ${${NAME}_BINARY_DIR}) endif() endmacro() # ====>> Duktape JavaScript Engine Configuration <<===========# Download_Library_Url(duktape "https://duktape.org/duktape-2.5.0.tar.xz" ) # FetchContent_MakeAvailable(duktape) message( [TRACE] " =>> duktape_SOURCE_DIR = ${duktape_SOURCE_DIR} ") file(GLOB_RECURSE duktape_sources "${duktape_SOURCE_DIR}/src/*.c") file(GLOB_RECURSE duktape_headers "${duktape_SOURCE_DIR}/src/*.h") message( [TRACE] " duktape_sources = ${duktape_sources} ") add_library (duktape ${duktape_sources} ${duktape_headers} ) target_include_directories(duktape PUBLIC ${duktape_SOURCE_DIR}/src ) # ----------- DukGlue Library ----------------------------# FetchContent_Declare( dukglue URL https://github.com/Aloshi/dukglue/archive/master.zip ) FetchContent_MakeAvailable(dukglue) #----- Main Target Definition ----------------------------# add_executable(duktape-embed duktape-embed.cpp) target_link_libraries(duktape-embed duktape dukglue)
File: duktape-embed.cpp
#include <iostream> #include <string> #include <vector> #include <cassert> // Repository: https://github.com/Aloshi/dukglue #include <dukglue/dukglue.h> void print_number(int x) { std::cout << " [TRACE] number passed is = " << x << std::endl; } void log_text(std::string const& text) { std::cout << " =>> [C++-LOG] - " << text << std::endl; } int eval_code(duk_context* ctx, std::string const& code) { return duk_peval_string(ctx, code.c_str()); } void plot_points(std::vector<float> const& points) { std::cout << " =>> [TRACE] Plot points =>> "; for(auto const& x: points) { std::cout << " x = " << x; } std::cout << " \n"; } class Counter { private: std::string m_name; int m_counter; public: // Ctor [1] => Default ctor Counter(): Counter("untitled", 0) { } // Ctor [2] Counter(std::string name, int counter) : m_name{std::move(name)}, m_counter{counter} { std::cout << " [TRACE] Counter created with => { " << " name = " << m_name << " ; counter = " << m_counter << " } \n"; } int getCounter() const { return m_counter; } void setCounter(int n) { m_counter = n; std::cout << " [TRACE] I was set to value " << n << std::endl; } void increment() { m_counter++; std::cout << " [TRACE] increment event =>> counter = { " << m_name << " ; " << m_counter << " } " << std::endl; } }; int main() { // Create Duktape Virtual machine duk_context* ctx = duk_create_heap_default(); /* ========================== EXPERIMENT 1 =============*/ std::puts("\n === [EXPERIMENT 1] ==>> Register and call C++ functions <<===== "); { // Register pointer to functions function (function pointer) in // the JS engine (aka virtual machine) dukglue_register_function(ctx, &print_number, "print_number"); dukglue_register_function(ctx, log_text, "log_text"); dukglue_register_function(ctx, plot_points, "plot_points"); const char* code1 = R"( print_number(10); log_text(" Hello world from Javascript" ); log_text(" Toke is equal to " + 100 ); log_text( " " + 1000 ); plot_points( [ 20.5, 100.23, -125.254, 8.251, 100.0 ]); )"; // Evaluate code, returns false on error auto n = eval_code(ctx, code1); if(n) { std::cerr << " [ERROR] A duktape evaluation error has happened. " << std::endl; } } /* ========================== EXPERIMENT 2 ====================*/ std::puts("\n === [EXPERIMENT 2] ==>> Register and call C++ classes <<===== \n"); { // Register class counter dukglue_register_constructor<Counter>(ctx, "Counter"); dukglue_register_constructor<Counter, std::string, int>(ctx, "Counter"); dukglue_register_method(ctx, &Counter::getCounter , "getCounter"); dukglue_register_method(ctx, &Counter::setCounter , "setCounter"); dukglue_register_method(ctx, &Counter::increment , "increment"); dukglue_register_property(ctx // Pointer to engine (VM) , &Counter::getCounter // Getter , &Counter::setCounter // Setter , "number" // Property name ); int ret = eval_code(ctx, R"( var counter = new Counter("mycounter", 10); for(i = 0 ; i < 5; i++) { counter.increment(); } var n = counter.getCounter(); log_text(" [BEFORE] Counter value = " + n ); counter.setCounter(100); log_text(" [AFTER 1 ] Counter value = " + counter.getCounter() ); counter.number = 400; log_text(" [AFTER 2] Counter value = " + counter.number ); )"); assert( ret == 0 ); } /* ======= Calling Javascript Engine from C++ ====================*/ // Note: It is useful for reading data or user configuration std::puts("\n === [EXPERIMENT 3] ==>> Calling engine objects from C++ <<===== \n"); { const char* code = R"( // Global variables for configuration points = 200; asset_path = "C:\\\\Users\\dummy\\data\\graphics"; function my_js_function(n){ log_text( " <my_js_function> =>> n = " + n ); var k = 20 * n + 100; return k; } )"; eval_code(ctx, code); // Throws error: DukErrorException auto points = dukglue_peval<int>(ctx, "points"); std::cout << " [*] =>> points = " << points << std::endl; // Throws error: DukErrorException auto asset_path = dukglue_peval<std::string>(ctx, "asset_path"); std::cout << " [*] =>> asset_path = " << asset_path << std::endl; auto jsexpr = dukglue_peval<double>(ctx, "3.51 * 10.52 - 8.251 / 100"); std::cout << " [*] jsexpr = " << jsexpr << std::endl; // Call Javascript function from C++ auto func = dukglue_peval<DukValue>(ctx, "my_js_function"); int res = dukglue_pcall<int>(ctx, func, 20); std::cout << " [*] res = " << res << std::endl; } // Release Javascript engine object (aka virtual machine) ::duk_destroy_heap(ctx); return 0; }
Build:
$ cmake -H. -B_build -DCMAKE_BUILD_TYPE=Debug $ cmake --build _build --target
Program output:
$ ./build/duktape-embed
=== [EXPERIMENT 1] ==>> Register and call C++ functions <<=====
[TRACE] number passed is = 10
=>> [C++-LOG] - Hello world from Javascript
=>> [C++-LOG] - Toke is equal to 100
=>> [C++-LOG] - 1000
=>> [TRACE] Plot points =>> x = 20.5 x = 100.23 x = -125.254 x = 8.251 x = 100
=== [EXPERIMENT 2] ==>> Register and call C++ classes <<=====
[TRACE] Counter created with => { name = mycounter ; counter = 10 }
[TRACE] increment event =>> counter = { mycounter ; 11 }
[TRACE] increment event =>> counter = { mycounter ; 12 }
[TRACE] increment event =>> counter = { mycounter ; 13 }
[TRACE] increment event =>> counter = { mycounter ; 14 }
[TRACE] increment event =>> counter = { mycounter ; 15 }
=>> [C++-LOG] - [BEFORE] Counter value = 15
[TRACE] I was set to value 100
=>> [C++-LOG] - [AFTER 1 ] Counter value = 100
[TRACE] I was set to value 400
=>> [C++-LOG] - [AFTER 2] Counter value = 400
=== [EXPERIMENT 3] ==>> Calling engine objects from C++ <<=====
[*] =>> points = 200
[*] =>> asset_path = C:\\Users\dummy\data\graphics
[*] jsexpr = 36.8427
=>> [C++-LOG] - <my_js_function> =>> n = 20
[*] res = 500
1.7.3 Example project with Duktape-CC binding library
- Duktape-CC binding library: https://github.com/stfwi/duktape-cc/
- Benefits
- Namespace
- RAII for duktape C-API
- Javascript common known APIs such as console.log()
- Disadvantage:
- No possible to bind lambda function.
- No possible to bind C++ classes or objects
- No CMakeLists.txt at top directory, which makes the library usage easier, but the following cmake scripts solves this problem.
- Benefits
File: CMakeLists.txt
cmake_minimum_required(VERSION 3.0) project(duktape-cc-trial) set(CMAKE_CXX_STANDARD 17) set(CMAKE_VERBOSE_MAKEFILE ON) include(duktape.cmake) #----- Main Target Definition ----------------------------# add_executable( duktape-script duktape-script.cpp) target_link_libraries( duktape-script duktape-cc )
File: duktape.cmake
include(FetchContent) # Note: the 'add_subriectory' line was commented becuyase # library that will be downloaded does not have # a CMakeListst.txt file at the root directory. macro(Download_Library_Git NAME TAG REPOSITORY_URL) FetchContent_Declare( ${NAME} GIT_REPOSITORY ${REPOSITORY_URL} GIT_TAG ${TAG} ) FetchContent_GetProperties(${NAME}) if(NOT cpputest_POPULATED) FetchContent_Populate(${NAME}) message("${NAME}_SOURCE_DIR} = ${${NAME}_SOURCE_DIR}") # => Disable following line: the library does not have a CMakeLists.txt # at the root directory. # add_subdirectory(${${NAME}_SOURCE_DIR} ${${NAME}_BINARY_DIR}) endif() endmacro() # ====>> Duktape JavaScript Engine Configuration <<===========# Download_Library_Git( duktape-cc 51fed200b0c3353a60fa560aa8a13a480f0ec0c7 https://github.com/stfwi/duktape-cc/ ) file(GLOB_RECURSE duktape_sources "${duktape-cc_SOURCE_DIR}/duktape/*.c") file(GLOB_RECURSE duktape_headers1 "${duktape-cc_SOURCE_DIR}/duktape/*.hh") file(GLOB_RECURSE duktape_headers2 "${duktape-cc_SOURCE_DIR}/duktape/*.h") add_library( duktape-cc ${duktape_sources} ${duktape_headers1} ${duktape_headers2} ) target_include_directories( duktape-cc PUBLIC ${duktape-cc_SOURCE_DIR} )
File: duktape-script.cpp
#include <iostream> #include <string> #include <vector> #include <fstream> #include <duktape/duktape.hh> #include <duktape/mod/mod.stdio.hh> int main() { std::puts(" [TRACE] Program started Ok. "); // Create Duktape Engine object (Virtual Machine) auto ctx = duktape::engine{}; // Load all functions from stdio module // ==> Note: It is necessary for console.log() work duktape::mod::stdio::define_in(ctx); std::puts("\n [EXPERIMENT 1] ======= Evaluate string as code ========"); ctx.eval<void>(R"( console.log(" [INFO] Hello world Javascript Engine "); var i = 0; while(i < 5) { console.log(" [TRACE] <ducktape> i = " + i); i++; } )"); std::puts("\n [EXPERIMENT 2] == Read/write values to the engine the engine ="); // Write or pass values to the engine. ctx.define("app.version", "0.251"); ctx.define("user.points", 1000); ctx.define("array1", std::vector<double>{ 4.51, 9.25, -25.154, 205.2 }); ctx.define("array2", std::vector<std::string>{ "C++", "ADA-Spark", "Rust", "Dlang", "OCaml" }); std::string script_file = "/tmp/myscript.js"; const char* script_code = R"( console.log(" => app.version = " + app.version ); console.log(" => user.points = " + user.points ); console.log(" => array1 = " + array1); console.log(" => array2 = " + array2); myconfig_path = "/Users/data/osx/config"; user_credits = 1020; vector = [100.25, 90.251, -120.5150]; )"; // Write script code to file auto fs = std::ofstream(script_file); // Flush forces writing to the IO fs << script_code << std::flush; // Execute script from file ctx.include(script_file); std::cout << " ---- Read configuration from file " << std::endl; // Throws exception: duktape::detail::basic_script_error<void> auto myconfig_path = ctx.eval<std::string>("myconfig_path"); auto credits = ctx.eval<int>("user_credits"); auto vec = ctx.eval<std::vector<double>>("vector"); std::cout << "\n\n[*] my_config_path = " << myconfig_path << "\n"; std::cout << "[*] user_credits = " << credits << "\n"; std::cout << "[*] vec[0] = " << vec[0] << " ; vec[1] = " << vec[1] << "\n"; return 0; }
Build:
$ cmake -H. -B_build -DCMAKE_BUILD_TYPE=Debug $ cmake --build _build --target
Program output:
$ _build/duktape-script [TRACE] Program started Ok. [EXPERIMENT 1] ======= Evaluate string as code ======== [EXPERIMENT 2] == Read/write values to the engine the engine = [INFO] Hello world Javascript Engine [TRACE] <ducktape> i = 0 [TRACE] <ducktape> i = 1 [TRACE] <ducktape> i = 2 [TRACE] <ducktape> i = 3 [TRACE] <ducktape> i = 4 => app.version = 0.251 => user.points = 1000 => array1 = 4.51,9.25,-25.154,205.2 => array2 = C++,ADA-Spark,Rust,Dlang,OCaml ---- Read configuration from file [*] my_config_path = /Users/data/osx/config [*] user_credits = 1020 [*] vec[0] = 100.25 ; vec[1] = 90.251
1.8 QuickJS - ES20 Javascript Engine
QuickJS is small and lightweight embeddable Javascript engine, written in C, which supports ES2020 technical specification. This engine was created by Fabrice Bellard, creator of many widely used open source projects, namely, QEMU emulator used for emulation of operating systems and embedded systems hardware; FFmpeg tool for video and audio conversion; TCC (Tiny C Compiler). Some features supported by the engine are: modules, proxies, BigInt and asynchronous generators.
- QuickJS Web Site:
- QuickJS Repository:
- QuickJSpp C++ Wrapper Repository (License: CC0)
See also:
Sample CMake project
The following sample CMake projects demonstrates how to embed QuickJS engine in a C++ code by using the QuickJSpp C++ wrapper. The CMake script (CMakeLists.txt) downloads the QuickJS source code from its repository and creates static library target for the JavaScript engine which is then linked against the sample application embedding QuickJS. This CMakeLists.txt script fully automates all steps, which relieves the library user from installing QuickJS manually by using GNU make, which is the original building system used by the engine.
GIST Containing all sources:
File: CMakeLists.txt
cmake_minimum_required(VERSION 3.9) project(QuickJS-Experiment) #========== Global Configurations =============# #----------------------------------------------# set( CMAKE_CXX_STANDARD 17 ) set( CMAKE_VERBOSE_MAKEFILE ON ) set( CMAKE_CXX_EXTENSIONS OFF) # ------------ Download CPM CMake Script ----------------# ## Automatically donwload and use module CPM.cmake file(DOWNLOAD https://raw.githubusercontent.com/TheLartians/CPM.cmake/v0.26.2/cmake/CPM.cmake "${CMAKE_BINARY_DIR}/CPM.cmake") include("${CMAKE_BINARY_DIR}/CPM.cmake") #----------- Add dependencies --------------------------# CPMAddPackage( NAME quickjs GITHUB_REPOSITORY bellard/quickjs GIT_TAG 204682fb87ab9312f0cf81f959ecd181180457bc # DOWNLOAD_ONLY YES ) # Add this directory where is this file (CMakeLists.txt) to include path. include_directories( ${CMAKE_CURRENT_LIST_DIR} ) # =============== QuickJS settings ====================================# include_directories( ${quickjs_SOURCE_DIR}/ ) message([TRACE] " quickjs source = ${quickjs_SOURCE_DIR} ") file(GLOB quickjs_hpp ${quickjs_SOURCE_DIR}/*.h ) file(GLOB quickjs_src ${quickjs_SOURCE_DIR}/quickjs.c ${quickjs_SOURCE_DIR}/libregexp.c ${quickjs_SOURCE_DIR}/libunicode.c ${quickjs_SOURCE_DIR}/cutils.c ${quickjs_SOURCE_DIR}/quickjs-libc.c ${quickjs_SOURCE_DIR}/libbf.c ) add_library( qjs-engine ${quickjs_src} ${quickjs_hpp} ) target_compile_options( qjs-engine PRIVATE -MMD -MF -Wno-sign-compare -Wno-missing-field-initializers -Wundef -Wuninitialized -Wundef -Wuninitialized -Wwrite-strings -Wchar-subscripts ) target_compile_definitions( qjs-engine PUBLIC CONFIG_BIGNUM=y CONFIG_VERSION="2020-11-08" _GNU_SOURCE ) if(UNIX) target_link_libraries( qjs-engine PRIVATE m pthread dl) endif() # =========== Target Settings =========================================# # QuickJS compiler. add_executable( qjsc ${quickjs_SOURCE_DIR}/qjsc.c ) target_compile_definitions( qjsc PUBLIC CONFIG_BIGNUM=y CONFIG_VERSION="2020-11-08" _GNU_SOURCE ) target_link_libraries( qjsc qjs-engine ) # Sample application that embeds the quickJS Javascript engine. add_executable( main main.cpp ) target_link_libraries( main qjs-engine )
File: main.cpp
#include <iostream> #include <quickjspp.hpp> class ChartXY { private: double x = 0.0, y = 0.0; double width = 100.0, height = 100.0; public: ChartXY() { } ChartXY(double w, double h): width(w), height(h) { } void show() const { std::cout << " [ĆhartXY Object] x = " << x << " ; y = " << y << " ; width = " << width << " height = " << height << '\n'; } void set_width(double width) { this->width = width; std::fprintf(stdout, " [ChartXY] Width set to %f \n", width); } void set_height(double height) { this->height = height; std::fprintf(stdout, " [ChartXY] Height set to %f \n", height); } double get_height() const { return this->height; } double get_width () const { return this->width; } void plot_points(std::vector<double> const& points) { std::cout << " [ChartXY] Plotting points =>> "; for(auto p : points) { std::cout << " " << p; } std::cout << "\n"; } }; qjs::Value try_eval_module( qjs::Context& context , qjs::Runtime& runtime , std::string const& code) { try { return context.eval(code, "<eval>", JS_EVAL_TYPE_MODULE); } catch( const qjs::exception& ex) { //js_std_dump_error(ctx); auto exc = context.getException(); std::cerr << (exc.isError() ? "Error: " : "Throw: ") << (std::string)exc << std::endl; if((bool)exc["stack"]) std::cerr << (std::string)exc["stack"] << std::endl; js_std_free_handlers(runtime.rt); return context.newObject(); } } int main(int argc, char** argv) { std::cout << " [INFO] Started Ok" << std::endl; using namespace qjs; Runtime runtime; //JSRuntime* rt = runtime.rt; Context context(runtime); //JSContext* ctx = context.ctx; js_std_init_handlers(runtime.rt); /* loader for ES6 modules */ JS_SetModuleLoaderFunc(runtime.rt, nullptr, js_module_loader, nullptr); js_std_add_helpers(context.ctx, argc - 1, argv + 1); /* system modules */ js_init_module_std(context.ctx, "std"); js_init_module_os(context.ctx, "os"); std::fprintf(stderr, " [TRACE] Before loading code. \n"); const char* str = R"( /* import * as std from 'std'; import * as os from 'os'; globalThis.std = std; globalThis.os = os; */ console.log(" [QUICJS] => =>> Script loaded. Ok. \n"); for(n = 1; n <= 5; n++){ console.log(` [QUICKJS-TRACE] n = ${n}/5 `); } // ----- Define user variables here ---- asset_path = "/Users/mydir-macosx/data/blackjack.txt"; game_score = 0.25156; let x = 10.352; datapoints = [ 0.251, 19.2363, 9.262, 100.125 ]; console.log(`\n [QUICKJS] asset_path = ${asset_path}` ); console.log(` [QUICKJS] score = ${100.0 * game_score} (in percent) \n`); console.log(` [QUICKJS] data points = ${datapoints} `) )"; try { context.eval(str); //, "", JS_EVAL_TYPE_MODULE); } catch( const qjs::exception& ex) { //js_std_dump_error(ctx); auto exc = context.getException(); std::cerr << (exc.isError() ? "Error: " : "Throw: ") << (std::string)exc << std::endl; if((bool)exc["stack"]) std::cerr << (std::string)exc["stack"] << std::endl; js_std_free_handlers(runtime.rt); return 1; } std::fprintf(stderr, " [TRACE] After loading code. \n"); int number = (int) context.eval(" 10 * (3 + 1 + 10 ) - 1000 * 2"); std::cout << " [RESULT] number = " << number << '\n'; std::puts("\n [*] ===== Read configuration variables defined in the js code. ====\n"); { auto var_asset_path = context.global()["asset_path"].as<std::string>(); std::cout << " =>> asset_path = " << var_asset_path << '\n'; auto score = context.global()["game_score"].as<double>(); std::cout << " =>> game_score (%) = " << 100.0 * score << '\n'; auto points = context.global()["datapoints"].as<std::vector<double>>(); std::cout << " ==>> datapoints = [" << points.size() << "]( "; for(auto p : points) { std::cout << p << ' '; } std::cout << " ) \n"; } std::puts("\n [*] ===== Define variables in C++-side ====\n"); { context.global()["user_name"] = context.newValue("Gaius Julius Caesar"); context.global()["user_points"] = context.newValue(101235); auto data = std::vector<std::string>{ "ADA", "RUST", "C++11", "C++17", "C++20" , "Dlang", "OCaml", "C#(Csharp)" }; context.global()["user_data"] = context.newValue(data); // Note: This code should be within an exception handler. context.eval(R"( console.log(` [STEP 2] user_name = ${user_name} ; points = ${user_points} `); console.log(` [STEP 2] user_data = ${user_data} ; type = ${ typeof(user_data) } `); console.log(` [STEP 2] user_data[5] = ${ user_data[5] } `) // Iterate over the array for(let x in user_data){ console.log(user_data[x]); } )"); } std::puts("\n [*] ===== Register class ChartXY ====\n"); auto& module = context.addModule("chart"); module.class_<ChartXY>("ChartXY") .constructor() .constructor<double, double>() .fun<&ChartXY::show>("show") .fun<&ChartXY::set_height>("set_height") .fun<&ChartXY::set_width>("set_width") .fun<&ChartXY::plot_points>("plot_points") .property<&ChartXY::get_width, &ChartXY::set_width>("width") .property<&ChartXY::get_height, &ChartXY::set_height>("height") ; module.add("user_path", "/Users/data/assets/game/score/marks"); module.add("user_points", 1023523); module.function("myfunc", [](double x, double y){ return 4.61 * x + 10 * y * y; }); const char* module_code = R"( import { ChartXY } from "chart"; import * as chart from "chart" console.log(` [SCRIPT] chart.user_path = ${chart.user_path} \n\n`); console.log(` [SCRIPT] chart.user_points = ${chart.user_points} \n\n`); console.log(` [SCRIPT] Result = ${ chart.myfunc(5.61, 9.821) } \n`); let ch = new ChartXY(200, 600); ch.show(); ch.set_width(800.0); ch.set_height(700.0) ch.show(); console.log(" [QUICKJS] Change chart dimensions using properties "); ch.width = 500; ch.height = 660; console.log(`\n <QUICKJS> Chart width = ${ch.width} ; Chart height = ${ch.height} \n`); ch.plot_points( [ 10.522, 8.261, -100.24, 7.2532, 56.123, 89.23 ] ); )"; try_eval_module(context, runtime, module_code); js_std_loop(context.ctx); // ----- Shutdown virtual machine ---------------// js_std_free_handlers(runtime.rt); return 0; }
Building and Running
Download sources:
$ git clone https://gist.github.com/1abdd1d36cd3e973cd1f11f5c20ef7eb qqjs && cd qqjs
$ ls
CMakeLists.txt main.cpp quickjspp.hpp
Building:
$ cmake -H. -B_build -DCMAKE_BUILD_TYPE=Debug $ cmake --build _build --target
Running qjsc (QuickJS - transpiler or C code generator)
$ _build/qjsc QuickJS Compiler version 2020-11-08 usage: qjsc [options] [files] options are: -c only output bytecode in a C file -e output main() and bytecode in a C file (default = executable output) -o output set the output filename -N cname set the C name of the generated data -m compile as Javascript module (default=autodetect) -D module_name compile a dynamically loaded module or worker -M module_name[,cname] add initialization code for an external C module -x byte swapped output -p prefix set the prefix of the generated C names -S n set the maximum stack size to 'n' bytes (default=262144)
Running main application, which embeds QuickJS JS engine:
$ >> _build/main [INFO] Started Ok [TRACE] Before loading code. [QUICJS] => =>> Script loaded. Ok. [QUICKJS-TRACE] n = 1/5 [QUICKJS-TRACE] n = 2/5 [QUICKJS-TRACE] n = 3/5 [QUICKJS-TRACE] n = 4/5 [QUICKJS-TRACE] n = 5/5 [QUICKJS] asset_path = /Users/mydir-macosx/data/blackjack.txt [QUICKJS] score = 25.156 (in percent) [QUICKJS] data points = 0.251,19.2363,9.262,100.125 [TRACE] After loading code. [RESULT] number = -1860 [*] ===== Read configuration variables defined in the js code. ==== =>> asset_path = /Users/mydir-macosx/data/blackjack.txt =>> game_score (%) = 25.156 ==>> datapoints = [4]( 0.251 19.2363 9.262 100.125 ) [*] ===== Define variables in C++-side ==== [STEP 2] user_name = Gaius Julius Caesar ; points = 101235 [STEP 2] user_data = ADA,RUST,C++11,C++17,C++20,Dlang,OCaml,C#(Csharp) ; type = object [STEP 2] user_data[5] = Dlang ADA RUST C++11 C++17 C++20 Dlang OCaml C#(Csharp) [*] ===== Register class ChartXY ==== [SCRIPT] chart.user_path = /Users/data/assets/game/score/marks [SCRIPT] chart.user_points = 1023523 [SCRIPT] Result = 990.3825099999999 [ĆhartXY Object] x = 0 ; y = 0 ; width = 200 height = 600 [ChartXY] Width set to 800.000000 [ChartXY] Height set to 700.000000 [ĆhartXY Object] x = 0 ; y = 0 ; width = 800 height = 700 [QUICKJS] Change chart dimensions using properties [ChartXY] Width set to 500.000000 [ChartXY] Height set to 660.000000 <QUICKJS> Chart width = 500 ; Chart height = 660 [ChartXY] Plotting points =>> 10.522 8.261 -100.24 7.2532 56.123 89.23
1.9 Chaiscript
Scripting engine available as a header-only library that has Javascript-like syntax and easy integration to C++ codebases.
Drawbacks:
- Header-only => Slow compile-time and large executable size due to the intesive use of templates. Extern templates C++ language feature could reduce the compile time.
Repository:
Web site:
Files
File: CMakeLists.txt
cmake_minimum_required(VERSION 3.0) project(chaiscript-eval) set(CMAKE_CXX_STANDARD 17) set(CMAKE_VERBOSE_MAKEFILE ON) include( FetchContent ) set( BUILD_SAMPLES OFF CACHE BOOL "") set( BUILD_SAMPLES OFF CACHE BOOL "") set( RUN_FUZZY_TESTS OFF CACHE BOOL "") set( RUN_PERFORMANCE_TESTS OFF CACHE BOOL "") FetchContent_Declare( chaiscript GIT_REPOSITORY https://github.com/ChaiScript/ChaiScript/ GIT_TAG v6.1.0 ) FetchContent_MakeAvailable(chaiscript) include_directories( chaiscript-runtime PUBLIC ${chaiscript_SOURCE_DIR}/include ) add_executable( runner chaiscript-eval.cpp) if(UNIX) target_link_libraries( runner PUBLIC pthread dl ) endif()
File: chaiscript-eval.cpp
#include <iostream> #include <string> #include <chaiscript/chaiscript.hpp> void scriptable_function(const std::string& label, int w) { std::cout << "\n [CALLED] label = " << label << " ; w = " << w << '\n'; } class Robot { std::string name; float x = 0, y = 0; public: Robot(){ } Robot(float x, float y){} void setPosition(float x, float y) { this->x = x; this->y = y; std::fprintf(stderr, " [INFO] Robot moved to x = %f ; y = %f \n", x, y); } void showPosition() { std::fprintf(stderr, " [INFO] Robot position (x = %f, y = %f ) \n", x, y); } }; int main() { // Create script engine object chaiscript::ChaiScript chai; // Register user function chai.add( chaiscript::fun(&scriptable_function) , "scriptable_function"); chai.add( chaiscript::constructor<Robot()>(), "Robot" ); chai.add( chaiscript::fun(&Robot::showPosition), "showPosition"); chai.add( chaiscript::fun(&Robot::setPosition), "setPosition" ); const char* code = R"( // It supports C++-like syntax for(var i = 0; i < 5; ++i) { print(i); } puts(" ========= Line ================= \n"); scriptable_function("Moon", 200); scriptable_function("Mars", 500); var robot = Robot(); robot.setPosition(200, 400); robot.showPosition(); // User configuration function will be called // by the script engine. def on_init_hook() { puts("\n [TRACE] User function called. Ok."); } )"; // Attempt to evaluate code try { chai.eval(code); } catch(chaiscript::exception::eval_error const& ex) { std::cout << " [TRACE] Exception = " << ex.what() << std::endl; } std::puts(" ==== Get robot object from script ========="); auto robot = chai.eval<std::shared_ptr<Robot>>("robot"); robot->showPosition(); robot->setPosition(400, 1000); } // ---- End of main() ----------//
Check Executable
$ du -h build/runner 23M build/runner # Remove debugging symbols $ strip build/runner $ du -h build/runner 7.6M build/runner $ file build/runner build/runner: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
Running
$ ./runner 0 1 2 3 4 ========= Line ================= [CALLED] label = Moon ; w = 200 [CALLED] label = Mars ; w = 500 [INFO] Robot moved to x = 200.000000 ; y = 400.000000 [INFO] Robot position (x = 200.000000, y = 400.000000 ) ==== Get robot object from script ========= [INFO] Robot position (x = 200.000000, y = 400.000000 ) [INFO] Robot moved to x = 400.000000 ; y = 1000.000000
1.10 Python Engine via Pybind11
Documentation:
Advantages:
- High popularity
- Lost of libraries
- Easy usage
Drawbacks:
- Hard to static link
- Not ligthweight and designed to be embedded as Lua.
- It is not possible to run multiple instances of the Python interpreter.
- It is not possible to sandbox the interpreter and restrict accessing files or process manipulation APIs.
- Requires pre-installation of Python development headers. But, it can be mitigated by using Anaconda or miniconda Python distributions.
Known Cases:
- GDB - GNU Debugger
- IDA Debugger
- Sublime Text Editor
Sample Project
GIST containing the sources:
File: CMakeLists.txt
cmake_minimum_required(VERSION 3.9) project(embed-python-scripting) #========== Global Configurations =============# #----------------------------------------------# set(CMAKE_CXX_STANDARD 17) set(CMAKE_VERBOSE_MAKEFILE ON) set(CMAKE_CXX_EXTENSIONS OFF) # ------------ Download CPM CMake Script ----------------# ## Automatically donwload and use module CPM.cmake file(DOWNLOAD https://raw.githubusercontent.com/TheLartians/CPM.cmake/v0.26.2/cmake/CPM.cmake "${CMAKE_BINARY_DIR}/CPM.cmake") include("${CMAKE_BINARY_DIR}/CPM.cmake") #----------- Add dependencies --------------------------# find_package(PythonLibs REQUIRED) CPMAddPackage( NAME pybind11 URL https://github.com/pybind/pybind11/archive/v2.5.zip DOWNLOAD_ONLY true ) # add_subdirectory( {pybind11_SOURCE_DIR} ) include_directories( ${pybind11_SOURCE_DIR}/include ${PYTHON_INCLUDE_PATH} ) message( [TRACE] " pybind11_SOURCE_DIR = ${pybind11_SOURCE_DIR} ") # configure_file(script.py ${CMAKE_BINARY_DIR} COPYONLY ) #----------- Set targets -------------------------------# add_executable(app1 app1.cpp) target_link_libraries( app1 ${PYTHON_LIBRARIES} ) add_custom_command( TARGET app1 POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_SOURCE_DIR}/script.py ${CMAKE_CURRENT_BINARY_DIR}/script.py)
File: app.cpp
#include <iostream> #include <string> #include <sstream> #include <fstream> #include <cassert> #include <vector> #include <pybind11/embed.h> namespace py = pybind11; // Requires: <string>, <stream>, <sstream> std::string readFile(std::string const& file) { auto is = std::ifstream(file); if( !is.good() ){ throw std::runtime_error("Error: stream has errors."); } std::stringstream ss; ss << is.rdbuf(); return ss.str(); } int main(int argc, char** argv) { extern std::string pycode; auto guard = py::scoped_interpreter{}; // ------ EXPERIMENT 1 ------------------------------------// std::puts(" [EXPERIMENT 1] ===== Execute Python code ======================\n"); auto code = readFile("./script.py"); // std::cout << " code = " << code << "\n"; auto g = py::globals(); // Define global variables for the interpreter global scope g["game_assets"] = "/Users/myuser/game_assets"; g["speed"] = 20.151; g["z_value"] = 100; // Evaluate Python code try { py::exec( code, g ); } catch(pybind11::error_already_set const& ex) { std::cerr << " [PYBIND11 ERROR] " << ex.what() << std::endl; return EXIT_FAILURE; } // ------ EXPERIMENT 2 ------------------------------------// std::puts("\n [EXPERIMENT 2] == Read user defined configuration variables ===\n"); auto v_x = g["x"].cast<double>(); auto v_path = g["path"].cast<std::string>(); std::cout << " [*] v_x = " << v_x << std::endl; std::cout << " [*] v_path = " << v_path << std::endl; return EXIT_SUCCESS; } // ----- Internal Python Embedded Module ---------------------------// const char* version() { return "SampleModule Version 3.451-ZETA"; } // Sample "function-object class" class LinearFunctor { public: double A = 0, B = 0; LinearFunctor(); LinearFunctor(double a, double b): A(a), B(b){ } double GetA() const { return A; } void SetA(double a) { A = a; } double GetB() const { return B; } void SetB(double b) { B = b; } void show() const { std::cout << " LinearFunction: y(x) = A * x + B" << std::endl; std::cout << " => A = " << this->A << " ; B = " << this->B << std::endl; } std::string toString() const { std::stringstream ss; ss << " LinearFunction: y(x) = A * x + B" << std::endl; ss << " => A = " << this->A << " ; B = " << this->B << std::endl; return ss.str(); } // Function-call operator double operator()(double x) { return A * x + B; } }; // ---- Internal Module -----------------------// PYBIND11_EMBEDDED_MODULE(SampleModule, m) { // optional module docstring m.doc() = "Sample Python built with C++ CeePlusPlus "; m.def("version", &version, "Show Library Version"); m.def("cppLambda" ,[](double x, double y){ return 3.0 * x + y;} ,"A C++ lambda object or functor" //,py::arg("x"), py::args("y") = 15 ); // Register LinearFunction py::class_<LinearFunctor>(m, "LinearFunctor") .def(py::init<double, double>()) // Register overloaded consructor .def("GetA", &LinearFunctor::GetA) // Reister method GetA() .def("GetB", &LinearFunctor::GetB) // Register method GetB() .def("SetA", &LinearFunctor::SetA) // Reister method GetA() .def("SetB", &LinearFunctor::SetB) // Register method GetB() .def("show", &LinearFunctor::show) // Register method show .def("call", &LinearFunctor::operator()) // Register function-call operator with name 'call' .def("__call__", &LinearFunctor::operator ()) // Register fun-call operator .def("__repr__", &LinearFunctor::toString) // Register strin representation .def_readwrite("A", &LinearFunctor::A) // Register field A .def_readwrite("B", &LinearFunctor::B); // Register field B } /** --- End of PYBIND11_MODULE registration --- */
File: script.py
print(" => game_assets = ", game_assets) print(" => speed = ", speed) print(" => z_value = ", z_value) x: float = 10.0 * 20.51 / 200 path = "C:\\\\Users\\dummy\\Documents\\data" print(" [PYTHON] The value of x = ", x) for i in range(5): print(" [PYTHON] i = ", i) # It is not possible to restrict the interpreter! import os print(" [*] Current path = ", os.getcwd() ) print("\n ------------------------------------------") print("\n =>>> Test Python Internal Module (C++) <<=\n") import SampleModule as m from SampleModule import LinearFunctor print(f" -> Module Information = [{m.__doc__}] ") print( " -> Module Version = ", m.version()) print( " -> m.cppLambda(100, 25) = ", m.cppLambda(100, 25) ) functor = LinearFunctor(8.0, -10.0) print(f"\n C++ Functor -> ${functor} ") print(" functor(5.0) = ", functor(5.0)) print(" functor(8.0) = ", functor(8.0))
Build and running:
$ cd /tmp $ git clone https://gist.github.com/d7fda02034757374a0b0114e54c7daff python-embed-script $ cd python-embed-script $ cmake -H. -B_build -DCMAKE_BUILD_TYPE=Debug $ cmake --build _build --target
Program output:
$ _build/app1
[EXPERIMENT 1] ===== Execute Python code ======================
=> game_assets = /Users/myuser/game_assets
=> speed = 20.151
=> z_value = 100
[PYTHON] The value of x = 1.0255
[PYTHON] i = 0
[PYTHON] i = 1
[PYTHON] i = 2
[PYTHON] i = 3
[PYTHON] i = 4
[*] Current path = /home/mxpkf8/temp-projects/python-embed-script
------------------------------------------
=>>> Test Python Internal Module (C++) <<=
-> Module Information = [Sample Python built with C++ CeePlusPlus ]
-> Module Version = SampleModule Version 3.451-ZETA
-> m.cppLambda(100, 25) = 325.0
C++ Functor -> $ LinearFunction: y(x) = A * x + B
=> A = 8 ; B = -10
functor(5.0) = 30.0
functor(8.0) = 54.0
[EXPERIMENT 2] == Read user defined configuration variables ===
[*] v_x = 1.0255
[*] v_path = C:\\Users\dummy\Documents\data
1.11 Scheme-like lisp interpreter
1.11.1 Overview
Scheme is a simple dialect of lisp with several functional-programming features. Functions are first class citizens, they can be passed as argument to other functions and returned from functions. Everything is an expression and evaluates to something, even assignments and if-else statements. Some versions of scheme also feature tail-call optimization that allow writing tail-recursive functions to be converted to loop without creating excessive stack frames.
This section presents a lisp-like interpreter based on Scheme, implemented using modern C++ features and object oriented design patterns with a handwritten parser and lexer. This lisp interpreter is based on lispy, a lisp interpreter written in Python. Just like lispy, this implementation is easy to understand and can be easily be embedded in C++ applications for adding scripting capabilities or ursing S-expressions as a DSL (Domain-Specific Language) or data description language.
Techniques used in this implementation:
- Smart pointers are used for memory managment.
- Composite design pattern for representing the AST
- Visitor design pattern is used for traversing the AST and implementing AST printing. In a functional programming language with proper sum types (also known as variants or discriminated unions), the visitor could be replaced by pattern matching.
- Inheritance is used for representing sum types or discriminated unitons, common in functional programming languages. The AST (Abstract Syntax Tree) nodes, including atoms and lists, are represented by classes inheriting from IEXpr abstract class.
- The interpreter uses direct recursive evaluation of the AST just like the inspiration lispy.
- Lists are not implemented using linked lists, instead they are implemented using std::vector container.
- Besides the recursive abstract syntax tree evaluation implementation, the interpreter could also be implemented using a SECD (stack-environment-control-dump) virtual machine.
- This implementation uses the fn keyword instead of lambda; def instead of define for creating named functions; and set instead of define for defining variables.
Possible implementation techniques of Lisp-like languages:
- Recursive AST (Abstract Syntax Tree) evaluation
- SECD (stack-environment-control-dump) virtual/abstract machine
- Meta-circular evaluator => Implement lisp-like dialects on top of an existing lisp implementation such as Common Lisp.
- Compile lisp dialect to bytecodes of an existing virtual machine including, JVM (Java Virtual Machine); CLR (Common Language Runtime) - .NET virtual machine; parrot virtual machine or WASM - Web Assembly.
- JIT (Just-In-Time) compiler => Compile itself to machine code at runtime for improving performance and speed. This technique is used by several common lisp implementations.
- Transpile S-expressions to another language, including C or C++.
Lisp terminology:
- Sexp - S-Expression
- Stands for symbolic expressions
- Homoiconicity
- Code is data. The lisp code represents the AST (Abstract Syntax Tree) that is just nested lists of lists or atoms.
- atoms
- Everything that is not a list is an atom. An indivisible value, including strings, symbols, keywords, numbers and so on.
- car - comes from 'Contents of the Address part of the Register'
- cdr - comes from 'Contents of the Decrement part of the Register'
- cons - abbreviation of the word 'construct'
- Lisp 1 =>> There is a single namespace for variables and functions. Example: Scheme variants.
- Lisp 2 =>> There are separate namespaces for variables and functions. Example: Common Lisp and Elisp - Emacs' lisp.
- Special forms
- They are forms or primitives not evaluated as functions, instead arguments may not be evaluated at all. Most special forms are primitives or control structures such as 'if', 'set', 'define', 'lambda' and so on. For instance the 'if' special form, that has three arguments, the first argument is a predicate, the second argument is the action that happens when the predicate is true and the third argument is the action that happens when the predicate is false. If the predicate is true, the second argument is not evaluated and the if expression is evaluated to the value of the first argument. If the predicate is false, the first argument is not evaluated and the if expression is evaluated to the value of the second argument.
- DSL - Domain Specific Lanaguage
- (if <PREDICATE> <THEN-EXPR> <ELSE-EXPR> )
- The <THEN-EXPR> or then-expression is evaluated if the predicate evaulates to true that happens when the predicate does not evaluate to #f or nil. If the predicate is evaluated to any other value, the else-expression is evaluated.
- User-defined procedures
- User-defined functions using the keyword fn (equivalent to lambda) or def equivalent to Scheme's define keyword.
- Primitive procedures
- Built-in functions or procedures, for instance (+), (*), apply, map, list and so on.
- Literals
- Boolean literals => #t or #f - in this implementation
- nil literal: nil => For designating empty return value.
- symbol literal: 'a-symbol
- keyword literal: :keyword1, :keyword2
- String literal - between quotes: "my string"
Known uses of Lisp-like languages as configuration language, scripting language, embedded scripting language or extension language:
- Emacs text editor - uses its own Lisp dialect ELisp as a configuration language and scripting language. Note: Elisp is older than Scheme and Common Lisp.
- GNU Guix functional package manager that uses GNU GUile as a configuration and scripting language. GNU Guix solves the Linux dependency hell problem by allowing reproducible installation of any package. This package manager also allows multiple versions of the same software to cohexis.
- Autocad® from Autodesk - CAD (Computer Aided Design) software for engineering drawing that uses Autolisp, its own lisp dialect based on common lisp, as an embedded scripting language.
- GNU Gimp - Image editor that uses TinyScheme as embedded scripting language.
- Apple Sandbox - Uses TinyScheme as configuration language.
C++ Features useful for implementation of lexers, AST (Abstract Syntax Trees) and interpreters:
- std::shared_ptr
- std::basic_istream and std::istream
- std::getline()
- std::vector container
- std::string container
- std::map container
- std::function container
- Enum class (scoped enumeration)
- C++ Core Guidelines: Rules for Enumerations - ModernesCpp.com
- c++ - Why is enum class preferred over plain enum? - Stack Overflow
- std::static_pointer_cast (downcasting shared pointers)
- std::make_shared()
- std::enable_shared_from_this - cppreference.com
- Solution for Multiple enable_shared_from_this in Inheritance Tree - CodeProject
- Using enabled_shared_from_this. enable_shared_from_this allows to…
- Remarks on enable_shared_from_this - Christian Aichinger's posts
Typical AST - Abstract Syntax Tree of Lisp/Scheme implementations:
Expr - Expression | | +--------------+-------------------------+ | | Cons(Expr, Expr) Atom - anything that is not a pair (cons cell) (cons pair, | cons linked list +--------+--------+---+------+-----------+---------+--------------+ or cons cell ) | | | | | | | Nil Number String Boolean Symbol Keyword Function literal literal Literal Literal | -------------+------------- | | Native Lisp Function Function implemented in Lisp data Expr = -- Cons cell Cons Expr, Expr -- Nil literal - represents an empty value | Nil -- Boolean literal | Bool Bool -- Number literal | Num Double -- String literal | Str String -- Symbol | Sym String -- keyword | Key String -- Native function written in the hosting language | NativeFun ([Expr] -> Expr) -- Function written in Lisp | LispFunc (environment: Dict<String,Expr>, args: <String>, name: <String>, call: [Expr] -> Expr) Example =>> The pair or cons cell '( hello . 100) is reprented as: Cons( Sym("hello"), Num(100)) Example ->> The list (1 hello "world") is represented as a linked list in the following format: Cons( Num(1), Cons( Sym("Hello"), Cons( Str("hello"), Nil )))
AST - Abstract Syntax Tree used in this implementation:
Expression | | +-----------+------------------------+ | | | | List of Expr Atom - anything that is not a list | +-------+---------+------+--+---+--------------+----------+ | | | | | | | Nil Number String Boolean Symbol Keyword Function Literal Literal Literal Literal :keyword object #t (true) | #f (false) +----------+---------------+ | | Native Function Lisp Function written in C or C++ written in Lisp or in the hosting language Haskell-like notation: data Expr = -- List data structure - can be represented by a linked list or vector List [Expr] -- Nil literal - represents an empty value | Nil -- Boolean literal | Bool Bool -- Number literal | Num Double -- String literal | Str String -- Symbol | Sym String -- keyword | Key String -- Native function written in the hosting language | NativeFun ([Expr] -> Expr) -- Function written in Lisp | LispFunc (environment: Dict<String,Expr>, args: <String>, name: <String>, call: [Expr] -> Expr)
EBNF Grammar:
sexp: atom | list ; list: "(" sexp* ")" ; atom: SYMBOL | KEYWORD // Example: :x, :keyword | NUMBER | STRING | BOOLEAN | NIL ; NIL: "nil"; BOOLEAN: "true" | "false";
1.11.2 Code
File: code.lisp - Sample Lisp scripting code
; Lisp test code ;-------------------------------- (comment "Comment special form is evaluated to nil and discarded.") ;; Create a sample list (set mylist (list 10 #t #f "hello world" 'hello 'world ; Operation 1 :keyword1 (+ 10 25 6) '(+ 10 25 6) ; Operation 2 :keyword2 (* 4 2 6) '(* 4 2 6) )) ; Factorial function (set code-fact " ; Factorial function (def fact (n) (if (= n 1) 1 ;; Base case (* n (fact (- n 1))) ;; Recursion case )) ") ;; S-expression for fibbonaci function (def fib(n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)) ) )) (def make-adder (k) (fn (x) (+ k x))) ;; Define a function that adds 10 to a number (set add10 (make-adder 10)) ;; Defines a function that adds 10 to a number (set add50 (make-adder 50)) (set func (let ( (a (sqrt 125)) (b (+ 20 a)) (c (* a b)) ) (comment "Create a function using lexical scope. The variable (a, b and c) are not visible outside the function. ") (fn (x) (/ (+ a b x) c)) ) ) ;; Compute many trigonometric properties of an angle (def compute-trig (angle-degrees) (let ( (angle (/ (* angle-degrees PI) 180) ) ) (list :angle angle-degrees :cos (cos angle) :sin (sin angle) :tan (tan angle)) ) )
File: gui-dsl.lisp - Hypothetical configuration file written in Lisp that represents a html-like user interface.
;; Theoretical GUI DSL (Domain Specific Language) (div :class "myclass-csss-style" :bgcolor "#ff81a" ;; <p>Paragraph 1</p> (p "Paragraph 1 ") ;; <img src="http://..../myimage.png" /> (img :src "http://www.mydomain/images/myimage.png" :caption "Image 1 data") ;; <button id="btn-submit" class="my-style-button" >Submit</button> (button :label "Submit" :id "btn-submit" :class "my-style-button" :onclick (fn (event) (display "Button was clicked") )) )
File: cpplisp.cpp (Lisp-like interpreter) - about 1600 lines of code.
#include <iostream> #include <string> #include <sstream> #include <fstream> #include <cctype> #include <optional> #include <cassert> #include <memory> #include <vector> #include <map> #include <iomanip> #include <functional> #include <cmath> #include <stack> enum class TokenType { SYM // Symbol , KEYW // Keyword Examples => :keyword, :x , STR // String literal , NUM // Number literal , BOOL // Boolean , QUOTE // Quote , RPAREN // '(' Right parenthesis , LPAREN // ')' Left parenthesis , EOFF // End of File (It was named 'EOFF' since it conflicts with #define 'EOF') , ERR // Indicates error }; struct Token { TokenType type; std::string text; int pos; int lin; int col; Token(): type(TokenType::ERR), text(""), pos(0), lin(0), col(0) {} Token(TokenType type_, std::string const& text_, int pos_, int lin_, int col_): type(type_), text(text_), pos(pos_), lin(lin_), col(col_) { } bool isEOF() { return type == TokenType::EOFF; } }; bool isNumber(std::string const& text) { if( text[0] != '-' && !std::isdigit(text[0]) ) { return false; } auto n = text.length(); for(size_t k = 1; k < n; k++){ if( !std::isdigit( text[k] ) ){ return false;} } return true; } class Tokenizer { std::istream& _is; // Position of cursor in the inout stream int _pos = 0; // Current line in the input stream int _lin = 0; // Column in the input stream int _col = 0; public: Tokenizer(std::istream& is): _is(is){ } // Read a single character and consumes it from the stream char next() { char chr = _is.get(); _pos++; _col++; // Assumes that the new line character is just '\n' if( chr == '\n' ){ _col = 0; _lin++; } return chr; } // Read next character from input string without reading it char peek(){ return _is.peek(); } bool isBlankChar(){ char ch = peek(); return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n'; } bool isParenthesis(){ char ch = peek(); return ch == '(' || ch == ')'; } // Is end of line bool isEOL(){ return peek() == '\n'; } bool isEOF(){ return _is.eof(); } void skipBlankChars() { while( isBlankChar() ){ next(); } } void skipLineComment(){ char ch = peek(); while( ch != '\n' && !this->isEOF() ){ ch = next(); } } // Consume stream until a blank character is found std::string consumeUntilBlank() { // std::fprintf(stderr, " [TRACE] Consuming until blank \n"); char ch = peek(); std::string text; while( !this->isBlankChar() && !this->isParenthesis() && !this->isEOF()) { ch = this->next(); // fprintf(stderr, " [TRACE] ConsumeUntilBlank ch = %d \n", ch); text = text + ch; } // std::fprintf(stderr, " [TRACE] consumeUntilBlank = %s \n", text.c_str()); return text; } std::vector<Token> readTokens() { std::vector<Token> out{}; Token tok; while( !tok.isEOF() ){ tok = this->nextToken(); if( !tok.isEOF() ){ out.push_back(tok); } } auto stack = std::stack<Token>{}; // Check if parentheses are balanced for(auto const& tok: out) { // Push left parentheses to the stack if(tok.type == TokenType::LPAREN){ stack.push(tok); } // If the stack is not empty and current parenthesis is a // right parentheses, if the top of stack is left parentheses // then pop the stack. if(tok.type == TokenType::RPAREN ) { if( !stack.empty() && stack.top().type == TokenType::LPAREN ) { stack.pop(); } else { throw std::runtime_error("Error: unbalanced parentheses."); } } } if( !stack.empty() ){ auto tok = stack.top(); std::fprintf(stderr, " [ERROR] Unbalanced parentheses at line %d and column %d \n", tok.lin, tok.col); throw std::runtime_error("Error: unbalanced parentheses."); } return out; } Token nextToken() { // std::fprintf(stderr, " [TRACE] Removing blank characters \n"); skipBlankChars(); char ch = next(); // std::fprintf(stderr, " [TRACE] nextChar = %c \n", ch); if( this->isEOF() ){ return Token(TokenType::EOFF, "", _pos, _lin, _col); } // Skip comments if( ch == ';' ){ // std::fprintf(stderr, " [TRACE] Skiping comment \n"); skipLineComment(); // std::fprintf(stderr, " [TRACE] Skiping comment 2 \n"); return nextToken(); } if( ch == '(') { return Token(TokenType::LPAREN, "(", _pos, _lin, _col); } if( ch == ')') { return Token(TokenType::RPAREN, ")", _pos, _lin, _col); } if( ch == '\'') { return Token(TokenType::QUOTE, "'", _pos, _lin, _col); } // Boolean value if( ch == '#'){ auto c = this->next(); auto text = std::string("#") + c; if( c!= 'f' && c !='t' ){ return Token(TokenType::ERR, std::string("Error - invalid boolean literal => ") + text, _pos, _lin, _col); } return Token(TokenType::BOOL, text, _pos, _lin, _col); } // String literal if( ch == '"' ) { // std::fprintf(stderr, " [TRACE] Reading string literal \n"); std::string out = ""; int col = _col; int pos = _pos; int lin = _lin; char c = 'x'; while( c != '"' && !this->isEOF() ) { c = this->next(); if(c == '"'){ break; } out = out + c; } if( c!= '"'){ return Token(TokenType::ERR, "Error - non closed quote.", _pos, _lin, _col); } else { return Token(TokenType::STR, out, pos, lin, col); } } // assert( ) int col = _col; int pos = _pos; int lin = _lin; std::string text = ch + this->consumeUntilBlank(); if( text == ":"){ return Token(TokenType::ERR, "illegal terminating character after a colon:" ,pos, lin, col); } // Keyworkd if( text[0] == ':' ) { // std::fprintf(stderr, " [TRACE] Tokenizer =>> keyword = %s \n", text.c_str() ); return Token(TokenType::KEYW, text, pos, lin, col); } if( text == "-"){ return Token(TokenType::SYM, text, pos, lin, col); } if( std::isdigit(text[0]) ) { if( isNumber(text) ) { return Token(TokenType::NUM, text, pos, lin, col); } else { return Token(TokenType::ERR, std::string("Error - invalid number => ") + text, pos, lin, col); } } else if( text[0] == '-' ) { if( isNumber(text) ) { return Token(TokenType::NUM, text, pos, lin, col); } else { return Token(TokenType::ERR, std::string("Error - invalid number => ") + text, pos, lin, col); } } // Anything else is a symbpl return Token(TokenType::SYM, text, pos, lin, col); } }; std::ostream& operator<<(std::ostream& os, TokenType type) { if(type == TokenType::BOOL) { os << "[BOOL]"; } if(type == TokenType::EOFF) { os << "[EOF]"; } if(type == TokenType::STR) { os << "[STR]"; } if(type == TokenType::LPAREN) { os << "[LPAREN]"; } if(type == TokenType::RPAREN) { os << "[RPAREN]"; } if(type == TokenType::NUM) { os << "[NUM]"; } if(type == TokenType::SYM) { os << "[SYM]"; } if(type == TokenType::ERR) { os << "[ERR]"; } return os; } enum class ExprType { ERR // Error , NUM // Number literal , STR // String literal , BOOL // Boolean literal , SYM // Symbol , KEY // Keyword , NIL // Nil or Null value , FUN // Function , LST // List literal }; // Forward declaration struct ExprNil; struct ExprStr; struct ExprNum; struct ExprSym; struct ExprKey; struct ExprErr; struct ExprBool; struct ExprFun; struct ExprLst; // Visitor design pattern struct IVisitor{ public: virtual ~IVisitor() = default; virtual void visit(ExprNil& expr) = 0; virtual void visit(ExprStr& expr) = 0; virtual void visit(ExprNum& expr) = 0; virtual void visit(ExprSym& expr) = 0; virtual void visit(ExprKey& expr) = 0; virtual void visit(ExprBool& expr) = 0; virtual void visit(ExprErr& expr) = 0; virtual void visit(ExprFun& expr) = 0; virtual void visit(ExprLst& expr) = 0; }; // Lisp-like AST - Abstract syntax tree // IEXpr stands for Interface Expression struct IExpr { virtual ~IExpr() = default; virtual ExprType type() const = 0; // Returns true if AST node is ATOM virtual bool isAtom() const { return true; } // Returns true if AST node is list virtual bool isList() const { return false; } // Returns true if AST node is error virtual bool isErr() const { return false; } // Returns true if AST node is symbol virtual bool isSym() const { return false; } // Returns true if AST node is string virtual bool isStr() const { return false; } // Returns true if AST node is keyword virtual bool isKey() const { return false; } // Returns true if AST node is number virtual bool isNum() const { return false; } // Returns true if AST node is boolean virtual bool isBool() const { return false; } // Returns true if AST node is nil virtual bool isNil() const { return false; } // Returns true if AST node is function virtual bool isFun() const { return false; } /// Returns the string value of a node virtual std::string strValue() const = 0; virtual double numValue() const = 0; // Evaluates to false if nil or #f and evaluates to true if it is anything else. virtual bool boolValue() const = 0; // Method for accepting visitor (Related to visitor design pattern) virtual void accept(IVisitor& v) = 0; // Evaluate AST node and return a value (AST Node) virtual std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) = 0; }; // Represents a null value (empty value) equivalent to void // In Lisp just as other functional-like programming languages // everything is evaluated to something and can be replaced be a value. struct ExprNil: public IExpr { ExprNil(){ } ExprType type() const { return ExprType::NIL; } bool isNil() const { return true; } std::string strValue() const { return "";} double numValue() const { return 0; } bool boolValue() const { return false;} void accept(IVisitor& v) { v.visit(*this); } virtual std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) { return std::make_shared<ExprNil>(); } }; // Represents a string literal struct ExprStr: public IExpr { std::string value; ExprStr(std::string const& value_): value(value_){ } ExprType type() const { return ExprType::STR; } std::string strValue() const { return value;} double numValue() const { return -1; } bool boolValue() const { return true;} bool isStr() const { return true; } void accept(IVisitor& v) { v.visit(*this); } std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) { return std::make_shared<ExprStr>(this->value); } }; // Represents a number struct ExprNum: public IExpr { double value; ExprNum(double value_): value(value_){ } ExprType type() const { return ExprType::NUM; } bool isNum() const { return true; } std::string strValue() const { return "";} double numValue() const { return value; } bool boolValue() const { return true;} void accept(IVisitor& v) { v.visit(*this); } std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) { return std::make_shared<ExprNum>(this->value); } }; // Represents a boolean literal (logical value) struct ExprBool: public IExpr { bool value; ExprBool(bool value_): value(value_){ } ExprType type() const { return ExprType::BOOL; } bool isBool() const { return true; } std::string strValue() const { return "";} double numValue() const { return -1; } bool boolValue() const { return value;} void accept(IVisitor& v) { v.visit(*this); } std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) { return std::make_shared<ExprBool>(this->value); } }; // Represents a lisp symbol struct ExprSym: public IExpr { std::string value; ExprSym(std::string value_): value(value_){ } ExprType type() const { return ExprType::SYM; } bool isSym() const { return true; } std::string strValue() const { return value;} double numValue() const { return -1; } bool boolValue() const { return true;} void accept(IVisitor& v) { v.visit(*this); } std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) { auto it = env.find(value); if (it == env.end()) { throw std::runtime_error(" [ERROR] Unbound variable " + value); //return std::make_shared<ExprErr>(10, "Error - unbound variable " + value); // return nullptr; } return it->second; } }; // Represents a lisp keyword such as :x, :y, :keyword1 struct ExprKey: public IExpr { std::string value; ExprKey(std::string value_): value(value_){ } ExprType type() const { return ExprType::KEY; } bool isKey() const { return true; } std::string strValue() const { return value;} double numValue() const { return -1; } bool boolValue() const { return true;} void accept(IVisitor& v) { v.visit(*this); } std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) { return std::make_shared<ExprKey>(this->value); } }; // Represents an Error (Reserved for future use) struct ExprErr: public IExpr { int code; std::string info; ExprErr(){} ExprErr(int code_, std::string const& info_) : code(code_), info(info_) { } ExprType type() const { return ExprType::ERR; } bool isErr() const { return true; } std::string strValue() const { return info;} double numValue() const { return -1; } bool boolValue() const { return false;} void accept(IVisitor& v) { v.visit(*this); } std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) { return std::make_shared<ExprErr>(code, info); } }; using LispFunc = std::function<std::shared_ptr<IExpr> (std::vector<std::shared_ptr<IExpr>> const& args)>; using LispEnv = std::map<std::string, std::shared_ptr<IExpr>>; // Represents a function struct ExprFun: public IExpr { std::string name; std::string doc; // std::vector<std::string> args; // std::shared_ptr<IExpr> body; // ExprFun(std::vector<std::string> const& args_): args(args_) { } ExprType type() const { return ExprType::FUN; } bool isFun() const { return true; } double numValue() const { return -1; } bool boolValue() const { return true;} virtual bool isNativeFun() const = 0; void accept(IVisitor& v) { v.visit(*this); } virtual std::shared_ptr<IExpr> call( std::vector<std::shared_ptr<IExpr>>& params , std::map<std::string, std::shared_ptr<IExpr>>& env_) = 0; ~ExprFun() = default; }; // Non-native lisp function implemented in lisp struct ExprFunLisp: public ExprFun { std::vector<std::string> args; std::vector<std::shared_ptr<IExpr>> body; std::map<std::string, std::shared_ptr<IExpr>> env; std::string strValue() const { return "(function) " + this->name;} bool isNativeFun() const { return false; } ExprFunLisp(){} ExprFunLisp(std::vector<std::string> const& args_, std::vector<std::shared_ptr<IExpr>> const& body_) : args(args_), body(body_) { } std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) { // Invoke copy constructor return std::make_shared<ExprFunLisp>(*this); } std::shared_ptr<IExpr> call( std::vector<std::shared_ptr<IExpr>>& params ,std::map<std::string, std::shared_ptr<IExpr>>& env_) { if( params.size() != args.size() ) { throw std::runtime_error("Error: Function call error. "); } // Create local environment auto envv = std::map<std::string, std::shared_ptr<IExpr>>{}; // Copy global environment to temporary environment for(auto& k : env_){ envv[k.first] = k.second; } // Copy local environment to temporary environment for(auto& k : env){ envv[k.first] = k.second; } // Pass parameters to local environment for(size_t n = 0; n < args.size(); n++ ) { auto a = args[n]; envv[a] = params[n]; } auto N = body.size(); for(size_t n = 0; n < N - 1; n ++ ) { body[n]->eval(envv); } auto result = body[N - 1]->eval(envv); if( result->isFun() && !std::static_pointer_cast<ExprFun>(result)->isNativeFun() ) { auto func = std::static_pointer_cast<ExprFunLisp>(result); for(size_t n = 0; n < args.size(); n++ ) { auto a = args[n]; func->env[a] = params[n]; } } return result; } }; // Native function implemented in C or C++ struct ExprFunNative: public ExprFun { LispFunc func; std::string strValue() const { return "(native function) " + this->name;} bool isNativeFun() const { return true; } ExprFunNative(LispFunc func_): func(func_){ } std::shared_ptr<IExpr> eval( std::map<std::string, std::shared_ptr<IExpr>>& env) { // Invoke copy constructor return std::make_shared<ExprFunNative>(func); } std::shared_ptr<IExpr> call( std::vector<std::shared_ptr<IExpr>>& params ,std::map<std::string, std::shared_ptr<IExpr>>& env_) { // std::fprintf(stderr, " [TRACE] Calling native function \n"); return this->func(params); } }; // Represents a list struct ExprLst: public IExpr { std::vector<std::shared_ptr<IExpr>> lst; ExprLst(std::vector<std::shared_ptr<IExpr>> const& lst_) : lst(lst_){ } ExprType type() const { return ExprType::LST; } bool isAtom() const { return false; } bool isList() const { return true; } std::string strValue() const { return "";} double numValue() const { return -1; } bool boolValue() const { return true;} void accept(IVisitor& v) { v.visit(*this); } std::shared_ptr<IExpr> eval(std::map<std::string, std::shared_ptr<IExpr>>& env) { if (lst.size() == 0) { return std::make_shared<ExprNil>(); } auto type = lst[0]->type(); if ( type != ExprType::SYM && type != ExprType::LST ) { throw std::runtime_error(" [ERROR] =>> Object is not applicable."); } std::string symbol = lst[0]->strValue(); // Ignore comment special form if(symbol == "comment") { return std::make_shared<ExprNil>(); } if (symbol == "set") { if (lst.size() != 3) { throw std::runtime_error("Ill-formed syntax for SET special form"); } if (lst[1]->type() != ExprType::SYM) { throw std::runtime_error("Ill-formed syntax for SET special form. Expected symbol. "); } auto res = lst[2]->eval(env); std::string name = lst[1]->strValue(); std::fprintf(stderr, " [TRACE] Defining symbol = %s . \n", name.c_str()); env[name] = res; //;std::make_shared<EnvVal>(res); return std::make_shared<ExprNil>(); } // Handle special form (if <cond> <then-expr>) or (if <cond> <then-expr> <else-expr>) if( symbol == "if" ) { if( lst.size() != 3 && lst.size() != 4) { throw std::runtime_error("Ill-formed syntax for IF special form"); } auto cond = lst[1]; auto cond_result = cond->eval(env); if( lst.size() == 3 && !cond_result->boolValue() ) { return std::make_shared<ExprNil>(); } if( lst.size() == 4 && !cond_result->boolValue() ) { // Evaluate the else-expr auto res = lst[3]->eval(env); // this->eval( *lst[3] ); return res; } assert( cond_result->boolValue() ); auto res = lst[2]->eval(env); //this->eval( *lst[2] ); return res; } // Handle quote special form if( symbol == "quote" || symbol == "$q") { if( lst.size() != 2 ) { throw std::runtime_error("Ill-formed syntax for QUOTE special form"); } auto arg = lst[1]; return arg; } // Equivalent to scheme or lisp lambda or scheme lambda // (fn (a0 a1 ... a[N-1]) (expr1) (expr2) ... (expr[K-1]) ) if( symbol == "fn" ) { if (lst.size() < 3) { throw std::runtime_error("Ill-formed syntax for lambda special form"); } if( !lst[1]->isList() ) { throw std::runtime_error("Ill-formed syntax for lambda. Expected arguments."); } // Function/lambda arguments auto args = std::static_pointer_cast<ExprLst>(lst[1]); // Function/lambda body (list of expressions) auto body = lst; // Remove first element ('lambda' symbol) body.erase(body.begin()); // Remove frst element - function arguments body.erase(body.begin()); std::vector<std::string> _args{}; for(auto& a: args->lst) { if( !a->isSym() ) { throw std::runtime_error("Ill-formed syntax for lamba. Arguments must be strings."); } _args.push_back( a->strValue() ); } auto fun = std::make_shared<ExprFunLisp>(_args, body); return fun; } /** Sepcial form def * (def function (a0 a1 ... a[N-1]) (expr1) (expr2) ... (expr[K-1]) ) * * Example: * (def addxy(x y ) * (disp x) * (disp y) * (+ x y)) * */ if( symbol == "def" ) { if (lst.size() < 4) { throw std::runtime_error("Ill-formed syntax for def() special form"); } if( !lst[1]->isSym() ) { throw std::runtime_error("Ill-formed syntax for def(). Expected function name."); } if( !lst[2]->isList() ) { throw std::runtime_error("Ill-formed syntax for def(). Expected arguments."); } // Function name auto name = lst[1]->strValue(); // Function/lambda arguments auto args = std::static_pointer_cast<ExprLst>(lst[2]); // Function/lambda body (list of expressions) auto body = lst; // Remove first element ('lambda' symbol) body.erase(body.begin()); body.erase(body.begin()); body.erase(body.begin()); std::fprintf(stderr, " [TRACE] Defining function = %s . \n", name.c_str()); std::vector<std::string> _args{}; for(auto& a: args->lst) { if( !a->isSym() ) { throw std::runtime_error("Ill-formed syntax for lamba. Arguments must be strings."); } _args.push_back( a->strValue() ); } auto fun = std::make_shared<ExprFunLisp>(_args, body); fun->name = name; env[name] = fun; return std::make_shared<ExprNil>(); } // Handle special form (begin (action1 ....) (action 2) (action N-1)) if( symbol == "begin" ) { // arguments auto args = lst; // Remove first element args.erase(args.begin()); if( args.size() == 0) { return std::make_shared<ExprNil>(); } for(size_t n = 0; n < args.size() - 1; n++) { args[n]->eval(env); } // Revaluate last element auto result = args[ args.size() -1 ]->eval(env); return result; } /** (let (binding lists) (exp1) (exp2) .... (expN-1) ) * * (let ( (a 20) * (b 30 )) * (display a) (display b) (+ a b) ) * * The previous expression evaluates to 30. */ if( symbol == "let" ) { if( lst.size() < 3 ) { throw std::runtime_error("Error: ill-formed sytax for let expression."); } if( !lst[1]->isList() ) { throw std::runtime_error("Error: ill-formed sytax for let expression. Expected list of bindings."); } auto bindings = std::static_pointer_cast<ExprLst>(lst[1])->lst; // Create a new temporary environment auto envv = std::map<std::string, std::shared_ptr<IExpr>>{}; auto envs = std::map<std::string, std::shared_ptr<IExpr>>{}; // Copy environment for(auto& k : env){ envv[k.first] = k.second; } for(auto const& b: bindings){ if( !b->isList() ) { throw std::runtime_error("Error: ill-formed sytax for let expression. Expected list as binding argument."); } auto bb = std::static_pointer_cast<ExprLst>(b)->lst; if( bb.size() != 2) { throw std::runtime_error("Error: ill-formed sytax for let expression."); } if( !bb[0]->isSym() ) { throw std::runtime_error("Error: ill-formed sytax for let expression. Expected symbol."); } auto sym = bb[0]->strValue(); auto val = bb[1]->eval(envv); // Add symbols from let binding to the new temporary environment envv[sym] = val; envs[sym] = val; } for(size_t k = 2; k < lst.size() - 1; k ++) { lst[k]->eval(envv); } auto result = lst[lst.size() - 1]->eval(envv); // Implement closure - function cature variables from local environment if( result->isFun() && !std::static_pointer_cast<ExprFun>(result)->isNativeFun() ) { auto func = std::static_pointer_cast<ExprFunLisp>(result); func->env = envs; } return result; } /** For-loop similar to common lisp => (repeat 10 n (action n)) */ if( symbol == "repeat" ) { if( lst.size() != 4 ) { throw std::runtime_error("Error: ill-formed syntax for repeat special form. Expected 4 arguments"); } if( !lst[2]->isSym() ) { throw std::runtime_error("Error: ill-formed syntax for repeat special form. Expected symbol."); } auto n = lst[1]->eval(env); if( !n->isNum() ) { throw std::runtime_error("Error: expected number as first argument."); } auto ntimes = (int) n->numValue(); if( ntimes < 0) { throw std::runtime_error("Error: ill-formed syntax for repeat special form . Iterations cannot be negative."); } auto sym = lst[2]->strValue(); auto action = lst[3]; // Copy environment auto envv = env; for(int k = 0; k < ntimes; k++) { envv[sym] = std::make_shared<ExprNum>(k); action->eval(envv); } return std::make_shared<ExprNil>(); } // ---------- Function Application --------------------// auto it = lst[0]->eval(env); if( !it->isFun() ){ throw std::runtime_error("Error - object not callable."); } auto function = std::static_pointer_cast<ExprFun>(it); // Function arguments auto args = lst; // Remove first element args.erase(args.begin()); std::vector<std::shared_ptr<IExpr>> evaluatedArgs; // evaluatedArgs.reserve(args.size()); for (auto const &arg : args) { auto res = arg->eval(env); evaluatedArgs.push_back(res); } auto result = function->call(evaluatedArgs, env); return result; } }; // Parser auto readTokens(std::vector<Token>& tokens) -> std::shared_ptr<IExpr> { // auto tokens = tokens_; if( tokens.size() == 0){ throw std::runtime_error("Error: Unexpected EOF - End of File"); } auto tok = tokens[0]; // Remove first element of vector tokens.erase(tokens.begin()); // std::cout << " [readToken] " << tok.type << " ; " << tok.text // << " ; line = " << tok.lin << " ; col = " << tok.col // << "\n"; if( tok.type == TokenType::ERR ) { throw std::runtime_error("[ERROR] " + tok.text); } if( tok.type == TokenType::QUOTE ) { auto lst = std::vector<std::shared_ptr<IExpr>>{}; lst.push_back( std::make_shared<ExprSym>("quote") ); auto next = readTokens(tokens); lst.push_back(next); return std::make_shared<ExprLst>(lst); } if( tok.type == TokenType::LPAREN ) { auto lst = std::vector<std::shared_ptr<IExpr>>{}; while( tokens[0].type != TokenType::RPAREN ) { auto expr = readTokens(tokens); lst.push_back( expr ); } // Drop off ')' right parentheis tokens.erase(tokens.begin()); return std::make_shared<ExprLst>(lst); } if( tok.type == TokenType::RPAREN ) { throw std::runtime_error("Error: Unexpected ')' right parenthesis "); } if( tok.type == TokenType::NUM) { return std::make_shared<ExprNum>( std::stoi(tok.text) ); } if( tok.type == TokenType::SYM && tok.text == "nil" ) { return std::make_shared<ExprNil>(); } if( tok.type == TokenType::SYM ) { return std::make_shared<ExprSym>( tok.text ); } if( tok.type == TokenType::KEYW ) { return std::make_shared<ExprKey>( tok.text ); } if( tok.type == TokenType::BOOL ) { return std::make_shared<ExprBool>( tok.text == "#t" ? true : false ); } if( tok.type == TokenType::STR ) { return std::make_shared<ExprStr>( tok.text ); } return std::make_shared<ExprErr>(-1, "BUG Found => Edge case found in the parser."); } // Parse Lisp-like (scheme-like) S-Expression by reading a string source code auto parseSexp(std::string const& code) -> std::shared_ptr<IExpr> { std::stringstream ss; ss << code; Tokenizer tokenizer(ss); auto tokens = tokenizer.readTokens(); auto result = readTokens(tokens); return result; } // Parse Lisp-like (scheme-like) S-Expression by reading an input stream auto parseSexp(std::istream& is) -> std::shared_ptr<IExpr> { Tokenizer tokenizer(is); auto tokens = tokenizer.readTokens(); auto result = readTokens(tokens); return result; } class PrintVisitor: public IVisitor { public: void visit(ExprNil& expr) override { std::cout << "nil"; } void visit(ExprStr& expr) override { std::cout << "\"" << expr.value << "\""; } void visit(ExprNum& expr) override { std::cout << expr.value; } void visit(ExprBool& expr) override { std::cout << (expr.value ? "#t" : "#f"); } void visit(ExprSym& expr) override { std::cout << expr.value; } void visit(ExprKey& expr) override { std::cout << expr.value; } void visit(ExprErr& expr) override { std::cout << "<< [ERROR] " << expr.info; } void visit(ExprFun& expr) override { std::cout << expr.strValue(); } void visit(ExprLst& expr) override { std::cout << "( "; for(auto node : expr.lst){ std::cout << " "; node->accept(*this); } std::cout << " )"; } }; class WriterVisitor: public IVisitor { std::ostream& out; public: WriterVisitor(std::ostream& os_): out(os_){} void visit(ExprNil& expr) override { out << "nil"; } void visit(ExprStr& expr) override { out << "\"" << expr.value << "\""; } void visit(ExprNum& expr) override { out << expr.value; } void visit(ExprBool& expr) override { out << (expr.value ? "#t" : "#f"); } void visit(ExprSym& expr) override { out << expr.value; } void visit(ExprKey& expr) override { out << expr.value; } void visit(ExprErr& expr) override { out << "<< [ERROR] " << expr.info; } void visit(ExprFun& expr) override { out << expr.strValue(); } void visit(ExprLst& expr) override { // out << "\n ( "; for(auto const& node : expr.lst) { out << " "; node->accept(*this); //if( node->isList() ){ out << '\n'; } } out << " )"; } }; // Print S-Expression void printSexp(IExpr& expr) { PrintVisitor visitor; expr.accept(visitor); std::cout << std::endl; } /// Lisp interpreter's primitive functions namespace primitives { // All Lisp functions passed to the interpreter must have this type signature // Lisp function primitive 'list' => Example: (list 10 20 (+ 1 2 5) "Hello" #f #t nil) std::shared_ptr<IExpr> list(std::vector<std::shared_ptr<IExpr>> const& args) { return std::make_shared<ExprLst>(args); } // Return the number of elements of alist std::shared_ptr<IExpr> length(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: length() requires 1 argument of type list."); } if( !args[0]->isList() ) { throw std::runtime_error("Error: length() expects 1 argument of type list."); } auto node = static_cast<ExprLst const&>( *args[0] ); auto lst = node.lst; return std::make_shared<ExprNum>(lst.size()); } // car => Returns first element of the list or nil std::shared_ptr<IExpr> car(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: car() requires 1 argument of type list."); } if( !args[0]->isList() ) { throw std::runtime_error("Error: car() expects 1 argument of type list."); } auto node = static_cast<ExprLst const&>( *args[0] ); auto lst = node.lst; if( lst.size() == 0 ){ return std::make_shared<ExprNil>(); } return lst[0]; } std::shared_ptr<IExpr> cdr(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: cdr() requires 1 argument of type list."); } if( !args[0]->isList() ) { throw std::runtime_error("Error: cdr() expects 1 argument of type list."); } auto node = static_cast<ExprLst const&>( *args[0] ); auto lst = node.lst; if( lst.size() == 0 ){ return std::make_shared<ExprNil>(); } // Remove first list argument lst.erase(lst.begin()); return std::make_shared<ExprLst>(lst); } // cons => List constructor std::shared_ptr<IExpr> cons(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 2 ) { throw std::runtime_error("Error: cons() requires 2 arguments."); } if( !args[1]->isList() && !args[1]->isNil() ) { throw std::runtime_error("Error: pairs construction not allowed. Second argument can only be a list or nil."); } if( args[1]->isList() ) { auto lst = std::static_pointer_cast<ExprLst>( args[1] )->lst; lst.insert(lst.begin(), args[0]); return std::make_shared<ExprLst>(lst); } assert( args[1]->isNil() ); auto lst = std::vector<std::shared_ptr<IExpr>>{ args[0] }; return std::make_shared<ExprLst>(lst); } // Get the nth element of a list std::shared_ptr<IExpr> nth(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 2 ) { throw std::runtime_error("Error: cons() requires 2 arguments."); } if( !args[0]->isNum() ) { throw std::runtime_error("Error: Expected number "); } if( !args[1]->isList() && !args[1]->isNil() ) { throw std::runtime_error("Error: Expected list or nil."); } if( args[1]->isNil() ) { return std::make_shared<ExprNil>(); } auto n = (int) args[0]->numValue(); auto lst = std::static_pointer_cast<ExprLst>(args[1])->lst; if( n < lst.size() && n >= 0 ){ return lst[n]; } return std::make_shared<ExprNil>(); } // (= 10 20) => #f ; (= 10 10) => #t std::shared_ptr<IExpr> equal_number(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 2 ) { throw std::runtime_error("Error: = requires 2 arguments."); } if( !args[0]->isNum() && !args[1]->isNum() ) { throw std::runtime_error("Error: Expected number "); } auto a = args[0]->numValue(); auto b = args[1]->numValue(); return std::make_shared<ExprBool>(a == b); } std::shared_ptr<IExpr> less_than(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 2 ) { throw std::runtime_error("Error: = requires 2 arguments."); } if( !args[0]->isNum() && !args[1]->isNum() ) { throw std::runtime_error("Error: Expected number "); } auto a = args[0]->numValue(); auto b = args[1]->numValue(); return std::make_shared<ExprBool>(a < b); } // Parse a string returning a S-Expression std::shared_ptr<IExpr> read(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: read() requires 1 argument of type string."); } if( args[0]->type() != ExprType::STR ) { throw std::runtime_error("Error: read() expects 1 argument of type string."); } std::string code = args[0]->strValue(); auto result = parseSexp(code); return result; } // Primitive predicate function list? => Return true if the argument is of type list std::shared_ptr<IExpr> is_list(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: list?() requires 1 argument."); } return std::make_shared<ExprBool>( args[0]->isList() ); } // Primitive predicate function symbol? => Return true if the argument is of type symbol std::shared_ptr<IExpr> is_symbol(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: symbol?() requires 1 argument."); } return std::make_shared<ExprBool>( args[0]->type() == ExprType::SYM ); } std::shared_ptr<IExpr> is_keyword(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: keyword?() requires 1 argument."); } return std::make_shared<ExprBool>( args[0]->type() == ExprType::KEY ); } // Primitive predicate function number? => Return true if the argument is of type number std::shared_ptr<IExpr> is_number(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: number?() requires 1 argument."); } return std::make_shared<ExprBool>( args[0]->isNum() ); } // Primitive predicate function boolean? => Return true if the argument is of type number std::shared_ptr<IExpr> is_boolean(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: boolean?() requires 1 argument."); } return std::make_shared<ExprBool>( args[0]->type() == ExprType::BOOL ); } // Primitive predicate function nil? => Return true if the argument is of type nil std::shared_ptr<IExpr> is_nil(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: nil?() requires 1 argument."); } return std::make_shared<ExprBool>( args[0]->isNil() ); } // Primitive predicate function string? => Return true if the argument is of type string std::shared_ptr<IExpr> is_string(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: string?() requires 1 argument."); } return std::make_shared<ExprBool>( args[0]->type() == ExprType::STR ); } // Display an S-Expression std::shared_ptr<IExpr> disp(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ){ throw std::runtime_error("Error: function disp() requires 1 argument."); } PrintVisitor visitor; args[0]->accept(visitor); std::cout << "\n"; return std::make_shared<ExprNil>(); } // Lisp function (+) => Example: (+ 1 2 3 4 5) = 15 std::shared_ptr<IExpr> add(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() == 0 ){ throw std::runtime_error("Error: this function requires at least 1 argument."); } double acc = 0.0; for(auto const& a: args){ if( a->type() != ExprType::NUM){ throw std::runtime_error("Invalid argument. Expected a number"); } acc = acc + a->numValue(); } return std::make_shared<ExprNum>(acc); } // Lisp function (*) => Example: (* 1 2 3 4 5) = 120 std::shared_ptr<IExpr> mul(std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() == 0 ){ throw std::runtime_error("Error: this function requires at least 1 argument."); } double acc = 1.0; for(auto const& a: args){ if( a->type() != ExprType::NUM){ throw std::runtime_error("Invalid argument. Expected a number"); } acc = acc * a->numValue(); } return std::make_shared<ExprNum>(acc); } // List function (-) => Example: (- 100 20 30 ) = 100 - 20 - 30 = 50 std::shared_ptr<IExpr> sub(std::vector<std::shared_ptr<IExpr>> const& args) { // std::fprintf(stderr, " [TRACE] Called Lispfun sub() \n "); if (args.size() == 0) { throw std::runtime_error("Error: this function requires at least 1 argument."); } if (args.size() == 1) { if (args[0]->type() != ExprType::NUM) { throw std::runtime_error("Error: invalid argument type. Expected number"); } assert(args[0]->type() == ExprType::NUM); return std::make_shared<ExprNum>(-args[0]->numValue()); } if (args[0]->type() != ExprType::NUM) { throw std::runtime_error("Error: invalid argument type. Expected number"); } double acc = args[0]->numValue(); // std::fprintf(stderr, " [TRACE] acc = %f \n", acc); auto args_ = args; // Remove first element args_.erase(args_.begin()); for (auto const &a : args_) { if (a->type() != ExprType::NUM) { throw std::runtime_error("Invalid argument. Expected a number"); } double x = a->numValue(); // std::fprintf(stderr, " [TRACE] x = %f \n", x); acc = acc - x; } return std::make_shared<ExprNum>(acc); } std::shared_ptr<IExpr> div(std::vector<std::shared_ptr<IExpr>> const& args) { // std::fprintf(stderr, " [TRACE] Called Lispfun sub() \n "); if (args.size() == 0) { throw std::runtime_error("Error: this function requires at least 1 argument."); } if (args.size() == 1) { if (args[0]->type() != ExprType::NUM) { throw std::runtime_error("Error: invalid argument type. Expected number"); } assert(args[0]->type() == ExprType::NUM); return std::make_shared<ExprNum>(1.0 / args[0]->numValue()); } if (args[0]->type() != ExprType::NUM) { throw std::runtime_error("Error: invalid argument type. Expected number"); } double acc = args[0]->numValue(); // std::fprintf(stderr, " [TRACE] acc = %f \n", acc); auto args_ = args; // Remove first element args_.erase(args_.begin()); for (auto const &a : args_) { if (a->type() != ExprType::NUM) { throw std::runtime_error("Invalid argument. Expected a number"); } double x = a->numValue(); // std::fprintf(stderr, " [TRACE] x = %f \n", x); acc = acc / x; } return std::make_shared<ExprNum>(acc); } } class Eval { // Reference to itself in oder to provide code completion Eval& self = *this; public: using Env = std::map<std::string, std::shared_ptr<IExpr>>; // Lisp environment Env _env; Eval() { self.addFunction("disp", &primitives::disp); // Primitive list functions self.addFunction("list", &primitives::list); self.addFunction("length", &primitives::length); self.addFunction("car", &primitives::car); self.addFunction("cdr", &primitives::cdr); self.addFunction("cons", &primitives::cons); self.addFunction("nth", &primitives::nth); // Primitive predicate functions self.addFunction("number?", &primitives::is_number); self.addFunction("boolean?", &primitives::is_boolean); self.addFunction("symbol?", &primitives::is_symbol); self.addFunction("nil?", &primitives::is_nil); self.addFunction("keyword?", &primitives::is_keyword); self.addFunction("list?", &primitives::is_list); // Primtive math functions self.addFunction("=", &primitives::equal_number); self.addFunction("<", &primitives::less_than); self.addFunction("+", &primitives::add); self.addFunction("*", &primitives::mul); self.addFunction("-", &primitives::sub); self.addFunction("/", &primitives::div); self.addMath1ArgFun("sin", static_cast<double (*) (double)>(&std::sin)); self.addMath1ArgFun("cos", static_cast<double (*) (double)>(&std::cos)); self.addMath1ArgFun("tan", static_cast<double (*) (double)>(&std::tan)); self.addMath1ArgFun("sqrt", static_cast<double (*) (double)>(&std::sqrt)); self.addMath1ArgFun("inv", [](double x){ return 1.0 / x; }); self.addVariable("PI", 3.1415); // Parse string into a SExp - S-Expression self.addFunction("read", &primitives::read); // Evaluate a List (AST - Abstract Syntax Tree) or SEXP returning a SEXP (S-Expression) self.addFunction("eval", [=](std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 ) { throw std::runtime_error("Error: eval(sexp) requires 1 argument of type list."); } if( !args[0]->isList() ) { throw std::runtime_error("Error: eval(sexpr) expects 1 argument of type list."); } auto expr = static_cast<ExprLst const&>( *args[0] ); return expr.eval(this->_env); }); // load a file self.addFunction("load-file", [=](std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 && !args[0]->isStr() ) { throw std::runtime_error("Error: load-file() requires 1 argument of type string."); } auto file = args[0]->strValue(); auto expr = this->evalFile(file); return expr; }); // Read a single S-Expression from file without evaluating it self.addFunction("read-file", [=](std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 && !args[0]->isStr() ) { throw std::runtime_error("Error: eval(sexp) requires 1 argument of type string."); } auto file = args[0]->strValue(); auto ifs = std::ifstream(file); auto sexp = parseSexp(ifs); return sexp; }); // Write a single S-Expression to a file self.addFunction("write-file", [=](std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 2 && !args[0]->isStr() ) { throw std::runtime_error("Error: eval(sexp) requires 1 argument of type string."); } auto file = args[0]->strValue(); auto ofs = std::ofstream(file); WriterVisitor visitor(ofs); args[1]->accept(visitor); return std::make_shared<ExprNil>(); }); // Read a multiple S-Expression from file without evaluating it self.addFunction("read-file-multiple", [=](std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1 && !args[0]->isStr() ) { throw std::runtime_error("Error: eval(sexp) requires 1 argument of type string."); } auto file = args[0]->strValue(); std::ifstream is(file); if( !is.good() ){ throw std::runtime_error("Error: stream has errors."); } std::stringstream ss; ss << is.rdbuf(); auto code = "( \n" + ss.str() + "\n )"; auto sexp = parseSexp(code); return sexp; }); self.addFunction("apply", [=](std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 2 ) { throw std::runtime_error("Error: apply() requires 2 arguments."); } if( !args[0]->isFun() ) { throw std::runtime_error("Error: expected function as first argument."); } if( !args[1]->isList() ) { throw std::runtime_error("Error: expected list as second argument."); } auto func = std::static_pointer_cast<ExprFun>(args[0]); auto params = std::static_pointer_cast<ExprLst>(args[1]); return func->call(params->lst, this->_env); }); self.addFunction("map", [=](std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 2 ) { throw std::runtime_error("Error: apply() requires 2 arguments."); } if( !args[0]->isFun() ) { throw std::runtime_error("Error: expected function as first argument."); } if( !args[1]->isList() ) { throw std::runtime_error("Error: expected list as second argument."); } auto func = std::static_pointer_cast<ExprFun>(args[0]); auto params = std::static_pointer_cast<ExprLst>(args[1]); auto result = std::vector<std::shared_ptr<IExpr>>{}; auto args_ = std::vector<std::shared_ptr<IExpr>>{ std::make_shared<ExprNil>() }; result.reserve( params->lst.size() ); for(auto const& p: params->lst){ args_[0] = p; result.push_back( func->call( args_, this->_env ) ); } return std::make_shared<ExprLst>(result); }); self.addFunction("for-each", [=](std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 2 ) { throw std::runtime_error("Error: apply() requires 2 arguments."); } if( !args[0]->isFun() ) { throw std::runtime_error("Error: expected function as first argument."); } if( !args[1]->isList() ) { throw std::runtime_error("Error: expected list as second argument."); } auto func = std::static_pointer_cast<ExprFun>(args[0]); auto params = std::static_pointer_cast<ExprLst>(args[1]); for(auto const& p: params->lst){ auto args_ = std::vector<std::shared_ptr<IExpr>>{ p }; func->call( args_, this->_env ); } return std::make_shared<ExprNil>(); }); } void addVariable(std::string const& name, double value) { auto expr = std::make_shared<ExprNum>(value); _env[name] = expr; } void addVariable(std::string const& name, std::shared_ptr<IExpr> expr) { _env[name] = expr; } void addFunction(std::string const& name, LispFunc func) { auto fn = std::make_shared<ExprFunNative>(func); fn->name = name; _env[name] = fn; } template<typename Func> void addMath1ArgFun(std::string const& name, Func funcMath) { auto func = [=](std::vector<std::shared_ptr<IExpr>> const& args) { if( args.size() != 1){ throw std::runtime_error(" [ERROR] This function requires exactly one argument."); } auto x = args[0]; if( x->type() != ExprType::NUM ){ throw std::runtime_error(" [ERROR] Invalid arguument type. Number required. "); } double res = funcMath(x->numValue()); return std::make_shared<ExprNum>(res); }; auto val = std::make_shared<ExprFunNative>(func); val->name = name; _env[name] = val; } std::shared_ptr<IExpr> eval(IExpr& expr) { return expr.eval( this->_env ); } // Interactive read-print eval loop0 void repl() { std::string line; std::string arg, command; for(;;) { std::cout << " $ lisp>> "; if( !std::getline(std::cin, line) ){ break; } if( line == "" ){ continue; } std::stringstream ss(line); ss >> command >> arg; try { if( command == ":file") { auto res = this->evalFile(arg); continue; } if( command == ":block") { std::fprintf(stderr, " [INFO] Type a multi-line s-expression and type (;;) as the last line when you are done. \n"); std::string code; while( line != ";;" ){ std::getline(std::cin, line); code = code + "\n" + line; } auto res = this->evalCode(code); self.addVariable("ans", res); printSexp(*res); continue; } auto res = this->evalCode(line); self.addVariable("ans", res); printSexp(*res); } catch(const std::exception& e) { std::cerr << e.what() << '\n'; } } } /// Eval lisp-like code given as string std::shared_ptr<IExpr> evalCode(std::string const& code) { auto expr = parseSexp(code); auto result = expr->eval(_env); return result; } std::shared_ptr<IExpr> evalFile(std::string const& file) { std::ifstream is(file); if( !is.good() ){ throw std::runtime_error("Error: stream has errors."); } std::stringstream ss; ss << is.rdbuf(); auto code = "(begin \n" + ss.str() + "\n )"; auto expr = parseSexp(code); auto result = expr->eval(_env); return result; } }; int main(){ std::cout << " [TRACE] Testing " << std::endl; const char* text = R"( ; comment1 ; comment 2 ; comment 3 ( print->list ; comment 4 " Hello world" A8 B100 ; Next comment #t #f (+ 20 30 -200(* 20 -20 x y z 8000)) ) )"; std::stringstream ss; #if 0 std::cout << " =========== Tokenizer ======================== " << std::endl; ss << text; ss << "100"; Tokenizer tokenizer(ss); Token tok; while( !tok.isEOF() ){ tok = tokenizer.nextToken(); std::cout << " => " << tok.type << " ; " << tok.text << " ; line = " << tok.lin << " ; col = " << tok.col << "\n"; } #endif auto expr = parseSexp(text); std::cout << std::boolalpha << " [TRACE] expr->isList() = " << expr->isList() << "\n"; printSexp(*expr); std::cout << " ====== Evaluate Code ======================== " << std::endl; const char* code1 = R"( (list 10 #t #f "hello world" ; Operation 1 (+ 10 25 6) ; Operation 2 (* 4 2 6) ) )"; Eval eval; auto res = eval.evalCode(code1); std::cout << " [TRACE] Result = "; printSexp(*res); std::cout << "============ Start LISP REPL ======================" << std::endl; eval.repl(); return 0; }
Building the lisp interpreter:
$ g++ cpplisp.cpp -o cpplisp.bin -g -Wall -Wextra -std=c++1z
Running:
$ >> rlwrap ./cpplisp.bin [TRACE] Testing [TRACE] expr->isList() = true ( print->list " Hello world" A8 B100 #t #f ( + 20 30 -200 ( * 20 -20 x y z 8000 ) ) ) ====== Evaluate Code ======================== [TRACE] Result = ( 10 #t #f "hello world" 41 48 ) ============ Start LISP REPL ====================== $ lisp>> (list '(+ 1 2 3 4 5) (+ 1 2 3 4 5) '(* 4 100) (* 4 100) :keyword #t #f nil "Hello world") ( ( + 1 2 3 4 5 ) 15 ( * 4 100 ) 400 :keyword #t #f nil "Hello world" ) $ lisp>> '(list (+ 1 2 3 4 5) (+ 1 2 3 4 5) '(* 4 100) (* 4 100) :keyword #t #f nil "Hello world") ( list ( + 1 2 3 4 5 ) ( + 1 2 3 4 5 ) ( quote ( * 4 100 ) ) ( * 4 100 ) :keyword #t #f nil "Hello world" ) $ lisp>> (cons 10 20) Error: pairs construction not allowed. Second argument can only be a list or nil. $ lisp>> (cons 20 nil) ( 20 ) $ lisp>> (cons 10 (cons 20 nil)) ( 10 20 ) $ lisp>> (cons "hello" (cons 10 (cons 20 nil))) ( "hello" 10 20 ) $ lisp>> $ lisp>> (cons 'world (cons "hello" (cons 10 (cons 20 nil)))) ( world "hello" 10 20 ) $ lisp>> (cons :keyword (cons 'world (cons "hello" (cons 10 (cons 20 nil))))) ( :keyword world "hello" 10 20 ) $ lisp>> (cons sqrt (cons :keyword (cons 'world (cons "hello" (cons 10 (cons 20 nil)))))) ( (native function) sqrt :keyword world "hello" 10 20 ) $ lisp>> $ lisp>> (+ 1 2 3 4 5 6) 21 $ lisp>> (* 1 2 3 4 5 6) 720 $ lisp>> (list 1 2 3 5 6 7 8) ( 1 2 3 5 6 7 8 ) $ lisp>> (set x 10) [TRACE] Defining symbol = x . nil $ lisp>> ( (if (< x 5) + *) 100 20) 2000 $ lisp>> ( (if (< x 50) + *) 100 20) 120 $ lisp>> (set code "(apply + (list 1 3 4 5 6 7))") [TRACE] Defining symbol = code . nil $ lisp>> code "(apply + (list 1 3 4 5 6 7))" $ lisp>> (set expr (read code)); read SExp - S-Expression from string [TRACE] Defining symbol = expr . nil $ lisp>> expr ( apply + ( list 1 3 4 5 6 7 ) ) $ lisp>> (eval expr) ;; Evaluate S-Expression 26 $ lisp>> :block [INFO] Type a multi-line s-expression and type (;;) as the last line when you are done. ; Execute a let-binding expression (let ( (a 10) (b (+ (* a a) 10)) (c (sqrt (+ a b))) ) (disp a) (disp b) (disp c) (list :x a :y b :result3 c :z (+ a b c))) ;; 10 110 10.9545 ( :x 10 :y 110 :result3 10.9545 :z 130.954 ) $ lisp>> a [ERROR] Unbound variable a $ lisp>> b [ERROR] Unbound variable b $ lisp>> c [ERROR] Unbound variable c $ lisp>> $ lisp>> (load-file "code.lisp") ; Load a lisp file [TRACE] Defining symbol = mylist . [TRACE] Defining symbol = code-fact . [TRACE] Defining function = fib . [TRACE] Defining function = make-adder . [TRACE] Defining symbol = add10 . [TRACE] Defining symbol = add50 . [TRACE] Defining symbol = func . [TRACE] Defining function = compute-trig . nil $ lisp>> $ lisp>> mylist ( 10 #t #f "hello world" hello world :keyword1 41 ( + 10 25 6 ) :keyword2 48 ( * 4 2 6 ) ) $ lisp>> (car mylist) 10 $ lisp>> (cdr mylist) ( #t #f "hello world" hello world :keyword1 41 ( + 10 25 6 ) :keyword2 48 ( * 4 2 6 ) ) $ lisp>> (car (cdr mylist)) #t $ lisp>> (nth 5 mylist) world $ lisp>> (add10 20) 30 $ lisp>> (add50 80) 130 $ lisp>> (map add10 (list 1 2 3 4 5 6 7 )) ( 11 12 13 14 15 16 17 ) $ lisp>> (map add50 (list 1 2 3 4 5 6 7 )) ( 51 52 53 54 55 56 57 ) $ lisp>> (map (fn (x) (* x x)) (list 1 2 3 4 5 6 7 )) ; Map a lambda ( 1 4 9 16 25 36 49 ) $ lisp>> (compute-trig 45) ( :angle 45 :cos 0.707123 :sin 0.70709 :tan 0.999954 ) $ lisp>> (compute-trig 90) ( :angle 90 :cos 4.63268e-05 :sin 1 :tan 21585.8 ) $ lisp>> (compute-trig 30) ( :angle 30 :cos 0.866033 :sin 0.499987 :tan 0.57733 ) $ lisp>> (map compute-trig (list 30 90 45)) ( ( :angle 30 :cos 0.866033 :sin 0.499987 :tan 0.57733 ) ( :angle 90 :cos 4.63268e-05 :sin 1 :tan 21585.8 ) ( :angle 45 :cos 0.707123 :sin 0.70709 :tan 0.999954 ) ) $ lisp>> (func 10) 0.1502 $ lisp>> (func 1) 0.124383 $ lisp>> a [ERROR] Unbound variable a $ lisp>> b [ERROR] Unbound variable b $ lisp>> c [ERROR] Unbound variable c $ lisp>> $ lisp>> (for-each disp (list "Hello" 'world :lisp 100 nil #t #f)) "Hello" world :lisp 100 nil #t #f $ lisp>> (set make-multiplier-function (fn (mult) (fn (x) (* mult x)))) [TRACE] Defining symbol = make-multiplier-function . nil $ lisp>> (set mult-by-10 (make-multiplier-function 10)) [TRACE] Defining symbol = mult-by-10 . nil $ lisp>> (set mult-by-6 (make-multiplier-function 6)) [TRACE] Defining symbol = mult-by-6 . nil $ lisp>> (map mult-by-10 (list 1 2 3 4 5)) ( 10 20 30 40 50 ) $ lisp>> (map mult-by-6 (list 1 2 3 4 5)) ( 6 12 18 24 30 ) $ lisp>> ( (make-multiplier-functon 4) (list 1 2 3 4 5)) [ERROR] Unbound variable make-multiplier-functon $ lisp>> (map (make-multiplier-function 4) (list 1 2 3 4 5)) ( 4 8 12 16 20 ) $ lisp>>
Read file gui-dsl.lisp:
$ lisp>> (set expr (read-file "gui-dsl.lisp")) [TRACE] Defining symbol = expr . nil $ lisp>> expr ( div :class "myclass-csss-style" :bgcolor "#ff81a" ( p "Paragraph 1 " ) ( img :src "http://www.mydomain/images/myimage.png" :caption "Image 1 data" ) ( button :label "Submit" :id "btn-submit" :class "my-style-button" :onclick ( fn ( event ) ( display "Button was clicked" ) ) ) ) $ lisp>> (car expr) div $ lisp>> (car (cdr expr)) :class $ lisp>> (nth 5 expr) ( p "Paragraph 1 " ) $ lisp>> (car (nth 5 expr)) p $ lisp>> (cdr (nth 5 expr)) ( "Paragraph 1 " ) $ lisp>> (car (cdr (nth 5 expr))) "Paragraph 1 " $ lisp>> (nth 6 expr) ( img :src "http://www.mydomain/images/myimage.png" :caption "Image 1 data" ) $ lisp>> (nth 7 expr) ( button :label "Submit" :id "btn-submit" :class "my-style-button" :onclick ( fn ( event ) ( display "Button was clicked" ) ) )
1.11.3 dotted list sexp parser algorithm
In traditional Scheme and Lisp implementations, lists are not reprsented as vectors or arrays, instead they are represented as linked list of cons, also called dotted pairs cells. Each cons cell has two parts. The first part car points to another s-expression, that either can be an atomic element or other cons cell. The second part of a dotted pair, cons can point to another dotted pair, an empty list or an atomic element. If the cons of the final dotted pair of a list points to an empty list (nil in common lisp), the list is said to be a proper list. If the dotted pair points to an atomic element, the list is said to be an improper list.
The terms car, cdr and cons, come from the following abbreviations:
- car - comes from 'Contents of the Address part of the Register'
- cdr - comes from 'Contents of the Decrement part of the Register'
- cons - abbreviation of the word 'construct'
A single dotted pair (a . b) denoting an association of two atoms 'a' and 'b' can be represented with the following diagram.
(a . b) or (cons a b) +-----+-----+ | car | cdr | ---> b +-----+-----+ | \ / a
According to this dotted-pairs or cons-cell linked lista approach, a list (a b c) is represented as:
representation 1: (cons a (cons b (cons c nil))) representation 2: (a . (b . (c . nil))) +-----+-----+ +-----+-----+ +-----+-----+ | car | cdr | --->| car | cdr | --->| car | cdr |---> NIL +-----+-----+ +-----+-----| +-----+-----+ => Empty list in Scheme '()) | | | => nil (null pointer) in Common Lisp \ / \ / \ / a b c
A proper list, such as (a b c d), is terminated in nil as shown in the following diagram:
cons(a, cons(b, cons(c, cons(d, nil))) (a . (b . (c . (d . nil))))
An improper list, such as (a b . c), is terminated in a non nil or non empty-list atom. The next diagram show how this list is represented in memory.
cons(a, cons(b, c)) (a . (b . c)) +-----+-----+ +-----+-----+ | car | cdr | --->| car | cdr | ---> c +-----+-----+ +-----+-----| | | \ / \ / a b
The EBNF grammar of this dotted-pairs linked list can be specified as follow:
sexp := atom | pair | list pair := "(" sexp "." sexp ")" list := "(" sexp* ")" atom := NUMBER | SYMBOL | STRING | BOOLEAN | SYMBOL | NULL
The AST can be represnted using OCaml algebraic data type notation as:
type sexp = num of int // number literal | sym of string // symbol, identifier or keyword | str of string // string literal between quotes "" | bool of boolean // boolean literak #t or #f | nil // represents null value | cons of sexp * sexp // cons cell => It can represent a null-terminated
The previous grammar can be parsed using this pascal-like pseudocode:
// Token Types // //------------------------------------------------// // Delimiters T_LPAR "(" // left parenthesis T_RPAR ")" // right parenthesis T_DOT "." // Dot - found in dotted pair '(a . b) T_QUOTE "'" // quote (') // Literal token types (terminal elements of grammar) T_NUM // number T_STR // string literal "something else" T_SYM // symbol 'my-symbol T_NIL // nil => null element and empty list T_BOOL // boolean #t (true) or #f (false) // Sentinel Token Types T_EOF // End Of File T_ERR // Error T_NDF // Non Defined - non initialized token // -------------------------------------------------------// // Those tokens come from the tokenizer which skips blank characters, white spaces and new lines, and // breaks the source code into terminal elements of the grammar. tokens := [ Token(T_LPAR, "(", Token(T_NUM, "-1245.13e3"), Token(T_SYM, "x1") ,Token(T_STR, "hello world"), Token(T_NIL, "nil"), Token(T_BOOL, "#f") ,Token(T_BOOL, "#t"), Token(T_DOT, "."), Token(T_RPAR, ")") ] // ------- Global Variables ==> They should be contained in an object. // Current position of cursors index := 0 // Current token current := Token(T_NDF, "") // Advance to next token FUNCTION advance(): void IF index >= len(tokens) THEN RETURN Token(T_EOF, "") END current := tokens[index] index := index + 1 END // Check whether the current token type has the same type as the argument FUNCTION check(token_type): bool RETURN current.type == token_type END // Returns true if the current token matches the expected type and consume it. FUNCTION match(token_type): bool IF current.type == token_type THEN advance() RETURN true END RETURN false END // If the current token does not match the expected type, raise an error. FUNCTION expect(token_type) IF current.type == token_type advance() ELSE error("Expected token ", token_type, ", but given ", current) END END // Return true if the current token is EOF (End Of File) FUNCTION is_eof(): bool RETURN current.type == EOF END //--------------------------------------------------------// // Parse Lisp symbolic expression FUNCTION parse_sexp(): Ast BEGIN IF token.is_atom() THEN RETURN parse_atom() ELSE IF match(T_LPAR) THEN RETURN parse_list() ELSE IF match(T_QUOTE) THEN q = self.parse_sexp() RETURN AstCons( AstSymbol("quote"), AstCons(q, AstNil() ) ) ELSE error("Invalid token: ", token_string) END END FUNCTION parse_atom(): Ast BEGIN ast := null IF check(TOKEN_NUM) THEN ast := AstInt(parse_num(lexeme)) END ELSE IF check(TOKEN_STR) THEN ast := AstStr(lexeme) END ELSE IF check(TOKEN_SYM) THEN ast := AstSym(lexeme) END ELSE IF check(TOKEN_NULL) THEN ast := AstNil() END // If this line runs, it means that there is a bug. ELSE error("parse_atom() => invalid token: ", token_string) END // Advance token advance() RETURN ast END FUNCTION parse_list(): Ast IF match(T_RPAR) THEN RETURN AstNil() ELSE IF match(T_DOT) THEN IF match(T_RPAR) THEN error("Error: ill-formed dotted list.") END element := parse_sexp() expect(T_RPAR) RETURN element ELSE iF match(T_EOF) THEN error("Expected right parenthesis") END head := parse_sexp() tail := pase_list() pair := cons(head, tail) RETURN pair END //---------- Program entry point - main() --------// // FUNCTION main(): void // Lexical analysis, break text/source into tokens tokens := tokenize(source_code) // Initialize parser advance() // parser AST ast := parse_sexp() // Display result print(" Lisp AST: ", ast) END
The next code is an executable implementation of the presented parsing algorithm implemented in Python, but that can be easily translated to any other programming language.
File: sexp_parser1.py
from typing import Any, Callable, Dict, NamedTuple, Optional, List, Tuple # ============= >>>> Token types <<<<=================== # --- Literals ------------------------------------------------- T_INT = "T_INT" # Integer: 1002 -245 9025 T_FLT = "T_FLT" # Floating point number: -10.2522 9.25221 .0252e3 102.5E3 T_STR = "T_STR" # String literal: "world" T_SYM = "T_SYM" # Symbol: 'hello T_NIL = "T_NIL" # Nil element or end of list T_BOOL = "T_BOOL" # Boolean: #f or #t # --- Delimiters ----------------------------------------------- T_LPAR = "T_LPAR" # left parenthesis T_RPAR = "T_RPAR" # right parenthesis T_DOT = "T_DOT" # dot '.' T_QUOTE = "T_QUOTE" # quote (') # --- Sentinel tokens --------------------------------------- T_NDF = "T_NDF" # Non defined, non initialized sentinel token T_EOF = "T_EOF" # End of file sentinel token class Token(NamedTuple): type: str # Token type text: str # Token text, also known as lexeme pos: int # Token position col: int # Token column - starts with 1 lin: int # Token line - starts with 1 TOKEN_EOF = Token(T_EOF, "", 0, 0, 0) NULLCHAR = '\0' class Tokenizer: def __init__(self, source: str) -> None: # Source code to be tokenized self._source = source # Current position of cursor self._pos = 0 # Current line of cursor self._lin = 1 # Current column of cursor self._col = 1 # Position of first token character self._token_pos = 0 # Column of first token character self._token_col = 1 # Line of first token chracter self._token_lin = 1 # Current character self._chr = '' # Current token self.advance() def advance(self): if self._pos >= len(self._source): self._chr = '\0' return self._chr = self._source[self._pos] self._col += 1 if self._chr == '\n': self._lin += 1 self._col = 1 self._pos += 1 def is_eof(self): """Return true if character is end of input""" return self._chr == NULLCHAR def has_next(self) -> bool: """Return true if tokenizer has next token""" return not self.is_eof() def is_space(self): """Return true if current character is space or blank character. """ return self._chr.isspace() def is_digit(self): """Return true if current character is digit.""" return self._chr.isdigit() def is_delimiter(self): return self._chr in ["(", ")", ".", "'"] def check(self, chr: str): return self._chr == chr def peek(self): """Return current character""" return self._chr def lookahead(self, n: int = 1): """Returns next character in the stream without consuming it""" if self._pos + n >= len(self._source): return NULLCHAR return self._source[self._pos + n - 1] def expect(self, chr: str): if self._chr == chr: self.advance() return raise RuntimeError(f"Expected character {chr}, but given {self._chr}.") def match(self, chr: str): if self._chr == chr: self.advance() return True return False def next(self) -> Token: """Return next token""" if self.is_eof(): return self._token(T_EOF, "") # Skip whitespace self._skip_whitespace() self.set_token_position() if self.match("("): return self._token(T_LPAR, "(") elif self.match(")"): return self._token(T_RPAR, ")") elif self.match("."): return self._token(T_DOT, ".") elif self.match("'"): return self._token(T_QUOTE, "'") elif self.check('-') and self.lookahead().isdigit(): return self._next_number() elif self.is_digit(): return self._next_number() elif self.check("\""): return self._next_string() else: return self._next_symbol_or_bool() def tokens(self) -> List[Token]: lst = [] while not self.is_eof(): t = self.next() if t.type == T_EOF: break lst.append(t) return lst def set_token_position(self): """Store position of beginning of token until this function is called again.""" self._token_pos = self._pos - 1 self._token_col = self._col - 1 self._token_lin = self._lin def _skip_whitespace(self): while self.is_space() and not self.is_eof(): self.advance() def _token(self, type: str, lexeme: str) -> Token: return Token(type, lexeme ,self._token_pos ,self._token_col ,self._token_lin) def _next_symbol_or_bool(self) -> Token: lexeme = "" self.set_token_position() while not self.is_eof() and not self.is_space() and not self.is_delimiter(): lexeme += self.peek() self.advance() if lexeme == "": return self._token(T_EOF, "") if lexeme == "#t" or lexeme == "#f": return self._token(T_BOOL, lexeme) return self._token(T_SYM, lexeme) def _next_number(self) -> Token: lexeme = "" self.set_token_position() # In grammars where '-' negative sign is not part of literal # remove the next line if self.match('-'): lexeme += "-" if not self.check('.'): lexeme += self._next_integer() if not self.check('.') and not self.check('e') and not self.check('E'): if self.peek().isalpha(): raise RuntimeError("Error: invalid number, cannot end with letter") # Identified integer type return self._token(T_INT, lexeme) if self.match('.'): lexeme += "." + self._next_integer() letter = self.peek() if self.match('e') or self.match('E'): lexeme += letter if not self.match('-') and not self.is_digit(): raise RuntimeError("Error: invalid floating point number: " + lexeme) if self.match('-'): lexeme += "-" if not self.is_digit(): raise RuntimeError("Error: invalid floating point number. Expected digit, but got: " + lexeme) lexeme += self._next_integer() if self.peek().isalpha(): raise RuntimeError("Error: invalid number, it cannot end with letter") # Return a lexeme (string representation) of a floating point number literal return self._token(T_FLT, lexeme) def _next_integer(self) -> str: assert self.is_digit() lexeme = "" while self.is_digit() and not self.is_eof(): lexeme += self.peek() self.advance() return lexeme def _next_string(self) -> Token: self.expect('"') lexeme = "" self.set_token_position() while not self.is_eof() and not self.check('"'): if self.match('\\'): if self.match('n'): lexeme += "\n"; continue if self.match('t'): lexeme += "\t"; continue if self.match('v'): lexeme += "\v"; continue if self.match('"'): lexeme += "\""; continue if self.match('\\'): lexeme += "\\"; continue raise RuntimeError("Tokenizer error, invalid escape character: '" + self._ch + "'") lexeme += self.peek() self.advance() self.expect('"') return self._token(T_STR, lexeme) EXPR_CONS = "EXPR_CONS" EXPR_NIL = "EXPR_NIL" EXPR_BOOL = "EXPR_BOOL" EXPR_STR = "EXPR_STR" EXPR_SYM = "EXPR_SYM" EXPR_NUM = "EXPR_NUM" EXPR_ERR = "EXPR_ERR" class Expr: """Reprsents a S-Expression symbolic expression LISP linked list.""" def to_repr(self) -> str: return "<NOT IMPLEMENTED>" def to_str(self) -> str: return "<NOT IMPLEMENTED>" def type(self) -> str: return "<NOT IMPLEMENTED>" def to_num(self) -> float: return 0.0 def is_atom(self) -> bool: return False def is_pair(self) -> bool: return False def is_nil(self) -> bool: return False def is_num(self) -> bool: return False def is_sym(self) -> bool: return False def is_bool(self) -> bool: return False def is_str(self) -> bool: return False def is_err(sefl) -> bool: return False def car(self) -> 'Expr': raise RuntimeError("Not implemented") def cdr(self) -> 'Expr': raise RuntimeError("Not implemented") def __repr__(self) -> str: return self.to_repr() class ExprNil(Expr): """This AST node represents an empty list or null element.""" def is_atom(self) -> bool: return True def is_nil(self) -> bool: return True def to_repr(self) -> str: return "nil" def to_str(self) -> str: return "nil" def type(self) -> str: return EXPR_NIL class ExprCons(Expr): """AST node represents a cons cell.""" def type(self) -> str: return EXPR_CONS def is_pair(self) -> bool: return True def __init__(self, car: Expr, cdr: Expr) -> None: self._car = car self._cdr = cdr def car(self) -> 'Expr': return self._car def cdr(self) -> 'Expr': return self._cdr def to_repr(self) -> str: head = self._car.to_repr() tail = self._cdr.to_repr() return f"cons({head}, {tail})" def to_str(self) -> str: return self.to_repr() def type(self) -> str: return EXPR_CONS class ExprSym(Expr): """AST node represents a lisp symbol.""" def __init__(self, value: str): self._value = value def is_atom(self) -> bool: return True def is_sym(self) -> bool: return True def to_repr(self): return "'" + self._value def to_str(self): return self._value def type(self) -> str: return EXPR_SYM class ExprNum(Expr): """AST node represents a lisp number.""" def __init__(self, value: int): self._value = value def is_atom(self) -> bool: return True def is_sym(self) -> bool: return True def to_repr(self): return str(self._value) def to_str(self): return str(self._value) def to_num(self) -> float: return self._value def type(self) -> str: return EXPR_NUM class ExprStr(Expr): """AST node represents a lisp symbol.""" def __init__(self, value: str): self._value = value def is_atom(self) -> bool: return True def is_str(self) -> bool: return True def to_repr(self): return '"' + self._value + '"' def to_str(self): return self._value def type(self) -> str: return EXPR_STR class ExprBool(Expr): """AST node represents a lisp boolean value.""" def __init__(self, value: bool): self._value = value def is_atom(self) -> bool: return True def is_bool(self) -> bool: return True def to_repr(self): return "#t" if self._value else "#f" def to_str(self): return "#t" if self._value else "#f" def type(self) -> str: return EXPR_BOOL class ExprErr(Expr): """AST node represents a runtime error value (not an AST value).""" def __init__(self, value: str): self._value = value def is_atom(self) -> bool: return True def is_err(sefl) -> bool: return True def to_repr(self): return "err => " + self._value def to_str(self): return self._value def type(self) -> str: return EXPR_ERR class Parser: def __init__(self, source: str) -> None: self._source = source tok = Tokenizer(source) self._tokens = tok.tokens() self._pos = 0 self._current = Token(T_NDF, "", 0, 0, 0) self.advance() def peek(self): return self._current def lookahead(self, n: int = 1): """Returns next character in the stream without consuming it""" if self._pos + n >= len(self._tokens): return Token(T_EOF, 0, 0, 0, 0) return self._tokens[self._pos + n - 1] def advance(self): """Advance to next token""" if self._pos >= len(self._tokens): self._current = Token(T_EOF, 0, 0, 0, 0) return self._current = self._tokens[self._pos] self._pos += 1 def match(self, type: str): """Returns true if the current token matches the expected type and consume it.""" if self._current.type == type: self.advance() return True return False def check(self, type: str): """Returns true if the current token matches the expected type and consume it.""" return self._current == type def expect(self, type: str): """Consume current token if it matches the expected type. Otherwise raise an error.""" if self._current.type == type: self.advance() return raise RuntimeError(f"Expected token of type {type}, but given {self._current}.") def is_eof(self): """Return true if current token is T_EOF""" return self._current.type == T_EOF # ------- End of reusable parser code ---------------_# def is_atom(self): """Return true if current token is atom""" return self.peek().type in [ T_BOOL, T_FLT, T_INT, T_SYM, T_NIL, T_STR ] def parse_sexp(self) -> Expr: if self.is_atom(): return self.parse_atom() elif self.match(T_LPAR): return self.parse_list() # 'x is parsed as(quote x) # '(+ x 10) is parsed as (quote (+ x 10)) elif self.match(T_QUOTE): q = self.parse_sexp() return ExprCons(ExprSym("quote"), ExprCons(q, ExprNil())) else: raise RuntimeError("Invalid token: " + str(self.peek())) def parse_atom(self) -> Expr: lexeme = self.peek().text if self.match(T_INT): return ExprNum(float(lexeme)) elif self.match(T_FLT): return ExprNum(float(lexeme)) elif self.match(T_STR): return ExprStr(lexeme) elif self.match(T_SYM): return ExprSym(lexeme) elif self.match(T_BOOL): return ExprBool(lexeme == "#t") else: raise RuntimeError("Invalid token: " + lexeme) def parse_list(self) -> Expr: # self.expect(T_LPAR) if self.match(T_RPAR): return ExprNil() elif self.match(T_DOT): if self.match(T_RPAR): raise RuntimeError("Ill-formed dotted list.") elem = self.parse_sexp() self.expect(T_RPAR) return elem elif self.match(T_EOF): raise RuntimeError("Expected right parenthesis") car = self.parse_sexp() cdr = self.parse_list() ast = ExprCons(car, cdr) return ast def sexp_parse(source: str) -> Expr: p = Parser(source) return p.parse_sexp() def sexp_equal(lhs: Expr, rhs: Expr) -> bool: """Determine whether two S-Expressions are equal""" if lhs.type() != rhs.type(): return False elif lhs.is_atom() and rhs.is_atom(): if lhs.is_num(): return lhs.to_num() == rhs.to_num() elif lhs.is_str(): return lhs.to_str() == rhs.to_str() elif lhs.is_sym(): return lhs.to_str() == rhs.to_str() elif lhs.is_bool(): return lhs._value == rhs._value elif lhs.is_nil(): return True else: raise RuntimeError("Error: edge case => code has bugs. ") else: t = sexp_equal(lhs.car(), rhs.car()) if not t: return False k = sexp_equal(lhs.cdr(), rhs.cdr()) return k def sexp_to_str(sexp: Expr) -> Expr: """Pretty printer of S-Expression""" ## import pdb; pdb.set_trace() if sexp.is_str(): return sexp.to_repr() elif sexp.is_atom(): return sexp.to_str() elif sexp.is_pair(): s = sexp text = "(" while s.is_pair(): head = s.car() s = s.cdr() text = text + " " + sexp_to_str(head) if s.is_nil(): text = text + ")" elif s.is_atom(): text = text + " . " + s.to_str() + " )" else: RuntimeError("Edge case => this branch is impossible to happen.") return text else: raise RuntimeError("Not implemented for this Ast element: " + sexp) def test_expr1(): """unit test of SEXP parser.""" src = ' ( 9.214 "world" a . end )' expected = ExprCons(ExprNum(9.214), ExprCons(ExprStr("world"), ExprCons( ExprSym('a'), ExprSym("end"))) ) ast = sexp_parse(src) assert sexp_equal(ast, expected) def test_expr2(): """Test in-meory AST representation of sexp""" src = '( "hello world" (100 a . b) x1 z )' expected = ExprCons( ExprStr("hello world") , ExprCons( ExprCons(ExprNum(100.0), ExprCons(ExprSym("a"), ExprSym("b"))) , ExprCons(ExprSym("x1") , ExprCons(ExprSym("z"), ExprNil()))) ) ast = sexp_parse(src) assert sexp_equal(ast, expected) def test_expr3(): """Test quote quoted expression""" src = ' \'(+ x y z)' lst = ExprCons( ExprSym("+") , ExprCons(ExprSym("x"), ExprCons(ExprSym("y"), ExprCons( ExprSym("z"), ExprNil())))) expected = ExprCons( ExprSym("quote"), ExprCons(lst, ExprNil())) ast = sexp_parse(src) assert sexp_equal(ast, expected) def test_expr4(): """Test quoted expression""" src = ' \'(+ x y z)' expected = "cons('quote, cons(cons('+, cons('x, cons('y, cons('z, nil)))), nil))" ast = sexp_parse(src) assert str(ast) == expected def test_exrp5(): """Test string representation of sexp""" src = ' \'(+ x y z)' s = sexp_parse(src) # Pretty print representation of SEXP p = sexp_to_str(s) expected = '( quote ( + x y z))' assert p == expected def test_expr6(): src = """ (fn myfunction(x y . lst) (print x) (print y) (map print lst) ) """ expected = '( fn myfunction ( x y . lst ) ( print x) ( print y) ( map print lst))' ast = sexp_parse(src) # Pretty print representation of SEXP p = sexp_to_str(ast) assert p == expected
Run unit tests:
$ pytest -q -v -rA sexp_parser1.py ============================================== test session starts ============================================== platform linux -- Python 3.8.10, pytest-7.3.1, pluggy-1.0.0 rootdir: /home/user/formula-parser plugins: anyio-3.5.0, xonsh-0.13.3 collected 6 items sexp_parser1.py ...... [100%] ==================================================== PASSES ===================================================== ============================================ short test summary info ============================================ PASSED sexp_parser1.py::test_expr1 PASSED sexp_parser1.py::test_expr2 PASSED sexp_parser1.py::test_expr3 PASSED sexp_parser1.py::test_expr4 PASSED sexp_parser1.py::test_exrp5 PASSED sexp_parser1.py::test_expr6 =============================================== 6 passed in 0.04s ===============================================
Interact with the parser in the REPL:
$ python3 -i sexp_parser1.py src = """ ( a x->1 "My string literal" (m1 . w ) (1.25 -9.251e3 0.255 nil) z2->w . object:method ) """ ast = sexp_parse(src) >>> ast cons('a, cons('x->1, cons("My string literal", cons(cons('m1, 'w), cons(cons(1.25, cons(-9251.0, cons(0.255, cons('nil, nil)))), cons('z2->w, 'object:method)))))) >>> ast.car() 'a >>> ast.cdr() cons('x->1, cons("My string literal", cons(cons('m1, 'w), cons(cons(1.25, cons(-9251.0, cons(0.255, cons('nil, nil)))), cons('z2->w, 'object:method))))) >>> ast.cdr().cdr() cons("My string literal", cons(cons('m1, 'w), cons(cons(1.25, cons(-9251.0, cons(0.255, cons('nil, nil)))), cons('z2->w, 'object:method)))) >>> ast.cdr().cdr().cdr() cons(cons('m1, 'w), cons(cons(1.25, cons(-9251.0, cons(0.255, cons('nil, nil)))), cons('z2->w, 'object:method))) >>> sexp_to_str(ast) '( a x->1 "My string literal" ( m1 . w ) ( 1.25 -9251.0 0.255 nil) z2->w . object:method )' >>> sexp_to_str(ast.cdr()) '( x->1 "My string literal" ( m1 . w ) ( 1.25 -9251.0 0.255 nil) z2->w . object:method )' >>> sexp_to_str(ast.cdr().cdr()) '( "My string literal" ( m1 . w ) ( 1.25 -9251.0 0.255 nil) z2->w . object:method )' >>> sexp_to_str(ast.cdr().cdr().cdr()) '( ( m1 . w ) ( 1.25 -9251.0 0.255 nil) z2->w . object:method )' >>> sexp_to_str(ast.cdr().cdr().cdr().car()) '( m1 . w )' >>> sexp_to_str(ast.cdr().cdr().cdr().car().car()) 'm1' >>> sexp_to_str(ast.cdr().cdr().cdr().car().cdr()) 'w'
1.11.4 sexp parser algorithm without dotted list
Besides the dotted-pair approach that represents S-expressions as linked lists of cons cells, S-expresions can also be represented by nested lists or arrays. This approach is more suitable for data exchange and serialization. Other advantage is the easier and quickier implementation of a lisp-like embedded scripting languages hosted in a garbage collected statically typed languages such as Java or C#. Unlike the dotted-pair AST - abstract syntax tree, this type of AST cannot represent improper lists.
EBNF for Lisp/Scheme Sexp S-Expression without cons pairs linked list:
sexp := atom | list list := "(" sexp* ")" atom := NUMBER | BOOLEAN | SYMBOL | STRING | KEYWORD
Pascal-like pseudocde for S-expression parser implementation:
// Token Types // //------------------------------------------------// // Delimiters T_LPAR "(" // left parenthesis T_RPAR ")" // right parenthesis T_DOT "." // Dot - found in dotted pair '(a . b) T_QUOTE "'" // Quote // Literal token types (terminal elements of grammar) T_NUM // number T_STR // string literal "something else" T_SYM // symbol 'my-symbol T_NIL // nil => null element and empty list T_BOOL // boolean #t (true) or #f (false) // Sentinel Token Types T_EOF // End Of File T_ERR // Error T_NDF // Non Defined - non initialized token // -------------------------------------------------------// // Those tokens come from the tokenizer which skips blank characters, white spaces and new lines, and // breaks the source code into terminal elements of the grammar. tokens := [ Token(T_LPAR, "(", Token(T_NUM, "-1245.13e3"), Token(T_SYM, "x1") ,Token(T_STR, "hello world"), Token(T_NIL, "nil"), Token(T_BOOL, "#f") ,Token(T_BOOL, "#t"), Token(T_DOT, "."), Token(T_RPAR, ")") ] // ------- Global Variables ==> They should be contained in an object. // Current position of cursors index := 0 // Current token current := Token(T_NDF, "") // Advance to next token FUNCTION advance(): void IF index >= len(tokens) THEN RETURN Token(T_EOF, "") END current := tokens[index] index := index + 1 END // Check whether the current token type has the same type as the argument FUNCTION check(token_type): bool RETURN current.type == token_type END // Returns true if the current token matches the expected type and consume it. FUNCTION match(token_type): bool IF current.type == token_type THEN advance() RETURN true END RETURN false END // If the current token does not match the expected type, raise an error. FUNCTION expect(token_type) IF current.type == token_type advance() ELSE error("Expected token ", token_type, ", but given ", current) END END // Return true if the current token is EOF (End Of File) FUNCTION is_eof(): bool RETURN current.type == EOF END //--------------------------------------------------------// // Algebraic data types in languages like OCaml or Scala; // or class hierarchy in languages like Java Ast is root class -> AstNum(val: int) // number value -> AstStr(val: str) // string value -> AstSym(val: str) // symbol value -> AstBool(val: bool) // #t (true) or #f (false) -> AstNil() // nil -> AstList(val: [Ast]] // List instead of a cons cell -> AstErr(val: str) // Represents parser error or runtime error FUNCTION parse_sexp(): Ast BEGIN IF check(T_LPAR) THEN RETURN parse_list() ELSE IF is_atom(token) THEN RETURN parse_atom() ELSE IF check(QUOTE) THEN sexp := parse_sexp() RETURN AstList([ AStSym("quote"), sexp]) ELSE error("Invalid token") END FUNCTION parse_atom(): Ast BEGIN ast := null // Lexeme is that string that a tokens represents IF check(TOKEN_NUM) THEN ast := AstInt(parse_num(lexeme)) END ELSE IF check(TOKEN_STR) THEN ast := AstStr(lexeme) END ELSE IF check(TOKEN_SYM) THEN ast := AstSym(lexeme) END ELSE IF check(TOKEN_NULL) THEN ast := AstNil() END // If this line runs, it means that there is a bug. ELSE error("parse_atom() => invalid token: ", token_string) END // Advance token advance() RETURN ast END FUNCTION parse_list(): Ast expect(T_LPAR) list: [Ast] = [] WHILE NOT check(T_RPAR) and NOT is_eof() THEN sexp = parse_sexp() ASSERT sexp != error append_to_list(list, sexp) END expect(T_RPAR) RETURN list END //---------- Program entry point - main() --------// // FUNCTION main(): void // Lexical analysis, break text/source into tokens tokens := tokenize(source_code) // Initialize parser advance() // parser AST ast := parse_sexp() // Display result print(" Lisp AST: ", ast) END
Parser algorithm implementation in Python:
File: sexp_parser2.py
from tokenize import Double from typing import Any, Callable, Dict, NamedTuple, Optional, List, Tuple # ============= >>>> Token types <<<<=================== # --- Literals ------------------------------------------------- T_INT = "T_INT" # Integer: 1002 -245 9025 T_FLT = "T_FLT" # Floating point number: -10.2522 9.25221 .0252e3 102.5E3 T_STR = "T_STR" # String literal: "world" T_SYM = "T_SYM" # Symbol: 'hello T_NIL = "T_NIL" # Nil element or end of list T_BOOL = "T_BOOL" # Boolean: #f or #t # --- Delimiters ----------------------------------------------- T_LPAR = "T_LPAR" # left parenthesis T_RPAR = "T_RPAR" # right parenthesis T_DOT = "T_DOT" # dot '.' T_QUOTE = "T_QUOTE" # quote (') # --- Sentinel tokens --------------------------------------- T_NDF = "T_NDF" # Non defined, non initialized sentinel token T_EOF = "T_EOF" # End of file sentinel token class Token(NamedTuple): type: str # Token type text: str # Token text, also known as lexeme pos: int # Token position col: int # Token column - starts with 1 lin: int # Token line - starts with 1 TOKEN_EOF = Token(T_EOF, "", 0, 0, 0) NULLCHAR = '\0' class Tokenizer: def __init__(self, source: str) -> None: # Source code to be tokenized self._source = source # Current position of cursor self._pos = 0 # Current line of cursor self._lin = 1 # Current column of cursor self._col = 1 # Position of first token character self._token_pos = 0 # Column of first token character self._token_col = 1 # Line of first token chracter self._token_lin = 1 # Current character self._chr = '' # Current token self.advance() def advance(self): if self._pos >= len(self._source): self._chr = '\0' return self._chr = self._source[self._pos] self._col += 1 if self._chr == '\n': self._lin += 1 self._col = 1 self._pos += 1 def is_eof(self): """Return true if character is end of input""" return self._chr == NULLCHAR def has_next(self) -> bool: """Return true if tokenizer has next token""" return not self.is_eof() def is_space(self): """Return true if current character is space or blank character. """ return self._chr.isspace() def is_digit(self): """Return true if current character is digit.""" return self._chr.isdigit() def is_delimiter(self): return self._chr in ["(", ")", ".", "'"] def check(self, chr: str): return self._chr == chr def peek(self): """Return current character""" return self._chr def lookahead(self, n: int = 1): """Returns next character in the stream without consuming it""" if self._pos + n >= len(self._source): return NULLCHAR return self._source[self._pos + n - 1] def expect(self, chr: str): if self._chr == chr: self.advance() return raise RuntimeError(f"Expected character {chr}, but given {self._chr}.") def match(self, chr: str): if self._chr == chr: self.advance() return True return False def next(self) -> Token: """Return next token""" if self.is_eof(): return self._token(T_EOF, "") # Skip whitespace self._skip_whitespace() self.set_token_position() if self.match("("): return self._token(T_LPAR, "(") elif self.match(")"): return self._token(T_RPAR, ")") elif self.match("."): return self._token(T_DOT, ".") elif self.match("'"): return self._token(T_QUOTE, "'") elif self.check('-') and self.lookahead().isdigit(): return self._next_number() elif self.is_digit(): return self._next_number() elif self.check("\""): return self._next_string() else: return self._next_symbol_or_bool() def tokens(self) -> List[Token]: lst = [] while not self.is_eof(): t = self.next() if t.type == T_EOF: break lst.append(t) return lst def set_token_position(self): """Store position of beginning of token until this function is called again.""" self._token_pos = self._pos - 1 self._token_col = self._col - 1 self._token_lin = self._lin def _skip_whitespace(self): while self.is_space() and not self.is_eof(): self.advance() def _token(self, type: str, lexeme: str) -> Token: return Token(type, lexeme ,self._token_pos ,self._token_col ,self._token_lin) def _next_symbol_or_bool(self) -> Token: lexeme = "" self.set_token_position() while not self.is_eof() and not self.is_space() and not self.is_delimiter(): lexeme += self.peek() self.advance() if lexeme == "": return self._token(T_EOF, "") if lexeme == "#t" or lexeme == "#f": return self._token(T_BOOL, lexeme) return self._token(T_SYM, lexeme) def _next_number(self) -> Token: lexeme = "" self.set_token_position() # In grammars where '-' negative sign is not part of literal # remove the next line if self.match('-'): lexeme += "-" if not self.check('.'): lexeme += self._next_integer() if not self.check('.') and not self.check('e') and not self.check('E'): if self.peek().isalpha(): raise RuntimeError("Error: invalid number, cannot end with letter") # Identified integer type return self._token(T_INT, lexeme) if self.match('.'): lexeme += "." + self._next_integer() letter = self.peek() if self.match('e') or self.match('E'): lexeme += letter if not self.match('-') and not self.is_digit(): raise RuntimeError("Error: invalid floating point number: " + lexeme) if self.match('-'): lexeme += "-" if not self.is_digit(): raise RuntimeError("Error: invalid floating point number. Expected digit, but got: " + lexeme) lexeme += self._next_integer() if self.peek().isalpha(): raise RuntimeError("Error: invalid number, it cannot end with letter") # Return a lexeme (string representation) of a floating point number literal return self._token(T_FLT, lexeme) def _next_integer(self) -> str: assert self.is_digit() lexeme = "" while self.is_digit() and not self.is_eof(): lexeme += self.peek() self.advance() return lexeme def _next_string(self) -> Token: self.expect('"') lexeme = "" self.set_token_position() while not self.is_eof() and not self.check('"'): if self.match('\\'): if self.match('n'): lexeme += "\n"; continue if self.match('t'): lexeme += "\t"; continue if self.match('v'): lexeme += "\v"; continue if self.match('"'): lexeme += "\""; continue if self.match('\\'): lexeme += "\\"; continue raise RuntimeError("Tokenizer error, invalid escape character: '" + self._ch + "'") lexeme += self.peek() self.advance() self.expect('"') return self._token(T_STR, lexeme) EXPR_LIST = "EXPR_LIST" EXPR_NIL = "EXPR_NIL" EXPR_BOOL = "EXPR_BOOL" EXPR_STR = "EXPR_STR" EXPR_SYM = "EXPR_SYM" EXPR_NUM = "EXPR_NUM" EXPR_ERR = "EXPR_ERR" class Expr: """Reprsents a S-Expression, Lisp symbolic expression (but not linked list)""" # Get Representation showing object type, for instance Sym[x] if x is a symbol # Num[10], Str["my string"] def to_repr(self) -> str: return "<NOT IMPLEMENTED>" # Get S-Expression representation def to_sexp(self) -> str: return "<NOT IMPLEMENTED>" def type(self) -> str: return "<NOT IMPLEMENTED>" def to_num(self) -> float: return 0.0 def to_bool(self) -> bool: return True # Attempt to convert AST node to string def to_str(self) -> str: raise RuntimeError("Not implemented.") # Return nth element of expr list def nth(self, n: int) -> 'Expr': raise RuntimeError("Not valid for this type") # Return nodes of a ExprList, but this method is only valid for this type def nodes(self) -> List['Expr']: raise RuntimeError("Method only valid for ExprList") def is_atom(self) -> bool: return False def is_list(self) -> bool: return False def is_nil(self) -> bool: return False def is_num(self) -> bool: return False def is_sym(self) -> bool: return False def is_bool(self) -> bool: return False def is_str(self) -> bool: return False def is_err(sefl) -> bool: return False def car(self) -> 'Expr': raise RuntimeError("Not implemented") def cdr(self) -> 'Expr': raise RuntimeError("Not implemented") def __repr__(self) -> str: return self.to_repr() class ExprNil(Expr): """This AST node represents an empty list or null element.""" def is_atom(self) -> bool: return True def is_nil(self) -> bool: return True def to_repr(self) -> str: return "nil" def to_sexp(self) -> str: return "nil" def type(self) -> str: return EXPR_NIL def to_bool(self) -> bool: return False class ExprList(Expr): """AST node represents a list.""" def type(self) -> str: return EXPR_LIST def is_list(self) -> bool: return True def __init__(self, nodes: List[Expr]) -> None: self._nodes = nodes def car(self) -> 'Expr': return self._car def cdr(self) -> 'Expr': return self._cdr def nth(self, n: int) -> 'Expr': return self._nodes[n] def nodes(self) -> List[Expr]: return self._nodes def to_repr(self) -> str: nodes = ", ".join([ n.to_repr() for n in self._nodes ]) return f"List[ {nodes} ]" def to_sexp(self) -> str: nodes = " ".join([ n.to_sexp() for n in self._nodes ]) return f"({nodes})" def type(self) -> str: return EXPR_LIST class ExprSym(Expr): """AST node represents a lisp symbol.""" def __init__(self, value: str): self._value = value def is_atom(self) -> bool: return True def is_sym(self) -> bool: return True def to_repr(self): return f"Sym[{self._value}]" def to_sexp(self): return self._value def to_str(self) -> str: return self._value def type(self) -> str: return EXPR_SYM class ExprNum(Expr): """AST node represents a lisp number.""" def __init__(self, value: int): self._value = value def is_atom(self) -> bool: return True def is_sym(self) -> bool: return True def to_repr(self): return f"Num[{self._value}]" def to_sexp(self): return str(self._value) def is_num(self) -> bool: return True def to_num(self) -> float: return self._value def to_str(self) -> str: return str(self._value) def type(self) -> str: return EXPR_NUM class ExprStr(Expr): """AST node represents a lisp symbol.""" def __init__(self, value: str): self._value = value def is_atom(self) -> bool: return True def is_str(self) -> bool: return True def to_repr(self): return 'Str["' + self._value + '"]' def to_sexp(self): return '"' + self._value + '"' def to_str(self) -> str: return str(self._value) def type(self) -> str: return EXPR_STR class ExprBool(Expr): """AST node represents a lisp boolean value.""" def __init__(self, value: bool): self._value = value def is_atom(self) -> bool: return True def is_bool(self) -> bool: return True def to_repr(self): return "Bool[" + ("#t" if self._value else "#f") + "]" def to_sexp(self): return "#t" if self._value else "#f" def type(self) -> str: return EXPR_BOOL def to_bool(self) -> bool: return self._value def to_str(self) -> str: return "#t" if self._value else "#f" class ExprErr(Expr): """AST node represents a runtime error value (not an AST value).""" def __init__(self, value: str): self._value = value def is_atom(self) -> bool: return True def is_err(sefl) -> bool: return True def to_repr(self): return "err => " + self._value def to_sexp(self): return self._value def type(self) -> str: return EXPR_ERR def to_str(self) -> str: return self._value class Parser: def __init__(self, source: str) -> None: self._source = source tok = Tokenizer(source) self._tokens = tok.tokens() self._pos = 0 self._current = Token(T_NDF, "", 0, 0, 0) self.advance() def peek(self): return self._current def lookahead(self, n: int = 1): """Returns next character in the stream without consuming it""" if self._pos + n >= len(self._tokens): return Token(T_EOF, 0, 0, 0, 0) return self._tokens[self._pos + n - 1] def advance(self): """Advance to next token""" if self._pos >= len(self._tokens): self._current = Token(T_EOF, 0, 0, 0, 0) return self._current = self._tokens[self._pos] self._pos += 1 def match(self, type: str): """Returns true if the current token matches the expected type and consume it.""" if self._current.type == type: self.advance() return True return False def check(self, type: str): """Returns true if the current token matches the expected type and consume it.""" return self._current.type == type def expect(self, type: str): """Consume current token if it matches the expected type. Otherwise raise an error.""" if self._current.type == type: self.advance() return raise RuntimeError(f"Expected token of type {type}, but given {self._current}.") def is_eof(self): """Return true if current token is T_EOF""" return self._current.type == T_EOF # ------- End of reusable parser code ---------------_# def is_atom(self): """Return true if current token is atom""" return self.peek().type in [ T_BOOL, T_FLT, T_INT, T_SYM, T_NIL, T_STR ] def parse_sexp(self) -> Expr: if self.is_atom(): return self.parse_atom() elif self.check(T_LPAR): return self.parse_list() # 'x is parsed as(quote x) # '(+ x 10) is parsed as (quote (+ x 10)) elif self.match(T_QUOTE): q = self.parse_sexp() return ExprList([ExprSym("quote"), q]) else: raise RuntimeError("Invalid token: " + str(self.peek())) def parse_atom(self) -> Expr: lexeme = self.peek().text if self.match(T_INT): return ExprNum(float(lexeme)) elif self.match(T_FLT): return ExprNum(float(lexeme)) elif self.match(T_STR): return ExprStr(lexeme) elif self.match(T_SYM): if lexeme == "nil": return ExprNil() return ExprSym(lexeme) elif self.match(T_BOOL): return ExprBool(lexeme == "#t") else: raise RuntimeError("Invalid token: " + lexeme) def parse_list(self) -> Expr: self.expect(T_LPAR) nodes: List[Expr] = [] while not self.check(T_RPAR) and not self.is_eof(): sexp = self.parse_sexp() nodes.append(sexp) self.expect(T_RPAR) ast = ExprList(nodes) return ast def sexp_parse(source: str) -> Expr: """Parse S-Expression.""" p = Parser(source) ast = p.parse_sexp() return ast def sexp_equal(lhs: Expr, rhs: Expr) -> bool: """Compare two S-expressions returning true if they are equal. This function is useful for unit testing. """ if rhs.type() != lhs.type(): return False elif rhs.is_atom() and lhs.is_atom(): if rhs.is_num(): return rhs.to_num() == lhs.to_num() elif rhs.is_bool(): return rhs.to_bool() == lhs.to_bool() elif rhs.is_sym(): return rhs.to_sexp() == lhs.to_sexp() elif rhs.is_str(): return rhs.to_sexp() == lhs.to_sexp() elif rhs.is_nil(): return True elif rhs.is_list(): a = lhs.nodes() b = rhs.nodes() if len(a) != len(b): return False for n in range(0, len(a)): if not sexp_equal(a[n], b[n]): return False return True else: raise RuntimeError("Error - reached edge case.") def sexp_eval(sexp: Expr) -> Expr: """Non-turing complete S-Expression evaluator. It can only evaluate simple expressions without branching. """ # import pdb; pdb.set_trace() if sexp.is_num(): return sexp elif sexp.is_str(): return sexp elif sexp.is_nil(): return sexp elif sexp.is_bool(): return sexp elif sexp.is_err(): return sexp # Function or special-form evaluation elif sexp.is_list(): # import pdb; pdb.set_trace() if len(sexp.nodes()) == 0: return ExprNil() nodes = sexp.nodes() head = nodes[0] args = nodes[1:] if not head.is_sym(): return ExprErr(f"Error: {func} not callable.") # Obtain function name func = head.to_sexp() # Special form quote if func == "quote": if len(args) != 1: return ExprErr("Error: ill-formed quote special form") return args[0] # Sepcial form if: (if <cond> <then_form> <else_form>) if func == "if": if len(args) > 4 or len(args) < 1: return ExprErr("Syntax error - Ill-formed special form if") cond = args[0] cond_eval = sexp_eval(cond) if cond_eval.to_bool(): # evaluate the then-form res = sexp_eval(args[1]) return res if len(args) == 3: # evaluate the else-form res = sexp_eval(args[2]) return res return ExprNil() # Evaluate all arguments before function application evaluated_args = [] for a in args: res = sexp_eval(a) # print(" [TRACE] res = ", res) # Abort computation if the evaluation of any argument is an error if res.is_err(): return res evaluated_args.append(res) result = ExprNil() if func == "list": result = ExprList(evaluated_args) elif func == "car": if len(evaluated_args) != 1: return ExprErr("Function car() expects 1 argument of type list or nil") lst: Expr = evaluated_args[0] if lst.is_nil(): return ExprNil() if not lst.is_list(): return Expr("Function car() expects argument of type list.") n = lst.nodes() if len(n) == 0: return ExprNil() return n[0] elif func == "cdr": if len(evaluated_args) != 1: return ExprErr("Function cdr() expects 1 argument of type list or nil") lst: Expr = evaluated_args[0] if lst.is_nil(): return ExprNil() if not lst.is_list(): return ExprErr("Function cdr() expects argument of type list.") n = lst.nodes() if len(n) == 0: return ExprNil() return ExprList(n[1:]) elif func == "eval": if len(evaluated_args) != 1: return ExprErr("Function eval()) expects 1 argument.") expr = evaluated_args[0] result = sexp_eval(expr) elif func == "read": if len(evaluated_args) != 1: return ExprErr("Function eval()) expects 1 argument.") src = evaluated_args[0].to_str() ast = sexp_parse(src) return ast elif func == "+": result = fold_list(lambda x, acc: x + acc, evaluated_args) elif func == "-": result = fold_list(lambda x, acc: x - acc, evaluated_args) elif func == "*": result = fold_list(lambda x, acc: x * acc, evaluated_args) elif func == "/": result = fold_list(lambda x, acc: x / acc, evaluated_args) elif func == ">": if len(evaluated_args) != 2: return ExprErr("Error - function (>) expects two arguments") if evaluated_args[0].type() != EXPR_NUM: return ExprErr("Error - first argument should be a number") if evaluated_args[1].type() != EXPR_NUM: return ExprErr("Error - first argument should be a number") a = evaluated_args[0].to_num() b = evaluated_args[1].to_num() result = ExprBool(a > b) elif func == "<": if len(evaluated_args) != 2: return ExprErr("Error - function (>) expects two arguments") if evaluated_args[0].type() != EXPR_NUM: return ExprErr("Error - first argument should be a number") if evaluated_args[1].type() != EXPR_NUM: return ExprErr("Error - first argument should be a number") a = evaluated_args[0].to_num() b = evaluated_args[1].to_num() result = ExprBool(a < b) else: return ExprErr(f"Error unbound function '{func}' ") return result else: return ExprErr(f"Evaluator not implemented for {sexp}") #def fold_list(func: Callable[[Double, Double], Double], args: List[Expr]) -> Expr: def fold_list(func, args: List[Expr]) -> Expr: #print(" [TRACE] args = ", args) if len(args) < 1: return ExprErr("Error: expected at least 1 argument") if not args[0].is_num(): return ExprErr("Error: expected numeric argument") acc = args[0].to_num() for a in args[1:]: if a.is_err(): return a if not a.is_num(): return ExprErr("Expected numeric argument, but given: ", a) value = a.to_num() try: acc = func(acc, value) except ZeroDivisionError as ex: return ExprErr(str(ex)) result = ExprNum(acc) return result def repl(): show_ast = False while True: line = input(" SEXP> ").strip() if line == "": continue elif line == ":show_ast": show_ast = True continue elif line == ":hide_ast": show_ast = False continue ast = sexp_parse(line) if show_ast: print(" [TRACE] ast = ", ast) print(" [TRACE] ast = ", ast.to_sexp()) result = sexp_eval(ast) print(" := ", result.to_sexp()) # ----------- Unit Testing --------------------------------------------------# def test_sexp_parser(): """Test S-Expression parser""" src = "(list 'expr1 (* 3 5 6) 'expr2 (+ 1 2 (* 3 4)) )" expr1 = ExprList([ ExprSym("quote"), ExprSym("expr1")] ) expr2 = ExprList([ ExprSym("quote"), ExprSym("expr2")] ) lst1 = ExprList([ ExprSym("*"), ExprNum(3.0), ExprNum(5.0), ExprNum(6.0) ]) lst2 = ExprList([ ExprSym("+"), ExprNum(1), ExprNum(2), ExprList([ ExprSym("*"), ExprNum(3.0), ExprNum(4.0)])]) expected = ExprList([ ExprSym("list"), expr1, lst1, expr2, lst2 ]) ast = sexp_parse(src) assert sexp_equal(ast, expected) def test_sexp_evaluator1(): """Test S-expression interpreter""" src = "(list 'expr1 (* 3 5 6) 'expr2 (+ 1 2 (* 3 4)) )" expected = ExprList([ExprSym("expr1"), ExprNum(90.0), ExprSym("expr2"), ExprNum(15.0)]) ast = sexp_parse(src) output = sexp_eval(ast) assert sexp_equal(output, expected) def test_sexp_evaluator2(): """Test S-expression interpreter""" src = " (read \n \"(+ 1 2 3 4)\" )" expected = ExprList([ExprSym("+"), ExprNum(1.0), ExprNum(2.0), ExprNum(3.0), ExprNum(4.0)]) ast = sexp_parse(src) output = sexp_eval(ast) assert sexp_equal(output, expected) #-------------------------- End of Unit Testing ------------------------------# src = """ (fn myfunc(x y z) "Compute the distance from origin to point x, y, z" (sqrt (+ (* x x) (* y y) (* z z)) ) ) """ ast = sexp_parse(src)
Run the unit testing:
$ pytest -q -v -rA sexp_parser2.py =============================== test session starts ================================ platform linux -- Python 3.8.10, pytest-7.3.1, pluggy-1.0.0 rootdir: /home/user/formula-parser collected 3 items sexp_parser2.py ... [100%] ====================================== PASSES ====================================== ============================= short test summary info ============================== PASSED sexp_parser2.py::test_sexp_parser PASSED sexp_parser2.py::test_sexp_evaluator1 PASSED sexp_parser2.py::test_sexp_evaluator2 ================================ 3 passed in 0.04s =================================
Test the parser in a interactive way:
>>> ast List[ Sym[fn], Sym[myfunc], List[ Sym[x], Sym[y], Sym[z] ] , Str["Compute the distance from origin to point x, y, z"] , List[ Sym[sqrt], List[ Sym[+], List[ Sym[*], Sym[x], Sym[x] ] , List[ Sym[*], Sym[y], Sym[y] ], List[ Sym[*], Sym[z], Sym[z] ] ] ] ] >>> type(ast) <class '__main__.ExprList'> >>> ast.is_num() False >>> ast.is_list() True >>> ast.is_bool() False >>> ast.is_sym() False >>> ast.nth(0) Sym[fn] >>> ast.nth(1) Sym[myfunc] >>> ast.nth(2) List[ Sym[x], Sym[y], Sym[z] ] >>> ast.nth(2).to_sexp() '(x y z)' >>> ast.nth(2).nth(0) Sym[x] >>> ast.nth(2).nth(1) Sym[y] >>> ast.nth(2).nth(2) Sym[z] >>> ast.nth(3) Str["Compute the distance from origin to point x, y, z"] >>> ast.nth(4) List[ Sym[sqrt], List[ Sym[+], List[ Sym[*], Sym[x], Sym[x] ], List[ Sym[*], Sym[y], Sym[y] ], List[ Sym[*], Sym[z], Sym[z] ] ] ] >>> ast.nth(4).to_sexp() '(sqrt (+ (* x x) (* y y) (* z z)))'
Test the REPL (Read-Print-Eval-Loop). In order to keep, the code as short as possible, a turing-complete Lisp-evaluator was not included. As a result, the evaluator can only deal with simple arithmetic expressions.
>>> repl() SEXP> (list 'x (* 3 5 1) 'y (+ (* 3 5) (/ 100 5) 10) '(+ 100 x (* z w)) ) := (x 15.0 y 45.0 (+ 100.0 x (* z w))) SEXP> SEXP> (read "(+ (* 3 5) (+ 1 2 3 ))") := (+ (* 3.0 5.0) (+ 1.0 2.0 3.0)) SEXP> (eval (read "(+ (* 3 5) (+ 1 2 3 ))")) := 21.0 SEXP> (/ 10 0) := float division by zero SEXP> (if (> 3 15) (+ 2 5) (/ 10 0)) := float division by zero SEXP> (if (> 20 15) (+ 2 5) (/ 10 0)) := 7.0 SEXP> (if (> 3 15) (+ 2 5) (/ 10 0)) := float division by zero SEXP> (+ 100 (if (> 20 15) (+ 2 5) (/ 10 0))) := 107.0 SEXP> (+ 100 (if (> 10 15) (+ 2 5) (/ 10 0))) := float division by zero
1.11.5 Interpreter for Kotlin - Java Platform
There ara many lisp-like programming languages for the Java Platform and JVM, including Clojure, ABCL common lisp and Kawa Scheme. All those implementations compile to JVM bytecodes, instructions of Java Virtual Machine at runtime, but they don't work on Android, that uses Dalvik virtual machine, a different virtual machine than JVM.
This code is an incomplete Scheme-like embedded tree walking interpreter implemented in Kotlin language, that is able to compile to both JVM and Android Dalvik virtual machine. As a result, the code can be embedded in Java desktop application or android apps for faster experimentation and iteration since the interpreter allows instanting and calling Java objects at runtime without any time wasted on compilation.
File: sexpr.kt (Almost: 2130 lines of code)
// Embedded lisp-like interpreter hosted on Kotlin/JVM (Java Virtual Machine) // // Compile this program with: // $ kotlinc sexpr.kt -include-runtime -d sexpr.ja // // Run this sample program with: // $ java -jar sexpr.jar // $ rlrwrap java -jar sexpr.jar // // ---------------------------------------------------------------------------- typealias Func = (List<Ast>) -> Ast enum class AstType { INT, FLT, STR, SYM, KEY, OBJ, LST, ENV, BOOL, NIL , VOID, ERR, FUNC_NATIVE, FUNC_UDF } // Lisp Ast - Abstract Syntax Tree and Runtime values // This is equivalent to an AGDT - Algebraic Data Type sealed class Ast { open fun type(): AstType { throw Exception("Not implemented for this type of AST node.") } open fun toInt(): Int { return 0 } open fun toFlt(): Double { return 0.0 } open fun toBool(): Boolean { throw Exception("Not valid for this type of AST node.") } open fun toStr(): String { throw Exception("Not valid for this type of AST node.") } open fun toObj(): Object { throw Exception("Not valid for this type of AST node.") } open fun nth(n: Int): Ast { throw Exception("Not valid for this type of AST node. ") } open fun nodes(): List<Ast> { throw Exception("Not valid for this type of AST node. ") } open fun size(): Int { throw Exception("Not valid for this type of AST node.")} open fun head(): Ast { throw Exception("Not valid for this type of AST node. ")} open fun tail(): List<Ast> { throw Exception("Not valid for this type of AST node. ")} open fun isAtom(): Boolean { return false } // Return true if object is callable, in other words, the // object is a primtive function (defined in the host language) // or a UDF (User-Defined-Function) defined in the hosted language open fun isCallable(): Boolean { return false } // Return true if AST node is a runtime error open fun isErr(): Boolean { return false } // Return true is AST node is a symbol open fun isSym(): Boolean { return false } // Return true is AST node is a string literal open fun isStr(): Boolean { return false } // Return true if AST node is a keyword literal open fun isKey(): Boolean { return false } // Return true if AST node is a list open fun isList(): Boolean { return false } // Return true if AST node is a integer or floating point open fun isNum(): Boolean { return false} // Return true if AST node is a nil (null) object open fun isNil(): Boolean { return false } // Return true if AST node is void open fun isVoid(): Boolean { return false } // If true, then the runtime value represents a wrapped Java Object open fun isObj(): Boolean { return false } // Anything that is not false or nil is regarded as true // in if-else special forms (if <cond> <then> <else>) open fun asBool(): Boolean { return true } // Null-element ASt - emtpy AST for initialializing variables of type AST object AstVoid: Ast() { override fun isVoid(): Boolean { return true } override fun isAtom(): Boolean { return true } override fun type(): AstType { return AstType.VOID } } // Lisp integer literal data class AstInt(val value: Int): Ast(){ override fun toInt(): Int { return value } override fun toFlt(): Double { return value.toDouble() } override fun isNum(): Boolean { return true } override fun isAtom(): Boolean { return true } override fun type(): AstType { return AstType.INT } } // Lisp floating point literal data class AstFlt(val value: Double): Ast(){ override fun toFlt(): Double { return value } override fun isNum(): Boolean { return true } override fun isAtom(): Boolean { return true } override fun toStr(): String { return value.toString() } override fun type(): AstType { return AstType.FLT } } // Lisp string literal data class AstStr(val value: String): Ast(){ override fun toStr(): String { return value } override fun isAtom(): Boolean { return true } override fun isStr(): Boolean { return true } override fun type(): AstType { return AstType.STR } } // Lisp symbol literal => 'hello-world, string->num data class AstSym(val value: String): Ast() { override fun toStr(): String { return value } override fun isSym(): Boolean { return true } override fun isAtom(): Boolean { return true } override fun type(): AstType { return AstType.SYM } } // Lisp keyworkd literal, example => :width, :size data class AstKey(val value: String): Ast() { override fun toStr(): String { return value } override fun isAtom(): Boolean { return true } override fun isKey(): Boolean { return true } override fun type(): AstType { return AstType.KEY } } // Lisp boolean value data class AstBool(val value: Boolean): Ast(){ override fun toBool(): Boolean { return value } override fun asBool(): Boolean { return value } override fun isAtom(): Boolean { return true } override fun toStr(): String { return if(value) "#t" else "#f" } override fun type(): AstType { return AstType.BOOL } } // Ast - indicates null or empty list object AstNil: Ast() { override fun asBool(): Boolean { return false } override fun isNil(): Boolean { return true } override fun isAtom(): Boolean { return true } override fun toStr(): String { return "nil"} override fun type(): AstType { return AstType.NIL } } // Lisp list data class AstLst(val value: List<Ast>): Ast() { override fun nth(n: Int): Ast { return value[n] } override fun size(): Int { return value.size } override fun nodes(): List<Ast> { return value } override fun head(): Ast { return value[0] } override fun tail(): List<Ast> { return value.takeLast( value.size - 1 )} override fun isList(): Boolean { return true } override fun type(): AstType { return AstType.LST } } // ------ Intepreter Runtime Values ---------------------------// // They don't represent AST nodes. They are the result of AST evaluation // by the interpreter. // Runtime error data class AstErr(val value: String): Ast() { override fun toStr(): String { return value } override fun isErr(): Boolean { return true } override fun type(): AstType { return AstType.ERR } } // Lisp environment contains bindings between variables and // runtime values class AstEnv(val parent: AstEnv? = null) : Ast() { public val env: MutableMap<String, Ast> = mutableMapOf<String, Ast>() public var name: String = "" override fun type(): AstType { return AstType.ENV } override fun isAtom(): Boolean { return true } fun has(name: String): Boolean { if( env.contains(name) ){ return true} if( parent == null ){ return false } val r = parent!!.has(name) return r } fun set(name: String, value: Ast){ env.put(name, value) } // Set only an existing binding fun setExisting(name: String, value: Ast): Boolean { if( env.contains(name) ){ env.put(name, value) return true } else if( parent != null){ val b = parent!!.setExisting(name, value) return b } return false } // Remove all elements from current environment fun reset(){ env.clear() } fun get(name: String): Ast? { // print(" [TRACE] (109) AstEnv.get() => name = ${name} => ${this}") if(env.contains(name)) { return env!!.get(name) } else if(parent == null) { return AstErr("Error: (line 113) unbound variable ${name}") } val r = parent!!.get(name) // println(" [TRACE] (116) AstEnv.get() => ${name} = ${r}") return r } } data class AstFunc(val name: String, val func: Func): Ast() { override fun isCallable(): Boolean { return true } override fun isAtom(): Boolean { return true } override fun type(): AstType { return AstType.FUNC_NATIVE } } // Represents a user-defined function data class AstFuncUDF( val name: String , val args: List<String> , val body: List<Ast> , var env: AstEnv ): Ast() { override fun isCallable(): Boolean { return true } override fun isAtom(): Boolean { return true } override fun type(): AstType { return AstType.FUNC_UDF } } // This AST node represents a Java object data class AstObj(val value: Object): Ast() { override fun isObj(): Boolean { return true } override fun isAtom(): Boolean { return true } override fun toObj(): Object { return value } override fun type(): AstType { return AstType.OBJ } } } enum class Type { T_Flt // Floating point number literal , T_Int // Integer , T_Str // String literal , T_Sym // Symbol literal ` , T_Key // keyword literal , T_Nil // Nil literal , T_Bool // Boolean literal , T_Quote // (') , T_Lpar // '(' Left parenthesis , T_Rpar // ')' Right parenthesis , T_EOF // Type EOF token , T_ERR // Token error , T_EMPTY } // Strings that matches the terminal element of a grammar data class Token( val type: Type , val text: String , val pos: Int , val lin: Int , val col: Int ) { fun isErr(): Boolean { return type == Type.T_ERR } fun isEof(): Boolean { return type == Type.T_EOF } } class Tokenizer(val source: String) { // Null character constant (sentinel value) val NULL_CHAR = 0.toChar() val QUOTE_CHAR = 34.toChar() // '"' breaks the syntax highlight // Initialize current character with NullChar var ch: Char = NULL_CHAR // Current position var pos: Int = 0 // Current line var lin: Int = 1 // Current column var col: Int = 1 private val TOKEN_EMPTY = Token(Type.T_EMPTY, "", 0, 0, 0) init { advance() } fun peek(): Char { return ch } fun lookahed(n: Int): Char { if (pos + n >= source.length){ return NULL_CHAR } return source[pos + n - 1] } fun isEof(): Boolean { return ch == NULL_CHAR } fun match(c: Char): Boolean { if(c == ch){ advance() return true } return false } fun check(c: Char): Boolean { return c == ch } fun expect(c: Char): Token { if(c == ch){ advance() return TOKEN_EMPTY } // throw Exception("Lexical error. Invalid character. Expected '${c}', but given '${ch}' ") return error("Lexical error. Invalid character. Expected '${c}', but given '${ch}' ") } fun advance() { if (pos >= source.length) { ch = 0.toChar() } else { ch = source[pos] pos = pos + 1 } } fun isSpace(): Boolean { return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n' } fun isDelimiter(): Boolean { return ch == ')' || ch == '(' || ch == '\'' || ch == '`' || ch == ',' || ch == '@' } fun isDigit(): Boolean { return ch.isDigit() } fun token(type: Type, lexeme: String): Token { val tok = Token(type, lexeme, pos, lin, col) return tok } // Returns an error sentinel token. It is an alternative to exceptions. fun error(message: String): Token { val tok = token(Type.T_ERR, message) return tok } fun tokens(): List<Token> { val tokens = mutableListOf<Token>() while( !isEof() ){ val t = next() if(t.type == Type.T_EOF){ break } tokens.add(t) } return tokens } fun next(): Token { // Return sentinel token when at end of input if (isEof()) { return token(Type.T_EOF, "") } while (isSpace() && !isEof()) { advance() } assert( !isSpace() ) // Skip comment if( match(';')){ while( !match('\n') && !this.isEof() ){ advance() } return next() } if ( match('(') ) return token(Type.T_Lpar, "(") if ( match(')') ) return token(Type.T_Rpar, ")") if ( match('\'') ) return token(Type.T_Quote, "'") if ( check('-') && lookahed(1).isDigit() ){ return nextNumber() } if ( isDigit() ) { return nextNumber() } if ( check(QUOTE_CHAR) ) { return nextString() } return nextSymbolOrBool() // throw Exception("edge case") } private fun nextSymbolOrBool(): Token { var lexeme = "" while( !isEof() && !isSpace() && !isDelimiter() ) { lexeme += peek() advance() } if( lexeme == ""){ return token(Type.T_EOF, "") } if( lexeme == "#t" || lexeme == "#f"){ return token(Type.T_Bool, lexeme)} if( lexeme.startsWith(":") ){ return token(Type.T_Key, lexeme.drop(1)) } return token(Type.T_Sym, lexeme) } private fun nextString(): Token { // println(" [TRACE] peek() = " + peek()) var err = expect(QUOTE_CHAR) if(err.isErr()){ return err} var lexeme = "" while( !isEof() && !check(QUOTE_CHAR)) { if( match('\\') ){ if( match('n') ){ lexeme += "\n"; continue } if( match('t') ){ lexeme += "\t"; continue } // if( match('v') ){ lexeme += "\v"; continue } if( match('\\') ){ lexeme += "\\"; continue } if( match(QUOTE_CHAR) ){ lexeme += "\""; continue } //throw Exception("Tokenizer error, invalid escape character.") return error("Tokenizer error, invalid escape character.") } lexeme += peek() advance() } // println(" [TRACE] lexeme = '${lexeme}' ") err = expect(QUOTE_CHAR) if(err.isErr()){ return err} return token(Type.T_Str, lexeme) } private fun nextNumber(): Token { var lexeme = "" if( match('-')){ lexeme += "-" } if(!check('.')){ lexeme += nextInteger() } if(!check('.') && !check('e') && !check('E') ) { if( peek().isLetter() ){ // throw Exception("Invalid number. Numbers cannot terminate with letter.") return error("Invalid number. Numbers cannot terminate with letter.") } return token(Type.T_Int, lexeme) } if ( match('.') ) { lexeme += '.' + nextInteger() } val letter = peek() if( match('e') or match('E')) { lexeme += letter if( !match('-') && !isDigit()){ // throw Exception("Invalid floating point number: " + lexeme) return error("Invalid floating point number: " + lexeme) } if( match('-') ){ lexeme += "-" } if( !isDigit() ){ // throw Exception("Invalid floating point. Expected digit, but got: " + lexeme) return error("Invalid floating point. Expected digit, but got: " + lexeme) } lexeme += nextInteger() } if( peek().isLetter() ){ // throw Exception("Invalid number. It cannot end with letter.") return error("Invalid number. It cannot end with letter.") } return token(Type.T_Flt, lexeme) } private fun nextInteger(): String { var lexeme = "" while( this.peek().isDigit() && !this.isEof() ) { lexeme += peek() advance() } return lexeme } } // ----- End of class Tokenizer() -------------- // class Parser(val source: String) { // List of tokens to be processed val tokens = Tokenizer(source).tokens() // Initialize current token with a null element var current: Token = Token(Type.T_EMPTY, "", 0, 0, 0) // Position of current tokens var pos = 0 // Initialize parser state init { advance() } fun peek(): Token { return current } fun advance() { if( pos >= tokens.size ){ current = Token(Type.T_EOF, "", 0, 0, 0) return } current = tokens[pos] pos = pos + 1 } fun match(type: Type): Boolean { if( current.type == type ){ advance() return true } return false } fun check(type: Type): Boolean { return current.type == type } fun expect(type: Type): Ast { if( current.type == type ) { advance(); return Ast.AstVoid } return Ast.AstErr("Parser Error: expected token " + type + ", but given " + current) } private fun isEof(): Boolean { return current.type == Type.T_EOF } private fun isAtom(): Boolean { return peek().type == Type.T_Bool || peek().type == Type.T_Flt || peek().type == Type.T_Int || peek().type == Type.T_Sym || peek().type == Type.T_Str || peek().type == Type.T_Key } fun parseSexp(): Ast { if ( isAtom() ) { return parseAtom() } else if( check(Type.T_Lpar) ){ return parseList() } else if( match(Type.T_Quote) ){ val s = parseSexp() val lst = listOf<Ast>( Ast.AstSym("quote"), s ) val r = Ast.AstLst(lst) return r } else if( check(Type.T_ERR) ){ return Ast.AstErr(" [LEXICAL ERROR] " + current ) } return Ast.AstErr("Parser Error: unexpected invalid token: " + peek() ) } private fun parseAtom(): Ast { val lexeme = peek().text if ( match(Type.T_Int) ){ return Ast.AstInt( lexeme.toInt() ) } else if( match(Type.T_Flt) ){ return Ast.AstFlt( lexeme.toDouble() ) } else if( match(Type.T_Str) ){ return Ast.AstStr(lexeme) } else if( match(Type.T_Bool) ){ return Ast.AstBool(lexeme == "#t") } else if( match(Type.T_Key) ){ return Ast.AstKey(lexeme)} else if( match(Type.T_Sym) ){ if( lexeme == "nil" ){ return Ast.AstNil } return Ast.AstSym(lexeme) } else { return Ast.AstErr("Parser Errror: Invalid token: ${peek()}") } } private fun parseList(): Ast { var err = expect(Type.T_Lpar) if(err.isErr()){ return err} val nodes = mutableListOf<Ast>() while( !check(Type.T_Rpar) && !isEof() ) { val sexp = parseSexp() nodes.add(sexp) } err = expect(Type.T_Rpar) if(err.isErr()){ return err} val ast = Ast.AstLst(nodes) return ast } } // --- End of class Parser() ---------// object Sexp { // Get S-Expression lisp-like string representation of AST fun toStr(r: Ast): String = when(r) { // Emtpy AST => Null is Ast.AstVoid -> "<void>" // Boolean value is Ast.AstBool -> if(r.value) "#t" else "#f" // Floating point literal is Ast.AstFlt -> r.value.toString() // Integer point literal is Ast.AstInt -> r.value.toString() // Symbol literal is Ast.AstSym -> r.value // Keyword literal is Ast.AstKey -> ":" + r.value // Nul - null value is Ast.AstNil -> "nil" // String is Ast.AstStr -> "\"" + r.value + "\"" // List is Ast.AstLst -> "(" + r.value.map { toStr(it) }.joinToString( separator = " ") + ")" // Runtime error is Ast.AstErr -> "EROR: " + r.value // Environment - contains bindings between variables and values is Ast.AstEnv -> "<Env: ${r.env}>" // Primitive function is Ast.AstFunc -> "<FuncPRM: ${r.name}>" // User-defined function is Ast.AstFuncUDF -> "<FuncUDF: ${r.name}>" // Java Object is Ast.AstObj -> { // val klass = r.value::class.java.getName() // "<Object: ${klass}>" "<Obj: ${r.value}>" } //else -> "Not implemented" } fun parse(source: String): Ast { val p = Parser(source) val ast = p.parseSexp() return ast } // Checks whether two S-Expressions are equal fun equal(lhs: Ast, rhs: Ast): Boolean { if( lhs.type() != rhs.type() ){ return false } else if ( rhs.isAtom() && lhs.isAtom() ) { val t = rhs.type() if ( t == AstType.INT ){ return lhs.toInt() == rhs.toInt() } else if ( t == AstType.FLT ){ return lhs.toFlt() == rhs.toFlt() } else if ( t == AstType.STR ){ return lhs.toStr() == rhs.toStr() } else if ( t == AstType.SYM ){ return lhs.toStr() == rhs.toStr() } else if ( t == AstType.KEY ){ return lhs.toStr() == rhs.toStr() } else if ( t == AstType.ERR ){ return lhs.toStr() == rhs.toStr() } else if ( t == AstType.BOOL ){ return lhs.toBool() == rhs.toBool() } else if ( t == AstType.NIL ){ return true } else if ( t == AstType.VOID ){ return true } else { throw Exception("Edge case reached not implemented for => lhs = ${lhs} and rhs ${rhs}") } } else if( rhs.type() == AstType.LST ) { val a = lhs.nodes() val b = rhs.nodes() if( a.size != b.size){ return false } for(n in 0..(a.size - 1)) { if( !equal(a[n], b[n]) ){ return false } } return true } else { throw Exception("Reached edge case => This exception is never supposed to be thrown.") } } } class Evaluator { var logOnError = false // Default environment val _env = Ast.AstEnv() // Special forms val _forms = mutableMapOf<String, (List<Ast>, Ast.AstEnv) -> Ast>() init { _env.name = "global" reset() } fun reset() { this._env.reset() // ---- Register primitive special forms --------------// // Sepcial form for updating an already defined variable addSpecialForm("set", { args, env -> this.form_set(args, env) } ) // Special form for defining a new variable addSpecialForm("def", { args, env -> this.form_def(args, env )} ) // If-else special form addSpecialForm("if", { args, env -> this.form_if(args, env) } ) // Function defintion or lambda special form addSpecialForm("fn", { args, env -> this.form_fn(args, env) } ) // Quote (quote (args ...)) sepcial form addSpecialForm("quote", { args, env -> this.form_quote(args, env) } ) // Do special-form borrowed from Clojure: (do (println "x") (println "z") (+ x y z)) addSpecialForm("do", { args, env -> this.form_do(args, env)} ) // Call object default constructor (new javax.swing.Swing.JFrame) addSpecialForm("new", { args, env -> this.form_new(args, env) } ) // Clojure-like doto special form addSpecialForm("doto", { args, env -> this.form_doto(args, env)} ) // // ---- Register primitive functions ------------------ // /// addFuncMathMany("+", { a, b -> a + b }) addPrimitive("+", ::primitive_add) addPrimitive("-", ::primitive_sub) addPrimitive("*", ::primitive_mul) addPrimitive("/", ::primitive_div) addPrimitive("eval", ::primitive_eval) addPrimitive("load-script", ::primitive_loadScript) addPrimitive("read", ::primitive_read) addPrimitive("list", { args -> Ast.AstLst(args) }) addPrimitive("car", ::primitive_car ) addPrimitive("cdr", ::primitive_cdr ) addPrimitive("cons", ::primitive_cons ) addPrimitive("first", ::primitive_car ) addPrimitive("rest", ::primitive_cdr ) addPrimitive("nth", ::primitive_nth ) // Apply a function over a list addPrimitive("apply", ::primitive_apply) // Map a function over list addPrimitive("map", ::primitive_map ) // Get element of a plist (property list) that matches a keyword argument // (getf :x (list :z 200 :h 300 :x "hello")) => returns "hello" addPrimitive("getf", ::primitive_getf) // Concatenate strings addPrimitive("concat", ::primitive_concat) // addPrimitive("for-each", ::primitive_foreach ) addPrimitive("println", ::primitive_println ) addMath1Arg("m/sin", { java.lang.Math.sin(it) } ) addMath1Arg("m/cos", { java.lang.Math.cos(it) } ) addMath1Arg("m/tan", { java.lang.Math.tan(it) } ) addMath1Arg("m/exp", { java.lang.Math.exp(it) } ) addMath1Arg("m/sqrt", { java.lang.Math.sqrt(it) } ) addMath1Arg("m/log", { java.lang.Math.log(it) } ) addMath1Arg("m/log10", { java.lang.Math.log10(it) } ) // Show all avaialabe methods of a Java object AstObj addPrimitive("ref/show-methods", ::primitive_showMethods ) // Get a java class meta-object by its name as string addPrimitive("ref/class-by-name", ::primitive_getClassByName ) // Get class meta-object of a java object addPrimitive("ref/get-obj-class", ::primitive_getObjClass ) //Cast a java object to an interface by reflection // (ref/cast java.awt.Container object-here-Ast.AstObj) addSpecialForm("ref/cast", { args, env -> this.form_cast(args, env) } ) addSpecialForm("ref/static-field", { args, env -> this.form_staticField(args, env) } ) addPrimitive("action-listener", ::primitive_toActionListener) addPrimitive("java-null", ::primitive_javaNull) // PI constant set("m/PI", Ast.AstFlt( 3.141592653589793)) // Variable that references this environment object set("env", _env) } // Eval AST (Abstract Syntax Tree) by walking over it // and generate a interpreter runtime value fun eval(s: Ast, env: Ast.AstEnv): Ast = when(s){ is Ast.AstErr -> s is Ast.AstVoid -> s is Ast.AstBool -> s is Ast.AstFlt -> s is Ast.AstInt -> s is Ast.AstKey -> s is Ast.AstNil -> s is Ast.AstStr -> s is Ast.AstSym -> { // println(" [TRACE] envName = ${env.name} ; Evaluator.get =>> ${s} \n") env.get(s.toStr()) ?: Ast.AstErr("Error: (line 549) unbound symbol ${s.toStr()} in ${env.name}") } is Ast.AstEnv -> s is Ast.AstFunc -> s is Ast.AstFuncUDF -> s is Ast.AstObj -> s is Ast.AstLst -> evalApp(env, s) } fun eval(source: String): Ast { val ast = Sexp.parse(source) val res = eval(ast, _env) return res } fun eval(ast: Ast): Ast { val res = eval(ast, _env) return res } fun loadScript(fileName: String): Ast { var source: String? = null try { source = java.io.File(fileName).inputStream().readBytes().toString(Charsets.UTF_8) } catch(ex: java.io.FileNotFoundException) { return error(ex.toString()) } source = "(do\n" + source + "\n)" val ast = Sexp.parse(source) if( ast.isErr() ){ return ast } val res = this.eval(ast, _env) return res } // Get value from environment fun get(name: String): Ast? { return _env.get(name) } // Set value in environment fun set(name: String, value: Ast) { _env.set(name, value) } // Interactive Read-Print-Eval-Loop fun repl() { val sc = java.util.Scanner(System.`in`) var count = 0 while(true){ System.out.print(" In[${count}]> ") System.out.flush() // val line = sc.nextLine() var line = readLine() ?: "" if(line == ""){ continue } if(line == ".h"){ println(""" The interpreter has the following commands: + .h or .help => To display this help menu. + .q or .quit => To quit the interpreter. + .multi => To paste multi-line S-Expression in the repl + .reset or .r => To reset the interpreter state + .vars => To display all variables + .funcs => To show all functions + .verbose-on => Enable verbose mode + .verbose-off => Disable verbose mode """) continue } if(line == ".verbose-on"){ println(" [INFO] Verbose mode enabled.") this.logOnError = true continue } if(line == ".verbose-off"){ println(" [INFO] Verbose mode disabled.") this.logOnError = true continue } if(line == ".quit" || line == ".q"){ System.exit(0); break } if(line == ".reset" || line == ".r"){ this.reset() println(" [INFO] Reset repl. All variables removed. Ok.") continue } if(line == ".multi"){ println(" [INFO] Type a multi-line expression and then type (;;) when you are done.") var entry = "" while(true){ val lin = readLine() ?: "" if(lin == ";;"){ break } entry = entry + lin } println(" [INFO] Exit multi-line mode. ") line = entry } // Display all defined variables in current environment if(line == ".vars") { println(" =>> Variables: ") _env.env.forEach{ (k, v) -> if( !v.isCallable()){ println(" " + k) } } continue } // Display all defined functions in current environment if(line == ".funcs") { println(" =>> Functions: ") _env.env.forEach{ (k, v) -> if( v.isCallable()){ println(" " + k) } } continue } val result = this.eval(line) if( result.isErr() ){ println("Error: " + result.toStr()) continue } // History variables $0, $1, $2, ... // containing the evaluation results of // previous inputs entered by the user val label = "\$${count}" println(" ${label} = " + Sexp.toStr(result)) count += 1 // Variable answer always contains the result // of the last computation this.set("ans", result) this.set(label, result) } } // Evaluate application of function or special form // (function-or-form arg0 arg1 ... argN-1) private fun evalApp(env: Ast.AstEnv, lst: Ast): Ast { if(lst.size() == 0){ return Ast.AstNil } val head = lst.head() val args = lst.tail() if( head is Ast.AstSym ){ val x = head.toStr() if(x == "comment") { return Ast.AstVoid } if( _forms.contains(x) ){ val handler = _forms!!.get(x) // Apply Special form val res = handler!!.invoke(args, env) return res } } // Check whether argument is function val obj = eval(head, env) // Abort computation on error if( obj.isErr() ){ return obj } // Check whether object is a Java object (instance of a Java class) if( obj.isObj() ){ // Perform method call val r = invokeObjectMethod(obj as Ast.AstObj, args, env) return r } // Abort computation if object is not callable if( !obj.isCallable() ){ return error("Object not callable ${head}") } val res = funCall(env, obj, args) return res } // Handle function call and evalute all arguments before function application. // funCall(environment, function, list-of-arguments) private fun funCall(env: Ast.AstEnv, obj: Ast, args: List<Ast>): Ast { val evaluatedArgs = mutableListOf<Ast>() // Evaluate arguments and abort evaluation on error for(arg in args){ val r = this.eval(arg, env) if( r.isVoid() ){ continue } if( r.isErr() ){ return r} evaluatedArgs.add(r) } // ---- Evaluation of primtive function call in host language -------- // // Object is primtive function (defined in host language) if(obj is Ast.AstFunc){ // println(" [TACE] Call primitive function ${obj}") val func = obj as Ast.AstFunc val result = func.func(evaluatedArgs) return result } // ------_ Evaluation of user-defined function call -----------// // Object is UDF - User-Defined Function val func = obj as Ast.AstFuncUDF val nargs = func.args.size if(nargs != args.size ){ return error("Invalid number of arguments passed to function. It expects $nargs arguments. ") } // Create temporary environment for function evaluation val tenv = Ast.AstEnv(func.env) // Set arguments in the temporary environment for(k in (0..(nargs - 1))){ val argName = func.args[k] tenv.set(argName, evaluatedArgs[k]) } var last: Ast = Ast.AstNil // Evaluate function body and return evaluation of last form for(form in func.body) { last = this.eval(form, tenv) // abort evaluation on error if(last.isErr()){ return last } } return last } private fun error(message: String): Ast { if(this.logOnError){ System.err.println(" [ERROR] " + message) } return Ast.AstErr(message) } // Add primitive function fun addPrimitive(name: String, func: (List<Ast>) -> Ast) { _env.set(name, Ast.AstFunc(name, func)) } // Register primitive math function of 1 argument fun addMath1Arg(name: String, func: (Double) -> Double) { val func = { args: List<Ast> -> if(args.size != 1){ error("Function ${name} expects 1 argument. ") } else if( !args[0].isNum() ) { error("Expected numeric argument") } else { val a = args[0].toFlt() val r = func(a) Ast.AstFlt( r ) } } addPrimitive(name, func) } // Add special form (macro or control structure) fun addSpecialForm(name: String, form: (List<Ast>, Ast.AstEnv) -> Ast) { _forms.put(name, form) } // ----- Functions and special forms called by the interpreter --------// // Add math function of many arguments that is evaluated as: // (func a b c d e f g h ...) = // a func b func c func d func e ... // func(a, func(b, func(c, func(e, ...)))) // // For instance: '+' // (+ 10 2 5 6) = 10 + 2 + 5 + 6 private fun addFuncMathMany(name: String, func: (Double, Double) -> Double) { val value = Ast.AstFunc(name, { args -> fold1(func, args) } ) _env.set(name, value) } private fun fold1(func: (Double, Double) -> Double, args: List<Ast>): Ast { if(args.size == 0){ return Ast.AstErr("Error: function expects at leat 1 argument.") } val head = args[0] if( !(head is Ast.AstInt || head is Ast.AstFlt) ){ return error("Error: function requires numeric argument, given ${head}.") } var acc = head.toFlt() for(k in 1..(args.size - 1)){ val a = args[k] if( !(a is Ast.AstInt || a is Ast.AstFlt) ){ return error("Error: function requires numeric argument, given ${a}.") } val x = a.toFlt() acc = func(acc, x) } return Ast.AstFlt(acc) } // Read S-expression from string private fun primitive_read(args: List<Ast>): Ast { if(args.size != 1){ return error("Error: function read() requires 1 (one) agument of type string.") } if( !(args[0] is Ast.AstStr)){ return error("Function read() expects 1 argument of type string.") } val code = args[0].toStr() val ast = Sexp.parse(code) return ast } private fun primitive_eval(args: List<Ast>): Ast { if(args.size != 1){ return error("Error: function eval() requires 1 (one) agument of type string.") } val result = this.eval(args[0], _env) return result } // Get list head private fun primitive_car(args: List<Ast>): Ast { if(args.size != 1){ return error("Error: function 'car' requires at least 1 (one) agument.") } if( !args[0].isList() ){ return error("Function car expects 'list' argument. ") } val lst = args[0] as Ast.AstLst if( lst.isNil() || lst.size() == 0){ return Ast.AstNil } // The rendudant variable is 'r' is useful for debugging // since debuggers work in a line by line manner. val r = lst.head() return r } // Get list tail private fun primitive_cdr(args: List<Ast>): Ast { if(args.size != 1){ return error("Error: function 'car' requires at least 1 (one) agument.") } if( !args[0].isList() ){ return error("Function car expects list argument. ") } val lst = args[0] as Ast.AstLst if( lst.isNil() || lst.size() == 0){ return Ast.AstNil } // The rendudant variable is 'r' is useful for debugging // since debuggers work in a line by line manner. val r = Ast.AstLst(lst.tail()) return r } private fun primitive_cons(args: List<Ast>): Ast { if(args.size != 2) { return error("Error: function 'cons' requires at least 2 agument.") } if( !args[0].isAtom() ) { return error("Function cons() expects first argument be an atom.") } if( !args[1].isList() ) { return error("Function cons() expects second argument be a list or nil.") } val result = listOf(args[0]).plus( args[1].nodes() ) return Ast.AstLst(result) } // Get nth element of a list // (nth 0 lst) => Get element of list 'lst' private fun primitive_nth(args: List<Ast>): Ast { if(args.size != 2){ return error("Error: function 'nth' requires at least 2 agument.") } if( !(args[0] is Ast.AstInt) ){ return error("Function nth expects number as 1st argument. ") } if( !args[1].isList() ){ return error("Function nth expects list as 2nd argument. ") } val n = args[0].toInt() val lst = args[1] as Ast.AstLst if( lst.isNil() || lst.size() == 0){ return Ast.AstNil } if(n >= lst.size()){ return error("Error: out of bounds access.") } val r = lst.nth(n) return r } // Apply => Apply a function to a list of arguments // (apply + (list 1 2 3 5 6)) // ( (fn (x y) (+ x y)) (list 3 5)) => 7 private fun primitive_apply(args: List<Ast>): Ast { if(args.size != 2){ return error("Error: function apply() requires at least 2 agument.") } if( !args[0].isCallable() ){ return error("Function apply() expects a function as 1st argument. ") } if( !(args[1] is Ast.AstLst) ){ return error("Function apply() expects a list as 2nd argument. ") } val res = funCall(_env, args[0], args[1].nodes()) return res } // Example: (+ 1 2 5 100.252) fun primitive_add(args: List<Ast>): Ast { if(args.size == 0){ return error("Function + (add) expects at least 1 argument. No argument was given.")} var result: Ast = Ast.AstInt(0) for(x in args){ if( !x.isNum() ){ return error("Function + (add) expects number argument") } if( x is Ast.AstInt && result is Ast.AstInt){ result = Ast.AstInt( result.toInt() + x.value ) } if( x is Ast.AstInt && result is Ast.AstFlt){ result = Ast.AstFlt( result.toFlt() + x.value ) } if( x is Ast.AstFlt ){ result = Ast.AstFlt( result.toFlt() + x.value ) } } return result } fun primitive_sub(args: List<Ast>): Ast { if(args.size == 0){ return error("Function - (sub) expects at least 1 argument. No argument was given.")} var result: Ast = args[0] if( !result.isNum() ){ return error("Function (sub) arguments must be numbers.") } for(i in 1..(args.size - 1)){ val x = args[i] if( !x.isNum() ){ return error("Function - (sub) expects number argument") } if( x is Ast.AstInt && result is Ast.AstInt){ result = Ast.AstInt( result.toInt() - x.value ) } if( x is Ast.AstInt && result is Ast.AstFlt){ result = Ast.AstFlt( result.toFlt() - x.value ) } if( x is Ast.AstFlt ){ result = Ast.AstFlt( result.toFlt() - x.value ) } } return result } fun primitive_mul(args: List<Ast>): Ast { if(args.size == 0){ return error("Function * (mul) expects at least 1 argument. No argument was given.")} var result: Ast = Ast.AstInt(1) for(x in args){ if( !x.isNum() ){ return error("Function * (mul) expects number argument") } if( x is Ast.AstInt && result is Ast.AstInt){ result = Ast.AstInt( result.toInt() * x.value ) } if( x is Ast.AstInt && result is Ast.AstFlt){ result = Ast.AstFlt( result.toFlt() * x.value ) } if( x is Ast.AstFlt ){ result = Ast.AstFlt( result.toFlt() * x.value ) } } return result } fun primitive_div(args: List<Ast>): Ast { if(args.size == 0){ return error("Function '/' (div) expects at least 1 argument. No argument was given.")} var result: Ast = args[0] if( !result.isNum() ){ return error("Function '/' arguments must be numbers.") } for(i in 1..(args.size - 1)){ val x = args[i] if( !x.isNum() ){ return error("Function - '/' expects number argument") } result = Ast.AstFlt( result.toFlt() / x.toFlt() ) } return result } private fun primitive_map(args: List<Ast>): Ast { if(args.size != 2){ return error("Error: function apply() requires at least 2 agument.") } if( !args[0].isCallable() ){ return error("Function apply() expects a function as 1st argument. ") } if( !(args[1] is Ast.AstLst) ){ return error("Function apply() expects a list as 2nd argument. ") } val func = args[0] val lst = args[1].nodes() val result = mutableListOf<Ast>() for(n in lst){ val r = this.funCall(_env, func, listOf<Ast>(n)) if(r.isErr()){ return r} result.add(r) } return Ast.AstLst(result) } // Concatenate the string representation of arguments private fun primitive_concat(args: List<Ast>): Ast { var s = "" for(a in args){ if(a is Ast.AstStr){ s = s + a.toStr() continue } s = s + Sexp.toStr(a) } val result = Ast.AstStr(s) return result } // Example: (getf :title (:x 100 :y 300 :title "hello")) => result: "hello" private fun primitive_getf(args: List<Ast>): Ast { if(args.size != 2){ return error("Error: function getf() requires at least 2 agument.") } if( !(args[0] is Ast.AstKey) ){ return error("Function getf() expects a function as 1st argument. ") } if( !(args[1] is Ast.AstLst) ){ return error("Function getf() expects a list as 2nd argument. ") } if( args[1].size() == 0 || args[1].isNil() ){ return Ast.AstNil } val key = args[0].toStr() val lst = args[1].nodes() for(i in 0..(lst.size - 1)) { if(i + 1 == lst.size){ return Ast.AstNil} if(lst[i] is Ast.AstKey && lst[i].toStr() == key) { return lst[i + 1] } } return Ast.AstNil } // Show methods of a Java objects => This function is intended for interactive use // Example: // (set obj (new javax.swing.JFrame)) // (reflect/show-methods obj) => Display all object methods // (reflect/show-methods obj "method") => Display all object methods containing the string method private fun primitive_showMethods(args: List<Ast>): Ast { if(args.size < 1 || args.size > 2) { return error("Error: function requires 1 agument of AstObj.") } if( !args[0].isObj()){ return error("Function requires 1 argument of type AstObj.") } val obj = (args[0] as Ast.AstObj).value::class.java if( args.size == 1) { for(m in obj.getMethods()){ println(m) } return Ast.AstNil } if( !args[1].isStr() ){ return error("Expected string as second argument") } val search = args[1].toStr() for(m in obj.getMethods()){ val s = m.toString() if( s.contains(search, ignoreCase = true) ) { println(s) } } return Ast.AstNil } private fun primitive_getClassByName(args: List<Ast>): Ast { if(args.size != 1){ return error("Error: function requires 1 agument of type string.") } if( !(args[0] is Ast.AstStr) ){ return error("Function requires 1 argument of type string.") } val className = args[0].toStr() try{ val cls = Class.forName(className) return Ast.AstObj(cls as Object) } catch(ex: java.lang.ClassNotFoundException) { return error(ex.toString()) } } private fun primitive_getObjClass (args: List<Ast>): Ast { if(args.size != 1){ return error("Error: function reflect/get-obj-class() requires 1 agument of type string.") } if( !args[0].isObj()){ return error("Function requires 1 argument of type AstObj.") } val cls = (args[0] as Ast.AstObj).value::class.java return Ast.AstObj(cls as Object) } // // For-each => Apply a function with side-effect to each element of a list // // (for-each print (list 1 2 5 10)) // private fun primitive_foreach(args: List<Ast>): Ast // { // if(args.size != 2){ return error("Error: function 'for-each' requires at least 2 agument.") } // if( !(args[0].isCallable() ){ return error("Function for-each expects a function as 1st argument. ") } // return r // } // (print "string ..." arg1 arg2 ... argN-1) private fun primitive_println(args: List<Ast>): Ast { for(a in args){ if ( a is Ast.AstStr ){ System.out.print(a.value) } else if( a is Ast.AstInt ){ System.out.print(a.value) } else if( a is Ast.AstFlt ){ System.out.print(a.value) } else if( a is Ast.AstBool){ Sexp.toStr(a) } else if( a is Ast.AstNil){ System.out.print("nil") } else { System.out.print(Sexp.toStr(a)) } System.out.print(" ") } System.out.print("\n") return Ast.AstNil } // Function used to pass lambda to arguments of type java.awt.event.ActionListener // Example: // (set btn (doto (new javax.swing.JButton) (setLabel "hello")) // (btn addActionListener // (action-listener (fn () (println "clicked") ))) private fun primitive_toActionListener(args: List<Ast>): Ast { if( args.size != 1 ){ return error("Function expects 1 argument.") } if( !args[0].isCallable() ){ return error("Function expectes 1 argument of type function.") } // import java.awt.event.ActionEvent; // import java.awt.event.ActionListener; val self = this val listener = object: java.awt.event.ActionListener { override fun actionPerformed(evt: java.awt.event.ActionEvent) { self.funCall(self._env, args[0], listOf()) } } val ret = Ast.AstObj(listener as Object) return ret } // Return a Java Null object private fun primitive_javaNull(args: List<Ast>): Ast { if( args.size !=0 ){ return error("Function expects 0 argument.") } //val ret = Ast.AstObj(null) val ret = Ast.AstNil return ret } // Load file (load-script "script.jlsp") private fun primitive_loadScript(args: List<Ast>): Ast { if( args.size != 1 && !args[0].isStr() ){ return error("Function expects 1 argument of type string.") } val fileName = args[0].toStr() var source: String? = null try { source = java.io.File(fileName).inputStream().readBytes().toString(Charsets.UTF_8) } catch(ex: java.io.FileNotFoundException) { return error(ex.toString()) } println(" [========>>> Loaded Code <<<============]") println(source) source = "(do\n" + source + "\n)" val ast = Sexp.parse(source) if( ast.isErr() ){ return ast } val res = this.eval(ast, _env) return res } // Update an aready existing variablr // Special form set => (set x (+ 1 2 4 6)) // Example: (def x 100) (set x (+ x 1)) ... private fun form_set(args: List<Ast>, env: Ast.AstEnv): Ast { if( args.size != 2 ){ return error("Ill-formed special form set. Expected 2 arguments.") } if( !args[0].isSym() ){ return error("Ill-formed special form 'set'. Expected symbol as first argument.") } val name = args[0].toStr() val expr = args[1] if( !env.has(name) ){ return error("Unbound variable ${name}") } val result = this.eval(expr, env) if( result.isErr() ){ return result } env.setExisting(name, result) return result } // Define a new variable // (def x 2000) private fun form_def(args: List<Ast>, env: Ast.AstEnv): Ast { if( args.size != 2 ){ return error("Ill-formed special form def. Expected 2 arguments.") } val name = args[0] if( !name.isSym() ){ return error("Ill-formed special form 'def'. Expected symbol as first argument.") } val expr = args[1] val result = this.eval(expr, env) if( result.isErr() ){ return result } env.set(name.toStr(), result) return result } // Quote special form '(+ 1 2 3) or (quote (+ 1 2 3)) private fun form_quote(args: List<Ast>, env: Ast.AstEnv): Ast { if( args.size != 1 ){ return error("Ill-formed special form 'quote'. Expected 1 argument.") } val expr = args[0] return expr } // Special form 'do' - equivalent to 'begin' in Scheme. // (do (println x) (println y) (+ x y z)) private fun form_do(args: List<Ast>, env: Ast.AstEnv): Ast { var result: Ast = Ast.AstNil for(form in args){ result = this.eval(form, env) } return result } // IF-else special form: // =>> (if <cond> <then>) or // =>> (if <cond> <then> <else>) // (if (> x 10) (print "greater than 10") (print "less than 10")) // (+ z (if (> x 10) (* 3 x) (+ 10 x ) )) private fun form_if(args: List<Ast>, env: Ast.AstEnv): Ast { if( args.size < 2 || args.size > 3) { return error("Ill-formed special form if-else. Expected at least 2 arguments.") } val cond = args[0] // Evaluate condition/predicate val cond_result = this.eval(cond, env) // Short-circuit - abort computation on error if(cond_result.isErr()){ return cond_result} if(cond_result.asBool()){ val then_ast = args[1] val then_result = this.eval(then_ast, env) return then_result } if( args.size == 2 ) { return Ast.AstNil } val else_ast = args[2] val else_result = this.eval(else_ast, env) return else_result } // Lambda special form FN => function definition or lambda // Lambda function =>> (fn (x) (+ x 1)) // Function defintion =>> (fn increment (x) (+ x 1)) // // private fun form_fn(args: List<Ast>, env: Ast.AstEnv): Ast { if(args.size < 2){ return error("Ill-formed syntax for lambda. The function requires at least 2 arguments.") } val n = args[0] // Function name val name = if(n.isSym()) n.toStr() else "" // List of arguments val pargs = if(n.isSym()) args[1] else n if( !(pargs is Ast.AstLst) ){ return error("Special form FN expected list of arguments.") } val _args = pargs.nodes().map { it.toStr() } val k = if (n.isSym()) 2 else 1 // println(" [TRACE] name = ${name} ; args = ${_args}") val body = mutableListOf<Ast>() for (i in k..(args.size - 1)) { body.add( args[i] ) } // User-defined function val p = Ast.AstEnv(env) // p.name = "Function: " + name // println(" [TRACE] (890) Function def =>> env.name = ${p.name} p = ${p}") val udf = Ast.AstFuncUDF(name, _args, body, p ) // Register function in evironment if it is a non-lambda function if(name != ""){ env.set(name, udf) } return udf } // Special form borrowed from Clojure that allows calling // multiple methods of a java object with less repetition. // Example: // (set frame (new javax.swing.JFrame)) // (doto frame // (setSize 400 500) ;; size // (setTitle "My frame") ;; set title // (show)) ;; Show // // --------- It is simila rto // frame.setSize(400 500) // frame.setTitle("Mytitle") private fun form_doto(args: List<Ast>, env: Ast.AstEnv): Ast { if( args.size < 2 ){ return error("Ill-formed special form 'quote'. Expected 1 argument.") } val ref = args[0] val obj = eval(ref, env) if( obj.isErr() ){ return obj } if( !obj.isObj() ){ return error("Expected an instance of Java object, but got ${ Sexp.toStr(obj) }") } val _args = args.takeLast( args.size - 1) // println(" [TRACE] args = " + _args) for(a in _args){ if( !(a is Ast.AstLst) ){ return error("Expected list argument, but got ${ Sexp.toStr(a) }") } // println(" [TRACE] =>> a = ${ Sexp.toStr(a) }") var argList = a as Ast.AstLst val res = this.invokeObjectMethod(obj as Ast.AstObj, argList.value, env) if( res.isErr() ){ return res} } return obj } private fun objToASt(arg: Any): Ast { if( arg == null){ return Ast.AstNil } else if( arg is Int) { return Ast.AstInt(arg as Int) } else if( arg is Double ){ return Ast.AstFlt(arg as Double) } else if( arg is Boolean ){ return Ast.AstBool(arg as Boolean) } else if( arg is String ) { return Ast.AstStr(arg as String) } else { return Ast.AstObj(arg as Object) } } // Special form for invoking consturctor of Java objects, for instance // for building a Java Swing // (set frame (new javax.swing.JFrame arg0 arg1 ... argN-1)) private fun form_new(args: List<Ast>, env: Ast.AstEnv): Ast { if( args.size < 1 ){ return error("Ill-formed special form 'new'. Expected at least argument.") } if( !args[0].isSym() ) { return error("Ill-formed special form 'new'. Expected firstargument to be symbol.") } val klassName: String = args[0].toStr() var klass: java.lang.Class<*>? = null try { // Attempt to retrieve class metaobject klass = Class.forName(klassName) } catch(ex: java.lang.ClassNotFoundException) { return error(ex.toString()) } if( args.size == 1) { // Create object using reflection and default costructor val obj = klass.newInstance() return Ast.AstObj(obj as Object) } val types = ArrayList<Class<*>>() val invokedArgs = ArrayList<Any>() val _args = args.takeLast( args.size - 1) // Attempt to evalute arguments before method application // and abort if an error happens. for(arg in _args){ val a = this.eval(arg, env) if( a.isErr() ){ return a} if( a.isVoid() ){ continue } if (a is Ast.AstBool){ types.add(Boolean::class.java) invokedArgs.add(a.value) } else if (a is Ast.AstFlt){ types.add(Double::class.java) invokedArgs.add(a.value) } else if (a is Ast.AstInt){ types.add(Int::class.java) invokedArgs.add(a.value) } else if (a is Ast.AstStr){ types.add(String::class.java) invokedArgs.add(a.value) } else if (a is Ast.AstObj){ types.add(a.value::class.java) invokedArgs.add(a.value) } else return error("Special form new() not implemented for argument of type ${a} ") } try { val ctor = klass.getConstructor(*types.toTypedArray()) val obj = ctor.newInstance(*invokedArgs.toTypedArray()) if( obj == null){ return Ast.AstNil } val r = objToASt(obj) return r } catch(ex: java.lang.NoSuchMethodException) { return error(ex.toString()) } catch(ex: Exception) { return error(ex.toString()) } } // Special form for getting static field of a class at runtime through reflection // Example: (static-field java.awt.Color Red) private fun form_staticField(args: List<Ast>, env: Ast.AstEnv): Ast { if( args.size < 1 ){ return error("Ill-formed special form 'new'. Expected at least argument.") } if( !args[0].isSym() && !args[1].isSym() ) { return error("Ill-formed special form 'new'. Expected firstargument to be symbol.") } val klassName: String = args[0].toStr() val field = args[1].toStr() var klass: java.lang.Class<*>? = null try { // Attempt to retrieve class metaobject klass = Class.forName(klassName) } catch(ex: java.lang.ClassNotFoundException) { return error(ex.toString()) } try { val field = klass!!.getField(field).get(null) val result = Ast.AstObj(field as Object) return result } catch(ex: java.lang.NoSuchFieldException) { return error(ex.toString()) } } // Attempt to cast object to a class or interface // this method is useful for calling overloding methods // Example: // (set fr (doto (new javax.swing.JButton) setLabel "Click me")) // (cast java.awt.Container fr) private fun form_cast(args: List<Ast>, env: Ast.AstEnv): Ast { if( args.size < 2 ){ return error("Ill-formed special form 'cast'. Expected 1 argument.") } if( !args[0].isSym() ) { return error("Ill-formed special form 'cast'. Expected first argument to be symbol.") } // Evaluate first argument val arg1 = this.eval(args[1], env) if( !arg1.isObj() && !arg1.isNil() ){ return error("Ill-formed special form 'cast'. Expected java object type as 2nd argument.") } val obj: Object? = if( arg1.isNil() ) { null } else { (arg1 as Ast.AstObj).value } try { // Attempt to retrieve class metaobject val klass = Class.forName(args[0].toStr()) val ret = klass.cast(obj) return Ast.AstObj(ret as Object) } catch(ex: java.lang.ClassNotFoundException) { return error(ex.toString()) } catch(ex: java.lang.ClassCastException ) { return error(ex.toString()) } } // Invoke/call method of java method using reflection // Example: // (set frame (new javax.swing.JFrame)) // (frame setSize 400 500) ; invoke object method => frame.setSize(400, 500) // (frame setVisible #t ) ; set object visible => frame.setVisible(true) private fun invokeObjectMethod(obj: Ast.AstObj, args: List<Ast>, env: Ast.AstEnv): Ast { val x: Object = obj.value if( args.size < 1){ return error("Method requires a method name as symbol.") } // Object method val m = args[0] // Get a property or set a property // (object :property) => Get a property value => object.property // (object :property value) => Set a property value => object.property = value if( m.isKey() ) { val property = args[0].toStr() // Getter (Object :property) => Similar to object.property in infix notation if( args.size == 1) { val method = "get" + property.capitalize() val argList = listOf( Ast.AstSym(method) ) val res = invokeObjectMethod(obj, argList, env) return res // Setter (Object :property value) => Similar to object.property = value in infix notation } else if( args.size == 2) { val method = "set" + property.capitalize() val values = this.eval(args[1], env) if(values.isErr()){ return values } var argList: List<Ast> = if( values.isList() ) { listOf( Ast.AstSym(method) ).plus( values.nodes() ) } else { listOf( Ast.AstSym(method), values ) } val res = invokeObjectMethod(obj, argList, env) return res } return error("""Invalid property call. Example of valid call: (object :property) or (object :property value) """) } if( !m.isSym() ) { return error("Invalid method call, expected symbol") } val methodName = m.toStr() // Remove first element of args list val _args = args.takeLast( args.size - 1) // Java metaclass, meta object => an object that describes a class val klass = x::class.java val types = ArrayList<Class<*>?>() val invokedArgs = ArrayList<Any?>() // Attempt to evalute arguments before method application // and abort if an error happens. for(arg in _args){ val a = this.eval(arg, env) if( a.isErr() ){ return a} if (a is Ast.AstBool){ // invokedArgs.add(java.lang.Boolean(a.value) as Object) types.add(Boolean::class.java) invokedArgs.add(a.value) } else if (a is Ast.AstFlt){ types.add(Double::class.java) invokedArgs.add(a.value) } else if (a is Ast.AstInt){ types.add(Int::class.java) invokedArgs.add(a.value) } else if (a is Ast.AstStr){ types.add(String::class.java) //invokedArgs.add(a.value as Object) invokedArgs.add(a.value) } else if (a is Ast.AstObj){ types.add(a.value::class.java) invokedArgs.add(a.value) } else if (a is Ast.AstNil){ invokedArgs.add(null) } else return error("Not implemented invokeObjectMethod() for argument of type {a} ") } try { // Attempt to get method metaObject val mocMethod = klass.getMethod(methodName, *types.toTypedArray()) val result = mocMethod.invoke(x, *invokedArgs.toTypedArray()) if( result == null){ return Ast.AstNil} val r = objToASt(result) return r } catch(ex: java.lang.NoSuchMethodException) { // println(" [TRACE] Second attempt to find method.") val result = this.invoke(x, methodName, invokedArgs) return result } catch(ex: Exception) { return error(ex.toString()) } } // Invoke object method using reflection //private fun invoke(targetObject: Object, methodName: String, parameters: List<Object> ): Ast private fun invoke(targetObject: Object, methodName: String, parameters: List<Any?> ): Ast { // println(" [TRACE] parameters.size = ${parameters.size}") for (method in targetObject::class.java.getMethods()) { if ( method.getName() != methodName) { continue } val parameterTypes = method.getParameterTypes() if( parameterTypes.size != parameters.size ){ continue } // System.out.println("\n ----------------------------"); // System.out.println(" [TRACE] Found method " + method); var matches = true; for (i in 0..(parameterTypes.size - 1)) { val p = parameterTypes[i]; // System.out.println(" [TRACE] p[" + i + "] = " + p) val param = parameters[i] val klass = if(param == null) null else param!!::class.java // println(" [TRACE] klass = ${klass} ") if( klass == null){ continue } //if (!p.isAssignableFrom( param.getClass())) if (!p.isAssignableFrom(klass)) { matches = false; break; } } if (matches) { // System.out.println(" [TRACE] Method invoked "); // obtain a Class[] based on the passed arguments as Object[] try { // println(" [TRACE] parameters = ${parameters}") val result = method.invoke(targetObject, *parameters.toTypedArray()); if( result == null){ return Ast.AstNil} val r = objToASt(result) return r } catch(ex: Exception) { return error( ex.toString() ) } } } // --- End for ---------- // return error("Method ${methodName} not found.") } } // ---- End of class Eval --------- // // ----- Primitive unit test framework --------// object Assert { fun assert(predicate: Boolean, message: String = "Assertion failure") { if(!predicate){ throw AssertionError(message) } } fun<T> equal(expected: T, output: T) { if(expected != output){ throw AssertionError("Equality test failure. Expected ${expected}, but got ${output}.") } } fun astEqual(expected: Ast, output: Ast) { val expectedStr = Sexp.toStr(expected) val outputStr = Sexp.toStr(output) if( !Sexp.equal(expected, output) ){ throw AssertionError("AST equality test failure. \n => Expected ${expectedStr}, \n => but got ${outputStr}.") } } fun<T> listEqual(expected: List<T>, output: List<T>) { val size = expected.size if( expected.size != output.size ){ var msg = "List equality test failure. Expected ${expected} (${expected.size} elements)," msg += " but got ${output} (${output.size} elements). " throw AssertionError(msg) } for(n in 0..(size -1) ) { val a = expected[n] val b = output[n] if(a == b){ continue } var msg = """ List equality test failure. Expected ${expected} (${expected.size} elements) Element ${n} expected to be ${a}, but got ${b}. """.trimIndent() throw AssertionError("Equality test failure. Expected ${expected}, but got ${output}.") } } fun<T> notEqual(expected: T, output: T) { if(expected == output){ throw AssertionError("Inequality test failure. Not expected ${expected}, but got ${output}.") } } } enum class TestStatus { // Indicates successful execution of a test // (assertion does not thrown an AssertionError) PASSED // Indicates that the assertion predicate was evaluated to false // and the assumption made by the assertion failed. , FAILURE // Indicates an unexpected error - not an AssertionError , ERROR } data class TestCase(val name: String, val func: () -> Unit) data class TestResult(val name: String, val status: TestStatus, val exception: Throwable?) { fun isPassed(): Boolean { return status == TestStatus.PASSED } fun isFailure(): Boolean { return status == TestStatus.FAILURE } fun isError(): Boolean { return status == TestStatus.ERROR } } class TestSet { private var name: String = "" private val testcases = mutableListOf<TestCase>() private var results = mutableListOf<TestResult>() fun addTest(name: String, testFunc: () -> Unit) { val test = TestCase(name, testFunc) testcases.add(test) } // Print all test cases and their identifiers. fun showTests() { var count = 0 for(t in testcases){ println(" [${count}] => ${t.name} ") count = count + 1 } } fun runTest(test: TestCase): TestResult { try { test.func() return TestResult(test.name, TestStatus.PASSED, null) } catch(ex: AssertionError){ return TestResult(test.name, TestStatus.FAILURE, ex) } catch(ex: Exception){ return TestResult(test.name, TestStatus.ERROR, ex) } } // Run a single test case given its number (unique identifier) fun runTestByNumber(n: Int) { if(n >= testcases.size){ System.err.println(" [ERROR] Test case with this number not found ${n}") return } val test = testcases[n] val r = runTest(test) if ( r.status == TestStatus.PASSED ) { println("[PASSED] Test '${test.name}' (id = ${n}) passed. Ok.") } else if( r.status == TestStatus.FAILURE ) { println(" [FAILED] Test '${test.name}' (id = ${n}) failed. Assertion error. ") println() // println(r.exception) r.exception!!.printStackTrace() } else if( r.status == TestStatus.ERROR ) { println(" [ERROR ] Test '${test.name} (id = ${n}) error. Unexpected exception.") println() // println(r.exception) r.exception!!.printStackTrace() } } fun runAll() { var count = 0 var passed = 0 var failed = 0 var error = 0 for(t in this.testcases) { val r = runTest(t) results.add(r) if (r.status == TestStatus.PASSED ){ passed += 1 } else if(r.status == TestStatus.FAILURE ){ failed += 1 } else if(r.status == TestStatus.ERROR ){ error += 1 } count = count + 1 } val passedPercent = 100.0 * passed / count val failedPercent = 100.0 * failed / count val errorPercent = 100.0 * error / count println("------------------------ TEST REPORT ----------------------------------\n") if( failed == 0 || error == 0 ){ println(" [*] SUCCESS - ALL TESTS PASSED OK. ") } println(" [*] Passed = ${passed} (${passedPercent}%) / Failures = ${failed} (${failedPercent}%) / Error = ${error} (${errorPercent}%) / Total = ${count}") var id = -1 for(r in results){ id = id + 1 if(r.isFailure()){ println("\n [+] Test Failure => name: '${r.name}' (id = ${id}) / Assertion Error ") println("........................................................................") println() // println(r.exception) r.exception!!.printStackTrace() continue } if(r.isError()){ println("\n [+] Test Error => name: '${r.name}' (id = ${id}) / Unexpected Exception") println("........................................................................") println() // println(r.exception) r.exception!!.printStackTrace() continue } } } } // ---------- Program Entry Point --------------------// fun main(args: Array<String>) { val interp = Evaluator() if(args.size == 0){ return } if( args[0] == "repl"){ interp.eval(startCode) println(" =====> Interactive S-Expression interpreter started. Ok. ") println(" [*] Type .help or .h for showing the help menu.") println() interp.repl() return } if( args[0] == "script") { if( args.size < 2){ System.err.println("ERROR - exepcted script file name") System.err.println("Usage: java -jar sexpr.jar script <FILE-NAME>") return } val fileName = args[1] interp.loadScript(fileName) return } if( args[0] == "tests") { val testSet = TestSet() // Create a test case testSet.addTest("test parser - case parse a simple list", { val src = "(list \"some string\" 100.60 symbol :akeyword :x )" val expected = Ast.AstLst( listOf( Ast.AstSym("list"), Ast.AstStr( "some string" ), Ast.AstFlt(100.60) , Ast.AstSym("symbol"), Ast.AstKey("akeword") , Ast.AstKey("x") //, Ast.AstLst( listOf( Ast.AstSym("+"), Ast.AstInt(1), Ast.AstInt(2), Ast.AstInt(3) ) ) )) val ast: Ast = Sexp.parse(src) // val nodes = ast.nodes() Assert.assert( Sexp.equal(expected, ast) ) }) testSet.addTest("test parser -parse a nested list", { val src = "(list \"some string\" 100.60 symbol :akeyword :x (+ 1 2 sym) )" val expected = Ast.AstLst( listOf( Ast.AstSym("list"), Ast.AstStr( "some string" ), Ast.AstFlt(100.60) , Ast.AstSym("symbol"), Ast.AstKey("akeyword") , Ast.AstKey("x") , Ast.AstLst( listOf( Ast.AstSym("+"), Ast.AstInt(1), Ast.AstInt(2), Ast.AstSym("sym") ) ) )) val ast: Ast = Sexp.parse(src) Assert.astEqual(expected, ast) }) testSet.addTest("test intepreter - eval () empty list evaluation", { val interpreter = Evaluator() val src = " () ; empty list " val expected = Ast.AstNil val ast: Ast = Sexp.parse(src) val result = interpreter.eval(ast) Assert.astEqual(expected, result) }) testSet.addTest("test intepreter - expression evaluation", { val interpreter = Evaluator() val src = " (list :x (* 1 2 3 4 5) '(sqrt 100) :y (/ (+ (* 3 5 1) 20) 100) :z \"hello world\") " val astExpected = Sexp.parse(" (:x 120 (sqrt 100) :y 0.35 :z \"hello world\" ) ") val result = interpreter.eval(src) Assert.astEqual( astExpected, result ) }) testSet.addTest("test intepreter - quoted expression evaluation", { val interpreter = Evaluator() val src = " '(list (* 3 5) x 'z 'w :keyword \"some string\" ) " val astExpected = Sexp.parse(" (list (* 3 5) \n\t x 'z \n'w :keyword \"some string\" ) ") val result = interpreter.eval(src) Assert.astEqual( astExpected, result ) }) testSet.addTest("test intepreter - lambda function application ", { val interpreter = Evaluator() val src = "(map (fn (x) (+ 5 (* 10 x))) (list 1 2 5 -6 3.5 9.0 (+ 1 2 3 4)))) " val astExpected = Sexp.parse(" (15 25 55 \t\n-55 40.0\n95.0 105 ) ; comment expected result ") val result = interpreter.eval(src) Assert.astEqual( astExpected, result ) }) if( args.size == 1){ testSet.runAll() } else if (args.size == 2 && args[1] == "--all") { testSet.runAll() } else if(args.size == 2 && args[1] == "--list") { testSet.showTests() } else if(args.size == 3 && args[1] == "--single") { val n = args[2].toInt() testSet.runTestByNumber(n) } else { System.err.println(" Error: invalid option.") } return } }
File: swing.jlisp => Sample script that creates a Java Swing GUI at runtime.
;;----------------------------------------------;; ;; Lisp Script ;; ;;----------------------------------------------;; (comment Anything in this comment special form is not evaluted.) (fn make-color (r g b) (new java.awt.Color r g b)) ;; ------- Color contants ------------------------;; (def color-light-blue (make-color 51 204 255)) (def color-red (make-color 255 0 0)) (def color-white (make-color 255 255 255)) (def color-black (make-color 0 0 0 )) (def label (new javax.swing.JLabel "my label")) (label setBounds 200 10 200 30 ) (def output (new javax.swing.JTextArea)) (def entry (new javax.swing.JTextArea)) (def btn-run (new javax.swing.JButton "Run")) (btn-run setBounds 10 60 100 30) (def btn-demo (doto (new javax.swing.JButton) (setName "btn-demo") (setLabel "Clik me") (setBounds 10 10 100 30) )) (btn setBackground color-light-blue) (def scroll-entry (doto (new javax.swing.JScrollPane) (setViewportView entry) (setBounds 30 120 500 100) )) (def scroll-output (new javax.swing.JScrollPane)) (scroll-output setViewportView output) ;; (setBounds x y width height) => (x, y) origin at top-left corner (scroll-output setBounds 30 250 500 250) (def frame (doto (new javax.swing.JFrame) (:name "frame1") (:size '(632 568)) (:title "My Java Swing App") (setLayout nil) ;; (setContentPane pane) ;; (:visible #t) )) (frame add btn-demo) (frame add btn-run) (frame add label) (frame add scroll-entry) (frame add scroll-output) (def count 0) (btn-demo addActionListener (action-listener (fn () (def text (concat "Button clicked " count)) (set count (+ 1 count)) ;; (frame :title text) (label :text text) (frame setTitle text) ))) (btn-run addActionListener (action-listener (fn () (println "Btn-run clicked OK") (def result (eval (read (concat "(do\n" (entry getText) "\n)")))) (println result) (output setText (concat result)) (println "Code evaluated. OK.") ))) ((frame getContentPane) :background color-light-blue) (frame setVisible #t) (println " [TRACE] Script started Ok. ") ;;; ------------------------------------------ ;;; (fn jframe-setSize (fr width height) (fr setSize width height)) (fn jframe-setVisible (fr flag) (fr setVisible flag)) (fn build-frame (title width height) (doto (new javax.swing.JFrame) (setSize height height) (setTitle title) (setVisible #t) ) ) (println " [TRACE ] Code evaluated OK. ")
Check compiler version:
$ kotlin -version Kotlin version 1.7.10-release-333 (JRE 16.0.1+9-Ubuntu-120.04)
Building:
$ kotlinc sexpr.kt -include-runtime -d sexpr.jar
Inspect the generated file:
$ file sexpr.jar
sexpr.jar: Java archive data (JAR)
Run unit tests:
# ------ List all tests -----------------------# # $ >> java -jar sexpr.jar tests --list [0] => test parser - case parse a simple list [1] => test parser -parse a nested list [2] => test intepreter - eval () empty list evaluation [3] => test intepreter - expression evaluation [4] => test intepreter - quoted expression evaluation [5] => test intepreter - lambda function application # --------- Run a single unit test -------------# # $ java -jar sexpr.jar tests --single 2 [PASSED] Test 'test intepreter - eval () empty list evaluation' (id = 2) passed. Ok. $ java -jar sexpr.jar tests --single 4 [PASSED] Test 'test intepreter - quoted expression evaluation' (id = 4) passed. Ok. #-------- Run all unit tests -------------------# # $ >> java -jar sexpr.jar tests ------------------------ TEST REPORT ---------------------------------- [*] SUCCESS - ALL TESTS PASSED OK. [*] Passed = 6 (100.0%) / Failures = 0 (0.0%) / Error = 0 (0.0%) / Total = 6
Run the REPL:
$ >> rlwrap java -jar sexpr.jar repl [TRACE ] Code evaluated OK. =====> Interactive S-Expression interpreter started. Ok. [*] Type .help or .h for showing the help menu. ; ------------ Show the REPL help menu -----------; In[23]> .h The interpreter has the following commands: + .h or .help => To display this help menu. + .q or .quit => To quit the interpreter. + .multi => To paste multi-line S-Expression in the repl + .reset or .r => To reset the interpreter state + .vars => To display all variables + .funcs => To show all functions + .verbose-on => Enable verbose mode + .verbose-off => Disable verbose mode ; ------------ Playing with lits --------------------; ; In[0]> In[0]> (list '+ (* 3 5 1) :y (m/sqrt (+ (* 3.0 3.0) (* 4.0 4.0))) 'symbol 'x 'z '(+ 1 2 3 5))) $0 = (+ 15 :y 5.0 symbol x z (+ 1 2 3 5)) In[1]> In[1]> $0 $1 = (+ 15 :y 5.0 symbol x z (+ 1 2 3 5)) In[2]> (car $0) $2 = + In[3]> (cdr $0) $3 = (15 :y 5.0 symbol x z (+ 1 2 3 5)) In[4]> (car (cdr $0)) $4 = 15 In[5]> (cdr (cdr $0)) $5 = (:y 5.0 symbol x z (+ 1 2 3 5)) In[6]> (car (cdr (cdr $0))) $6 = :y ; --------- Ge the nth element of a list ------------ In[10]> (nth 1 $0) $10 = 15 In[11]> (nth 2 $0) $11 = :y In[12]> (nth 4 $0) $12 = symbol In[13]> (nth 5 $0) $13 = x In[14]> (nth 6 $0) $14 = z ; ------------ Define variables ------------------; In[15]> (def r '(+ (* 3 5 10) (/ 100 20 5) (m/exp 1))) $15 = (+ (* 3 5 10) (/ 100 20 5) (m/exp 1)) In[16]> r $16 = (+ (* 3 5 10) (/ 100 20 5) (m/exp 1)) In[17]> (eval r) $17 = 153.71828182845906 In[18]> (nth 1 r) $18 = (* 3 5 10) In[19]> (eval (nth 1 r)) $19 = 150 ;----------- Functions ---------------------------------; (def cnt 0) (fn increment() (set cnt (+ cnt 1)) (println "cnt = " cnt)) In[25]> (fn increment() (set cnt (+ cnt 1)) (println "cnt = " cnt)) $25 = <FuncUDF: increment> In[26]> In[26]> (increment) cnt = 1 $26 = nil In[27]> (increment) cnt = 2 $27 = nil In[28]> (increment) cnt = 3 $28 = nil In[29]> (increment) cnt = 4 $29 = nil In[30]> (increment) cnt = 5 $30 = nil In[31]> (increment) cnt = 6 $31 = nil In[32]> (increment) cnt = 7 $32 = nil In[33]> cnt $33 = 7 In[34]> (list (+ cnt 10) (* 3 cnt) (* 5 (+ 10 cnt))) $34 = (17 21 85) ;----- Functions that return functions ------------------; In[35]> (fn make-add (k) (fn (x) (+ k x))) $35 = <FuncUDF: make-add> In[36]> ((make-add 10) 5) $36 = 15 In[37]> ((make-add 10) 80) $37 = 90 In[38]> ((make-add 20) 5) $38 = 25 In[39]> ((make-add 20) 80) $39 = 100 In[40]> (def add10 (make-add 10)) $40 = <FuncUDF: > In[41]> (def add20 (make-add 20)) $41 = <FuncUDF: > In[42]> (add20 50) $42 = 70 In[43]> (add10 50) $43 = 60 In[44]> (map (make-add 10) (list 1 2 3 4 5 6)) $44 = (11 12 13 14 15 16) In[45]> (map (make-add 20) (list 1 2 3 4 5 6)) $45 = (21 22 23 24 25 26) In[46]> ;------- Interact with Java Objects --------------------; In[46]> (def frame (new javax.swing.JFrame "My Window")) $46 = <Obj: javax.swing.JFrame[frame0,0,0,0x0,invalid,hidden,layout=java.awt.BorderLayout ,title=My Window,resizable,normal,defaultCloseOperation=HIDE_ON_CLOSE ,rootPane=javax.swing.JRootPane[,0,0,0x0,invalid,layout=javax.swing.JRootPane$RootLayout ,alignmentX=0.0,alignmentY=0.0,border=,flags=16777673,maximumSize=,minimumSize=,preferredSize=] ,rootPaneCheckingEnabled=true]> In[48]> (frame :size '(400 500)) ; equivalent to frame.size = (400 500) property $48 = nil In[49]> (frame setSize 400 500) ; equivalent to frame.setSize(400, 500) $49 = nil In[50]> (frame setTitle "My Java Swing Lisp") $50 = nil In[51]> (frame setVisible #t) ;; Equivlanet to frame.setVisible(true) $51 = nil In[52]> (frame :visible #f) ; Property call. frame.visible = false $52 = nil In[53]> (frame :visible #t) ; Property call. frame.visible = true $53 = nil In[54]> ; Get object class object ;---------------------------------------------------; In[55]> (ref/get-obj-class frame) $55 = <Obj: class javax.swing.JFrame> ; Display all methods of object javax.swing.JFrame ;---------------------------------------------------; In[56]> (ref/show-methods frame) public void javax.swing.JFrame.remove(java.awt.Component) public void javax.swing.JFrame.update(java.awt.Graphics) public static boolean javax.swing.JFrame.isDefaultLookAndFeelDecorated() public javax.swing.JRootPane javax.swing.JFrame.getRootPane() public void javax.swing.JFrame.setJMenuBar(javax.swing.JMenuBar) public javax.swing.JMenuBar javax.swing.JFrame.getJMenuBar() ... ... ... ... .... ... .... ... ... ... ... ... ... .... ... .... ... ... ; Display all methods of object javax.swing.JFrame containing the string "visible" ;----------------------------------------------------------------- In[47]> (ref/show-methods frame "visible") public void java.awt.Window.setVisible(boolean) public boolean java.awt.Component.isVisible() $57 = nil ; ----- Quit the interpreter --------------------------;; In[59]> .q
Loading script swing.jlisp, in the REPL, that creates a Java Swing GUI application:
$ >> rlwrap java -jar sexpr.jar repl [TRACE ] Code evaluated OK. =====> Interactive S-Expression interpreter started. Ok. [*] Type .help or .h for showing the help menu. In[0]> In[0]> (load-script "swing.jlisp") [========>>> Loaded Code <<<============] ;;----------------------------------------------;; ;; Lisp Script ;; ;;----------------------------------------------;; ... ... ... ... ... .. (println " [TRACE ] Code evaluated OK. ") [TRACE] Script started Ok. [TRACE ] Code evaluated OK. $0 = nil ; ...... Show all variables in the repl ---------;; In[1]> .vars =>> Variables: m/PI env color-light-blue color-red color-white color-black label output entry btn-run btn-demo scroll-entry scroll-output frame count ans $0 In[2]> (m/sin m/PI) $2 = 1.2246467991473532E-16 In[3]> (m/cos (* 2 m/PI)) $3 = 1.0 In[4]> (m/tan (* 2 m/PI)) $4 = -2.4492935982947064E-16 In[5]> (frame getTitle) $5 = "My Java Swing App" In[6]> (frame :title) $6 = "My Java Swing App" ; =>> User clicks at the button with the label 'Click me' In[8]> (label getText) $8 = "Button clicked 25" In[9]> In[9]> (label :text) $9 = "Button clicked 25" In[10]> In[10]> (label :text) $10 = "Button clicked 29" In[11]> In[11]> (label :text) $11 = "Button clicked 32" In[12]> In[12]> count $12 = 33 In[13]> ;; ----- User enters some code at the upper JTextArea widget and press the button with label 'Run' ;; In[15]> (def code (entry getText)) $15 = "(def myFrame (doto (new javax.swing.JFrame) (:size '(400 500)) (:title "Lisp Interpreter") (:visible #t))) " ;; User clicks at the button 'Run' and the code is evaluated creating the variable myFrame In[16]> Btn-run clicked OK <Obj: javax.swing.JFrame[frame0,0,0,400x500, ....> Code evaluated. OK. In[16]> (myFrame :text) Error: Method getText not found. In[16]> (myFrame :title) $16 = "Lisp Interpreter" ;; ----- Now the user types a new code in the upper JTextArea widget ------------;; In[17]> (entry :text) $17 = "(println "Code typed by user in a JFrame Widget")" In[19]> (read (entry :text)) $19 = (println "Code typed by user in a JFrame Widget") In[20]> (eval (read (entry :text))) Code typed by user in a JFrame Widget $20 = nil In[22]> (nth 1 (read (entry :text))) $22 = "Code typed by user in a JFrame Widget" ;; ---- Now the user clicks at button 'Run' -------------------------------;; ;; In[23]> Btn-run clicked OK Code typed by user in a JFrame Widget nil Code evaluated. OK.
Run script swing.jlisp from command line without entering in the REPL:
$ java -jar sexpr.jar script swing.jlisp
Creating a self-executable jar file
A self-executable Jar file can be run without java -jar file.jar on any Unix-like operating system, including, Linux, BSD or MacOSX. This technique does not work on Microsoft Windows because unlike Unix-based operating system, simple text files cannot be set as executable or be run like a native executable on Windows without explicitly specifying the interpreter.
# Creates Unix shell script file 'sexpr' cat <<EOF > sexpr #!/usr/bin/env sh java -jar "\$0" \$* exit 0 EOF # append the contents of jar file to the script. $ cat sexpr.jar >> sexpr # Inspect it $ file sexpr sexpr: a /usr/bin/env sh script executable (binary data) # Make it executable. chmod +x sexpr
Inspect the generated file:
$ file ./sexpr ./sexpr: a /usr/bin/env sh script executable (binary data) $ head -n 3 ./sexpr #!/usr/bin/env sh java -jar "$0" $* exit 0
Run the executable:
# List all unit tests $ ./sexpr tests --list [0] => test parser - case parse a simple list [1] => test parser -parse a nested list [2] => test intepreter - eval () empty list evaluation [3] => test intepreter - expression evaluation [4] => test intepreter - quoted expression evaluation [5] => test intepreter - lambda function application # Run a single unit test $ ./sexpr tests --single 3 [PASSED] Test 'test intepreter - expression evaluation' (id = 3) passed. Ok. # Run the script $ ./sexpr script swing.jlisp [TRACE] Script started Ok. [TRACE ] Code evaluated OK. Btn-run clicked OK Hello world Java Rulez! nil Code evaluated. OK.
Install the executable for running it without specifying the relative path with /path/executable/sexpr <OPTIONS>. Just move the file 'sexpr' to any directory listed in $PATH environment variable such as /bin.
#-------------------------------# # BEFORE # #-------------------------------# $ which sexpr $ sexpr Command 'sexpr' not found, did you mean: command 'expr' from deb coreutils (8.30-3ubuntu2) Try: sudo apt install <deb name> #--------------------------------# # INSTALLATION # #--------------------------------# $ sudo cp -v ./sexpr /bin #------------------------------# # AFTER # #------------------------------# $ which sexpr /usr/bin/sexpr $ sexpr tests ------------------------ TEST REPORT ---------------------------------- [*] SUCCESS - ALL TESTS PASSED OK. [*] Passed = 6 (100.0%) / Failures = 0 (0.0%) / Error = 0 (0.0%) / Total = 6
Run the script swing.jlisp without explicitly specifying the interpreter in the command line.
- Add the following header to the script:
#!/usr/bin/env sh exec /bin/sexpr script "$0" !#
or
#!/usr/bin/env sh exec java -jar /bin/sexpr.jar script "$0" !#
Now, the script swing.jlisp becomes:
#!/usr/bin/env sh exec /bin/sexpr script "$0" !# ;;----------------------------------------------;; ;; Lisp Script ;; ;;----------------------------------------------;; (comment Anything in this comment special form is not evaluted.) (fn make-color (r g b) (new java.awt.Color r g b)) . .. .. ... ... ... ... ... ... ... ... . .. .. ... ... ... ... ... ... ... ...
Make the script executable:
$ chmod +x swing.jlisp
Run the script:
$ ./swing.jlisp [TRACE] Script started Ok. [TRACE ] Code evaluated OK. Btn-run clicked OK Btn-run clicked OK ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
The script execution can simplified even further by creating an executor script named as sexprsh at /bin directory or any other directory listed in the $PATH enviroment variable.
File: /bin/sexprsh
#!/usr/bin/env sh exec java -jar /bin/sexpr.jar script "$1" # exec java -jar /bin/sexpr script "$1" # Alternative way
or:
#!/usr/bin/env sh exec /bin/sexpr script "$1"
Set the shell script as executable:
$ sudo chmod +x /bin/sexprsh
Create a new script named as myscript.jlisp
#!/bin/sexprsh (def frame (doto (new javax.swing.JFrame) (:title "Hello world") ; Similar doto frame.setTile("hello world") (:size '(400 500)) ; Similar to frame.setSize(400, 400) (:visible #t))) ; Similar to frame.setVisible(true) (def EXIT-ON-CLOSE 3) ; Exit process when user clicks at button close (frame setDefaultCloseOperation EXIT-ON-CLOSE) (println "Hello world! [EN] / Hola Mundo! [ES]")
Make the script executable:
$ sudo chmod +x myscript.jlisp
Run the script - the process terminates when the user clicks at the button close of the Java Swing window located at the left upper corner:
$ ./myscript.jlisp Hello world! [EN] / Hola Mundo! [ES]
The GUI script myscript.jlisp can be launched clicking by creating a desktop entry launcher file .desktop at directory ~/Desktop/myscript.jlisp. The shortcut files specified by Freedesktop standard which is used by Linux and BSD distributions for launching applications by clicking at icons don't support relative paths without complicated shell scripts hacks what makes it harder to ship self-contained applications for Linux desktop. Unfortunately, what should be simple is not simple.
Creating the file ~/Desktop/myscript.desktop
[Desktop Entry] Version=1.0 Type=Application Name=Myscript Launcher Comment=JLisp Script with GUI interface 1 Exec=java -jar /bin/sexpr.jar script /home/user/formula-parser/myscript.jlisp Icon=terminator Path= Terminal=false StartupNotify=true
Or:
[Desktop Entry] Version=1.0 Type=Application Name=Myscript Launcher Comment=JLisp Script with GUI interface 1 Exec= /home/user/formula-parser/myscript.jlisp Icon=terminator Path= Terminal=false StartupNotify=true
Make the desktop entry executable. Now the ad-hoc GUI lisp-like application can be launched just by clicking at the desktop file icon.
$ chmod +x myscript.desktop
See:
1.11.6 Further Reading
Scheme and Lisp Introduction:
- Beautiful Racket: Glossary
- A Scheme Primer
- Introduction to Scheme programming language.
- The Axis of Eval: Three Principles of Lisp
- Homoiconicity isn’t the point
- Beautiful Racket: Why Racket? Why Lisp?
- The Roots of Lisp - Paul Graham
- Growing a Language - Guy L. Steele Jr.
- Lisp and the foundations of computing - LWN.net
- Common Lisp VS Racket. Feedback from lispers. · GitHub
- Differences between Scheme and Common Lisp
- Notes on Lisp
- Function Cells and Value Cells
- What Makes LISP Unique? - by Erik Engheim - Erik Explores
- Book: Simply Scheme: Introducing Computer Science (1999)
- The Mysteries of Lisp – I: The Way to S-expression Lisp – arXiv Vanity
About how to implement Lisp-like dialects:
- (How to Write a (Lisp) Interpreter (in Python)) - Peter Norvig
- (An ((Even Better) Lisp) Interpreter (in Python)) - Peter Norvig
- Lithp - A interpreter for John McCarthy's original Lisp.
- Interpreter written in Python in literate programming style.
- BOOK: Write Yourself a Scheme in 48 Hours - Wikibooks, open books for an open world
- Scheme implementation in Haskell programming language, that can be hard to understand for people not acquainted with Haskell.
- BOOK: Write Yourself a scheme in 48 hours in F# (FSharp)
- BOOK: Write You A Scheme, Version 2.0
- Scheme implementation in Haskell.
- BOOK: Learn C • Build Your Own Lisp
- BOOK: Lisp from nothing
- Presents how to build a metacircular LISP interpreter in common lisp; a metacircular lisp interpreter in Scheme; a self-hosting lisp compiler and garbage collector.
- BOOK: The Common Lisp Cookbook
- BOOK: Structure and Interpretation of Computer Programs, 2e: Top
- BOOK: Structure and Interpretation of Classical Mechanics
- BOOK: Common Lisp: A Gentle Introduction to Symbolic Computation
- Kwang's Haskell Blog - Write you an interpreter
- Scheme implementation in Haskell.
- How to write a Scheme interpreter · hello world
- Implementation in .NET
- Repository: https://github.com/Microsoft/schemy
- Scheme from Scratch - Introduction
- Roll A Lisp In C - Evaluation
- rzubek/CSLisp: C# Scheme - Lisp implementation for embedding in .NET projects
- S-expression Compiler in Scala
- The SECD Abstract Machine
- Why is it easy to implement a Lisp? - Eli Bendersky's website
- CMSC 430: Design and Implementation of Programming Languages
- https://github.com/norvig/paip-lisp
- Brief: "This is an open-source repository for the book Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp by Peter Norvig (1992), and the code contained therein. The copyright has reverted to the author, who has shared it here under MIT license. On the list of most influential books for programmers. As seen on TV. See also: errata, comments, retrospective."
Lisp Variants:
- Janet Programming Language (akin to Clojure)
- GitHub - picolisp/pil21: PicoLisp dialect
- LambdaLisp - A Lisp Interpreter That Runs on Lambda Calculus | Woodrush’s Blog
- NU Lisp
- "Nu is an interpreted Lisp that builds on the Objective-C runtime and Foundation framework."
- newLISP - FAQ
- "newLISP is a general purpose scripting language for developing web applications and programs in general in the domains of artificial intelligence (AI), statistics and machine learning."
- uLisp - ARM Assembler in Lisp
- http://lush.sourceforge.net/
- Femtolisp
- Scheme-lisp dialect that inspired the Julia programming language.
- s7 Scheme - Embedded Scripting Language
- SCSH Scheme - Unix-shell scripting in scheme scripting language
- "Scsh is an open-source Unix shell embedded within Scheme, running on all major Unix platforms including AIX, Cygwin, Linux, FreeBSD, GNU Hurd, HP-UX, Irix, Mac OS X, Solaris, and some others. Scsh is a variant of Scheme 48 (an R5RS compliant new-tech Scheme system) Scsh is designed for writing real-life standalone Unix programs and shell scripts. Scsh spans a wide range of application, from “script” applications usually handled with perl or sh, to more standard systems applications usually written in C."
- Scsh - Scheme Unix Shell Scripting - Reference Manual
- "This is the reference manual for scsh, a Unix shell that is embedded within Scheme. Scsh is a Scheme system designed for writing useful standalone Unix programs and shell scripts – it spans a wide range of application, from ``script'' applications usually handled with perl or sh, to more standard systems applications usually written in C. Scsh comes built on top of Scheme 48, and has two components: a process notation for running programs and setting up pipelines and redirections, and a complete syscall library for low-level access to the operating system. This manual gives a complete description of scsh. A general discussion of the design principles behind scsh can be found in a companion paper, ``A Scheme Shell.'' "
- [PDF] A Scheme Shell - MIT - Olin Shivers
- [PDF] SKILL: a Lisp Based Extension Language
- [PDF] A quick tour to skill programming language - Cadence CAD / Lisp.
- [PDF] An implementation of LISPKIT Lisp in Java (SECD Abstract Machine)
- LispMe internals
- This implementation uses a SECD Virtual Machine instead of a recursive AST evaluator. The code is compiled to SECD machine bytecodes and later executed by a SECD virtual machine.
- How to write a Scheme interpreter · hello world
- Implementation in .NET
- Repository: https://github.com/Microsoft/schemy
- GitHub - shikantaza/pLisp: Don't Lisp, Talk!
- pLisp: A Friendly Lisp IDE for Beginners
- "pLisp is an integrated development environment for Lisp. While it aims to be a friendly Lisp development system for beginners, its feature-set is comprehensive enough to address the needs of a small-to-medium sized Lisp project."
- Note: Based on Commonlisp, Scheme and Smalltalk.
- Interface Builder's Alternative Lisp timeline
- Le_Lisp — Software Preservation Group
- Embeddable Common-Lisp / ECL · GitLab
- STELLA Programming Language / Lisp that compiles to Java or C++ ; lisp
- GitHub - kanaka/mal: mal - Make a Lisp
- "Each implementation of mal is separated into 11 incremental, self-contained (and testable) steps that demonstrate core concepts of Lisp. The last step is capable of self-hosting (running the mal implementation of mal). See the make-a-lisp process guide. Mal is implemented in 87 languages (93 different implementations and 115 runtime modes)."
- Embedding Scheme for a game mission scripting DSL
- Game Oriented Assembly Lisp - Wikipedia
- OpenGOAL | OpenGOAL
- Postmortem: Naughty Dog's Jak and Daxter: the Precursor Legacy
- Making Crash Bandicoot – GOOL – part 9 - All Things Andy Gavin
- [PDF] Performance and Evaluation of Lisp Systems
- AutoLisp - Lisp Dialect, based on common lisp, used by Autocad software for engineering drawings.
Common Lisp:
- Common LISP Hints - 1993
- On repl-driven programming - by mikel evins
- Structurally-Typed Condition Handling — Infinite Negative Utility
- Condition Handling in the Lisp Language Family'' c by Kent Pitman
- Common Lisp: A Tutorial on Conditions and Restarts
- On Pitman's „Special forms in Lisp“
- Data notation in Clojure - Ostash.Dev
- LispWorks - 1.7 Calling COM interface methods
- The Common Lisp Object System
- Reader Macros in Common Lisp
- Reader Macros in Common Lisp · GitHub
- C++ and Lisp / multiple dispatch, multimethod, visitor design pattern
- common-lisp Tutorial - LOOP, a Common Lisp macro for iteration
- Delivering games written in Common Lisp
- Lisp Compiler
S-Expression Parsing:
- Wiki C2 - ESS Expressions
- S-expressions - Rosetta Code
- EDN / Clojure - edn-format/edn: Extensible Data Notation
- Clojure variant of S-Expressions (SExp) - Clojure EDN is easier to read and has more rich data types than lisp's SEXP.
- Clojure EDN - Learn edn in Y Minutes
- S-expression parser · GitHub (JavaScript)
- WIKI C2 - Xml Isa Poor Copy Of Ess Expressions
- Some thoughts on JSON vs. S-expressions
- Example Of Greenspuns Tenth At Work
- An Emacs Lisp DSL for GNU Plot
- Cakelisp: a programming language for games
- S-expressions data structure - Ron Rivest
- GitHub - janestreet/sexplib: Automated S-expression conversion
- OCaml S-Expression parser
- S-Expressions in ruby - Janestreet
- "We use a lot of S-Expressions at Jane Street. Almost every system written at Jane Street in OCaml uses sexps for config files, and we use it for a lot of IPC (Inter-Process Communication) when resources aren’t an issue."
- Data Serialization with S-Expressions - Real World OCaml
- SXP - A universal S-expression parser for Ruby
- "This is a Ruby implementation of a universal S-expression parser. Parses S-expressions in universal, Scheme, Common Lisp, or SPARQL syntax."
- SPARQL S-Expressions (or "SPARQL Syntax Expressions")
- https://github.com/BitPuffin/sexpresso
- C++ S-Expression parser library
- https://github.com/SuperTux/sexp-cpp
- C++ S-Expression parser library (GPL 3)
- s-cargot: A flexible, extensible s-expression library
- Haskell S-Expression parser. The description of this page also tells how SExp can be used as storage format, serialization format and data description language.
- GitHub - drslump/sexp-php: S-expression parser and serializer for PHP
- SRFI 110: Sweet-expressions (t-expressions)
1.12 Expression interpreter
1.12.1 Overview
The following source code provides an implementation of an interpreter that able evaluate infix math expressions, with mathematics operator precendence rules, and programming constructs including if-else statements, while loop, for loop, function calls and function definition. The application is comprised of three parts a lexer and evaluator. The tokenizer, also known as tokenizer, is a component that breaks the source code into lexical units called tokens. In this case, a token is a struct containing the token lexeme, a string that represents the token; the token type enumeration, which indicates whether the token is a number, a keyword or a string; and metadata related to the token position, column and line in the source code. The parser component takes the token stream generated by the tokenizer as input and produces an AST - Abstract Syntax Tree using a recursive descent parsing algorithm that consists of mutually recursive functions, each implementing a non-terminal grammar symbol. The final part of the expression interpreter is an AST evaluator, that traverses the AST - Abstract Syntax Tree nodes generated by the parser in a recursive way for producing runtime values, that represents all possible computation results.
The techniques adopted in this sample code, can be used for implementing formula evaluators, spreadsheets, CAS - Computer Algebra Ssystems, DSL - Domain Specific Languages or embedded scripting languages for allowing extending applications at runtime. Despite that the C++ was used for implementing this interpreter, the algorithm can easily be implemented in any other programming language and rewritten in a more concise way in functional programming languages with pattern matching and algebraic data types. For instance, in a functional programming language, the visitor design pattern could be replaced by a more concise pattern matching and the class hierarchies representing the AST - Abstract Syntax Tree and interpreter runtime values could be replaced by algebraic data types.
This parser implements the following grammar stated in ANTLR-style EBNF (Extended Backnaur notation). The syntax of programming constructs were borrowed from Basic and Pascal programming languages.
// Complete program - is a collection of zero or more statements prog: stat* ; // Statement stat: expr ";"? // Expression | IDEN "=" expr ";" // Assignment | while_loop | for_loop ; expr: logical ; logical: equality ( ( "or" | "and" ) equality )* ; equality: comparison ( ( "!=" | "==" ) comparison )* ; comparison: arithmetic ( ( ">" | ">=" | "<" | "<=" ) arithmetic )* ; arithmetic: factor ( ("+" | "-") factor )* ; factor: unary ( ("*" | "/") unary )* ; unary: "+" unary | "-" unary | "!" unary | power ; power: call ("^" unary)* ; // Exponentiation call: atom | IDEN "(" ( expr ( "," expr )* )? ")" // Function call ; atom : INTEGER // number | FLOATING_POINT | IDEN // Identifier, example function name or variable name | BOOLEAN // Logical value | SYMBOL // symbol like Lisp keyword or Ruby's symbols | NIL // null value | "(" expr ")" | expr_if // If-else statement | expr_def // Function definition ; // IF-Else statement (NOT IMPLEMENTED) expr_if: "if" expr "then" stat+ "end" | "if" expr "then" stat+ "else" stat+ "end" ; // Function definition expr_def: "fn" IDEN "(" (IDEN ("," IDEN)*)? ")" stat* "end" | "fn" IDEN "(" (IDEN ("," IDEN)* ))? ")" "->" expr ; while_loop: "while" expr "do" stat* "end" ; for_loop: "for" IDEN "=" expr "to" expr ("by" expr)? "do" stat* "end" ; BOOLEAN: "true" | "false" ; NIL: "nil" ; SYMBOL: ":" ID ;
The expression parser is able to parse the following syntax:
// Math expressions and formulas 10 + 3 * 5 / (x + 2 * y + 4 * z) + exp(2.1) ^ 2 ^ 3 // Numbers in percent 30% // 0.30 // Numbers with _ as thousand separator 10_125_905 // 10125905 // Floating point numbers 2.15e3 if x > 0 then print("x is positive") else print("x is negative or zero") end x = 20 // Result z = 100 z = if x > 10 then 5 * x else 10 * x end for i = 1 to 10 do print(" i = ", i) end k = 10 while k > 0 do print(" k = ", k) k = k - 1 end fn short_function(x) -> 3 * x + log10(x) fn long_function(x, y, z) print(" x = ", x) print(" y = ", y) print(" z = ", z) end
BNF - Backus-Naur and grammar-related terminology
- CFG - Context-Free Grammar
- As opposite to CSG - Context-Sensitive Grammars, production rules of CFG grammars does not depend on the context. This type of grammar can be described by BNF or EBFN notation.
- CSG - Context-Sensitive Grammar
- Grammar which production rules are context-sensitive and depends on the current context. The C and C++ programming languages grammars are context-sensitive, since the interpretation of many programming constructs are context-dependent. For instance, the meaning of T * X in C and are dependent on the context, if T is type, such as int, then X is a pointer variable. If both X and T are of type int, float or any other numeric type, then the expression is a multiplication between variables T and X.
- CFG - Control Flow Graph
- BNF - Backus-Naur Normal Form
- Notation introduced by John Backus and Peter Naur for formally describing context-free grammars, which are not context-sensitive and not ambiguous. For instance C and C++ grammars are not context-free grammars since the meaning of an expressions depends on the context, whether the symbol is a function, pointer, class and so on.
- EBNF - Extended Backus-Naur Form
- ABNF - Augmented Backus-Naur Form
- Modified Back-Naur Form notation for describing bidirectional communication protocols. This notation is used to describe the Http protocol, FTP protocol and other network protocols.
- ANTLR - ANother Tool For Language Recognition
- Open source tool written in Java that takes an ANTLR-style EBNF grammar and generates a parser for this grammar in a given target language, such as Python, Ruby, Perl, Go (Golang), Java, C# (CSharp), C, C++, JavaScript (ECMAScript), PHP and Swift. ANTLR is one of the most used tools for generating parsers and lexers/tokenizers. This parser generator tool has its own EBNF grammar style that does not conform to ISO/IEC 14977 technical standard.
- Lex - Program that generates scanner/tokenizer/texer source code in C programming language.
- Yacc - (Yet Another Compiler-Compiler) - Program, akin to ANTLR, that generates parser source code in C programming language. Yacc is often used alongside Lex (Lex & Yacc).
- Flex (Lexical Analyser Generator) - GNU alternative to Lex.
- GNU Bison - GNU alternative to Yacc. Ofen used together with Flex (Flex & Bison).
- RE2C - Lexer code generator for C/C++, Go and Rust. This tool is used by PHP, Ninja build system and BRL-CAD.
- Parsing - process of turning a string from text or source code into some data structure such as AST - Abstract Syntax Tree. Example, turning a string containing a JSON (Java-Script Object) notation code into a tree-like data structure.
- AST - Abstract Syntax Tree
- Data structure that describes a tree representation of the source code used as input of compiler code generators, bytecode code generators, interpreters and pretty printers.
- DOM - Document Object Model
- Abstract Syntax Tree representation of an XML or HTML document. Whenever a web page is loaded, the web browser turns the html code into a DOM object, a tree-like representatio the html source code that is exposed to the JavaScript (ECMaScript) code.
- DAG - Direct Acyclic Graph
- SEXP S-Expression from Lisp
- Lisp S-expressions are one of the most convenient and compact to represent AST - Abstract Syntax Trees as text.
- Top-down Recursive Descent Parsing or just Recursive Descent Parsing
- Parser algorithm that uses mutually recursive functions or methods for each non-terminal production rule. Despite the simplicity of implementation of this parsing technique, it has few shortcomings such as not being able to deal with left-recursive and ambiguous grammars.
- TDOP - Top-Down Operator Precedence Parser (the same as Pratt Parser)
- Parser algorithm that can parse infix, prefix and postfix expressions different operator precedence levels without a non-terminal grammar rule for every precedence level. This algorithm is more flexible than Top-Down Recursive Descent Parsing since new operator precedence levels can be added without modifying the grammar or adding new recursive function.
- Shunting-yard algorithm (Edsger Dijkstra)
- This algorithm uses a stack for turning infix expressions into postfix expression, RPL - Reverse Polish Notation, which is suitable for direct evaluation of infix expressions. Unlike other algorithms, the output is not an AST - Abstract Syntax Tree. Therefore, if an AST is needed, an additional algorithm will be required for turning the postfix output into an AST.
- LALR - Look Ahead Left-to-Right Rightmost derivation parser (Parser Algorithm)
- DFA - Deterministic Finite Automaton
- DSA - Deterministic Stack Automata (related to automata theory and formal languages)
- NSA - Non-deterministic Stack Automata (related to automata theory and formal languages)
- DSL - Domain Specific Language
- GPL - General Purpose Language
- CAS - Computer Algebra System
- SAS - Symbolic Algebra System (Another name of CAS)
- IR - Intermediate Representation
- IL - Intermediate Language
- SSA Static Single-Assignment Form
- EOI - End Of Input
- EOF - End Of File
- EOL - End Of Line
- CR - Carriage Return - new line character (
\r
- 0x0D - 13) - LF - Line Feed - new line character (
\n
- 0x0A - 10) - CRLF - Carriage Return and Line Feed (
\r\n
) - LHS - Left-Hand Side, often the left-hand side of an infix operator such as (plus): lhs + rhs.
- RHS - Right-Hand Side, often the right-hand side of an infix operator such as (star): lhs * rhs.
- terminals - (ANTLR) are written with upper case letters, for
instance, IDENTIFIER, NUMBER, …
- A Terminal symbol is just a single token that represents an AST (Abstract Syntax Tree) node without any child AST nodes. For instance, terminal symbols are string literals; boolean literals (True, False in Python); integer literals; floating point literals and so on.
- non-terminals - (ANTLR) are written using camel-case or
snake-case convention. For instance: expr, program,
for_loop
.- A non-terminal symbol are symbols comprised of multiple terminal symbols or tokens. Non-terminal symbols correponds to AST - Abstract Syntax Tree nodes with multiple AST child nodes. For instance, non-terminal symbols are often if-else statement, for-loop statement, while-loop and so on.
- '=' - ISO/IEC 14977 - definition of grammar production rules (non-terminals and terminals)
- ':' - (ANTLR) - definition of grammar production rules
- ':=' - defintion of grammar production rules - this notation is used by some EBNF grammar variants.
- '*' - Kleene Star - means 0 or more occurences of a pattern
- '+' - Kleene Cross - means 1 or more occurrences of a pattern
- '?' - question mark - means 0 or 1 occurences of a pattern
- ',' - (ISO/IEC 14977) - concatenation
- '|' - (ISO/IEC 14977) alternation - separation between alternatives grammar production rules.
- { … } - (ISO/IEC 14977) - repetition of 0 or more ocurrences - equivalent to (…)*
- ( … ) - parentheses are used for grouping
- " … " - terminal string
- ' … ' - terminal string
- (* … *) - (ISO/IEC 14977) - comment
- // … - (ANTLR) comment
- ';' - ANTLR used as delimiter of grammar production rules
- 'ε' - Epsilon - used in grammar descriptions for eliminating left-recursion by indicating an empty string. This notation is not used by EBFN, that uses an empty string instead of any special notation.
Performance Optimization for Interpreters
- Compilation to virtual machine
- Most programming languages interpreter engines do not execute AST nodes directly, instead they compile the source code to bytecodes, which are a instructions for some virtual machine.
- Compilation to a standardized virtual machine
- Rather than implementing its own bytecode virtual machine, a programming language could use a standardized bytecode virtual machine runtimes that are exposed to external use and not likely to have breaking changes on every release since they are regarded as public APIs and interfaces. Some of these virtual machines, that are a compilation target of many programming languages, are JVM - Java Virtual Machine, used by Jython (Python for JVM), Clojure and Groovy; CRL - Commong Language Runtime (.NET Virtual Machine), use by C# (CSharp), F# (FSharp), IronPython; and WASM - Web Assembly.
- JIT - Just-In-Time compilation
- Compilation of bytecodes (virtual machine instructions) to machine code (also known as native code) at runtime. This processs combines the advantage of interpretation and compilation to native code. The shortcoming of this approach is the loss of flexibility and portability, since the JIT has to be ported to every processor architecture where the virtual machine will be deployed.
- FFI - Foreign Function Interface
- FFI allows loading native code shared libraries, which may be written in C, C++, Rust, D (Dlang), exposing a C interface and C symbols.
- Native Extension API
- A plugin system that allows loading libraries provided as shared libraries (*.so - files on Linux, BSD; *.dylib on MacOSX or *.dll on Windows), written in C, C++, Rust or any other language that compiles to native code, conforming to an expected C interface. Example: Java's JNI that allows loading Java Libraries written in C or C++ that conforms to the JNI API by using JNI SDK - Software Development Kit.
- Hidden Class Optimization used in JavaScript V8 engine => See: richardartoul - hidden class
- Choice and implementation of garbage collector algorithm
- String Interning
- Constant Folding
- Constant Propagation
- Escape Analysis
- NaN Boxing
Further Reading
Parsing:
- A Guide To Parsing: Algorithms And Terminology
- From AST to Lossless Syntax Tree
- Context-free grammar - wikipedia
- Backus–Naur form - wikipedia
- Augmented Backus–Naur form - wikipedia
- Extended Backus–Naur form - wikipedia
- EBNF: How to Describe the Grammar of a Language - tomassetti.net
- EBNF: How to Describe the Grammar of a Language - Dzone
- The language of languages
- Don’t Use ISO/IEC 14977 Extended Backus-Naur Form (EBNF)
- RFC5234 - Augmented BNF for Syntax Specifications: ABNF
- ISO/IEC 14977 : 1996(E) - [PDF]
- "Defines a notation, Extended BNF, for specifying the syntax of a linear sequence of symbols. It defines both the logical structure of the notation and its graphical representation. "
- BNF - Unnecessary Diversity of Notation for Syntactic Definitions [PDF]
- Original title: "BNF WAS HERE: What Have We Done About the Unnecessary Diversity of Notation for Syntactic Definitions"
- CS211 - Gramamr and Parsing - Cornell
- C and C++ are not context free
- The context sensitivity of C’s grammar, revisited
- The context sensitivity of C's grammar
- Why I don’t use a Parser Generator – Musing Mortoray
- The ANTLR Mega Tutorial - tomassetti.net
- Parsing in JavaScript: all the tools and libraries you can use
- Parsing in Java: all the tools and libraries you can use
- Parsing in Python: all the tools and libraries you can use Strumenta
- https://github.com/antlr/grammars-v4
- Collection of ANTLR-EBNF grammars for several programming languages and markup languages including, Fortran 77, GraphQL, Java, Kotlin, Lua, JSON - JavaScript Object Notation, html.
Semantics:
- Evaluation strategy
- eval - programming languages
- https://wiki.c2.com/?CallByName
- Thunk
- De Bruijn notation
- Functional Data Abstraction
- Halting problem
- Operational Semantics | Principles of Programming Languages
- Introduction to Programming Languages/Evaluation Strategies
- TLA+ - Formal Specification Language
- Formal Specification
- Turing completeness - Wikipedia
- It doesn’t have to be Turing complete to be useful – Increment: Programming Languages
- The Excel Formula Language Is Now Turing-Complete
- The complete guide to (external) Domain Specific Languages - Strumenta
- Abstract semantic graph
- Common Object Operations in .NET | Object Equality and Identity
- Equality comparisons and sameness - JavaScript | MDN
- Computability and models of computation
- Laurence Tratt: Why Aren't Programming Language Specifications Comprehensive?
Operator Precedence:
- Polish notation
- Operator Precedence: We can do better
- Operator Precedence Considered Harmful
- The Math Equation That Tried to Stump the Internet - The New York Times
- Languages without operator precedence
- The Role of Operators in APL - Kenneth E. Iverson
- Note: APL, just like Smalltalk programming language, does not have operator precedence, all operators have the same precedence level. In order to specify a particular order of evaluation, the user must use parentheis.
- Smalltalk Message Rules
- Operator associativity
- Order of Operations
- Operator Precedence in JavaScript
- C# operators and expressions (C# reference)
- C++ built-in operators, precedence, and associativity
- https://rosettacode.org/wiki/Operator_precedence
- Infix Functions In Haskell (Operator Precedence in Haskell)
- operator precedence in Haskell - stack overflow
- r/haskell - Confusion between (.) and ($) operators (Operator Precedence in Haskell)
- Better operator precedence
- https://en.wikipedia.org/wiki/Null_coalescing_operator
- Kotlin - The Elvis Operator (?:) - null coalescing operator
- Kotlin - Null Safety
- Null coalescing operators - ?? and ??= operators (C# reference)
Non-Categorized:
- AST explorer
- Ohm: Parsing Made Easy - Nextjournal => JavaScript parser library, but it is useful for prototyping parsers and grammars for custom/bespoke DSL - Domain Specific Programming languages.
- Programming Naming Conventions – Camel, Snake, Kebab, and Pascal Case Explained
- Camel Case vs. Snake Case vs. Pascal Case — Naming Conventions
- GitHub - DSLsofMath/DSLsofMath: Domain-Specific Languages of Mathematics
- Forth, Lisp, & Ruby: languages that make it easy to write your own domain specific language (DSL)
- Forth: The programming language that writes itself / Dave's Forth Talk 2023
- What the hell is Forth? — blog dot information dash superhighway dot net
- A Dynamic Forth Compiler for WebAssembly | mko.re
- A Lambda Calculus With Coroutines and Heapless, Directly-Called Closures
- Why Take a Compiler Course? – Embedded in Academia
- UCB CS294-113: Virtual Machines and Managed Runtimes
- CS 6120: Advanced Compilers: The Self-Guided Online Course - Cornell
- GitHub - IUCompilerCourse/Essentials-of-Compilation: A book about compiling Racket and Python to x86-64 assembly
- cpplinks/compilers.md at master · MattPD/cpplinks · GitHub
- Essentials of Compilation: An Incremental Approach
- Introduction to Compilers and Language Design
- Static Single Assignment (SSA) Book
- SSA-based Compiler Design
- https://craftinginterpreters.com
- => Online book about how to implement interpreter in C programming language.
- Writing Interpreters in Rust: a Guide
- => Online book about how to implement interpreter and bytecode virtual machines using Rust (better memory-safety).
- https://github.com/pfalcon/ssabook
- https://craftinginterpreters.com/optimization.html
- Course Webpage for Compilers (P423, P523, E313, and E513) | IU-Fall-2022
- https://github.com/IUCompilerCourse/Essentials-of-Compilation
- IUCompilerCourse/Essentials-of-Compilation: A book about compiling Racket and Python to x86-64 assembly
- DSL Patterns - Martin Fowler
- Chapter 2. The Structure of the Java Virtual Machine
- Virtual machine and javascript engine
- From AST to bytecode execution in EndBASIC
- Implementing a Toy Optimizer | PyPy
- Pointer Compression in V8 · V8 (JavaScript Engine used by Chrome browser variants, NodeJS and Deno runtime)
- Pointer compression in Oilpan
- Small objects and pointer tagging | Max Bernstein
- Memory, Pages, mmap, and Linear Address Spaces | Pointers Gone Wild
- Interesting things about the Lua interpreter | thesephist.com
- [PDF] Book - Programming Language - Application and Interpretation
- [PDF] Niklaus Wirth - A PASCAL Subset and its Implementation (Oberon)
- Scripting Objects without Trash Day | Capital's Corner
- BASIC interpreter
- Another NaN-based tagging strategy for dynamic programming languages
- VX32 Virtual Extension Environment
- Microfeatures I'd like to see in more languages • Buttondown
- Binaries, strings, and charlists - The Elixir programming language
- Low-Level Software Security for Compiler Developers
- data structures - How can one interpret an Abstract Syntax Tree without recursion?
- design patterns - What is the purpose of NaN boxing? - Software Engineering Stack Exchange
- ppkwok: Java Cafe 1 : Never write NaN == NaN (they're not equal)
- GitHub - byo-books/pretty_laughable_lang
- An educational C-like toy programming language that compiles to x64 binary.
- GitHub - AZHenley/teenytinycompiler: A small BASIC-to-C compiler written in Python.
- JavaScript memory model demystified
- http://c2.com/cgi/wiki?TransparentPersistence
- Transparent Persistence | Lambda the Ultimate
1.12.2 Code
All the sources are available at gist:
File: xmake.lua
add_rules("mode.debug", "mode.release") target("formula") set_kind("binary") add_files("./formula.cpp")
File: script.txt => Contains scripting code for the expression parser.
print("\t[TRACE] Starting user script ") print("\t[INFO] REPL Version = " + version) // Function that computes square fn sq(x) -> x * x print("\n ------ Test IF-ELSE -------------") x = -10 if x > 0 then print(" => x greater than zero ") else print(" => x negative or zero ") end z = if x > 0 then print(" => x greater than zero ") 100 else print(" => x negative or zero ") 200 end print(" [TRACE] z = ", z) fn test_number(x) if x < 0 then print(" => x = ", x, " => negative ") end if x == 0 then print(" => x = ", x, " => zero ") end if x > 0 then print(" => x = ", x, " => positive ") end end test_number(-10) test_number(20) test_number(0) print("\n---------- WHILE LOOP ----------- ") fn do_sum(i) sum = 0; // Start loop while i > 0 do i = i - 1; sum = sum + sq(i); print("\tsq(i) = ", sq(i)); end print("\tsum = ", sum); // The return value is the last function value sum; end sum = do_sum(5) print(" sum = do_sum(5) = ", sum) print("\n---------- FOR LOOP ----------- ") for i = 10 to 0 by -2 do print(" \t i = ", i, " ; i^2 = ", i * i) end print("\n ------- CLOSURES ----------------------") fn make_add(k) -> fn(x) -> k + x add10 = make_add(10) add20 = make_add(20) add30 = make_add(30) fn test_function(name, fun) print("\n --- Testing function: ", name) for k = 10 to 30 by 5 do print("\t", name, "(", k,")", " = ", fun(k)) end end test_function("add10", add10) test_function("add20", add20) test_function("add30", add30)
File: formula.cpp (about 2600 lines of code)
#include <iostream> #include <sstream> #include <string> #include <cassert> #include <map> #include <vector> #include <memory> #include <functional> #include <fstream> #include <cmath> #undef EOF // Token Type enum class Type { INT // Number - integer , FLT // Number - floating point , NPC // Number in percent ending with '%' character , IDEN // Identifier , BOOL // Boolean , STR // String , SYM // Symbol - starts with ':', for instance :asymbol , NIL // Nil , ADD // '+' operator , SUB // '-' , MUL // '*' multiplication operator , DIV // '/' division operator , MOD // '%' modulus operator , POW // '^' power operator - example 2^3 = 8 , ASN // '=' assignment , EQ // '==' equal , NEQ // '!=' not equal , LT // '<' less than , GT // '>' greater than , LTE // '<=' less or equal than , GTE // '>=' greater or equal than , NOT // '!' not - invert logical value , OR // 'or' - or operator keyword , AND // 'and' - and operator keyword , FN // 'fn' - keyword for defining functions , END // 'end' - keyword delimiter , BEGIN // 'begin' keyword , FOR // 'for' - keyword , WHILE // 'while' - keyword , IF // 'if' - keyword , ELSE // 'else' - keyword , THEN // 'then' - keyword , DO // 'do' - keyword , TO // 'to' - keyword (used in for-loop) , BY // 'by' - keyword , ARROW // '->' - token used short function defintion // --- Delimiters --- // , LPAR // Left parenthesis , RPAR // Right parenthesis , COM // ',' Comma , SEM // ';' semicolon , EOF // End of File , ERR // Indicates lexer error , NONE // Indicates an empty token, non initialized token. }; enum class Oper { ADD, SUB, MUL, DIV, MOD, POW, NOT, EQ, NEQ, LT, LTE, GT, GTE, AND, OR, ERR }; std::string operators[] = { "+", "-", "*", "/", "%", "^", "!" }; std::string operator_names[] = { "ADD", "SUB", "MUL", "DIV", "MOD", "POW", "NOT" }; // Converts token type to operator Oper type_to_oper(Type t) { static auto map = std::map<Type, Oper> { {Type::ADD, Oper::ADD}, {Type::SUB, Oper::SUB}, {Type::MUL, Oper::MUL} ,{Type::DIV, Oper::DIV}, {Type::MOD, Oper::MOD}, {Type::POW, Oper::POW} ,{Type::NOT, Oper::NOT}, {Type::EQ, Oper::EQ }, {Type::NEQ, Oper::NEQ} ,{Type::LT, Oper::LT }, {Type::LTE, Oper::LTE }, {Type::AND, Oper::AND} ,{Type::GT, Oper::GT}, {Type::GTE, Oper::GTE }, {Type::OR, Oper::OR } }; return map.at(t); } std::string oper_to_name(Oper op) { static auto map = std::map<Oper, std::string> { {Oper::ADD, "ADD"}, {Oper::SUB, "SUB" }, {Oper::MUL, "MUL"} ,{Oper::DIV, "DIV"}, {Oper::MOD, "MOD" }, {Oper::POW, "POW"} ,{Oper::NOT, "NOT"}, {Oper::EQ, "EQ" }, {Oper::NEQ, "NEQ"} ,{Oper::LT, "LT" }, {Oper::LTE, "LTE" }, {Oper::AND, "AND"} ,{Oper::GT, "GT" }, {Oper::GTE, "GTE" }, {Oper::OR, "OR" } }; return map.at(op); } std::string oper_to_str(Oper op) { static auto map = std::map<Oper, std::string> { {Oper::ADD, "+" }, {Oper::SUB, "-" }, {Oper::MUL, "*" } ,{Oper::DIV, "/" }, {Oper::MOD, "%" }, {Oper::POW, "^" } ,{Oper::NOT, "!" }, {Oper::EQ, "==" }, {Oper::NEQ, "!=" } ,{Oper::LT, "<" }, {Oper::LTE, "<=" }, {Oper::AND, "and"} ,{Oper::GT, ">" }, {Oper::GTE, ">=" }, {Oper::OR, "or" } }; return map.at(op); } struct Token { Type type; std::string text; int begin; int end; int lin; int col; Token(){ type = Type::NONE; text = ""; begin = end = lin = col = 0; } Token(Type type, std::string const& text, int begin, int end, int lin, int col) : type(type), text(text), begin(begin), end(end), lin(lin), col(col){ } bool isEOF(){ return type == Type::EOF; } }; std::ostream& operator<<(std::ostream& os, Token const& tok) { static auto map = std::map<Type, std::string>{ {Type::INT, "INT"}, {Type::FLT, "FLT"}, {Type::IDEN, "IDEN"}, {Type::BOOL, "BOOL" } ,{Type::STR, "STR"}, {Type::NIL, "NIL"}, {Type::ADD, "ADD" }, {Type::SUB, "SUB" } ,{Type::MUL, "MUL"}, {Type::DIV, "DIV"}, {Type::MOD, "MOD" }, {Type::POW, "POW" } ,{Type::ASN, "ASN"}, {Type::EQ, "EQ" }, {Type::NEQ, "NEQ" }, {Type::LT, "LT" } ,{Type::GT, "GT" }, {Type::LTE, "LTE"}, {Type::GTE, "GTE" }, {Type::NOT, "NOT" } ,{Type::OR, "OR" }, {Type::AND, "AND"}, {Type::LPAR, "LPAR"}, {Type::RPAR, "RPAR" } ,{Type::COM, "COM"}, {Type::COM, "COM"}, {Type::SEM, "SEM" }, {Type::RPAR, "RPAR" } ,{Type::EOF, "EOF"}, {Type::ERR, "ERR"}, {Type::NONE, "NONE" }, {Type::FN, "FN" } ,{Type::WHILE, "WHILE"}, {Type::ARROW, "ARROW"}, {Type::END, "END"}, {Type::BEGIN, "BEGIN"} ,{Type::TO, "TO"}, {Type::NPC, "PEC"} }; os << "Token( " << (int) tok.type << " ; type = " << map[tok.type] << " ; " << tok.text << "; " << " ; begin = " << tok.begin << " ; end = " << tok.end << " ) "; return os; } class Tokenizer { std::istream& _is; // Input stream char _chr = '\0'; // Current character int _pos = 0; // Current position int _col = 0; // current column int _lin = 0; // current line // Keywords database std::map<std::string, Type> _keywords; public: Tokenizer(std::istream& is): _is(is) { this->next_chr(); add_keyword("true", Type::BOOL); add_keyword("false", Type::BOOL); add_keyword("nil", Type::NIL); add_keyword("or", Type::OR); add_keyword("and", Type::AND); add_keyword("fn", Type::FN); add_keyword("if", Type::IF); add_keyword("do", Type::DO); add_keyword("else", Type::ELSE); add_keyword("then", Type::THEN); add_keyword("begin", Type::BEGIN); add_keyword("end", Type::END); add_keyword("for", Type::FOR); add_keyword("to", Type::TO); add_keyword("by", Type::BY); add_keyword("while", Type::WHILE); } void add_keyword(std::string const& keyword, Type type) { _keywords[keyword] = type; } // Returns true if at end of file bool is_eof() const { return _is.eof(); } char peek() const { return _is.peek(); } // Advances to next character char next_chr() { _chr = _is.get(); _pos++; _col++; // Assumes that characcter for new line // always will be '\n' and never '\r\n' or '\r' if(_chr == '\n'){ _col = 0; _lin++; } return _chr; } bool match(char x) { if( _chr == x){ next_chr(); return true; } return false; } // Generates an error token Token error(std::string const& message) { return Token(Type::ERR, message, _pos, _pos, _lin, _col); } // Return all tokens std::vector<Token> tokens() { std::vector<Token> toks; Token tk; while( tk.type != Type::EOF ){ tk = this->next_token(); if( tk.type == Type::EOF){ break; } toks.push_back(tk); } return toks; } Token next_token() { // Ignore blank characters, such as white spaces, tabs and // new line characters. while( !is_eof() && std::isspace(_chr) ){ next_chr(); } // std::fprintf(stderr, " [TRACE] _chr = %c \n", _chr); if( is_eof() || _chr == '\0' ){ return Token(Type::EOF, "", _pos, _pos, _lin, _col); } // Sanity checking assert( !isspace(_chr) ); int begin = _pos - 1; auto tok = next_delimiter(); if( tok.type != Type::NONE){ return tok; } // Number => Integer or floating point if( std::isdigit(_chr) || _chr == '.' ) { return next_number(); } // Identifier or keyword if (std::isalpha(_chr) || _chr == '_') { return next_identifier_or_keyword(); } // Symbol literal if( match(':') ) { return next_symbol(); } // String literal if (_chr == '"') { return next_string(); } // Discard next character to avoid infinite loop char ch = _chr; next_chr(); // Return a tokenizer error value return Token(Type::ERR, std::string("Edge case - tokenizer cannot match character: ") + ch , _pos, _pos, _lin, _col); } private: Token next_delimiter() { int begin = _pos - 1; if( match('(') ){ return Token(Type::LPAR, "(", begin, _pos - 1, _lin, _col); } if( match(')') ){ return Token(Type::RPAR, ")", begin, _pos - 1, _lin, _col); } if( match('+') ){ return Token(Type::ADD, "+", begin, _pos - 1, _lin, _col); } if( match('-') ){ if( match('>') ){ return Token(Type::ARROW, "->", begin, _pos - 1, _lin, _col); } return Token(Type::SUB, "-", begin, _pos - 1, _lin, _col); } if( match('*') ){ return Token(Type::MUL, "*", begin, _pos - 1, _lin, _col); } if( match('/') ){ // Ingore comment until new line if( match('/') ) { while( !is_eof() && _chr != '\n' ){ next_chr(); } return next_token(); } return Token(Type::DIV, "/", begin, _pos - 1, _lin, _col); } if( match('^') ){ return Token(Type::POW, "^", begin, _pos - 1, _lin, _col); } if( match(',') ){ return Token(Type::COM, ",", begin, _pos - 1, _lin, _col); } if( match(';') ){ return Token(Type::SEM, ";", begin, _pos - 1, _lin, _col); } if( match('=') ){ if( match('=') ){ return Token(Type::EQ, "==", begin, _pos, _lin, _col); } return Token(Type::ASN, "=", begin, _pos, _lin, _col); } if( match('>') ){ if( match('=') ){ return Token(Type::GTE, ">=", begin, _pos, _lin, _col); } return Token(Type::GT, ">", begin, _pos, _lin, _col); } if( match('<') ){ if( match('=') ){ return Token(Type::LTE, "<=", begin, _pos, _lin, _col); } return Token(Type::LT, "<", begin, _pos, _lin, _col); } if( match('!') ){ if( match('=') ){ return Token(Type::NEQ, "!=", begin, _pos, _lin, _col); } return Token(Type::NOT, "!", begin, _pos, _lin, _col); } return Token(Type::NONE, "", begin, _pos, _lin, _col); } Token next_identifier_or_keyword() { std::string lexeme; int begin = _pos - 1; while (!is_eof() && (std::isalnum(_chr) || _chr == '_')) { lexeme = lexeme + _chr; next_chr(); } auto it = _keywords.find(lexeme); if(it != _keywords.end()){ Type t = it->second; return Token(t, lexeme, begin, _pos - 1, _lin, _col); } return Token(Type::IDEN, lexeme, begin, _pos - 1, _lin, _col); } Token next_symbol() { std::string lexeme; // next_chr(); int begin = _pos - 1; while (!is_eof() && (std::isalnum(_chr) || _chr == '_')) { lexeme = lexeme + _chr; next_chr(); } return Token(Type::SYM, lexeme, begin, _pos - 1, _lin, _col); } // Returns next string literal Token next_string() { assert( _chr == '"'); std::string lexeme; int begin = _pos - 1; next_chr(); // std::fprintf(stderr, " [TRACE] Detect string literal\n"); while (!is_eof() && _chr != '"') { // Handle escape characters if( match('\\') ){ if( match('n') ){ lexeme = lexeme + "\n"; continue; } if( match('t') ){ lexeme = lexeme + "\t"; continue; } if( match('r') ){ lexeme = lexeme + "\r"; continue; } if( match('v') ){ lexeme = lexeme + "\v"; continue; } if( match('"') ){ lexeme = lexeme + "\""; continue; } if( match('\\') ){ lexeme = lexeme + "\\"; continue; } return Token(Type::ERR, "Tokenizer error - invalid escape character.", _pos, _pos, _lin, _col); } lexeme = lexeme + _chr; next_chr(); } assert( _chr == '"' ); next_chr(); // std::fprintf(stderr, " [TRACE] _chr = %c \n", _chr); return Token(Type::STR, lexeme, begin, _pos - 1, _lin, _col); } // Returns next integer std::string next_int() { std::string lexeme = ""; while( std::isdigit(_chr) || _chr == '_' ){ lexeme = lexeme + _chr; next_chr(); } // std::fprintf(stderr, " [TRACE] next number = %s \n", lexeme.c_str()); return lexeme; } // Returns next number token Token next_number() { std::string lexeme = ""; int begin = _pos - 1; if(_chr != '.'){ lexeme = next_int(); } // std::fprintf(stderr, "[TRACE] _chr = %c \n", _chr); if(_chr != '.' && _chr != 'e' && _chr != 'E' ) { if( match('%') ) { return Token(Type::NPC, lexeme, begin, _pos - 1, _lin, _col); } return Token(Type::INT, lexeme, begin, _pos - 1, _lin, _col); } if( match('.') ) { lexeme = lexeme + "."; lexeme = lexeme + next_int(); } // std::cout << " [TRACE] chr = " << _chr << std::endl; // Floating point exponent if( match('e') || match('E') ) { // std::fprintf(stderr, "[TRACE] Found exponent \n"); lexeme = lexeme + "e"; // next_chr(); if( _chr != '-' && !std::isdigit(_chr) ){ return Token(Type::ERR, "Invalid floatitng point number: " + lexeme, begin, _pos -1, _lin, _col ); } if( match('-') ){ lexeme = lexeme + "-"; } if( !isdigit(_chr) ){ return Token(Type::ERR, "Invalid floatitng point number. Expected digit.", begin, _pos -1, _lin, _col ); } lexeme = lexeme + next_int(); } if( match('%') ) { return Token(Type::NPC, lexeme, begin, _pos - 1, _lin, _col); } return Token(Type::FLT, lexeme, begin, _pos - 1, _lin, _col); } }; // ------ End of class Tokenizer --------- // Forward declaration struct AstNil; struct AstInt; struct AstFlt; struct AstIden; struct AstSym; struct AstBool; struct AstStr; struct AstBinop; struct AstUnop; struct AstAsn; struct AstCall; struct AstDef; struct AstIf; struct AstWhile; struct AstFor; struct AstProg; struct AstErr; struct IAstVisitor{ virtual void visit(AstNil& ast) = 0; virtual void visit(AstInt& ast) = 0; virtual void visit(AstFlt& ast) = 0; virtual void visit(AstIden& ast) = 0; virtual void visit(AstSym& ast) = 0; virtual void visit(AstBool& ast) = 0; virtual void visit(AstStr& ast) = 0; virtual void visit(AstBinop& ast) = 0; virtual void visit(AstUnop& ast) = 0; virtual void visit(AstAsn& ast) = 0; virtual void visit(AstCall& ast) = 0; virtual void visit(AstDef& ast) = 0; virtual void visit(AstWhile& ast) = 0; virtual void visit(AstIf& ast) = 0; virtual void visit(AstFor& ast) = 0; virtual void visit(AstProg& ast) = 0; virtual void visit(AstErr& ast) = 0; }; // Represents abstract syntax tree noes struct Ast { int begin = 0; int end = 0; virtual ~Ast() = default; /// Check if AST node is nil (null value) virtual bool is_nil() const { return false; }; /// Check if AST node is number virtual bool is_num() const { return false; }; /// Check if AST node is integer virtual bool is_int() const { return false; }; /// Check if AST node is floating point virtual bool is_flt() const { return false; }; /// Check if AST node is boolean virtual bool is_bool() const { return false; }; /// Check if AST node is identifier virtual bool is_iden() const { return false; }; /// Check if AST node is symbol virtual bool is_sym() const { return false; }; /// Check if AST node is string virtual bool is_str() const { return false; }; /// Check if AST node is binary operation virtual bool is_binop() const { return false; }; /// Check if AST node is unary operation virtual bool is_unop() const { return false; }; /// Check whether AST node is assignment virtual bool is_asn() const { return false; }; /// Check whether AST node is an error virtual bool is_err() const { return false; } // Check if AST is function call virtual bool is_call() const { return false; } // Check if AST is function defintion virtual bool is_def() const { return false; } // Check if AST is if statement virtual bool is_if() const { return false; } // Check if AST is while statement virtual bool is_while() const { return false; } // Chekc if AST is for-loop statement virtual bool is_for() const { return false; } // Check if AST is program virtual bool is_prog() const { return false; } virtual int to_int() const { return -1; }; virtual double to_flt() const { return -1; }; virtual std::string to_str() const { return ""; }; virtual void accept(IAstVisitor& v) = 0; }; // Nil AST node struct AstNil: public Ast { bool is_nil() const override { return true; }; void accept(IAstVisitor& v) override { v.visit(*this); } AstNil(){} AstNil(int begin, int end) { this->begin = begin; this->end = end; } }; // Integer AST node struct AstInt: public Ast { int value; AstInt(int value): value(value){ } AstInt(int value, int begin, int end): value(value) { this->begin = begin; this->end = end; } bool is_num() const override { return true; }; bool is_int() const override { return true; }; void accept(IAstVisitor& v) override { v.visit(*this); } virtual int to_int() const override { return value; }; virtual double to_flt() const override { return value; }; }; // Floating point literal struct AstFlt: public Ast { double value; AstFlt(double value): value(value){ } AstFlt(double value, int begin, int end): value(value) { this->begin = begin; this->end = end; } bool is_num() const override { return true; }; bool is_flt() const override { return true; }; void accept(IAstVisitor& v) override { v.visit(*this); } int to_int() const override { return value; }; double to_flt() const override { return value; }; }; // Identifier node struct AstIden: public Ast { std::string value; AstIden(std::string const& value): value(value){ } AstIden(std::string const& value, int begin, int end): value(value) { this->begin = begin; this->end = end; } bool is_iden() const override { return true; }; void accept(IAstVisitor& v) override { v.visit(*this); } std::string to_str() const override { return value; }; }; // Symbol node - for isntance, :x, :y, :asymbol struct AstSym: public Ast { std::string value; AstSym(std::string const& value): value(value){ } AstSym(std::string const& value, int begin, int end): value(value) { this->begin = begin; this->end = end; } bool is_sym() const override { return true; }; void accept(IAstVisitor& v) override { v.visit(*this); } std::string to_str() const override { return ":" + value; }; }; // Boolean literal struct AstBool: public Ast { bool value; AstBool(bool value): value(value){ } AstBool(bool value, int begin, int end): value(value) { this->begin = begin; this->end = end; } bool is_bool() const override { return true; }; void accept(IAstVisitor& v) override { v.visit(*this); } }; // String literal AST node struct AstStr: public Ast { std::string value; AstStr(std::string const& value): value(value){ } AstStr(std::string const& value, int begin, int end): value(value) { this->begin = begin; this->end = end; } bool is_str() const override { return true; }; void accept(IAstVisitor& v) override { v.visit(*this); } std::string to_str() const override { return value; }; }; // Binary operation such as: <node> + <node> struct AstBinop: public Ast { // Operator Oper op; // Left-hand side std::shared_ptr<Ast> lhs; // Right-hand side std::shared_ptr<Ast> rhs; AstBinop(Oper op, std::shared_ptr<Ast> lhs, std::shared_ptr<Ast> rhs) : op(op), lhs(lhs), rhs(rhs){ } bool is_binop() const override { return true; } void accept(IAstVisitor& v) override { v.visit(*this); } }; // Unary operation such as: -<node> +<node> -200 struct AstUnop: public Ast { // Operator Oper op; // Left-hand side std::shared_ptr<Ast> node; AstUnop(Oper op, std::shared_ptr<Ast> node) : op(op), node(node) { } bool is_unop() const override { return true; } void accept(IAstVisitor& v) override { v.visit(*this); } }; // Assignment node ('=') - represents expressions such as // x = 20; z = call(20, 1.25) struct AstAsn: public Ast { // identifier (left-hand side) L-value std::string iden; // right-hand side of assignment R-value std::shared_ptr<Ast> node; AstAsn(std::string const& iden, std::shared_ptr<Ast> node) : iden(iden), node(node) { } bool is_asn() const override { return true; }; void accept(IAstVisitor& v) override { v.visit(*this); } }; // Represents a function call struct AstCall: public Ast { // identifier - AST or function name (identidier) std::shared_ptr<Ast> iden; // Function arguments std::vector<std::shared_ptr<Ast>> args; bool is_call() const override { return true; } // Get function name std::string name() const { if(iden->is_iden()){ return iden->to_str(); } return ""; } AstCall(){} AstCall(std::shared_ptr<Ast> iden, std::vector<std::shared_ptr<Ast>> const& args) : iden(iden), args(args) { } AstCall(std::string const& name, std::vector<std::shared_ptr<Ast>> const& args) : iden( std::make_shared<AstIden>(name) ), args(args) { } void accept(IAstVisitor& v) override { v.visit(*this); } }; // Function defintion struct AstDef: public Ast { // identifier std::string name; // Function arguments std::vector<std::string> args; // Function body std::vector<std::shared_ptr<Ast>> body; bool is_def() const override { return true; } AstDef(){} void accept(IAstVisitor& v) override { v.visit(*this); } }; struct AstIf: public Ast { // Condition of an if-else statement std::shared_ptr<Ast> cond; // Code block related to then statement std::shared_ptr<AstProg> then_block; // Coode block related to else statement std::shared_ptr<AstProg> else_block; bool is_if() const override { return true; } AstIf(){} void accept(IAstVisitor& v) override { v.visit(*this); } }; struct AstWhile: public Ast { std::shared_ptr<Ast> cond; // Function arguments std::vector<std::shared_ptr<Ast>> block; bool is_while() const override { return true; } AstWhile(){} void accept(IAstVisitor& v) override { v.visit(*this); } }; // Represents a for-loop, for instance: // for i=0 to 10 do print(i) end // for i=10 to 0 by -2 do print(i) end struct AstFor: public Ast { // For-loop variable std::string var; // For-loop lower limit std::shared_ptr<Ast> lower = nullptr; // For-loop upper limit std::shared_ptr<Ast> upper = nullptr; // For-loop step std::shared_ptr<Ast> step = nullptr; // Code block executed each cycle // std::vector<std::shared_ptr<AST>> body{}; std::shared_ptr<AstProg> block = {}; bool is_for() const override { return true; } AstFor(){} void accept(IAstVisitor& v) override { v.visit(*this); } }; // Represents a program - zero or more statements struct AstProg: public Ast { // Function arguments std::vector<std::shared_ptr<Ast>> statements; AstProg(){} AstProg(std::vector<std::shared_ptr<Ast>> const& statements) : statements(statements) { } bool is_prog() const override { return true; } void accept(IAstVisitor& v) override { v.visit(*this); } }; // Node returned by parser when there is an error struct AstErr: public Ast { std::string message; AstErr(){} AstErr(std::string message): message(message){ } bool is_err() const override { return true; } std::string to_str() const override { return message ; }; void accept(IAstVisitor& v) override { v.visit(*this); } }; struct Val; struct ValErr; /// Type alias for interpreter function using InterpFunc = std::function<std::shared_ptr<Val> (std::vector<std::shared_ptr<Val>> const& args)>; enum class ValType{ INT, FLT, SYM, STR, NIL, BOOL, FUN, ERR }; // Represents all runtime values produced by the interpreter struct Val { virtual ~Val() = default; virtual ValType type() const = 0; // Checks whether runtime value is a number virtual bool is_num() const { return false; } // Checks whether runtime value is int virtual bool is_int() const { return false; } // Checks whether runtime value is floating point number virtual bool is_flt() const { return false; } // Checks whether runtime value is string virtual bool is_str() const { return false; } // Checks whether runtime value is bool virtual bool is_bool() const { return false; } // Checks whether runtime value is symbpl virtual bool is_sym() const { return false; } // Checks whether runtime value is nil virtual bool is_nil() const { return false; } // Checks whether runtime value is error virtual bool is_err() const { return false; } // Checks whether value is a function virtual bool is_fun() const { return false; } // Convert runtime value to integer virtual int to_int() const { return 0; } // Convert runtime value to floating point virtual double to_flt() const { return 0; } // Convert value to string virtual std::string to_str() const = 0 ; // Convert value boolean - anything not nil or false (bool) is true virtual bool to_bool() const = 0; // It evaluates to true if function value is native (implemented in C++) // The usage of this function makes no sense if the current value is // not a function. virtual bool is_native() const { return false; } virtual void print(std::ostream& os) = 0; virtual std::shared_ptr<Val> call(std::vector<std::shared_ptr<Val>> const& args) { return nullptr; } friend std::ostream& operator<<(std::ostream& os, Val& val) { val.print(os); return os; } }; struct Env { std::map<std::string, std::shared_ptr<Val>> store; std::shared_ptr<Env> outer = nullptr; Env(){} Env(std::shared_ptr<Env> outer): outer(outer) { } std::shared_ptr<Val> get(std::string const& name) { auto it = store.find(name); if(it != store.end()){ return it->second; } if( outer == nullptr ){ return nullptr; } auto res = outer->get(name); return res; } void set(std::string const& name, std::shared_ptr<Val> value) { store[name] = value; } void clear(){ store.clear(); } }; /// Reprents an integer number struct ValInt : public Val { int value = 0; ValInt(){} ValInt(int value): value(value){} ValType type() const override { return ValType::INT; } bool is_num() const override { return true; } bool is_int() const override { return true; } std::string to_str() const override { return std::to_string(value); } bool to_bool() const override { return true; } int to_int() const override { return value; } double to_flt() const override { return value; } void print(std::ostream& os) override { os << value; } }; // Represents a floating point number struct ValFlt : public Val { double value = 0.0; ValFlt(){} ValFlt(double value): value(value){} ValType type() const override { return ValType::FLT; } bool is_num() const override { return true; } bool is_flt() const override { return true; } std::string to_str() const override { return std::to_string(value); } bool to_bool() const override { return true; } int to_int() const override { return value; } double to_flt() const override { return value; } void print(std::ostream& os) override { os << value; } }; // Represents a string struct ValStr: public Val { std::string value = ""; ValStr(){} ValStr(std::string value): value(value){} ValType type() const override { return ValType::STR; } std::string to_str() const override { return value; } bool to_bool() const override { return true; } bool is_str() const override { return true; } void print(std::ostream& os) override { os << '"' << value << '"'; } }; // Represents a symbol struct ValSym: public Val { std::string value = ""; ValSym(){} ValSym(std::string value) { // Remove first character from string if(value[0] == ':'){ value.erase(0, 1); } this->value = value; } ValType type() const override { return ValType::SYM; } std::string to_str() const override { return value; } bool to_bool() const override { return true; } bool is_sym() const override { return true; } void print(std::ostream& os) override { os << ':' << value ; } }; // Represents a boolean struct ValBool: public Val { bool value = false; ValBool(){} ValBool(bool value): value(value){} ValType type() const override { return ValType::BOOL; } std::string to_str() const override { return value ? "true" : "false"; } bool to_bool() const override { return value; } bool is_bool() const override { return true; } void print(std::ostream& os) override { os << (value ? "true" : "false" ) ;} }; // Represents a nil struct ValNil: public Val { ValNil(){} ValType type() const override { return ValType::NIL; } std::string to_str() const override { return "nil"; } bool to_bool() const override { return false; } bool is_nil() const override { return true; } void print(std::ostream& os) override { os << "nil"; } }; // Represents a runtime error (interpreter error) struct ValErr: public Val { int code = 0; std::string message = ""; ValErr(){} ValErr(std::string message): message(message){ } ValErr(int code, std::string message): code(code), message(message){ } ValType type() const override { return ValType::ERR; } std::string to_str() const override { return message; } bool to_bool() const override { return false; } bool is_err() const override { return true; } void print(std::ostream& os) override { os << "ERROR - " << message ; } }; // Represents a function at runtime struct ValFun: public Val { // Function name or identifier std::string name = ""; // Function description metadata std::string desc = ""; // Function arguments names std::vector<std::string> args = {}; // If true, this flag indicates that the function is implemented in C++ bool _is_native = false; // Function body AstProg body; // Funcion implementation in C++ InterpFunc func = nullptr; // Function environment std::shared_ptr<Env> env = std::make_shared<Env>(); ValFun(){} ValType type() const override { return ValType::FLT; } // Convert value to string std::string to_str() const override { return "<Function " + name + "() >"; } ; bool is_fun() const override { return true; } bool is_native() const override { return func != nullptr; } // Convert value boolean - anything not nil or false (bool) is true bool to_bool() const override { return true; } void print(std::ostream& os) override { os << "<Function " << name << ">"; } virtual std::shared_ptr<Val> call(std::vector<std::shared_ptr<Val>> const& args) override { return this->func(args); } }; /// Print AST as Lisp-like S-expression struct PrintSexpVisitor: public IAstVisitor { void visit(AstNil& ast) { std::cout << "nil"; } void visit(AstInt& ast) { std::cout << ast.value; } void visit(AstFlt& ast) { std::cout << ast.value; } void visit(AstIden& ast) { std::cout << ast.value; } void visit(AstSym& ast) { std::cout << ":" << ast.value; } void visit(AstBool& ast) { std::cout << (ast.value ? "true" : "false"); } void visit(AstStr& ast) { std::cout << '"' << ast.value << '"'; }; void visit(AstBinop& ast) { auto op = oper_to_str(ast.op); std::cout << "(" << op; std::cout << " "; ast.lhs->accept(*this); std::cout << " "; ast.rhs->accept(*this); std::cout << " )"; } void visit(AstUnop& ast) { auto op = oper_to_str(ast.op); std::cout << "(" << op; std::cout << " "; ast.node->accept(*this); std::cout << ")"; } void visit(AstAsn& ast) { std::cout << "(SET " << ast.iden; std::cout << " "; ast.node->accept(*this); std::cout << ")"; } void visit(AstCall& ast) { std::cout << "(CALL "; ast.iden->accept(*this); for(auto const& arg: ast.args) { std::cout << " "; arg->accept(*this); } std::cout << ")"; } void visit(AstDef& ast) { std::cout << "(FN " << ast.name << " ( "; for(auto const& arg: ast.args) { std::cout << arg << " "; } std::cout << ") "; for(auto const& s: ast.body) { std::cout << " "; s->accept(*this); } std::cout << ")"; } void visit(AstIf& ast) { std::cout << "(IF "; ast.cond->accept(*this); ast.then_block->accept(*this); ast.else_block->accept(*this); std::cout << ")"; } void visit(AstWhile& ast) { std::cout << "(WHILE "; ast.cond->accept(*this); std::cout << " (DO "; for(auto& st: ast.block) { st->accept(*this) ; std::cout << " "; } std::cout << "))"; } void visit(AstFor& ast) { std::cout << "(FOR " << ast.var; std::cout << " "; ast.lower->accept(*this); std::cout << " "; ast.upper->accept(*this); std::cout << " "; if( ast.step ){ ast.step->accept(*this); std::cout << " "; } ast.block->accept(*this); std::cout << ")"; } void visit(AstProg& ast) { std::cout << "(DO "; for(auto const& stat: ast.statements) { std::cout << "\n "; stat->accept(*this); } std::cout << " \n)"; } void visit(AstErr& ast){ std::cout << "(ERROR " << ast.to_str() << " )"; } }; // Pretty print AST as infix expressions (no lisp-like S-Expressions) // This procedure is useful for printing ASTs representing math expressions in math format (infix). void pprint(const Ast& ast) { auto priority = [](Oper op) -> int { if(op == Oper::ADD || op == Oper::SUB){ return 3; } if(op == Oper::MUL || op == Oper::DIV){ return 4; } if(op == Oper::POW){ return 5; } throw std::runtime_error( " Not implemented for operator: " + oper_to_str(op)); }; if( ast.is_int() ) { std::cout << ast.to_int(); } else if( ast.is_flt() ) { std::cout << ast.to_flt(); } else if( ast.is_iden() ) { std::cout << ast.to_str(); } else if( ast.is_asn() ) { auto n = static_cast<const AstAsn&>(ast); std::cout << n.iden << " = "; pprint(*n.node); } else if( ast.is_call() ) { //std::fprintf(stderr, " [TRACE] Function call AST node \n"); auto n = static_cast<const AstCall&>(ast); pprint(*n.iden); std::cout << "("; size_t k = n.args.size(); if( k > 0){ pprint(*n.args[0]); } for(size_t i = 1; i < n.args.size(); i++) { std::cout << ", "; pprint(*n.args[i]); } std::cout << ")"; } else if( ast.is_unop() ) { auto n = static_cast<const AstUnop&>(ast); auto op = n.op; std::cout << oper_to_str(op); if( n.node->is_num() || n.node->is_iden() || n.node->is_nil() || n.node->is_bool() ) { pprint(*n.node); } if( n.node->is_binop() ) { std::cout << "("; pprint(*n.node); std::cout << ")"; } } else if( ast.is_binop() ) { auto n = static_cast<const AstBinop&>(ast); auto op = n.op; if( n.lhs->is_num() || n.lhs->is_iden() || n.lhs->is_unop() || n.lhs->is_call() ) { pprint(*n.lhs); } if( n.lhs->is_binop()) { auto lhs = std::static_pointer_cast<AstBinop>(n.lhs); auto opn = lhs->op; //if( (op == Oper::MUL || op == Oper::DIV) && (opn == Oper::ADD || opn == Oper::SUB) ) if( priority(op) > priority(opn) ) { std::cout << "( "; pprint(*lhs->lhs); std::cout << " "; std::cout << oper_to_str(opn); std::cout << " "; pprint(*lhs->rhs); std::cout << " )"; } else { pprint(*lhs); } } std::cout << " " << oper_to_str(op) << " "; if( n.rhs->is_num() || n.rhs->is_iden() || n.rhs->is_unop() || n.rhs->is_call() ) { pprint(*n.rhs); } if( n.rhs->is_binop()) { auto rhs = std::static_pointer_cast<AstBinop>(n.rhs); auto opn = rhs->op; //if( (op == Oper::MUL || op == Oper::DIV) && (opn == Oper::ADD || opn == Oper::SUB) ) if( priority(op) > priority(opn) ) { std::cout << "( "; pprint(*rhs->lhs); std::cout << " "; std::cout << oper_to_str(opn); std::cout << " "; pprint(*rhs->rhs); std::cout << " )"; } else { pprint(*rhs); } } } // std::cerr << " [ERROR] Not implemented for this AST node \n"; } // ---- End of pprint() function -----// class Parser { Token _token; std::vector<Token> _tokens; int _idx; public: Parser() { _tokens = {}; _idx = 0; } void read(std::istream& is) { Tokenizer tok(is); _tokens = tok.tokens(); _idx = 0; this->next(); } // Parse a program std::shared_ptr<Ast> parse(std::string const& code) { std::stringstream ss(code); read(ss); return _parse_prog(); } std::shared_ptr<Ast> parse(std::istream& is) { read(is); return _parse_prog(); } // Parse a single expression or statement std::shared_ptr<Ast> parse_expr(std::string const& code) { std::stringstream ss(code); this->read(ss); return _parse_stat(); } Token next() { // _token = _tokenizer.next_token(); // std::cerr << " [TRACE] token = " << _token << std::endl; if( _idx == _tokens.size() ) { _token = Token(Type::EOF, "", 0, 0, 0, 0); } if( _idx < _tokens.size() ){ _token = _tokens[_idx]; _idx = _idx + 1; } return _token; } Token peek() const { return _token; } bool is_eof(){ return _token.type == Type::EOF; } bool check(Type type) { return _token.type == type; } bool match(Type type) { if( _token.type == type ){ this->next(); return true; } return false; } void expect(Type type, std::string const& message) { if( _token.type == type ){ this->next(); return; } throw std::runtime_error(message); } std::shared_ptr<AstErr> expect_token(Type type) { if( _token.type == type){ this->next(); return nullptr; } std::stringstream ss; ss << "Expected token of type "; if (type == Type::ASN ){ ss << "'=' (assignemnt)"; } else if(type == Type::RPAR){ ss << "')' right/closing parenthesis"; } else { ss << (int) type; } ss << " but got token " << _token; return std::make_shared<AstErr>(ss.str()); } private: // Replace string std::string replace( const std::string& text, const std::string& rep, const std::string& subst ) { std::string out = text; // Find position of character matching the string size_t i = out.find(rep); while(i != std::string::npos){ out.replace(i, rep.size(), subst); i = out.find(rep, i); } return out; } /// Parse atom std::shared_ptr<Ast> _parse_atom() { auto tok = peek(); // Parse integer number if( match(Type::INT) ){ return std::make_shared<AstInt>( std::stoi( replace(tok.text, "_", "") ), tok.begin, tok.end ); } // Parse floating point number if( match(Type::FLT) ){ return std::make_shared<AstFlt>( std::stod( replace(tok.text, "_", "") ), tok.begin, tok.end ); } // Parse floating point number (in percent format) if( match(Type::NPC) ){ return std::make_shared<AstFlt>( std::stod( replace(tok.text, "_", "") ) / 100.0, tok.begin, tok.end ); } if( match(Type::NIL) ){ return std::make_shared<AstNil>( tok.begin, tok.end); } if( match(Type::STR) ){ return std::make_shared<AstStr>( tok.text, tok.begin, tok.end ); } if( match(Type::IDEN) ){ return std::make_shared<AstIden>( tok.text, tok.begin, tok.end ); } if( match(Type::SYM) ){ return std::make_shared<AstSym>( tok.text, tok.begin, tok.end ); } if( match(Type::BOOL) ){ return std::make_shared<AstBool>( tok.text == "true" ? true : false, tok.begin, tok.end ); } if( _token.type == Type::FN ){ return _parse_def(); } if( _token.type == Type::IF ){ return _parse_if(); } int begin = _token.begin; if( match(Type::LPAR) ) { auto expr = this->_parse_expr(); //std::cout << " [TRACE] token = " << _token << std::endl; int end = _token.end; auto err = expect_token(Type::RPAR); if(err != nullptr ){ return err; } // expect(Type::RPAR, "Expected right/closing parenthesis"); expr->begin = begin; expr->end = end; return expr; } // throw std::runtime_error("Error: - atom() function - not implemented "); std::stringstream ss; ss << "ERROR - edge case found - atom() cannot be parserd => token = " << tok; return std::make_shared<AstErr>(ss.str()); } // Parse function call, example atan2(y, x) // call: atom // | IDEN "(" ( expr ( "," expr )* )? ")" std::shared_ptr<Ast> _parse_call() { auto atom = this->_parse_atom(); if( atom->is_err() ){ return atom; } if( !match(Type::LPAR) ) { return atom; } int begin = atom->begin; // Function arguments auto args = std::vector<std::shared_ptr<Ast>>{}; // Function with zero arguments if( match(Type::RPAR) ) { auto _call = std::make_shared<AstCall>(atom, args); _call->begin = begin; _call->end = _token.end; return _call; } auto arg = _parse_expr(); args.push_back(arg); while( _token.type == Type::COM ) { next(); arg = _parse_expr(); args.push_back(arg); } int end = _token.end; auto err = expect_token(Type::RPAR); if( err ){ return err; } auto _call = std::make_shared<AstCall>(atom, args); _call->begin = begin; _call->end = end; return _call; } // Parse power operation such as 2^3 std::shared_ptr<Ast> _parse_power() { auto lhs = _parse_call(); if( lhs->is_err() ){ return lhs; } int begin = lhs->begin; int end = lhs->end; while( check(Type::POW) ) { auto op = type_to_oper(_token.type); next(); auto rhs = _parse_unary(); if( rhs->is_err() ){ return rhs; } lhs = std::make_shared<AstBinop>(op, lhs, rhs); end = lhs->end; } lhs->begin = begin; lhs->end = end; return lhs; } // Parse unary expressions such as: a, -a or +a std::shared_ptr<Ast> _parse_unary() { auto tok = _token; // std::cerr << " [TRACE] unary => token = " << tok << std::endl; if( tok.type == Type::SUB || tok.type == Type::ADD || tok.type == Type::NOT) { int begin = tok.begin; // std::fprintf(stdout, " [TRACE] unary =>> begin = %d ; end = %d \n" // , tok.begin, tok.end); auto oper = type_to_oper(tok.type); next(); auto factor = _parse_unary(); if( factor->is_err() ){ return factor; } int end = factor->end; auto ast = std::make_shared<AstUnop>(oper, factor); ast->begin = begin; ast->end = end; return ast; } return _parse_power(); } // Parse multiplication and division expressions // such as 10 * 20 or x / y std::shared_ptr<Ast> _parse_factor() { auto lhs = _parse_unary(); if( lhs->is_err() ){ return lhs; } while( _token.type == Type::MUL || _token.type == Type::DIV ) { auto oper = type_to_oper(_token.type); next(); auto rhs = _parse_unary(); if( rhs->is_err() ){ return rhs; } int begin = lhs->begin; lhs = std::make_shared<AstBinop>(oper, lhs, rhs); lhs->begin = begin; lhs->end = rhs->end; } // lhs->begin = begin; // lhs->end = end; return lhs; } // Parse arithmetic expressions, such as sum and subtraction expressions such as // a + 10 ; a - b std::shared_ptr<Ast> _parse_arithmetic() { auto lhs = _parse_factor(); if( lhs->is_err() ){ return lhs; } while( _token.type == Type::ADD || _token.type == Type::SUB ) { auto oper = type_to_oper(_token.type); next(); auto rhs = _parse_factor(); if( rhs->is_err() ){ return rhs; } int begin = lhs->begin; lhs = std::make_shared<AstBinop>(oper, lhs, rhs); lhs->begin = begin; lhs->end = rhs->end; } return lhs; } // Parse comparison expressions // x >= 10 ; x + 20 < 200 std::shared_ptr<Ast> _parse_comparison() { auto lhs = _parse_arithmetic(); if( lhs->is_err() ){ return lhs; } while( _token.type == Type::LT || _token.type == Type::LTE || _token.type == Type::GT || _token.type == Type::GTE ) { auto op = type_to_oper(_token.type); next(); auto rhs = _parse_arithmetic(); if( rhs->is_err() ){ return rhs; } int begin = lhs->begin; lhs = std::make_shared<AstBinop>(op, lhs, rhs); lhs->begin = begin; lhs->end = rhs->end; } return lhs; } // Parse equality and inequality expressions // a + 200 == 400 ; a > 10 == a > 20 std::shared_ptr<Ast> _parse_equality() { // return term1(); auto lhs = _parse_comparison(); if( lhs->is_err() ){ return lhs; } while( _token.type == Type::EQ || _token.type == Type::NEQ ) { auto op = type_to_oper(_token.type); next(); auto rhs = _parse_comparison(); if( rhs->is_err() ){ return rhs; } int begin = lhs->begin; lhs = std::make_shared<AstBinop>(op, lhs, rhs); lhs->begin = begin; lhs->end = rhs->end; } return lhs; } // Parse logical expressions such as '10 > x and x > sqrt(y) or false' std::shared_ptr<Ast> _parse_expr() { auto lhs = _parse_equality(); if( lhs->is_err() ){ return lhs; } while( _token.type == Type::AND || _token.type == Type::OR ) { auto op = type_to_oper(_token.type); next(); auto rhs = _parse_equality(); if( rhs->is_err() ){ return rhs; } int begin = lhs->begin; lhs = std::make_shared<AstBinop>(op, lhs, rhs); lhs->begin = begin; lhs->end = rhs->end; } return lhs; } // Parse while-do-end statement // For instance, this function parses this syntax: // // while i > 0 do // print(i); // i = i - 1; // end std::shared_ptr<Ast> _parse_while() { auto begin = _token.begin; if( !match(Type::WHILE) ) { return std::make_shared<AstErr>("Invalid while statement. Expected keyword 'while'"); } auto cond = _parse_expr(); if( cond->is_err() ){ return cond; } auto err = expect_token(Type::DO); if( err ){ return err; } auto block = std::vector<std::shared_ptr<Ast>>{}; while ( _token.type != Type::END ) { auto stat = this->_parse_stat(); if( stat->is_err() ){ return stat; } block.push_back(stat); } auto end = _token.end; err = expect_token(Type::END); if( err ){ return err; } auto ast = std::make_shared<AstWhile>(); ast->begin = begin; ast->end = end; ast->cond = cond; ast->block = block; return ast; } // Parse for-statement // Syntax example: for i = 0 to 10 by 2 do print(i) end // Syntax example: for i = 0 to 10 do print(i) end // Syntax: for <VAR> = <LOWER> to <UPPER> do <BLOCK>... end // Syntax for <VAR> = <LOWER> to <UPPER> by <STEP> do <BLOCK>... end std::shared_ptr<Ast> _parse_for() { auto begin = _token.begin; if( !match(Type::FOR) ) { return std::make_shared<AstErr>("Invalid FOR statement. Expected keyword 'for'"); } auto atom = _parse_atom(); if(atom->is_err()){ return atom; } if( !atom->is_iden() ) { return std::make_shared<AstErr>("Invalid FOR statement. Expected identifier"); } // For-loop variable auto var = atom->to_str(); // Equal '=' sign if( !match(Type::ASN) ) { return std::make_shared<AstErr>("Invalid FOR statement. Expected '=' sign."); } auto lower = _parse_expr(); if(lower->is_err()){ return lower; } if( !match(Type::TO) ) { return std::make_shared<AstErr>("Invalid FOR statement. Expected 'to' keyword."); } auto upper = _parse_expr(); if(upper->is_err()){ return upper; } auto step = std::shared_ptr<Ast>{nullptr}; if( match(Type::BY) ){ step = _parse_expr(); if( step->is_err() ){ return step; } } if( !match(Type::DO) ) { return std::make_shared<AstErr>("Invalid FOR statement. Expected 'do' keyword."); } auto block = std::vector<std::shared_ptr<Ast>>{}; while ( _token.type != Type::END ) { auto stat = this->_parse_stat(); if( stat->is_err() ){ return stat; } block.push_back(stat); } auto end = _token.end; if( !match(Type::END) ) { return std::make_shared<AstErr>("Invalid FOR statement. Expected 'end' keyword."); } auto ast = std::make_shared<AstFor>(); ast->begin = begin; ast->end = end; ast->var = var; ast->lower = lower; ast->upper = upper; ast->step = step; ast->block = std::make_shared<AstProg>(block); return ast; } std::shared_ptr<Ast> _parse_block() { auto block = std::vector<std::shared_ptr<Ast>>{}; auto begin = _token.begin; while ( _token.type != Type::END && _token.type != Type::ELSE && _token.type != Type::END ) { // parse statement auto stat = this->_parse_stat(); // Abort computation if there is any error if( stat->is_err() ){ return stat; } block.push_back(stat); } auto end = _token.end; //auto err = expect_token(Type::END); //if( err ){ return err; } auto ast = std::make_shared<AstProg>(block); ast->begin = begin; ast->end = end; return ast; } /** * @brief Parse if-else satements. * * Example => Parses statements such as: * * if x > 10 then * print("Greater than 10") * end * * z = if x < 0 then * print("negative") * :pos * else * print("Positive or zero") * :neg_or_zero * end * */ std::shared_ptr<Ast> _parse_if() { // std::fprintf(stderr, " [TRACE] Enter _parse_if() \n"); auto begin = _token.begin; if( !match(Type::IF) ){ return std::make_shared<AstErr>("Invalid if-else statement. Expected IF keyword."); } auto cond = _parse_expr(); if( cond->is_err() ){ return cond; } if( !match(Type::THEN) ){ return std::make_shared<AstErr>("Invalid if-else statement. Expected THEN keyword."); } auto then_block = _parse_block(); if( then_block->is_err() ){ return then_block; } auto ast = std::make_shared<AstIf>(); ast->begin = begin; ast->then_block = std::static_pointer_cast<AstProg>(then_block); ast->cond = cond; ast->end = _token.end; ast->else_block = std::make_shared<AstProg>(); if( match(Type::END) ) { return ast; } if( !match(Type::ELSE) ) { return std::make_shared<AstErr>("Invalid if-else statement. Expected ELSE keyword."); } auto else_block = _parse_block(); if( else_block->is_err() ){ return else_block; } if( !match(Type::END) ) { return std::make_shared<AstErr>("Invalid if-else statement. Expected END keyword."); } ast->else_block = std::static_pointer_cast<AstProg>(else_block); ast->end = _token.end; return ast; } // Parse function defintion // Example: // fn myfun (x, y) -> 2 * y + y // myfun = fn(x, y) -> 2 * y + y // // fn myfun(x, y) // z = x / 10; // a = z - x; // a + z // Last value of block is the return value // end // // fn <identifier> "(" arg0, arg1, ..., argN-1 ")" std::shared_ptr<Ast> _parse_def() { auto tok = _token; if( !match(Type::FN) ) { return std::make_shared<AstErr>("Invalid function defintion. Expected keyword 'fn'"); } std::string name = ""; if( _token.type == Type::IDEN ) { name = _token.text; next(); // return std::make_shared<ASTErr>("Invalid function defintion. Expected identifier."); } if( !match(Type::LPAR) ){ return std::make_shared<AstErr>("Invalid function defintion. Expected left parenthesis '('."); } // Function with empty argument, example: fn myfun() -> 10 + random() if( match(Type::RPAR) ){ auto out = std::make_shared<AstDef>(); out->name = name; out->args = {}; if( match(Type::ARROW) ) { auto expr = _parse_expr(); if( expr->is_err()){ return expr; } out->body = { expr }; out->begin = tok.begin; out->end = expr->end; return out; } auto block = _parse_block(); if(block->is_err()){ return block; } assert( block->is_prog() ); if( !match(Type::END) ){ return std::make_shared<AstErr>("Invalid function definition. Expected end keyword."); } out->body = std::static_pointer_cast<AstProg>(block)->statements; out->end = block->end; return out; } auto arg0 = _parse_atom(); if( !arg0->is_iden() ){ return std::make_shared<AstErr>("Invalid function defintion. Expected identifier as argument."); } auto args = std::vector<std::string>{ arg0->to_str() }; while( _token.type == Type::COM){ next(); auto arg = _parse_atom(); if( !arg0->is_iden() ){ return std::make_shared<AstErr>("Invalid function defintion. Expected identifier as argument."); } args.push_back( arg->to_str() ); } if( !match(Type::RPAR) ){ return std::make_shared<AstErr>("Invalid function defintion. Expected right parenthesis."); } auto out = std::make_shared<AstDef>(); out->name = name; out->args = args; out->begin = tok.begin; if( match(Type::ARROW) ){ auto expr = _parse_expr(); if( expr->is_err()){ return expr; } out->body = { expr }; out->end = expr->end; return out; } // Parse multiple statements until an 'end' keyword is found auto block = _parse_block(); if(block->is_err()){ return block; } if( !match(Type::END) ){ return std::make_shared<AstErr>("Invalid function definition. Expected end keyword."); } assert( block->is_prog() ); out->body = std::static_pointer_cast<AstProg>(block)->statements; out->end = block->end; return out; } // Parse statement // stat: expr | IDEN '=' expr std::shared_ptr<Ast> _parse_stat() { if( _token.type == Type::WHILE ){ return _parse_while(); } if( _token.type == Type::FOR ){ return _parse_for(); } // std::fprintf(stderr, " [TRACE] Enter stats() function \n"); int saved_idx = _idx; auto saved_token = _token; // int begin = atom->begin; int begin = _token.begin; auto atom = this->_parse_atom(); if( atom->is_err() || !atom->is_iden() || _token.type != Type::ASN ) { // Backtrack parser - restoring its state _idx = saved_idx; _token = saved_token; // attempt to parse an expression auto expr = this->_parse_expr(); // expect(Type::SEM, "Expected semicolon"); // optional semicolon for avoid ambiguity match(Type::SEM); return expr; } expect(Type::ASN, "Expected assignment"); auto expr = this->_parse_expr(); if( expr->is_err() ){ return expr; } int end = _token.end; //expect(Type::SEM, "Expected semicolon"); // optional semicolon for avoid ambiguity match(Type::SEM); auto ast = std::make_shared<AstAsn>(atom->to_str(), expr); ast->begin = begin; ast->end = end; return ast; } std::shared_ptr<Ast> _parse_prog() { auto statements = std::vector<std::shared_ptr<Ast>>{}; int begin = _token.begin; int end = 0; while ( _token.type != Type::EOF ) { auto stat = this->_parse_stat(); if( stat->is_err() ){ return stat; } end = stat->end; statements.push_back(stat); } auto ast = std::make_shared<AstProg>(statements); ast->begin = begin; ast->end = end; return ast; } }; // ----- End of class Parser ----------- // std::string substring(std::string const& str, int begin, int end) { std::string out; for(int k = begin; k < end; k++){ out = out + str[k]; } return out; } // Interpreter - turns the AST into a runtime value class Interp { std::shared_ptr<Env> _env; bool verbose = false; bool in_function = false; std::string code = ""; public: Interp() { _env = std::make_shared<Env>(); reset(); } void set_verbose(bool flag) { verbose = flag; } // Interpreter's RPL - Read-Print-Eval Loop (interactive shell) void repl() { Parser parser; PrintSexpVisitor printer; this->set_verbose(false); bool show_ast = false; std::string line, command, arg; auto ast = std::shared_ptr<Ast>{nullptr}; while( std::cin.good() ) { std::cout << "\n $> "; std::getline(std::cin, line); if(line == ""){ continue; } auto ss = std::stringstream(line); ss >> command >> arg; if(command == ":show_ast"){ show_ast = true; continue; } if(command == ":hide_ast"){ show_ast = false; continue; } if(command == ":quit" || line == ":q"){ break; } if(command == ":reset"){ this->reset(); continue; } // Allows entering a multi-line expression if(command == ":block") { std::cerr << " [INFO] Enter multi-line expression mode. Type (;;) when you are done. \n"; auto text = std::string{}; while( std::cin.good() ) { std::getline(std::cin, line); if(line == ";;"){ break; } text = text + line; } line = text; } if( command == ":load") { std::ifstream ifs(arg); if( !ifs.good() ){ std::fprintf(stderr, " [ERROR] Unable to open file '%s' \n", arg.c_str()); continue; } ast = parser.parse(ifs); } else { ast = parser.parse(line); } assert( ast != nullptr ); if(show_ast){ // Print AST in SEXP (S-Expression) Lisp-like format std::cout << " ast = "; ast->accept(printer); std::cout << '\n'; // Print AST in math-like infix notation /// std::cout << " pprint(ast) = "; pprint(*ast); std::cout << '\n'; } auto result = this->eval(*ast); // Set answer variable this->add_var("ans", result); std::cout << " = " << *result << '\n'; } } void add_var(std::string name, double value) { auto val = std::make_shared<ValFlt>(value); _env->set(name, val); } void add_var(std::string name, std::string value) { auto val = std::make_shared<ValStr>(value); _env->set(name, val); } void add_var(std::string name, std::shared_ptr<Val> value) { _env->set(name, value); } // Add C++ native function to interpreter void add_fun(std::string name, std::function<double (double)> fun) { auto fn = std::make_shared<ValFun>(); fn->name = name; fn->args = { "x" }; fn->func = [=](std::vector<std::shared_ptr<Val>> const& args) -> std::shared_ptr<Val> { if( args.size() != 1){ return std::make_shared<ValErr>("Function expects 1 argument"); } if( !args[0]->is_num() ){ return std::make_shared<ValErr>("Type mismatch - function expects number argument"); } auto x = args[0]->to_flt(); auto y = fun(x); return std::make_shared<ValFlt>(y); }; _env->set(name, fn); } void add_fun(std::string name, InterpFunc func) { auto fn = std::make_shared<ValFun>(); fn->name = name; fn->func = func; _env->set(name, fn); } // Reset interpreter environment (state) void reset() { _env->clear(); add_fun("inv", [](double x){ return 1.0 / x; } ); add_fun("abs", static_cast<double (*) (double)>(&std::abs) ); add_fun("cos", static_cast<double (*) (double)>(&std::cos) ); add_fun("sin", static_cast<double (*) (double)>(&std::sin) ); add_fun("tan", static_cast<double (*) (double)>(&std::tan) ); add_fun("exp", static_cast<double (*) (double)>(&std::exp) ); add_fun("sqrt", static_cast<double (*) (double)>(&std::sqrt) ); add_fun("log", static_cast<double (*) (double)>(&std::log) ); add_fun("log10", static_cast<double (*) (double)>(&std::log10) ); add_fun("log2", static_cast<double (*) (double)>(&std::log2) ); add_var("PI", 3.1415); // version constant add_var("version", "0.1"); // Load script file add_fun("load", [self = this](std::vector<std::shared_ptr<Val>> const& args) -> std::shared_ptr<Val> { if( args.size() != 1 ){ return std::make_shared<ValErr>("Expects 1 argument of type string."); } auto file = args[0]->to_str(); std::ifstream ifs(file); if( !ifs.good() ){ std::string text = " [ERROR] Unable to open file " + file; return std::make_shared<ValErr>(); } Parser parser; auto ast = parser.parse(ifs); return self->eval(*ast); }); add_fun("print", [=](std::vector<std::shared_ptr<Val>> const& args) -> std::shared_ptr<Val> { for(auto const& a: args){ std::cout << a->to_str(); } std::cout << '\n'; return std::make_shared<ValNil>(); }); add_fun("type", [=](std::vector<std::shared_ptr<Val>> const& args) -> std::shared_ptr<Val> { if( args.size() != 1 ){ return std::make_shared<ValErr>("Function type() expects 1 argument."); } if( args[0]->is_bool()){ return std::make_shared<ValSym>("bool"); } if( args[0]->is_nil()){ return std::make_shared<ValSym>("nil"); } if( args[0]->is_fun()){ return std::make_shared<ValSym>("fun"); } if( args[0]->is_int()){ return std::make_shared<ValSym>("int"); } if( args[0]->is_flt()){ return std::make_shared<ValSym>("flt"); } if( args[0]->is_str()){ return std::make_shared<ValSym>("str"); } if( args[0]->is_sym()){ return std::make_shared<ValSym>("sym"); } return std::make_shared<ValErr>("Edge case - function type() edge case ."); }); add_fun("max", [=](std::vector<std::shared_ptr<Val>> const& args) -> std::shared_ptr<Val> { if( args.size() < 1 ) { return std::make_shared<ValErr>("Function max() expects at least 1 argument."); } if( !args[0]->is_num() ) { return std::make_shared<ValErr>("Function max() expects number as argument."); } auto max = args[0]->to_flt(); float x = max; for(auto const& a: args){ if( !a->is_num() ) { return std::make_shared<ValErr>("Function max() expects number as argument."); } x = a->to_flt(); if(max < x){ max = x;} } return std::make_shared<ValFlt>(max); }); } std::shared_ptr<Val> eval(std::string const &code) { Parser parser; auto ast = parser.parse(code); this->code = code; return eval(*ast, _env); } std::shared_ptr<Val> eval(const Ast& ast) { return eval(ast, _env); } std::shared_ptr<Val> eval(const Ast& ast, std::shared_ptr<Env> env) { // Parser error if (ast.is_err()) { return std::make_shared<ValErr>("Parser error - " + ast.to_str()); } if (ast.is_nil()) { return std::make_shared<ValNil>(); } if (ast.is_int()) { return std::make_shared<ValInt>(ast.to_int()); } if (ast.is_flt()) { return std::make_shared<ValFlt>(ast.to_flt()); } if (ast.is_bool()) { //auto it = std::static_pointer_cast<ASTBool>(ast); auto it = static_cast<AstBool const&>(ast); return std::make_shared<ValBool>(it.value); } if (ast.is_str()) { return std::make_shared<ValStr>(ast.to_str()); } if (ast.is_sym()) { return std::make_shared<ValSym>(ast.to_str()); } if (ast.is_binop()) { // auto it = std::static_pointer_cast<ASTBinop>(ast); auto it = static_cast<AstBinop const&>(ast); return eval_binop(it, env); } if (ast.is_unop()) { // auto it = std::static_pointer_cast<ASTUnop>(ast); // auto it = static_cast<ASTUnop const&>(ast); auto it = static_cast<AstUnop const&>(ast); return eval_unop(it, env); } if (ast.is_asn()) { //auto it = std::static_pointer_cast<ASTAsn>(ast); auto it = static_cast<AstAsn const&>(ast); return eval_asn(it, env); } if (ast.is_iden()) { auto name = ast.to_str(); auto val = env->get(name); if (val == nullptr) { return std::make_shared<ValErr>("Unbound variable " + name); } return val; } if (ast.is_prog()) { //auto it = std::static_pointer_cast<ASTProg>(ast); auto it = static_cast<AstProg const&>(ast); return eval_prog(it, env); } if (ast.is_call()) { //auto it = std::static_pointer_cast<ASTCall>(ast); auto it = static_cast<AstCall const&>(ast); return eval_call(it, env); } if(ast.is_def()) { //auto it = std::static_pointer_cast<ASTDef>(ast); auto it = static_cast<AstDef const&>(ast); return eval_def(it, env); } if(ast.is_if()) { auto it = static_cast<AstIf const&>(ast); return eval_if(it, env); } if(ast.is_while()) { //auto it = std::static_pointer_cast<ASTWhile>(ast); auto it = static_cast<AstWhile const&>(ast); return eval_while(it, env); } if(ast.is_for()) { //auto it = std::static_pointer_cast<ASTFor>(ast); auto it = static_cast<AstFor const&>(ast); return eval_for(it, env); } return std::make_shared<ValErr>("Interpreter not implemented for this AST type"); } // --- End of eval() ---- // private: // Equality operator evaluation std::shared_ptr<Val> equal(Val const& lhs, Val const& rhs ) const { if( lhs.is_num() && rhs.is_num() ) { return std::make_shared<ValBool>(lhs.to_flt() == rhs.to_flt() ); } if( lhs.type() != rhs.type() ) { return std::make_shared<ValBool>( false); } if( lhs.type() == ValType::STR && rhs.type() == ValType::STR ) { return std::make_shared<ValBool>(lhs.to_str() == rhs.to_str() ); } if( lhs.type() == ValType::SYM && rhs.type() == ValType::SYM ) { return std::make_shared<ValBool>(lhs.to_str() == rhs.to_str() ); } if( lhs.type() == ValType::BOOL && rhs.type() == ValType::BOOL ) { return std::make_shared<ValBool>(lhs.to_bool() == rhs.to_bool() ); } if( lhs.type() == ValType::NIL && rhs.type() == ValType::NIL ) { return std::make_shared<ValBool>(true); } return std::make_shared<ValErr>("Type mismatch. Invalid use case of operator (==)."); } // Eval assignment operation std::shared_ptr<Val> eval_asn(AstAsn const &ast, std::shared_ptr<Env> env) { auto value = eval(*ast.node, env); if (value->is_err()) { return value; } env->set(ast.iden, value); return value; } // Eval unary operation std::shared_ptr<Val> eval_unop(AstUnop const &ast, std::shared_ptr<Env> env) { auto node = eval(*ast.node, env); if (node->is_err()) { return node; } switch (ast.op) { case Oper::ADD: return node; break; case Oper::SUB: if (!node->is_num()) { return std::make_shared<ValErr>("Negative operation is only valid for numbers"); } if (node->is_int()) { return std::make_shared<ValInt>(-node->to_int()); } if (node->is_flt()) { return std::make_shared<ValFlt>(-node->to_flt()); } default: return std::make_shared<ValErr>("Operation not defined for this operator: " + oper_to_str(ast.op)); } } // Eval binary operation std::shared_ptr<Val> eval_binop(AstBinop const &ast, std::shared_ptr<Env> &env) { auto lhs = eval(*ast.lhs, env); auto rhs = eval(*ast.rhs, env); if (lhs->is_err()) { return lhs; } if (rhs->is_err()) { return rhs; } switch (ast.op) { case Oper::ADD: if (lhs->is_flt() || rhs->is_flt()) { return std::make_shared<ValFlt>(lhs->to_flt() + rhs->to_flt()); } if (lhs->is_int() || rhs->is_int()) { return std::make_shared<ValInt>(lhs->to_int() + rhs->to_int()); } else if (lhs->is_str() || rhs->is_str()) { return std::make_shared<ValStr>(lhs->to_str() + rhs->to_str()); } else { return std::make_shared<ValErr>("Type mismatch (+) operator only valid for numbers or strings"); } break; case Oper::SUB: if (!lhs->is_num() || !rhs->is_num()) { return std::make_shared<ValErr>("Addition operation is only valid for numbers"); } else if (lhs->is_flt() || rhs->is_flt()) { return std::make_shared<ValFlt>(lhs->to_flt() - rhs->to_flt()); } else { return std::make_shared<ValInt>(lhs->to_int() - rhs->to_int()); } break; case Oper::MUL: if (!lhs->is_num() || !rhs->is_num()) { return std::make_shared<ValErr>("Multiplication operation is only valid for numbers"); } else if (lhs->is_flt() || rhs->is_flt()) { return std::make_shared<ValFlt>(lhs->to_flt() * rhs->to_flt()); } else { return std::make_shared<ValInt>(lhs->to_int() * rhs->to_int()); } break; case Oper::DIV: if (!lhs->is_num() || !rhs->is_num()) { return std::make_shared<ValErr>("Division operation not valid for non numbers"); } return std::make_shared<ValFlt>(lhs->to_flt() / rhs->to_flt()); break; // POwer operation a^b case Oper::POW: if (!lhs->is_num() || !rhs->is_num()) { return std::make_shared<ValErr>("Power operation not valid for numbers"); } return std::make_shared<ValFlt>(std::pow(lhs->to_flt(), rhs->to_flt())); break; // Less than => lhs < rhs case Oper::LT: if (!lhs->is_num() || !rhs->is_num()) { return std::make_shared<ValErr>("Type mismatch. Less than (<) operator is only valid for numbers."); } return std::make_shared<ValBool>(lhs->to_flt() < rhs->to_flt()); break; // Greater than => lhs > rhs case Oper::GT: if (!lhs->is_num() || !rhs->is_num()) { return std::make_shared<ValErr>("Type mismatch. Greater than (>) operator is only valid for numbers."); } return std::make_shared<ValBool>(lhs->to_flt() > rhs->to_flt()); break; // Less or equal than => lhs <= rhs case Oper::LTE: { if (!lhs->is_num() || !rhs->is_num()) { return std::make_shared<ValErr>("Type mismatch. Operator (<=) is only valid for numbers."); } auto res = lhs->to_flt() <= rhs->to_flt(); return std::make_shared<ValBool>(res); break; } // Greater or equal than or equal => lhs >= rhs case Oper::GTE: { if (!lhs->is_num() || !rhs->is_num()) { return std::make_shared<ValErr>("Type mismatch. Operator (>=) is only valid for numbers."); } auto res = lhs->to_flt() >= rhs->to_flt(); return std::make_shared<ValBool>(res); break; } // Equality operator: lhs == rhs case Oper::EQ: return this->equal(*lhs, *rhs); break; // Inequality operator: lhs != rhs case Oper::NEQ: { auto res = this->equal(*lhs, *rhs); if( res->is_err() ){ return res; } auto b = res->to_bool(); return std::make_shared<ValBool>(!b); break; } case Oper::OR: { if( !lhs->is_bool() || !rhs->is_bool() ) { return std::make_shared<ValErr>("Or logical operator only valid for booleans."); } return std::make_shared<ValBool>(lhs->to_bool() || rhs->to_bool() ); break; } case Oper::AND: if( !lhs->is_bool() || !rhs->is_bool() ) { return std::make_shared<ValErr>("And logical operator only valid for booleans."); } return std::make_shared<ValBool>(lhs->to_bool() && rhs->to_bool() ); break; default: return std::make_shared<ValErr>("Binary operation not defined for this operator: " + oper_to_str(ast.op)); break; } } // ---- End of eval_binop() ------ // std::shared_ptr<Val> eval_if(AstIf const &ast, std::shared_ptr<Env> env) { auto cond = eval(*ast.cond, env); if( cond->is_err() ){ return cond; } if( cond->to_bool() ){ return eval_prog(*ast.then_block, env); } return eval_prog(*ast.else_block, env); } std::shared_ptr<Val> eval_while(AstWhile const &ast, std::shared_ptr<Env> env) { auto cond = eval(*ast.cond, env); if(cond->is_err()){ return cond; } while( cond->to_bool() ) { for(auto& st: ast.block) { auto res = eval(*st, env); if(res->is_err()){ return res; } } cond = eval(*ast.cond, env); if(cond->is_err()){ return cond; } } return std::make_shared<ValNil>(); } std::shared_ptr<Val> eval_for(AstFor const &ast, std::shared_ptr<Env> env) { auto lower = eval(*ast.lower, env); if(lower->is_err()){ return lower; } auto upper = eval(*ast.upper, env); if(upper->is_err()){ return upper; } auto step = std::shared_ptr<Val>{nullptr}; if( ast.step != nullptr ){ step = eval(*ast.step, env); if(step->is_err()){ return step; } } if( !lower->is_num() ){ return std::make_shared<ValErr>("Expected number as for-loop lower limit"); } if( !upper->is_num() ){ return std::make_shared<ValErr>("Expected number as for-loop upper limit"); } if( step && !step->is_num() ){ return std::make_shared<ValErr>("Expected number as for-loop step"); } auto lo = lower->to_int(); auto up = upper->to_int(); auto sp = step != nullptr ? step->to_int() : 1; if( sp == 0 ) { return std::make_shared<ValErr>("Expected for-loop step to be non zero."); } // Create temporary environment auto tenv = std::make_shared<Env>(env); if( lo <= up){ for(int k = lo; k < up; k = k + sp ) { tenv->set(ast.var, std::make_shared<ValInt>(k)); eval(*ast.block, tenv); } } else { if( sp > 0 ){ sp = -sp; } for(int k = lo; k > up; k = k + sp ) { tenv->set(ast.var, std::make_shared<ValInt>(k)); eval(*ast.block, tenv); } } return std::make_shared<ValNil>(); } // Eval program std::shared_ptr<Val> eval_prog(AstProg const &ast, std::shared_ptr<Env> env) { std::shared_ptr<Val> res = std::make_shared<ValNil>(); for (auto &n : ast.statements) { res = eval(*n, env); if (res->is_err()) { return res; } if (verbose && !in_function) { auto s = substring(this->code, n->begin, n->end); std::cout << " [*] " << s << " => " << *res << std::endl; } } return res; } // Eval function defintion std::shared_ptr<Val> eval_def(AstDef const &ast, std::shared_ptr<Env> env) { auto val = std::make_shared<ValFun>(); // std::fprintf(stderr, " [TRACE] Fundef => name = %s \n", ast.name.c_str()); val->name = ast.name; val->args = ast.args; val->_is_native = false; val->body = ast.body; val->env->outer = env; // Only register non anonymous functions (non lambdas) if( val->name != "" ){ env->set(val->name, val); } return val; // return std::make_shared<ValErr>("Not implemented for ASTDef"); } // Eval function call std::shared_ptr<Val> eval_call(AstCall const &ast, std::shared_ptr<Env> env) { // auto it = env.get(ast.name); auto it = this->eval(*ast.iden, env); //if ( it == nullptr ) //{ return std::make_shared<ValErr>("Function " + ast.name + " not defined"); } if ( !it->is_fun() ) { return std::make_shared<ValErr>("Object is not a function."); } auto fun = std::static_pointer_cast<ValFun>(it); // Vector containing evaluated function arguments std::vector<std::shared_ptr<Val>> args; args.reserve(ast.args.size()); // Evalute arguments before function application for (auto &arg : ast.args) { // Evaluate function argument auto res = this->eval(*arg, env); // Abort computation if any error is found. if (res->is_err()) { return res; } args.push_back(res); } // Native function - implemented in C++ if( fun->func != nullptr ) { auto res = fun->func(args); return res; } // ---- Non native function ---------- // // Create temporary enviroment. if (ast.args.size() != fun->args.size()) { return std::make_shared<ValErr>("Invalid number of arguments passed to function " + fun->name); } auto tenv = std::make_shared<Env>(fun->env); for (size_t i = 0; i < fun->args.size(); i++) { tenv->set(fun->args[i], args[i]); } in_function = true; // Evaluate function body using the temporary environment. auto res = eval_prog(fun->body, tenv); in_function = false; return res; } }; // ---- End of class interp -------------- // int main() { #if 1 printf("\n ============= Interpreter =============== \n"); Interp interp; interp.set_verbose(true); auto parser = Parser(); auto ast = parser.parse_expr(" z = - ( -20 + 8 * 10 / 25 ) - 3 * random() + 100 / sqrt( 5 * 10 + 3 * (20 - 7 + 8) + 3 ^ (2 * 3) ) * 15"); PrintSexpVisitor printer; std::cout << " ast = "; ast->accept(printer); std::cout << "\n"; std::cout << " PPRINT => "; pprint(*ast); interp.repl(); #endif return 0; }
Building and running
Building:
# Clone repository $ git clone https://gist.github.com/b5c5dc8d940a5927159f9d48d7b43a0c interp # Enter project directory $ cd interp # List repository $ ls formula.cpp script.txt xmake.lua # Build $ xmake build -P . checking for platform ... linux checking for architecture ... x86_64 [ 25%]: ccache compiling.release formula.cpp [ 50%]: linking.release formula [100%]: build ok!
Running:
Start interpreter:
$ rlwrap xmake run ============= Interpreter =============== ast = (SET z (+ (- (- (+ (- 20) (/ (* 8 10 ) 25 ) )) (* 3 (CALL random) ) ) (* (/ 100 (CALL sqrt (+ (+ (* 5 10 ) (* 3 (+ (- 20 7 ) 8 ) ) ) (^ 3 (* 2 3 ) ) )) ) 15 ) )) PPRINT => z = -(-20 + 8 * 10 / 25) - 3 * random() + 100 / sqrt(5 * 10 + 3 * ( 20 - 7 + 8 ) + 3 ^ ( 2 * 3 )) * 15 $> $> " String literal " = " String literal " $> :symbol_literal = :symbol_literal $> true // Boolean literal = true $> false // Boolean literal = false $> sin // Function object = <Function sin> $> type(sin) // Get type of object (result is a symbol) = :fun $> type(sin) == :fun = true $> nil // nil literal = nil $> 201.5E3 = 201500 $> 201.5e3 // Floating point number = 201500 $> .215e3 = 215 $> 10% // 10 percent = 0.1 $> 90.25 // 90.25 percent = 90.25 $> 10 + 3 * 5 * 2 ^ 2 ^ 3 - 5 * exp(4.2) ast = (DO (- (+ 10 (* (* 3 5 ) (^ 2 (^ 2 3 ) ) ) ) (* 5 (CALL exp 4.2) ) ) ) = 3516.57 $> -20 + 100 * 4 * 2 ^ 2 ^ 3 ast = (DO (+ (- 20) (* (* 100 4 ) (^ 2 (^ 2 3 ) ) ) ) ) = 102380 $> log10(-20 + 100 * 4 * 2 ^ 2 ^ 3) ast = (DO (CALL log10 (+ (- 20) (* (* 100 4 ) (^ 2 (^ 2 3 ) ) ) )) ) = 5.01022 $> ans * 5 / 100 ast = (DO (/ (* ans 5 ) 100 ) ) = 0.250511 $> ans ast = (DO ans ) = 0.250511 $> x = log10(-20 + 100 * 4 * 2 ^ 2 ^ 3) = 5.01022 $> y = 10 * x - 4 * log(x) = 43.6562 $> z = 4 * x + y = 63.6971
Playing with for-loops:
$> :block [INFO] Enter multi-line expression mode. Type (;;) when you are done. for i = 1 to 100 by 10 do print(" i = ", i, " sqrt(i) = ", sqrt(i)) end ;; ast = (DO (FOR i 1 100 10 (DO (CALL print " i = " i " sqrt(i) = " (CALL sqrt i)) )) ) i = 1 sqrt(i) = 1.000000 i = 11 sqrt(i) = 3.316625 i = 21 sqrt(i) = 4.582576 i = 31 sqrt(i) = 5.567764 i = 41 sqrt(i) = 6.403124 i = 51 sqrt(i) = 7.141428 i = 61 sqrt(i) = 7.810250 i = 71 sqrt(i) = 8.426150 i = 81 sqrt(i) = 9.000000 i = 91 sqrt(i) = 9.539392 = nil
Playing with while-loops:
$> :block [INFO] Enter multi-line expression mode. Type (;;) when you are done. k = 20; while k > 0 do print("\t count down => k = ", k); k = k - 4; end ;; ast = (DO (SET k 20) (WHILE (> k 0 ) (DO (CALL print " count down => k = " k) (SET k (- k 4 )) )) ) count down => k = 20 count down => k = 16 count down => k = 12 count down => k = 8 count down => k = 4 = nil
Playing with functions:
:show_ast // Define a short function (single line function) $> get_yield = fn (rate, nper) -> 100 * ((1 + rate) ^ nper - 1) // rate (interest rate) ; nper - number of periods ast = (DO (SET get_yield (FN ( rate nper ) (* 100 (- (^ (+ 1 rate ) nper ) 1 ) ))) ) = <Function > $> :hide_ast $> get_yield(0.25%, 12) // 0.25 per month => gets the total return per year = 3.0416 $> get_yield(1%, 12) = 12.6825 $> get_yield(5%, 12) = 79.5856 $> rate = ERROR - Unbound variable rate $> nper = ERROR - Unbound variable nper // ----- define a long function --------// $> :block [INFO] Enter multi-line expression mode. Type (;;) when you are done. fn myfunc(x, y, z) k = 3 * x + 5 * log(z) / exp(y) a = k + x + y + z print(" x = ", x) print(" y = ", y) print(" k = ", k, " ; a = ", a) k + a + x + y end ;; ast = (DO (FN myfunc ( x y z ) (SET k (+ (* 3 x ) (/ (* 5 (CALL log z) ) (CALL exp y) ) )) (SET a (+ (+ (+ k x ) y ) z )) (CALL print " x = " x) (CALL print " y = " y) (CALL print " k = " k " ; a = " a) (+ (+ (+ k a ) x ) y )) ) = <Function myfunc> $> myfunc(2, 5, 6) x = 2 y = 5 k = 6.060364 ; a = 19.060364 = 32.1207 $> result = myfunc(2, 5, 6) x = 2 y = 5 k = 6.060364 ; a = 19.060364 = 32.1207 $> result = 32.1207
Load a script file (script.txt)
$> :load /home/gmhf8z4/formula-parser/interp/script.txt [TRACE] Starting user script [INFO] REPL Version = 0.1 ------ Test IF-ELSE ------------- => x negative or zero => x negative or zero [TRACE] z = 200 => x = -10 => negative => x = 20 => positive => x = 0 => zero ---------- WHILE LOOP ----------- sq(i) = 16 sq(i) = 9 sq(i) = 4 sq(i) = 1 sq(i) = 0 sum = 30 sum = do_sum(5) = 30 ---------- FOR LOOP ----------- i = 10 ; i^2 = 100 i = 8 ; i^2 = 64 i = 6 ; i^2 = 36 i = 4 ; i^2 = 16 i = 2 ; i^2 = 4 ------- CLOSURES ---------------------- --- Testing function: add10 add10(10) = 20 add10(15) = 25 add10(20) = 30 add10(25) = 35 --- Testing function: add20 add20(10) = 30 add20(15) = 35 add20(20) = 40 add20(25) = 45 --- Testing function: add30 add30(10) = 40 add30(15) = 45 add30(20) = 50 add30(25) = 55 = nil $> sq(2) = 4 $> sq(6) = 36 $> x = -10 $> add10(20) = 30 $> add30(100) = 130
1.13 Operator Precedence Climbing Algorithm
Despite the simplicity of recursive-descent parser implementations, they are inefficient and cumbersome for parsing typical infix expressions found in mathematics and programming languages with lots operators, such as C, C++ or Python, since this type of parser require a grammar production rule and a recursive function call for every operator precedence level. In addition, changing or adding new operators with different associativity or precedence, requires a significant amount of changes in the recursive descent parser code. Parser algorithms including operator precedence climbing and Pratt parser are more suitable for dealing with infix expressions than pure recursive descent parser, since they don't need a recursive function call for every precedence level and allow new operators with custom precedence and associativity to be even be added at runtime or at compile-time without a huge amount of code changes. These precedence parsers, are not full replacement for conventional parsers, instead they are better used as a components of larger parsers.
Besides the complexity that nested infix operators with several precedence levels and associativity introduces in parser implementations, they can also be cumbersome for readers to reason about the code evaluation and lead to subtle bugs since only the precedence and associativity of math arithmetic and assignment operators are common, while other operators, including logical and bitwise, vary across programming languages. As a result, in order to understand a piece of code is evaluated in a particular programming language it is necessary to check the operator precedence and associativity table of this language.
Operator Properties
- Types of operators according with arity (number of operands):
- unary prefix operators (-, !, +]
- => Example: +100, -25, !false
- unary postfix operators
- => Example: factorial 5! = 120
- infix operators (*, +, /..]
- => Example: 125 * 3 / 5
- infix left associative
- => Example: 4 + 3 + 10 is evaluated as ((4 + 3] + 10]
- infix right associative
- => Example: 2 ^ 3 ^ 4 is evaluated as (2 ^ (3 ^ 4]]
- ternary operator
- => Example: w = x > z ? "greater" : "LT";
- unary prefix operators (-, !, +]
- Mneumonics of operator precedence listed from highest precendence
level to lowest precedence level:
- PEDMAS => Parenthesis, Exponents, Multiplication/Division, Addition/Subtraction
- BEDMAS => Bracket, Order, Division/Multiplication
- BODMAS => Bracket, Order, Division/Multiplication, Addition/Subtraction
- BIDMAS => Bracket, Indices, Division/Multiplication, Addition/Subtraction
- In general, assignment operators, including :=,
+=
and*=
, and exponentiation operator (^) or (**) in Fortran and Python, are right associative. So, a = b = c is evaluated as a = (b = c) and regarding exponentiation,x^y^z = x^(y^z)
. - Operator precedence confusion => case when there are (-] minus
prefix and (^] power/exponentiation.
- In mathematics and Python,
-a^b
is evaluated as -(a^b]
, so-1^2 = -1
- In Microsoft Excel and in C, unary operators always have higher
precendence levels than binary operators, so
-a^b
is evaluated as(-a]^b
, therefore-1^2
is evaluated to +1 as opposite of what most people familiar with mathematics expects. Other spreadsheet software copy the same bug that Excel keeps due to backward compatibility reasons.
- In mathematics and Python,
- Spaceship operator: <=> introduced in C++20 for implementing three-way comparison.
- x <=> y
=>
evaluates to 0 ifx == y
- x <=> y
=>
evaluates to -1 ifx < y
- x <=> y
=>
evaluates to +1 ifx > y
- x <=> y
- In some literature, multiplicaton can have a higher precedence
than division, so:
- x ÷ 5 * y is interpreted as x ÷ (5 * y)
- Operator precedence confusion => exponentiation associativity
- Exponentiation operator is left associative in Matlab
(MathWorks], Octave (Matlab clone] and Excel, so
4^3^2
is evaluated as (4 ^ 3) ^ 2 = (^ (^ 4 3) 2) 4096 - In mathematics and most programming languages, the
exponentiation operator is right associative, so
4^3^2
is evaluated as4^(3^2)
= (^ 4 (^ 3 2)) = 262144
- Exponentiation operator is left associative in Matlab
(MathWorks], Octave (Matlab clone] and Excel, so
Operator Classification
+---->> Unary Operator | Takes one argument +--- arity -->+ Example: -10, not x > y, !finished() / | operator-->+ +---->> Binary Operator | | Takes 2 arguments. | | Example: +, *, -, ^ | | | +---->> Ternary Operator => Example from C or C++: x > y ? 10 : 50 or lisp (if (> x y) 10 50) | | /----> Left associative: | | 20 * 5 * 3 is computed as ((20 * 5) * 3) or in Lisp notation: (* (* 20 5) 3) +---->> associativity + | | | \----> Right associative (exponentiaion and operators 'and' and 'or' in Python) | Example: 2 ^ 3 ^ 4 = (2 ^ (3 ^ 4)) = (^ 2 (^ 3 4)) | | | +--> Logical | / | / +---->> types +----> Arithmetic | \ | \----> Bitwise operators => << (left shift in C or C++), >> (right shift in C or C++)
See:
- Pratt Parsing and Precedence Climbing Are the Same Algorithm
- Parsing Expressions by Recursive Descent
- Operator-precedence parser
- Polish notation
- Operator Precedence: We can do better
- Operator Precedence Considered Harmful
- The Math Equation That Tried to Stump the Internet - The New York Times
- Languages without operator precedence
- The Role of Operators in APL - Kenneth E. Iverson
- Note: APL, just like Smalltalk programming language, does not have operator precedence, all operators have the same precedence level. In order to specify a particular order of evaluation, the user must use parentheis.
- Smalltalk Message Rules
- Operator associativity
- Order of Operations
- Operator Precedence in JavaScript
- C# operators and expressions (C# reference)
- C++ built-in operators, precedence, and associativity
- https://rosettacode.org/wiki/Operator_precedence
- Infix Functions In Haskell (Operator Precedence in Haskell)
- operator precedence in Haskell - stack overflow
- r/haskell - Confusion between (.) and ($) operators (Operator Precedence in Haskell)
- Better operator precedence
- https://en.wikipedia.org/wiki/Null_coalescing_operator
- Kotlin - The Elvis Operator (?:) - null coalescing operator
- Kotlin - Null Safety
- Null coalescing operators - ?? and ??= operators (C# reference)
Precedence Climbing Parser Algorithm Pseudocode
Pseudocode of a precedence climbing parsing algorithm, based on (Norvell, 1999), but modified for parsing infix expressions with function calls, such as 10 + 3 * x / sqrt(x + y) or atan2(3 * x + y, x).
// Token Types // //------------------------------------------------// // Delimiters T_LPAR "(" // left parenthesis T_RPAR ")" // right parenthesis T_DOT "." // Dot - found in dotted pair '(a . b) T_QUOTE "'" // quote (') // Literal token types (terminal elements of grammar) T_NUM // number T_STR // string literal "something else" T_IDENT // identifier T_NIL // nil => null element and empty list T_BOOL // boolean #t (true) or #f (false) // Sentinel Token Types T_EOF // End Of File T_ERR // Error T_NDF // Non Defined - non initialized token // -------------------------------------------------------// // --------- Abstract Syntax Tree -------- Ast = AstInteger(x: int) | AstVariable(x: string) | AstBoolean(x: bool) // Binary operator | AstBinop(operator: string, lhs: Ast, rhs: Ast) // Unary operator | AstUnop // Function call | AstCall(ident: string, args: List[Ast]) // Those tokens come from the tokenizer which skips blank characters, white spaces and new lines, and // breaks the source code into terminal elements of the grammar. tokens := [ Token(T_LPAR, "(", Token(T_NUM, "-1245.13e3"), Token(T_SYM, "x1") ,Token(T_STR, "hello world"), Token(T_NIL, "nil"), Token(T_BOOL, "#f") ,Token(T_BOOL, "#t"), Token(T_DOT, "."), Token(T_RPAR, ")") ] // ------- Global Variables ==> They should be contained in an object. // Current position of cursors index := 0 // Current token current := Token(T_NDF, "") // Advance to next token FUNCTION advance(): void IF index >= len(tokens) THEN RETURN Token(T_EOF, "") END current := tokens[index] index := index + 1 END // Check whether the current token type has the same type as the argument FUNCTION check(token_type): bool RETURN current.type == token_type END // Returns true if the current token matches the expected type and consume it. FUNCTION match(token_type): bool IF current.type == token_type THEN advance() RETURN true END RETURN false END // If the current token does not match the expected type, raise an error. FUNCTION expect(token_type) IF current.type == token_type advance() ELSE error("Expected token ", token_type, ", but given ", current) END END // Return true if the current token is EOF (End Of File) FUNCTION is_eof(): bool RETURN current.type == EOF END //--------------------------------------------------------// // Parse infix expression (parse entrypoint) FUNCTION parseExpr() expr := parseBinary(0) RETURN expr END // Parse infix expression with binary operator // such as '+', '/' or '&&' and FUNCTION parseBinary(precedence) // left hand-side of infix expression lhs := parsePrimary() WHILE operator := peek(); operator.isBinary() AND operator.precedence >= precedence DO oper := operator advance() prec := 0 IF oper.isRightAssociative() THEN prec := oper.precedence IF oper.isLeftAssociative() THEN prec := 1 + oper.precedence // right hand side of infix expression rhs := parseBinary(prec) // oper.lexeme => string representation of operator such as "+" or "*" lhs := AstBinop( oper.lexeme, lhs, rhs ) END RETURN lhs END FUNCTION parsePrimary() // Get current token tok := peek() // Unary operator including: '+' in +10, or - in -20 IF tok.isUnaryOperator() THEN op := token advance() rhs := parseBinary(op.precedence) result := AstUnop(op.lexeme, rhs) RETURN rhs ELSE IF check(T_LPAR) THEN advance() expr := parseBinary(0) RETURN expr ELSE IF check(T_INT) expr := parseInt(tok.lexeme) RETURN AstInt(expr) // Function call such as hypothn(x, y) ELSE IF T_IDEN AND lookAheadToken(1) == T_LPAR THEN expr := parseFunction() REUTNR expr ELSE IF T_IDEN expr := AstVariable(tok.lexeme) advance() RETURN expr ELSE ERROR("Invalid token") END FUNCTION parseFunction() functioName := peek().lexeme advance() expect(T_LPAR) IF match(T_RPAR) THEN expr := AstCall(functionName, []) ELSE // Parse first argument x := parseExpr() // List or arguments args := [ arg ] // Iterate while there is a comma WHILE match(T_COMMA) DO // Parse remaining arguments x := parseExpr() lst.append(x) END // Expect right parenthesis expect(T_RPAR) // Build AST - Abstract Syntax Tree Node expr := AstCall(functionName, lst) RETURN expr END END THEN //---------- Program entry point - main() --------// // FUNCTION main(): void // Lexical analysis, break text/source into tokens tokens := tokenize(source_code) // Initialize parser advance() // parser infix expression ast := parseExpr() // Display result print(" Infix Expression AST: ", ast) END
Example - Precedence Clibing Algorithm in F# (FSharp) language
File: parser.fsx
/// Position within a source file type Position = { /// Offset - position of cursor as the number of characters /// from beginning of the file until current cursor position /// minus one. The offset starts from zero. Offset: int /// Line where a token or ast node is found within a file. (Starts from one)" Line: int /// Column where a token or ast node is found within a file. (Starts from one)" Column: int } /// Ast metadata type Location = { File: string Start: Position End: Position } with /// Null location for default intialization static member EmptyLocation = let pos: Position = { Offset = -1; Line = -1; Column = -1} let loc: Location = { File = ""; Start = pos; End = pos } loc /// Merge two locations returning a new one return a new location /// where location.Start = a.Start and location.end = b.End let mergeLocation (a: Location) (b: Location) = let result: Location = { File = a.File; Start = a.Start; End = b.End } result type TokenKind = | TInt // Integer number | TFlt // Floating point number | TStr // String | TBool // Boolean | TEql // '==' Equal | TAsn // ':=' Assignment | TIden // Identifier => variable name or function name | TOper // Operator: '+', '==', '-', '+' .... | TKey // Keyword | TLpar // '(' Left parenthesis | TRpar // ')' Right parenthesis | TComm // ',' Comma | TSemicolon // ';' | TComment // Any line starting with double forward slash // .... | TNL // New line | TEof // End of file | TErr // Tokenizer error | TEmpty // Used for initializing a dummy empty token member this.IsErr() = this = TErr member this.IsEof() = this = TEof type Token = { Kind: TokenKind Lexeme: string Location: Location } with /// Null token or default intialization static member EmptyToken = let pos: Position = { Offset = -1; Line = -1; Column = -1} let loc: Location = { File = ""; Start = pos; End = pos } let tok: Token = { Kind = TokenKind.TEmpty; Lexeme = ""; Location = loc } tok let tokenOffset (tok: Token) = (tok.Location.Start.Offset, tok.Location.End.Offset) /// Abstract Syntax tree nodes /// @tag: expression ast type ast = /// Integer literal AST node | AstInt of Loc: Location * Val: int /// Floating point literal AST node | AstFlt of Loc: Location * Val: float /// Variable AST node | AstVar of Loc: Location * Val: string /// Unary operator AST node | AstUnop of Loc: Location * Opr: string * Rhs: ast /// Binary operator AST node | AstBinop of Loc: Location * Opr: string * Lhs: ast * Rhs: ast /// Functoon call AST node | AstCall of Loc: Location * Ident: string * Args: ast list /// Source code comments | AstComment of Loc: Location * Content: string | AstProg of Loc: Location * Statements: ast list /// Get source location of AST node member this.Location(): Location = match this with | AstInt (Loc = loc) -> loc | AstFlt (Loc = loc) -> loc | AstVar (Loc = loc) -> loc | AstUnop (Loc = loc) -> loc | AstBinop(Loc = loc) -> loc | AstCall (Loc = loc) -> loc | AstComment (Loc = loc) -> loc | AstProg (Loc = loc) -> loc member this.ToInt(): int = match this with | AstInt(Val = value) -> value | _ -> failwith "This type cannot be casted to integer" member this.ToFloat(): float = match this with | AstFlt(Val = value) -> value | _ -> failwith "This type cannot be casted to floating point." /// Convert variable to string identifier /// This is a convenience method for quickly accessing the string representation of a variable member this.ToId: string = match this with | AstVar(Val = value) -> value | _ -> failwith "This type cannot be casted to string." member this.IsVar() = match this with | AstVar(_, _) -> true | _ -> false member this.IsInt() = match this with | AstInt _ -> true | _ -> false member this.IsFlt() = match this with | AstFlt _ -> true | _ -> false member this.IsUnop() = match this with | AstUnop _ -> true | _ -> false member this.IsBinop() = match this with | AstBinop _ -> true | _ -> false member this.IsComment() = match this with | AstBinop _ -> true | _ -> false member this.ToStr() = match this with | AstInt (Val = value) -> value.ToString() | AstFlt (Val = value) -> value.ToString() | AstVar (Val = value) -> value.ToString() | AstUnop (Opr = opr; Rhs = rhs) -> sprintf "(%s %s)" opr (rhs.ToStr()) | AstBinop (Opr = opr; Lhs = lhs; Rhs = rhs) -> sprintf "(%s %s %s)" opr (lhs.ToStr()) (rhs.ToStr()) | AstCall ( Ident = func; Args = args) -> let f (x: ast) = x.ToStr() let xs = List.map f args let st = System.String.Join(" ", xs) let res = sprintf "(CALL %s %s)" func st res | AstProg(Statements = stats) -> let f (x: ast) = x.ToStr() let xs = List.map f stats let st = "\n " + System.String.Join("\n ", xs) let res = "(Program " + st + "\n)" res | AstComment(Content = content) -> sprintf "(comment \"%s\")" content | _ -> failwith "Error: not implemented for %A" this /// Get string representation of AST let astToString (x: ast) = x.ToStr() type Tokenizer(source: string) as self = let mutable offset: int = 0 let mutable line: int = 1 let mutable column: int = 1 let mutable start_offset: int = 1 let mutable start_line: int = 1 let mutable start_column: int = 1 let mutable current: char = '0' let file = "" let nullChar = '\000' do self.Advance() member this.Peek(): char = current member this.Lookahed(n: int): char = if offset >= source.Length then nullChar else source[n + offset - 1] member private this.SetMark() = start_offset <- offset start_line <- line start_column <- column member private this.Advance() = if offset >= source.Length then current <- nullChar else current <- source[offset] column <- column + 1 if current = '\n' then line <- line + 1 column <- 1 offset <- offset + 1 member this.IsEof() = current = nullChar member private this.Check(c: char) = c = current member private this.CheckNext(c: char) = c = this.Lookahed 1 member private this.Match(c: char) = if c = current then this.Advance(); true else false /// Return a token with metadata member private this.Token(kind: TokenKind, lexeme: string) = let _start: Position = { Offset = start_offset - 1 ; Line = start_line; Column = start_column - 1} let _end: Position = { Offset = offset - 2; Line = line ; Column = column - 2} let t = if lexeme.Length = 1 then _start else _end let loc: Location = { File = file ; Start = _start ; End = t } let tok: Token = { Kind = kind ; Lexeme = lexeme ; Location = loc } tok member private this.Error(message: string) = this.Token(TokenKind.TErr, message) member private this.Expect(c: char) = if c = current then this.Advance() this.Token(TokenKind.TEmpty, "") else this.Error <| sprintf "Expected character '%c', but given '%c'" c current member private this.IsSpace() = current = ' ' || current = '\t' || current = '\n' || current = '\r' member private this.IsDigit() = System.Char.IsDigit(current) member private this.IsLetter() = System.Char.IsLetter(current) member private this.IsAlphaNum() = System.Char.IsLetterOrDigit(current) member this.Tokens(): Token list = let mutable tokens: Token list = [] while not <| this.IsEof() do let tok = this.Next() tokens <- tokens @ [ tok ] tokens /// Return next token in the input string member this.Next(): Token = while this.IsSpace() && not <| this.IsEof() do this.Advance() if this.IsEof() then this.Token(TEof, "") else this.SetMark() if this.Check '/' && this.CheckNext '/' then let sb = System.Text.StringBuilder() this.Advance() this.Advance() while not <| self.Check('\n') && not <| this.IsEof() do let ch = this.Peek() sb.Append(ch) this.Advance() let tok = this.Token(TComment, sb.ToString()) ignore <| this.Match '\n' tok elif this.Match '(' then this.Token(TLpar, "(") elif this.Match ')' then this.Token(TRpar, ")") elif this.Match ',' then this.Token(TComm, ",") elif this.Match ';' then this.Token(TSemicolon, ";") elif this.Match '+' then this.Token(TOper, "+") elif this.Match '-' then this.Token(TOper, "-") elif this.Match '*' then this.Token(TOper, "*") elif this.Match '/' then this.Token(TOper, "/") elif this.Match '^' then this.Token(TOper, "^") elif this.Match '=' then if this.Match '=' then this.Token(TOper, "==") else this.Token(TOper, "=") elif this.IsDigit() then this.NextNumber() elif this.IsLetter() then this.NextIdentifier() else this.Token(TErr, sprintf "Invalid character '%c'" (this.Peek()) ) member private this.NextIdentifier() = let sb = System.Text.StringBuilder() while this.IsAlphaNum() && not <| this.IsEof() do ignore <| sb.Append(current) this.Advance() let lexeme = sb.ToString() let result = this.Token(TIden, lexeme) result member private this.NextNumber(): Token = this.SetMark() let sb = System.Text.StringBuilder() // if this.Match '-' then ignore <| sb.Append("-"); () if not <| this.Check '.' then ignore (sb.Append( this.NextInteger() )) if not <| this.Check '.' && not <| this.Check 'e' && not <| this.Check 'E' then if this.IsLetter() then this.Error("Number cannot terminate with letter") else this.Token(TInt, sb.ToString()) else if this.Match '.' then ignore <| sb.Append( "." + this.NextInteger() ) let letter = this.Peek() if this.Match 'e' || this.Match 'E' then ignore <| sb.Append(letter) if not <| this.Match '-' && not <| this.IsDigit() then this.Error("Invalid floating point number") else if this.Match '-' then ignore <| sb.Append('-') if not <| this.IsDigit() then this.Error("Invalid floating point number, expected digit") else ignore <| sb.Append( this.NextInteger() ) this.Token(TInt, sb.ToString()) elif this.IsLetter() then this.Error("Floating point numbers cannot end with letter") else this.Token(TFlt, sb.ToString()) member private this.NextInteger(): string = let sb = System.Text.StringBuilder() while this.IsDigit() && not <| this.IsEof() do ignore <| sb.Append(current) this.Advance() let lexeme = sb.ToString() lexeme // let result = this.Token(TInt, lexeme) // result type Arity = ArityUnary | ArityBinary type Assoc = AssocLeft | AssocRight type Operator = { Lexeme: string Precedence: int Assoc: Assoc Arity: Arity } /// Parser component that uses precedence climbing algorithm for parsing infix expressions type ExprParser(source: string) as self = let tokenizer = Tokenizer source let tokens = tokenizer.Tokens() let mutable current = Token.EmptyToken let mutable pos = 0 let mutable operators: Operator list = [] /// List of tuples of predicate, that checks whether a current token matches the expression /// to be parsed, and a function that parses the primary expression let mutable primaryExprList: ( (unit -> bool) * (unit -> ast) ) list = [] let mutable skipComment = false do // self.Advance() // Fill database of binary and unary operators. // Any client code can add new operators later // ------- Unary Operator -------------- // self.AddUnaryOperator "-" 90 self.AddUnaryOperator "+" 90 // ------- Binary Operator ---------------// // Addition and subtraction self.AddBinaryOperator "+" 40 AssocLeft self.AddBinaryOperator "-" 40 AssocLeft // Multiplication and division self.AddBinaryOperator "*" 50 AssocLeft self.AddBinaryOperator "/" 50 AssocLeft // Power/exponentiation self.AddBinaryOperator "^" 70 AssocRight // Equality operator self.AddBinaryOperator "==" 20 AssocLeft // Assignment operators self.AddBinaryOperator "=" 10 AssocRight self.AddBinaryOperator "+=" 10 AssocRight self.AddBinaryOperator "-=" 10 AssocRight self.AddBinaryOperator "*=" 10 AssocRight self.AddBinaryOperator "/=" 10 AssocRight // -----------------------------------------// /// Parse Expression (Entry Point) member this.ParseExpr() = let expr = this.ParseBinary 0 expr /// Intialize parser state member this.Initialize() = this.Advance() member this.AddPrimaryExpr(predicate: unit -> bool) (parser: unit -> ast) = primaryExprList <- primaryExprList @ [(predicate, parser) ] /// Add binary operator to the database member this.AddBinaryOperator (lexeme: string) (precedence: int) (assoc: Assoc) = let op: Operator = { Lexeme = lexeme ; Precedence = precedence ; Assoc = assoc ; Arity = ArityBinary } operators <- operators @ [op] /// Add unary operator to the database member this.AddUnaryOperator (lexeme: string) (precedence: int) = let op: Operator = { Lexeme = lexeme ; Precedence = precedence ; Assoc = AssocLeft ; Arity = ArityUnary } operators <- operators @ [op] /// Get current token member this.Peek() = current /// Get lexeme (string representation) of current token member this.Lexeme() = current.Lexeme /// Get location of current token member this.Location() = current.Location /// Indicates whether current token is EOF token, /// indicating that there is no more tokens. /// EOF - End of FIle member this.IsEof() = this.Check TEof /// Advance to the next token member this.Advance() = if pos < tokens.Length then current <- tokens[pos] pos <- pos + 1 // Skip comment tokens // if current.Kind = TComment && skipComment then this.Advance() /// Check whether current token is comment member this.IsComment(): bool = this.Check TComment member this.SkipComment() = while this.Match TComment do this.Advance() /// Make AST node containing the current comment member this.ParseComment() = if not <| this.Check TComment then failwith "Expected comment token" else let node = AstComment(this.Location(), this.Lexeme()) this.Advance() node /// Loke n tokens ahead of current token without consuming them member this.Lookahead(n: int): Token = if pos + n > tokens.Length then current else tokens[pos + n - 1] /// If the current token is the same as the argument, consume it and return true. member this.Match(kind: TokenKind) = if kind = this.Peek().Kind then this.Advance() true else false /// Return true if the current token is the same as the argument member this.Check(kind: TokenKind) = kind = this.Peek().Kind /// Return true if current token can be a binary operator member this.IsBinaryOperator() = let lexeme = this.Lexeme() let result= List.exists (fun x -> lexeme = x.Lexeme && x.Arity = ArityBinary ) operators result // Return true if current token can be a unary operator member this.IsUnaryOperator() = let lexeme = this.Lexeme() let result = List.exists (fun x -> lexeme = x.Lexeme && x.Arity = ArityUnary ) operators result member this.Operator(operator: string, arity: Arity) = let op = List.tryFind (fun x -> operator = x.Lexeme && x.Arity = arity ) operators if op.IsNone then failwith <| sprintf "Invalid or unregistered operator %s" operator else op.Value member this.Operator(arity: Arity) = let lexeme = this.Lexeme() let op = List.tryFind (fun x -> lexeme = x.Lexeme && x.Arity = arity ) operators if op.IsNone then failwith <| sprintf "Operator not found %s" lexeme else op.Value member this.Expect(kind: TokenKind) = if this.Peek().Kind = kind then this.Advance() else let t = this.Peek().Kind let lexeme = this.Lexeme() let loc = this.Location() let s = sprintf "Token(%A, %s)" t lexeme failwith <| sprintf "Expected token of type %A at line %d and column %d, but given %s" kind loc.Start.Column loc.Start.Line s /// Parse an expression with binary operator member private this.ParseBinary(precedence: int): ast = let mutable lhs: ast = this.ParsePrimary() // printfn " [TRACE] ParseBinary() token = %A ; lhs = %A " (this.Lexeme()) lhs while this.IsBinaryOperator() && this.Operator(ArityBinary).Precedence >= precedence do // Get current operator let tok = this.Peek() let op = this.Operator(ArityBinary) // printfn " [TRACE] Enter while loop () => op = %A " op this.Advance() let mutable prec = 0 if op.Assoc = AssocRight then prec <- op.Precedence if op.Assoc = AssocLeft then prec <- 1 + op.Precedence let rhs = this.ParseBinary(prec) let loc = mergeLocation (lhs.Location()) (rhs.Location()) lhs <- AstBinop(loc, tok.Lexeme, lhs, rhs) lhs /// Parse primary expression: Unary operator expression, expression between parenthesis /// literal expressions (numbers, strings, boolean, null), function call member private this.ParsePrimary(): ast = let tok = this.Peek() let loc = tok.Location // Parse expression with unary operator if this.IsUnaryOperator() then let op: Operator = this.Operator(ArityUnary) // printfn "\n [TRACE] <A> Before => Unary operator expr = %A " <| this.Peek() this.Advance() let rhs = this.ParseBinary(op.Precedence) let expr = AstUnop( mergeLocation tok.Location (rhs.Location()), tok.Lexeme, rhs) // printfn "\n [TRACE] <A> After => Unary operator expr = %A " <| this.Peek() expr // Parse expression between parenthesis elif this.Check TokenKind.TLpar then this.Advance() let expr = this.ParseBinary 0 this.Expect TokenKind.TRpar expr // Parse integer elif this.Check TokenKind.TInt then let num = int tok.Lexeme // printfn "\n [TRACE] Before => Unary operator expr = %A " <| this.Lexeme() this.Advance() let expr = AstInt(loc, num) // printfn "\n [TRACE] After => Unary operator expr = %A " <| this.Lexeme() expr // Parse floating point number elif this.Check TokenKind.TFlt then let num = float tok.Lexeme this.Advance() let expr = AstFlt(loc, num) expr // Parse identifier or function call elif this.Check TokenKind.TIden then let result = if this.Lookahead(1).Kind <> TLpar then // parse variable let expr = AstVar(loc, tok.Lexeme) this.Advance() expr else // Parse function call this.ParseCall() result else let p = List.tryFind (fun (pred, _) -> pred() ) primaryExprList match p with | None -> failwith <| sprintf "Error: invalid token %A" tok | Some (pred, parser) -> let result = parser() result /// Parse function call with the following EBNF schema /// call: IDENTIFIER "(" expr ("," expr)* ")" /// call: IDENTIFIER LPAR expr (COMMA expr)* RPAR /// /// So this function parses func(), func(arg0), func(arg0, arg1, ..., argN-1) member private this.ParseCall(): ast = let loc = this.Location() let func = this.Lexeme() this.Advance() this.Expect(TLpar) if this.Match TRpar then printfn " [TRACE] Call functiion with no arguments." let result = AstCall(loc, func, []) result else // printfn " [TRACE] Call function with arguments" // Parse first function argument let a = this.ParseExpr() let mutable args: ast list = [ a ] while this.Match TComm do let a = self.ParseExpr() args <- args @ [a] // printf " [TRACE] args = %A " args this.Expect TRpar let result = AstCall(loc, func, args) result type RecursiveParser(input: string) as self = inherit ExprParser(input) do self.Initialize() member this.Parse() = this.ParseProgram() member private this.ParseProgram() = let mutable statements: ast list = [] let startLoc = this.Location() while not <| this.IsEof() do let expr = if this.IsComment() then this.ParseComment() else let result = this.ParseExpr() this.Expect TSemicolon result // printfn " [TRACE] (621) expr = %A " expr statements <- statements @ [expr] let endLoc = this.Location() let loc = mergeLocation startLoc endLoc let prog = AstProg(loc, statements) prog /// Convenient function for encapsulating the parser let parseExpr (input: string): ast = let p = ExprParser input p.Initialize() let expr = p.ParseExpr() expr // let mutable current = To /// Limited Non turing-complete calculator interpreter let rec eval (expr: ast): float = match expr with | AstInt (Val = x) -> float x | AstFlt (Val = x) -> x | AstVar (Val = x) -> failwith "Not supported" | AstUnop (Opr = opr; Rhs = rhs) -> let e = eval rhs match opr with | "-" -> -e | "+" -> +e | AstBinop(Opr = opr; Lhs = lhs; Rhs = rhs) -> let a = eval lhs let b = eval rhs match opr with | "+" -> a + b | "-" -> a - b | "*" -> a * b | "/" -> a / b | "^" -> a ** b | _ -> failwith "Invalid operator" | AstCall(Ident = func; Args = args) -> if args.Length <> 1 then failwith "Function with invalid number of arguments" else let e = eval args[0] match func with | "exp" -> System.Math.Exp(e) | "log" -> System.Math.Log(e) | "log10" -> System.Math.Log10 e | "sqrt" -> System.Math.Sqrt e | "sign" -> System.Math.Sign e | _ -> failwith "Invalid function" | _ -> failwith "Not supported" let repl() = let mutable Break = false printfn " -------START REPL ----------" while not Break do printf " $> " let input = System.Console.ReadLine().Trim() if input <> "" then let expr = parseExpr input let result = eval expr printfn " => AST = %s " <| expr.ToStr() printfn " => result = %f " result else printfn ""
Running:
$ >> dotnet fsi --use:parser.fsx Microsoft (R) F# Interactive version 12.0.0.0 for F# 6.0 Copyright (c) Microsoft Corporation. All Rights Reserved. For help type #help;; >> let e = parseExpr "20 + hypoth(x^2 + 3 * x / z ^ 2, y, z * 2) / k" ;; >> e.ToStr() ;; val it: string = "(+ 20 (/ (CALL hypoth (+ (^ x 2) (/ (* 3 x) (^ z 2))) y (* z 2)) k))" >> let e = parseExpr "4 + 3 * log(10 + 3 * sqrt(5 * 2 ^ 3 ^ 3 / 100))" ;; >> e.ToStr() ;; val it: string = "(+ 4 (* 3 (CALL log (+ 10 (* 3 (CALL sqrt (/ (* 5 (^ 2 (^ 3 3))) 100)))))))" >> eval e ;; val it: float = 30.87855699 >> let e = parseExpr "3 * x + 20 * z ^ x ^ 2" ;; val e: ast = AstBinop ({ File = "" Start = { Offset = 0 Line = 1 Column = 1 } End = { Offset = 21 Line = 1 Column = 22 } }, "+", AstBinop ({ File = "" Start = { Offset = 0 Line = 1 Column = 1 } End = { Offset = 4 Line = 1 Column = 5 } }, "*", AstInt ({ File = "" Start = { Offset = 0 Line = 1 Column = 1 } End = { Offset = 0 Line = 1 Column = 1 } }, 3), AstVar ({ File = "" Start = { Offset = 4 Line = 1 Column = 5 } End = { Offset = 4 Line = 1 Column = 5 } }, "x")), AstBinop ({ File = "" Start = { Offset = 8 Line = 1 Column = 9 } End = { Offset = 21 Line = 1 Column = 22 } }, "*", AstInt ({ File = "" Start = { Offset = 8 Line = 1 Column = 9 } End = { Offset = 9 Line = 1 Column = 10 } }, 20), AstBinop ({ File = "" Start = { Offset = 13 Line = 1 Column = 14 } End = { Offset = 21 Line = 1 Column = 22 } }, "^", AstVar ({ File = "" Start = { Offset = 13 Line = 1 Column = 14 } End = { Offset = 13 Line = 1 Column = 14 } }, "z"), AstBinop ({ File = "" Start = { Offset = 17 Line = 1 Column = 18 } End = { Offset = 21 Line = 1 Column = 22 } }, "^", AstVar ({ File = "" Start = { Offset = 17 Line = 1 Column = 18 } End = { Offset = 17 Line = 1 Column = 18 } }, "x"), AstInt ({ File = "" Start = { Offset = 21 Line = 1 Column = 22 } End = { Offset = 21 Line = 1 Column = 22 } }, 2)))))
Test the Recursive descent parser that uses an embedded operator precedence climbing parser.
let s = " // Comment: compute the value of x x = 3 * 4 + 5; // compute the value z y = 2 * z + x / 10; // Compute the last dependency z z = - y + sqrt(- x^2 + y^2 / 5 + 20) * z ^ 2 + log10(x^2 + 3 * x); " let p = RecursiveParser s ;; >> p ;; val it: RecursiveParser = FSI_0002+RecursiveParser // Get the AST Abstract Syntax Tree from the sample source code. >> let ast = p.Parse() ;; val ast: ast = AstProg ({ File = "" Start = { Offset = 12 Line = 2 Column = 10 } End = { Offset = 269 Line = 8 Column = 7 } }, [AstComment ({ File = "" Start = { Offset = 12 Line = 2 Column = 10 } End = { Offset = 46 Line = 3 Column = -1 } }, " Comment: compute the value of x "); AstBinop ({ File = "" Start = { Offset = 57 Line = 3 Column = 10 } End = { Offset = 69 ... ... ... ... ... .. ... ... ... ... ... .. ... ... ... ... ... .. >> ast.Location() ;; val it: Location = { File = "" Start = { Offset = 12 Line = 2 Column = 10 } End = { Offset = 269 Line = 8 Column = 7 } } // View the AST in a more terse/concise way by using Lisp or S-Expression notation >> ast.ToStr() ;; val it: string = "(Program (comment " Comment: compute the value of x ") (= x (+ (* 3 4) 5)) (comment " compute the value z ") (= y (+ (* 2 z) (/ x 10))) (comment " Compute the last dependency z") (= z (+ (+ (- y) (* (CALL sqrt (+ (+ (^ (- x) 2) (/ (^ y 2) 5)) 20)) (^ z 2))) (CALL log10 (+ (^ x 2) (* 3 x))))) )"