The KaShell Programming Language

KaShell is an evolving design for a programming language with a compact syntax similar to shell and friendly for interactive use, and with semantics similar to optionally-typed languages like Scheme.

The prototype is a variant of Kawa. To try it, install Kawa, and then start up kawa with the --kashell option.

KaShell was previously known as Q2. There is some old documentation/ideas here.

Basic syntax

Whitespace and indentation are significant.

Commands are similar to shell syntax: A simple command has the form of a command followed by the argument expressions, separated by spaces:

expt 2 3

This calls the expt function with the given arguments. Parentheses are not needed, except for grouping:

expt 2 (sqrt 9)

Such a command is an example of a phrase. The function name and each argument are words.

word ::= identifier | literal | ....
   | ( phrase )
phrase ::= word*

Phrases can be separated by newline or semicolons.

A procedure call is a phrase whose first word evaluates to a procedure value. (It can be a single-word phrase, if there are no arguments.)

A syntactic form is a phrase whose first word is a predefined syntactic keyword or an in-scope macro.

Identifiers

An identifier is used to name things in a program. The set of characters allowed in an identifier is larger than in most programming languages and roughly follows Scheme.

There are no reserved identifiers, though there are syntactic keywords predefined in the default scope.

The recommended style for multi-part names is to use hyphens between the parts: array-rank.

There will be some syntax to include otherwise-disallowed characters in an identifier. This has not been decided or implemented but I’m leaning towards backslash followed by a string template. For example \{1.5} would be an identifier (with the 3 characters "1", ".", and "5") rather than a number.

Compound identifiers have two parts, separated by a colon (and no whitespace). The first part is a namespace (an identifier), and the second part is a name within that namespace.

Indentation

Indentation is significant:

foo 1 2 3
   bar 4 5
       3 + 3
   baz 10 11

is equivalent to:

foo 1 2 3 (bar 4 5 (3 + 3)) (baz 10 11)
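
As a rough illustration of this rule (not the Kawa-based implementation), the following Python sketch rewrites an indented block into the parenthesized one-line form. The function name nest_by_indent is hypothetical, and the sketch assumes space-only indentation and a single top-level phrase.

```python
def nest_by_indent(source):
    """Rewrite an indented block as a single parenthesized phrase.

    Assumptions (not from the KaShell design): indentation uses spaces only,
    blank lines are ignored, and the first line is the only top-level phrase.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    pos = 0

    def indent_of(line):
        return len(line) - len(line.lstrip(" "))

    def parse(indent):
        nonlocal pos
        parts = [lines[pos].strip()]
        pos += 1
        # Every following line that is indented deeper becomes a nested phrase.
        while pos < len(lines):
            child_indent = indent_of(lines[pos])
            if child_indent <= indent:
                break
            parts.append("(" + parse(child_indent) + ")")
        return " ".join(parts)

    return parse(indent_of(lines[0]))


example = """\
foo 1 2 3
   bar 4 5
       3 + 3
   baz 10 11
"""
print(nest_by_indent(example))  # foo 1 2 3 (bar 4 5 (3 + 3)) (baz 10 11)
```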

Comments

A hash-sign # followed by at least one space comments out the rest of the line.

A hash-sign followed by an exclamation point #! is also a comment.

Syntax for nestable comments hasn’t been decided yet. Candidates include #[comment#] or #[comment]# or plain #[comment].

Numbers

KaShell implements the Kawa Scheme “numeric tower”, including exact integers and rationals, floating-point reals, and complex numbers. (Syntax of literals may change slightly from Kawa Scheme.) Quaternions are also supported.

A general radix can be specified by writing the radix, the letter r, and then the digits in that radix, with no spaces between them:

radixrradix-digits

For example:

16rFFFF
2r110011
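
To make the literal form concrete, here is a minimal Python sketch that reads such a literal. The helper name parse_radix_literal is hypothetical, and the sketch assumes that the radix itself is written in decimal and lies between 2 and 36 (the range Python's int accepts); KaShell may differ on both points.

```python
def parse_radix_literal(text):
    """Parse a literal such as 16rFFFF: radix, the letter r, then the digits."""
    radix_part, sep, digits = text.partition("r")
    if not sep or not radix_part.isdigit() or not digits:
        raise ValueError(f"not a radix literal: {text!r}")
    radix = int(radix_part)  # assumption: the radix is written in decimal
    if not 2 <= radix <= 36:
        raise ValueError(f"unsupported radix: {radix}")
    return int(digits, radix)


print(parse_radix_literal("16rFFFF"))   # 65535
print(parse_radix_literal("2r110011"))  # 51
```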

We may add exact decimal numbers, possibly with repeating fractional part. These are mathematically equivalent to exact rationals, but are typically easier to read and write.

Quantities are a product of a real number and a unit. For example: 3cm + 2in evaluates to 8.08cm (the second quantity is converted to the unit of the first). A designed extension will be able to do unit-checking at compile-time based on this design.
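
The conversion rule in the example can be sketched in Python as follows. The Quantity class and the CM_PER_UNIT table are hypothetical names made up for this illustration; only lengths are handled, and nothing here reflects how Kawa actually represents quantities.

```python
from dataclasses import dataclass

# Length of each unit in centimetres (one inch is exactly 2.54 cm).
CM_PER_UNIT = {"cm": 1.0, "in": 2.54, "m": 100.0}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    def __add__(self, other):
        # Convert the right operand to the unit of the left operand, then add.
        converted = other.value * CM_PER_UNIT[other.unit] / CM_PER_UNIT[self.unit]
        return Quantity(self.value + converted, self.unit)


total = Quantity(3, "cm") + Quantity(2, "in")
print(f"{total.value:.2f}{total.unit}")  # 8.08cm
```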

Arithmetic

The usual infix notation and operator-precedence rules apply. For example, the following evaluates, as expected, to 22:

10 + 2 * 6

Note that spaces are (generally) required.

However, note that infix operators like + are not reserved syntax. They are predefined syntactic keywords (with associated precedence information), and there will be a way to add or replace operators.
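
One way to picture operators as ordinary table entries rather than reserved syntax is a precedence-climbing evaluator driven by a replaceable table. The Python sketch below is only an illustration: the OPERATORS table, the evaluate function, and the added % operator are assumptions, operands are limited to integers, and all operators are treated as left-associative.

```python
import operator

# Operator table: symbol -> (precedence, function). It is plain data,
# so entries can be added or replaced without touching the parser.
OPERATORS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
}


def evaluate(phrase, operators=OPERATORS):
    tokens = phrase.split()  # spaces are required between tokens
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = int(tokens[pos])
        pos += 1
        while pos < len(tokens):
            entry = operators.get(tokens[pos])
            if entry is None:
                raise ValueError(f"unknown operator: {tokens[pos]}")
            prec, fn = entry
            if prec < min_prec:
                break
            pos += 1
            # A higher minimum on the right side gives left associativity.
            left = fn(left, parse(prec + 1))
        return left

    return parse(0)


print(evaluate("10 + 2 * 6"))  # 22

# Adding an operator is just a new table entry.
custom = dict(OPERATORS)
custom["%"] = (2, operator.mod)
print(evaluate("10 + 7 % 4", custom))  # 13
```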

Variables and definitions

All variables must be defined before using them, to catch typos. However, the syntax to define a variable is quite compact - you just need to add ^ after the variable:

twenty^ = 10 + 5 + 5

Initially, we will restrict the left side to be a pattern, and the right side to be an expression:

pattern = expression

You can do simple pattern matching:

[x^ y^] = [3 4]

(In the future, the = operator may be extended to bi-directional unification.)

Variables defined using = are write-once “logic” variables, and so they may not be re-assigned (though this is not currently enforced). Their scope is the entire current block (or function). Lexical override is not allowed - there can be only a single definition in any scope.
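
The single-definition rule can be modelled with a small scope object. This Python sketch shows what enforcement could look like; the Scope class and its method names are hypothetical and say nothing about how the prototype stores variables.

```python
class Scope:
    """A block scope in which each variable can be defined only once."""

    def __init__(self):
        self._values = {}

    def define(self, name, value):
        # Mirrors `name^ = value`: a second definition in the same scope fails.
        if name in self._values:
            raise NameError(f"{name} is already defined in this scope")
        self._values[name] = value

    def match_vector(self, names, values):
        # Mirrors `[x^ y^] = [3 4]`: lengths must agree, then bind element-wise.
        if len(names) != len(values):
            raise ValueError("pattern and value have different lengths")
        for name, value in zip(names, values):
            self.define(name, value)

    def lookup(self, name):
        # Using a name that was never defined is an error, which catches typos.
        if name not in self._values:
            raise NameError(f"{name} is not defined")
        return self._values[name]


scope = Scope()
scope.define("twenty", 10 + 5 + 5)
scope.match_vector(["x", "y"], [3, 4])
print(scope.lookup("twenty"), scope.lookup("x"), scope.lookup("y"))  # 20 3 4
```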

You can declare regular mutable variables with the := operator, but with pattern restricted to a single identifier with an optional type:

identifier^ := expression
identifier^type := expression

For example:

counter^ := 0
counter := counter + 1

Logic programming [possible future]

Check out Picat and Alice.

Also check out Kanren/MiniKanren/cKanren.

Optional type specifiers [not working yet]

You can add an optional type specifier after the ‘^’ in a definition:

pi^float = 3.14

Patterns

(Not yet implemented.)

Patterns are conceptually similar to Kawa, but with a different syntax. The most noticeable differences are that ‘^’ is used to separate a variable name from its type specifier, and that a plain identifier is not a valid pattern - it must be followed by a ‘^’.

A pattern is one of:

identifier^

This is the simplest and most common form of pattern. The identifier is bound to a new variable that is initialized to the incoming value.

The ^ must be followed by a space or a closing delimiter (such as a right bracket).

_

This pattern just discards the incoming value. It is equivalent to a unique otherwise-unused identifier.

identifier^type
pattern^type

The incoming value is coerced to a value of the specified type, and then the coerced value is matched against the sub-pattern, or bound to the identifier.

No spaces are allowed on either side of the ^.

pattern-literal

Matches if the value is equal? to the pattern-literal.

Functions

lambda-form ::= (| parameter-list |) phrase
fn name lambda-form*

Conditional operator

The ?> is syntactically an infix operator but it integrates with the phrase-parsing to provide a ternary if-then-else operator:

(3 > 4 ?> "it is true"; "it is false")

or:

x > 0 ?>
   display x
   display " is positive"
   newline
x < 0 ?>
   display x
   display " is negative"
   newline
display x
display " is zero"
newline

[This is a hack that needs further thought and specification.]

Vectors and arrays

Use square brackets to construct (immutable) vectors:

[3 (2 + 2) 5]

A vector is a function from an integer to an element.

[3 4 5] 2

evaluates to 5.

You can also use a vector of indexes to select several elements at once:

[10 11 12] [2 1]

evaluates to [12 11].
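
The "vector as a function of its index" idea can be mimicked in Python with a callable tuple. The Vector class here is a hypothetical stand-in, shown only to make the two indexing forms concrete.

```python
class Vector(tuple):
    """An immutable vector that can be applied to an index like a function."""

    def __call__(self, index):
        if isinstance(index, Vector):
            # A vector of indexes selects several elements, in the given order.
            return Vector(self[i] for i in index)
        return self[index]


print(Vector([3, 4, 5])(2))                    # 5
print(Vector([10, 11, 12])(Vector([2, 1])))    # (12, 11)
```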

There is support for multi-dimensional arrays but specifics (such as syntax and operator names) have not been decided.

Strings

A string is an immutable sequence (vector) of characters (Unicode code points). You can index it (like a vector) to get a character.

(Not yet implemented: A character is also a string of length 1, so "X" 0 yields the same "X". This removes the need for distinct character literal syntax.)

There are two kinds of string literals, delimited either by traditional double quotes or by braces:

qstring ::= "qstring-element*"
bstring ::= { bstring-element*}

Double quoted string literals

A qstring is the traditional syntax with double quotes: "Hello". Most C-style (and JSON) escapes are supported: "Hello!\n". ECMAScript 2015 “Unicode code point escapes” seem a reasonable extension: \u{hex-digits}. (There may be a way to continue a line using some escape sequence; details are not yet decided.)

Brace string literals

A bstring is written using curly braces: {Hello}. Braces nest: {string with {braces}.}. These may be multi-line and there are various escape sequences, as in Kawa template strings, though backslash is used as the escape character rather than &.

{L\aelig;rdals\oslash;yri} evaluates to "Lærdalsøyri".

{Adding 3 and 4 yields \(3 + 4).} evaluates to "Adding 3 and 4 yields 7.".
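
Two of the mechanisms above, brace nesting and named-character escapes, can be sketched in Python. The function names are hypothetical; the sketch assumes that escape names such as aelig coincide with HTML entity names, and it deliberately ignores backslash-escaped braces and \(...) expression escapes.

```python
import html
import re


def read_bstring(source, start=0):
    """Return (raw contents, index just past the closing brace)."""
    if source[start] != "{":
        raise ValueError("expected '{'")
    depth = 0
    for i in range(start, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:  # the brace that matches the opening one
                return source[start + 1:i], i + 1
    raise ValueError("unterminated brace string")


def expand_entities(raw):
    # Rewrite \name; as the HTML entity &name; and let the stdlib resolve it.
    return re.sub(r"\\([A-Za-z]+);",
                  lambda m: html.unescape("&" + m.group(1) + ";"),
                  raw)


contents, _ = read_bstring("{string with {braces}.} trailing text")
print(contents)                                        # string with {braces}.
print(expand_entities(r"L\aelig;rdals\oslash;yri"))    # Lærdalsøyri
```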

You can also add formatting specifiers.

You can nest bstrings and qstrings by prefixing them with a backslash (not implemented).

Object constructor syntax

An identifier followed by a brace literal is a convenience syntax for constructing complex objects:

URI{http://example.com/}

The constructor can also contain expressions in parentheses (which are evaluated), or bracket literals that contain multiple expressions. There may be no (unescaped) spaces between the parts of an object literal.

The concept and implementation are similar to Kawa’s and SRFI-108’s Named quasi-literal constructors. However, the syntax is different in using backslash as the escape character, and not requiring an initial backslash.

Rich text objects [partially implemented]

A rich text is an enhanced string, with embedded objects and formatting. It is syntactic sugar for a kind of object constructor. It has the form of a single-quote followed by a bstring.

'{Some text *strong* and \em{emphasized}.}

A subset of Markdown syntax is recognized, including *, _ and blank lines as paragraph separators. Beyond that, general object literal syntax is used.

The above is equivalent to:

text{Some text \text:b{strong} and \text:em{emphasized}.}

Evaluating either expression yields a text object, which is a tree structure that generalizes strings. The text object can then be converted to various formats depending on context. For example:

write-pdf -filename=hello.pdf '{Hello!}

or

as-html '{Some text *strong* and \em{emphasized}.}

which yields "<p>Some text <b>strong</b> and <em>emphasized</em>.</p>"
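
To illustrate how a tree-structured text object might be rendered, here is a small Python sketch. Representing a node as a (tag, children) tuple is an assumption made for this example only, and this as_html is a stand-in, not the KaShell function of the same name.

```python
from html import escape


def as_html(node):
    """Render a text tree: a node is either a string or a (tag, children) pair."""
    if isinstance(node, str):
        return escape(node)  # plain text is escaped for HTML
    tag, children = node
    inner = "".join(as_html(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


# Hand-built tree for: '{Some text *strong* and \em{emphasized}.}
doc = ("p", ["Some text ", ("b", ["strong"]), " and ", ("em", ["emphasized"]), "."])
print(as_html(doc))
```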

It is intended that text literals be used to document programs. Tools that pretty-print programs or extract API information should format these documentation strings.

The DomTerm terminal emulator allows “printing” HTML as rich text. When printing a text value in a DomTerm REPL it should implicitly call as-html and show that.

Keywords

(Not yet implemented.)

One problem with the existing (Kawa) keyword syntax is that it does not support (tab-)completion, because it does not start with a special character. To fix that we can “merge” keyword syntax with command option syntax, by prefixing with ‘-’. For example:

--name=John

corresponds to Kawa-Scheme’s

name: "John" ;; or maybe: name: 'John

Either a single ‘-’ or a double ‘--’ is allowed. They have the same effect in evaluation mode, but are different in non-eval (quoted) mode.

An operand (compare Kawa syntax) can be one of:

-identifier=word
--identifier=word

The word is implicitly quoted. No space is allowed before or after the ‘=’.

-identifier: expression
--identifier: expression

A keyword-argument pair, with the expression evaluated at call time. Space is required after the ‘:’.

-identifier
--identifier

Equivalent to: --identifier: #t. (Assuming identifier does not start with no-.)

-no-identifier
--no-identifier

Equivalent to: --identifier: #f.
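
The four operand forms can be summarized by a small classifier. This Python sketch is only a reading aid: the function parse_operand and the tagged tuples it returns ("quoted", "expr", "literal") are hypothetical, and expressions are passed through unevaluated.

```python
def parse_operand(words):
    """Consume one keyword operand; return (name, tagged value, remaining words)."""
    head = words[0]
    if not head.startswith("-"):
        raise ValueError(f"not a keyword operand: {head!r}")
    body = head[2:] if head.startswith("--") else head[1:]
    if "=" in body:
        # -identifier=word: the word is implicitly quoted.
        name, word = body.split("=", 1)
        return name, ("quoted", word), words[1:]
    if body.endswith(":"):
        # -identifier: expression: the next word is an expression for call time.
        return body[:-1], ("expr", words[1]), words[2:]
    if body.startswith("no-"):
        # -no-identifier is shorthand for a false value.
        return body[3:], ("literal", False), words[1:]
    # A bare -identifier is shorthand for a true value.
    return body, ("literal", True), words[1:]


print(parse_operand(["--name=John"]))       # ('name', ('quoted', 'John'), [])
print(parse_operand(["-count:", "n", "x"])) # ('count', ('expr', 'n'), ['x'])
print(parse_operand(["--no-color"]))        # ('color', ('literal', False), [])
```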

In non-eval (quoted) mode, the reverse mapping is performed (more-or-less - details/restrictions to come).

Running programs [not implemented yet]

The run macro quasi-quotes its arguments, and then executes the resulting string list as a process invocation, as if using the Kawa run-process function.

run date --utc

The result is a process object. A process can be coerced to a string (or more generally a blob), which is the result of standard output from the process. When a process is “written” to the REPL, it is coerced to a string.

The run macro can be left out if the following word has the form of a fully-qualified filename (i.e. starting with /). Also, if the following word is not in the lexical scope, but there is (at compile-time) an executable file by that name in the PATH, then run is also implied.
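
The decision of when run is implied might look roughly like the following Python sketch. The function implies_run and the representation of the lexical scope as a set of names are assumptions; the PATH lookup uses the standard shutil.which.

```python
import shutil


def implies_run(word, lexical_scope):
    """Decide whether a phrase starting with `word` is an implicit run."""
    if word.startswith("/"):
        return True  # looks like a fully-qualified filename
    if word in lexical_scope:
        return False  # an in-scope binding takes priority over PATH
    # Otherwise, run is implied only if an executable of that name is on PATH.
    return shutil.which(word) is not None


print(implies_run("/bin/date", set()))   # True
print(implies_run("expt", {"expt"}))     # False
```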

Filename globbing is performed.

Enclosed expressions are evaluated (at run-time). If they evaluate to a string without newlines, the result is interpolated in the command argument. If the result is a multi-line string or a sequence, more complex rules (TBD) are in play.