Wednesday, September 22, 2010

Socrates: A Proposal for Purely Symbolic Python

The Game

Inspired by Socrates' winning goal for the Greeks, I've recently been considering a more purely symbolic, mathematical form of Python. After seeing the concept of Chinese Python and playing around a little with APL, I had the eureka shot: apply mathematical symbols to the language itself. In homage to the spirit of Python, I shall name this symbolic Python Socrates.

Ground Rules

A principle we have in Socrates for substituting mathematical symbols for Python keywords and built-ins is that the correspondence should be one-to-one, non-overlapping, automated, and invertible. By one-to-one, we mean there should be one and only one unique Unicode symbol for each keyword. By non-overlapping, we mean that a symbol must not overlap any current Python usage, which rules out the entire existing Python character set: every symbol must be separate and distinct from the ASCII characters. By automated, we mean that an automatic conversion routine should exist to convert between existing Python keywords and symbolic keywords. By invertible, we mean that we should be able to go from existing keywords to symbolic keywords, and back again, without loss.
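To make the ground rules concrete, here is a minimal Python 3 sketch that checks a mapping against them. SYMBOLS holds only the pairs that appear in this post's text and examples, and check_ground_rules is a hypothetical helper, not part of any existing tool. It reads "non-overlapping" as "a single character outside the ASCII range", which is one interpretation of the rule.

```python
# Hypothetical subset of the Socrates mapping, taken from the text and examples in this post.
SYMBOLS = {
    "and": "∧", "not": "¬", "in": "∈", "if": "→", "elif": "⇒", "else": "…",
    "lambda": "λ", "pass": "□", "return": "↑", "raise": "⇈", "global": "Γ",
    "exec": "Φ", "while": "∃", "for": "∀", "try": "⊤", "except": "⊥",
    "def": "↦", "class": "∘", "print": "‣",
}

def check_ground_rules(mapping):
    """Return the inverse mapping, raising ValueError if a ground rule is violated."""
    symbols = list(mapping.values())
    # One-to-one: no two keywords may share a symbol.
    if len(set(symbols)) != len(symbols):
        raise ValueError("two keywords share a symbol")
    for word, symbol in mapping.items():
        # One symbol per keyword: each replacement is exactly one character.
        if len(symbol) != 1:
            raise ValueError("%r maps to more than one character" % word)
        # Non-overlapping: the symbol must lie outside the ASCII range.
        if ord(symbol) < 128:
            raise ValueError("%r maps to an ASCII character" % word)
    # Invertible: a one-to-one mapping can simply be reversed.
    return {symbol: word for word, symbol in mapping.items()}

INVERSE = check_ground_rules(SYMBOLS)
print(INVERSE["↦"])  # prints def
```

Because the checks run on the dictionary itself, any proposed change to the table can be validated the same way before it reaches an editor or preprocessor.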

Why Python?

Python is an ideal source language for Socrates experimentation. It is noted for its clarity and its lack of extraneous punctuation and grouping characters, which makes it well suited to symbolic enhancement. It also maps mathematical equations and algorithms naturally onto source code, with native support for complex numbers and its powerful slice notation. In addition, creating a dialect of Python that uses symbols for keywords is far more efficient than creating a language from scratch, because it takes advantage of the language's large installed base and library set.

What This Proposal Is NOT

Socrates is not a request to change the Python language, and it does not require altering the CPython source code. It is simply an overlay, if you will: a filter, a viewport, a different way of seeing the exact same code. A preprocessor, or better yet an editor or IDE, should be able to take Socrates and transform it into standard Python for compilation and evaluation. Likewise, standard Python coming into an editor, IDE, or debugger should be automatically convertible to Socrates. Things get a little more interesting with the exec, compile, eval, and execfile functions. These could be overridden to perform the symbol transformation by importing a Socrates module, although exec is tricky because it is a statement rather than a function in Python 2; alternatively, we could simply leave them out of scope. In general, Socrates should be completely transparent to the interpreter and all libraries; only the end user will ever see any difference.
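The preprocessor idea can be sketched with the standard library's tokenize module. This is a minimal sketch under stated assumptions: Python 3, only a subset of the mapping, and hypothetical function names to_socrates and to_python. The forward direction rewrites only NAME tokens, so keywords inside strings and comments are left alone, and a multi-word keyword such as "not in" simply becomes two symbols. The reverse direction is a naive character substitution that assumes the symbols never appear inside string literals or comments; a real tool would have to handle that case.

```python
import io
import tokenize

# Hypothetical keyword subset; the full table appears under "Keyword Mapping".
SYMBOLS = {
    "and": "∧", "not": "¬", "in": "∈", "if": "→", "elif": "⇒", "else": "…",
    "pass": "□", "return": "↑", "raise": "⇈", "while": "∃", "for": "∀",
    "try": "⊤", "except": "⊥", "def": "↦", "class": "∘",
}
KEYWORDS = {symbol: word for word, symbol in SYMBOLS.items()}

def to_socrates(source):
    """Replace keyword NAME tokens with symbols; strings and comments are untouched."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in SYMBOLS:
            tok = tok._replace(string=SYMBOLS[tok.string])
        tokens.append(tok)
    # untokenize() lays tokens out by their original positions, so spacing survives.
    return tokenize.untokenize(tokens)

def to_python(source):
    """Naively map symbols back to keywords, assuming none occur in strings/comments."""
    out = []
    for i, ch in enumerate(source):
        word = KEYWORDS.get(ch)
        if word is None:
            out.append(ch)
            continue
        # Keep the keyword from fusing with a preceding identifier or keyword.
        if out and (out[-1][-1].isalnum() or out[-1][-1] == "_"):
            out.append(" ")
        out.append(word)
        # ...or with a following one, e.g. "¬x" becomes "not x".
        nxt = source[i + 1] if i + 1 < len(source) else ""
        if nxt.isalnum() or nxt == "_":
            out.append(" ")
    return "".join(out)
```

Running to_socrates on a standard Python definition of fib produces the Socrates listing shown under Examples, and to_python turns that listing back into the original source, which is the round trip the ground rules ask for.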

Steps for Ease of Programming

Socrates aims to be easy for the programmer. One of the things that plagued APL was the difficulty of entering its large character set directly. With graphical UIs and emacs or vi escape codes, entering special characters is not especially difficult these days. However, programmers like to type fast when they're in the zone and don't want to remember even more special sequences than they already do. Thus I think the easiest way to implement symbolic Python from an editor's perspective is active replacement as you type: you type standard Python and the keywords are automatically exchanged for their symbols. For instance, you start typing "if", and as soon as you type a space the "if" is magically replaced by the "→" character. This can be implemented in many IDEs and is quite readily done on the web in JavaScript as well. Likewise, Socrates is easier to read when you can hover with the mouse, or place the caret in an editor, and the standard Python for the symbol pops up after a short interval.
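The replace-as-you-type behaviour can be modelled as a small function that an editor would call on every keystroke. This is a sketch, not a plugin for any particular editor: on_key, type_text, and the subset of SYMBOLS are hypothetical, and treating ":" as a trigger alongside the space (so that "else:" converts) is an assumption. Waiting for a word boundary is what keeps "if" inside "elif" or "iffy" from being replaced too early; a real plugin would also skip strings and comments.

```python
import re

# Hypothetical subset of the mapping; an editor plugin would load the full table.
SYMBOLS = {"if": "→", "elif": "⇒", "else": "…", "for": "∀", "in": "∈",
           "def": "↦", "return": "↑", "not": "¬", "and": "∧"}

# Keys that end a word and trigger replacement (":" is an assumed extra trigger).
TRIGGERS = " :"
# The run of word characters immediately before the caret.
LAST_WORD = re.compile(r"\w+$")

def on_key(buffer, key):
    """Return the new buffer after typing `key` at the end of `buffer`."""
    if key in TRIGGERS:
        match = LAST_WORD.search(buffer)
        if match and match.group(0) in SYMBOLS:
            buffer = buffer[:match.start()] + SYMBOLS[match.group(0)]
    return buffer + key

def type_text(text):
    """Simulate a programmer typing `text` one character at a time."""
    buffer = ""
    for key in text:
        buffer = on_key(buffer, key)
    return buffer

print(type_text("if x in xs:"))  # prints → x ∈ xs:
```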

Keyword Sources

The official list of Python keywords is defined in the Python Language Reference. The keyword list is version dependent (I'm using 2.6.1), but it can easily be obtained with the following short program:

import keyword
print(keyword.kwlist)
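Once you have a mapping, the same module can report which keywords of the running interpreter still lack a symbol, and which mapped names are not keywords at all. That second check matters because the list is version dependent: print and exec, for example, are keywords in Python 2 but ordinary functions in Python 3. The partial SYMBOLS dictionary and both helpers below are hypothetical, for illustration only.

```python
import keyword

# Hypothetical partial mapping; a complete version would cover the whole table.
SYMBOLS = {"and": "∧", "not": "¬", "in": "∈", "if": "→", "else": "…",
           "def": "↦", "return": "↑", "print": "‣", "exec": "Φ"}

def unmapped_keywords(mapping):
    """Keywords of the running interpreter that have no symbol yet."""
    return [kw for kw in keyword.kwlist if kw not in mapping]

def non_keywords(mapping):
    """Mapped names that are not keywords in the running interpreter."""
    return sorted(name for name in mapping if not keyword.iskeyword(name))

print(unmapped_keywords(SYMBOLS))
print(non_keywords(SYMBOLS))  # e.g. exec and print under Python 3
```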


Given this list, we can construct a symbol table mapping the keywords to equivalent international mathematical symbols. Some mappings are quite obvious, such as logical "and" becoming the "∧" symbol. Others are more arbitrary and inexact, such as "while" mapping to "∃", which doesn't mean exactly what the existential quantifier does in formal logic. In this regard, what we are doing is similar to how mathematical symbols are reused across different fields, where a symbol may have a related but not identical meaning. For instance, the absolute value symbol, "|…|", is also used for set cardinality.

For the list of Unicode special characters I've used the Table of Mathematical Symbols, Greek Letters in Mathematics, and Table of Logic Symbols, all on Wikipedia. The symbols should display properly in your browser if it supports Unicode, which most browsers these days do by default.

Keyword Mapping

Each entry below lists the standard Python keyword, the selected symbol, and the rationale for the choice. This list is somewhat arbitrary and not necessarily the best or ideal mapping, so if you have better recommendations, let me know.

Keyword   Symbol  Rationale

Expressions
and       ∧       From formal logic
or        ∨       From formal logic
not       ¬       From formal logic
in        ∈       Element exists in the list, as in set theory
not in    ∉       Negation of "exists in"
is                Congruence: is the same object
is not            Not technically correct since this is isomorphism; more a mnemonic
None      ∅       Empty set

Multi-Arg Expressions
if        →       As in formal logic, except placed before the statement, as in Python syntax
else      …       Ellipsis; actually its own single character, not three dots
lambda    λ       Not purely symbolic since it's Greek, but it can't really be anything else

Simple Statements
assert            Iff: the statement is true if and only if the following is true
pass      □       QED: this expression is done, go to the next one
del               Drop the following from memory; from the Webb operator
print     ‣       Alternative form of QED: the output as result
return    ↑       Function return; those of you old enough may remember this from Smalltalk
yield             Return for generator functions
raise     ⇈       Like function return, but with two arrows
break     ⊗       Tensor, but it looks like a breakpoint
continue          XOR, but we're using it like a graphical breakpoint continuation
import    ∪       Union the existing module list with this module
as                Alternative form of "definition"
from              From a subset of this module, perform an import
global    Γ       Include this in the "modular group" of all variable definitions
exec      Φ       The work function in physics, similar to the APL execute symbol

Compound Statements
elif      ⇒       A different notation for "if", to distinguish it from "→"
while     ∃       While the true condition exists, execute the body
for       ∀       For all elements of the list, execute the body
try       ⊤       Top: try this statement first
except    ⊥       For something raised "independent" of the execution path
finally           Because every evaluation "entails" the following statement
with              "Infer" the following in order to execute the statement below
def       ↦       Function definition, although here it comes before the function name
class     ∘       Function composition, although in front, and more a set than a composition of functions

Examples

Let's see what this looks like in practice for some small sample programs. Here's the Fibonacci sequence in Socrates:

↦ fib(n):
    → n == 0:
        ↑ 0
    ⇒ n == 1:
        ↑ 1
    …:
        ↑ fib(n-1) + fib(n-2)
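For comparison, here is the same function in standard Python, i.e. what a Socrates preprocessor would hand to the interpreter. It is a keyword-for-keyword translation of the listing above, using ↦ for def, → for if, ⇒ for elif, … for else, and ↑ for return.

```python
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

print([fib(n) for n in range(10)])  # prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```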

Let's try this with the 8-Queens problem from the Python Wiki, which demonstrates many of the symbols in a longer program:

BOARD_SIZE = 8

∘ BailOut(Exception):
    □

↦ validate(queens):
    left = right = col = queens[-1]
    ∀ r ∈ reversed(queens[:-1]):
        left, right = left-1, right+1
        → r ∈ (left, col, right):
            ⇈ BailOut

↦ add_queen(queens):
    ∀ i ∈ range(BOARD_SIZE):
        test_queens = queens + [i]
        ⊤:
            validate(test_queens)
            → len(test_queens) == BOARD_SIZE:
                ↑ test_queens
            …:
                ↑ add_queen(test_queens)
        ⊥ BailOut:
            □
    ⇈ BailOut

queens = add_queen([])
‣ queens
‣ "\n".join(". "*q + "Q " + ". "*(BOARD_SIZE-q-1) ∀ q ∈ queens)
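And here is the standard Python it corresponds to, keyword for keyword. One assumption: the two print statements are written as print() calls so that the sketch also runs under Python 3. The search is plain backtracking: validate raises BailOut when the newest queen shares a column or diagonal with an earlier one, and add_queen catches it and tries the next column.

```python
BOARD_SIZE = 8

class BailOut(Exception):
    pass

def validate(queens):
    # Check only the newest queen against the ones placed before it.
    left = right = col = queens[-1]
    for r in reversed(queens[:-1]):
        left, right = left-1, right+1
        if r in (left, col, right):
            raise BailOut

def add_queen(queens):
    for i in range(BOARD_SIZE):
        test_queens = queens + [i]
        try:
            validate(test_queens)
            if len(test_queens) == BOARD_SIZE:
                return test_queens
            else:
                return add_queen(test_queens)
        except BailOut:
            pass
    raise BailOut

queens = add_queen([])
print(queens)
print("\n".join(". "*q + "Q " + ". "*(BOARD_SIZE-q-1) for q in queens))
```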

Where Do We Go From Here?

Mostly I did Socrates as a Gedankenexperiment (a thought experiment, for those of you whose German is rusty), just to see what it would look like and to go through the exercise of modeling programming constructs with mathematical notation. Some of the symbols I'm happy with, some I'm not too fond of; it could use some refinement. If anyone would like, I can attempt a mapping of the built-in functions next, and maybe a Python conversion routine, some editor macros, and JavaScript transformation functions. We'll see whether anyone is interested or it's just my own pet eccentricity.







6 comments:

  1. So you've changed Python to a non-ASCII syntax. What does this make easier? What problems does it help prevent?

  2. This is fun and interesting! Ignore the haters.

    What about a python program that does the conversion automatically?

  3. 2007 called, they want their April Fools' joke back (aka PEP 3117).

  4. Interesting... from the zen of python
    Readability counts.
    For people who aren't versed in math this is going to be problematic.

  5. I wouldn't wish this language on my worst enemy. It is fun to look at though!

  6. At first glance, this looked like trouble.

    However, I am a native English (American?!) speaker, and an avid Pythonista.

    Then I looked at Chinese Python, and realized that Socrates could serve as the foundation of a "lingua franca" version of Python, so that any other version (English, Chinese, etc.) could be translated to and from.

    If Chinese Python is useful (to +-20% of the Earth's population), Socrates could prove very handy indeed!