Showing posts with label apl. Show all posts

Wednesday, September 22, 2010

Socrates: A Proposal for Purely Symbolic Python

The Game

Inspired by Socrates' winning goal for the Greeks, I've recently been considering a more purely symbolic, mathematical form of python. After having seen the concept of Chinese Python and playing around with APL a little, I had the eureka shot of applying mathematical symbols to the language itself. In homage to the spirit of python, I shall name this symbolic python: Socrates.

Ground Rules

A principle we have in Socrates for substituting mathematical symbols for Python keywords and built-ins is that the correspondence should be one-to-one, non-overlapping, automated, and invertible. By one-to-one, we mean there should be exactly one Unicode symbol for each keyword. By non-overlapping, we mean that the symbol should not collide with any current Python usage: we cannot use any character from the existing Python character set, so every symbol must be separate and distinct from the ASCII characters. By automated, we mean that a conversion routine should exist to convert between existing Python keywords and symbolic keywords. By invertible, we mean that we should be able to go from existing keywords to symbolic keywords, and back again, without limitation.
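These ground rules are easy to check mechanically. Here is a minimal sketch, assuming a small hypothetical subset of the symbol table (only symbols that appear elsewhere in this post) and hypothetical helper names `check_mapping` and `invert`. It is written for Python 3 so it can be run today; it is one way to test a candidate table, not a finished tool.

```python
# Hypothetical subset of the keyword-to-symbol table, using only symbols
# that appear elsewhere in this post.
KEYWORD_TO_SYMBOL = {
    "and": "∧",
    "not": "¬",
    "in": "∈",
    "if": "→",
    "elif": "⇒",
    "else": "…",
    "for": "∀",
    "while": "∃",
    "def": "↦",
    "return": "↑",
}


def check_mapping(mapping):
    """Return a list of ground-rule violations (empty if the mapping is valid)."""
    problems = []
    seen = {}
    for keyword, symbol in mapping.items():
        # One-to-one: each keyword gets exactly one character.
        if len(symbol) != 1:
            problems.append("%s: symbol must be a single character" % keyword)
        # Non-overlapping: the symbol must lie outside ASCII.
        if any(ord(ch) < 128 for ch in symbol):
            problems.append("%s: symbol overlaps ASCII" % keyword)
        # Invertible: no two keywords may share a symbol.
        if symbol in seen:
            problems.append("%s and %s share %s" % (seen[symbol], keyword, symbol))
        seen[symbol] = keyword
    return problems


def invert(mapping):
    """Build the symbol-to-keyword table used for the reverse conversion."""
    return {symbol: keyword for keyword, symbol in mapping.items()}
```

An empty result from `check_mapping` means the table can be inverted safely, and inverting twice gives back the original table.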

Why Python?

Python is an ideal source language for Socrates experimentation. It is noted for its clarity and lack of extraneous punctuation and grouping characters, making it ideally suited to symbolic enhancement. It also offers advantages in mapping mathematical equations and algorithms to source code, with native support for complex numbers and its powerful slice notation. In addition, creating a dialect of Python using symbols for keywords is much more efficient than creating a language from scratch, since it takes advantage of the large installed base and library set of the language. This makes Python an ideal environment for symbolic expression.

What This Proposal Is NOT

Socrates is not a request to change the Python language. It is not something that requires altering the CPython source code. It is simply an overlay, if you will: a filter, a viewport, a different way of seeing the exact same code. A preprocessor, or better yet an editor or IDE, should be able to take Socrates and transform it into standard Python for compilation and evaluation. Likewise, standard Python coming into an editor, IDE, or debugger should be automatically convertible to Socrates. This gets a little interesting when you try to involve the exec, compile, eval, and execfile functions. They could be overridden to do the symbol transformation through an import statement for a symbolic Python module (tricky with exec), or we could just leave that out of scope. Generally Socrates should be completely transparent to the interpreter and all libraries. Only an end-user will ever see any difference.
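To make the filter idea concrete, here is a minimal sketch of such a preprocessor in both directions. It assumes a hypothetical subset of the symbol table (symbols taken from this post) and hypothetical function names `to_socrates` and `to_python`. It uses a regular expression that skips single-line strings and comments; triple-quoted strings are not handled, so treat it as an illustration rather than a complete converter. It is written for Python 3.

```python
import re

# Hypothetical subset of the symbol table (symbols taken from this post).
KEYWORD_TO_SYMBOL = {
    "def": "↦", "if": "→", "elif": "⇒", "else": "…",
    "return": "↑", "for": "∀", "in": "∈",
}
SYMBOL_TO_KEYWORD = {s: k for k, s in KEYWORD_TO_SYMBOL.items()}

# Group 1 matches single-line string literals and comments, which must be
# left untouched. Triple-quoted strings are NOT handled in this sketch.
_SKIP = (
    r"('(?:\\.|[^'\\\n])*'"   # single-quoted string
    r'|"(?:\\.|[^"\\\n])*"'   # double-quoted string
    r"|#[^\n]*)"              # comment to end of line
)
# Group 2 matches a whole keyword (Python side) or one symbol (Socrates side).
_KEYWORDS = re.compile(
    _SKIP + r"|\b(" + "|".join(map(re.escape, KEYWORD_TO_SYMBOL)) + r")\b"
)
_SYMBOLS = re.compile(
    _SKIP + "|([" + "".join(map(re.escape, SYMBOL_TO_KEYWORD)) + "])"
)


def _convert(source, pattern, table):
    def swap(match):
        # Strings and comments (group 1) pass through unchanged.
        if match.group(1) is not None:
            return match.group(1)
        return table[match.group(2)]
    return pattern.sub(swap, source)


def to_socrates(source):
    """Replace standard keywords with their symbols."""
    return _convert(source, _KEYWORDS, KEYWORD_TO_SYMBOL)


def to_python(source):
    """Replace symbols with the standard keywords."""
    return _convert(source, _SYMBOLS, SYMBOL_TO_KEYWORD)


PYTHON_SOURCE = '''def fib(n):
    if n == 0:
        return 0  # the words if and in stay as-is inside comments
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
'''

if __name__ == "__main__":
    print(to_socrates(PYTHON_SOURCE))
```

Because the symbols never overlap ASCII, the reverse pass only has to look for single characters, and converting to Socrates and back returns the original source unchanged.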

Steps for Ease of Programming

Socrates aims to be easy for the programmer. One of the things that plagued APL was the difficulty of directly entering a large character set. With graphical UIs and emacs or vi escape codes it's not especially difficult these days to enter special characters. However, programmers like to type fast when they're in the zone and don't want to have to remember even more special sequences than they already do. Thus I think the easiest way to implement symbolic Python from an editor perspective is to have active-replace as you type, so you can type standard Python and the standard keywords will automatically be exchanged for symbolic ones. For instance, you start typing "if" and then, after you type a space, the "if" gets magically replaced by the "→" character. This can be implemented in many IDEs and is quite readily implemented on the web in JavaScript as well. Likewise, Socrates can be more readily understood when you can hover with the mouse, or the caret in an editor, and the standard Python for the symbol pops up after a short interval for easy interpretation.
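The active-replace behavior can be sketched independently of any particular editor. The function names `on_keypress` and `type_text` and the small symbol table below are hypothetical, and the sketch ignores whether the cursor is inside a string literal, which a real editor plugin would need to check. It is written for Python 3.

```python
# Hypothetical subset of the symbol table (symbols taken from this post).
KEYWORD_TO_SYMBOL = {"if": "→", "elif": "⇒", "else": "…", "for": "∀", "in": "∈"}


def on_keypress(buffer, key):
    """Return the new line buffer after the user types `key`.

    When a space or colon ends a word that is a known keyword, the word is
    swapped for its symbol before the key is appended.
    """
    if key in (" ", ":"):
        # Walk backwards to find where the trailing word starts.
        start = len(buffer)
        while start > 0 and (buffer[start - 1].isalnum() or buffer[start - 1] == "_"):
            start -= 1
        word = buffer[start:]
        if word in KEYWORD_TO_SYMBOL:
            buffer = buffer[:start] + KEYWORD_TO_SYMBOL[word]
    return buffer + key


def type_text(text):
    """Simulate typing `text` one key at a time."""
    buffer = ""
    for key in text:
        buffer = on_keypress(buffer, key)
    return buffer
```

Typing `if x in y:` one key at a time through `type_text` leaves `→ x ∈ y:` in the buffer, while a word such as `gif` is left alone because only whole words are matched.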

Keyword Sources

The official list of Python keywords is maintained and defined in the Python Reference Document. The keyword list, which is version dependent (I'm using 2.6.1), can easily be obtained through the following short program:

import keyword
print keyword.kwlist


Given this list, we can construct a symbol table mapping from the keywords to equivalent international mathematical symbols. Some mappings are quite obvious, such as logical "and" being the "∧" symbol. Others are more arbitrary and inexact, such as "while" mapping to "∃", which doesn't mean exactly what the exists symbol does in formal logic. In this regard what we are doing is similar to what happens when mathematical symbols are applied across different fields, whereby a symbol may have a related but not identical meaning. For instance, the symbol for absolute value, "|…|", can also be used for set cardinality.

For the list of Unicode supported special characters I've used the Table of Mathematical Symbols, Greek Letters in Mathematics, and Table of Logic Symbols, all at Wikipedia. This should display properly in your browser if you have Unicode enabled, which most browsers these days have by default.

Keyword Mapping

Each entry below lists the standard Python keyword, the selected symbol, and the rationale for the symbol selection. This list is somewhat arbitrary, and not necessarily the best or ideal mapping, so if you have better recommendations let me know.

Keyword     Symbol   Rationale

Expressions
and         ∧        From formal logic
or                   From formal logic
not         ¬        From formal logic
in          ∈        Element exists in list, as in set theory
not in               Negation of exists in
is                   Congruence, is the same object
is not               Not technically correct since this is isomorphism, more a mnemonic
None                 Empty set

Multi-Arg Expressions
if          →        As in formal logic, except before the statement as in python syntax
else        …        Ellipsis, actually its own character not three dots
lambda      λ        Not purely symbolic since it's Greek, but can't really be anything else

Simple Statements
assert               Iff: the statement is true if and only if the following is true
pass        □        QED, this expression is done, go to the next one
del                  Drop the following from memory, from the Webb operator
print       ‣        Alternative form of QED, the output as result
return      ↑        Function return, those of you old enough may remember this from Smalltalk
yield                Return for generator functions
raise       ⇈        Like function return, but with two arrows
break                Tensor, but looks like a breakpoint
continue             XOR, but we're using it like a graphical breakpoint continuation
import               Union the existing module list with this module
as                   Alternative form of "definition"
from                 From a subset of this module, perform an import
global      Γ        Include this in the "modular group" of all variable definitions
exec        Φ        The work function in physics, similar to the APL execute symbol

Compound Statements
elif        ⇒        Different notation for "if" to distinguish from "→"
while       ∃        While the true condition exists, execute the body
for         ∀        For all elements of the list, execute the body
try         ⊤        Top statement - try this statement first
except      ⊥        For something raised "independent" of the execution path
finally              Because every evaluation "entails" the following statement
with                 "Infer" the following in order to execute the statement below
def         ↦        Function definition, although this is in front of the function name
class       ∘        Function composition, although in front, and more a set than a composition of functions

Examples

Let's see what this looks like in practice for some small sample programs. Here's the Fibonacci sequence in Socrates:

↦ fib(n):
    → n == 0:
        ↑ 0
    ⇒ n == 1:
        ↑ 1
    …:
        ↑ fib(n-1) + fib(n-2)

Let's try this with the 8-Queens problem from the Python Wiki, which demonstrates many of the symbols in a longer program:

BOARD_SIZE = 8

∘ BailOut(Exception):
    □

↦ validate(queens):
    left = right = col = queens[-1]
    ∀ r ∈ reversed(queens[:-1]):
        left, right = left-1, right+1
        → r ∈ (left, col, right):
            ⇈ BailOut

↦ add_queen(queens):
    ∀ i ∈ range(BOARD_SIZE):
        test_queens = queens + [i]
        ⊤:
            validate(test_queens)
            → len(test_queens) == BOARD_SIZE:
                ↑ test_queens
            …:
                ↑ add_queen(test_queens)
        ⊥ BailOut:
            □
    ⇈ BailOut

queens = add_queen([])
‣ queens
‣ "\n".join(". "*q + "Q " + ". "*(BOARD_SIZE-q-1) ∀ q ∈ queens)

Where Do We Go From Here?

Mostly I did Socrates as a Gedankenexperiment (a thought experiment, for those of you whose German is rusty), just to see what it would look like and to go through the task of considering mathematical modeling of programming functions. Some of the symbols I'm happy with, some I'm not too fond of; the mapping could use some refinement. If anyone would like, I can attempt a mapping of the built-in functions next, and maybe a Python conversion routine, some editor macros, and JavaScript transformation functions. We'll see if anyone is interested or if it's just my own pet eccentricity.







Friday, September 17, 2010

Symbology, APL, and Chinese Python


In mathematics we make heavy use of symbology.  For instance, we use "+" instead of "plus", "∑" instead of "sum", "π" instead of "pi".  This aids mathematics in being clear, compact, and universal across all human languages.

In programming languages, we generally only use English.  There are a few exceptions for grouping characters like "()" and "{}", punctuation like ":", and the basic mathematical "+", "-", "/".  However the vast majority of the language is in English, such as "if", "else", "for", "while", "import".  Python is no exception to this tendency.

English speakers may at this point simply claim that everyone must learn English to program, and that is that.  However this is somewhat unrealistic, given the long and difficult course of study non-native speakers need to understand English well, and furthermore it invites pushback with the charge of cultural imperialism.  Even among those programmers who can get by in English, it is still much easier and more comfortable to use one's native language, as evidenced by the large number of translations available for standard programming works.

German is a typical example.  The language is close enough to English that a German speaker could understand an English programming text with the aid of a dictionary. However, for all but the most gifted learners, this is much slower than reading a German translation.  I've seen a bookstore filled with German programming books for almost any language and system you can imagine.  I've seen SAP source code written in ABAP with English control words but all variables and comments in German.

With Chinese the difference is even more striking.  Not only is it more difficult for a Chinese speaker to understand English with a dictionary than it is for a German speaker, whose language has a similar background to English, but there is the added difficulty of international character sets, input methods, and the "ASCII-first" mentality.  In addition, with China becoming a greater economic and social power, there are over ten times as many Chinese speakers on the Internet as German speakers, with a corresponding increase in demand for understandable code.  In fact there is even a Python fork where the control words are entirely in Chinese:

Chinese Python

Here is an extract. To help in understanding: "回答" means "answer", "有/没有" means "have / doesn't have", "读入" means "read", "写" means "write", "如" means "if", "不然" means "else" (used here as "elif"), and "否则" means "otherwise" (used here as the final "else"):

回答 = 读入('你认为中文程式语言有存在价值吗 ? (有/没有)')
 
如 回答 == '有':
 写 '好吧, 让我们一起努力!'
不然 回答 == '没有':
 写 '好吧,中文并没有作为程式语言的价值.'
否则:
 写 '请认真考虑后再回答.'

It may be a bit of a struggle, but if you look at it a few times you can see it's a simple program which happens to be asking about the value of Chinese as a programming language.  It asks the user for input, and then based on the user answer, has an if block to write three different responses on the output.

Chinese Python is a particularly good example for English speakers because it shows how different and difficult it is to think and compose programs in a very dissimilar written language.  This may help give a perspective on the difficulty faced by Chinese speakers in working with English programs.  I had a friend from Hong Kong whose way of dealing with this problem in programming classes was to look up the Chinese words for the English in his English programming textbook and write the Chinese words above the English ones, which made the text much easier to review and study.

So on the one hand we have the monolingual approach of only using English keywords and forcing everyone to learn English. On the other hand we have the fragmentation approach, whereby every major language does its own fork so it can have control words in its native tongue. Between these two extremes I could see a compromise of multilingual keywords for the same code base.  You could have a language-locale header for every source file, as with XML encodings, and then process keywords from that language's lookup table for the given source code, the *.py file.  In the compiled binary, the *.pyc, you could just store the keyword ID.  This would let you have source files in multiple languages in the same project, and furthermore let a person view the source and debug a binary in their own language, with the keywords of that particular user's language chosen at runtime.  So the Chinese Python example above would look, to someone with an English overlay, like:

回答 = raw_input('你认为中文程式语言有存在价值吗 ? (有/没有)')
 
if 回答 == '有':
 print '好吧, 让我们一起努力!'
elif 回答 == '没有':
 print '好吧,中文并没有作为程式语言的价值.'
else:
 print '请认真考虑后再回答.'

As you can see this is much more understandable to English users.  We could have similar overlays for English, Chinese, German, French, Spanish, Italian, Japanese, etc.  Right-to-left languages such as Arabic or Hebrew may be slightly more difficult, but with an abstracted user display class they could be made to work seamlessly.
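The locale-header idea can be sketched in a few lines. Everything named here is an assumption made for illustration: the `# keywords: zh` header format, the `KEYWORDS` tables, and the `overlay` function are hypothetical, and the Chinese entries are just the words glossed in this post. The sketch skips single-line strings and comments with a regular expression and does not handle triple-quoted strings. It only rewrites text, so it runs on Python 3 even though the sample program is Python 2 style.

```python
import re

# Hypothetical per-locale keyword tables, keyed by canonical (English) keyword.
# The Chinese entries are the ones glossed in this post.
KEYWORDS = {
    "en": {"if": "if", "elif": "elif", "else": "else",
           "print": "print", "raw_input": "raw_input"},
    "zh": {"if": "如", "elif": "不然", "else": "否则",
           "print": "写", "raw_input": "读入"},
}

# Hypothetical header format: a first-line comment such as "# keywords: zh".
_HEADER = re.compile(r"#\s*keywords:\s*(\w+)")

# Strings and comments (group 1) are skipped; group 2 is any identifier-like word.
_TOKEN = re.compile(
    r"('(?:\\.|[^'\\\n])*'"
    r'|"(?:\\.|[^"\\\n])*"'
    r"|#[^\n]*)"
    r"|(\w+)"
)


def overlay(source, target):
    """Re-render `source` with the keywords of the `target` locale."""
    header, _, body = source.partition("\n")
    match = _HEADER.match(header)
    if match is None:
        raise ValueError("missing keyword-locale header")
    origin = match.group(1)
    # Map each origin-locale keyword to its target-locale spelling.
    table = {KEYWORDS[origin][k]: KEYWORDS[target][k] for k in KEYWORDS[origin]}

    def swap(m):
        if m.group(1) is not None:
            return m.group(1)  # strings and comments pass through unchanged
        return table.get(m.group(2), m.group(2))

    return "# keywords: %s\n%s" % (target, _TOKEN.sub(swap, body))


# Shortened version of the Chinese Python extract, with the hypothetical header.
SOURCE = (
    "# keywords: zh\n"
    "回答 = 读入('(有/没有)')\n"
    "如 回答 == '有':\n"
    " 写 '好'\n"
    "不然 回答 == '没有':\n"
    " 写 '不'\n"
    "否则:\n"
    " 写 '请'\n"
)

if __name__ == "__main__":
    print(overlay(SOURCE, "en"))
```

Variable names and string contents are left in the author's language; only the keywords change, so the same file can be shown to each reader with their own overlay.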

Besides allowing language overlays in Python, it is also possible to adopt universal mathematical symbols as the keyword language.  This could be an additional language overlay, but alternatively the default keyword language, understandable across all languages.  The language APL took this concept to a new level in the 1960s, even creating new symbols for highly compact matrix operations.

APL

Below is an example APL program that finds all primes in the interval 1..R:

(~R∊R∘.×R)/R←1↓⍳R

Although very inefficient, this is rather brilliant and elegant once you study APL.  However, such dense symbology, especially with custom-made symbols, is difficult to understand and contrary to the ease of reading that is one of the goals of Python.
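For readers who don't know APL, here is my reading of that one-liner unpacked into Python 3, one step per line. The function name `primes_up_to` is mine, and the comments show which piece of the APL each line corresponds to; it keeps the same inefficient approach on purpose.

```python
def primes_up_to(limit):
    """Primes in 2..limit, mirroring the APL one-liner step by step."""
    # R←1↓⍳R : the integers 1..limit with the first one dropped, i.e. 2..limit
    candidates = list(range(2, limit + 1))
    # R∘.×R : the outer product, every candidate multiplied by every candidate
    products = {a * b for a in candidates for b in candidates}
    # (~R∊...)/R : keep the candidates that do not appear among the products
    return [c for c in candidates if c not in products]


if __name__ == "__main__":
    print(primes_up_to(20))
```

The Python takes several lines and three named values to say what APL says in a single expression, which is both the appeal and the readability problem of the dense notation.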

Instead, let us apply the same symbology principle but only to python keywords, using existing mathematical symbols that any computer science graduate would be familiar with from formal logic.  Furthermore we will use symbols readily available in ASCII so we do not need to use any custom characters.

Using the substitutions:

<= raw_input (from unix stdin)
=> print (from unix stdout)
? if
?: elif
:: else

Substituting this into our earlier example results in:

回答 = <=('你认为中文程式语言有存在价值吗 ? (有/没有)')
 
? 回答 == '有':
 => '好吧, 让我们一起努力!'
?: 回答 == '没有':
 => '好吧,中文并没有作为程式语言的价值.'
:::
 => '请认真考虑后再回答.'

As python evolves, and encounters more of the non-English speaking community, we may expect keywords and built-in functions to become more and more symbolic, improving legibility for a worldwide audience.