Friday, September 17, 2010

Symbology, APL, and Chinese Python

In mathematics we make heavy use of symbology.  For instance, we use "+" instead of "plus", "∑" instead of "sum", "π" instead of "pi".  This aids mathematics in being clear, compact, and universal across all human languages.

In programming languages, we generally only use English.  There are a few exceptions for grouping characters like "()" and "{}", punctuation like ":", and the basic mathematical "+", "-", "/".  However, the vast majority of the language is in English: "if", "else", "for", "while", "import".  Python is no exception to this tendency.

English speakers may at this point simply claim that everyone must learn English to program, and that is that.  However, this is somewhat unrealistic, given the long and difficult course of study that non-native speakers need to understand English well, and it invites pushback with the charge of cultural imperialism.  Even among those programmers who can get by in English, it is still much easier and more comfortable to use one's native language, as evidenced by the large number of translations available for standard programming works.

German is a typical example.  The language is similar enough to English that a German speaker could work through an English programming text with the aid of a dictionary.  However, for all but the most gifted learners, this is much slower than reading a German translation.  I've seen a bookstore filled with German programming books on almost any language and system you can imagine.  I've also seen SAP source code written in ABAP with English control words but with all variables and comments in German.

With Chinese the difference is even more striking.  Not only is it harder for a Chinese speaker to understand English with a dictionary than for a German speaker, whose language is closely related to English, but there is the added difficulty of international character sets, input methods, and the "ASCII-first" mentality.  In addition, with China becoming a greater economic and social power, there are over ten times as many Chinese speakers on the Internet as German speakers, with a corresponding increase in demand for understandable code.  In fact, there is even a Python fork where the control words are entirely in Chinese:

Chinese Python

Here is an extract.  To help in understanding it: "回答" means "answer", "有/没有" means "have / doesn't have", "读入" means "read", "写" means "write", "如" means "if", "不然" means "else" (used here with a condition, like "elif"), and "否则" means "otherwise":

回答 = 读入('你认为中文程式语言有存在价值吗 ? (有/没有)')
如 回答 == '有':
 写 '好吧, 让我们一起努力!'
不然 回答 == '没有':
 写 '好吧,中文并没有作为程式语言的价值.'
 写 '请认真考虑后再回答.'

It may be a bit of a struggle, but if you look at it a few times you can see it's a simple program which happens to be asking about the value of Chinese as a programming language.  It asks the user for input, and then, based on the user's answer, uses an if block to choose which responses to write to the output.

Chinese Python is a particularly good example for English speakers because it shows how different and difficult it is to think and compose programs in a very dissimilar written language.  This may help give a perspective on the difficulty faced by Chinese speakers in working with English programs.  I had a friend from Hong Kong whose way of dealing with this problem in programming classes was to look up the Chinese words for the English in his English programming textbook and write them above the English words, which made the text much easier to review and study.

So on the one hand we have the monolingual approach of using only English keywords and forcing everyone to learn English; on the other hand we have the fragmentation approach, whereby every major language gets its own fork so it can have control words in its native tongue.  Between these two extremes I could see a compromise: multilingual keywords for the same code base.  You could have a language-locale header in every source file, as with XML encodings, and then process the keywords in the source code, the *.py file, through the lookup table for that language.  In the compiled binary, the *.pyc, you could just store the keyword ID.  This would let you have source files in multiple languages in the same project, and furthermore let a person view the source and debug a binary with the keywords shown in their own language at runtime.  So the Chinese Python example above would look, to someone with an English overlay, like:

回答 = raw_input('你认为中文程式语言有存在价值吗 ? (有/没有)')
if 回答 == '有':
 print '好吧, 让我们一起努力!'
elif 回答 == '没有':
 print '好吧,中文并没有作为程式语言的价值.'
 print '请认真考虑后再回答.'

As you can see, this is much more understandable to English users.  We could have similar overlays for English, Chinese, German, French, Spanish, Italian, Japanese, etc.  Right-to-left languages such as Arabic or Hebrew may be slightly more difficult, but with an abstracted user display class they could be made to work seamlessly.
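
The overlay idea can be sketched in a few lines of Python 3.  This is only an illustration under stated assumptions: the "# locale:" header format, the KEYWORDS table, the keyword IDs, and the translate function are all hypothetical names, not part of Python or of Chinese Python.  The sketch rewrites source text rather than compiled *.pyc files, and it treats quoted strings as single tokens so that text inside quotes is left alone.

```python
import re

# Hypothetical lookup tables: keyword ID -> spelling in each locale.
KEYWORDS = {
    "en": {"IF": "if", "ELIF": "elif", "ELSE": "else",
           "PRINT": "print", "INPUT": "raw_input"},
    "zh": {"IF": "如", "ELIF": "不然", "ELSE": "否则",
           "PRINT": "写", "INPUT": "读入"},
}

# A token is a quoted string, a run of word characters, or one other character.
TOKEN = re.compile(r"'[^']*'|\"[^\"]*\"|\w+|\W")

# Assumed header format for the first line of a source file.
HEADER = re.compile(r"#\s*locale:\s*(\w+)\s*$")


def translate(source, target):
    """Re-spell the keywords of `source` in the `target` locale."""
    header, _, body = source.partition("\n")
    match = HEADER.match(header)
    if match is None:
        raise ValueError("expected a '# locale: xx' header on the first line")
    # Map each source-language spelling to its language-neutral keyword ID.
    to_id = {word: key_id for key_id, word in KEYWORDS[match.group(1)].items()}
    spell = KEYWORDS[target]
    tokens = TOKEN.findall(body)
    # A string literal token includes its quotes, so it never matches a keyword.
    new_body = "".join(spell[to_id[t]] if t in to_id else t for t in tokens)
    return "# locale: %s\n%s" % (target, new_body)


chinese = (
    "# locale: zh\n"
    "回答 = 读入('(有/没有)')\n"
    "如 回答 == '有':\n"
    "    写 '好吧'\n"
)
print(translate(chinese, "en"))
```

Because every spelling passes through a keyword ID, translating to English and back to Chinese returns the original text.  A real implementation would also have to deal with comments and with identifiers that happen to collide with a keyword in another locale, which this sketch ignores.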

Besides allowing language overlays in Python, it is also possible to adopt universal mathematical symbols as the keyword language.  This could be an additional language overlay or, alternatively, the default keyword language, understandable across all languages.  The language APL took this concept to a new level in the 1960s, even creating new symbols for highly compact matrix operations.


Below is an example APL program that finds all primes in the interval 1..R:

Although very inefficient, the program is rather brilliant and elegant once you study APL.  However, such dense symbology, especially with custom-made symbols, is difficult to understand and runs contrary to ease of reading, which is one of the goals of Python.

Instead, let us apply the same symbology principle but only to Python keywords, using existing mathematical symbols that any computer science graduate would be familiar with from formal logic.  Furthermore, we will use symbols readily available in ASCII so we do not need any custom characters.

Using the substitutions:

<= raw_input (from unix stdin)
=> print (from unix stdout)
? if
?: elif
:: else

Substituting this into our earlier example results in:

回答 = <=('你认为中文程式语言有存在价值吗 ? (有/没有)')
? 回答 == '有':
 => '好吧, 让我们一起努力!'
?: 回答 == '没有':
 => '好吧,中文并没有作为程式语言的价值.'
 => '请认真考虑后再回答.'
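
A minimal sketch, in Python 3, of how such symbols could be expanded back into the English keywords.  The function names expand_line and expand are hypothetical, and the placement rules are assumptions of this sketch: "?", "?:", "::" and "=>" are recognized only at the start of a statement, and "<=" is read as raw_input only when an opening parenthesis follows it directly.

```python
import re

# Statement-level symbols, with "?:" listed before "?" so the longer one wins.
STATEMENT_SYMBOLS = [("?:", "elif"), ("::", "else"), ("=>", "print"), ("?", "if")]


def expand_line(line):
    """Expand the symbolic keywords on one line into English keywords."""
    stripped = line.lstrip()
    indent = line[:len(line) - len(stripped)]
    for symbol, keyword in STATEMENT_SYMBOLS:
        if stripped.startswith(symbol):
            stripped = keyword + stripped[len(symbol):]
            break
    # Assumption: "<=" immediately followed by "(" is a call to raw_input.
    stripped = re.sub(r"<=(?=\()", "raw_input", stripped)
    return indent + stripped


def expand(source):
    """Expand every line of a symbolic source text."""
    return "\n".join(expand_line(line) for line in source.split("\n"))


symbolic = (
    "回答 = <=('你认为中文程式语言有存在价值吗 ? (有/没有)')\n"
    "? 回答 == '有':\n"
    " => '好吧, 让我们一起努力!'\n"
)
print(expand(symbolic))
```

Checking only the start of each statement keeps the "?" inside the quoted question from being mistaken for "if".  The "<=" rule is a simplification: "<=" already means less-than-or-equal in Python, so a comparison written as a <=(b) would be misread, and a real design would need a different symbol or a proper parser.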

As Python evolves and encounters more of the non-English-speaking community, we may expect keywords and built-in functions to become more and more symbolic, improving legibility for a worldwide audience.

1 comment:

  1. If you want to learn and get a good sense for APL, and you're on Windows, the best I've found is Dyalog. You can evaluate it free for 90 days, and it has a good IDE with all the special characters on a toolbar, along with their definitions. Definitely the best tool for learning APL.