tag:blogger.com,1999:blog-15396951103858929352024-03-18T11:30:12.232-07:00Python CloudAll things python programming in the cloud.John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.comBlogger14125tag:blogger.com,1999:blog-1539695110385892935.post-47235488259739853392011-04-10T09:06:00.000-07:002011-04-10T11:42:31.624-07:00Reviving the Ancient Game of Liubo 六博 as an HTML5 Multiplayer App with Google AppEngine on Python and the Channel API<h3>What is Liubo(六博)?</h3><br />
<p>One of the oldest recorded board games, Liubo, 六博, literally "six sticks" in Chinese, has a mysterious origin and even more mysterious disappearance. Emerging early in Chinese history, it was the most played game during the Han Dynasty and remained so until the emergence of Go, 围棋, in ancient China. In spite of its abundant preservation in the archeological record, and numerous historical writings describing the game, the exact rules of the game remain unknown to this very day.<br />
</p><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://upload.wikimedia.org/wikipedia/commons/thumb/5/55/Met%2C_Earthenware_figures_playing_liubo%2C_Han_Dynasty.JPG/300px-Met%2C_Earthenware_figures_playing_liubo%2C_Han_Dynasty.JPG" imageanchor="1" style="clear:right; float:right; margin-left:1em; margin-bottom:1em"><img border="0" height="177" width="300" src="http://upload.wikimedia.org/wikipedia/commons/thumb/5/55/Met%2C_Earthenware_figures_playing_liubo%2C_Han_Dynasty.JPG/300px-Met%2C_Earthenware_figures_playing_liubo%2C_Han_Dynasty.JPG" /></a></div><br />
<br />
<h3>Liubo: Resurrection</h3><br />
<p>Fortunately, we need not leave this game in perpetual obscurity. Jean-Louis Cazaux, an author of many books on the history of chess and ancient board games, has done an <a href="http://history.chess.free.fr/liubo.htm">extensive study</a> of the historical and archeological record and posited a reconstruction of the rules. Furthermore, he has subjected these rules to extensive playtesting to discover a game that is quite entertaining to play.<br />
After some exchanges with the author I've been able to implement the game he envisioned on the web with both multiplayer and AI capability.<br />
</p><p>To play the game, you first sign in using an OpenID such as Google or Yahoo. Then you press "Play" and you will be placed into the game queue for another opponent. If you wish, you may press "Skip to AI" and play the AI instead. Then you will randomly play as either white or black, white goes first as in chess. Two sets of yin-yang sticks are then thrown automatically on the right side of the screen, one mark for yin and two marks for yang. Total the yang plus one for each set of three sticks, and you have your move number, for instance 3-2. This means you move 3 the first move, then 2 the second. To move, you may either enter a stone onto the board at the move number, or if the piece has not moved this turn, you may advance the stone. Advancements go counter-clockwise around the board; press the Moves button to see the move order. At two special positions, numbers 6 and 11, you may move inward towards the center and across the board. If you land directly on the center, your piece is promoted to an "owl" if you don't have one already. This owl can then capture other pieces and put them into your prison; for regular stones, captures only return the piece to the other player. You may have only one owl at a time. Capture all the opponent stones and you win; capture the opponent owl if you have no owl and you also win; have five stones on the board and no owl while your opponent has an owl and you also win; let the other side run out of the 10 minute move clock and you also win. After each game, your elo rank and that of your opponent (including the AI, "SiliconDragon") is adjusted. You may see the top rated players with the Ratings button. You can find the game here:<br />
</p><p><a href="http://liubo-game.appspot.com/">Click to Play Liubo 六博</a><br />
</p><br />
<h3>HTML5 Multiplayer Design</h3><br />
<p>When starting the design, I first did a static layout of the board in HTML and CSS. I wanted a design that would scale to different screen sizes so I could use it on iPad as well as various desktop sizes. Also I wanted it to work on various browsers including IE8, so I focused on using CSS drawing techniques with borders and screen percentages and tested it on various platforms. Using gradients and fallback CSS I was able to make the board, throwing sticks and pieces without using any images at all.<br />
</p><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9l3wY_pkz1Y1uwGxLILKSD4OwQFcdq1Uh-CzAiLR34cee94SJFSBZmQ79Tzeuc_J9LohDHWdyNj19RWQAJhl3ESdlXLAA6leaZYuvQFhqbUir0BZ8yyf4_j09l62g-sRapN-hJTE8_qNB/s1600/liubo-gameplay.png" imageanchor="1" style="clear:right; float:right; margin-left:1em; margin-bottom:1em"><img border="0" height="205" width="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9l3wY_pkz1Y1uwGxLILKSD4OwQFcdq1Uh-CzAiLR34cee94SJFSBZmQ79Tzeuc_J9LohDHWdyNj19RWQAJhl3ESdlXLAA6leaZYuvQFhqbUir0BZ8yyf4_j09l62g-sRapN-hJTE8_qNB/s320/liubo-gameplay.png" /></a></div><br />
<p>Once the layout was done, there remained the huge effort of programming. There are many options in creating an interactive HTML5 app today, from WebSockets, to Flash compilation to Canvas, to game-specific libraries such as Impact.js. For speed and compatibility, I chose to use div-based HTML5 javascript with jQuery. Although node.js shows promise, I prefer the established ease of use of Google AppEngine and its python backend, so I went with that. Linking the client and server is the newly released Google Channel API.<br />
</p><br />
<h3>Implementing a Google Channel API Client</h3><br />
<p>The trickiest part of the whole project was getting the Channel API to work properly. My first mental reset was realizing it is a unidirectional, not bidirectional, API. That is, the Channel API requires that the server give a channel token to a client and then the client connect to the server, however only the server is allowed to send messages to the client. The client is <em>not</em> allowed to send messages to the server over the channel. Instead, the client must use normal HTTP requests to the server which then must be matched to a client token whereby the server can send a response over the channel.<br />
</p><br />
<p>Illustrating this procedure, first here is a python snippet initiated by client request for a new channel. It runs on the appengine and creates a new channel, stores it to the datastore, and then returns the channel token along with other user data in a JSON:</p><pre>def put_channel(self, user, profile):
profile.channeltoken =
channel.create_channel(make_channel_id(user.federated_identity()))
profile.channeldate = datetime.now()
profile.put()
profile_json = {
'screenname': profile.screenname,
'character': profile.character,
'rating': profile.rating,
'rank': compute_rank(profile.rating),
'numgames': profile.numgames,
'dateadded': profile.date_added.strftime('%Y-%m-%d'),
'logoutURL': users.create_logout_url('/'),
'channeltoken': profile.channeltoken
}
self.response.out.write(simplejson.dumps(profile_json))
</pre><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/England_expects...retouched.jpg/300px-England_expects...retouched.jpg" imageanchor="1" style="clear:right; float:right; margin-left:1em; margin-bottom:1em"><img border="0" height="225" width="300" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/England_expects...retouched.jpg/300px-England_expects...retouched.jpg" /></a></div><br />
<p>On the javascript side, we receive this channeltoken, set our channel id to this token, and then create a new channel. We then open a socket and setup socket handlers. The socket open handler requests a new game to join. The socket message handler processes each message sent from the server, such as starting a multiplayer game and receiving moves from an opponent.</p><pre>self.channel = new goog.appengine.Channel(self.channelid);
self.socket = self.channel.open();
self.socket.onopen = self.socketOnOpened;
self.socketOnOpened = function() {
// give it a chance to startup
setTimeout(game.requestJoin, 5000);
};
self.socketOnMessage = function(message) {
// console.log('received message data:'+message.data);
var messageArray = $.parseJSON(message.data);
if (!messageArray) {
alert('invalid message received: ' + message.data);
return;
}
for (var i=0; i<messageArray.length; i++) {
var messageline = messageArray[i];
var command = messageline.shift();
var data = messageline;
self.commandProcessor.processCommand(command, data);
}
};
</pre><br />
<p>Going again back to the server-side python code, we create a simple function for sending messages to the client via the channel. When creating a game, we first create a unique gamekey, then send it to the channel. Another important point to note about the Channel API is illustrated here: the server must have a different unique channel to <em>each</em> client it wishes to connect to. There are no broadcast channels or multiple subscribers; it is strictly one-to-one. So the server has created two channel ids, one for the white side and one for the black side, upon client connect. Then as the game progresses, messages are sent from the server to each client via its own respective channel. A unique gamekey, sent by each client during a request, allows the server to connect a request to a particular game, and thus provide a link between two clients in a multiplayer game:<br />
</p><br />
<pre>def send_message(self, channelid, json):
logging.info('sending message to:%s payload:%s',
channelid, simplejson.dumps(json))
channel.send_message(channelid, simplejson.dumps(json))
def create_game(self):
self.create_gamekey()
game = Game(parent=gamezero)
game.gamestatus = 'WJ' # waiting for join
game.gamecreated = datetime.now()
game.gamekey = newline_encode(self.gamekey)
game.gamestatus = 'GS' # game started
game.gamestarted = datetime.now()
game.put()
self.send_message(self.black_channelid, [['PB'],
['OP',game.white_playername],['GS',self.gamekey]])
# game started, white's move
self.send_message(self.white_channelid, [['PW'],
['OP',game.black_playername],['GS',self.gamekey],['WM']])
</pre><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/England_Expects_Signal.svg/480px-England_Expects_Signal.svg.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="156" width="480" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/England_Expects_Signal.svg/480px-England_Expects_Signal.svg.png" /></a></div><br />
<p>Again returning to the client javascript side, we receive the new gamekey and begin the game. Each time after the client player moves, an HTTP POST request is sent to the server. This request contains the gamekey and the moves for this turn. We don't wait for any reply from the server to our request; instead, our previously established socket listener waits for any commands to be received via the channel API, namely, the other players moves. This methodology is how the Channel API is meant to be used in interactive applications.<br />
</p><br />
<pre>self.startGame = function(data){
$('#gamejoinstatus').html('Starting game...');
clearTimeout(self.joinTimeout);
game.gamekey = data[0];
game.startGame();
}
self.sendMoves = function(moves) {
var params = {
gamekey: self.gamekey,
moves: $.JSON.encode(moves)
};
// console.log('POST moves to server:'+params.moves);
$.post('/games', params)
.error(function(){
$('#throwturn').html('Unable to send move');
});
}
</pre><br />
<p>Returning to the python server side, we find our move processor. After receiving the HTTP POST request from the client with the game key and validating moves, we simply relay the move to the other player via the other player's channel id. In addition, we check for a validated game over condition:</p><br />
<pre>def post(self):
self.user = users.get_current_user()
self.request.get('gamekey'), self.request.get('moves'))
self.gamekey = self.request.get('gamekey')
if not self.user or not self.gamekey:
self.error(403)
elif self.request.get('moves'):
self.moves = simplejson.loads(self.request.get('moves'))
if self.decode_gamekey():
self.process_moves()
else:
self.error(403)
def process_moves(self):
# just relay moves to the other side
self.send_message(self.other_channelid(), self.moves);
for move in self.moves:
command = move[0]
if command == 'GO': # gameover
status = move[1]
winner = move[2]
reason = move[3]
self.process_gameover(status, winner, reason)
break
</pre><br />
<p>Lastly, let's see how we actually process moves on the client side. I've adopted a simple json-based symmetrical send/receive protocol, so all messages back and forth are sent as json-encoded lists. Each item in the list is itself a list, consisting of a two-character string command code and zero or more data items. The heart of this is the client command processor that receives each different command type from the server and then dispatches it to the appropriate handler:<br />
</p><br />
<pre>self.processCommand = function(command, data) {
// console.log('Command and data received: '+command+' '+data);
switch (command) {
case 'WJ': self.waitForJoin(); break;
case 'OP': self.setOtherPlayer(data); break;
case 'PW': self.setPlayer('white'); break;
case 'PB': self.setPlayer('black'); break;
case 'GS': self.startGame(data); break;
case 'GO': self.gameOver(data); break;
case 'WT': self.receiveThrow('white', data); break;
case 'BT': self.receiveThrow('black', data); break;
case 'WM': self.takeTurn('white'); break;
case 'BM': self.takeTurn('black'); break;
case 'WS': self.moveStone('white', data); break;
case 'BS': self.moveStone('black', data); break;
case 'WP': self.promotionMove('white', data); break;
case 'BP': self.promotionMove('black', data); break;
case 'WB': self.receiveBonusThrow('white'); break;
case 'BB': self.receiveBonusThrow('black'); break;
case 'NR': self.newRating(data); break;
default:
}
};
</pre><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://t1.gstatic.com/images?q=tbn:ANd9GcQW6d4K4GK_kypoRWRcS2ZZU_I9hMd92IWP2sThxLMioZY_MDDZ3Q" imageanchor="1" style="clear:right; float:right; margin-left:1em; margin-bottom:1em"><img border="0" height="223" width="226" src="http://t1.gstatic.com/images?q=tbn:ANd9GcQW6d4K4GK_kypoRWRcS2ZZU_I9hMd92IWP2sThxLMioZY_MDDZ3Q" /></a></div><br />
<br />
<br />
<p>That pretty much covers how to use the Google Channel API in a multiplayer game. The good thing is I found it worked across all the platforms I tested, and at lower volumes it's completely free to use on the Google AppEngine. So you can get something up and running without spending any cash, only if it takes off do you need to front up some mulah. There are many other aspects to the game which I could elaborate if readers desire, but this has provided a good introduction to using the Channel API in your own game. Try it out:<br />
</p><br />
<br />
<p><a href="http://liubo-game.appspot.com/">Click to Play Liubo 六博</a><br />
</p><br />
<p>PS: I recommend the site <a href="http://html5games.com">HTML5Games</a> to see the latest in HTML Indie Gaming. <br />
</p>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com150tag:blogger.com,1999:blog-1539695110385892935.post-89443846011248659072011-02-21T08:47:00.000-08:002011-02-21T14:33:09.941-08:00How to Write an HTML5 Game in 30 Days with JQuery and Python<h3>Kicking Ass and Taking Names</h3><br />
<em>“I wanted to do in boxing what Bruce Lee was able to do in karate. Lee was an artist, and, like him, I try to get beyond the fundamentals of my sport. I want my fights to be seen as plays.” - Sugar Ray Leonard</em><br />
<br />
<p>Enough farting around. If you've been wanting to write an HTML5 game for some time now, but you were too busy trying to get a perfectly timed grenade jump on Halo 3 while munching away at the leftover Doritos that fell between the cracks in your couch cushion, then you've come to the right place. I'll show you how I did it - and in 30 days. Actually, I lied - it took exactly 33 days from starting to launch to produce <a href="http://towers-of-wolin.appspot.com">Towers of Wolin</a>. Here's how.<br />
</p><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-RHc5nc6_A0YsfK_AOM-4c2iLiFlXyRIe0ZLUFatoUsOcAKRXoRj0QIv04EV8TU34aFXSCM1U4hvyFQvC8TAX_9ILzbnTugkDtwJILfAw7-rXXh2AP8tvNUO001aU_zhsf-OjmeOpgYvw/s1600/towers-of-wolin-sample.png" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" height="200" width="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-RHc5nc6_A0YsfK_AOM-4c2iLiFlXyRIe0ZLUFatoUsOcAKRXoRj0QIv04EV8TU34aFXSCM1U4hvyFQvC8TAX_9ILzbnTugkDtwJILfAw7-rXXh2AP8tvNUO001aU_zhsf-OjmeOpgYvw/s200/towers-of-wolin-sample.png" /></a></div><br />
<br />
<h3>You've Got To Have What it Takes</h3><br />
<em>“I'll think, If this is his first punch, how are the others gonna feel? That's the only fear I have for myself.” - Sugar Ray Leonard</em><br />
<br />
<p>Day one starts when you decide that in 30 days you're going to release a game. Read <a href="http://www.gamasutra.com/view/feature/2438/how_to_prototype_a_game_in_under_7_.php">this article</a> on Gamasutra about prototyping in 7 days. These guys were full time, I have a day job so I can only spend on average 2 hours a day (less on weekdays, more on weekends); this works out to around 30 days equivalent time, so that was my goal.<br />
</p><br />
<p>It's important to be realistic about it - in 30 days you're not going to make Starcraft III. Well in fact, you're never going to make Starcraft III unless you have tens of millions of dollars and hundreds of people and a huge marketing budget. What you can make is something<br />
small, simple, and fun. Think <a href="http://www.youtube.com/watch?v=hWTFG3J1CP8">Tetris</a>.<br />
</p><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://www.popgadget.net/images/crate_tetris.jpg" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" height="408" width="300" src="http://www.popgadget.net/images/crate_tetris.jpg" /></a></div><br />
<p>Most importantly, don't be afraid to fail. Don't be afraid that people will hate your game, you, your dog, and your IDE (I prefer vim). You know what? Fuck 'em. This is your game, and even if you're the only person in the world that likes it, it's your baby. That's reward enough.<br />
</p><br />
<h3>Week One: Game Design</h3><br />
<em>“You have to know you can win. You have to think you can win. You have to feel you can win.” - Sugar Ray Leonard</em><br />
<br />
<p>You've got four weeks, and little wiggle time at the end. Where do you start? Game design. Many programmers miss this. An awesome technical game with a fast reactive engine that's not fun to play is still a shitty game. Likewise, artists forget that a beautiful display of visual mastery can still be boring. You're making a game, not an art museum. You need a design that is, most of all, fun. Think Angry Birds.<br />
</p><br />
<p>Week one is Game Design Week. There are a million great game ideas out there. If you're a gamer, you probably already have tons of them. When I need inspiration I head on over to <a href="http://html5games.com/" target="_blank">HTML5games</a> where they have a huge collection of HTML5 games, all for free, that's given me lots of ideas. But don't just think video games, think board games, children's games like <a href="http://en.wikipedia.org/wiki/Tumbang_preso">Tubang Preso</a>, sports you've played. In the time constraint of a week, you need to focus on the simple.<br />
</p><br />
<p>When I was writing Towers of Wolin, I had just come off being dazzled by <a href="http://www.wordsquared.com/" target="_blank">WordSquared</a>, and listening to a friend discuss his latest <a href="http://www.gamecabinet.com/rules/DieSiedler.html" target="_blank">Settlers of Catan</a> match. It was fuzzy, and I briefly thought of doing a Settlers clone like<br />
<a href="http://gsettlers.dyndns.org/index.html" target="_blank">gSettlers</a> or<br />
<a href="http://sourceforge.net/projects/jsettlers/files/Dissertation/Version%201.0/Dissertation.pdf/download" target="_blank">JSettlers</a>, but I wanted to do my own game with a unique set of rules. So I started playing around some div tags and <a href="http://www.w3.org/TR/SVG/filters.html" target="_blank">SVG</a> graphics commands just to see what I could throw up on the screen, using source images from <a href="http://www.openclipart.org/" target="_blank">OpenClipArt</a>. If you're good at Photoshop or Inkscape you can do this stuff yourself, but I'm not so I used SVG commands instead. I tossed in some backgrounds from <a href="http://www.cgtextures.com/" target="_blank">CGTextures</a> and<br />
<a href="http://www.4freephotos.com/" target="_blank">4FreePhotos</a> and pretty soon I was looking at a board of hexagons with a sea background and a few different colored tiles. I wasn't sure what to do with it so I put on a few farms, fortresses, bridges, RPG stuff. I drew lines for roads, put things at the vertex. There were cool free fonts I saw at <a href="http://www.dafont.com/" target="_blank">Dafont</a>, sound effects at <a href="http://www.pacdv.com/sounds/free-music-24.html">PacDV</a>.<br />
Just messing around.<br />
</p><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeUFXo3AJvOIMg69180xr61MHjOH0VumnnWl8aeWyEsX7ufjkcmECcHsLYIL6nR3Z_mYTjWaLKbyJOTquO5ChRSS3Mx9BOQc7mYmR_7IQPZnCzwenXs8fTG7nUAX9q-j_ewdXEE9UylZUR/s1600/this-kid-is-awesome.jpg" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" height="410" width="512" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeUFXo3AJvOIMg69180xr61MHjOH0VumnnWl8aeWyEsX7ufjkcmECcHsLYIL6nR3Z_mYTjWaLKbyJOTquO5ChRSS3Mx9BOQc7mYmR_7IQPZnCzwenXs8fTG7nUAX9q-j_ewdXEE9UylZUR/s1600/this-kid-is-awesome.jpg" /></a></div><br />
<p>Now after a few days of this I needed to figure out what the game was actually going to be. Originally I was thinking Massively Multiplayer Role Playing with resources, characters, an economy. Needless to say, such a game would take years if not decades for a single person to write. So I had to toss this out and think about what simple rules might be easy enough to implement but still fun enough to be worthwhile.<br />
</p><p>At this step in the game of making a game, Simplicity is the keyword. <a href="http://en.wikipedia.org/wiki/KISS_principle">KISS</a> - Keep It Simple, Stupid. At this point you'll have 1000 ideas, 100 of which are good, 10 of which are implementable. But you can only do 1 idea, so you have to toss all but one game idea, even the good ideas! You'll have a chance to do them later. Toss everything you can. I tossed characters, multiplayer, the economy, vertex drops, roads, until all I had left was a humble tower on a humble tile. There was only one action, the humble mouseclick. Click on a legal move tile and a fortress is built there. The enemy does the same. Whoever gets the most territory, wins. Game design complete.<br />
</p><p>Well, it wasn't really complete. I had to think about the level design - did I want to custom-design all my levels or do it automatic - I chose automatic using a <a href="http://en.wikipedia.org/wiki/Fractal_landscape">fractal</a>, since I'm lazy and I didn't want to play the same level over and over again, it's randomly generated each time. Also I had to think about progression - how does the game get harder, how do you advance levels - I wasn't sure but later during playtesting I just gave the AI an additional initial move with each level. There was a question of how all the graphics was going to look - some I didn't toss in until the end - but the basic sprites I needed were defined.<br />
</p><br />
<p>But in sum, it was enough to proceed to the next phase, programming.<br />
</p><br />
<h3>Week Two: Programming</h3><br />
<em>“Generally, the more weight you put on, the less effective you are.” - Sugar Ray Leonard</em><br />
<br />
<p>Only one week for programming? Well not exactly, you're going to be programming the whole thirty days. But your core coding needs to happen in a week. What I mean is, by the end of week two, you should have a basically playable game. It's going to have some quirks, bugs, maybe missing the splash screen and user login, but the core game mechanics should be present.<br />
</p><br />
<p>But how do we get from design to a running game skeleton? There are many paths to the goal, but first we've got to discuss platform. You should have already decided what platform you want to target - desktop, iPhone/iPad, android, browser, Xbox, Gameboy, PS3, whatever. They each have advantages and disadvantages: desktop is great for maximum power and graphics but hard to distribute; iPhone is easy to distribute and monetize but hard to port or expand; android gets you to lots of phones but limits screen space; browser can run anywhere but is harder to monetize; Xbox, Gameboy, PS3, any console device has a big market but production and distribution costs can be very high. For my game, I chose browser because I wanted to reach the maximum audience, I didn't want to front money for a Mac and a development license, and I have more experience with browser-based technologies.<br />
</p><br />
<p>Once you've picked a platform, you can start setting up the framework. I didn't know <a href="http://jquery.com/">JQuery</a> very well when I started, but it was the framework my friends at <a href="http://www.mindquilt.com">MindQuilt</a> had the most experience with, so I knew what it could do, and I'd heard good things about how javascript has matured in the past few years. On the server side, I'd already been writing some <a href="http://python.org">python</a>, and appreciated its lightweight approach compared to Java, especially with Django and <a href="http://appengine.google.com">Google App Engine</a>. Plus, I wanted to play around with some of these to learn new technologies and enhance my <a href="http://www.youtube.com/watch?v=u5wmParkppw">computer hacking skills</a>. So I got a basic app up without much difficulty.<br />
</p><br />
<p>Essentially your programming task is to convert the game design into functioning code. It's not quite that clean, because you'll revise the design and flesh out details as you're coding, but you need to think about how to implement all those fancy features in the model. A great site I recommend for this "making the dream reality" is <a href="http://www-cs-students.stanford.edu/~amitp/gameprog.html" target="_blank">Amit's Game Programming</a>. What helped me the most was his excellent article on <a href="http://www-cs-students.stanford.edu/~amitp/game-programming/grids/">Grids</a>. This gave me the algorithms I needed to implement a hexagonal grid, transform world to screen coordinates, and view adjacent tiles. I spent a day stuck until I realized that the origin of an HTML screen is top left, not bottom left as in Cartesian coordinates. Sometimes, we can all be dumb.<br />
</p><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://www-cs-students.stanford.edu/~amitp/game-programming/grids/hex-grid-metrics-labeled.png" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" height="346" width="373" src="http://www-cs-students.stanford.edu/~amitp/game-programming/grids/hex-grid-metrics-labeled.png" /></a></div><br />
<p>With a browser you'll need to decide on HTML divs, canvas, or flash. I tossed flash right out because I don't want proprietary stuff and it's a dead end on iPad. In comparison, getting the game to run on the iPad for me was a simple one-line inclusing of the <a href="http://code.google.com/p/jquery-ui-for-ipad-and-iphone/" target="_blank">iPad Plugin</a><br />
for jQuery. Canvas is trouble on IE but otherwise lets you do almost anything you can imagine: I started with that but quickly shifted to divs when I saw an article that divs can actually be faster in some cases. There are some graphics-intensive games that can only be done in canvas, but for the tile-based game I was writing, divs work just fine and have probably a 10x or more developer productivity. With divs, you have the browser doing most of the work for you with events, lots of library support for moving things around, images, overlapping, dialogs. Canvas is more like writing a dos game back in the 1980s, you have to do everything yourself from scratch. In my case, divs were the better option.<br />
</p><br />
<p>It still gave me trouble with sound - I used <a href="http://www.jplayer.org/" target="_blank">JPlayer</a> which is great on FF and Chrome but problematic on other browsers - I'm not sure what the best technology would be for a very sound-intensive game. The site <a href="media.io">MediaIo</a> worked great for audio conversion, though.<br />
</p><br />
<p>By the way, if you're going with a browser-based approach I really recommend looking at the javascript on <a href="http://wordsquared.com">wordsquared</a>, not only did they do an excellent job but the code is clear and well-written and you can see how they do lots of the "tricks" that make the site look and work good.<br />
</p><br />
<p>Enough talk, now for the meat (or soy, for you vegans, or Gagh, for you Klingons). Here's a chunk of actual jQuery javascript code from the Board class which takes a set of hexagonal tiles, and gives them a fractal elevation attribute. I use this elevation attribute later to assign territory classes: less than 0 for water, +5 for mountains, grasslands in between, etc. The background images for different territory I assign later, I converted them to pngs from svg using <a href="http://www.fileformat.info/convert/image/svg2raster.htm">fileformat.info</a>. <br />
<br />
<pre style="font-size:10px;">self.fractalNoiseTiles = function() {
// fractal noise for tile elevations, to simulate terrain
var persistence = 0.5;
var octaves = 4;
for (var octave = 1; octave < (octaves + 1); octave++) {
// add and smooth for each octave
self.smoothElevation(self.tiles);
self.fractalAddElevation(self.tiles, persistence, octave);
}
};
self.smoothElevation = function(tiles) {
// apply smoothing function to elevations
$.each(tiles, function(coordStr, tile) {
var newElevation = tile.attr('elevation') / 2.0;
var adjacentCoords = self.hexAdjacentCoords(tile.attr('coords'));
// find adjacent 6 coordinates
$.each(adjacentCoords, function(i, coordStr) {
if (coordStr in tiles) {
// if it doesn't exist, assume it's sea, so elevation 0
newElevation += tiles[coordStr].attr('elevation') / 6.0;
}
});
tile.attr('elevation', newElevation);
// set current value to smoothed adjacent values
});
};
self.fractalAddElevation = function(tiles, persistance, octave) {
var scaleFactor = Math.pow(persistance, octave);
$.each(tiles, function(coordStr, tile) {
var amplitude = scaleFactor * (-1 + 2 * Math.random());
// -1 to 1, continuous
var newElevation = tile.attr('elevation')*1 + amplitude;
tile.attr('elevation', newElevation);
});
}
</pre>
</p><p>If you're doing a single player game, as mine is, you'll need an AI. Even with a multiplayer game, you'll want to have a basic pluggable AI just for testing purposes. Now as the IBM Watson match demonstrated, AI can be a fascinating field, but also very complex and difficult. Stick to obvious things you know to avoid getting lost in cascading levels of complexity and end up in Limbo. My first AI for the game, for instance, was very simple - simply move to a random legal position. Then I enhanced this by moving to the coast first, to try and block off enemy entry points, falling back on random moving if this was impossible. Finally I made a more sophisticated AI that took into account not just good moves offensively but also how to defensively block the enemy. Still the AI is not very good, particularly in its ignorance of disclosures. But here's the code for two movers, the Coastal and Random AI movers, in my javascript Player class, to give you a feel for what an HTML5 div-based AI looks like, and how easy it can be.
</p><pre style="font-size:10px;">self.coastalMover = function() {
// prefer inner coast
var numMoves = $('.valid-move')
.not('.perimeter-tile')
.filter(self.board.isAdjacentTo('sea-tile')).length;
if (numMoves > 0) {
var randomMove = Math.floor(Math.random()*numMoves);
$('.valid-move')
.not('.perimeter-tile')
.filter(self.board.isAdjacentTo('sea-tile'))
.eq(randomMove)
.each(self.board.makeMove(self.turnComplete));
return;
}
// prefer perimeter
var numMoves = $('.valid-move.perimeter-tile').length;
if (numMoves > 0) {
var randomMove = Math.floor(Math.random()*numMoves);
$('.valid-move.perimeter-tile')
.eq(randomMove)
.each(self.board.makeMove(self.turnComplete));
return;
}
// otherwise make a random move
self.randomAIMover();
}
self.randomAIMover = function() {
var numValidMoves = $('.valid-move').length;
var randomMove = Math.floor(Math.random()*numValidMoves);
$('.valid-move')
.eq(randomMove)
.each(self.board.makeMove(self.turnComplete));
}
</pre><p>Along with the frontend code you will need backend, server-side code. I've heard good things about Node.js, but what I had available was python on Google App Engine so I went with that. It was a lot harder to get the scoring and login code to work than I expected, but that is due more to my ignorance than any real difficulty. Once I got the hang of it, it was a breeze, especially compared to the heavy-handed lugging I'm used to with jboss and war files. Here's an extract of my backend python code for the Scores class, which grabs the top 10 scores out of the datastore and, if the user is logged in, adds on the top score and ranking for that individual user.
</p><pre style="font-size:10px">def get(self):
scores = db.GqlQuery( \
"select * from Score order by score desc limit 10")
# fetch top scores
scores_list = [];
i = 1;
for score in scores:
username = 'anonymous'
if (score.user is not None):
username = score.user.nickname()
single_score = {
'user': username,
'score': self.format_score(score.score),
'date': score.date.isoformat(' '),
'place': self.format_ordinal(i)
}
scores_list.append(single_score)
i += 1
# append your top score if logged in
user = users.get_current_user()
if user:
score = db.GqlQuery( \
"select * from Score where user=:1 order by score desc limit 1", \
user).get()
# fetch your top score
if score is not None:
scores_list.append( \
{'user':'','score':'','date':'','place':''})
# spacer row
place = db.GqlQuery( \
"select __key__ from Score where score > :1",\
score.score).count() + 1
# fetch your top score
single_score = {
'user': 'Personal Best',
'score': self.format_score(score.score),
'date': score.date.isoformat(' '),
'place': self.format_ordinal(place)
}
scores_list.append(single_score)
self.response.out.write(simplejson.dumps(scores_list))
</pre><p>With your basically playable game up and running - not ready for prime time, not even close - but playable, you're reading for serious testing.
</p><h3>Week Three: Playtesting</h3><em>“We're all given some sort of skill in life. Mine just happens to be beating up on people.” - Sugar Ray Leonard</em>
<p>Okay it's really just "testing", or if you're fancy and want a higher salary, "Quality Assurance", but to make it more exciting we'll call it "playtesting". That is, you get to play your game.
</p><div class="separator" style="clear: both; text-align: center;"><a href="http://transmaterialasia.files.wordpress.com/2006/11/pruitt-igoe-demolition-color1.jpg" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" height="190" width="300" src="http://transmaterialasia.files.wordpress.com/2006/11/pruitt-igoe-demolition-color1.jpg" /></a></div><p>Now playtesting isn't just about seeing if it works. Sure, you need to find the bugs and fix them, you need to do that no doubt. But what it's even more about is discovery. The best QA people you'll find don't just run the test scripts and report the results. Instead, they prod around the limits of the application until it breaks, and try to figure out what this implies about what else might be broken, and why. Where the user experience is lousy, where I'm doing something with menus when I just want to click, and vitally, whether or not it's fun. Most importantly, discover how this implies the game could be improved to be more playable. This is what you need to do while playing the game.
</p><p>When the tile game was up and running I started clicking away, saw and fixed some obvious bugs, and found out that the game wasn't very fun. It was just too easy to beat the AI. I added a few tweaks for more aggressive playing, but still it got to be boring fast, and took too long to play after it was obvious I would win (or lose). So I added an Autoplay button to speed up play when it looked like you were going to be win - but be careful! - one time I caught myself with hubris and ended up losing after pressing Autoplay. Still, just territory wasn't enough since it didn't give the kind of strategic advantage I was looking for if you capture narrow mountain passes, block off a plain, things like that. So I added the concept of "enclosures", that is, completely surrounding an area with your pieces and barrier land, which then automatically become filled with your own pieces. Not only did this add the strategic aspect, but it also sped up the game. Now, it did end up being the most difficult, time consuming, and bug filled piece of code I had to write for the whole game, but doing the enclosure calculation was not only intellectually satisfying, it ended up substantially improving playability.
</p><p>There was a lot of iteration during this week of playtesting. I don't want to say agile, because I've heard that term ad naseum until it now has the same amount of meaning as "achieving key objectives with maximum leverage", but there was a lot of tweaks to the graphics, the runtime, the sounds, the rules. As the saying goes, 1% inspiration, 99% perspiration.
</p><h3>Week Four: Refinement</h3><em>“My ambition is not to be just a good fighter. I want to be great, something special.” - Sugar Ray Leonard</em>
<p>Lots of good game ideas die a premature death because their authors give up towards the final stretch. That's what week four is all about - refinement. Getting your game over the hump from something that resembles a college compsci doodle, to a professional looking game. Now we're not aiming for Bungee-level storyline, thematic videos, and a model-staffed convention booth.
</p><div class="separator" style="clear: both; text-align: center;"><a href="http://bantrel.com/markets/popups/images/Refining.png" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" height="300" width="300" src="http://bantrel.com/markets/popups/images/Refining.png" /></a></div><p>Refinement is what separates the men from the boys, the women from the girls, the mature transsexuals from boytoys. It's the roadwork that you have to put into the game if you want it to succeed. It's not fun, but it must be done. Do you have a splash screen? A score screen? Rules? An About page? How fast does it scroll? Does it flicker? Are the graphics lined up perfectly, or are there gaps? Does it work on different targeted platforms: iPad, Firefox, Chrome, Internet Explorer?
</p><p>I had to work on all sorts of little details for the game in the refinement stage, most taking longer than I thought. I had developed in Chrome, which is the fastest browser with the best support for most HTML5 features, and it worked fine. Then I tried Firefox and it didn't work at all. I was using the jQuery rotate plugin, which doesn't work on Firefox, so I had to rethink the way I was handling tiles. Then I tried to view it on iPad, which looked horrible, so I spent a day redesigning my top-level div structure until it displayed properly. Multiple sounds had trouble on the iPad so I had to disable most sounds there, touch didn't work so I had to get the jQuery iPad plugin working properly. I didn't have a splash screen, so I had to write one, which was harder than I thought given that it has to appear before the game, after a level is completed, after the game is won or lost, with the rules, all at the right time in the right order. Scoring was even more difficult since it involved using some google APIs I wasn't familiar with, required me to upload a datastore index file, heck I even had to write a custom thousands-separator method for python since one isn't built in.
</p><p>I thought about efficiency. There's a lot you can do to make a game load and run faster. I looked at the images I was using, switching to jpg instead of png for photographic-type images due to its better compression in this area. I consolidated all my javascript files into a single compact, minimized file with <a href="http://code.google.com/closure/compiler/" target="_blank">Google Closure Compiler</a>. I made my CSS smaller and faster with the online <a href="http://refresh-sf.com/yui/" target="_blank">YUICompressor</a> and then switched my index.html to use the optimized versions. Some things I was using custom images for I found I could instead use existing CSS effects and jQuery functions. I was able to load public domain images off of existing sites like <a href="http://picasaweb.google.com/" target="_blank">PicasaWeb</a> instead of overloading my app engine. Smaller tile sizes took up less image space and ended up being easier to play with as well.
</p><p>The refinement step is the biggest pain, the most frustrating, the least fun, but also the most necessary. Your users will appreciate it, and so will you.
</p><h3>Week Five: Release</h3><em>“A fighter never knows when it's the last bell. He doesn't want to face that.” - Sugar Ray Leonard</em>
<p>You're game is done. Thirty days have passed - or in my case, thirty-three, because I couldn't get scores working. My fault for "leaving it to later". Next time, I'm including all elements of my game before starting playtesting, no matter how "easy" it's going to be so I don't shoot myself in the foot. But finally it was done, uploaded to google appengine, and ready to go. The game had been "knocked out", and like a boxer I was rather exhausted at the end of it all. Quite a relief.
</p><div class="separator" style="clear: both; text-align: center;"><a href="http://www.blackpast.org/files/blackpast_images/robinson_sugar_ray.jpg" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" height="259" width="300" src="http://www.blackpast.org/files/blackpast_images/robinson_sugar_ray.jpg" /></a></div><p>But even being done isn't being done. I got a suggestion to add the rules section so it was clear how to play the game. I showed it to a few close confidants to get feedback. I submitted it to an HTML5 game site for inclusion on their index. And I wrote this blog post. Marketing is the fifth week in a nutshell. Let the games begin.
</p><p><b>UPDATE:</b> by popular demand, here is a link to the playable game: <a href="http://towers-of-wolin.appspot.com">Towers of Wolin Playable Game</a>. And here is a link to the complete sources: <a href="https://docs.google.com/leaf?id=0Byb_6xqE_1ZxNmViODQ1M2ItZGQxOC00NjlhLWI1NDYtYzg3NDQyNmMyYjkx&hl=en">Towers of Wolin Source Files</a>.
</p>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com356tag:blogger.com,1999:blog-1539695110385892935.post-56847022043406381312010-11-16T11:34:00.000-08:002010-11-16T11:34:58.006-08:00GFS the Google File System in 199 Lines of PythonGFS, the Google File System, sits as the backbone of the entire Google infrastructure. However, for many it is a mystery, especially for those lucky enough to be more acquainted with high-level python code than low-level C operating system sources. But have no fear, we shall break through the veil and describe an implementation of GFS in 199 lines of python. Naturally, you may want to read about the theory and design of GFS in the original <a href="http://research.google.com/archive/gfs-sosp2003.pdf">Google Research GFS paper</a>. But we will aim to give the core concepts, with real working python code, in this article. Complete runnable python 2.6 source code, if you wish it, is available at the end of this post.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZ3D1ZrW3PbG0_LxBIeeTxt92SCiN-9HvnJycS3wAa0Cb7bHm0PpdTDi_3qV3gdOr0XK45BQqh0oHcFLali_2gl0kapSMpyooW8WCaq-aE8lD-d0FFUhGLL7syhmWIXg4BU9gA5njM6pxi/s1600/MP010117-Twits-MPFC-90.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZ3D1ZrW3PbG0_LxBIeeTxt92SCiN-9HvnJycS3wAa0Cb7bHm0PpdTDi_3qV3gdOr0XK45BQqh0oHcFLali_2gl0kapSMpyooW8WCaq-aE8lD-d0FFUhGLL7syhmWIXg4BU9gA5njM6pxi/s320/MP010117-Twits-MPFC-90.jpg" width="320" /></a></div><br />
A brief summary of GFS is as follows. GFS consists of three components: a client, a master, and one or more chunkservers. The client is the only user-visible, that is programmer-accessible, part of the system. It functions similarly to a standard POSIX file library. The master is a single server that holds all metadata for the filesystem. By metadata we mean the information about each file, its constituent components called chunks, and the location of these chunks on various chunkservers. The chunkservers are where the actual data is stored, and the vast majority of network traffic takes place between the client and the chunkservers, to avoid the master as a bottleneck. We will give more detailed descriptions below by going through the GFS client, master, and chunkserver as implemented in python classes, and close with a test script and its output.<br />
<br />
The client class is the only user-visible portion of the GFS library. It mediates all requests between the client for filesystem access and the master and chunkservers for data storage and retrieval. It is important to note that GFS appears very familiar to programmers of normal filesystems, there is no distributed knowledge required. All of this is abstracted away behind the client implementation. Of course there are some exceptions to this such as the localized chunk knowledge used to allocate processing of files most efficiently, such as in the map reduce algorithm, but we have avoided such complexity in this implementation. What is most critical is to note how the normal read, write, append, exist, and delete calls are available in their common forms, and how these are implemented by the client class; we also simplify open, close and create by subsuming them under the previous methods. The gist of each method is the same: ask the master for the metadata including chunk IDs and chunk locations on the chunkservers, then update any necessary metadata with the master, and finally transaction actual data flow only with the chunkservers.<br />
<br />
<pre><font face="Lucida,Courier New"><font color="#C00000">class</font> <font color="#000000">GFSClient</font><font color="#0000C0">:</font>
<font color="#C00000">def</font> <font color="#000000">__init__</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">master</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font> <font color="#0000C0">=</font> <font color="#000000">master</font>
<font color="#C00000">def</font> <font color="#000000">write</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">,</font> <font color="#000000">data</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># filename is full namespace path</font>
<font color="#C00000">if</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">exists</font><font color="#0000C0">(</font><font color="#000000">filename</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># if already exists, overwrite</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">delete</font><font color="#0000C0">(</font><font color="#000000">filename</font><font color="#0000C0">)</font>
<font color="#000000">num_chunks</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">num_chunks</font><font color="#0000C0">(</font><font color="#000000">len</font><font color="#0000C0">(</font><font color="#000000">data</font><font color="#0000C0">)</font><font color="#0000C0">)</font>
<font color="#000000">chunkuuids</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">alloc</font><font color="#0000C0">(</font><font color="#000000">filename</font><font color="#0000C0">,</font> <font color="#000000">num_chunks</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">write_chunks</font><font color="#0000C0">(</font><font color="#000000">chunkuuids</font><font color="#0000C0">,</font> <font color="#000000">data</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">write_chunks</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">chunkuuids</font><font color="#0000C0">,</font> <font color="#000000">data</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">chunks</font> <font color="#0000C0">=</font> <font color="#0000C0">[</font> <font color="#000000">data</font><font color="#0000C0">[</font><font color="#000000">x</font><font color="#0000C0">:</font><font color="#000000">x</font><font color="#0000C0">+</font><font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">chunksize</font><font color="#0000C0">]</font> \
<font color="#C00000">for</font> <font color="#000000">x</font> <font color="#C00000">in</font> <font color="#000000">range</font><font color="#0000C0">(</font><font color="#0080C0">0</font><font color="#0000C0">,</font> <font color="#000000">len</font><font color="#0000C0">(</font><font color="#000000">data</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">chunksize</font><font color="#0000C0">)</font> <font color="#0000C0">]</font>
<font color="#000000">chunkservers</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">get_chunkservers</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">for</font> <font color="#000000">i</font> <font color="#C00000">in</font> <font color="#000000">range</font><font color="#0000C0">(</font><font color="#0080C0">0</font><font color="#0000C0">,</font> <font color="#000000">len</font><font color="#0000C0">(</font><font color="#000000">chunkuuids</font><font color="#0000C0">)</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># write to each chunkserver</font>
<font color="#000000">chunkuuid</font> <font color="#0000C0">=</font> <font color="#000000">chunkuuids</font><font color="#0000C0">[</font><font color="#000000">i</font><font color="#0000C0">]</font>
<font color="#000000">chunkloc</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">get_chunkloc</font><font color="#0000C0">(</font><font color="#000000">chunkuuid</font><font color="#0000C0">)</font>
<font color="#000000">chunkservers</font><font color="#0000C0">[</font><font color="#000000">chunkloc</font><font color="#0000C0">]</font><font color="#0000C0">.</font><font color="#000000">write</font><font color="#0000C0">(</font><font color="#000000">chunkuuid</font><font color="#0000C0">,</font> <font color="#000000">chunks</font><font color="#0000C0">[</font><font color="#000000">i</font><font color="#0000C0">]</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">num_chunks</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">size</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#0000C0">(</font><font color="#000000">size</font> <font color="#0000C0">//</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">chunksize</font><font color="#0000C0">)</font> \
<font color="#0000C0">+</font> <font color="#0000C0">(</font><font color="#0080C0">1</font> <font color="#C00000">if</font> <font color="#000000">size</font> <font color="#0000C0">%</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">chunksize</font> <font color="#0000C0">></font> <font color="#0080C0">0</font> <font color="#C00000">else</font> <font color="#0080C0">0</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">write_append</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">,</font> <font color="#000000">data</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">if</font> <font color="#C00000">not</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">exists</font><font color="#0000C0">(</font><font color="#000000">filename</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">raise</font> <font color="#000000">Exception</font><font color="#0000C0">(</font><font color="#004080">"append error, file does not exist: "</font> \
<font color="#0000C0">+</font> <font color="#000000">filename</font><font color="#0000C0">)</font>
<font color="#000000">num_append_chunks</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">num_chunks</font><font color="#0000C0">(</font><font color="#000000">len</font><font color="#0000C0">(</font><font color="#000000">data</font><font color="#0000C0">)</font><font color="#0000C0">)</font>
<font color="#000000">append_chunkuuids</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">alloc_append</font><font color="#0000C0">(</font><font color="#000000">filename</font><font color="#0000C0">,</font> \
<font color="#000000">num_append_chunks</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">write_chunks</font><font color="#0000C0">(</font><font color="#000000">append_chunkuuids</font><font color="#0000C0">,</font> <font color="#000000">data</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">exists</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">exists</font><font color="#0000C0">(</font><font color="#000000">filename</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">read</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># get metadata, then read chunks direct</font>
<font color="#C00000">if</font> <font color="#C00000">not</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">exists</font><font color="#0000C0">(</font><font color="#000000">filename</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">raise</font> <font color="#000000">Exception</font><font color="#0000C0">(</font><font color="#004080">"read error, file does not exist: "</font> \
<font color="#0000C0">+</font> <font color="#000000">filename</font><font color="#0000C0">)</font>
<font color="#000000">chunks</font> <font color="#0000C0">=</font> <font color="#0000C0">[</font><font color="#0000C0">]</font>
<font color="#000000">chunkuuids</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">get_chunkuuids</font><font color="#0000C0">(</font><font color="#000000">filename</font><font color="#0000C0">)</font>
<font color="#000000">chunkservers</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">get_chunkservers</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">for</font> <font color="#000000">chunkuuid</font> <font color="#C00000">in</font> <font color="#000000">chunkuuids</font><font color="#0000C0">:</font>
<font color="#000000">chunkloc</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">get_chunkloc</font><font color="#0000C0">(</font><font color="#000000">chunkuuid</font><font color="#0000C0">)</font>
<font color="#000000">chunk</font> <font color="#0000C0">=</font> <font color="#000000">chunkservers</font><font color="#0000C0">[</font><font color="#000000">chunkloc</font><font color="#0000C0">]</font><font color="#0000C0">.</font><font color="#000000">read</font><font color="#0000C0">(</font><font color="#000000">chunkuuid</font><font color="#0000C0">)</font>
<font color="#000000">chunks</font><font color="#0000C0">.</font><font color="#000000">append</font><font color="#0000C0">(</font><font color="#000000">chunk</font><font color="#0000C0">)</font>
<font color="#000000">data</font> <font color="#0000C0">=</font> <font color="#000000">reduce</font><font color="#0000C0">(</font><font color="#C00000">lambda</font> <font color="#000000">x</font><font color="#0000C0">,</font> <font color="#000000">y</font><font color="#0000C0">:</font> <font color="#000000">x</font> <font color="#0000C0">+</font> <font color="#000000">y</font><font color="#0000C0">,</font> <font color="#000000">chunks</font><font color="#0000C0">)</font> <font color="#008000"># reassemble in order</font>
<font color="#C00000">return</font> <font color="#000000">data</font>
<font color="#C00000">def</font> <font color="#000000">delete</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">delete</font><font color="#0000C0">(</font><font color="#000000">filename</font><font color="#0000C0">)</font><font color="#000000"></font></font></pre><br />
The master class simulates a GFS master server. This is where all the metadata is stored, the core node of the entire system. Client requests initiate with the master, then after metadata is retrieved, they talk directly to the individual chunkservers. This avoids the master being a bottleneck as the metadata is typically short and low latency. The metadata is implemented as a series of dictionaries, although in a real system you'd have filesystem backing of the dicts. The notification of chunkservers becoming available and unavailable via heartbeats, chunkserver authentication and localization info for efficient storage are all simplified here so that the master itself is allocating chunkservers. However we still preserve the direct client read/write to the chunkservers, bypassing the master, to show how the distributed system is working.<br />
<br />
<pre><font face="Lucida,Courier New"><font color="#C00000">class</font> <font color="#000000">GFSMaster</font><font color="#0000C0">:</font>
<font color="#C00000">def</font> <font color="#000000">__init__</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">num_chunkservers</font> <font color="#0000C0">=</font> <font color="#0080C0">5</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">max_chunkservers</font> <font color="#0000C0">=</font> <font color="#0080C0">10</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">max_chunksperfile</font> <font color="#0000C0">=</font> <font color="#0080C0">100</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunksize</font> <font color="#0000C0">=</font> <font color="#0080C0">10</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkrobin</font> <font color="#0000C0">=</font> <font color="#0080C0">0</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">filetable</font> <font color="#0000C0">=</font> <font color="#0000C0">{</font><font color="#0000C0">}</font> <font color="#008000"># file to chunk mapping</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunktable</font> <font color="#0000C0">=</font> <font color="#0000C0">{</font><font color="#0000C0">}</font> <font color="#008000"># chunkuuid to chunkloc mapping</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkservers</font> <font color="#0000C0">=</font> <font color="#0000C0">{</font><font color="#0000C0">}</font> <font color="#008000"># loc id to chunkserver mapping</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">init_chunkservers</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">init_chunkservers</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">for</font> <font color="#000000">i</font> <font color="#C00000">in</font> <font color="#000000">range</font><font color="#0000C0">(</font><font color="#0080C0">0</font><font color="#0000C0">,</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">num_chunkservers</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">chunkserver</font> <font color="#0000C0">=</font> <font color="#000000">GFSChunkserver</font><font color="#0000C0">(</font><font color="#000000">i</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkservers</font><font color="#0000C0">[</font><font color="#000000">i</font><font color="#0000C0">]</font> <font color="#0000C0">=</font> <font color="#000000">chunkserver</font>
<font color="#C00000">def</font> <font color="#000000">get_chunkservers</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkservers</font>
<font color="#C00000">def</font> <font color="#000000">alloc</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">,</font> <font color="#000000">num_chunks</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># return ordered chunkuuid list</font>
<font color="#000000">chunkuuids</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">alloc_chunks</font><font color="#0000C0">(</font><font color="#000000">num_chunks</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">filetable</font><font color="#0000C0">[</font><font color="#000000">filename</font><font color="#0000C0">]</font> <font color="#0000C0">=</font> <font color="#000000">chunkuuids</font>
<font color="#C00000">return</font> <font color="#000000">chunkuuids</font>
<font color="#C00000">def</font> <font color="#000000">alloc_chunks</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">num_chunks</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">chunkuuids</font> <font color="#0000C0">=</font> <font color="#0000C0">[</font><font color="#0000C0">]</font>
<font color="#C00000">for</font> <font color="#000000">i</font> <font color="#C00000">in</font> <font color="#000000">range</font><font color="#0000C0">(</font><font color="#0080C0">0</font><font color="#0000C0">,</font> <font color="#000000">num_chunks</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">chunkuuid</font> <font color="#0000C0">=</font> <font color="#000000">uuid</font><font color="#0000C0">.</font><font color="#000000">uuid1</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">chunkloc</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkrobin</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunktable</font><font color="#0000C0">[</font><font color="#000000">chunkuuid</font><font color="#0000C0">]</font> <font color="#0000C0">=</font> <font color="#000000">chunkloc</font>
<font color="#000000">chunkuuids</font><font color="#0000C0">.</font><font color="#000000">append</font><font color="#0000C0">(</font><font color="#000000">chunkuuid</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkrobin</font> <font color="#0000C0">=</font> <font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkrobin</font> <font color="#0000C0">+</font> <font color="#0080C0">1</font><font color="#0000C0">)</font> <font color="#0000C0">%</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">num_chunkservers</font>
<font color="#C00000">return</font> <font color="#000000">chunkuuids</font>
<font color="#C00000">def</font> <font color="#000000">alloc_append</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">,</font> <font color="#000000">num_append_chunks</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># append chunks</font>
<font color="#000000">chunkuuids</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">filetable</font><font color="#0000C0">[</font><font color="#000000">filename</font><font color="#0000C0">]</font>
<font color="#000000">append_chunkuuids</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">alloc_chunks</font><font color="#0000C0">(</font><font color="#000000">num_append_chunks</font><font color="#0000C0">)</font>
<font color="#000000">chunkuuids</font><font color="#0000C0">.</font><font color="#000000">extend</font><font color="#0000C0">(</font><font color="#000000">append_chunkuuids</font><font color="#0000C0">)</font>
<font color="#C00000">return</font> <font color="#000000">append_chunkuuids</font>
<font color="#C00000">def</font> <font color="#000000">get_chunkloc</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">chunkuuid</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunktable</font><font color="#0000C0">[</font><font color="#000000">chunkuuid</font><font color="#0000C0">]</font>
<font color="#C00000">def</font> <font color="#000000">get_chunkuuids</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">filetable</font><font color="#0000C0">[</font><font color="#000000">filename</font><font color="#0000C0">]</font>
<font color="#C00000">def</font> <font color="#000000">exists</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#000000">True</font> <font color="#C00000">if</font> <font color="#000000">filename</font> <font color="#C00000">in</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">filetable</font> <font color="#C00000">else</font> <font color="#000000">False</font>
<font color="#C00000">def</font> <font color="#000000">delete</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">filename</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># rename for later garbage collection</font>
<font color="#000000">chunkuuids</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">filetable</font><font color="#0000C0">[</font><font color="#000000">filename</font><font color="#0000C0">]</font>
<font color="#C00000">del</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">filetable</font><font color="#0000C0">[</font><font color="#000000">filename</font><font color="#0000C0">]</font>
<font color="#000000">timestamp</font> <font color="#0000C0">=</font> <font color="#000000">repr</font><font color="#0000C0">(</font><font color="#000000">time</font><font color="#0000C0">.</font><font color="#000000">time</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">)</font>
<font color="#000000">deleted_filename</font> <font color="#0000C0">=</font> <font color="#004080">"/hidden/deleted/"</font> <font color="#0000C0">+</font> <font color="#000000">timestamp</font> <font color="#0000C0">+</font> <font color="#000000">filename</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">filetable</font><font color="#0000C0">[</font><font color="#000000">deleted_filename</font><font color="#0000C0">]</font> <font color="#0000C0">=</font> <font color="#000000">chunkuuids</font>
<font color="#C00000">print</font> <font color="#004080">"deleted file: "</font> <font color="#0000C0">+</font> <font color="#000000">filename</font> <font color="#0000C0">+</font> <font color="#004080">" renamed to "</font> <font color="#0000C0">+</font> \
<font color="#000000">deleted_filename</font> <font color="#0000C0">+</font> <font color="#004080">" ready for gc"</font>
<font color="#C00000">def</font> <font color="#000000">dump_metadata</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">print</font> <font color="#004080">"Filetable:"</font><font color="#0000C0">,</font>
<font color="#C00000">for</font> <font color="#000000">filename</font><font color="#0000C0">,</font> <font color="#000000">chunkuuids</font> <font color="#C00000">in</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">filetable</font><font color="#0000C0">.</font><font color="#000000">items</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">print</font> <font color="#000000">filename</font><font color="#0000C0">,</font> <font color="#004080">"with"</font><font color="#0000C0">,</font> <font color="#000000">len</font><font color="#0000C0">(</font><font color="#000000">chunkuuids</font><font color="#0000C0">)</font><font color="#0000C0">,</font><font color="#004080">"chunks"</font>
<font color="#C00000">print</font> <font color="#004080">"Chunkservers: "</font><font color="#0000C0">,</font> <font color="#000000">len</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkservers</font><font color="#0000C0">)</font>
<font color="#C00000">print</font> <font color="#004080">"Chunkserver Data:"</font>
<font color="#C00000">for</font> <font color="#000000">chunkuuid</font><font color="#0000C0">,</font> <font color="#000000">chunkloc</font> <font color="#C00000">in</font> <font color="#000000">sorted</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunktable</font><font color="#0000C0">.</font><font color="#000000">iteritems</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#000000">key</font><font color="#0000C0">=</font><font color="#000000">operator</font><font color="#0000C0">.</font><font color="#000000">itemgetter</font><font color="#0000C0">(</font><font color="#0080C0">1</font><font color="#0000C0">)</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">chunk</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkservers</font><font color="#0000C0">[</font><font color="#000000">chunkloc</font><font color="#0000C0">]</font><font color="#0000C0">.</font><font color="#000000">read</font><font color="#0000C0">(</font><font color="#000000">chunkuuid</font><font color="#0000C0">)</font>
<font color="#C00000">print</font> <font color="#000000">chunkloc</font><font color="#0000C0">,</font> <font color="#000000">chunkuuid</font><font color="#0000C0">,</font> <font color="#000000">chunk</font><font color="#000000"></font></font></pre><br />
The chunkserver class is the smallest in this project. This represents an actual distinct box running in a massive datacenter, connected to a network reachable by the master and client. In GFS, the chunkservers are relatively "dumb" in that they know only about chunks, that is, the file data broken up into pieces. They don't see the whole picture of the entire file, where it is across the whole filesystem, the associated metadata, etc. We implement this class as a simple local storage, which you can examine after running the test code by looking at the directory path "/tmp/gfs/chunks". In a real system you'd want persistent storage of the chunk info for backup.<br />
<br />
<pre><font face="Lucida,Courier New"><font color="#C00000">class</font> <font color="#000000">GFSChunkserver</font><font color="#0000C0">:</font>
<font color="#C00000">def</font> <font color="#000000">__init__</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">chunkloc</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunkloc</font> <font color="#0000C0">=</font> <font color="#000000">chunkloc</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunktable</font> <font color="#0000C0">=</font> <font color="#0000C0">{</font><font color="#0000C0">}</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">local_filesystem_root</font> <font color="#0000C0">=</font> <font color="#004080">"/tmp/gfs/chunks/"</font> <font color="#0000C0">+</font> <font color="#000000">repr</font><font color="#0000C0">(</font><font color="#000000">chunkloc</font><font color="#0000C0">)</font>
<font color="#C00000">if</font> <font color="#C00000">not</font> <font color="#000000">os</font><font color="#0000C0">.</font><font color="#000000">access</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">local_filesystem_root</font><font color="#0000C0">,</font> <font color="#000000">os</font><font color="#0000C0">.</font><font color="#000000">W_OK</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">os</font><font color="#0000C0">.</font><font color="#000000">makedirs</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">local_filesystem_root</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">write</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">chunkuuid</font><font color="#0000C0">,</font> <font color="#000000">chunk</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">local_filename</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunk_filename</font><font color="#0000C0">(</font><font color="#000000">chunkuuid</font><font color="#0000C0">)</font>
<font color="#C00000">with</font> <font color="#000000">open</font><font color="#0000C0">(</font><font color="#000000">local_filename</font><font color="#0000C0">,</font> <font color="#004080">"w"</font><font color="#0000C0">)</font> <font color="#C00000">as</font> <font color="#000000">f</font><font color="#0000C0">:</font>
<font color="#000000">f</font><font color="#0000C0">.</font><font color="#000000">write</font><font color="#0000C0">(</font><font color="#000000">chunk</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunktable</font><font color="#0000C0">[</font><font color="#000000">chunkuuid</font><font color="#0000C0">]</font> <font color="#0000C0">=</font> <font color="#000000">local_filename</font>
<font color="#C00000">def</font> <font color="#000000">read</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">chunkuuid</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">data</font> <font color="#0000C0">=</font> <font color="#000000">None</font>
<font color="#000000">local_filename</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">chunk_filename</font><font color="#0000C0">(</font><font color="#000000">chunkuuid</font><font color="#0000C0">)</font>
<font color="#C00000">with</font> <font color="#000000">open</font><font color="#0000C0">(</font><font color="#000000">local_filename</font><font color="#0000C0">,</font> <font color="#004080">"r"</font><font color="#0000C0">)</font> <font color="#C00000">as</font> <font color="#000000">f</font><font color="#0000C0">:</font>
<font color="#000000">data</font> <font color="#0000C0">=</font> <font color="#000000">f</font><font color="#0000C0">.</font><font color="#000000">read</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">return</font> <font color="#000000">data</font>
<font color="#C00000">def</font> <font color="#000000">chunk_filename</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">chunkuuid</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">local_filename</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">local_filesystem_root</font> <font color="#0000C0">+</font> <font color="#004080">"/"</font> \
<font color="#0000C0">+</font> <font color="#000000">str</font><font color="#0000C0">(</font><font color="#000000">chunkuuid</font><font color="#0000C0">)</font> <font color="#0000C0">+</font> <font color="#004080">'.gfs'</font>
<font color="#C00000">return</font> <font color="#000000">local_filename</font><font color="#000000"></font></font></pre><br />
We use main() as a test for all the client methods, including exceptions. We first create a master and client object, then write a file. This write is performed by the client in the same way as the real GFS: first it gets the chunk metadata from the master, then writes chunks directly to each chunkserver. Append functions similarly. Delete is handled in the GFS fashion, where it renames the file to a hidden namespace and leaves it for later garbage collection. A dump displays metadata content. Note that this is a single-threaded test, as this demonstration program does not support concurrency, although that could be added with appropriate locks around the metadata.<br />
<br />
<pre><font face="Lucida,Courier New"><font color="#C00000">def</font> <font color="#000000">main</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#008000"># test script for filesystem</font>
<font color="#008000"># setup</font>
<font color="#000000">master</font> <font color="#0000C0">=</font> <font color="#000000">GFSMaster</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">client</font> <font color="#0000C0">=</font> <font color="#000000">GFSClient</font><font color="#0000C0">(</font><font color="#000000">master</font><font color="#0000C0">)</font>
<font color="#008000"># test write, exist, read</font>
<font color="#C00000">print</font> <font color="#004080">"\nWriting..."</font>
<font color="#000000">client</font><font color="#0000C0">.</font><font color="#000000">write</font><font color="#0000C0">(</font><font color="#004080">"/usr/python/readme.txt"</font><font color="#0000C0">,</font> <font color="#004080">"""
This file tells you all about python that you ever wanted to know.
Not every README is as informative as this one, but we aim to please.
Never yet has there been so much information in so little space.
"""</font><font color="#0000C0">)</font>
<font color="#C00000">print</font> <font color="#004080">"File exists? "</font><font color="#0000C0">,</font> <font color="#000000">client</font><font color="#0000C0">.</font><font color="#000000">exists</font><font color="#0000C0">(</font><font color="#004080">"/usr/python/readme.txt"</font><font color="#0000C0">)</font>
<font color="#C00000">print</font> <font color="#000000">client</font><font color="#0000C0">.</font><font color="#000000">read</font><font color="#0000C0">(</font><font color="#004080">"/usr/python/readme.txt"</font><font color="#0000C0">)</font>
<font color="#008000"># test append, read after append</font>
<font color="#C00000">print</font> <font color="#004080">"\nAppending..."</font>
<font color="#000000">client</font><font color="#0000C0">.</font><font color="#000000">write_append</font><font color="#0000C0">(</font><font color="#004080">"/usr/python/readme.txt"</font><font color="#0000C0">,</font> \
<font color="#004080">"I'm a little sentence that just snuck in at the end.\n"</font><font color="#0000C0">)</font>
<font color="#C00000">print</font> <font color="#000000">client</font><font color="#0000C0">.</font><font color="#000000">read</font><font color="#0000C0">(</font><font color="#004080">"/usr/python/readme.txt"</font><font color="#0000C0">)</font>
<font color="#008000"># test delete</font>
<font color="#C00000">print</font> <font color="#004080">"\nDeleting..."</font>
<font color="#000000">client</font><font color="#0000C0">.</font><font color="#000000">delete</font><font color="#0000C0">(</font><font color="#004080">"/usr/python/readme.txt"</font><font color="#0000C0">)</font>
<font color="#C00000">print</font> <font color="#004080">"File exists? "</font><font color="#0000C0">,</font> <font color="#000000">client</font><font color="#0000C0">.</font><font color="#000000">exists</font><font color="#0000C0">(</font><font color="#004080">"/usr/python/readme.txt"</font><font color="#0000C0">)</font>
<font color="#008000"># test exceptions</font>
<font color="#C00000">print</font> <font color="#004080">"\nTesting Exceptions..."</font>
<font color="#C00000">try</font><font color="#0000C0">:</font>
<font color="#000000">client</font><font color="#0000C0">.</font><font color="#000000">read</font><font color="#0000C0">(</font><font color="#004080">"/usr/python/readme.txt"</font><font color="#0000C0">)</font>
<font color="#C00000">except</font> <font color="#000000">Exception</font> <font color="#C00000">as</font> <font color="#000000">e</font><font color="#0000C0">:</font>
<font color="#C00000">print</font> <font color="#004080">"This exception should be thrown:"</font><font color="#0000C0">,</font> <font color="#000000">e</font>
<font color="#C00000">try</font><font color="#0000C0">:</font>
<font color="#000000">client</font><font color="#0000C0">.</font><font color="#000000">write_append</font><font color="#0000C0">(</font><font color="#004080">"/usr/python/readme.txt"</font><font color="#0000C0">,</font> <font color="#004080">"foo"</font><font color="#0000C0">)</font>
<font color="#C00000">except</font> <font color="#000000">Exception</font> <font color="#C00000">as</font> <font color="#000000">e</font><font color="#0000C0">:</font>
<font color="#C00000">print</font> <font color="#004080">"This exception should be thrown:"</font><font color="#0000C0">,</font> <font color="#000000">e</font>
<font color="#008000"># show structure of the filesystem</font>
<font color="#C00000">print</font> <font color="#004080">"\nMetadata Dump..."</font>
<font color="#C00000">print</font> <font color="#000000">master</font><font color="#0000C0">.</font><font color="#000000">dump_metadata</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#000000"></font></font></pre><br />
And putting it all together, here is the output of the test script run from the python interpreter. Pay special attention to the master metadata dump at the end, where you can see how the chunks are spread across chunkservers in jumbled order, only to be reassembled by the client in the order specified by the master metadata.<br />
<br />
<pre>$ python gfs.py
Writing...
File exists? True
This file tells you all about python that you ever wanted to know.
Not every README is as informative as this one, but we aim to please.
Never yet has there been so much information in so little space.
Appending...
This file tells you all about python that you ever wanted to know.
Not every README is as informative as this one, but we aim to please.
Never yet has there been so much information in so little space.
I'm a little sentence that just snuck in at the end.
Deleting...
deleted file: /usr/python/readme.txt renamed to /hidden/deleted/1289928955.7363091/usr/python/readme.txt ready for gc
File exists? False
Testing Exceptions...
This exception should be thrown: read error, file does not exist: /usr/python/readme.txt
This exception should be thrown: append error, file does not exist: /usr/python/readme.txt
Metadata Dump...
Filetable: /hidden/deleted/1289928955.7363091/usr/python/readme.txt with 30 chunks
Chunkservers: 5
Chunkserver Data:
0 f76734ce-f1a7-11df-b529-001d09d5b664 mation in
0 f7671750-f1a7-11df-b529-001d09d5b664 you ever
0 f7670bd4-f1a7-11df-b529-001d09d5b664
T
0 f767b656-f1a7-11df-b529-001d09d5b664 le sentenc
0 f7672182-f1a7-11df-b529-001d09d5b664 is as inf
0 f7672b0a-f1a7-11df-b529-001d09d5b664 se.
1 f767b85e-f1a7-11df-b529-001d09d5b664 e that jus
1 f76736b8-f1a7-11df-b529-001d09d5b664 so little
1 f767193a-f1a7-11df-b529-001d09d5b664 wanted to
1 f7670f3a-f1a7-11df-b529-001d09d5b664 his file t
1 f767236c-f1a7-11df-b529-001d09d5b664 ormative a
1 f7672cf4-f1a7-11df-b529-001d09d5b664 Never ye
2 f7671b2e-f1a7-11df-b529-001d09d5b664 know.
2 f7671142-f1a7-11df-b529-001d09d5b664 ells you a
2 f767ba48-f1a7-11df-b529-001d09d5b664 t snuck in
2 f7672556-f1a7-11df-b529-001d09d5b664 s this one
2 f76738e8-f1a7-11df-b529-001d09d5b664 space.
2 f7672ee8-f1a7-11df-b529-001d09d5b664 t has ther
3 f767bcb4-f1a7-11df-b529-001d09d5b664 at the en
3 f7671d40-f1a7-11df-b529-001d09d5b664 Not ev
3 f76730d2-f1a7-11df-b529-001d09d5b664 e been so
3 f7673adc-f1a7-11df-b529-001d09d5b664
3 f767135e-f1a7-11df-b529-001d09d5b664 ll about p
3 f7672740-f1a7-11df-b529-001d09d5b664 , but we a
4 f7672920-f1a7-11df-b529-001d09d5b664 im to plea
4 f767b3f4-f1a7-11df-b529-001d09d5b664 I'm a litt
4 f767bea8-f1a7-11df-b529-001d09d5b664 d.
4 f7671552-f1a7-11df-b529-001d09d5b664 ython that
4 f76732e4-f1a7-11df-b529-001d09d5b664 much infor
4 f7671f8e-f1a7-11df-b529-001d09d5b664 ery README
</pre><br />
Now of course we are lacking some of the complexities of GFS necessary for a fully functional system: metadata locking, chunk leases, replication, master failover, localization of data, chunkserver heartbeats, deleted file garbage collection. But what we have here demonstrates the gist of GFS and will help give you a better understanding of the basics. It can also be a starting point for your own explorations into more detailed distributed filesystem code in python.<br />
<br />
Full source code can be downloaded here: <a href="https://docs.google.com/leaf?id=0Byb_6xqE_1ZxNjA3MDM4N2QtYjhjMS00ZjdiLWI0ZDctMmRkN2M1NWEzOTdh&hl=en">gfs.py</a>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com19tag:blogger.com,1999:blog-1539695110385892935.post-33984184203877083362010-10-06T00:12:00.000-07:002010-11-16T09:43:54.615-08:00Google's MapReduce in 98 Lines of Python<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn6UpE6nuwWc9ikerIlSRtVFiRxSeGpcJFLdzKXf_WCxKB1PHFu1EgaTXxzP7O-271639hw7zZywVWRfEG055mzX3FYbcLtVNaZNt29oHrHwankDn9cOoMcfr4DxU6jzZxaaoiAwMtZXOH/s1600/monty-python-holy-grail.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn6UpE6nuwWc9ikerIlSRtVFiRxSeGpcJFLdzKXf_WCxKB1PHFu1EgaTXxzP7O-271639hw7zZywVWRfEG055mzX3FYbcLtVNaZNt29oHrHwankDn9cOoMcfr4DxU6jzZxaaoiAwMtZXOH/s320/monty-python-holy-grail.jpg" width="320" /></a></div>MapReduce is the magic sauce that makes Google run. Not just search but a large part of their infrastructure is programmed in this paradigm. If you want to see how this can be implemented in python, read on.<br />
<br />
Lately I've been not only learning more python but also learning about the MapReduce algorithm. Naturally I started with the many freely available web resources, from <a href="http://holumbus.fh-wedel.de/trac/wiki/MapReduceExamples">brief overviews</a> to instructive <a href="http://www.youtube.com/watch?v=yjPBkvYh-ss">video tutorials</a> to detailed <a href="http://research.google.com/people/sanjay/index.html">Google Research articles</a> on MapReduce.<br />
<br />
This is all well and fine but I wanted to know how to actually write MapReduce applications in python, and to obtain a better understanding of the magic behind the algorithm itself. I found a small python multi-server implementation of MapReduce named <a href="http://remembersaurus.com/mincemeatpy/mincemeat.py">mincemeat</a>, another called <a href="http://code.google.com/p/octopy/">octopy</a>, as well as interfaces to large non-python systems such as <a href="http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python">Hadoop</a>. However I was still unable to find a quick and dirty implementation of MapReduce that was high-level, concise, easy to run, easy to understand, and relevant. So I wrote one.<br />
<br />
To fulfill my requirement of high-level, I wanted to actually use python-native lists, tuples, queues, and threading. I didn't want to get stuck with low-level socket communication and endless writing and parsing of intermediate flat files. To fulfill my requirement of concise, I wanted the whole MapReduce algorithm to be implemented in a single python class of 100 lines of code or less, and we're talking properly well-formatted python code, not a series of Perl one-liners. To fulfill my requirement that it be easy to run, I should be able to put the whole thing in one python file and call it from the command line with a python interpreter. It should only use the python 2.6+ standard library so there is nothing you should have to install if you already have python. Nor should you need to setup a multi-server environment, multiple servers on a network, you should be able to run it on a single machine while still observing multiple simultaneous map and reduce work tasks. To fulfill my requirement that it be easy to understand, it should accurately and clearly implement the major steps of MapReduce, with comments where necessary. One should be able to insert print statements of the intermediate steps at any point in the computation and see exactly what is going on. To fulfill my requirement of relevant code, not just an academic exercise, I wanted actual python source code examples running the MapReduce class on real data.<br />
<br />
That being said, it is important to know what this implementation of MapReduce in python is not. It is not web-scale without significant changes - in 100 lines of code there is only so much one can do. It does not implement some significant reliability provisions Google uses such as distributed multiple-copy data and monitored task restarts. It does not handle mapper-reducer pairing and partitioning for grouped calculations. It ignores the important principle of locality to dispatch particular workers to machines where applicable data already resides. It bypasses the key efficiency step of combiners to aid in the speed of reducing tasks. It loads the data directly from the internet or local disk and stores intermediate results in memory instead of using a distributed filesystem like GFS or HDFS. It doesn't support differing numbers of parse, map, merge, sort, reduce, and output functions with varying resource allocations to each. And last but not least, it bases its multiple worker implementation on python threading with the Thread and synchronized Queue coordinating classes. This means that while in theory there is nothing to prevent an interface-identical implementation of these python standard classes which are distributed across thousands of machines, as written the code runs all the workers only on a single machine.<br />
<br />
But enough talk. It's time to show the goods. Let's see some code.<br />
<pre style="font-size: small"><font face="Lucida,Courier New"><font color="#C00000">import</font> <font color="#000000">threading</font>
<font color="#C00000">import</font> <font color="#000000">Queue</font>
<font color="#C00000">import</font> <font color="#000000">operator</font>
<font color="#C00000">import</font> <font color="#000000">urllib</font>
<font color="#C00000">import</font> <font color="#000000">re</font>
<font color="#C00000">class</font> <font color="#000000">MapReduce</font><font color="#0000C0">:</font>
<font color="#004080">''' MapReduce - to use, subclass by defining these functions,
then call self.map_reduce():
parse_fn(self, k, v) => [(k, v), ...]
map_fn(self, k, v) => [(k, v1), (k, v2), ...]
reduce_fn(self, k, [v1, v2, ...]) => [(k, v)]
output_fn(self, [(k, v), ...])
'''</font>
<font color="#C00000">def</font> <font color="#000000">__init__</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">data</font> <font color="#0000C0">=</font> <font color="#000000">None</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">num_worker_threads</font> <font color="#0000C0">=</font> <font color="#0080C0">5</font>
<font color="#C00000">class</font> <font color="#000000">SynchronizedDict</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># we need this for merging</font>
<font color="#C00000">def</font> <font color="#000000">__init__</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">lock</font> <font color="#0000C0">=</font> <font color="#000000">threading</font><font color="#0000C0">.</font><font color="#000000">Lock</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">d</font> <font color="#0000C0">=</font> <font color="#0000C0">{</font><font color="#0000C0">}</font>
<font color="#C00000">def</font> <font color="#000000">isin</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">k</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">with</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">lock</font><font color="#0000C0">:</font>
<font color="#C00000">if</font> <font color="#000000">k</font> <font color="#C00000">in</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">d</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#000000">True</font>
<font color="#C00000">else</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#000000">False</font>
<font color="#C00000">def</font> <font color="#000000">get</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">k</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">with</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">lock</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">d</font><font color="#0000C0">[</font><font color="#000000">k</font><font color="#0000C0">]</font>
<font color="#C00000">def</font> <font color="#000000">set</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">k</font><font color="#0000C0">,</font> <font color="#000000">v</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># we don't need del</font>
<font color="#C00000">with</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">lock</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">d</font><font color="#0000C0">[</font><font color="#000000">k</font><font color="#0000C0">]</font> <font color="#0000C0">=</font> <font color="#000000">v</font>
<font color="#C00000">def</font> <font color="#000000">set_append</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">k</font><font color="#0000C0">,</font> <font color="#000000">v</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># for thread-safe list append</font>
<font color="#C00000">with</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">lock</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">d</font><font color="#0000C0">[</font><font color="#000000">k</font><font color="#0000C0">]</font><font color="#0000C0">.</font><font color="#000000">append</font><font color="#0000C0">(</font><font color="#000000">v</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">items</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">with</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">lock</font><font color="#0000C0">:</font>
<font color="#C00000">return</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">d</font><font color="#0000C0">.</font><font color="#000000">items</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">create_queue</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">input_list</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># helper fn for queues</font>
<font color="#000000">output_queue</font> <font color="#0000C0">=</font> <font color="#000000">Queue</font><font color="#0000C0">.</font><font color="#000000">Queue</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">for</font> <font color="#000000">value</font> <font color="#C00000">in</font> <font color="#000000">input_list</font><font color="#0000C0">:</font>
<font color="#000000">output_queue</font><font color="#0000C0">.</font><font color="#000000">put</font><font color="#0000C0">(</font><font color="#000000">value</font><font color="#0000C0">)</font>
<font color="#C00000">return</font> <font color="#000000">output_queue</font>
<font color="#C00000">def</font> <font color="#000000">create_list</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">input_queue</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># helper fn for queues</font>
<font color="#000000">output_list</font> <font color="#0000C0">=</font> <font color="#0000C0">[</font><font color="#0000C0">]</font>
<font color="#C00000">while</font> <font color="#C00000">not</font> <font color="#000000">input_queue</font><font color="#0000C0">.</font><font color="#000000">empty</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">item</font> <font color="#0000C0">=</font> <font color="#000000">input_queue</font><font color="#0000C0">.</font><font color="#000000">get</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">output_list</font><font color="#0000C0">.</font><font color="#000000">append</font><font color="#0000C0">(</font><font color="#000000">item</font><font color="#0000C0">)</font>
<font color="#000000">input_queue</font><font color="#0000C0">.</font><font color="#000000">task_done</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">return</font> <font color="#000000">output_list</font>
<font color="#C00000">def</font> <font color="#000000">merge_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">k</font><font color="#0000C0">,</font> <font color="#000000">v</font><font color="#0000C0">,</font> <font color="#000000">merge_dict</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># helper fn for merge</font>
<font color="#C00000">if</font> <font color="#000000">merge_dict</font><font color="#0000C0">.</font><font color="#000000">isin</font><font color="#0000C0">(</font><font color="#000000">k</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">merge_dict</font><font color="#0000C0">.</font><font color="#000000">set_append</font><font color="#0000C0">(</font><font color="#000000">k</font><font color="#0000C0">,</font> <font color="#000000">v</font><font color="#0000C0">)</font>
<font color="#C00000">else</font><font color="#0000C0">:</font>
<font color="#000000">merge_dict</font><font color="#0000C0">.</font><font color="#000000">set</font><font color="#0000C0">(</font><font color="#000000">k</font><font color="#0000C0">,</font> <font color="#0000C0">[</font><font color="#000000">v</font><font color="#0000C0">]</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">process_queue</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">input_queue</font><font color="#0000C0">,</font> <font color="#000000">fn_selector</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># helper fn</font>
<font color="#000000">output_queue</font> <font color="#0000C0">=</font> <font color="#000000">Queue</font><font color="#0000C0">.</font><font color="#000000">Queue</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">if</font> <font color="#000000">fn_selector</font> <font color="#0000C0">==</font> <font color="#004080">'merge'</font><font color="#0000C0">:</font>
<font color="#000000">merge_dict</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">SynchronizedDict</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">worker</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">while</font> <font color="#C00000">not</font> <font color="#000000">input_queue</font><font color="#0000C0">.</font><font color="#000000">empty</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#0000C0">(</font><font color="#000000">k</font><font color="#0000C0">,</font> <font color="#000000">v</font><font color="#0000C0">)</font> <font color="#0000C0">=</font> <font color="#000000">input_queue</font><font color="#0000C0">.</font><font color="#000000">get</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">if</font> <font color="#000000">fn_selector</font> <font color="#C00000">in</font> <font color="#0000C0">[</font><font color="#004080">'map'</font><font color="#0000C0">,</font> <font color="#004080">'reduce'</font><font color="#0000C0">]</font><font color="#0000C0">:</font>
<font color="#C00000">if</font> <font color="#000000">fn_selector</font> <font color="#0000C0">==</font> <font color="#004080">'map'</font><font color="#0000C0">:</font>
<font color="#000000">result_list</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">map_fn</font><font color="#0000C0">(</font><font color="#000000">k</font><font color="#0000C0">,</font> <font color="#000000">v</font><font color="#0000C0">)</font>
<font color="#C00000">elif</font> <font color="#000000">fn_selector</font> <font color="#0000C0">==</font> <font color="#004080">'reduce'</font><font color="#0000C0">:</font>
<font color="#000000">result_list</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">reduce_fn</font><font color="#0000C0">(</font><font color="#000000">k</font><font color="#0000C0">,</font> <font color="#000000">v</font><font color="#0000C0">)</font>
<font color="#C00000">for</font> <font color="#000000">result_tuple</font> <font color="#C00000">in</font> <font color="#000000">result_list</font><font color="#0000C0">:</font> <font color="#008000"># flatten</font>
<font color="#000000">output_queue</font><font color="#0000C0">.</font><font color="#000000">put</font><font color="#0000C0">(</font><font color="#000000">result_tuple</font><font color="#0000C0">)</font>
<font color="#C00000">elif</font> <font color="#000000">fn_selector</font> <font color="#0000C0">==</font> <font color="#004080">'merge'</font><font color="#0000C0">:</font> <font color="#008000"># merge v to same k</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">merge_fn</font><font color="#0000C0">(</font><font color="#000000">k</font><font color="#0000C0">,</font> <font color="#000000">v</font><font color="#0000C0">,</font> <font color="#000000">merge_dict</font><font color="#0000C0">)</font>
<font color="#C00000">else</font><font color="#0000C0">:</font>
<font color="#C00000">raise</font> <font color="#000000">Exception</font><font color="#0000C0">,</font> <font color="#004080">"Bad fn_selector="</font><font color="#0000C0">+</font><font color="#000000">fn_selector</font>
<font color="#000000">input_queue</font><font color="#0000C0">.</font><font color="#000000">task_done</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">for</font> <font color="#000000">i</font> <font color="#C00000">in</font> <font color="#000000">range</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">num_worker_threads</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># start threads</font>
<font color="#000000">worker_thread</font> <font color="#0000C0">=</font> <font color="#000000">threading</font><font color="#0000C0">.</font><font color="#000000">Thread</font><font color="#0000C0">(</font><font color="#000000">target</font><font color="#0000C0">=</font><font color="#000000">worker</font><font color="#0000C0">)</font>
<font color="#000000">worker_thread</font><font color="#0000C0">.</font><font color="#000000">daemon</font> <font color="#0000C0">=</font> <font color="#000000">True</font>
<font color="#000000">worker_thread</font><font color="#0000C0">.</font><font color="#000000">start</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">input_queue</font><font color="#0000C0">.</font><font color="#000000">join</font><font color="#0000C0">(</font><font color="#0000C0">)</font> <font color="#008000"># wait for worker threads to finish</font>
<font color="#C00000">if</font> <font color="#000000">fn_selector</font> <font color="#0000C0">==</font> <font color="#004080">'merge'</font><font color="#0000C0">:</font>
<font color="#000000">output_list</font> <font color="#0000C0">=</font> <font color="#000000">sorted</font><font color="#0000C0">(</font><font color="#000000">merge_dict</font><font color="#0000C0">.</font><font color="#000000">items</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#000000">key</font><font color="#0000C0">=</font><font color="#000000">operator</font><font color="#0000C0">.</font><font color="#000000">itemgetter</font><font color="#0000C0">(</font><font color="#0080C0">0</font><font color="#0000C0">)</font><font color="#0000C0">)</font>
<font color="#000000">output_queue</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">create_queue</font><font color="#0000C0">(</font><font color="#000000">output_list</font><font color="#0000C0">)</font>
<font color="#C00000">return</font> <font color="#000000">output_queue</font>
<font color="#C00000">def</font> <font color="#000000">map_reduce</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># the actual map-reduce algoritm</font>
<font color="#000000">data_list</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">parse_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">data</font><font color="#0000C0">)</font>
<font color="#000000">data_queue</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">create_queue</font><font color="#0000C0">(</font><font color="#000000">data_list</font><font color="#0000C0">)</font> <font color="#008000"># enqueue the data so we can multi-process</font>
<font color="#000000">map_queue</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">process_queue</font><font color="#0000C0">(</font><font color="#000000">data_queue</font><font color="#0000C0">,</font> <font color="#004080">'map'</font><font color="#0000C0">)</font> <font color="#008000"># [(k,v),...] => [(k,v1),(k,v2),...]</font>
<font color="#000000">merge_queue</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">process_queue</font><font color="#0000C0">(</font><font color="#000000">map_queue</font><font color="#0000C0">,</font> <font color="#004080">'merge'</font><font color="#0000C0">)</font> <font color="#008000"># [(k,v1),(k,v2),...] => [(k,[v1,v2,...]),...]</font>
<font color="#000000">reduce_queue</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">process_queue</font><font color="#0000C0">(</font><font color="#000000">merge_queue</font><font color="#0000C0">,</font> <font color="#004080">'reduce'</font><font color="#0000C0">)</font> <font color="#008000"># [(k,[v1,v2,...]),...] => [(k,v),...]</font>
<font color="#000000">output_list</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">create_list</font><font color="#0000C0">(</font><font color="#000000">reduce_queue</font><font color="#0000C0">)</font> <font color="#008000"># deque into list for output handling</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">output_fn</font><font color="#0000C0">(</font><font color="#000000">output_list</font><font color="#0000C0">)</font>
</pre><br />
Well, there you have it. Google's MapReduce, with all my caveats of course, in less than 100 lines of python code, 98 lines to be precise which includes imports and spacing lines. Of course just the class itself isn't very useful without some examples to see how it works, so I've included a WordCount example below:<br />
<br />
<pre style="font-size: small"><font color="#C00000">class</font> <font color="#000000">WordCount</font><font color="#0000C0">(</font><font color="#000000">MapReduce</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">def</font> <font color="#000000">__init__</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">MapReduce</font><font color="#0000C0">.</font><font color="#000000">__init__</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">min_count</font> <font color="#0000C0">=</font> <font color="#0080C0">1</font>
<font color="#C00000">def</font> <font color="#000000">parse_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">data</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># break string into [(k, v), ...] tuples for each line</font>
<font color="#000000">data_list</font> <font color="#0000C0">=</font> <font color="#000000">map</font><font color="#0000C0">(</font><font color="#C00000">lambda</font> <font color="#000000">line</font><font color="#0000C0">:</font> <font color="#0000C0">(</font><font color="#000000">None</font><font color="#0000C0">,</font> <font color="#000000">line</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#000000">data</font><font color="#0000C0">.</font><font color="#000000">splitlines</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">)</font><font color="#0000C0"></font>
<font color="#C00000">return</font> <font color="#000000">data_list</font>
<font color="#C00000">def</font> <font color="#000000">map_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">key</font><font color="#0000C0">,</font> <font color="#000000">str</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># return (word, 1) tuples for each word, ignore key</font>
<font color="#000000">word_list</font> <font color="#0000C0">=</font> <font color="#0000C0">[</font><font color="#0000C0">]</font>
<font color="#C00000">for</font> <font color="#000000">word</font> <font color="#C00000">in</font> <font color="#000000">re</font><font color="#0000C0">.</font><font color="#000000">split</font><font color="#0000C0">(</font><font color="#004080">r'\W+'</font><font color="#0000C0">,</font> <font color="#000000">str</font><font color="#0000C0">.</font><font color="#000000">lower</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">bare_word</font> <font color="#0000C0">=</font> <font color="#000000">re</font><font color="#0000C0">.</font><font color="#000000">sub</font><font color="#0000C0">(</font><font color="#004080">r"[^A-Za-z0-9]*"</font><font color="#0000C0">,</font> <font color="#004080">r""</font><font color="#0000C0">,</font> <font color="#000000">word</font><font color="#0000C0">)</font><font color="#0000C0">;</font>
<font color="#C00000">if</font> <font color="#000000">len</font><font color="#0000C0">(</font><font color="#000000">bare_word</font><font color="#0000C0">)</font> <font color="#0000C0">></font> <font color="#0080C0">0</font><font color="#0000C0">:</font>
<font color="#000000">word_list</font><font color="#0000C0">.</font><font color="#000000">append</font><font color="#0000C0">(</font><font color="#0000C0">(</font><font color="#000000">bare_word</font><font color="#0000C0">,</font> <font color="#0080C0">1</font><font color="#0000C0">)</font><font color="#0000C0">)</font>
<font color="#C00000">return</font> <font color="#000000">word_list</font>
<font color="#C00000">def</font> <font color="#000000">reduce_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">word</font><font color="#0000C0">,</font> <font color="#000000">count_list</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># just sum the counts</font>
<font color="#C00000">return</font> <font color="#0000C0">[</font><font color="#0000C0">(</font><font color="#000000">word</font><font color="#0000C0">,</font> <font color="#000000">sum</font><font color="#0000C0">(</font><font color="#000000">count_list</font><font color="#0000C0">)</font><font color="#0000C0">)</font><font color="#0000C0">]</font>
<font color="#C00000">def</font> <font color="#000000">output_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">output_list</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># just print the resulting list</font>
<font color="#C00000">print</font> <font color="#004080">"Word"</font><font color="#0000C0">.</font><font color="#000000">ljust</font><font color="#0000C0">(</font><font color="#0080C0">15</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#004080">"Count"</font><font color="#0000C0">.</font><font color="#000000">rjust</font><font color="#0000C0">(</font><font color="#0080C0">5</font><font color="#0000C0">)</font>
<font color="#C00000">print</font> <font color="#004080">"______________"</font><font color="#0000C0">.</font><font color="#000000">ljust</font><font color="#0000C0">(</font><font color="#0080C0">15</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#004080">"_____"</font><font color="#0000C0">.</font><font color="#000000">rjust</font><font color="#0000C0">(</font><font color="#0080C0">5</font><font color="#0000C0">)</font>
<font color="#000000">sorted_list</font> <font color="#0000C0">=</font> <font color="#000000">sorted</font><font color="#0000C0">(</font><font color="#000000">output_list</font><font color="#0000C0">,</font> <font color="#000000">key</font><font color="#0000C0">=</font><font color="#000000">operator</font><font color="#0000C0">.</font><font color="#000000">itemgetter</font><font color="#0000C0">(</font><font color="#0080C0">1</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#000000">reverse</font><font color="#0000C0">=</font><font color="#000000">True</font><font color="#0000C0">)</font>
<font color="#C00000">for</font> <font color="#0000C0">(</font><font color="#000000">word</font><font color="#0000C0">,</font> <font color="#000000">count</font><font color="#0000C0">)</font> <font color="#C00000">in</font> <font color="#000000">sorted_list</font><font color="#0000C0">:</font>
<font color="#C00000">if</font> <font color="#000000">count</font> <font color="#0000C0">></font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">min_count</font><font color="#0000C0">:</font>
<font color="#C00000">print</font> <font color="#000000">word</font><font color="#0000C0">.</font><font color="#000000">ljust</font><font color="#0000C0">(</font><font color="#0080C0">15</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#000000">repr</font><font color="#0000C0">(</font><font color="#000000">count</font><font color="#0000C0">)</font><font color="#0000C0">.</font><font color="#000000">rjust</font><font color="#0000C0">(</font><font color="#0080C0">5</font><font color="#0000C0">)</font>
<font color="#C00000">print</font>
<font color="#C00000">def</font> <font color="#000000">test_with_monty</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">data</font> <font color="#0000C0">=</font> <font color="#004080">"""The Meaning of Life is:
try and be nice to people,
avoid eating fat,
read a good book every now and then,
get some walking in,
and try and live together in peace and harmony
with people of all creeds and nations."""</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">map_reduce</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">def</font> <font color="#000000">test_with_nietzsche</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">min_count</font> <font color="#0000C0">=</font> <font color="#0080C0">700</font>
<font color="#000000">f</font> <font color="#0000C0">=</font> <font color="#000000">urllib</font><font color="#0000C0">.</font><font color="#000000">urlopen</font><font color="#0000C0">(</font><font color="#004080">"http://www.gutenberg.org/cache/epub/7205/pg7205.txt"</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">data</font> <font color="#0000C0">=</font> <font color="#000000">f</font><font color="#0000C0">.</font><font color="#000000">read</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">f</font><font color="#0000C0">.</font><font color="#000000">close</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">map_reduce</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
</pre><br />
Another one which will help you understand better how to use map keys throughout the computation is DistributedGrep, which really has a different feel from WordCount, which I've included below:<br />
<br />
<pre style="font-size: small"><font color="#C00000">class</font> <font color="#000000">DistributedGrep</font><font color="#0000C0">(</font><font color="#000000">MapReduce</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">def</font> <font color="#000000">__init__</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">MapReduce</font><font color="#0000C0">.</font><font color="#000000">__init__</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">matcher</font> <font color="#0000C0">=</font> <font color="#000000">None</font>
<font color="#C00000">def</font> <font color="#000000">parse_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">data</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># one list item per line with line number</font>
<font color="#000000">data_list</font> <font color="#0000C0">=</font> <font color="#0000C0">[</font><font color="#0000C0">]</font>
<font color="#000000">line_num</font> <font color="#0000C0">=</font> <font color="#0080C0">1</font>
<font color="#C00000">for</font> <font color="#000000">line</font> <font color="#C00000">in</font> <font color="#000000">data</font><font color="#0000C0">.</font><font color="#000000">splitlines</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">data_list</font><font color="#0000C0">.</font><font color="#000000">append</font><font color="#0000C0">(</font><font color="#0000C0">(</font><font color="#000000">line_num</font><font color="#0000C0">,</font> <font color="#000000">line</font><font color="#0000C0">)</font><font color="#0000C0">)</font>
<font color="#000000">line_num</font> <font color="#0000C0">=</font> <font color="#000000">line_num</font> <font color="#0000C0">+</font> <font color="#0080C0">1</font>
<font color="#C00000">return</font> <font color="#000000">data_list</font>
<font color="#C00000">def</font> <font color="#000000">map_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">line_num</font><font color="#0000C0">,</font> <font color="#000000">line</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># return line if matches, include line num</font>
<font color="#000000">matcher</font> <font color="#0000C0">=</font> <font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">matcher</font>
<font color="#000000">matched_line</font> <font color="#0000C0">=</font> <font color="#0000C0">[</font><font color="#0000C0">]</font>
<font color="#C00000">if</font> <font color="#000000">matcher</font><font color="#0000C0">.</font><font color="#000000">match</font><font color="#0000C0">(</font><font color="#000000">line</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">matched_line</font> <font color="#0000C0">=</font> <font color="#0000C0">[</font><font color="#0000C0">(</font><font color="#000000">line_num</font><font color="#0000C0">,</font> <font color="#000000">line</font><font color="#0000C0">)</font><font color="#0000C0">]</font>
<font color="#C00000">return</font> <font color="#000000">matched_line</font>
<font color="#C00000">def</font> <font color="#000000">reduce_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">line_num</font><font color="#0000C0">,</font> <font color="#000000">line_list</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># identity reducer</font>
<font color="#C00000">return</font> <font color="#0000C0">[</font><font color="#0000C0">(</font><font color="#000000">line_num</font><font color="#0000C0">,</font> <font color="#000000">line_list</font><font color="#0000C0">[</font><font color="#0080C0">0</font><font color="#0000C0">]</font><font color="#0000C0">)</font><font color="#0000C0">]</font> <font color="#008000"># we only ever have one line in the list</font>
<font color="#C00000">def</font> <font color="#000000">output_fn</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">,</font> <font color="#000000">output_list</font><font color="#0000C0">)</font><font color="#0000C0">:</font> <font color="#008000"># just print the resulting list</font>
<font color="#C00000">print</font> <font color="#004080">"LineNum"</font><font color="#0000C0">.</font><font color="#000000">rjust</font><font color="#0000C0">(</font><font color="#0080C0">8</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#004080">"Line"</font><font color="#0000C0">.</font><font color="#000000">ljust</font><font color="#0000C0">(</font><font color="#0080C0">70</font><font color="#0000C0">)</font>
<font color="#C00000">print</font> <font color="#004080">"_______"</font><font color="#0000C0">.</font><font color="#000000">rjust</font><font color="#0000C0">(</font><font color="#0080C0">8</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#004080">"____"</font>
<font color="#C00000">for</font> <font color="#0000C0">(</font><font color="#000000">line_num</font><font color="#0000C0">,</font> <font color="#000000">line</font><font color="#0000C0">)</font> <font color="#C00000">in</font> <font color="#000000">sorted</font><font color="#0000C0">(</font><font color="#000000">output_list</font><font color="#0000C0">,</font> <font color="#000000">key</font><font color="#0000C0">=</font><font color="#000000">operator</font><font color="#0000C0">.</font><font color="#000000">itemgetter</font><font color="#0000C0">(</font><font color="#0080C0">0</font><font color="#0000C0">)</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#C00000">print</font> <font color="#000000">repr</font><font color="#0000C0">(</font><font color="#000000">line_num</font><font color="#0000C0">)</font><font color="#0000C0">.</font><font color="#000000">rjust</font><font color="#0000C0">(</font><font color="#0080C0">8</font><font color="#0000C0">)</font><font color="#0000C0">,</font> <font color="#000000">line</font><font color="#0000C0">.</font><font color="#000000">ljust</font><font color="#0000C0">(</font><font color="#0080C0">70</font><font color="#0000C0">)</font>
<font color="#C00000">print</font>
<font color="#C00000">def</font> <font color="#000000">test_with_nietzsche</font><font color="#0000C0">(</font><font color="#000000">self</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">matcher</font> <font color="#0000C0">=</font> <font color="#000000">re</font><font color="#0000C0">.</font><font color="#000000">compile</font><font color="#0000C0">(</font><font color="#004080">r".*Jahre.*"</font><font color="#0000C0">)</font>
<font color="#000000">f</font> <font color="#0000C0">=</font> <font color="#000000">urllib</font><font color="#0000C0">.</font><font color="#000000">urlopen</font><font color="#0000C0">(</font><font color="#004080">"http://www.gutenberg.org/cache/epub/7205/pg7205.txt"</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">data</font> <font color="#0000C0">=</font> <font color="#000000">f</font><font color="#0000C0">.</font><font color="#000000">read</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">f</font><font color="#0000C0">.</font><font color="#000000">close</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">self</font><font color="#0000C0">.</font><font color="#000000">map_reduce</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
</pre><br />
Of course none of this is useful if you can't actually run the code, so here's a main function below with the two examples classes run in test execution:<br />
<br />
<pre style="font-size: small"><font color="#C00000">def</font> <font color="#000000">main</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#0000C0">:</font>
<font color="#000000">wc</font> <font color="#0000C0">=</font> <font color="#000000">WordCount</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">wc</font><font color="#0000C0">.</font><font color="#000000">test_with_monty</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">wc</font><font color="#0000C0">.</font><font color="#000000">test_with_nietzsche</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">dg</font> <font color="#0000C0">=</font> <font color="#000000">DistributedGrep</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#000000">dg</font><font color="#0000C0">.</font><font color="#000000">test_with_nietzsche</font><font color="#0000C0">(</font><font color="#0000C0">)</font>
<font color="#C00000">if</font> <font color="#000000">__name__</font> <font color="#0000C0">==</font> <font color="#004080">"__main__"</font><font color="#0000C0">:</font>
<font color="#000000">main</font><font color="#0000C0">(</font><font color="#0000C0">)</font><font color="#000000"></font></font>
</pre><br />
You can paste all the following code snippets above into one file and run that in any python 2.6+ interpreter, and it should output results something like the following:<br />
<br />
<pre style="font-size: small; color: black">$ python map_reduce.py
Word Count
______________ _____
and 6
in 2
of 2
people 2
try 2
Word Count
______________ _____
und 3992
der 2022
ich 1714
die 1459
ist 1179
das 1103
nicht 985
zu 947
es 872
aber 857
du 856
er 854
sie 786
ihr 769
den 751
ein 746
LineNum Line
_______ ____
168 Zehn Jahre kamst du hier herauf zu meiner Höhle: du würdest deines
209 Nicht fremd ist mir dieser Wanderer: vor manchen Jahre gieng er her
3198 Also vergiengen dem Einsamen Monde und Jahre; seine Weisheit aber
4585 und unverändert durch die Jahre.
9285 von grossem Jahre: das muss sich, einer Sanduhr gleich, immer wieder
9288 - so dass alle diese Jahre sich selber gleich sind, im Grössten und
9289 auch im Kleinsten, - so dass wir selber in jedem grossen Jahre uns
9816 - Und wieder liefen Monde und Jahre über Zarathustra's Seele, und er
9931 tausend Jahren - -
10801 Meine Liebe diente ihm lange Jahre, mein Wille gierig allem seinen
</pre><br />
To better understand the algorithm, I'd recommend going through just the first test case for WordCount which is small enough to see exactly what is happening at every step but real enough to see a genuine useful calculation in progress. You can insert print statements for the various queues, lists and tuples at each step to see exactly what is going on. In fact although I spent days studying MapReduce as an abstract algorithm and looking at code implementations, I didn't really understand it until I did this exercise of stepping through the code, which in my case was also debugging it.<br />
<br />
Then once you've done that with the first simple test case you can try the larger WordCount corpus which is a full-size work by Nietzsche in the original German. Since the data is too large to print out all at once, I recommend only printing the first 10 items or so for each step of the process so you can see what's going on with a bigger example. Then after that you can try DistributedGrep which is an entirely different algorithm, with both a different feel and implementation, so you can move beyond the introductory word counting and see how other types of processes can be implemented in MapReduce as well.<br />
<br />
So that's it folks, I hope you enjoy it. Improvements, additional requests, criticisms, and flames are all equally welcome. I'm especially interested in finding some more python example algorithms that can be implemented in MapReduce in a page or two, particularly ones outside what you might normally see in the standard corpus. If there's enough interest I'll do some additional examples myself and follow up in a subsequent post.<br />
<br />
PS Thanks to MoinMoin for the <a href="http://code.activestate.com/recipes/52298/">html color markup</a> program for python.<br />
<br />
Full source code can be downloaded here: <a href="https://docs.google.com/leaf?id=0Byb_6xqE_1ZxODVkMGIyNTMtYzNiMy00YTQzLThlZDYtODBmM2M5YTU3YWZj&hl=en">map_reduce.py</a>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com211tag:blogger.com,1999:blog-1539695110385892935.post-22692012346372370462010-10-01T00:58:00.000-07:002010-10-01T02:28:35.567-07:00No, I'm Not Interested In Your Startup<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimHgIdMBQgd7JVeot_AMSwF4tGaXvsGoBV_o-kZK_hHFfoBWpcEgVRVjODxbox9qLwlArEwbZbDYiZI_tE5mp15_aEa4bqGNLgVvpaGpSCeZpGwAhyphenhyphenY9VaGV5EZ3fgs4Bc6c_RJETKd9FU/s1600/Quaritch_Starbucks.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"></a></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><img border="0" height="217" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcLVC38YoY-yHwbtB_gDvSUaSLXYnZp7aTkmiesnCBthk6bBj3hioZTEYbdsciY2xabinEZoi939EIEePzHXn7qi1ykrT1RM3D9Vrkg_FRkzRrYjGaJ7OsQuFZRTbsN-e0aLxnJPWj27OM/s320/Quaritch_Starbucks.jpg" style="margin-left: auto; margin-right: auto;" width="320" /></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Colonel Quaritch vs. Your App</td></tr>
</tbody></table><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcLVC38YoY-yHwbtB_gDvSUaSLXYnZp7aTkmiesnCBthk6bBj3hioZTEYbdsciY2xabinEZoi939EIEePzHXn7qi1ykrT1RM3D9Vrkg_FRkzRrYjGaJ7OsQuFZRTbsN-e0aLxnJPWj27OM/s1600/Quaritch_Starbucks.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"></a></div>You've got a great new app that uses the iPhone 4 hi-res camera to check the state of my cooking pasta and tell me exactly how much longer to cook it for optimum flavor. Your new Web 2.0 collaborative messaging app will enable an unprecedented level of office productivity in a cloud-hosted distributed web-scale architecture. You are thrilled to tell me about the next leap forward in social dynamics and how your app is an enabler for the next level of financial integration with social media which will future-proof antiquated institutions from disruptive technologies. Guess what? I don't give a fuck!<br />
<br />
We used to send men to the moon. We once figured out how to split the atom and release vast quantities of energy in the process. We invented a novel device called the transistor. We discovered penicillin. We developed vaccines that have saved the lives of about half of all children born in the last century. We've mapped the human genome. We gave everyone in democratic countries the right to vote regardless of skin color and ended apartheid. The Soviet Union is no more.<br />
<br />
But you want me to get excited about your little piece-of-shit app like it's the next greatest thing since the Magna Carta. Well sorry to bust your bubble ego boy, but it's not. In fact, more likely than not its useless and annoying. Even if you're successful, what have you done? Created the next facebook? Another colossal fucking waste of human energy whose main purpose seems to be to get people to spend their hard earned money on virtual farm buildings?<br />
<br />
If you wanna be excited about your work, fine. If you wanna get rich from some dumbass VCs, herd-following customers or credit-card funded users, go ahead. Nothing wrong with that. But don't pretend it's anything more than this. Don't pretend that your shitty app is going to change the world, because it's not. You already are going after the money and fame, don't also take the dignity away from people making an actual difference in the world. Leave them with that.<br />
<br />
<iframe src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=1153596539&fc1=000000&IS2=1<1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="align:right;padding-top:5px;width:131px;height:245px;padding-right:10px;"align="right" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe><iframe src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=B002SNA8JU&fc1=000000&IS2=1<1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="align:right;padding-top:5px;width:131px;height:245px;padding-right:10px;"align="right" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe><iframe src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=1438524129&fc1=000000&IS2=1<1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="align:right;padding-top:5px;width:131px;height:245px;padding-right:10px;"align="right" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com24tag:blogger.com,1999:blog-1539695110385892935.post-59933708227269239232010-09-24T08:31:00.000-07:002010-09-24T08:31:14.882-07:00Startup Financing Exposed Part I: Humble Beginnings<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVG7mcUTJRMw0nGgXi0h_F8t5_-MCWowrdKkhvPqRMhwHDJHQdgN3hbaGZNwzq8mh8lBYSifWxTelUCZKiwQ2hUKlua4gxTV-xn73wEnd_aw4SYtW9PBmkmOyn6fP72etG5lEyd5qnwWBf/s1600/300_corliss_0313.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="219" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVG7mcUTJRMw0nGgXi0h_F8t5_-MCWowrdKkhvPqRMhwHDJHQdgN3hbaGZNwzq8mh8lBYSifWxTelUCZKiwQ2hUKlua4gxTV-xn73wEnd_aw4SYtW9PBmkmOyn6fP72etG5lEyd5qnwWBf/s320/300_corliss_0313.jpg" width="320" /></a></div>"Audentis Fortuna Iuvat" - Fortune favors the bold - Virgil, Aeneid X: 284<br />
<br />
Today is the first of a series of essays examining startup financing in detail, from the entrepreneur's prospective, and also from the perspective of the small investor. Maybe you're thinking of starting a company, maybe you've started one and need more money to grow, or to survive. Maybe family, a friend, or a business contact is asking you for money to fund a startup, maybe they're coming back to ask you a second time.<br />
<br />
In Part I, we are looking at the Humble Beginnings of your startup. The first question is, how much money do you need, and where is it coming from?<br />
<br />
The focus here is with the software and software services sector, because that's the only thing I know anything about. But it may be applicable outside to other fields, your mileage may vary.<br />
<br />
<b>The Entrepreneur's Side</b><br />
<br />
Before you make the plunge and quit your job, or quit looking for a job, or quit doing the company you're running now, make a plan. A business plan. If you don't know what one is, don't sweat it, that's why we have Wikipedia. It doesn't have to be longer than a page. Just think about what you're going to do for the next year, where the money is going to come from, and how you're going to spend it. Break it down, month by month, because timing of cash flow is very important. You may not want to pay yourself, but unless you're independently wealthy or have a working spouse to support you, you'll need to eat. Plan that money too. Don't bother planning more than a year or two into the future because even the first year of planning is uncertain enough already. Look at the chart from Wikipedia concerning the startup financing cycle, which we'll be constantly referring to throughout this series. For now, just plan to break-even within the first two years or so.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9YZHZsTp2wDI2jWP7NJYk0lZeoO4hcEQ6A21mB5Fso699r3spTQBvvJ5228G38YnXTdFnNzWXiLx_ZeB4342YdvB_FLIA_70JAtKjU_yCjFriHaKCdBaNhpCIqdZQRsoPqXkxUzv41-J7/s1600/startup+financing+cycle.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9YZHZsTp2wDI2jWP7NJYk0lZeoO4hcEQ6A21mB5Fso699r3spTQBvvJ5228G38YnXTdFnNzWXiLx_ZeB4342YdvB_FLIA_70JAtKjU_yCjFriHaKCdBaNhpCIqdZQRsoPqXkxUzv41-J7/s1600/startup+financing+cycle.png" /></a></div><br />
Did you do your plan? Good. Now you know how much money you need. Perhaps you are thinking to yourself, I take the estimated revenue, subtract the estimated expenses, and that's my estimated profit, so if it's negative I need some money to tide me over. Also I need working capital as a cushion depending on payment timing, and initial capital for outlays like desks and computers. That's also good. But probably you can ignore the revenue side, at least for six months, because you're not going to make any revenue right away if you're in the software business. Consulting maybe you'll do better, but you've still maybe got a few months before the payment cycle goes in your favor. You don't want your business to end before it begins, so be realistic.<br />
<br />
<b>Be Cheap: An Interlude</b><br />
<div class="separator" style="clear: both; text-align: center;"></div><br />
Also, and I can't emphasize this enough, be cheap. Until the point that you're meeting customers regularly, avoid spending unnecessary money. Even better, avoid spending money at all. Don't rent an office when you can work from your home or apartment or parent's garage or basement. Don't buy a desk when you can bring an old one from your storage room that you haven't used in years. Don't get a new laptop that you "need" for development when your current one will do just fine. Don't launch an expensive advertising campaign before you have exhausted free word-of-mouth, blogging and social network lead generation. Don't get an accountant to handle your books when you can do it yourself by hand or with Quicken. Don't pay someone to open a company for you when you can file the form yourself, in fact don't open a company at all until you are ready to take some big checks from investors.<br />
<br />
<b>Business Plan</b><br />
<br />
Here's a business plan of an actual proposed software startup company in the iPad game / social media space. I've included it here so you can see what one actually looks like. It's a small, one-man, or one-woman, operation seeking to raise only a small amount. You can use this as a template for your own company, if you'd like. For the first funding round, keep it short and to the point, with a minimum of fluff. Hit the basics, toss in some financial projections, and end with a plea for funds. We add a disclaimer on the end just for safety. Naturally, if you want to raise several hundred thousand, or even more, you're going to need to provide a lot more detail, management bios, an experienced board, a lawyer's review, and all that. But for the humble beginnings, at the early stage, consider the plan below.<br />
<br />
<b>TreasureSys Business Plan</b><br />
<br />
<i>The Problem</i><br />
<br />
Scavenger hunting is a popular recreational activity found throughout the world. It is popular in many online forums, at many colleges, and celebrated with reality shows. However it is currently difficult to setup and conduct a scavenger hunt in the connected world of the internet and social media. Old-fashion pen-and-paper is the most common approach with its associated clumsiness. There is no way to update and track initiation of the hunt, progress, and achievements in real-time with in-the-field information collection. This presents a missed opportunity and a market gap which can be exploited.<br />
<br />
<i>The Solution</i><br />
<br />
The era of the tablet computer has arrived, with the iPad and emerging Android platform, offering an unprecedented venue with which to conduct scavenger hunts. The tablet provides a method to link together in-field maps, location updates, and hunt progress with other users and their associated social media networks. Along with this capability comes the potential for revenue from downloads of the application, downloads of add-on packs, and advertising on sites tracking of hunt progress, awards and real-time updates.<br />
<br />
<i>Strategic Plan</i><br />
<br />
We aim to be the premier provider of real-world interaction social tablet games. Our primary revenue generator will at first be the TreasureHunter application, a scavenger-hunt type game for the iPad, later for the Android Tablet platform. It will be linked to friends via social media networks including Twitter and Facebook. We plan to start as a low-overhead one-person startup until after the second year, when with ramping revenue we will hire a full-time developer to pursue ancillary revenue with add-on package downloads and ad-supported web interfaces. We continue after the fourth year with an additional hire to further explore the add-on space and a social-media tie-in, and also to allow the founder to shift focus to consumer marketing and partnerships.<br />
<br />
<i>Project Timeline</i><br />
<br />
In Year 1, we begin initial development of the iPad TreasureHunter application, with initial release within 3 months and full release within 6 months. The product will be exclusively marketed through the iTunes AppStore with no-cost social media, blog, network sites and press releases. Further enhancements are made through the end of Year 1, with research on an Android tablet simultaneously conducted. In Year 2, we begin development of the Android Tablet version of TreasureHunter, with a 6 month target release, while simultaneously supporting the installed base of iPad applications. Towards the end of Year 2 we also begin investigating ancillary revenue through TreasureHunter add-ons such as downloadable hunts and merchandising. In Year 3 with our primary products gaining steam we hire a full-time developer to focus on maintaining the existing applications and exploiting the ancillary revenue space. In Year 4 we begin to transition to growth in marketing our application with bringing on another developer at the end of the year to take over primary application programming from the founder. Year 5 is dedicated to expanding the market particularly around tie-ins to social network gaming.<br />
<br />
<i>Pro-Forma Projections</i><br />
<br />
<table><tbody>
<tr> <th></th> <th>Year 1</th> <th>Year 2</th> <th>Year 3</th> <th>Year 4</th> <th>Year 5</th> </tr>
<tr> <td>AppStore Downloads</td> <td align="right">1,000</td> <td align="right">3,000</td> <td align="right">6,000</td> <td align="right">10,000</td> <td align="right">20,000</td> </tr>
<tr> <td>Android Downloads</td> <td align="right">0</td> <td align="right">1,000</td> <td align="right">4,000</td> <td align="right">10,000</td> <td align="right">20,000</td> </tr>
<tr> <td>Total Downloads</td> <td align="right">1,000</td> <td align="right">4,000</td> <td align="right">10,000</td> <td align="right">20,000</td> <td align="right">40,000</td> </tr>
<tr> <td>Avg Price</td> <td align="right">$3</td> <td align="right">$3</td> <td align="right">$4</td> <td align="right">$4</td> <td align="right">$5</td> </tr>
<tr> <td>Download Revenue</td> <td align="right">$3,000</td> <td align="right">$12,000</td> <td align="right">$40,000</td> <td align="right">$80,000</td> <td align="right">$200,000</td> </tr>
<tr> <td>Ancillary Pct</td> <td align="right">0%</td> <td align="right">10%</td> <td align="right">20%</td> <td align="right">30%</td> <td align="right">40%</td> </tr>
<tr> <td>Ancillary Revenue</td> <td align="right">$0</td> <td align="right">$1,000</td> <td align="right">$8,000</td> <td align="right">$24,000</td> <td align="right">$80,000</td> </tr>
<tr> <td><b>Total Revenue</b></td> <td align="right">$3,000</td> <td align="right">$13,000</td> <td align="right">$48,000</td> <td align="right">$104,000</td> <td align="right"><b>$280,000</b></td> </tr>
<tr> <td>Headcount</td> <td align="right">1</td> <td align="right">1</td> <td align="right">1.5</td> <td align="right">2</td> <td align="right">3</td> </tr>
<tr> <td>Expenses</td> <td align="right">$24,000</td> <td align="right">$24,000</td> <td align="right">$48,000</td> <td align="right">$72,000</td> <td align="right">$120,000</td> </tr>
<tr> <td><b>Net Profit</b></td> <td align="right">-$21,000</td> <td align="right">-$11,000</td> <td align="right">$0</td> <td align="right">$32,000</td> <td align="right">$160,000</td> </tr>
<tr> <td><b>Accumulated Profit</b></td> <td align="right">-$21,000</td> <td align="right">-$32,000</td> <td align="right">-$32,000</td> <td align="right">$0</td> <td align="right">$160,000</td> </tr>
<tr> <td><b>Market Value (4x Sales)</b></td> <td align="right">$12,000</td> <td align="right">$52,000</td> <td align="right">$192,000</td> <td align="right">$416,000</td> <td align="right"><b>$1,120,000</b></td> </tr>
</tbody> </table><br />
<i>Funding Raise</i><br />
<br />
We are seeking $50,000 to fund company operations and working capital until break-even is reached in Year 3. Our financing offer is an 8% per-annum note convertible to 25% of the common shares with payback estimated in Year 5. According to projections, market value at the end of Year 5 of the notes, when converted, would be worth $280,000 which is over a 5x, or 40% annual compounded return.<br />
<br />
<i>Disclaimer</i><br />
<br />
The information contained in this Plan is strictly confidential and is supplied on the understanding that it will be held in confidence and not disclosed to third parties without the prior written consent of TreasureSys. The Plan is to be used solely for the purposes of evaluating TreasureSys' business. Any reproduction of this Plan, in whole or in part, without the prior written consent of TreasureSys is prohibited.<br />
<br />
In addition, this Business Plan contains certain "forward‐looking statements" that may be identified by the use of forward‐looking terminology such as "believes," "expects," "may," "will," "should," "seeks," "approximately," "intends," "plans," "estimates," or "anticipates" or comparable terminology. Although TreasureSys believes the assumptions underlying the forward‐looking statements contained in this Business Plan are reasonable, any one or more of the assumptions could be inaccurate, and therefore, there can be no assurance that the forward‐looking statements will prove to be accurate. The inclusion of such information should not be regarded as a representation that TreasureSys' objectives and plans will be achieved. TreasureSys also shall have no obligation to provide updates of any changes in this Business Plan or the underlying assumptions or projections.<br />
<br />
This Plan, in and of itself, is not an offer to sell or a solicitation to buy any securities of TreasureSys. Any such offer shall be made only through a definitive confidential Private Placement Memorandum, which, among other things, will describe in detail the risks involved in investing in any such securities. This Plan shall not be part of or be deemed to be included in any such offering material, unless such offering material incorporates this Business Plan into such offering material.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0H_-yA4CJmTNWP6MiTWlilTQ0CBA0Jo726FSH1JnaICegVptdaJIReHMn32TkmVT6Ur2o6Y0pYMU1JoEuPMqC9f0CosRydAvuK5Wccjf2LiUZf2VI9T1j7kzGoXJ16nlwFMl5Ld_wwMQa/s1600/All+About+the+Benjamins+%282002%29.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0H_-yA4CJmTNWP6MiTWlilTQ0CBA0Jo726FSH1JnaICegVptdaJIReHMn32TkmVT6Ur2o6Y0pYMU1JoEuPMqC9f0CosRydAvuK5Wccjf2LiUZf2VI9T1j7kzGoXJ16nlwFMl5Ld_wwMQa/s320/All+About+the+Benjamins+%282002%29.jpg" width="259" /></a></div><b>It's All About the Benjamins</b><br />
<br />
Now you should know about what you're going to need for the business. We're talking cash. You need cash, where are you going to get it? You might have savings. You might do some consulting "on the side" to survive. You might go around begging friends, family, colleagues, anyone who will listen. Unless you're rich enough not to need to read this anyway, you're probably going to be raising money in $1,000, $5,000, $10,000, $25,000 chunks, small to the big-time VCs but big for many individuals. And you need to be clear with people what they are getting for their money. Shares? A percentage of revenues? A loan? A convertible note?<br />
<br />
We'll talk more about specifics of raising the money in Part II, namely what the investment looks like financially and what forms you might need in order to actually close the investment round. See ya there!<br />
<br />
<iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=B000QXDED6&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe><iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=B000069DOB&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe><iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=0345428897&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com15tag:blogger.com,1999:blog-1539695110385892935.post-82099845886073751592010-09-22T06:09:00.000-07:002010-09-22T07:39:08.297-07:00Socrates: A Proposal for Purely Symbolic Python<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6bWt63-zRaFym-igO1otvMsUdjl4lCrH1_aqDwro8zLE1nrQhJ7_yjOrsHZzAUmQHxOO2rPR4oip9Ow52tm6WVWBPzgFDozP0JgbFm7b6rdFzu40AxczA6e1SYYcbG8-v6Jt_fe9JhFDU/s1600/monty-python-football-006.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6bWt63-zRaFym-igO1otvMsUdjl4lCrH1_aqDwro8zLE1nrQhJ7_yjOrsHZzAUmQHxOO2rPR4oip9Ow52tm6WVWBPzgFDozP0JgbFm7b6rdFzu40AxczA6e1SYYcbG8-v6Jt_fe9JhFDU/s320/monty-python-football-006.jpg" /></a></div><b>The Game</b><br />
<br />
Inspired by Socrates' <a href="http://www.math.harvard.edu/%7Eknill/mathmovies/swf/monty_python_football.html">winning goal</a> for the <a href="http://en.wikipedia.org/wiki/The_Philosophers%27_Football_Match">Greeks</a>, I've recently been considering a more purely symbolic, mathematical form of python. After having seen the concept of <a href="http://www.chinesepython.org/cgi_bin/cgb.cgi/english/english.html">Chinese Python</a> and playing around with <a href="http://en.wikipedia.org/wiki/APL_%28programming_language%29">APL</a> a little, I had the eureka shot of applying mathematical symbols to the language itself. In homage to the spirit of python, I shall name this symbolic python: <i>Socrates</i>.<br />
<br />
<b>Ground Rules</b><br />
<br />
A principle we have in Socrates for substituting mathematical symbols for python keywords and built-ins is that the correspondence should be one-to-one, non-overlapping, automated, and invertible. By one-to-one we mean there should be one and only one unique Unicode symbol for each keyword. By non-overlapping, we mean that the character should not overlap any current python usage, which means we cannot use any of the existing python character set, it must be separate and distinct from any ASCII characters. By automated, we mean that an automatic conversion routine should exist to convert between existing python keywords and symbolic keywords. By invertible, we mean that we should be able to go from existing keywords to symbolic keywords, and back again, without limitation.<br />
<br />
<b>Why Python?</b><br />
<br />
Python is an ideal source language for Socrates experimentation. It is noted for its clarity and lack of extraneous punctuation and grouping characters, making it ideally suited to symbolic enhancement. It also offers advantages in mapping mathematical equations and algorithms to source code, with native support for complex numbers and its powerful splice notation. In addition, creating a dialect of python using symbols for keywords is much more efficient than creating a language from scratch, taking advantage of the large installed base and library set of the language. This makes python an ideal environment for symbolic expression.<br />
<br />
<b>What This Proposal Is NOT</b><br />
<br />
Socrates is not a request to change the Python language. It is not something that requires altering the Cpython source code. It is simply an overlay, if you will, a filter, a viewport, a different way of seeing the exact same code. A preprocessor or better yet an editor or IDE should be able to take Socrates and transform it into standard python for compilation and evaluation. Likewise, standard python coming into an editor, IDE, or debugger should be substitutable with Socrates automatically. Now this gets a little interesting when you try to involve the exec, compile, eval, and execfile functions, they could be overridden to do the symbol transformation through the inclusion of a import statement for a symbol python module, tricky with exec, or we could just leave that out of scope. Generally Socrates should be completely transparent to the interpreter and all libraries. Only an end-user will ever see any difference.<br />
<br />
<b>Steps for Ease of Programming</b><br />
<br />
Socrates aims to be easy for the programmer. One of the things that plagued APL was the difficulty of directly entering a large character set. With graphical UIs and emacs or vi escape codes it's not especially difficult these days to enter special characters. However, programmers like to type fast when they're in the zone and don't want to have to remember even more special sequences than they already do. Thus I think the easiest way to implement symbol python from an editor perspective is to have active-replace as you type, so you can type standard python and the symbolic characters will automatically be exchanged with standard ones. For instance, you start typing "if" and then after you have a space the if gets magically replaced by "→" character. This can be implemented in many IDEs and is quite readily implemented on the web in javascript as well. Likewise, Socrates can be more readily understood when you can hover with the mouse, or caret in an editor, and the standard python for the symbol pops up after a short interval for easy interpretation.<br />
<br />
<b>Keyword Sources</b><br />
<br />
The official list of python keywords is maintained and defined in the <a href="http://docs.python.org/reference/index.html">Python Reference Document</a>. The keyword list for the python, although version dependent as I'm using 2.6.1, can easily be obtained through the following short program:<br />
<code><br />
import keyword<br />
print keyword.kwlist<br />
</code><br />
<br />
Given this list, we can construct a symbol table mapping from the keywords to equivalent international mathematical symbols. Some mappings are quite obvious, such as logical "and" being the "∧" symbol. Others are more arbitrary and inexact, such as "while" mapping to "∃", which doesn't really mean exactly what the exists symbol does in formal logic. In this regard what we are doing is similar to when mathematical symbols are applied across different fields, whereby a symbol may have a related but not identical meaning. For instance, the symbol for arithmetic mean,"|…|", can also be used for set cardinality.<br />
<br />
For the list of Unicode supported special characters I've used the <a href="http://en.wikipedia.org/wiki/Table_of_mathematical_symbols">Table of Mathematical Symbols</a>, <a href="http://en.wikipedia.org/wiki/Greek_letters_used_in_mathematics,_science,_and_engineering#.CE.9B.CE.BB_.28Lambda.29">Greek Letters in Mathematics</a>, and <a href="http://en.wikipedia.org/wiki/Table_of_logic_symbols">Table of Logic Symbols</a>, all at Wikipedia. This should display properly in your browser if you have Unicode enabled, which most browsers these days have by default.<br />
<br />
<b>Keyword Mapping</b><br />
<br />
Each symbol listed below has standard python, the selected symbol, and rationale for the symbol selection. This list is somewhat arbitrary, and not necessarily the best or ideal mapping, so if you have better recommendations let me know.<br />
<br />
<table border="1"><tbody>
<tr><th>Keyword</th><th>Symbol</th><th>Rationale</th></tr>
<tr><th colspan="3">Expressions</th></tr>
<tr><td>and</td><td>∧</td><td>From formal logic</td></tr>
<tr><td>or</td><td>∨</td><td>From formal logic</td></tr>
<tr><td>not</td><td>¬</td><td>From formal logic</td></tr>
<tr><td>in</td><td>∈</td><td>Element exists in list, as in set theory</td></tr>
<tr><td>not in</td><td>∉</td><td>Negation of exists in</td></tr>
<tr><td>is</td><td>≡</td><td>Congruence, is the same object</td></tr>
<tr><td>is not</td><td>≅</td><td>Not technically correct since this is isomorphism, more a mnemonic</td></tr>
<tr><td>None</td><td>∅</td><td>Empty set</td></tr>
<tr><th colspan="3">Multi-Arg Expressions</th></tr>
<tr><td>if</td><td>→</td><td>As in formal logic, except before the statement as in python syntax</td></tr>
<tr><td>else</td><td>…</td><td>Ellipsis, actually its own character not three dots</td></tr>
<tr><td>lambda</td><td>λ</td><td>Not purely symbolic since it's Greek, but can't really be anything else</td></tr>
<tr><th colspan="3">Simple Statements</th></tr>
<tr><td>assert</td><td>⇔</td><td>Iff: the statement is true if and only iff the following is true</td></tr>
<tr><td>pass</td><td>□</td><td>QED, this expression is done, go to the next one</td></tr>
<tr><td>del</td><td>↓</td><td>Drop the following from memory, from the Webb operator</td></tr>
<tr><td>print</td><td>‣</td><td>Alternative form of QED, the output as result</td></tr>
<tr><td>return</td><td>↑</td><td>Function return, those of you old enough may remember this from Smalltalk</td></tr>
<tr><td>yield</td><td>←</td><td>Return for generator functions</td></tr>
<tr><td>raise</td><td>⇈</td><td>Like function return, but with two arrows</td></tr>
<tr><td>break</td><td>⊗</td><td>Tensor, but looks like a breakpoint</td></tr>
<tr><td>continue</td><td>⊕</td><td>XOR, but we're using it like a graphical breakpoint continuation</td></tr>
<tr><td>import</td><td>∪</td><td>Union the existing module list with this module</td></tr>
<tr><td>as</td><td>≐</td><td>Alternative form of "definition"</td></tr>
<tr><td>from</td><td>⊂</td><td>From a subset of this module, perform an import</td></tr>
<tr><td>global</td><td>Γ</td><td>Include this in the "modular group" of all variable definitions</td></tr>
<tr><td>exec</td><td>Φ</td><td>The work function in physics, similar to the APL execute symbol</td></tr>
<tr><th colspan="3">Compound Statements</th></tr>
<tr><td>elif</td><td>⇒</td><td>Different notation for "if" to distinguish from "→"</td></tr>
<tr><td>while</td><td>∃</td><td>While the true condition exists, execute the body</td></tr>
<tr><td>for</td><td>∀</td><td>For all elements of the list, execute the body</td></tr>
<tr><td>try</td><td>⊤</td><td>Top statement - try this statement first</td></tr>
<tr><td>except</td><td>⊥</td><td>For something raised "independent" of the execution path</td></tr>
<tr><td>finally</td><td>⊧</td><td>Because every evaluation "entails" the following statement</td></tr>
<tr><td>with</td><td>⊢</td><td>"Infer" the following in order to execute the statement below</td></tr>
<tr><td>def</td><td>↦</td><td>Function definition, although this is in front of the function name</td></tr>
<tr><td>class</td><td>∘</td><td>Function composition, although in front, and more a set than a composition of functions</td></tr>
</tbody></table><br />
<b>Examples</b><br />
<br />
Let's see what this looks like in practice for some small sample programs. Here's the Fibonacci sequence in Socrates:<br />
<br />
<pre>↦ fib(n):
→ n == 0:
↑ 0
⇒ n == 1:
↑ 1
…:
↑ fib(n-1) + fib(n-2)
</pre><br />
Let's try this with the 8-Queens problem from the <a href="http://wiki.python.org/moin/SimplePrograms">Python Wiki</a>, which demonstrates many of the symbols in a longer program:<br />
<br />
<pre>BOARD_SIZE = 8
∘ BailOut(Exception):
□
↦ validate(queens):
left = right = col = queens[-1]
∀ r ∈ reversed(queens[:-1]):
left, right = left-1, right+1
→ r ∈ (left, col, right):
⇈ BailOut
↦ add_queen(queens):
∀ i ∈ range(BOARD_SIZE):
test_queens = queens + [i]
⊤:
validate(test_queens)
→ len(test_queens) == BOARD_SIZE:
↑ test_queens
…:
↑ add_queen(test_queens)
⊥ BailOut:
□
⇈ BailOut
queens = add_queen([])
‣ queens
‣ "\n".join(". "*q + "Q " + ". "*(BOARD_SIZE-q-1) ∀ q ∈ queens)
</pre><br />
<b>Where Do We Go From Here?</b><br />
<br />
Mostly I did Socrates as a Gedankenexperiment, a thought exercise for those of you whose German is rusty, just to see what it would look like and to go through the task of considering mathematical modeling of programming functions. Some of the symbols I'm happy with, some I'm not to fond of, it could use some refinement. If anyone would like I can attempt a mapping of the built-in functions next, and maybe a python conversion routine, some editor macros and javascript transformation functions. We'll see if anyone is interested or it's just my own pet eccentricity.<br />
<br />
<iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=B001P6FU4O&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe><br />
<br />
<iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=B00005O7NE&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe><br />
<br />
<iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=3642024742&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe><br />
<br />
<iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=0534128645&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com8tag:blogger.com,1999:blog-1539695110385892935.post-16919468135706135292010-09-19T07:15:00.000-07:002010-09-19T08:37:03.809-07:00Python Startups and the True Entrepreneur<div class="separator" style="clear: both; text-align: center;"><a href="http://assets1.likeme.net/3977/large/true_religion.png.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="http://assets1.likeme.net/3977/large/true_religion.png.jpg" /></a></div><span class="js-singleCommentText jsk-ItemBodyText">I've got lots of friends that are crazy about Python, about what it can do for development especially with AppEngine, Django and All That. True enough, delivering on these promises can happen and lead to new ideas in software that weren't possible before. With this realization comes the thought, then the firm idea, then the inception of a new startup company dedicated to the realization of this concept. I've got loads of friends thinking of making the jump, and a significant number that have, and python is always at least part of the discussion and sometimes the main topic.</span><br />
<span class="js-singleCommentText jsk-ItemBodyText"><br />
</span><br />
<span class="js-singleCommentText jsk-ItemBodyText">It's not just Python - this kind of thing happens all the time with software entrepreneurs. Of course it seldom turns out the way expected. Usually after six months or a year of trudging along, a trail of bitter tears and dashed hopes is all that's left. What sounds fun, exciting, with limitless potential turns into just another way to work for low wages and use up all your savings and the savings of friends, family and investors. Then it's back to the daily grind, perhaps wiser for it and with Awesome New Coding Skillz, but nonetheless back to workin for Da Man.<br />
<br />
But there is such a thing as a True Entrepreneur. I'm not one, but I know them. They are few and far between, and often not astoundingly successful. I'm talking about the people who wouldn't hold down a stable job, even if offered it, not because they don't have skills, but because they can't imagine not running their own business. Someone who would rather work minimum wage, even after years of college and experience, at their own company, rather than make $100 an hour working for The Man. Someone who, far from shying away from the risks of loosing money and skirting close to bankruptcy, actually thrives on this kind of gambling and stress.</span><br />
<br />
<span class="js-singleCommentText jsk-ItemBodyText"></span><br />
<span class="js-singleCommentText jsk-ItemBodyText"></span><br />
<span class="js-singleCommentText jsk-ItemBodyText"></span><br />
<span class="js-singleCommentText jsk-ItemBodyText"></span><br />
<span class="js-singleCommentText jsk-ItemBodyText"></span><br />
<span class="js-singleCommentText jsk-ItemBodyText">In microeconomics, you may have been taught that new businesses have a high rate of failure, between 75-95% depending on the study, which means only 5-25% of startups are still in business after a few years. Your chance of succeeding, not just surviving, is even less, where me might consider a startup successful if it grows to be worth $1 million, $10 million, $100 million, or more. We are further taught that the entrepreneur takes on such high risks because of the high chance of reward. Staying at a major company and becoming a senior executive may get you a high reward, maybe even $500 million, but it will never approach the potential of the most successful entrepreneurs, with multiple billions or even $20 billion at the top level.<br />
</span><br />
<span class="js-singleCommentText jsk-ItemBodyText">However success is not the primary motivator of the entrepreneurs I know; it is a factor, but not even I think the primary factor. The primary factor is self-actualization, which is a combination of ego, arrogance, self-delusion, and wish fullfillment. If you went simply by the odds, it would be a net loss to be an entrepreneur, like a mutual fund that invested in the lottery. Sure, you might win, you might even win big, and if you don't play you won't win, true, but statistically you'd be better off putting your money in Treasury bonds.</span><br />
<br />
<span class="js-singleCommentText jsk-ItemBodyText">A True Entrepreneur doesn't found a company just because they think it might succeed; in fact they would found the company even if they knew it would fail. They simply are compelled to do it, in the way a monk might be compelled to pray in the desert, in the way a Peace Corps volunteer is compelled to help the less fortunate, in the way Guido van Rossum was complled to make life better for developers with Python. It is a basic, primal, ultimately irrational and unexplainable inner drive.</span><br />
<span class="js-singleCommentText jsk-ItemBodyText"> <br />
If you're this kind of person, a True Entrepreneur, you will be compelled into your own business, be it coffee shops or software, and you will over your working life at least be able to eek out a living. If you're not, you either need to get on board with one of these, or stay as an employee or freelancer. I'm not saying this from the perspective of the entrepreneur, "come join the bandwagon with me!". Rather, I'm saying this from the outside, from having been on the inside but spending more time out of it, from choosing more often then not to stay out of the game even when many family and friends are in it. What I'm really trying to do is make sure people understand what they are getting into, so they know if this life - and it is a life - <iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=0887306292&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=000000&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe>is the one for you.</span><br />
<br />
<span class="js-singleCommentText jsk-ItemBodyText"><br />
</span><br />
<div class="jsk-ItemFooter"><div><div class="js-singleCommentCtls"><span class="js-singleCommentDeletable"><a class="js-singleCommentDelete js-singleCommentControl jsk-SecondaryFontColor" href="http://www.blogger.com/post-edit.g?blogID=1539695110385892935&postID=1691946813570613529"></a></span><span class="js-singleCommentEditable" style="display: none;"><span class="jsk-SecondaryFontColor"> – </span><a class="js-singleCommentEdit js-singleCommentControl jsk-SecondaryFontColor" href="http://www.blogger.com/post-edit.g?blogID=1539695110385892935&postID=1691946813570613529">Edit</a></span><span class="js-singleCommentModeratable" style="display: none;"><span class="jsk-SecondaryFontColor"> – </span><a class="js-singleCommentModerate js-singleCommentControl jsk-SecondaryFontColor" href="http://www.blogger.com/post-edit.g?blogID=1539695110385892935&postID=1691946813570613529">Moderate</a></span></div></div></div>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com12tag:blogger.com,1999:blog-1539695110385892935.post-28663086246256348942010-09-17T13:05:00.000-07:002010-09-19T12:33:28.216-07:00Python, Pyjamas, and GWT<div class="separator" style="clear: both; text-align: center;"></div><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1vHWTKWcZbKlsnsdhUGk62WHVxctbwTRqDbyLx3-ekbKDQA1k7BhO-2Zzi5IOB0tbBXZ8AEW2yBJm1QNKNBtKr_s8UWPT9xN9XPNWvV6VCbAzTsgCT0sbAAAEx47QAyd2pJsky7HJ7qf5/s1600/cnsupermanpr1.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1vHWTKWcZbKlsnsdhUGk62WHVxctbwTRqDbyLx3-ekbKDQA1k7BhO-2Zzi5IOB0tbBXZ8AEW2yBJm1QNKNBtKr_s8UWPT9xN9XPNWvV6VCbAzTsgCT0sbAAAEx47QAyd2pJsky7HJ7qf5/s320/cnsupermanpr1.jpg" /></a>I've been looking at some Django apps lately, and the javascript side of the application is messy and a little disguisting. Something like GWT for python, whereby the javascript would all be generated from a unified python source tree. It turns out such a beast is in the works, <a href="http://pyjs.org/">Pyjamas for Python</a>. Once you get used to GWT, it's hard to go back to all that spaghetti multi-lingual JS linkage, JQuery or no.<br />
<br />
From the earliest programming systems it's generally been considered a good idea to have unified UI and debugging with your backend systems. An extreme on this level of integration might be smalltalk. Even where a system would have very well defined interfaces and separation, a unified language for the system was still often preferable, such as with large Ada projects. Proliferating the number of languages in a single project produces many problems, and Javascript is no exception. I'm hopeful that the Pyjamas guys can solve this issue for Python as well as it has been for Java with GWT.<iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=0132356139&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=FFFFFF&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com42tag:blogger.com,1999:blog-1539695110385892935.post-46818717333064326052010-09-17T06:26:00.000-07:002010-09-17T08:44:59.129-07:00Symbology, APL, and Chinese Python<div style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: left;"><img border="0" height="235" src="http://patchesnpins.com/FIRE%20014.jpg" width="320" /></div><br />
In mathematics we make heavy use of symbology. For instance, we use "+" instead of "plus", "∑" instead of "sum", "π" instead of "pi". This aids mathematics in being clear, compact, and universal across all human languages.<br />
<br />
In programming languages, we generally only use English. There are a few exceptions for grouping characters like "()" and "{}", punctuation like ":", and the basic mathematical "+", "-", "/". However the vast majority of the language is in English, such as "if", "else", "for", "while", "import". Python is no exception to this tendency.<br />
<br />
English speakers may at this point simply claim that everyone must learn English to program, and that is that. However this is somewhat unrealistic, given the long and difficult course of study required to understand English well for non-native speakers, and furthermore promotes pushback with the charge of cultural imperalism. Even among those programmers who can get by in English, it is still much easier and more comfortable to use one's native language, as evidenced by the large number of translations available for standard programming works.<br />
<br />
German is a typical example. The language is not so dissimilar from English that with the aid of a dictionary a programming text could not be understood, however it is much slower for all but the most gifted learners than reading a German translation. I've seen a bookstore filled with German programming books of almost any language and system you can imagine. I've seen SAP source code written in ABAP with English control words but all variables and comments in German.<br />
<br />
With Chinese the difference is even more striking. Not only is it even more difficult for a Chinese speaker to understand English with a dictionary than for a German speaker to understand, given the similar language backgrounds, but there is the added difficulty of international character sets, input methods, and the "ASCII-first" mentality. In addition, with China becoming a greater economic and social power, there are over ten times the number of Chinese speakers on the Internet than German, with an increase in demand for understandable code. In fact there is even a python fork where the control words are entirely in Chinese:<br />
<br />
<a href="http://en.wikipedia.org/wiki/ChinesePython">Chinese Python</a><br />
<br />
Here is an extract below, for help in understanding, "回答" means "answer", "<span class="st0">有/没有</span>" means "have / doesn't have", "读入" means "read", "写" means "write", "如" means "if", "不然" means "else", and "否则" means "otherwise":<br />
<br />
<pre class="de1">回答 = 读入<span class="br0">(</span><span class="st0">'你认为中文程式语言有存在价值吗 ? (有/没有)'</span><span class="br0">)</span>
如 回答 == <span class="st0">'有'</span>:
写 <span class="st0">'好吧, 让我们一起努力!'</span>
不然 回答 == <span class="st0">'没有'</span>:
写 <span class="st0">'好吧,中文并没有作为程式语言的价值.'</span>
否则:
写 <span class="st0">'请认真考虑后再回答.'</span></pre><br />
It may be a bit of a struggle, but if you look at it a few times you can see it's a simple program which happens to be asking about the value of Chinese as a programming language. It asks the user for input, and then based on the user answer, has an if block to write three different responses on the output.<br />
<br />
Chinese Python is a particularly good example for English speakers because it shows how different and difficult it is to think and compose programs in a very dissimilar written language. This may help give a perspective for the difficulty faced by Chinese speakers in working with English programs. I had a friend from Hong Kong whose way of dealing with this problem in programming classes was just to lookup the Chinese words for the English in his English programming textbook and write all the Chinese words above the English words which then made it much easier to review and study.<br />
<br />
So on the one hand we have the monolingual approach of only using English keywords and forcing everyone to learn English, on the other hand we have the fragmentation approach whereby every major language does its own fork so it can have control words in its native tongue. Between these two extremes I could see a compromise of multilingual keywords for the same code base. You could have a language-locale header for every source file, as with XML encodings, and then process keywords from that language lookup table for the given source code, the *.py file. In the compiled binary, the *.pyc, you could just store the keyword ID. This would let you have multiple source code language files in the same project, and furthermore let a person view the source and debug a binary in their own language, with the language keywords of that particular user at runtime. So the Chinese Python example above would look, to someone with an English overlay, like:<br />
<br />
<pre class="de1">回答 = raw_input<span class="br0">(</span><span class="st0">'你认为中文程式语言有存在价值吗 ? (有/没有)'</span><span class="br0">)</span>
if 回答 == <span class="st0">'有'</span>:
print <span class="st0">'好吧, 让我们一起努力!'</span>
elif 回答 == <span class="st0">'没有'</span>:
print <span class="st0">'好吧,中文并没有作为程式语言的价值.'</span>
else:
print <span class="st0">'请认真考虑后再回答.'</span></pre><br />
As you can see this is much more understandable to English users. We could have similar overlays for English, Chinese, German, French, Spanish, Italian, Japanese, etc. Right-justified languages such as Arabic or Hebrew may be slightly more difficult, but with an abstracted user display class, could be made to work seamlessly.<br />
<br />
Besides allowing language overlays in Python, it is also possible to adopt universal mathematical symbols as the keyword language. This could be an additional language overlay, but alternatively the default keyword language, understandable across all languages. The language APL took this concept to a new level in the 1960s, even creating new symbols for highly compact matrix operations.<br />
<br />
<a href="http://en.wikipedia.org/wiki/APL_%28programming_language%29">APL</a><br />
<br />
An example APL program below that finds all primes between the interval 1..R:<br />
<br />
<pre>(~R∊R∘.×R)/R←1↓⍳R
</pre>Which, although very inefficient, is rather brilliant and elegant once you study APL. However such dense symbology, especially with custom-made symbols, is difficult to understand and contrary to ease-of-reading which is one of the goals of python.<br />
<br />
Instead, let us apply the same symbology principle but only to python keywords, using existing mathematical symbols that any computer science graduate would be familiar with from formal logic. Furthermore we will use symbols readily available in ASCII so we do not need to use any custom characters.<br />
<br />
Using the substitutions:<br />
<br />
<= raw_input (from unix stdin)<br />
=> print (from unix stdout) <br />
? if<br />
?: elif<br />
:: else <br />
<br />
Substituting this into our earlier example results in:<br />
<br />
<pre class="de1">回答 = <<span class="br0">=(</span><span class="st0">'你认为中文程式语言有存在价值吗 ? (有/没有)'</span><span class="br0">)</span>
? 回答 == <span class="st0">'有':</span>
=> <span class="st0">'好吧, 让我们一起努力!'</span>
?: 回答 == <span class="st0">'没有'</span>:
=> <span class="st0">'好吧,中文并没有作为程式语言的价值.'</span>
:::
=> <span class="st0">'请认真考虑后再回答.'</span></pre><br />
As python evolves, and encounters more of the non-English speaking community, we may expect keywords and built-in functions to become more and more symbolic, improving legibility for a worldwide audience.<br />
<iframe align="right" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=dbco02-20&o=1&p=8&l=bpl&asins=B001EN71CW&fc1=000000&IS2=1&lt1=_blank&m=amazon&lc1=0000FF&bc1=000000&bg1=FFFFFF&f=ifr" style="height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"></iframe>John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com12tag:blogger.com,1999:blog-1539695110385892935.post-70429302305423675212010-09-15T01:28:00.000-07:002010-09-17T08:40:42.495-07:00Google AppEngine and Enterprise Support<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpSe3g59Hw0blCpWELoJpgI5g4NMKyjRGlMTWd0DgWO__toh-QWI2RGY3kQiNgELTYRSLWQqeYUVuY_KR5k1KFd3l1I9dmB1XHXzk7_Z8Nt4JDnKNq91LECTiKBmeLqbdovfTFZ0q8UOVz/s320/postcard_cheyenne_0627.jpg" style="margin-left: auto; margin-right: auto;" /></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The Real NORAD in Cheyenne, Wyoming</td></tr>
</tbody></table><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpSe3g59Hw0blCpWELoJpgI5g4NMKyjRGlMTWd0DgWO__toh-QWI2RGY3kQiNgELTYRSLWQqeYUVuY_KR5k1KFd3l1I9dmB1XHXzk7_Z8Nt4JDnKNq91LECTiKBmeLqbdovfTFZ0q8UOVz/s1600/postcard_cheyenne_0627.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"></a></div><br />
I just finished watching the Google webinar on their Python AppEngine:<br />
<br />
<a href="http://www.youtube.com/watch?v=3Ztr-HhWX1c&feature=related">Google Campfire One Python</a><br />
<br />
It's a good talk and the best part is they spend most of the time showing actual source code and running applications. It always annoys me when developers talk about their programs without showing the source code, or at best showing pseudocode, so it was refreshing to see a talk that was mostly source code.<br />
<br />
The premise of the AppEngine is promising: outsourcing admin work and making the whole deployment process very fast and easy. For small applications and few users it lets you get online with little fuss and no cost. Where I would be a little concerned is when you start talking about large important applications where you need much more than the 500 MB of storage they offer, where you need 24x7 on-call support when the admin console isn't doing what you think it is, all the big hairy problems that a large enterprise application is known for. This support aspect concerns me because that's what I've spent most of my professional life doing: supporting massive enterprise applications. Given that Google is known for their crappy customer service and often non-existent support (as I found when I tried to use Google Earth professionally), I'm worried.<br />
<br />
One way to alleviate some of these concerns would be to have a real customer support department at Google. I'm looking for something like you'd find at Motorola for their public safety radio monitoring, the Big Board room in Schaumburg: I've visited it, or the EDS, now HP, service monitoring room, a NORAD-type facility where you can pay for good monitoring and in-depth support to fix critical issues. It's relatively difficult to build this kind of organization, especially at Google where the culture of customer service is largely absent, so it would probably be easiest for Google to just buy an existing support company or partner with someone like HP that can provide this level of support. It will be quite expensive, but the large organizations demanding it have deep pockets.<br />
<br />
<br />
<br />
<br />
However even this high support level has the downside of being tied to Google administrative access to sensitive company data. If I was trying to sell AppEngine to big corporate clients, this is one of the primary concerns I'd have to address, especially in the software and telecommunications field where Google is a primary competitor to many potential users. Many executives simply won't allow sensitive data to be placed in the hands of a competitor, despite assurances, and there may be strong privacy concerns as well. In some jurisdictions, storing customer mobile call data on Google servers may not even be legal.<br />
<br />
Google could alleviate these concerns by providing infrastructure hardware and software for company-hosted AppEngine instances. This would allow a company to run their own Google AppEngine in their own datacenter. This is not without precedent. Google has taken this approach with search, selling the Google Search Appliance as a stand-alone indexer and search provider for intranets.<br />
<br />
<a href="http://www.google.com/enterprise/search/gsa.html">Google Search Appliance</a><br />
<br />
A Google AppEngine Appliance would provide the same level of customer control and assurances in the AppEngine space that is already found in the search space. Furthermore it would build a revenue stream that Google normally wouldn't have access to, the datacenter budgets of enterprises. Given that dozens or hundreds of AppEngine Appliances might be needed for a large application, or even several large clustered virtual machines hosting thousands of appliances, the potential is huge.<br />
<br />
Let's hope we see a Google AppEngine Appliance in the future. Then we enterprise programmers can start using this great python development tool in our everyday work.John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com1tag:blogger.com,1999:blog-1539695110385892935.post-74543162652649188632010-09-14T05:29:00.000-07:002010-09-17T08:41:08.135-07:00Webapp UI Style<div class="separator" style="clear: both; text-align: center;"><a href="http://rlv.zcache.com/lex_parsimoniae_tshirt-p2351540164598333661r9i_400.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="200" src="http://rlv.zcache.com/lex_parsimoniae_tshirt-p2351540164598333661r9i_400.jpg" width="200" /></a></div>There are two modes of thinking in webapp UI style. One is to make everything as flashy (or Flash-y), elaborate, and slick as possible. For some ad-branding sites this may be appropriate. Another approach is to keep a very clean, minimal, functional interface. The second approach is, as I think the success of Google shows, superior for most uses.<br />
<br />
A great example is a tool I just used, the <a href="http://www.coolutils.com/Online-Mail-Converter.php#">MSG to PDF</a> converter by <a href="http://www.coolutils.com/About.php">CoolUtils</a>. I had a problem in that I was mailed some Outlook MSG files and needed to read them, but I don't have Outlook. It doesn't work with Outlook Express either. The CoolUtils converter let me upload the MSG file and download a PDF, with a very simple form click, and furthermore it was very fast, taking only seconds. And the output was quite nice, with color and proper formatting.<br />
<br />
When writing your python webapps, apply Occam's Razor liberally. Don't make an extra button because you think it might someday be useful to somebody. If the user can't accomplish a simple task with one button-click, you're doing something wrong. Simplify. Map. Reduce. Your users will be happy.John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com0tag:blogger.com,1999:blog-1539695110385892935.post-85824372187332130172010-09-13T06:52:00.000-07:002010-09-17T08:41:27.459-07:00Map, Reduce, and Database Migration<a href="http://www.bbc.co.uk/scotland/education/geog/population/images/migration/migration-map-html-13.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" src="http://www.bbc.co.uk/scotland/education/geog/population/images/migration/migration-map-html-13.gif" style="cursor: pointer; float: left; height: 240px; margin: 0pt 10px 10px 0pt; width: 289px;" /></a><br />
<br />
When I've been involved in database migration in the past, there has always been a big bottleneck in dependencies. Certain data has to be moved in a certain order, a table has to be moved all at once. Even if you drop and re-add constraints, there is significant time in re-enabling the constraints with the associated index construction; and the possibility that a constraint violation may occur and tank your entire migration process.<br />
<br />
With the advent of cloud DBs, we have to start thinking about how migration is different for this different type of database. We can take advantage of the lack of constraints and dependency to allow for highly parallel migrations. We can think about partial migrations, where independent data groups can be moved over time. Consider applying the map-reduce pattern to migrations themselves. Consider incremental migrations that, rather than the big-bang approach we are used to, becomes something more akin to a replication that can be run over time until the new system is stable and the old system can be recycled.John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com2tag:blogger.com,1999:blog-1539695110385892935.post-77577256463639320332010-09-13T06:37:00.001-07:002010-09-17T08:41:59.270-07:00Learning Python<a href="http://www.recessframework.org/files/map-reduce-butterflies.JPG" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" src="http://www.recessframework.org/files/map-reduce-butterflies.JPG" style="cursor: pointer; float: left; height: 257px; margin: 0pt 10px 10px 0pt; width: 192px;" /></a><br />
<br />
So today is day 5 of learning python for me. I've found this excellent tutorial and now I've reached the <a href="http://docs.python.org/tutorial/datastructures.html">datastructures</a> section. It turns out python has filter, map and reduce as top-level functions which will end up being quite useful for cloud DB programming. That has got me thinking about the tie-in between python, cloud db, and the google app engine.John Arley Burnshttp://www.blogger.com/profile/02087488390377053543noreply@blogger.com0