Monday, September 13, 2010

Map, Reduce, and Database Migration



When I've been involved in database migrations in the past, there has always been a big bottleneck in dependencies. Certain data has to be moved in a certain order, and a table has to be moved all at once. Even if you drop and re-add constraints, re-enabling them takes significant time because of the associated index construction, and there is always the possibility that a constraint violation will occur and tank your entire migration process.

With the advent of cloud DBs, we have to start thinking about how migration differs for this new type of database. We can take advantage of the lack of constraints and dependencies to allow for highly parallel migrations. We can think about partial migrations, where independent groups of data are moved over time. Consider applying the map-reduce pattern to migrations themselves. Consider incremental migrations that, rather than the big-bang approach we are used to, become something more akin to replication that can be run over time until the new system is stable and the old system can be recycled.
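To make the map-reduce and incremental ideas a little more concrete, here is a minimal Python sketch. It assumes each row carries a hypothetical `group` key marking an independent data group and a hypothetical, monotonically increasing `version` number that can serve as a watermark; plain in-memory lists and dicts stand in for the real source and target stores. The map step buckets rows by group, each group is converted in parallel with no ordering between groups, and the reduce step merges the results. Calling `migrate_pass` repeatedly with the returned watermark gives the replication-style, run-until-stable behavior rather than a single big-bang copy.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_to_groups(rows, key="group"):
    """Map step: bucket rows by the (hypothetical) key marking an independent data group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def migrate_group(group_id, rows, transform):
    """Convert one group's rows; this needs nothing from any other group."""
    return {(group_id, row["id"]): transform(row) for row in rows}


def migrate_pass(source_rows, target, transform, since=0, workers=4):
    """One incremental pass: move only rows whose version is newer than `since`.

    Returns (rows_written, new_watermark) so the next pass can start
    where this one stopped.
    """
    changed = [r for r in source_rows if r["version"] > since]
    groups = map_to_groups(changed)
    # Groups are independent, so they can be converted in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda item: migrate_group(item[0], item[1], transform),
            groups.items(),
        ))
    # Reduce step: merge per-group output into the target and total the counts.
    written = 0
    for converted in results:
        target.update(converted)  # a later pass overwrites older copies of a row
        written += len(converted)
    watermark = max((r["version"] for r in changed), default=since)
    return written, watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "group": "acme", "version": 1, "name": "a"},
        {"id": 2, "group": "acme", "version": 2, "name": "b"},
        {"id": 1, "group": "globex", "version": 3, "name": "c"},
    ]
    target = {}
    upper = lambda r: {**r, "name": r["name"].upper()}
    print(migrate_pass(source, target, upper))  # (3, 3)
    source.append({"id": 3, "group": "globex", "version": 4, "name": "d"})
    print(migrate_pass(source, target, upper, since=3))  # (1, 4)
```

Grouping by an independent key is what removes the cross-table ordering problem; a real system would also need a durable place to store the watermark and a way to handle deletes, which this sketch leaves out.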
