Tuesday, May 8, 2012

Experiments in Parallelism: The Early Days

When I first started programming, life was simple.  A computer had a single processor that fetched an instruction from RAM, interpreted it and then executed it.  Memory was fast, disks were slow, and if you really needed some extra-fast storage, you had three 8-bit registers to keep your working data in.  There was no such thing as multi-threading or distributed computing - at least, not that I had access to.

Even back then, I had a fascination with massive distributed parallelism, despite the fact that my Apple IIe would never get me there.

It wasn't until my fourth year of university that I had the chance to use a real parallel computer for my honors thesis project. I was able to use a handful of Transputers - processors with four high-speed serial interconnects that could be joined together like carbon atoms.  I built a cube-shaped topology and used it to do an A* best-first search to solve the knapsack problem.  Back then, I used a language called Occam, which was based on Hoare's Communicating Sequential Processes [1].  I discovered two interesting things as part of this project.

Discovery #1 - The two-colored algorithm for distributed termination detection

The first thing I discovered was that if you have no global state, it is really hard to determine when you are done.  Whenever one of the processors finished all of its work, it would ask its neighbors for more to do.  That way, all the processors would stay busy until all the work was done.

The problem was that if you passed around a "are you done?" message asking each processor if it was done, each might answer "yes", but between the time that processor N answered "yes" and the time that all the remaining processors answered "yes", processor N might have found work to do from processor M.  Meanwhile, processor M might finish its own work before the done request arrives and answer "yes, I'm done" as well.  In other words, the remaining work might play hide-and-seek with the "are you done?" messages.

The solution to this problem was something I found in the literature [2].  Basically, you form a Hamiltonian cycle out of the processors (that is, a circular path that goes through each processor exactly once).

Then you give a yellow token to one of the nodes and whenever it runs out of work, it passes the yellow token to the next node in the Hamiltonian circuit.  Eventually, the yellow token arrives back at the original node.

If the original node has been idle the whole time, it converts the token to a red token and passes it around again.  If the red token arrives back at the original node and it has still been idle the whole time, then processing is done.

On the other hand, if a node receives a token (either red or yellow) and it hasn't been idle, it converts the token back to a yellow token, and the process repeats.
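The scheme above can be sketched as a sequential simulation.  This is my own illustrative reconstruction, not the original Occam code: the ring stands in for the Hamiltonian cycle, the worker counts and probabilities are arbitrary, and the function name is made up.  Each node remembers whether it has done any work since the token last visited it, and the token only moves on when its current holder is idle.

```python
import random

def run_until_terminated(num_nodes=8, initial_work=50, seed=1):
    """Simulate the yellow/red token scheme on a ring of worker nodes.

    work[i] is the number of pending tasks at node i.  Each step, a busy
    node consumes one task and sometimes sheds another to a neighbour
    (modelling idle processors asking their neighbours for work)."""
    rng = random.Random(seed)
    work = [initial_work] + [0] * (num_nodes - 1)
    busy_since_visit = [False] * num_nodes
    token_at, color = 0, "yellow"
    steps = 0
    while True:
        steps += 1
        # Work phase: every busy node consumes one task, and may pass
        # one of its remaining tasks to a random ring neighbour.
        for i in range(num_nodes):
            if work[i] > 0:
                work[i] -= 1
                busy_since_visit[i] = True
                if work[i] > 0 and rng.random() < 0.5:
                    j = (i + rng.choice([-1, 1])) % num_nodes
                    work[i] -= 1
                    work[j] += 1
        # Token phase: the token waits at a busy node, otherwise moves on.
        if work[token_at] == 0:
            if token_at == 0:                 # token is back at the origin
                if busy_since_visit[0]:
                    color = "yellow"          # origin was busy: start over
                elif color == "yellow":
                    color = "red"             # one clean lap: try to confirm
                else:
                    return steps, sum(work)   # red lap also clean: done
            elif busy_since_visit[token_at]:
                color = "yellow"              # work hid here since last visit
            busy_since_visit[token_at] = False
            token_at = (token_at + 1) % num_nodes
```

Running it returns the remaining work along with the step count, so you can verify the token never declares termination while tasks are still hiding somewhere on the ring.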

Discovery #2 - Super-linear speedup

The second thing I discovered was something that was hard to explain at first.  I had expected that as the number of processors increased, the efficiency of each processor would get worse and worse.  In other words, eight processors might perform only 6x as fast as one processor.  In actual fact, I observed the opposite: eight processors performed >8x as fast as one processor.  This seemed to defy the laws of nature.

As it turns out, my sub-par choice of data structures [3], plus the non-deterministic jitter added by parallelism, led to a situation where the parallel algorithm was superior to the sequential algorithm.  In fact, I was able to show the same super-linear speedup by running eight processes on a single processor.  I was later able to confirm that other researchers had independently observed the same phenomenon [4].
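The open list in an A* best-first search is a priority queue, which is where the data-structure choice in footnote [3] bites.  Here is a minimal sketch (my own, with a made-up function name) of the heap-based approach, using Python's heapq to get O(log n) insertion and O(log n) removal of the best candidate:

```python
import heapq

def best_first_pop_order(items):
    """Maintain an A*-style open list as a binary heap.

    heapq keeps the smallest entry at index 0, so pushing (cost, label)
    tuples and popping always yields the cheapest candidate first."""
    open_list = []
    for cost, label in items:
        heapq.heappush(open_list, (cost, label))   # O(log n) insert
    order = []
    while open_list:
        cost, label = heapq.heappop(open_list)     # O(log n) pop-min
        order.append(label)
    return order
```

An unbalanced binary search tree, by contrast, can degrade to O(n) per operation, which is one way a "bad" sequential algorithm leaves room for an apparently super-linear parallel one.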

Conclusion

Parallelism leads to unexpected new problems and phenomena.  Modern tools and libraries have changed things a lot since my early days (more on this in a future post), but it is still surprising what you will discover.

---
[1] Interesting fact: the Go language uses the same paradigm. (http://golang.org/)
[2] Oliver Vornberger, "Transputer networks for operations research problems", Journal of Microcomputer Applications, Volume 13, Issue 1, January 1990, Pages 69-79, ISSN 0745-7138, doi:10.1016/0745-7138(90)90007-T. (http://www.sciencedirect.com/science/article/pii/074571389090007T)
[3] Tip of the day: a heap is better than a binary tree for implementing a priority queue.
[4]  A search for "super-linear speedup" will give you lots of examples.

Tuesday, April 17, 2012

Google adopts OpenFlow in a big way

Google recently disclosed that they have rolled out OpenFlow for all of their internal networking.

While I have long believed that Software Defined Networking (SDN) is the future, this is the first real proof I have seen that it is commercially viable at scale.

I believe SDN is going to revolutionize the networking industry, allowing us to leave behind all those "boat anchor" legacy concepts like Spanning Tree, and further blurring the lines between layer 2 and layer 3 in the data center.

Here is the original article:
http://www.wired.com/wiredenterprise/2012/04/going-with-the-flow-google/all/1

Monday, May 30, 2011

First Commercial Quantum Computer

This past week, D-Wave Systems announced the sale of the first commercial quantum computer to Lockheed Martin.

Although quantum computers are still in their infancy (we're talking ENIAC-like infancy here), their potential is astounding.  The most interesting thing is that certain problems that are hard for conventional computers should be very easy for quantum computers.  This could have some really unexpected results, such as defeating almost all of the cryptographic techniques we normally use to protect our information today.

I don't know if what D-Wave has is the beginning of something significant or not.  However, for a mere $10M, you could buy one and find out.

Tuesday, May 10, 2011

OpenStack

A couple of weeks ago, I went to the OpenStack Design Summit in Santa Clara.  It was an amazing event with hundreds of people (most of them developers) and significant representation from major industry players like Citrix, Cisco and Dell.

It is hard to believe that it was only last summer that Rackspace and NASA announced the project.  Since then, the project has had three releases.

The project has several major components, including a compute fabric called "Nova" and an S3-like blob store called "Swift".

By most accounts, Swift is ready for prime time (in fact, CloudScaling was involved in several large deployments).

Nova, on the other hand, is still an early-stage product.  Nevertheless, several major players have announced their intention to have product deployments in the near future, including Internap who plans to do so later this year.

All this activity has attracted significant investment, and most of the effort seems to be focused on: block storage as a service (think EBS, sort of), network as a service (check out Nicira), and higher level management and integration functions (basically all those "little" things that turn a technology into a real-world service).

One thing is for sure, this is a space to watch carefully.

DISCLAIMER: The opinions expressed here are my own and do not necessarily represent those of current or past employers.

Saturday, April 30, 2011

IPv6 on Comcast

I recently switched back to Comcast, and given that they have been talking up their IPv6 initiatives a lot, I decided to see how easy it was to get IPv6 connectivity. As it turns out, it was almost trivial.

While Comcast is planning to have dual-stack (both IPv6 and IPv4) support soon, they currently only support 6to4 in my area. The good news is that they support automatic 6to4 tunnel configuration.

Here is how I got it working with my Mac.
  1. I unplugged my router and connected my Mac directly to the cable modem (6to4 tunneling does not work behind a NAT gateway -- well sort of, but I'll save that for a future blog when I get it working).
  2. I restarted the cable modem (if you don't do this, it will stay bound to your router's MAC address and you won't get any connectivity).
  3. I waited for things to settle down (the cable modem has a bad habit of temporarily handing out a 192.168.100.x address, but unplugging the ethernet cable from your computer and/or renewing your DHCP lease should fix it).
  4. You should now have regular IPv4 connectivity.
  5. Now the cool part. In the System Preferences network settings, I clicked the "+" to add a new network, and chose "6 to 4" for the interface type.
  6. I then opened a browser to ipv6.google.com and voila, it worked.
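One handy sanity check for step 5: 6to4 (RFC 3056) derives your IPv6 prefix mechanically from your public IPv4 address, by embedding the 32-bit address into a 2002::/48 prefix.  This little sketch (function name is mine) computes the prefix you should expect to see on the new interface:

```python
import ipaddress

def six_to_four_prefix(public_ipv4):
    """Return the 2002::/48 prefix that 6to4 derives from an IPv4 address.

    RFC 3056 places the 32-bit IPv4 address in bits 16..47 of the IPv6
    prefix, so every public IPv4 address maps to its own /48."""
    v4 = int(ipaddress.IPv4Address(public_ipv4))
    hi, lo = v4 >> 16, v4 & 0xFFFF        # split into two 16-bit groups
    return f"2002:{hi:x}:{lo:x}::/48"
```

For example, the documentation address 192.0.2.1 maps to 2002:c000:201::/48.  If the prefix on your 6to4 interface doesn't match your WAN address this way, you are probably still behind NAT.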
Of course, this was a temporary solution because I had to unplug my router. However, I've heard that several people have had success with the Apple Airport Extreme and several of the other routers listed on the 6to4 Wikipedia page.

I plan to do some more experiments with DD-WRT or Linux as the router. Stay tuned.

Saturday, April 16, 2011

Welcome

Welcome to my blog. I am interested in a broad range of technology and thought it would be worthwhile to share a few of my discoveries.