Tuesday, November 25, 2008


Sweet! A crap ton of feedback on our work and ideas for future work all for free! What a good idea to have the class read this. I agree entirely with Yanpei's observations about our work, that we should be looking for the general problems in MapReduce, and even more generally in distributed systems.

In the spirit of Matei's self criticism, you might find our summaries of the feedback we got from the OSDI reviewers interesting.

I would argue, however, that by looking at what Hadoop got wrong, we learn more than "Hadoop developers are bad coders" because they are not. Rather we learn what are the non-trivial parts of a disturbed system. In the mapreduce paradigm, backup tasks are SUPER important. So regardless of whether Hadoop had done a crappy job on them in the first place, research into the complexities of scheduling in real world distributed systems--and especially principled reasoning about lower bounds on running time of solutions to the problems we face--is an area of research that offers a delightful blend of theory and systems building (engineering?).

I'll take this work over Kurtis's science any day.

Some thoughts from 1) my head and 2) meetings with Matei and Anthony earlier today:
-how does the minimum makespan scheduling algorithm change when you introduce stragglers (multiple copies of some tasks)
-we need to build a model of the distribution of the jobs and also of the speeds of the nodes
-how does being able to bring up an extra node (i.e. EC2) effect the model?
-low hanging fruit: fix the dumb 1/3 1/3 1/3 task progress reporting problem in hadoop.

-rework the scheduler to use chukwa data to get a huge improvement in performance. what is the simplest way to use this data to be smarter?
-we would want to keep an extra field in each "task tracker" object on the jobtracker that just queries the task trackers about
-we would also want to keep track of a list of features which each has a number assigned with it that represents the average effect of that feature on task response time.
-are there papers on simple models for using feedback like this in scheduling?

-get chukwa running on all nodes of the R cluster
-get chukwa running on EC2
-get chukwa collecting xtrace stuff from hadoop

More middlebox work (policy aware layer 2)

I saw Dilip give a talk about Policy aware switching at syslunch last year, and have liked the idea ever since.

The premise is: Given that we middleboxes are here to stay and account for a huge percentage of network errors because they are so complex to configure, we should use a mechanism that doesn't require middleboxes to be placed on the physical path between end points. PLayer allows for policy to be specified at a centralized location using a standardized API.

I know that we just end up repeating ourselves with each new idea that we address in this class, but on the Yanpei scale, this one seems to be in the earlier stages of progress towards practical or popular adoption, which I guess is good for the authors since it leaves room for more papers as progress is made!

I loved that they included a section on formal analysis, and what i feared would be dense math that I would want to skip over turned out to be simple enough to make sense and add to the paper.

Random thought

I am also taking CS270 Algorithms this semester and think that exposure to nearly all of the theoretical algorithms we are studying would be much more effective if they were presented in the context of the systems that they are most useful in. This is why I love these networking papers so much, in spite of my/our ever sounding complaint that these aren't practital enough for Cisco to pick up.

Tuesday, November 18, 2008

DOT - An Architecture for Internet Data Transfer

The idea here is to have a service for data transfer over the Internet. They propose that a clean interface for data arbitrary data transfer which separates the mechanism for content negotiation from data transfer.

My objection is that such a unified service introduces a new single point of failure into the network.

Not to mention the standard criticism that this is a pie in the sky, with not practical, implementable solution that could be used to improve the data transfer mechanisms in use today.

Thursday, November 13, 2008


X-Trace and Andy, sitting in a tree...

Yeah, if I could marry X-Trace, I probably would.

Matei and I worked on X-Tracing Hadoop last year. See the project page here: http://radlab.cs.berkeley.edu/wiki/Projects/X-Trace_on_Hadoop

And right now, I am working with some other X-Trace folks on a CS270 project involving X-Trace graph processing that some of y'all might find interesting. The very earliest version of our project proposal (which will hopefully turn into a full blown project report over the next four weeks) is available for a quick read here: http://docs.google.com/Doc?id=dswvrwj_125hj6fh2dn

Thursday, November 6, 2008


Hey, I've used the i3 cluster, cool!

The idea here is that the original point to point design of the Internet is not conducive to certain applications such as multicast, anycast, and mobile hosts. The solution is a generic overlay network on which these networking models can easily be built. This avoids the problem that application level solutions encounter which is construction of redundant underlying mechanisms between multiple applications.

i3 works by presenting a service which allows sources to send to a sort of virtual host (i.e. a logical identifier), and then receivers can pull from that identifier.

This paper provides a very thorough look at the idea they are presenting. I really liked that the paper was so broad in its analysis of the idea, implementing several non-trivial applications (such as a mobile host solution and scalable multicast), discussing scalability, security, efficiency, deployment, etc. I also like that it built on another interesting Berkeley research project (Chord).

Wednesday, November 5, 2008

Delegation Oriented Architecture

Middle boxes, though scorned by Internet architect purists, are here to stay. Incrementally adapting the architecture of the Internet to play better with middle boxes can lead to improvements in performance, and more flexibility in

They propose two properties:
1) packets contain the destination, which is represented by a unique identifier (EID)
2) hosts can delegate intermediaries. I.e. middle boxes through which all traffic to the host must flow.

How it works
EIDs are resolved to IP addresses or to other EIDs. This is done by a resolution infrastructure, for which distributed has tables would work well.

Thoughts on the motivation
I am skeptical of their claim that Internet architect purists typically react to middleboxes with "scorn (because they violate important architectural principles) and dismay (because these violations make the Internet less flexible)."

They present two tenants of the architecture of the Internet which middleboxes violate.
1) all hosts should have a unique identifier (NAT obviously violates this).
2) only the host which was listed as the destination should inspect the packet's higher layer fields

However, I don't think that NAT is as bad as they claim. They argue that NAT "has hindered or halted the spread of newer protocols [such as one I have never heard of, SIP] ... and peer-to-peer systems." To me, these are not compelling reasons to bide by tenant 1, neither is the argument that Internet architects think that tenant 1 is elegant or pure.

In section 2.2. they present some more reasons why NAT is a pain. They claim that it is hard to set up servers behind a NAT. From my understanding though, NAT is a policy decision which is usually used to protect the user hosts of an organization, not its servers. Servers can, should, and are currently treated specially, often living outside of the NAT walls. I don't agree with the authors that this is a weakness of NAT.

Thoughts on the potential adoption of DOA:
From the paper: "DOA requires no changes to IP or IP routers but does require changes to host and intermediary software." -- Sadly, any modification to the Internet that requires a significant change to host software (except perhaps IP6 which is going to be do or die soon) will probably never leave the ground. Not only do they propose a new network stack on the end hosts, applications would also need to be ported to use the new interface (which they present as an extension to the Berkeley sockets interface). What is more, DOA also would require the creation of an entirely new resolution system for the EID->IP/EID mappings.

Finally, they themselves admit in 3.4 that they are not offering any new functionality, but rather a new architecture. While interesting from a research perspective, to me this admission symbolizes the reason DOA is not a practical reality.

Monday, November 3, 2008

DNS Caching

This paper used a combination of simulation and analysis of real world traces for a study of the use of DNS and the effects of caching on it.

They discovered that DNS lookups follow a ziphian distribution and that negative caching is useful.

Development of the DNS

I have set up BIND and MS DNS servers before, but never thought about DNS from a research perspective. Also, before reading this paper, I didn't realize that the HOSTS.TXT file was the primary address resolution mechanism for the Internet before DNS came around.

I thought that the paper was written well, focusing on the parts of the problem that were interesting from a [distributed] systems perspective, and thus has that sort of timeless feeling that so many of the classic systems paper seem to have, as valuable for its historical perspective as it is for the systems principles it presents.

The paper covers the basics of how DNS works, some of the design trade offs the engineers faced and finishes with three discussions, surprises, what worked, what didn't work. I very much enjoyed this layout.

Some other thoughts/nitpicks:
  • It is interesting that "provide tolerable performance" came last on the list of design assumptions behind DNS.
  • I thought the negative caching surprise was very cool, and it had never occurred to me that DNS used such a mechanism.
  • Their discussion of using datagrams (UDP?) and not TCP was confusing. On the one hand they see much better performance, but on the other hand they realize that they are reinventing much of TCP.
  • in 5.4, they say that when root servers return the name of a host they also pass the ip address of that host for free. which sort of queries does this apply to? wouldn't it be a name resolution query, in which case the ip address of the host was passed as input? I know I am simply missing something here.
  • Interesting point in 5.5 about how Administrators might choose short TTLs in a misguided move to get their changes (which are rare) to take effect rapidly.