Tuesday, March 31, 2009

Erlang, Scala

What is the problem? Is the problem real?

Desire for functional programming language for distributed systems.

What is the solution's main idea (nugget)?

All of the introductions to the language state early on that Erlang provides many of the features that an OS does such as garbage collection, scheduling and concurrent processes. They are aiming at systems with these properties:
  • Soft real time
  • Millions of lines of code
  • 0% downtime
  • portable (I wonder why?)
  • Lots of parallel processes
  • IPC
  • Clustered environment
  • JIT code loading
  • Rostbustness via 3 error detection primitives

Why is solution different from previous work? (Is technology different?, Is workload different?, Is problem different?)

This is the first programming language I have studied which incorporates so many features of an operating system.

Hard Tradeoffs

  • level of abstraction vs. flexibility

Will it be influential in 10 years?

With millions of lines of code written it and much of the telecom network depending on it, it is already influential.

Friday, March 27, 2009

DTrace, XTrace, Friday


DTrace: Dynamic Instrumentation of Production Systems

DTrace is a tracing tool with the following features:
-entirely dynamic, has zero cost when not turned on
-centralized, one stop shop for all kernel and user level tracing
-built into Solaris, and subsequently other systems.
-guarantees no silent trace data corruption
-arbitrary actions can be used to enable probes
-predicates (filtering) at the source
-virtualized per consumer (so no weird sharing of probes)

->Providers are kernel modules responsible for instrumenting the system. the framework calls them and they call back into the dtrace kernel to set up a probe
->Probes are created and then advertised to consumers, who can enable them , which creates an enabling control block (ECB). if more than one consumer enables a probe, then each one has an ECB associated with it and when the provider fires that probe then the DTrace framework loops through all ECBs for it and runs each one independently (this is where per ECB predicates kick in and provide filtering of which probes will be activated per consumer)
->each ECB has a list of actions for the probe, the actions are run and can store stuff in per CPU mem buffer that each consumer has associated with it. Actions are limited to restrict the amount of overhead and damage they can impose (e.g. can't change registers). An ECB always storest the same amount of data in the consumer's mem buffer. actions can fail because mem buffer is full, i.e. the data is simply droped and buffers drop count is ++.
->Two buffers are kept per consumer per cpu, one active, one inactive. INTERRUPTS ARE DISABLED IN BOTH PROBE PROCESSING AND BUFFER SWITCHING
->Actions and predicates are specified in a simple RISC instruction set, also a C-like language called D

What is the problem? Is the problem real?
Today, performance testing tools are aimed at developers but the perfomnace testing itself is being done more by systems integrators, and tools for this level fo perf. testing are scarse.

What is the solution's main idea (nugget)?
A complex but amazingly well engineered low level tracing tool which can capture system calls, function boundries, kernel synchronization primitives, and many other metrics.

Why is solution different from previous work? (Is technology different?, Is workload different?, Is problem different?)
They adhere to a zero overhead (when not enabled) policy, also this system is implemented more ubiquituously and at a deeper level than any other tracing tracing tool I know of.

Can you write straight assembly code for your actions/predicates? (probably.) Do permissions ever become an issue? It seems like you need to run this as root, are there scenarios that would require not giving someone root but still giving them (potentially restricted) dtrace functionality.


This paper Presents a path based tracing tool which requires the instrumentation of source code to propagate a small constant amount of trace metadata and emit log statements. The log statements can be reconstructed offline into an partial ordering so that the relative timing of many events (e.g. function call or returns), even those events collected from different architectural layers of the system, can be known deterministically.

What is the problem? Is the problem real?
Doing finger pointing and event correlation in a distributed system is difficult.

What is the solution's main idea (nugget)?
We can augment the data execution path to pass around a small fixed amount of metadata which can be used to correlate events in a system distributed physically or even administratively.

Why is solution different from previous work? (Is technology different?, Is workload different?, Is problem different?)
Previous work did not focus on the cross layer aspect of causal tracing.

One criticism I have is that code instrumentation is a heavyweight tool, especially when it comes to low level libraries like RPC libs, and they don't present any work on techniques for automating the pain of this implementation.
They also cite Pinpoint, which they describe as being very similar to X-Trace. In the paper they say "Our work focuses on recovering the task trees associated with multi-layer protocols, rather than the analysis of those recovered paths." The lack of focus on analysis of trace data is actually a weakness, not something which should be used to differentiate the systems. Also, the analysis of the trace data presupposes the feasible collection of it, which makes me wonder if the Pinpoint system simply took the collection as trivial or as less interesting than the analysis

Artemis, Scribe, Chukwa


A metrics and log aggregation tool built on top of Dryad. Mostly a debugging tool for Dryad jobs, but cast as a tool for cluster and cloud services management in general, the developer can choose from thousands of metrics which Microsoft Windows exports for collection and filtering. The idea is to witness the impact of a Dryad job on individual nodes, particularly to spot unexpected behavior. Drawbacks: not open sources, built on MS technologies (e.g. .Net for UI, not web based)


An open source project created at Facebook, this system shines in its simplicity (collect logs) and its effective reuse of the lower level Thrift RPC project. Once logs are collected, they can be queried in an ad-hoc fashion using Hive.


A scalable log, metrics, and generic data collection framework built by Yahoo on top of Hadoop MapReduce and HDFS. The idea is to store everything possible in its raw form and do analysis later. This is in contrast to Artemis which does filtering before collection and stores the results in a much smaller database than Chukwas HDFS archival store.

Azure, Google Apps Security, AWS

Microsoft Azure

This first link takes us to a list of white papers that have been written about the up coming Azure cloud service, the second link points us to the the most helpful of these white papers, called Introducing the Azure Services Platform. The components of the MS cloud software stack are:
  • Windows Azure - Still vague, but will probably be a virtual machine management layer similar to that provided by Amazon's EC2 service.
  • .NET Services - Access control, service management, application integration.
  • SQL Services - Their storage system (like AWS SimpleDB)
  • Live Services -Ways of integrating with windows Live applications (their office SaaS)
The details of these are still quite vague, so it is yet to be seen if they will pull it all together, but at a high level they seem to have the key elements covered. I am curious about which level they intend to provide access control at. I'm guessing it is at the cloud consumer level, similar to the mechanisms AWS uses to keep separate users isolated. Amazon doesn't provide much in the way of a Service Bus but I'm not sure what the difference is between MS's Service Bus and classic SOA.

It seems like they are going to face tough decisions about how much of Azure will be integrating current MS technologies (like .NET) and how much of Azure will be new infrastructure. Also, they are already failing at keeping things simple and easy to understand the way Amazon is.

Google Security

Comprehensive review of security and vulnerability protections

This is a google white paper about the obvious security measures that Google employs to protect the massive amounts of personal data they collect from their users. I was very disappointed that they spent the entire paper telling us that they employ all of the security measures that we would expect from even a novice company worried about getting sued by customers over privacy violations.

Amazons AWS

This White Paper on 'Cloud Architectures' and Best Practices of Amazon's web services begins with a high level description of a web grep application that uses SQS, SimpleDB, EC2, and Hadoop. The second part of the paper contains high level advice for developing cloud applications, including:
  • Use scalable and elastic components (obvious)
  • Use loosely coupled components, use queues (cool)
  • Think parallel (lame)
  • Use elastic components (redundant)
  • Handle failures as the common case (basically Recovery Oriented Computing)
One thing they ignore is the extremely high cost of running a single grep query using their system: 1 query against 10 million documents takes 6 minutes and costs 10 dollars. Compare this to a google search of somewhere between 20 and >60 billion documents which returns in tens of milliseconds and costs 0 dollars.

MapReduce, Dryad, DryadLINQ


Already a classic, this paper introduced the world to MapReduce at scale in the cluster. Today we are looking to the next generation of parallel computation frameworks. It is interesting to look at the problems they solve with an eye towards identifying problems that don't fit this model. They tackle grep, word count (and related url access count), inverted index (and related inverted hyperlink graph), term vector per host (you don't hear all that much about this MR job, I wonder if apache Nutch has implemented this?), and sort.

Dryad and DryadLINQ

Mihai Budiu spoke in class today and presented a wonderful overview of both Dryad and DryadLINQ. Dryad accepts as input an execution flow in the form of an arbitrary DAG, which can be viewed as a series of stages similar to Unix pipes. DryadLINQ is a new programming language approach to expressing parallel computations which run on Dryad. It features a very SQL like syntax and integrates seamlessly into Visual Studio .NET.

Monday, March 2, 2009

Pig latin: a not-so-foreign language for data processing

Summary of Pig latin: a not-so-foreign language for data processing

What is the problem? Is the problem real?
Programmers don't like Declarative languages, but MapReduce is too low level, Pig Latin is something in between.

What is the solution's main idea (nugget)?
Pig latin is a language targeted at experienced programmers to do ad hoc (read only, scan centric) analysis of extremely large data.

Why is solution different from previous work? (Is technology different?, Is workload different?, Is problem different?)
MapReduce was lower level (pig latin compiles into MR) and SQL is too declarative (or so they claim). Hive (which wasn't previous work anyway) aims at SQL compiled into MR, which is distinctly different from Pig latin.

I like that their system doesn't need to import that data using a loading stage like other database systems do, instead you give it as input a function specifying how to get tuples out of the input file. In particular, as they point out, this allows for easier interoperability with the many other big data manipulation tools in use at somewhere like Yahoo.

Hard Tradeoffs
  • Declarative vs performance/control

Will it be influential in 10 years?
If by some merciful act of god Yahoo! doesn't implode, yes probably.

It was bold of them to state that programmers prefer to analyze data by writing "imperitve scripts or code" as opposed to using a declarative language. They sort of shrug off automated query optimization, which constitutes an extremely large body of research, as insufficient.