Wednesday, February 25, 2009

PNUTS, SCADS, Hive

Notes on Tyson's presentation about PNUTS:
  • complaint: not enough control over replication
  • predicate to system, apply to the system before
  • hash or sorted tables
  • no foreign key constraints (not normalized?)
  • selection predicates, and projection from tables
  • no joins
  • per-record timeline consistency: all replicas apply updates to a record in the same order. All updates go to a master replica, which moves around for performance (see the sketch after this list)
  • Tyson likes that they took just the right pieces from the DB world to support their access pattern (reading one record at a time)
  • no fine control over replication factor
  • rethink isolation?
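
To make the per-record timeline idea concrete, here is a toy sketch (my own, not PNUTS code; all names are made up): every write for a record funnels through that record's current master, which stamps it with a monotonically increasing version, and replicas only apply updates in version order.

class RecordMaster:
    """All writes for one record go through its master, which orders them."""
    def __init__(self):
        self.version = 0
        self.value = None
    def write(self, value):
        self.version += 1
        self.value = value
        return self.version, value          # shipped to replicas

class Replica:
    """Applies updates strictly in version order, buffering early arrivals."""
    def __init__(self):
        self.applied = 0
        self.value = None
        self.pending = {}                   # version -> value
    def receive(self, version, value):
        self.pending[version] = value
        while self.applied + 1 in self.pending:
            self.applied += 1
            self.value = self.pending.pop(self.applied)

master, replica = RecordMaster(), Replica()
u1 = master.write("alice@old.example")
u2 = master.write("alice@new.example")
replica.receive(*u2)                        # arrives out of order, buffered
replica.receive(*u1)                        # now both apply, in timeline order
assert replica.value == "alice@new.example"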

Notes on Michael's SCADS talk:
  • Scale independence (an idea borrowed from the DB literature's notion of data independence)
    • No changes to app as you scale
    • latency stays constant
    • cost per user stays constant
  • auto scale up/down
  • declarative performance consistency
  • perf.-safe query language (cardinality constraints; see the sketch after this list)
  • give developers axes that they understand:
    • performance SLA
    • read consistency (freshness)
    • write consistency
    • durability SLA
    • Session guarantees (monotonic reads, read your own writes)
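
To pin down what a cardinality constraint might look like, here is a speculative sketch (my illustration only, not the SCADS API): a query must declare an upper bound on the rows it can touch, and the store refuses anything that would exceed it, which is how latency can stay flat as data grows.

class CardinalityError(Exception):
    pass

def bounded_range_query(store, start_key, end_key, limit):
    """Scan a sorted key range, but refuse to return more than `limit` rows."""
    rows = []
    for key in sorted(store):
        if start_key <= key <= end_key:
            rows.append((key, store[key]))
            if len(rows) > limit:
                raise CardinalityError(
                    "query exceeds its declared cardinality bound of %d" % limit)
    return rows

store = {"user:%04d" % i: {"name": "u%d" % i} for i in range(100)}
print(bounded_range_query(store, "user:0000", "user:0009", limit=10))
# bounded_range_query(store, "user:0000", "user:0099", limit=10)  # would raise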

Thursday, February 19, 2009

Dynamo

What is the problem? Is the problem real?
Highly available, partition-tolerant, high-performance key-value store.

What is the solution's main idea (nugget)?
Use various well-known distributed systems techniques to build a flexible storage solution.
-consistent hashing (sounds like Chord; see the sketch below)
-object versioning (sounds like BigTable)
-"sloppy quorum" (from talk) for consistency
-"decentralized replica synchronization protocol"

Why is solution different from previous work? (Is technology different?, Is workload different?, Is problem different?)
Their demands are slightly different from those of the related work (e.g. Chord), much of which they build on. They do it at scale, it really works, and it is in production!

Questions/criticisms
How is this different from S3? It provides more control over the availability/performance tradeoff.

Thursday, February 12, 2009

Google's BigTable

Summary
I'm going to be repeating myself during my summary. I love reading these papers. As I learn more about systems, particularly distributed systems, these papers get even more beautiful.

Sixty Google products use BigTable: Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth.

They trade off the full relational model for simplicity and dynamic control, plus some transparency into data locality through schemas (column groups?)

Arbitrary string indexes (for column and row names).

Data stored as uninterpreted string blobs.

Choice of storing in memory or on disk

Data model:
-----------------
(row:string, column:string, time:int64) → string
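
A toy rendering of that data model (my sketch, not Google code): a sparse map from (row, column, timestamp) to an uninterpreted string, where a read returns the newest version of a cell.

table = {}   # (row, column, timestamp) -> uninterpreted string value

def put(row, column, value, timestamp):
    table[(row, column, timestamp)] = value

def get_latest(row, column):
    """Return the newest version of a cell, or None if it was never written."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None

put("com.cnn.www", "contents:", "<html>...v1", timestamp=1)
put("com.cnn.www", "contents:", "<html>...v2", timestamp=2)
print(get_latest("com.cnn.www", "contents:"))   # -> "<html>...v2"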

Rows
-Table is sorted by row key
-Every read or write of data under a single row key is atomic
-Rows are partitioned into tablets - the mechanism for getting locality, so pick good row names (e.g. URLs with hostname components reversed; see the example below)
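
A quick example of that row-naming trick (my own snippet): reversing the hostname components makes pages from the same domain adjacent in the sorted row-key space, so they end up in the same or neighboring tablets.

def rowkey_for_url(host, path):
    """Reverse the hostname so pages from one domain sort next to each other."""
    return ".".join(reversed(host.split("."))) + path

urls = [("maps.google.com", "/index.html"),
        ("www.google.com", "/index.html"),
        ("www.cnn.com", "/index.html")]
for key in sorted(rowkey_for_url(h, p) for h, p in urls):
    print(key)
# com.cnn.www/index.html
# com.google.maps/index.html
# com.google.www/index.html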

Column families
-access control
-number of col. families should be small
-all cols in same family compressed together
-accounting

Timestamps
-auto or manual
-garbage collection based on col. family rules

Other infrastructure
-GFS
-"cluster management system"
-SSTable - file format for GFS - smart structure - minimize disk accesses
-Chubby - distributed lock service
-Paxos, for master election

Architecture:
==========
Client library

Master:
-track tablet servers

Tablet servers:
-tablets are stored in GFS (which handles replication), organized and served up by tablet servers
-commit log with redo records, for redo logging
-METADATA table, composed of the root tablet and other metadata tablets
-root tablet location stored in Chubby (see the lookup sketch below)
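
My rough sketch of how a client finds a tablet through that hierarchy (the names and data structures here are invented for illustration, not the real client library): Chubby points at the root tablet, the root tablet points at METADATA tablets, METADATA rows point at user tablets, and the client caches everything it learns.

chubby = {"root_tablet_location": "ts-7"}              # Chubby file

tablets = {                                            # what each tablet server holds
    "ts-7": {"METADATA:webtable": "ts-3"},             # the root tablet
    "ts-3": {"webtable:com.google": "ts-9"},           # a METADATA tablet
}

location_cache = {}                                    # client-library cache

def locate_tablet(table, row_key):
    if (table, row_key) in location_cache:
        return location_cache[(table, row_key)]        # common case: no lookups at all
    root_server = chubby["root_tablet_location"]       # 1. ask Chubby
    meta_server = tablets[root_server]["METADATA:" + table]      # 2. read root tablet
    user_server = next(iter(tablets[meta_server].values()))      # 3. read METADATA tablet
    location_cache[(table, row_key)] = user_server
    return user_server

print(locate_tablet("webtable", "com.cnn.www"))        # -> "ts-9"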

Apps:
Examples of applications that are using it.
800TB for a web crawl

What is the problem? Is the problem real?
Highly scalable data center storage.

What is the solution's main idea (nugget)?
wide applicability, scalability, high performance, and high availability

Why is solution different from previous work? (Is technology different?, Is workload different?, Is problem different?)
SCALE. Database people did this in the 70s, just ask them, but not at this scale: 500 tablet servers, and no doubt petabytes by now.

Not a beautiful new idea, but a beautiful application of old ideas, i.e. engineering: a bulletproof system tackling previously untouched problems and scales.

high level lessons:
--------------------------
-read the lessons section
-scale is hard. A bit flip in a layer-3 router at Yahoo caused a hard-to-debug software-level bug. Solution: build simple systems that do one thing well, and then glue them together.
-simplicity
-make files (SSTables) immutable and garbage collect them (see the sketch after this list)
-related work: they say other systems (e.g. Chord) took on too many requirements, such as churn, heterogeneity, and trust
-distribute everything possible
-no single point of failure (chubby, master not bottleneck)
-scale = better perf. Amdahl's law: figure out how much you CAN parallelize, then do it. The master doesn't hold all the metadata.
-measure everything
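
To convince myself of the immutable-SSTable lesson, here is a toy LSM-style sketch (mine, not Google's file format): writes buffer in memory, flushes produce new sorted immutable files, reads consult newer files first, and compaction merges files into one new file so the old ones become garbage.

class ToyLSM:
    def __init__(self):
        self.memtable = {}
        self.sstables = []                        # immutable dicts, oldest first
    def put(self, key, value):
        self.memtable[key] = value
    def flush(self):
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):   # newest file wins
            if key in sstable:
                return sstable[key]
        return None
    def compact(self):
        """Merge all SSTables into one new immutable file; the old ones are garbage."""
        merged = {}
        for sstable in self.sstables:             # oldest to newest, newest overwrites
            merged.update(sstable)
        self.sstables = [dict(sorted(merged.items()))]

db = ToyLSM()
db.put("a", 1); db.flush()
db.put("a", 2); db.put("b", 3); db.flush()
db.compact()
assert db.get("a") == 2 and db.get("b") == 3
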
Conclusion:
-great engineering
-compression (lots of duplication with versioning = big wins)
-single commit log per tablet server
-client library caching tablet location

Monday, February 9, 2009

All I Need is Scale! - Slide deck review

Notes on eBay slide deck:
-F500 slide on the advantages of elasticity. The before-and-after cost-over-time image looks familiar from the Above the Clouds work we've been doing recently; paper to come soon.
-Actually, the "faux PaaS" offerings all look somewhat similar to our "challenges and opportunities" as well.
-The Research Challenges slide is a good source of project ideas for this cloud class! PAR Lab folks are tackling multicore via the programming-languages approach. I particularly liked the n-resource, m-constraint and model-driven architecture challenges.

Sunday, February 8, 2009

eBay's scaling Odyssey

Notes and thoughts as I read through the assigned eBay slides:

The first slide claims that they "roll 100,000+ lines of code every two weeks." That is pretty incredible, considering how little the basic functionality of eBay seems to have changed in the last 5 years. Where is all of that code going? Is it all under-the-covers changes to support better scaling, or is it code for new features that I just don't use?

I was using eBay last week, and it defaulted to a "new" interface, which did not allow me to use the back button (in Safari) to return to the page of search results after viewing an item. How could they get that wrong!? Sorry, just me venting about how, in spite of all of the energy and thought that they clearly put into their service, in my opinion, eBay seems to get better and worse over time, not simply better, as you would expect.

Interesting point about partitioning session state away from the application level, since user info and state must span a diverse array of existing and future applications within the eBay framework (e.g. eBay and eBay Motors).

They claim to support multiple levels of SLAs as a function of the "given logical consumer." I wonder exactly what differentiates one logical consumer from another, e.g. do more active eBay users get better response times? Or are the users they are referring to here internal to eBay?

I like that they use a different consistency model for different elements of their system. I doubt that a single storage system supports the diversity of their consistency requirements; I imagine they use a different system for each.

I like the sound of a Model-centric architecture because it sounds a lot like RAD Lab rhetoric.

I had to look up what Erlang-B is.
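
For my own notes: Erlang-B gives the probability that a request is rejected when an offered load of E Erlangs hits a pool of m servers with no queueing. A quick Python check using the standard recurrence (my snippet):

def erlang_b(offered_load, servers):
    """Blocking probability via the recurrence B(0)=1, B(m)=E*B(m-1)/(m+E*B(m-1))."""
    b = 1.0
    for m in range(1, servers + 1):
        b = offered_load * b / (m + offered_load * b)
    return b

print(erlang_b(90, 100))   # blocking probability for 90 Erlangs offered to 100 servers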

I especially liked the Ronald Coase (Nobel Laureate) quote on the third-to-last slide about whether eBay will be moving into a "cloud." I think it has special relevance to the course on cloud computing for which this reading was assigned, because it speaks to the economic justifications for the existence of corporations, which are very similar to the economic justifications for the existence of public and private computing clouds. eBay might find themselves moving in the direction of what we at Berkeley have dubbed a "private cloud".

Wednesday, February 4, 2009

DCell: Recursive Data Center Networking

The main idea here is to capitalize more on the recursive nature of networks to provide better performance and fault tolerance.
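
To see how aggressively the recursion grows, here is a small calculation (my reading of the construction: a level-0 cell is n servers on one switch, and a level-k cell is built from t_{k-1}+1 level-(k-1) cells, where t_{k-1} is the server count of a level-(k-1) cell):

def dcell_servers(n, k):
    """Servers in a DCell_k built from n-server DCell_0s: t_k = t_{k-1} * (t_{k-1} + 1)."""
    t = n                            # t_0
    for _ in range(k):
        t = t * (t + 1)
    return t

for k in range(4):
    print(k, dcell_servers(4, k))    # 4, 20, 420, 176820 servers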

Policy Aware Switching

I blogged here about the policy aware switching paper last semester when we read it for Randy's networking class.

A Scalable Commodity Data Center Network Architecture

What is the problem? Is the problem real?
Using high-end interconnects like InfiniBand and Myrinet is too expensive, so we want to build datacenter networks out of commodity switches and routers.

What is the solution’s main idea (nugget)?
They design a fat tree network architecture for the datacenter entirely out of low cost commodity components with the scaling, economic, and backward compatibility properties that we desire.
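
To sanity check the economics, here is my arithmetic from the fat-tree construction (not their code): with k-port switches you get k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches, supporting k^3/4 hosts.

def fat_tree_counts(k):
    """Host and switch counts for a fat tree built from k-port switches (k even)."""
    hosts = k ** 3 // 4
    edge = agg = k * (k // 2)        # k pods, k/2 edge + k/2 aggregation each
    core = (k // 2) ** 2
    return {"hosts": hosts, "edge": edge, "aggregation": agg,
            "core": core, "total_switches": edge + agg + core}

print(fat_tree_counts(48))   # 48-port switches: 27,648 hosts from 2,880 switches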

Why is solution different from previous work? (Is technology different?, Is workload different?, Is problem different?)
I believe that the use of fat trees in the data center is a very old idea.

Here is a blog post I wrote last semester about a talk on this research that Amin Vahdat gave at the weekly Berkeley systems seminar.