Thursday, September 11, 2008

Fat Trees for a scalable data center network architecture

Amin Vahdat from UCSD spoke at the UC Berkeley Systems Seminar today; this talk was a SIGCOMM talk. I scribbled some random notes.

Networking is super expensive. Marketing literature brags about being cheap when vendors offer rates of $4000 per port (which is crazy compared to the ~$2000 cost of the actual PC).

They propose something like a Network Of Inexpensive Switches (NOIS - not the official name). Essentially their idea is to use a Fat-Tree!
The idea has been around for 50 years.
Why hasn't it been done?

High level issues with fat-tree that need addressing
  • We have path diversity, but existing routing protocols don't take advantage of that.
  • Cabling explodes!
  • We use 8 port switches throughout, lots and lots of them! Too many?
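
To get a feel for "lots and lots of them", here is a rough sizing sketch. It assumes the standard three-tier k-ary fat-tree layout (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches); the notes don't spell out the layout, so treat the formulas and the function name fat_tree_size as my assumptions rather than numbers from the talk.

```python
def fat_tree_size(k: int) -> dict:
    """Element counts for a three-tier fat-tree built from identical k-port switches.

    Assumes the textbook k-ary fat-tree layout; not taken from the talk itself.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,          # k/2 per pod
        "aggregation_switches": k * half,   # k/2 per pod
        "core_switches": half * half,
        "total_switches": 2 * k * half + half * half,  # 5k^2/4
        "hosts": k * half * half,           # k^3/4
        "paths_between_pods": half * half,  # one per core switch
    }


for k in (8, 48):
    print(k, fat_tree_size(k))
```

Under these assumptions, 8-port switches give 80 switches for 128 hosts, and 48-port switches give 2,880 switches for 27,648 hosts, which is in the same range as the 27,000-node figure from the cabling part of the talk.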
How is Intel doing this?
  • Routing
    • Localized load balancing switch by switch
    • Utilize our global knowledge for routing! Logically centralized routing brain (replicated, of course). The routers report their statuses to the CS (Central Scheduler).
    • Two level look-up (two tier routing table)
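
A minimal sketch of what a two-level lookup could look like, as I understood it: first match the destination against prefixes, and if the best prefix has no port of its own, fall through to a second table keyed on the host part of the address so that traffic spreads over the uplinks. The class TwoLevelTable, the 10.pod.switch.host addressing, and the port numbers are all made up for illustration; the talk notes only say "two tier routing table".

```python
from ipaddress import ip_address, ip_network


class TwoLevelTable:
    """Prefix table with an optional fall-through to a suffix (host byte) table."""

    def __init__(self):
        self.prefixes = []  # list of (network, port); port None means "use suffixes"
        self.suffixes = {}  # last address byte -> port

    def add_prefix(self, cidr, port=None):
        self.prefixes.append((ip_network(cidr), port))

    def add_suffix(self, host_byte, port):
        self.suffixes[host_byte] = port

    def lookup(self, dst):
        addr = ip_address(dst)
        matches = [(net, port) for net, port in self.prefixes if addr in net]
        if not matches:
            return None
        # Longest-prefix match on the first level.
        _net, port = max(matches, key=lambda m: m[0].prefixlen)
        if port is not None:
            return port
        # Second level: pick an uplink from the last byte of the address.
        return self.suffixes.get(int(addr) & 0xFF)


# Hypothetical aggregation switch in pod 2 of a small fat-tree.
table = TwoLevelTable()
table.add_prefix("10.2.0.0/24", port=0)  # down to edge switch 0
table.add_prefix("10.2.1.0/24", port=1)  # down to edge switch 1
table.add_prefix("0.0.0.0/0")            # everything else: fall through
table.add_suffix(2, port=2)              # hosts ending in .2 go up via port 2
table.add_suffix(3, port=3)              # hosts ending in .3 go up via port 3

print(table.lookup("10.2.1.3"))  # 1 (stays inside the pod)
print(table.lookup("10.3.0.2"))  # 2 (leaves the pod on one uplink)
print(table.lookup("10.3.0.3"))  # 3 (leaves the pod on the other uplink)
```

The point of the second level is that two hosts in the same remote subnet can take different uplinks, which is one way to actually use the path diversity.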


  • Simulations
    • Used the Click modular software router
    • The numbers looked good? Need more workloads. They are building a real prototype.


  • Multicast
    • Again, use central scheduling
    • Central scheduler sets up the multicast routing tables (pruning traffic to unnecessary "pods" at the earliest available point in the route)
    • If a multicast pipe grows too large, the CS will know, because all routers send it status reports
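
A toy sketch of the pruning idea, assuming the central scheduler knows where every group member sits. The function multicast_pruning and the (pod, edge switch, host) tuples are hypothetical names of mine, not anything shown in the talk.

```python
from collections import defaultdict


def multicast_pruning(members):
    """Map each group to the pods, and the edge switches inside them, that need the traffic.

    members: dict of group name -> iterable of (pod, edge_switch, host) tuples.
    Pods that do not appear in the result never receive the group's packets.
    """
    table = {}
    for group, hosts in members.items():
        pods = defaultdict(set)
        for pod, edge, _host in hosts:
            pods[pod].add(edge)
        table[group] = {pod: sorted(edges) for pod, edges in sorted(pods.items())}
    return table


members = {"g1": [(0, 0, 2), (0, 1, 3), (2, 0, 2)]}
print(multicast_pruning(members))  # {'g1': {0: [0, 1], 2: [0]}}
```

In this example a core switch would forward group g1 only toward pods 0 and 2, and inside pod 2 only toward edge switch 0.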


  • Cabling
    • 14 tons of cabling in a 27,000 node DC!
    • Addressed by using optical. This will cost lots and lots of money, but they think people will buy into it.


  • Power and Cooling
    • I didn't catch this part; I think he might have said "we're working on this still"???

  • Work in progress: they're building it as we speak.

My thoughts
  • This seems a little bit like a bunch of small hypercubes?
  • They just cop out and fall back to optical anyway? Won't this cancel out any price wins? Why can't we do better cabling under the floor? A huge matrix of many, many Ethernet cable equivalents.
