[gu-l] (04/24/04) Distributed Computing System of Akamai
Technologies, Inc.
Takeshi Utsumi, Ph.D.
utsumi at columbia.edu
Sat Apr 24 15:39:34 EDT 2004
<<April 24, 2004>>
Archived distributions can be retrieved by clicking "Correspondence" in our
home page at <http://www.friends-partners.org/GLOSAS/>.
For those after 2/27/01, see or bookmark:
<http://www.friends-partners.org/pipermail/gu-l/> and click on "Date,"
for example. The most recent archives are at the bottom.
Pierluigi Ritrovato <ritrovato at crmpa.unisa.it>
Ralph. C. Huntsinger, Prof., Dr. <drralph at stormnet.com>
Yamasawa, Kiyohito, Dr.Eng. <Yamasaw at Gipwc.shinshu-U.ac.jp>
Dr. Yasuharu Suematsu <suematsu at nii.ac.jp>
Dear Pierluigi:
(1) Referring to our previous correspondence, our "Globally Collaborative
Environmental Peace Gaming (GCEPG)" (*) project is to be a joint
demonstration project of your "European Learning GRID Infrastructure
(ELeGI)" (**) project.
1. (*)
http://www.friends-partners.org/GLOSAS/Global_University/Global%20University%20System/UNESCO_Chair_Book/Manuscripts/Part_IV_Global_Collaboration/Utsumi,%20Tak/GCEPG_D10_Web/GCEPG_D10.htm
2. (**)
http://www.friends-partners.org/GLOSAS/Peace%20Gaming/ELeGI/12-3-03-Berlin/12-3-03_ELeGE_Workshop.html
(2) In your recent msg, you indicated your willingness to approach the
EU-US Cooperation Workshops on Science and Technologies for Learning,
organized by the Learning and Training Unit of the European Commission and
the NSF, about funding for this joint project.
You then said that, in such an event, the ultimate scheme of our joint
project needs to be large enough for their consideration.
(3) For this, pls refer to Figure 8 of Item (1)-1 above. I think that
our GCEPG network, which runs the Globally Distributed Climate Simulation
System (GDCSS) and the Globally Distributed Socio-Economic-Environmental
Simulation System (GDSEESS) in parallel, is large (maybe too large!? -- or
maybe not; see Item (6) below).
(4) However, in my previous list distribution below (***), I informed you
of a similar globally distributed climate simulation system already
developed by some of our colleagues at the U.K. Open University -- see
http://www.climateprediction.net/index.php
> (***) (04/18/04) Globally Collaborative Environmental Peace Gaming (GCEPG)
> Project
> http://www.friends-partners.org/pipermail/gu-l/2004q2/000271.html
(5) Pls read through ATTACHMENT I below.
As you see in it, our total system in Figure 8 of Item (1)-1 above
may be similar to the globally distributed computing system of Akamai
Technologies, Inc. Theirs is for retrieving web pages, whereas ours is
to have various simulation models on distributed computers, interacting
with each other and, most importantly, with game players in the loops of
the simulation cycles.
(6) My previous list distribution made the following comparisons:
> "(01/08/04) Short paper on GUS with GCEPG Project"
> http://www.friends-partners.org/pipermail/gu-l/2004q1/000254.html
> 1. Earth Simulator (NEC/Japan) 35.86 TeraFlop/second (US$350 million)
> --> US$ 9.8 million/TeraFlop/second
> 2. Los Alamos Lab (IBM/ASCI Q) 13.88 TeraFlop/second (US$215 million)
> --> US$15.5 million/TeraFlop/second
> 3. Virginia Tech (Apple/X) 10.28 TeraFlop/second (US$5.2 million)
> --> US$ 0.5 million/TeraFlop/second
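The cost-effectiveness figures in this comparison can be rechecked with a few lines of Python. This is only a sketch: the performance and cost numbers are the ones quoted in the list, while the variable and function names are illustrative.

```python
# Recompute US$ million per TeraFlop/second from the quoted figures.
# Tuples are (system, TeraFlop/second, cost in US$ million).
systems = [
    ("Earth Simulator (NEC/Japan)", 35.86, 350.0),
    ("Los Alamos Lab (ASCI Q)", 13.88, 215.0),
    ("Virginia Tech (Apple/X)", 10.28, 5.2),
]


def cost_per_teraflop(cost_musd, teraflops):
    """Return the cost in US$ million per TeraFlop/second."""
    return cost_musd / teraflops


for name, teraflops, cost_musd in systems:
    ratio = cost_per_teraflop(cost_musd, teraflops)
    print(f"{name}: US${ratio:.1f} million/TeraFlop/second")
```

Rounded to one decimal place, the three ratios agree with the figures quoted above.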
It seems that Akamai's approach will yield a much lower, and thus better,
cost-effectiveness figure in US$/TeraFlop/second than Apple/X -- my recent
trip to Japan found that Japanese scientists have started to realize that
they made a big mistake with the Earth Simulator.
On the other hand, as said in ATTACHMENT I, those supercomputers are at
single locations for use by scientists, but ours (and Akamai's) are
inexpensive servers distributed around the world for laymen's use,
hopefully including K-12 students' learning, even with streaming
videoconferencing -- see:
http://www.akamai.com/en/html/about/press/press58.html
As said elsewhere before, once this system is deployed, in addition to our
GCEPG project, it can be used for many global joint research and
development efforts using virtual reality and virtual laboratories in
various fields, thus bringing a profound paradigm shift to the conventional
academic "Ivory Tower" approach.
Although we need some more exploration of Akamai's system, we may be able to
utilize their global infrastructure without reinventing the wheel, and
concentrate on the development of interface software and simulation
models.
(7) When you meet with your friend from the EC/NSF program, you may
mention these points.
Looking forward to hearing the results of your mtg with him,
Best, Tak
P.S.:
Dear Prof. Yamasawa and Dr. Suematsu:
(a) Pls note the above.
ATTACHMENT I
(Underlines and colors are T. Utsumi's emphases.)
<<April 21, 2004>> Excerpt from:
http://www.technologyreview.com/articles/wo_garfinkel042104.asp?trk=nl
Google and Akamai: Cult of Secrecy vs. Kingdom of Openness
The king of search is tapping into what may be the largest grid of computers
on the planet. And it remains extraordinarily secretive about its core
technologies -- perhaps because it senses a potential competitor in
dot-com-era flameout Akamai.
By Simson Garfinkel
April 21, 2004
"You should never trust this number," said Martin Farach-Colton, a professor
of computer science at Rutgers University, speaking a little more than a
year ago. "People make a big deal about it, and it's not true."
Farach-Colton was giving a public lecture about his two-year sabbatical
working at Google. The number that he was disparaging was in the middle of
his PowerPoint slide:
> * 150 million queries/day
The next slide had a few more numbers:
> * 1,000 queries/sec (peak)
> * 10,000+ servers
> * More than 4 tera-ops/sec at daily peak
> * Index: 3 billion Web pages
> * 4 billion total docs
> * 4+ petabytes disk storage
A few people in the audience started to giggle: the Google figures
didn't add up.
I started running the numbers myself. Let's see: "4 tera-ops/sec" means
4,000 billion operations per second; a top-of-the-line server can do perhaps
two billion operations per second, so that translates to perhaps 2,000
servers -- not 10,000. Four petabytes is 4x10^15 bytes of storage; spread that
over 10,000 servers and you'd have 400 gigabytes per server, which again
seems wrong, since Farach-Colton had previously said that Google puts two
80-gigabyte hard drives into each server.
And then there is that issue of 150 million queries per day. If the system
is handling a peak load of 1,000 queries per second, that translates to a
peak rate of 86.4 million queries per day -- or perhaps 40 million queries per
day if you assume that the system spends only half its time at peak
capacity. No matter how you crank the math, Google's statistics are not
self-consistent.
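The back-of-the-envelope check in the two paragraphs above can be reproduced as follows. This is a sketch only: the inputs are the slide figures and the article's own estimates (two billion operations per second per server, two 80-gigabyte drives per server), and the variable names are illustrative.

```python
# Figures from the slides, as quoted in the article.
peak_ops_per_sec = 4 * 10**12         # "4 tera-ops/sec"
claimed_servers = 10_000              # "10,000+ servers"
storage_bytes = 4 * 10**15            # "4+ petabytes disk storage"
peak_queries_per_sec = 1_000          # "1,000 queries/sec (peak)"
claimed_queries_per_day = 150_000_000

# The article's own estimates.
ops_per_server = 2 * 10**9            # a top-of-the-line server
stated_gb_per_server = 2 * 80         # two 80-gigabyte drives

# 1. Servers implied by the compute figure.
implied_servers = peak_ops_per_sec // ops_per_server

# 2. Storage per server implied by the disk figure (decimal gigabytes).
implied_gb_per_server = storage_bytes // claimed_servers // 10**9

# 3. Upper bound on daily queries if the system ran at peak all day.
seconds_per_day = 24 * 60 * 60
max_queries_per_day = peak_queries_per_sec * seconds_per_day
half_time_at_peak = max_queries_per_day // 2

print(f"servers implied by ops figure: {implied_servers:,}")
print(f"GB per server implied: {implied_gb_per_server} "
      f"(stated: {stated_gb_per_server})")
print(f"max queries/day at peak rate: {max_queries_per_day:,}")
print(f"queries/day at peak half the time: {half_time_at_peak:,}")
print("claimed daily total exceeds the peak-rate bound:",
      claimed_queries_per_day > max_queries_per_day)
```

Each of the three checks disagrees with the corresponding slide figure, which is the inconsistency the article describes.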
"These numbers are all crazily low," Farach-Colton continued. "Google always
reports much, much lower numbers than are true."
Whenever somebody from Google puts together a new presentation, he
explained, the PR department vets the talk and hacks down the numbers.
Originally, he said, the slide with the numbers said that 1,000 queries/sec
was the "minimum" rate, not the peak. "We have 10,000-plus servers. That's
plus a lot."
Just as Google's search engine comes back instantly and seemingly
effortlessly with a response to any query that you throw at it, hiding the true
difficulty of the task from users, the company also wants its competitors
kept in the dark about the difficulty of the problem. After all, if Google
publicized how many pages it has indexed and how many computers it has in
its data centers around the world, search competitors like Yahoo!, Teoma,
and Mooter would know how much capital they had to raise in order to have a
hope of displacing the king at the top of the hill.
Google has at times had a hard time keeping its story straight. When vice
president of engineering Urs Hoelzle gave a talk about Google's Linux
clusters at the University of Washington in November of 2002, he repeated
that figure of 1,000 queries per second -- but he said that the measure was
made at 2:00 a.m. on December 25, 2001. His point, obvious to everybody in
the room, was that even by November 2002, Google was doing a lot more than
1,000 queries per second -- just how many more, though, was anybody's guess.
The facts may be seeping out. Last Thanksgiving, the New York Times reported
that Google had crossed the 100,000-server mark. If true, that means Google
is operating perhaps the largest grid of computers on the planet. "The
simple fact that they can build and operate data centers of that size is
astounding," says Peter Christy, co-founder of the NetsEdge Research Group,
a market research and strategy firm in Silicon Valley. Christy, who has
worked in the industry for more than 30 years, is astounded by the scale of
Google's systems and the company's competence in operating them. "I don't
think that there is anyone close."
It's this ability to build and operate incredibly dense clusters that is, as
much as anything else, the secret of Google's success. And the reason,
explains Marissa Mayer, the company's director of consumer Web products, has
to do with the way that Google started at Stanford.
Instead of getting a few fast computers and running them to the max, Mayer
explained at a recruiting event at MIT, founders Sergey Brin and Larry Page
had to make do with hand-me-downs from Stanford's computer science
department. They would go to the loading dock to see who was getting new
computers, then ask if they could have the old, obsolete machines that the
new ones were replacing. Thus, from the very beginning, Brin and Page were
forced to develop distributed algorithms that ran on a network of
not-very-reliable machines.
Today this philosophy is built into the company's DNA. Google buys the
cheapest computers that it can find and crams them in racks and racks in its
six (or more) data centers. "PCs are reasonably reliable, but if you have a
thousand of them, one is going to fail every day," said Hoelzle. "So if you
can just buy 10 percent extra, it's still cheaper than buying a more
reliable machine."
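Hoelzle's rule of thumb can be scaled to other fleet sizes with a short calculation. The sketch below assumes, purely for illustration, that failures scale linearly with the number of machines; the 100,000-machine case is the server count the article attributes to the New York Times report, and the function names are hypothetical.

```python
def expected_failures_per_day(machines, fleet=1_000, failures_per_day=1.0):
    """Scale the quoted rate (one failure per day per 1,000 PCs) linearly.

    Linear scaling is an assumption made for this sketch, not a claim
    from the article.
    """
    return machines * failures_per_day / fleet


def spare_machines(machines, spare_percent=10):
    """Number of extra machines bought under the "10 percent extra" rule."""
    return machines * spare_percent // 100


for machines in (1_000, 10_000, 100_000):
    failures = expected_failures_per_day(machines)
    spares = spare_machines(machines)
    print(f"{machines:>7,} machines: ~{failures:.0f} failures/day, "
          f"{spares:,} spares")
```

Under this assumption, the spare pool always covers roughly 100 days of failures with no repairs at all, whatever the fleet size.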
Working at Google, an engineer told me recently, is the nearest you can get
to having an unlimited amount of computing power at your disposal.
The Kingdom of Openness
There is another company that has perfected the art of running massive
numbers of computers with a comparatively tiny staff. That company is
Akamai.
Akamai isn't a household word now, but it did make the front pages when the
company went public in November 1999 with what was, at the time, the fourth
most successful initial public offering in history. Akamai's stock soared
and made billionaires of its founders. In the years that followed, however,
Akamai has fallen on hard times. It wasn't just the dot-com crash that
caused significant layoffs and the abandonment of the company's California
offices: Akamai's cofounder and chief technology officer Danny Lewin was
aboard American Airlines Flight 11 on September 11 and was killed when the
plane was flown into the World Trade Center. Company morale was devastated.
Akamai's network operates on the same complexity scale as Google's.
Although Akamai has only 14,000 machines, those servers are located in 2,500
different locations scattered around the globe. The servers are used by
companies like CNN and Microsoft to deliver Web pages. Just as Google's
servers are used by practically everyone on the Internet today, so are
Akamai's.
Because of their scale, both Akamai and Google have had to develop tools and
techniques for managing these machines, debugging performance problems, and
handling errors. This isn't software that a company can buy off the
shelf -- it requires laborious in-house development. It is, in fact, software
that is one of Akamai's key competitive advantages.
Yes, a few other organizations are also running large clusters of computers.
Both NASA's Ames Research Center and Virginia Tech have large clusters
devoted to scientific computing. But there are key differences between these
systems and the clusters that both Google and Akamai have created. The
scientific systems are located in a single place, not spread all over the
world. They are generally not directly exposed to the Internet. And perhaps
most importantly, the scientific systems are not providing a commodity
service to hundreds of millions of Internet users every day: Google and
Akamai must deliver 100 percent uptime. It's easy to go out and buy 10,000
computers -- all you need is cash. It's much harder to make those computers all
work together as a single service that supports millions of simultaneous
users.
To be fair, there are important differences between Google and
Akamai -- differences that assure that Google won't be breaking into Akamai's
business anytime soon, nor Akamai moving into Google's. Both companies have
developed infrastructure for running massively parallel systems, but the
applications that they are running on top of those systems are different.
Google's primary application is a search engine. Akamai, by contrast, has
developed a system for delivering Web pages, streaming media, and a variety
of other standard Internet protocols.
Another important difference, says Christy, "is that Akamai has had a very
hard time creating a clear business model that works, whereas Google has
been unbelievably successful." Akamai has thus started looking for new ways
that it can sell services that only a massive distributed network can
deliver. Struggling for profitability, the company has been aggressively
looking for new opportunities for its technology. This might be the reason
that Akamai, unlike Google, was willing to be interviewed for this article.
"We started with basic bit delivery -- objects, photos, banners, ads," says Tom
Leighton, Akamai's chief scientist. "We do it locally. Make it fast. Make it
reliable. Make the sites better."
Now Akamai is developing techniques for letting customers run their
applications directly on the company's distributed servers. Leighton says
that 25 of Akamai's largest customers have done this. The system can handle
sudden surges, making it ideal for cases where it is impossible to
anticipate demand.
For example, says Leighton, Akamai's network was used to handle a keyboard
giveaway contest sponsored by Logitech. Thinking that its contest might be
popular, Logitech created an elaborate series of rules, assuring that only
so many keyboards would be given away to every state and within any given
time period. But Logitech grossly underestimated how many people would click
in to the contest. In the past, such underestimates have caused highly
publicized Internet events like the Victoria's Secret webcast to crash,
frustrating millions of Web surfers and embarrassing the company. But not
this time: Logitech's contest ran on the Akamai network without a hitch.
Of course, Logitech could have tried to build the system itself. It could
have designed and tested a server capable of handling 100 simultaneous
users. That server might cost $5,000. Then Logitech could have bought 20 of
those servers for $100,000 and put them in a data center. But a single data
center could get congested, so it might make more sense to put 10 of them
in one data center on the East Coast and 10 in another data center on the
West Coast. Still, that system could only handle 2,000 simultaneous users:
it might be better to buy 100 servers, for a total cost of $500,000, and put
them at 10 different data centers. But even if they had done this, the
engineers at Logitech would have had no way of knowing if the system would
actually have worked when it was put to the test -- and they would have
invested a huge amount of money in engineering that wouldn't have been
needed after the event.
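The build-it-yourself arithmetic in the paragraph above can be written out as a small sizing calculation. This is a sketch using only the article's example figures (100 simultaneous users and $5,000 per server); the helper names are illustrative, and `servers_needed` is a hypothetical addition for sizing against some other peak load.

```python
import math

# Example figures from the article's Logitech scenario.
USERS_PER_SERVER = 100
COST_PER_SERVER_USD = 5_000


def fleet_capacity(servers):
    """Simultaneous users a fleet of identical servers can handle."""
    return servers * USERS_PER_SERVER


def fleet_cost(servers):
    """Hardware cost in US$ for a fleet of identical servers."""
    return servers * COST_PER_SERVER_USD


def servers_needed(peak_users):
    """Smallest fleet that covers a given peak (rounding up)."""
    return math.ceil(peak_users / USERS_PER_SERVER)


# The two configurations discussed: 20 servers in 2 data centers,
# then 100 servers in 10 data centers.
for servers, data_centers in [(20, 2), (100, 10)]:
    print(f"{servers} servers in {data_centers} data centers "
          f"({servers // data_centers} each): "
          f"{fleet_capacity(servers):,} users, US${fleet_cost(servers):,}")
```

The weakness the article points out is visible here: `servers_needed` requires a peak estimate up front, and that estimate is exactly what Logitech got wrong.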
And contests aren't the only thing that can run on Akamai's network.
Practically any program written in the Java programming language can run on
the company's infrastructure. The system can handle mortgage applications,
catalogs, and electronic shopping carts. Akamai even runs the backend for
Apple's iTunes 99-cent music service.
Perhaps because Akamai is so proud of the system that it has built, the
company is very open about the network's technical details. Its network
operations center in Cambridge, MA, has a glass wall allowing visitors to
see a big screen with statistics. When I visited the company in January, the
screen said that Akamai was serving 591,763 hits per second, with 14,372
CPUs online, 14,563 gigahertz of total processing power, and 650 terabytes
of total storage. On April 14, the number had jumped to a peak rate of
900,000 hits per second and 43.71 billion requests delivered in a 24-hour
period. (Akamai wouldn't disclose the number of CPUs online because that
number is part of its quarterly earnings report, to be released on April 28.
"But it hasn't changed much," the company's spokesperson told me.)
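For a rough sense of scale, the dashboard figures quoted above can be reduced to per-CPU and average-rate numbers. This is only a sketch: it assumes decimal units (1 terabyte = 1,000 gigabytes) and that the April 14 peak rate and the 24-hour request count describe the same day; the variable names are illustrative.

```python
# January dashboard figures, as quoted in the article.
hits_per_sec_jan = 591_763
cpus_online = 14_372
total_ghz = 14_563
total_storage_tb = 650

# April 14 figures, as quoted in the article.
peak_hits_per_sec_apr = 900_000
requests_in_24h_apr = 43.71e9

# Per-CPU averages for January.
hits_per_cpu = hits_per_sec_jan / cpus_online
ghz_per_cpu = total_ghz / cpus_online
storage_gb_per_cpu = total_storage_tb * 1_000 / cpus_online  # assumes 1 TB = 1,000 GB

# Average request rate over the 24-hour period, and peak-to-average ratio.
avg_hits_per_sec_apr = requests_in_24h_apr / (24 * 60 * 60)
peak_to_average = peak_hits_per_sec_apr / avg_hits_per_sec_apr

print(f"hits/sec per CPU (January): {hits_per_cpu:.1f}")
print(f"GHz per CPU: {ghz_per_cpu:.2f}")
print(f"storage per CPU: {storage_gb_per_cpu:.1f} GB")
print(f"average hits/sec (April 14): {avg_hits_per_sec_apr:,.0f}")
print(f"peak-to-average ratio: {peak_to_average:.2f}")
```

Unlike the Google slide figures examined earlier in the article, these numbers are mutually plausible: the 24-hour total implies an average rate below the quoted peak.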
Mail and Scale
Looking forward, a few business opportunities have obvious appeal to both
Google and Akamai. For example, both companies could take their experience
in building large-scale distributed clusters to create a massive backup
system for small businesses and home PC users. Or they could take over
management of home PCs, turning them into smart terminals running
applications on remote servers. This would let PC users escape the drudgery
of administering their own machines, installing new applications, and
keeping anti-virus programs up to date.
And then there is e-mail. Back on April 1, Google announced that it was
going to enter the consumer e-mail business with an unorthodox press
release: "Search is Number Two Online Activity -- Email is Number One: 'Heck,
Yeah,' Say Google Founders."
Since then, Google has received considerable publicity for the announced
design of its Gmail (Google Mail) offering. The free service promises
consumers one gigabyte of mail storage (more than a hundred times the
storage offered by other Web mail providers), astounding search through mail
archives, and the promise that consumers will never need to delete an e-mail
message again. At first many people thought that the announcement was an
April Fools joke -- a gigabyte per user just seemed like too much storage. But
since the vast majority of users won't use that much storage, what Google's
promise really says is that Google can buy new hard drives faster than the
Internet's users can fill them up. [Editor's note: Google's proposal to fund
Gmail by showing advertisements based on the content of users' e-mail has
received significant criticism from a variety of privacy activists. Earlier
this month a number of privacy activists circulated a letter asking Google
to not launch Gmail until these privacy issues had been resolved. Simson
Garfinkel signed that letter as a supporter after this article was written
but before its publication.]
Google's infrastructure seems well-suited to the deployment of a service
like Gmail. Last summer Google published a technical paper called The Google
File System (GFS), which is apparently the underlying technology developed
by Google for allowing high-speed replication and access of data throughout
its clusters. With GFS, each user's e-mail could be replicated between
several different Google clusters; when users log into Gmail their Web
browser could automatically be directed to the closest cluster that had a
copy of their messages.
This is hard technology to get right -- and exactly the kind of system that
Akamai has been developing for the past six years. In fact, there's no
reason, in principle, why Akamai couldn't deploy a similar large-scale
e-mail system fairly easily on its own servers. No reason, that is, except
for the company's philosophy.
Leighton doesn't think that Akamai would move into any business that
required the company to deal directly with end users. More likely, he says,
Akamai would provide the infrastructure to some other company that would be
in a position to do the billing, customer support, and marketing to end
users. "Our focus is selling into the enterprise," he says.
George Hamilton, an analyst at the Yankee Group who covers enterprise
computing and networking, agrees. Hamilton calls the idea of Google
competing with Akamai "far-fetched." But Google could hire Akamai to
supplement Google's technology needs, he says.
Still, such a partnership seems unlikely -- at least on the surface. Google
might buy Akamai, the way the company bought Pyra Labs in February 2003 to
acquire Pyra's Blogger personal Web publishing system. But Akamai, with its
culture of openness, doesn't seem like a good match to secretive Google's.
Then there is the fact that 20 percent of Akamai's revenue now comes
directly from Microsoft, according to Akamai's November 2003 quarterly
report. Google's rivalry with Microsoft in Internet search (and now in
e-mail) has been widely commented upon in the press; it is unlikely that the
company would want to work so closely with such a close Microsoft partner.
Ted Schadler, a vice president at the market research firm Forrester, says
that it's possible to envision the two companies competing because they are
both going after the same opportunity in massive, distributed computing. "In
that sense, they have the same vision. They have to build out a lot of the
same technology because it doesn't exist. They are having to learn lots of
the same lessons and develop lots of the same technologies and business
models."
Schadler says Akamai and Google are both examples of what he calls
"programmable Internet business channels." These channels are companies that
offer large infrastructure that can offer high quality services on the
Internet to hundreds of millions of users at the flick of a switch. Google
and Akamai are such companies, but so are Amazon.com, eBay and even Yahoo!.
"They are all services that enable business activity -- foundation services
that [can be] scaled securely," Schadler says.
"If I were a betting man," Schadler adds, "I would say that Google is much
more interested in serving the customer and Akamai is more interested in
providing the infrastructure -- it's retail versus wholesale. There will be lots
and lots of these retail-oriented services."
If true, Google might suddenly find itself competing with a company that,
like Google itself, seemed to come out of nowhere. Except this time, that
company wouldn't have to figure out any of the tricks of running the massive
infrastructure itself.
And that explains why Google is so secretive.
Simson Garfinkel is the author of 12 books on information technology and its
impact.
List of Distribution
Pierluigi Ritrovato
Head of Unit
Centro di Ricerca in Matematica Pura ed Applicata (CRMPA)
Centre for Research in Pure and Applied Mathematics
C/O DIIMA
Dep. of Information Engineering and Applied Mathematics
University of Salerno
Via Ponte Don Melillo
84084 - Fisciano (SA)
Italy
Tel: +39 089 964289/2201;
+39 089 964189 (secretary)
Fax: +39 089 964191
Cel: +39 348-2803487
ritrovato at crmpa.unisa.it
ritrovato at crmpa.it
http://www.crmpa.it/
Ralph. C. Huntsinger, Prof., Dr.
Professor of Computer Science
Emeritus Professor of Mechanical Engineering and Manufacturing
Director Emeritus, the McLeod Institute of Simulation Sciences (MISS)
College of Engineering, Computer Science, and Technology (ECT)
California State University, Chico
Chico, CA 95929-0410
+1 530 343 3456 (Home)
Fax: +1 530 898 5995
drralph at ecst.csuchico.edu
drralph at stormnet.com
Yamasawa, Kiyohito, Dr.Eng.
Professor
Dept. Of Electrical & Electronic Engineering
Shinshu University
4-17-1 Wakasato, Nagano 380-8553
Japan
Tel: +81-26-269 51 96
Fax: +81-26-223 77 54
Isdn:+81-26-223-0228
Yamasaw at Gipwc.shinshu-U.ac.jp
Http://Yslab.shinshu-U.ac.jp
Dr. Yasuharu Suematsu
Director General
National Institute of Informatics
2-1-2 Hitotsubashi
Chiyoda-ku, Tokyo 101-8430
JAPAN
03-4212-2003
03-4212-2001
Fax: 03-4212-2004
suematsu at nii.ac.jp
http://www.nii.ac.jp/help-j.html
**********************************************************************
* Takeshi Utsumi, Ph.D., P.E., Chairman, GLOSAS/USA *
* (GLObal Systems Analysis and Simulation Association in the U.S.A.) *
* Laureate of Lord Perry Award for Excellence in Distance Education *
* Founder and V.P. for Technology and Coordination of *
* Global University System (GUS) *
* 43-23 Colden Street, Flushing, NY 11355-3998, U.S.A. *
* Tel/Fax: 718-939-0928; (day time only--prefer email) *
* Email: utsumi at columbia.edu; Tax Exempt ID: 11-2999676 *
* http://www.friends-partners.org/GLOSAS/ *
**********************************************************************