Wednesday, January 20, 2010

China Details Homemade Supercomputer Plans

The machine will use an unfashionable chip design.

By Christopher Mims
Tuesday, January 19, 2010

It's official: China's next supercomputer, the petascale Dawning 6000, will be constructed exclusively with home-grown microprocessors. Weiwu Hu, chief architect of the Loongson (also known as "Godson") family of CPUs at the Institute of Computing Technology (ICT), a division of the Chinese Academy of Sciences, also confirms that the supercomputer will run Linux. This is a sharp departure from China's last supercomputer, the Dawning 5000a, which debuted at number 11 on the list of the world's fastest supercomputers in 2008, and was built with AMD chips and ran Windows HPC Server.

The arrival of Dawning 6000 will be an important landmark for the Loongson processor family, which to date has been used only in inexpensive, low-power netbooks and nettop PCs. When the Dawning 5000a was initially announced, it too was meant to be built with Loongson processors, but the Dawning Information Industry Company, which built the computer, eventually went with AMD chips, citing a lack of support for Windows, and the ICT's failure to deliver a sufficiently powerful chip in time.

The Dawning 6000 will be completed by mid-2011 at the latest, says Hu, and could be up and running as early as the end of 2010. It is the second time that a representative from the ICT has promised a supercomputer built entirely using Loongson processors.

The development of Loongson 3 began in 2001 as a product of China's 10th five-year program. All of the chips in the Loongson family are based on the MIPS instruction set--originally developed in the 1980s but now out of favor in desktop and server computers, although still used in many embedded devices. Currently, the Top 500 list is dominated by x86 chips, with non-x86 CPUs powering less than 15 percent of the high-performance systems on the list.

"This is a very high-performance MIPS architecture where, when it's run in a cluster configuration, it becomes very powerful," says Art Swift, vice president of marketing at Sunnyvale, CA-based MIPS Technologies, which developed the MIPS architecture.

A paper published in 2009 proposes using Loongson 3 chips in clusters of up to 16 cores to accomplish extremely high performance. Tom Halfhill, analyst at Microprocessor Report, calculates that in this configuration, meeting the petaflop performance mark (one quadrillion operations per second) could require as few as 782 16-core chips.
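As a back-of-the-envelope check on that figure (a sketch only, not Halfhill's own calculation), dividing one petaflop by 782 chips shows the throughput each chip would need to sustain: roughly 1.28 teraflops per 16-core chip, or about 80 gigaflops per core. The variable names are illustrative.

```python
# Implied per-chip and per-core throughput if 782 16-core chips
# are to reach one petaflop (figures taken from the article).
PETAFLOP = 1e15          # one quadrillion operations per second
CHIPS = 782              # chip count quoted from Halfhill
CORES_PER_CHIP = 16      # largest proposed Loongson 3 configuration

per_chip_flops = PETAFLOP / CHIPS
per_core_flops = per_chip_flops / CORES_PER_CHIP
total_cores = CHIPS * CORES_PER_CHIP

print(f"per chip: {per_chip_flops / 1e12:.2f} TFLOPS")
print(f"per core: {per_core_flops / 1e9:.1f} GFLOPS")
print(f"total cores: {total_cores}")
```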

Halfhill says the Loongson 3 is little different from the latest-generation chip, Loongson 2F, which is already available in consumer PCs. The main differences are that it includes hardware translation of x86 instructions (used in most of the microprocessors made by Intel and AMD), and it incorporates multiple cores--from four up to a proposed 16--each capable of processing commands independently. Conspicuously absent from the Loongson 3 is multithreading, which allows a single core to execute instructions from multiple threads simultaneously. (Both Intel and Sun have already incorporated multithreading into some of their chips.)

Generations 2 and 3 of the Loongson use the same general-purpose core, but the Loongson 3 tethers more cores together. A quad-core Loongson 3 chip is currently in prototype, and a final, 65-nanometer version of the chip was "taped out" in late December, meaning the final description of the chip will soon be sent to the manufacturer, STMicroelectronics.

While the quad-core Loongson 3 could find applications in everything from desktop PCs to set-top boxes (the chip incorporates additional instructions designed specifically to speed up multimedia playback), an eight-core version will likely be needed for the proposed petascale supercomputer. That version will incorporate four regular cores, along with four "GStera" coprocessors designed especially for mathematically intensive calculations. These coprocessors are significant because they should speed up workloads such as the LINPACK test, which uses linear algebra to benchmark the world's fastest supercomputers and to determine their ranking (and their owners' bragging rights) in the Top 500 list of supercomputers.

Jack Dongarra, the computer scientist who introduced the LINPACK benchmark, says that the proposed architecture of the Dawning 6000--multi-purpose cores coupled to coprocessors for certain types of mathematical calculations--follows the standard supercomputer design.

The quad-core Loongson 3 already incorporates two 64-bit floating-point units in each of its cores. So in theory it could be used as the commodity chip in a supercomputer. However, it would require vastly more of these cores to achieve the same processing power, says Dongarra.
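To give a rough sense of what "vastly more" could mean, here is a minimal sketch of a theoretical-peak estimate. The article gives only the core count and the two floating-point units per core. The 1 GHz clock and the two floating-point operations per unit per cycle (one fused multiply-add) are assumptions chosen for illustration, not published specifications.

```python
import math

# From the article: quad-core chip, two 64-bit FPUs per core.
CORES = 4
FPUS_PER_CORE = 2

# Assumptions for illustration only (not stated in the article):
ASSUMED_CLOCK_HZ = 1_000_000_000   # hypothetical 1 GHz clock
ASSUMED_FLOPS_PER_FPU_CYCLE = 2    # one fused multiply-add per cycle

PETAFLOP = 10**15

# Theoretical peak = cores x FPUs x flops per cycle x clock rate.
peak_flops = (CORES * FPUS_PER_CORE
              * ASSUMED_FLOPS_PER_FPU_CYCLE * ASSUMED_CLOCK_HZ)

# Chips needed to reach a petaflop of theoretical peak.
chips_needed = math.ceil(PETAFLOP / peak_flops)

print(f"peak per chip: {peak_flops / 1e9:.0f} GFLOPS")
print(f"chips for one petaflop (peak): {chips_needed}")
```

Under these assumptions the answer is in the tens of thousands of chips, and measured LINPACK performance always falls short of theoretical peak, so a real system would need more still.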

Intel remains unfazed by the prospect of a new, state-sponsored contender in the field of high-performance computing. "Measuring competitive impact for a product that does not exist [yet] is always problematic, and we generally refrain from doing so," says Chuck Mulloy, a spokesperson for Intel. "In our entire history there has never been a time when we didn't face a competitor. We don't expect that to change--in fact we welcome it."

Dongarra cautions that it's pointless to speculate about the performance of the forthcoming Dawning 6000 until benchmarks have been run, not least because the MIPS architecture is nonstandard in high-performance computing. "While I wish them well, I see a lot of challenges to making the whole system work," says Dongarra. These challenges include having to adapt the software that Dawning runs.

Halfhill, who has traveled to the ICT in Beijing to report on the birth of the Loongson 3, believes that whatever the performance of the system, it's only a matter of time before China builds a home-grown chip competitive with those produced in the West. "Technically there's nothing to stop them from doing world-class processors," he says. "They've got architects and computer scientists just as smart as ours."

Comments
*
Another Me-too Chinese Project

Interesting article but what a waste of good research talent! The hard reality is that any new processor that does not solve the parallel programming crisis is on a fast road to failure. No long march to victory in sight for the Loongson, sorry.

China should be trying to become a leader in this field, not just another me-too follower. There is an unprecedented opportunity to make a killing in the parallel processor industry in the years ahead. Intel may have cornered the market for now but they have an Achilles' heel: they are way too big and way too married to last century's flawed computing paradigms to change in time for the coming massively parallel computer revolution. Their x86 technology will be worthless when that happens. The trash bins of Silicon Valley will be filled with obsolete Intel chips.

Here's the problem. The computer industry is in a very serious crisis due to processor performance limitations and low programmer productivity. Going parallel is the right thing to do but the current multicore/multithreading approach to parallel computing is a disaster in the making. Using the erroneous Turing Machine-based paradigms of the last sixty years to solve this century's massive parallelism problem is pure folly. Intel knows this but they will never admit it because they've got too much invested in the old stuff. Too bad. They will lose the coming processor war. That's where China and Intel's competitors can excel if they play their cards right.

The truth is that the thread concept (on which the Loongson and Intel's processors are based) is the cause of the crisis, not the solution. There is an infinitely better way to build and program computers that does not involve threads at all. Sooner or later, an unknown startup will pop out of nowhere and blow everybody out of the water.

My advice to China, Intel, AMD and the other big dogs is this: first invest your resources into solving the parallel programming crisis. Only then will you know enough to properly tackle the embedded systems, supercomputing and cloud computing markets. Otherwise be prepared to lose a boatload of dough. When that happens, there shall be much weeping and gnashing of teeth but I'll be eating popcorn with a smirk on my face and saying "I told you so".

How to Solve the Parallel Programming Crisis:
http://rebelscience.blogspot.com/2008/07/how-to-solve-parallel-programming.html
o
Intel did try to abandon the x86 architecture until it blew up in its face. Remember Itanium?

Developing a whole new computer architecture requires a huge amount of resources and talent. I highly doubt the ICT has the budget or staff to accomplish it. Face it, scientific programs always get the short end of the stick; it's the same everywhere in the world.

You say: "Using the erroneous Turing Machine-based paradigms of the last sixty years ..." I recall learning that, based on TM, parallelizing by a factor of N can improve performance by at most a factor of N. Are you saying that there are parallel architectures that break TM paradigm and so get around this limitation?
o
Turing Machine
No. What I'm saying is that, if the Turing computing model (TCM) were the appropriate model for parallel processing, the industry would not be in the mess that it is currently in and you would not be reading this comment. Regardless of what has been claimed by the experts about universality, a Turing Machine models one thing and one thing only, a sequential computer.

The biggest problem with the TCM is that operation timing (other than the implicit sequentiality of execution) is not part of the model. What is needed is a computing model in which any two operations in a program can be unambiguously determined as being either sequential or parallel (simultaneous). This determinism is impossible with concurrent threads and therein lies the problem.

In sum, the computer industry must abandon threads altogether or resign itself to endure a lot of pain in the years ahead. The Loongson solves nothing. It's just more pain for the Chinese.

Fact Check!
"This is a sharp departure from China's last supercomputer, the Dawning 5000a, which debuted at number 11 on the list of the world's fastest supercomputers in 2008, and was built with AMD chips and ran Windows HPC Server."

WRONG!

As of November 2009, a Chinese system occupies the number 5 position on the TOP500 list. Tianhe-1, assembled by China’s National University of Defense Technology, attains a theoretical peak rate of 1.2 PFLOPS. It includes 2560 compute nodes, each with two quad-core Xeon processors for scalar workloads and two AMD Radeon 4870x2 GPUs for vector workloads.
