Published: August 21, 2007
By Timothy Prickett Morgan
The annual Hot Chips conference for microprocessor and other chip designers is going on this week in Silicon Valley, and one of the hot--or rather, cool--new vendors showing off its new chip is a company called Tilera. The company, which is based on ideas from research performed at the Massachusetts Institute of Technology, has created a 64-core processor called the Tile64. This chip has a whole new instruction set architecture that, Tilera hopes, will shake up the appliance business soon and maybe the server market some day.
Tilera was founded in Santa Clara, California, in October 2004 and has been operating in stealth mode since that time; the company's research and development is still done in the tech corridor outside of Boston, in this case in Westborough. The Tile64 processor that is being announced today is based on an MIT project called Raw, which was funded by the U.S. National Science Foundation and the Defense Advanced Research Projects Agency, the research arm of the U.S. Department of Defense. The Raw project, which was initiated in 1996, produced a 16-core processor connected by a mesh of on-core switches in 2002; that chip was in essence the alpha version of the Tile64. One of the key components of the Raw project was the compiler technology that could harness the multicore architecture of the processor and the integrated switches that linked the cores together. Anant Agarwal, who worked on the first MIPS RISC processor at Stanford University in the 1980s and who had created a 32-node mesh-based cache coherent processor at MIT in 1994, had tackled many of these problems. The team that created the Tile64 processor includes techies who worked on Sun Microsystems' Sparcle and Digital Equipment's Alpha RISC processors, too, as well as networking systems from Cisco Systems and supercomputers from Hewlett-Packard and the long-since defunct Thinking Machines (also an MIT spinout).
Both DARPA and MIT have spent tens of millions of dollars investing in the research that has culminated in the Tile64 processor. According to Bob Doud, director of marketing at the company, Tilera has closed two rounds of venture capital funding to get the $40 million necessary to get a finished product, with a broad future roadmap, to market in 2007. Bessemer Venture Partners, Walden International, Columbia Capital, and VentureTech Alliance have all kicked in venture funding; the latter organization is the venture arm of Taiwan Semiconductor Manufacturing Company, Tilera's chosen foundry for the Tile64 chip.
The Tile64 chip is not based on any existing processor cores and their associated instruction sets, but rather is a new core developed from the ground up to take advantage of mesh networking on each core to create a large pool of compute resources that can be dedicated to running a single instance of Linux and its applications or carved up on the fly into virtual Linux images, each isolated from other virtualized slices. The Tile64 core is a 32-bit design, which uses RISC and VLIW concepts, rather than the 64-bit designs used on modern RISC and X64 processors. The core can do three instructions per clock cycle, and the chip's speed ranges from 600 MHz to 1 GHz. Each Tile64 core has 64 KB of L2 cache as well as L1 data and instruction caches that are 8 KB in size each. The switch that is at the heart of the Tile64 processor actually implements five different mesh networks--one each for memory access, streaming packet transfers, user data, cache misses, and interprocess communications. Wrapped around the cores are four DDR2 main memory controllers, two Gigabit Ethernet ports, two PCI Express controllers, two 10 Gb/sec XAUI interfaces, and two flexible I/O interfaces to support peripherals such as compact flash memory or disk drives. The whole shebang is implemented in a 90 nanometer process.
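To put those per-core figures in chip-level terms, here is a rough back-of-the-envelope sketch. It assumes the cache sizes quoted above are per core and that all 64 cores can issue three instructions every cycle; the function names are made up for illustration, and the results are theoretical ceilings, not numbers from Tilera.

```python
# Rough chip-level totals derived from the per-core figures in the article.
CORES = 64
ISSUE_WIDTH = 3            # instructions per clock per core, per the article
L2_KB_PER_CORE = 64        # assumption: the 64 KB L2 figure is per core
L1_KB_PER_CORE = 8 + 8     # 8 KB data cache plus 8 KB instruction cache


def peak_issue_rate(clock_hz: int) -> int:
    """Upper bound on instructions issued per second across the whole chip."""
    return CORES * ISSUE_WIDTH * clock_hz


def total_cache_kb(include_l1: bool = True) -> int:
    """Aggregate on-chip cache in KB, with or without the L1 caches."""
    per_core = L2_KB_PER_CORE + (L1_KB_PER_CORE if include_l1 else 0)
    return CORES * per_core


if __name__ == "__main__":
    print("Peak issue rate at 1 GHz:", peak_issue_rate(1_000_000_000))
    print("Peak issue rate at 600 MHz:", peak_issue_rate(600_000_000))
    print("Total L2 only (KB):", total_cache_kb(include_l1=False))
    print("Total L1 + L2 (KB):", total_cache_kb())
```

Under these assumptions the L2 caches alone add up to 4 MB, and L1 plus L2 together come to 5 MB. Real workloads will land well below the peak issue rate, since it ignores memory stalls and traffic on the mesh.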
There are a number of interesting architectural features of the Tile64 design. First, it does not use a bus architecture to talk to peripherals or to have processors and cache memory talk to each other. The mesh network allows point-to-point communication between the cores and does away with bus architectures, which require high clock speeds and lots of energy--even for quad-core designs. The mesh network, according to Doud, allows for near linear scaling as processor cores are added to a chip--something that X64 and RISC chips using a bus design cannot do. Tilera hopes that this ability to scale, coupled with the low power the Tile64 chip consumes when running and its homegrown Linux and development environment, will be enough to overcome the fact that this is a new instruction set. This is a pretty good bet for the embedded systems market, where the MIPS and PowerPC chips rule the roost.
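One way to picture why a mesh scales differently from a shared bus is to count links and hops. The sketch below is only an illustration: it assumes the 64 cores are laid out as an 8 x 8 grid and that traffic takes the shortest path along rows and columns, neither of which is spelled out in Tilera's announcement, and the function names are hypothetical.

```python
# Illustrative model of a square 2D mesh; the 8 x 8 layout and shortest-path
# (row-then-column) routing are assumptions, not details from Tilera.
SIDE = 8  # assumed grid dimension: 8 x 8 = 64 cores


def coords(core_id: int, side: int = SIDE) -> tuple:
    """Map a core number (0..side*side-1) to its (row, column) in the grid."""
    return divmod(core_id, side)


def hops(src: int, dst: int, side: int = SIDE) -> int:
    """Switch-to-switch hops on a shortest path between two cores."""
    (r1, c1), (r2, c2) = coords(src, side), coords(dst, side)
    return abs(r1 - r2) + abs(c1 - c2)


def link_count(side: int = SIDE) -> int:
    """Neighbor-to-neighbor links in a side x side mesh."""
    return 2 * side * (side - 1)


if __name__ == "__main__":
    print("Links in an 8 x 8 mesh:", link_count())
    print("Corner-to-corner hops:", hops(0, SIDE * SIDE - 1))
    print("Links in a 4 x 4 mesh:", link_count(4))
```

In this model every row or column added brings its own links, so many core pairs can talk at once, whereas a bus gives all cores one shared path. The price is that distant cores are several hops apart.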
"Our view is that the battle over instruction set architectures is over," explains Doud. "The processor core is the new transistor, and no one cares about ISAs unless they are coding in assembly language at this point." That is a paraphrase of a saying that Agarwal, who is now chief technology officer at Tilera, has drilled into everyone's heads. Agarwal also believes that by 2014, chip designers will cram 1,000 or more processor cores onto a chip.
Another interesting feature of the Tile64 chip is that the mesh network allows the L2 caches on each core to be used like a giant L3 cache in a traditional design. Basically, any core can look into the L2 cache of any other core on the chip and treat the pooled caches like a giant 5 MB L3 cache. While each core on the Tile64 chip can run its own complete instance of Linux, the cache coherency engendered in the mesh network means that a collection of cores can be set up to run an SMP variant of Linux, too. Because the mesh network controls all communication into and out of a core, a microcode feature called Multicore Hardwall Technology can partition a Tile64 into multiple virtual machines, allowing different instances of Linux and their applications to run on the chip and be isolated from each other.
The Tile64 chip supports a variant of the Linux 2.6 kernel and has a tweaked version of the open source GNU C compiler and the open source Eclipse integrated development environment. Tilera has spent a lot of time grafting on graphical tools, based on the GNU debugger, to help programmers and administrators cope with the 64-core environment, since it is a slightly different paradigm from X64 or current RISC processors from the major Unix vendors. However, Tilera contends that standard C applications just compile and run on the chip, and that the SMP extensions and a socket-like stream communication library hooked into the Linux variant built for the Tile64 take care of the rest. A C++ compiler is in the works, and it is hard to believe that a Fortran compiler is far behind.
Tilera says that the 64-core Tile64 chip can deliver around 5 billion operations per second at 1 GHz, and says the chip consumes about 170 milliwatts per core running at 600 MHz and 300 milliwatts per core running at 1 GHz. The Tile64 chip uses a power-conservation method called "clock gating" to put cores to sleep when they are not in use, conserving energy. Stacking the Tile64 up against an Intel "Woodcrest" dual-core Xeon 5160 processor running at 3 GHz, Doud says that the Tile64 chip delivers 10 times the performance, 10 times the performance per square inch, and 30 times the performance per watt of the Xeon chip. And compared to the digital signal processors commonly used by the handful in embedded devices where performance and low power consumption are an issue, Doud says the Tile64 can deliver 40 times the performance of a DM648 DSP from Texas Instruments, and that the chip delivers 10 times the performance per square inch, too.
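The per-core power figures are easier to compare with other chips once they are multiplied out. The sketch below does that arithmetic; it counts the cores only (not the memory controllers or I/O blocks) and makes the simplifying assumption that a clock-gated core draws nothing, which is optimistic. The function name is made up for illustration.

```python
# Core-only power estimate from the per-core figures quoted by Tilera.
CORES = 64
MW_PER_CORE = {600: 170, 1000: 300}  # clock in MHz -> milliwatts per core


def core_power_mw(clock_mhz: int, active_cores: int = CORES) -> int:
    """Total milliwatts drawn by the active cores at the given clock speed.

    Assumption: clock-gated (sleeping) cores are treated as drawing zero power.
    """
    if not 0 <= active_cores <= CORES:
        raise ValueError("active_cores must be between 0 and 64")
    return active_cores * MW_PER_CORE[clock_mhz]


if __name__ == "__main__":
    print("All cores at 600 MHz (mW):", core_power_mw(600))
    print("All cores at 1 GHz (mW):", core_power_mw(1000))
    print("16 cores awake at 1 GHz (mW):", core_power_mw(1000, active_cores=16))
```

With all 64 cores busy, that works out to roughly 10.9 watts at 600 MHz and 19.2 watts at 1 GHz for the cores alone.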
While Tilera is not yet ready to take on the server or workstation markets with its chips, it is targeting two markets that are ripe for large increases in performance. The first is networking gear, such as switches, routers, and security appliances, which are increasingly being programmed to do all kinds of work that was usually done on the edge of the network. For instance, it makes sense to check for malware and viruses at every point packets move through devices on the network, and this requires a lot of computing power to do at the line speeds of a Gigabit Ethernet or 10 Gigabit Ethernet network. 3Com is already taking parts from Tilera for some of its future gear. The other market that Tilera is initially targeting with its 64-core chip and Linux environment is digital multimedia systems, including makers of video conferencing systems, digital surveillance devices, and cable and broadcast systems.
When Doud adds up the total addressable markets where chips like the Tile64 could play--including where custom chips, DSPs, various multicore designs (often based on the MIPS architecture), and X64 and RISC multicore chips all play--he gets a target that is about $54 billion in size. That's a pretty big opportunity.
And to chase it, Tilera is working on a 120-core kicker, the Tile120, and will also offer a cut-down 36-core chip, the Tile36. The Tile36 is expected in the first half of 2008, while the Tile120 is due in late 2008 to early 2009. Running at 600 MHz, the Tile64 costs $435 in 10,000-unit quantities. Pricing was not announced for the faster versions of the chip. Tilera has also created an appliance that plugs into a PCI Express slot that includes a single Tile64 processor and 12 Gigabit Ethernet ports; this appliance is available now.