Clear conference calendars: Register for Cell/B.E. apps workshop
Workshop coming in July at GA Tech: This two-day workshop (July 10-11, 2008; see the agenda) will cover everything from ray tracing to LANL's Roadrunner, and from applications on low-cost Cell/B.E. clusters to computer vision and digital imaging. It will address programmability issues such as languages and compilers, programming models and common runtimes, and ISV programmability frameworks and tooling. There is no charge to attend, but you must register by June 30, 2008. Please see disclaimer on use of "LANL Roadrunner" name.
Categories
: [ Cell | events ]
Jun 17 2008, 02:10:00 PM EDT
Permalink
|
Design on a dime: Beyond 45nm, is multithreading dead?
From DAC: Is multithreading really the best way to exploit multicore systems effectively?: That worrying question popped up at the recent 45th Design Automation Conference. It reflects the effort EDA vendors have been putting into adding multithreading capabilities to their tools to help with multicore design. The problem is that at the 45nm node, more designs climb over the 100 million-gate mark and break current IC CAD tools. Parallel processing has traditionally relied on threads, but the gains from threads start to bottom out at about four processors.
Read the detailed report to see what some of the best thinkers in the industry have to say about this question, including Gary Smith of Gary Smith EDA -- he thinks threads are dead: "It is a short-term solution to a long-term problem. Library- or model-based concurrency is the best midterm approach."
Categories
: [ general | news ]
Jun 17 2008, 12:34:00 PM EDT
Permalink
|
Nano, nano: Put tiny satellite in orbit, win a prize
You have until September 2011: The N-Prize ("Nanosatellite"/"Negligible Resources") is a competition to stimulate innovation around inexpensive access to space. To compete, you must launch a satellite weighing between 9.99 and 19.99 grams into Earth orbit and track it for a minimum of nine orbits. The launch must not cost more than US$2000 and must be done before 19:19:09 (GMT) on September 19, 2011. The prize is about US$19,000.
Categories
: [ general | news ]
Jun 17 2008, 12:32:00 PM EDT
Permalink
|
Oddments: Qentangled images captured for first time
Entanglement on film: Quantum-entangled images, in this case two random pictures physically separated but linked through their complementary features, have been captured in real time by researchers at the Joint Quantum Institute. They did it by using linked laser beams, originating from a single point, that produce twin images (a cat face, one inverted and the other backwards) at separate locations. For more on qentanglement, see "Storing nothing and doing it right!,"
"Photon encoding breaks record,"
"'It's your mother calling yesterday',"
"Qentanglement goes where no man ...,"
"Honey, get out the Qentanglement photo album," and
"Tangled up in that quantum net."
Saucy algorithm exploits symmetries to crack combinatorial problems: The torture level of the scourge of design automation math -- combinatorial problems like "what is the shortest route to send an Internet message around the world?" -- has been reduced by a new "saucy" algorithm. The Saucy algorithm's developers claim it can solve combinatorial problems by finding symmetries among large swaths of possibilities. (Symmetries are mathematically equivalent branches of a search: interchangeable options that lead to the same outcome, so they only need to be calculated once. If you ID all the symmetries in a set before you start comparing outcomes, you can eliminate lots of "duplicates.") They claim that in a test of the previously mentioned Internet message problem, it found an optimum path in under a second.
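To make the "calculate each symmetric branch only once" idea concrete, here is a toy sketch in C. It is not the Saucy algorithm, which per the item above discovers symmetries in large search spaces; here the symmetry (rotation of an n-bit pattern) is simply assumed to be known in advance, and the function names are made up for this illustration. Each candidate is mapped to a canonical representative, and only candidates that are their own representative get counted.

```c
/* Toy illustration only: rotation is a symmetry we assume up front. */

/* Rotates the low n bits of mask left by one position (1 <= n <= 31). */
unsigned rotate_left(unsigned mask, int n)
{
    unsigned top = (mask >> (n - 1)) & 1u;
    return ((mask << 1) | top) & ((1u << n) - 1u);
}

/* Smallest value among all n rotations of mask: the canonical representative. */
unsigned canonical_rotation(unsigned mask, int n)
{
    unsigned best = mask;
    unsigned cur = mask;
    for (int i = 1; i < n; i++) {
        cur = rotate_left(cur, n);
        if (cur < best)
            best = cur;
    }
    return best;
}

/* Counts the n-bit candidates left once rotational duplicates are skipped. */
unsigned count_distinct(int n)
{
    unsigned count = 0;
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        if (canonical_rotation(mask, n) == mask)
            count++;
    }
    return count;
}
```

For six-bit patterns, count_distinct(6) keeps 14 of the 64 candidates, so any expensive evaluation would run on fewer than a quarter of them.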
Categories
: [ general | news ]
Jun 17 2008, 12:31:00 PM EDT
Permalink
|
Product watch: Debugger and deskside work with Cell/B.E. systems
Deskside lets you work with Cell/B.E. code too: Terra Soft's quad-core 970 PowerStation (a four-way SMP system based on the PowerPC 970MP Processor and the CPC945 North Bridge Chip) is a deskside workstation/server that also may be used to prepare and optimize code for Cell/B.E. systems (in fact, Yellow Dog Linux includes the IBM SDK for Multicore Acceleration which is installed by default). You can even use the PowerStation to develop code for and manage clusters built on PS3s or the high performance IBM BladeCenter QS22 systems.
TotalView Debugger gets Cell/B.E. support: Blue Gene/P support too. The TotalView Technologies TotalView 8.5 source code debugger now lets users debug Cell Broadband Engine architecture applications (and delivers enhanced IBM Blue Gene/P support). It supports Linux systems using the IBM Cell/B.E. SDK (SDK 2.1 on FC6 and SDK 3.0 on Fedora 7/RHEL 5.1).
Categories
: [ Cell | news ]
Jun 17 2008, 12:29:00 PM EDT
Permalink
|
Programming with BLAS: SPE thread creation
A quick read on how the default SPE management routines can enable SPE thread creation, for the IBM SDK for Multicore Acceleration 3.0.
When a pre-built BLAS application binary (executable) is run with the BLAS library, the library internally manages the SPE resources available on the system using the default SPE management routines. The same is true of any other BLAS application that does not manage the SPEs itself and instead relies on the default SPE management provided by the BLAS library.
Example application
The sample application that invokes the BLAS-PPE library (from "Programming with BLAS: Using the PPE interface library") -- which invokes the scopy and sdot routines -- is an example of the default SPE management routines.
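#include <blas.h>
#include <malloc_align.h>   /* _malloc_align(), from the SDK's libmisc */
#include <free_align.h>     /* _free_align(), from the SDK's libmisc */

#define BUF_SIZE 32

/********************** MAIN ROUTINE **********************/
int main()
{
    int i, j;
    int entries_x, entries_y;
    float sa = 0.1;
    float *sx, *sy;
    int incx = 1, incy = 2;
    int n = BUF_SIZE;
    double result;

    /* Number of floats each strided vector needs */
    entries_x = n * incx;
    entries_y = n * incy;

    /* The second argument is log2 of the alignment: 2^7 = 128 bytes */
    sx = (float *) _malloc_align( entries_x * sizeof( float ), 7 );
    sy = (float *) _malloc_align( entries_y * sizeof( float ), 7 );

    /* Fill sx with ascending values and sy with descending values */
    for( i = 0 ; i < entries_x ; i++ )
        sx[i] = (float) (i);
    j = entries_y - 1;
    for( i = 0 ; i < entries_y ; i++, j-- )
        sy[i] = (float) (j);

    /* Both calls rely on the library's default SPE management */
    scopy_( &n, sx, &incx, sy, &incy );
    result = sdot_( &n, sx, &incx, sy, &incy );

    /* Release the aligned buffers */
    _free_align( sx );
    _free_align( sy );
    return 0;
}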
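If you do not have a Cell/B.E. system at hand, you can still trace the arithmetic of the sample in plain C. The sketch below is a portable stand-in, not the library's SPE implementation: ref_scopy, ref_sdot, and sample_dot are hypothetical names, and only positive strides are handled.

```c
#include <stddef.h>

#define BUF_SIZE 32

/* Hypothetical reference: copy n elements of x (stride incx) into y (stride incy). */
void ref_scopy(int n, const float *x, int incx, float *y, int incy)
{
    for (int i = 0; i < n; i++)
        y[(size_t)i * (size_t)incy] = x[(size_t)i * (size_t)incx];
}

/* Hypothetical reference: dot product of n strided elements of x and y. */
double ref_sdot(int n, const float *x, int incx, const float *y, int incy)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += (double)x[(size_t)i * (size_t)incx] * (double)y[(size_t)i * (size_t)incy];
    return sum;
}

/* Repeats the sample's setup (incx = 1, incy = 2) and returns the dot product. */
double sample_dot(void)
{
    float sx[BUF_SIZE];
    float sy[BUF_SIZE * 2];
    int n = BUF_SIZE, incx = 1, incy = 2;

    /* Same initial data as the sample: sx ascending, sy descending. */
    for (int i = 0; i < n * incx; i++)
        sx[i] = (float)i;
    for (int i = 0, j = n * incy - 1; i < n * incy; i++, j--)
        sy[i] = (float)j;

    ref_scopy(n, sx, incx, sy, incy);
    return ref_sdot(n, sx, incx, sy, incy);
}
```

Because the copy overwrites every second element of sy with 0, 1, ..., 31, the dot product is the sum of the squares 0*0 + 1*1 + ... + 31*31, which is 10416.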
Control with environment variables
For such applications, you can partially control the behavior of the BLAS library by using certain environment variables. There are many environment variables available to customize the launching of SPEs and memory allocation in the BLAS library, but for full control you can register and use your own SPE and memory callbacks. Here are the environment variables:
BLAS_NUMSPES: Specifies the number of SPEs to use. The default is eight (SPEs in a single node).
BLAS_USE_HUGEPAGE: Specifies if the library should use huge pages or heap for allocating new space for reorganizing input matrices in BLAS 3 routines. The default is to use huge pages. Set the variable to 0 to use heap instead.
BLAS_HUGE_PAGE_SIZE: Specifies the huge page size to use in KB. The default value is 16384KB (16MB). The huge page size on the system can be found in the file /proc/meminfo.
BLAS_HUGE_FILE: Specifies the name of the file to be used for allocating new space using huge pages in BLAS 3 routines. The default filename is /huge/blas_lib.bin.
BLAS_NUMA_NODE: Specifies the NUMA node on which SPEs are launched by default and memory is allocated by default. The default NUMA node is -1 which indicates no NUMA binding.
BLAS_SWAP_SIZE: Specifies the size of swap space in KB. The default is not to use swap space.
BLAS_SWAP_NUMA_NODE: Specifies the NUMA node on which swap space is allocated. The default NUMA node is -1 which indicates no NUMA binding.
BLAS_SWAP_HUGE_FILE: Specifies the name of the file that will be used to allocate swap space using huge pages. The default filename is /huge/blas_lib_swap.bin.
For more on environment variables, see "Programming with BLAS: Tuning the library for performance."
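When a run behaves unexpectedly, it can help to print which of these settings the process actually sees. The following is a minimal diagnostic sketch in C, under some loud assumptions: it only reads the environment and falls back to the defaults listed above, it never calls into the BLAS library, and it says nothing about how the library itself parses these values. The helper and struct names are hypothetical, and using 1 to stand for "huge pages enabled" is a choice made for this sketch.

```c
#include <stdlib.h>

/* Integer value of an environment variable, or fallback if unset or not a number. */
long env_long_or(const char *name, long fallback)
{
    const char *text = getenv(name);
    if (text == NULL || *text == '\0')
        return fallback;
    char *end = NULL;
    long value = strtol(text, &end, 10);
    return (*end == '\0') ? value : fallback;
}

/* String value of an environment variable, or fallback if unset or empty. */
const char *env_str_or(const char *name, const char *fallback)
{
    const char *text = getenv(name);
    return (text != NULL && *text != '\0') ? text : fallback;
}

/* Hypothetical summary of a few of the settings described in the post. */
struct blas_env_report {
    long num_spes;          /* BLAS_NUMSPES, default 8 */
    int use_huge_pages;     /* BLAS_USE_HUGEPAGE, 0 selects the heap */
    long huge_page_kb;      /* BLAS_HUGE_PAGE_SIZE, default 16384 KB */
    const char *huge_file;  /* BLAS_HUGE_FILE */
    long numa_node;         /* BLAS_NUMA_NODE, -1 means no NUMA binding */
};

/* Collects the values visible to this process, using the post's defaults. */
struct blas_env_report read_blas_env(void)
{
    struct blas_env_report report;
    report.num_spes = env_long_or("BLAS_NUMSPES", 8);
    report.use_huge_pages = env_long_or("BLAS_USE_HUGEPAGE", 1) != 0;
    report.huge_page_kb = env_long_or("BLAS_HUGE_PAGE_SIZE", 16384);
    report.huge_file = env_str_or("BLAS_HUGE_FILE", "/huge/blas_lib.bin");
    report.numa_node = env_long_or("BLAS_NUMA_NODE", -1);
    return report;
}
```

Call read_blas_env() early in main and print the fields with printf to confirm, for example, that a BLAS_NUMSPES value exported in the shell really reached the application.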
Taken from the Basic Linear Algebra Subprograms Programmer's Guide and API Reference. Download the SDK 3.0. Check out some reference guides in the Cell Resource Center SDK library.
Categories
: [ Cell | infobombs ]
Jun 17 2008, 12:20:00 PM EDT
Permalink
|
Faster than a speeding bullet: Breaking the pflops barrier
LANL Roadrunner earns its name: Seems LANL's Roadrunner is now poised to take its place as the fastest supercomputer in the world -- think a stack of 100K laptops about one-and-a-half miles tall. In the Roadrunner, two IBM QS22 blade servers and one IBM LS21 blade server are combined into a specialized tri-blade configuration (which can run at 400 gflops), for a total of 3,456 tri-blades. Standard processing like file system I/O is taken care of by the Opteron processors, while math-/CPU-intensive tasks go to the Cell/B.E. processors. (There are more interesting facts in the Roadrunner fact sheet.)
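A quick back-of-the-envelope check shows how those numbers add up to the headline. The sketch below simply multiplies the two figures quoted above, taking them at face value; the function and macro names are made up for the illustration.

```c
/* Figures quoted in the post, treated as given rather than verified. */
#define TRIBLADE_COUNT 3456
#define GFLOPS_PER_TRIBLADE 400.0

/* Aggregate peak in pflops (1 pflops = 1,000,000 gflops). */
double peak_pflops(int triblades, double gflops_each)
{
    return triblades * gflops_each / 1.0e6;
}
```

With the quoted figures, peak_pflops(TRIBLADE_COUNT, GFLOPS_PER_TRIBLADE) comes to roughly 1.38 pflops, which is the "breaking the pflops barrier" of the title.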
Even the New York Times is getting in on the story: "If all six billion people on earth used hand calculators and performed calculations 24 hours a day and seven days a week, it would take them 46 years to do what the Roadrunner can in one day." Other coverage includes:
You got the hardware; what about the software?: This EE Times article discusses efforts to enable all sorts of software to take advantage of the speed of multicore systems. Buddy Bland, project director for a major supercomputer center at Oak Ridge National Lab (which hopes to install its own pflops system this year), noted that "getting applications to scale is our biggest challenge" and goes on to add that "it turns out you get just as much advancement from better software and algorithms as you do from better hardware." Oak Ridge has been testing such parallel programming languages as IBM's X10, Cray's Chapel, and Sun's Fortress.
Bill Thigpen, chief of supercomputing engineering at the NASA Ames Research Center, has observed an increasing gap between the rate at which benchmark performance is rising and the increases in the ability to do actual work: "One of the challenges is being able to get the available work out of the theoretical performance peak." He goes on to note that scaling is a challenge: "Communications becomes a bigger part of your work. If you spend increasing time passing information between the processors, the processors are not doing as much work on the real issue."
The article goes on to illuminate why the important thing researchers learn from the LANL Roadrunner may not have to do with speed but with how the heterogeneous processors interact.
UPDATE 06/12/08: Panel on LANL's Roadrunner at ISC08: In a special panel session at the International Supercomputing Conference (June 17-20, Dresden; session on June 18) entitled "RoadRunner: The First Petaflop/s System in the World and its Impact on Supercomputing," two leaders of the drive to build Roadrunner -- Dr. Andrew White from Los Alamos and Dr. Don Grice of IBM -- will be joined by HPC experts to discuss the impact the system will have on the world of computing. Included are
- Lawrence Berkeley National Laboratory's Dr. Erich Strohmaier on "All #1 Systems in the TOP500 So Far."
- Drs. Grice and White on Roadrunner's hardware and software architecture and applications.
- University of Tennessee/Oak Ridge National Laboratory's Dr. Jack Dongarra on "Roadmap to Exaflop/s in the Year 2019."
- Reactions and comments on the achievement from the US, Europe, and Asia.
- An audience question period.
Other conference highlights include
- HPC and next-generation climate modeling.
- The past year in perspective.
- Harnessing the potential of multicore/manycore processors.
- Deciding whether HPC is going green.
- HPC challenges and opportunities in the era of the petaflops.
- And you get to grill leading HPC vendors!
Please see disclaimer on use of "LANL Roadrunner" name.
Categories
: [ Cell | events | news ]
Jun 10 2008, 01:08:00 PM EDT
Permalink
|
It came from the Lab: Processors that are waterfall cool
IBM water cools 3D chips: IBM Research Zurich has demonstrated 3D processor stacks that are cooled (at a rate of 180 W per layer) by water flowing through 50-micron channels between the chips. With 3D chip stacks, enough heat can get trapped between the layers to melt the cores, so on the back of each layer, etched into silicon oxide, is an aqueduct. (Eventual plans are for memory chips between processor cores to increase interconnections by a factor of 100 and reduce feature size by a factor of 10.) The commercial release target is 2013.
Categories
: [ general | news ]
Jun 10 2008, 01:02:00 PM EDT
Permalink
|
Trends and tradeoffs: Real community collaboration development
IBM AlphaWorks: From software theory to fact: This ZDnet UK profile is a real tribute to the fine work of IBM alphaWorks, that of providing a place for the developer community to preview and collaborate on emerging technology from IBM's research labs (and, of course, turn it into commercial products). To date, 40 percent of the technologies showcased on alphaWorks have migrated into IBM products; the site also provides more than 200 downloads for developers. Technologies that really take off for developers often get picked up then by developerWorks (which goes on to build a highly interactive center of theoretical and practical resource material, Q&A forums, code exchanges, blogs, podcasts, etc.). The author interviews alphaWorks senior software engineering manager Laura Bennett, who talks about the "next big things" in technology: Software-as-a-service, Web 2.0 and collaboration, Semantic Web, rapid application development, data visualization, and health care. Bennett also discusses some alphaWorks technologies that have had a significant impact on IBM, like the Cell Broadband Engine (blade servers, supercomputers, game consoles), the Unstructured Information Management SDK (informational semantics), and autonomic computing technologies (which have almost "disappeared" as standalone topics because of their high rate of integration into such IBM products as Tivoli, as well as into third-party products).
Categories
: [ Cell | general | news ]
Jun 10 2008, 01:00:00 PM EDT
Permalink
|
Conventional Wisdom alert: The next ubiquitous toolchain for embedded
Is it possible that open source Eclipse might win over proprietary?: EE Times contributing technical editor Richard A. Quinnell highlights a gradual shift that has been going on for years -- product announcements and initiatives that point to the Eclipse Framework establishing a dominant position as the number-one embedded tool chain. Many embedded devtool vendors like Mentor Graphics, QNX Systems, and Wind River are adapting their tools for use with Eclipse. And the Eclipse Foundation is stepping up its pursuit of device development via four initiatives to enhance its Device Software Development Platform: Real-time software components, Windows Embedded CE support, the Eclipse device-debugging project, and the target communications framework. The Eclipse Framework is a set of open-source components that can be combined to form a software development tool suite that includes basic editors, compilers, debuggers, and a user interface; it can be configured as an IDE for languages such as Java and C/C++ (and soon, Ada) by using the relevant components. (Release 1.0 of the debugging project will be available in the next Eclipse Framework Ganymede release -- see developerWorks Eclipse project resources to keep up to date on this.)
Categories
: [ general | news ]
Jun 10 2008, 12:58:00 PM EDT
Permalink
|
Nano, nano: Paper of steel
Nanocellulose makes iron look like a wimp: Regular paper is made from a crystalline polymer of glucose called cellulose; the process to make it generates quite long microfibers that are full of defects and can break apart when stressed. Swedish Royal Institute researchers have figured out how to keep the cellulose fibers small (about 1000 times smaller than those in regular paper) and relatively defect-free; they then coated them with carboxymethanol, which readily forms hydrogen bonds that help the fibers make tight contacts with one another. This new paper has a tensile strength about seven times greater than regular paper and over one-and-a-half times greater than cast iron. The researchers say that beyond the obvious uses, these fibers could replace carbon in reinforced plastics construction (cheaper and better), and that the material is easier and cheaper to dry, which makes it cheaper to produce.
Categories
: [ general | news ]
Jun 10 2008, 12:56:00 PM EDT
Permalink
|
Oddments: Ped power
You've heard of "voting with your feet": Will your footsteps now be used to generate electricity? Underfloor generators may be the next big thing in every public place you go. The pressure of your footfalls compresses pads under the flooring, driving fluid through tiny turbines which generate electricity which is then stored in batteries. Researchers have calculated that 34,000 striders an hour can power 6,500 lightbulbs. And generation is not limited to heel strikes -- any movement a structure makes can be converted into power. Recently, trains passing over a Midlands UK railway bridge generated more than enough electricity to power a flood detector. Any building or towering structure that sways in the wind is a candidate for making a little extra charge, too. And one more thing about the walking power plant -- it can do double duty. With the technology embedded in both a floor and the heel of a shoe, the walker can power room lights and recharge his own personal electronics.
Robots: Robofish keep track of each other without creator prompting. The Game of Life from a Duke researcher (but with robots).
Futuretech: Computerworld tracks the near-reality of e-paper, a long-awaited device.
Why time moves in a line from yesterday to tomorrow (and other juicy bits): Caltech researchers have a new model of our universe, one that may contain a signature of a time before the Big Bang and could explain why time moves in a straight, one-way line. In the new model, fluctuations in the cosmic background radiation (considered proof of the Big Bang and thought to be the seeds that galactic clusters grew from) might be evidence that our universe was "pinched off" an existing parent universe. The model also postulates that:
- Universes can spontaneously generate from empty space.
- Spontaneous generation of a universe is probably not a spectacular event. Co-author of the model Professor Sean Carroll says a "universe could form inside this [a] room and we’d never know."
- Originally, one-way time movement (known as the "arrow of time") was attributed to the second law of thermodynamics which insists that systems move over time from order to disorder. This model depends on the major assumption that the universe started life in an ordered state.
Categories
: [ general | news ]
Jun 10 2008, 12:54:00 PM EDT
Permalink
|
Programming with BLAS: Five maximum performance tips
A quick read on five tips to gain maximum library performance, for the IBM SDK for Multicore Acceleration 3.0.
These five tips will let you leverage maximum performance from the BLAS library.
128-byte aligned
Make the matrices and vectors 128-byte aligned: Memory access is more efficient when the data is 128-byte aligned.
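On a Cell/B.E. system, the SDK's _malloc_align(size, 7) from the SPE thread creation sample already returns 128-byte-aligned memory (7 is log2 of 128). As a portable illustration of what that alignment means, here is a small sketch in standard C that over-allocates with malloc and rounds the pointer up. The names is_aligned, aligned_buf, and alloc_aligned_floats are hypothetical and not part of the SDK.

```c
#include <stdint.h>
#include <stdlib.h>

/* Nonzero when p is a multiple of 2^log2_align bytes. */
int is_aligned(const void *p, unsigned log2_align)
{
    uintptr_t mask = ((uintptr_t)1 << log2_align) - 1;
    return ((uintptr_t)p & mask) == 0;
}

/* Keeps the raw pointer (for free) next to the aligned pointer (for use). */
struct aligned_buf {
    void *raw;
    float *data;
};

/* Over-allocates, then rounds the start address up to the next aligned one. */
struct aligned_buf alloc_aligned_floats(size_t count, unsigned log2_align)
{
    size_t align = (size_t)1 << log2_align;
    struct aligned_buf buf = { NULL, NULL };

    buf.raw = malloc(count * sizeof(float) + align - 1);
    if (buf.raw != NULL) {
        uintptr_t addr = ((uintptr_t)buf.raw + align - 1) & ~(uintptr_t)(align - 1);
        buf.data = (float *)addr;
    }
    return buf;
}
```

Use buf.data for the vector or matrix and pass buf.raw to free when you are done. A check such as is_aligned(buf.data, 7) is also a cheap sanity test for pointers that come from elsewhere.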
Huge pages
Use huge pages to store vectors and matrices. By default, the library uses this feature for memory allocation done within the library.
NUMA binding
Use NUMA binding for the application and the library. Set the BLAS_NUMA_NODE environment variable (a quick look is in Tuning the library for performance) to enable this feature for the library. BLAS_NUMA_NODE can be set to 0 or 1 for a dual-node system. An application can enable NUMA binding either using the command-line NUMA policy tool numactl or NUMA-policy API libnuma provided on Linux.
Swap space
Use the swap space feature (quickly described in Tuning the library for performance) for matrices smaller than 1KB with appropriate NUMA binding.
Start with the right numbers
The library gives better performance when working on large vectors and matrices. Performance of optimized routines is better when the stride value is 1. Level 3 routines show good performance when the numbers of rows and columns are multiples of 64 for single precision (SP) and 32 for double precision (DP).
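One way to act on the last point is to round matrix dimensions up to the next multiple before allocating. The helper below is a minimal sketch with hypothetical names; the constants are the multiples quoted above, and whether padding actually pays off for a given problem is something to measure rather than assume.

```c
#include <stddef.h>

/* Multiples quoted in the tip for Level 3 routines. */
enum { SP_MULTIPLE = 64, DP_MULTIPLE = 32 };

/* Rounds value up to the nearest multiple (multiple must be nonzero). */
size_t round_up_to_multiple(size_t value, size_t multiple)
{
    size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}
```

For example, a 1000 x 1000 single-precision matrix would be padded to 1024 x 1024, while a dimension that is already a multiple, such as 128, is left unchanged.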
Taken from the Basic Linear Algebra Subprograms Programmer's Guide and API Reference. Download the SDK 3.0. Check out some reference guides in the Cell Resource Center SDK library.
Categories
: [ Cell | infobombs ]
Jun 06 2008, 06:31:00 PM EDT
Permalink
|
It came from the Lab: Grand Theft Auto and life sciences research
How does GTA4 further R&D: Follow this logic. To say Grand Theft Auto IV is extremely popular is like saying the sun will definitely rise in the east in the morning. What makes it popular (this is for people who have never played it)? The graphics and animation are mindblowing. (In fact, on the US TV show Saturday Night Live, two actors play two of the mobsterlike characters and it is a tough task to tell them apart from their synthetic versions. Maybe the movement of the animated ones is smoother and more realistic.) Reception of the game has demonstrated to both the gaming industry and the semiconductor manufacturing industry that there is a tremendous market for the hardware and software to bring "real" to the virtual world. Once this level of processing power is in place, it can then be easily used for life (or other) sciences research:
Categories
: [ Cell | news ]
Jun 03 2008, 12:52:00 PM EDT
Permalink