Presentations—Gelato ICE | Singapore | October 2006
Monday, October 2
Welcome
Mark K. Smith, Gelato Central Operations
Welcome, introduction, and overview of Gelato Federation activities.
Presentation (pdf, 1.5 MB)
Keynote—Gelato: A Call to Arms
Steve Geary, HP
This presentation is intended to be a call to action for the world's largest identifiable Itanium community to accelerate the acceptance and adoption of Itanium as a mainstream architecture in the market and in the community.
Presentation (pdf, 668 KB)
An Update on LSB and Open Standards
Mats D. Wichmann, Intel
For some reason, there is a perception that Open Source Software (OSS) developers are hostile to standards. Nothing could be further from the truth. In fact, OSS is probably one of the heaviest users of standards of all kinds, as long as the standards make sense and are not overly prescriptive or encumbered with licenses and fees that make them impossible to use for open source. Open Standards are an excellent complement to open source, and help ensure freedom and relevancy, thus protecting investment in a particular platform. This talk will review some of the important Open Standards efforts as well as challenges such as digital rights management (DRM). An update on the progress of the Linux Standard Base effort over the last two years will be included.
Presentation (pdf, 1 MB)
Basic Intel Itanium Architecture
Cameron McNairy, Intel
The Itanium architecture and the paradigm of explicit parallel instruction computing (EPIC) are often poorly understood. This presentation will cover important aspects of the EPIC paradigm, including software pipelining, the register stack engine, predication, parallel instruction groups, data and control speculation, and many other mysteries of the Itanium application and system architectures.
No presentation available currently
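As a rough illustration of the predication idea named in the abstract above, the following C sketch contrasts a branching routine with an "if-converted" one in which both arms are computed and a predicate selects the result. This is only a portable analogy: it is not Itanium code and not material from the talk, and the function names are hypothetical.

```c
#include <stdint.h>

/* Branching form: the compiler may emit a conditional jump here. */
int32_t abs_diff_branch(int32_t a, int32_t b)
{
    if (a > b)
        return a - b;
    else
        return b - a;
}

/* If-converted form: both arms are evaluated and a predicate decides
 * which result is kept. Predicated hardware does this per instruction;
 * here it is only imitated with integer arithmetic. */
int32_t abs_diff_predicated(int32_t a, int32_t b)
{
    int32_t p = (a > b);        /* predicate: 1 if true, 0 if false */
    int32_t then_val = a - b;   /* result kept when p is 1 */
    int32_t else_val = b - a;   /* result kept when p is 0 */
    return p * then_val + (1 - p) * else_val;
}
```

Both functions return the same value for any inputs whose difference fits in 32 bits; the second simply has no control-flow branch in its source form.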
Grid Computing at CERN: An Update on Preparations for First Beam in 2007
Lawrence Pinsky, University of Houston
As an update to the situation as described in the talk at Gelato ICE April 2006, more experience has been obtained, including the use of Itaniums within the grid infrastructure in the most recent and intensive data challenges conducted in preparation for the final run-up to turn-on of the LHC machine at CERN. The LHC project at CERN will receive first beam in mid-2007 and the major experiments there will all be depending on grid-based distributed computing to provide the necessary data analysis. ALICE, the dedicated relativistic heavy ion experiment at the LHC, will be dependent on the most eclectic of grid infrastructures and the widest variety of platforms within that infrastructure. While Itaniums are scattered throughout the various experiments for a number of tasks, ALICE is the only one of the experiments that will employ a significant number of Itanium nodes as part of the main grid distributed environment. The physics of ALICE is among the most challenging of the LHC experiments from a computing standpoint and that physics will be reviewed along with an update of the performance of the Itaniums and the grid middleware that is reaching operational form.
Presentation (pdf, 7 MB)
High-Performance Storage Solutions on IA-64 Linux
Mike Gigante, SGI
In this talk I will discuss recent improvements in IO performance and scalability for both local filesystems (HPC) and for NFS. The local filesystems work has focused on improving XFS performance for buffered and direct IO in support of SSI machines such as the SGI Altix. I will summarize this effort and present performance data and compare our results to other Linux filesystems. The NFS work has concentrated on scaling Linux NFS on the SGI Altix. Over the past 12 months, very significant improvements in aggregate streaming bandwidth and in general NFS workloads have been made.
We have also made significant improvements in the ability to support very large numbers of NFS clients. I will present a summary of current performance and contrast this to our position 12 months ago.
No presentation available currently
A Security Monitoring System for Grid Computing
Shingo Takeda, Osaka University
In a computational grid, it is difficult to monitor security, because hundreds of resources are shared by a large number of users. In this talk, we will introduce our security monitoring system which helps both VO and local administrators to find possible abuse easily and quickly.
Presentation (pdf, 26 MB)
Compiler Design Criteria for Modulo Scheduled Itanium Codes
Clemens C. Roothaan, Gelato Honorary Member
In large scale scientific and engineering computer applications, one frequently encounters critically important tasks where modulo scheduled execution can achieve unprecedented efficiency. The Itanium was designed specifically to permit the construction of such codes. The vector math library (VML) released by HP in the public domain, and similar codes contributed by Intel, are the first practical implementations in this domain. On the whole, a successful strategy for the construction of modulo scheduled codes requires some new tools beyond what traditional compilers provide. Drawing from our experience with VML, we will present our analysis and discuss our plans for a compiler for modulo scheduled codes.
No presentation available currently
Technical and Scientific Computing Performance: Today and Tomorrow
Kenneth Tan, OptimaNumerics
Itanium processors are commonly used for high-performance technical and scientific computing. This talk will give an overview of the current status in terms of performance attained by a selection of technical/scientific code running on supercomputers. We will look at the impediments to attaining high performance. Bearing in mind the trends of what lies in the future, we will then explore why we can't just rely on improvements in chips to get better performance. This talk will look at some solutions to the performance issue (solutions deliverable today and those still in development).
No presentation available currently
64-Bit Migration to Linux on Itanium: Challenges, Advantages, and Tools
The objective of this talk is to bring forth the benefits and challenges involved in the 64-bit migration path, in general as well as specific to Linux on Itanium. This presentation will provide insights into the tools and techniques available for aiding 64-bit migration, while also sharing some tips with the programming community on how to avoid common pitfalls.
Presentation (pdf, 236 KB)
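One common pitfall of the kind the abstract above alludes to is storing a pointer or other 64-bit quantity in a 32-bit integer. The C sketch below is not taken from the presentation; the function names are hypothetical, and it assumes an LP64 data model (4-byte int, 8-byte long and pointers), which is what 64-bit Linux uses.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if storing `value` in a 32-bit integer would lose bits. */
int loses_bits_in_32(uint64_t value)
{
    uint32_t narrowed = (uint32_t)value;   /* keeps only the low 32 bits */
    return (uint64_t)narrowed != value;
}

/* A pointer carried in an integer should use uintptr_t, not int. */
int pointer_roundtrips(const void *p)
{
    uintptr_t as_int = (uintptr_t)p;       /* wide enough for any object pointer */
    return (const void *)as_int == p;
}

/* Prints the sizes that differ between 32-bit and LP64 builds. */
void print_type_sizes(void)
{
    printf("int=%zu long=%zu void*=%zu\n",
           sizeof(int), sizeof(long), sizeof(void *));
}
```

Code that assumed `sizeof(int) == sizeof(void *)` compiles on both models but silently truncates on the 64-bit one, which is why fixed-width and pointer-sized types are usually preferred during a port.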
Itanium Projects: From R&D to Industry
Jon Lau, National Grid Office
While continuing its efforts of promoting grid usage in R&D, NGO has extended its efforts to encourage industry to leverage grid computing. In this talk, NGO will describe some of the projects that it has carried out using Itanium as the compute resources. Some of these projects include digital media rendering tools and electronic design automation tools.
Presentation (pdf, 2.6 MB)
Tuesday, October 3
Keynote—What Intel Itanium Architecture and Processors Can Do for You
Cameron McNairy, Intel
This presentation will introduce the system and application architecture and the processor that implements that architecture. An exploration of systems and software built around Itanium processors will show what people can and will be doing with Itaniums in the future.
Presentation (pdf, 8 MB)
Host Presentation—National Grid Office in Singapore
Hing Yan Lee, National Grid Office
The National Grid Office (NGO) has the mission to facilitate the seamless use of an integrated cyber-infrastructure in a secure, effective, and efficient manner to advance scientific, engineering, and biomedical R&D, with the longer term goal of transforming the Singapore economy using grid computing. NGO's activities include: formulating the framework and policies; planning and developing a secure platform; adopting common open standards; encouraging the adoption of grid computing; demonstrating the commercial viability of compute-resource-on-tap; and laying the foundation for a vibrant grid computing economy. In this presentation, we will share several new initiatives in grid computing.
Presentation (pdf, 5.5 MB)
Host Presentation—HPC and Grid R&D Activities at IHPC
Terence Hung, Institute for High Performance Computing
The speaker will give a brief introduction to the Institute of High Performance Computing and its computational science and engineering research. The talk will articulate the various HPC and grid related R&D activities at IHPC and how they form part of the institute's overall strategy to power scientific discovery.
Presentation (pdf, 20 MB)
SynaBASE: Next-Generation Bioinformatics Database Platform
Arif Anwar, Synamatix
Performance advantages of the SynaBASE structured-network database platform for advancing next-generation genomics research will be presented. SynaBASE is optimized on HP Integrity rx4640 Itanium servers and has been demonstrated to be effective in breaking through critical bottlenecks in Life Sciences research (i.e., the management and analysis of large, complex biological data). The HP Integrity system provides the best SMP (symmetric multiprocessing) solution for a memory-intensive bioinformatics platform, offering greater memory scalability than other comparable architectures (up to 96 Gbytes) and advanced processor support. This, coupled with native Linux kernel support for the IA-64 instruction set, has rendered Linux the most conducive operating system for SynaBASE development in C++ and SynaSuite (the web-based suite of tools for SynaBASE) development in Java. Downstream performance benefits include 1000 to 10,000+ fold increases in analysis speed and enhanced sensitivity and specificity of data results.
Presentation (pdf, 2.3 MB)
Optimizing Software for Intel Itanium Architecture with Intel Compilers
Eric W. Moore, Intel
This presentation will cover required compiler knowledge for accelerating software on Intel Itanium architecture.
Presentation (pdf, 134 KB)
The GPT and Superpages
Peter Chubb, University of New South Wales
This talk will present a status report on a number of Itanium kernel related projects being researched at Gelato@UNSW. An update on strategies for using large pages (superpages) to increase TLB coverage will be presented. A progress report on our page table abstraction layer will also be covered, along with an update on work to port a guarded page table (GPT) behind the API.
Presentation (pdf, 136 KB)
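The motivation for superpages in the abstract above comes down to simple arithmetic: TLB coverage is the number of TLB entries multiplied by the page size each entry maps. The C sketch below makes that explicit. The function name and the figures in the usage note are illustrative assumptions, not values from the talk.

```c
#include <stdint.h>

/* Bytes of address space reachable without a TLB miss, assuming every
 * entry maps a page of the same size. */
uint64_t tlb_coverage_bytes(uint64_t entries, uint64_t page_size_bytes)
{
    return entries * page_size_bytes;
}
```

For example, a hypothetical 128-entry TLB covers 2 MB with 16 KB pages but 2 GB with 16 MB superpages, which is why larger pages can sharply reduce TLB misses for memory-intensive workloads.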
Easily Locating Optimization Opportunities with the Intel VTune Performance Analyzer
Eric W. Moore, Intel
An introduction to software performance analysis with the Intel VTune Performance Analyzer will be presented.
Presentation (pdf, 610 KB)
An Inside Look at Scaling Linux to 1024 Processors
Steve Neuner, SGI
Standard Linux distributions are now supporting 1024-processor systems, realtime, and other high-performance OS features without using a customized or special kernel. This session will look at recent 2.6 Linux kernel improvements that have made this possible and describe how these features flow from the Linux community into a standard, fully supported operating system for customers running High Performance Computing (HPC) systems with Intel Itanium 2 processors.
Presentation (pdf, 590 KB)
ClustalW Optimization: Adaptive Scheduling
Shin Yee Chung, Institute for High Performance Computing
ClustalW is a general-purpose multiple alignment program that aligns multiple DNA and protein sequences. A parallel version of ClustalW, called ClustalW-MPI, was developed by the Bioinformatics Institute (BII). However, ClustalW-MPI is not able to fully utilize the given processors because it mixes code regions with high parallelism and regions with limited parallelism. Hence, a static number of processors is not optimal: processor cycles are wasted, as the idling processors cannot be used by other jobs due to the exclusive-use policy. We propose a simple adaptive task scheduler that monitors its own performance at runtime and readjusts the number of processors in use. Multiple jobs will then be able to share a single pool of processors, increasing system throughput.
Presentation (pdf, 364 KB)
Add-on (pdf, 45 KB)
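To make the idea of readjusting the processor count at runtime concrete, here is a minimal C sketch of one possible policy. It is not the authors' algorithm: the function name, the 0.5 and 0.8 efficiency thresholds, and the halve/double rule are all hypothetical choices made for illustration.

```c
#include <assert.h>

/* Hypothetical policy: given the measured parallel efficiency
 * (speedup divided by processors in use, between 0 and 1), shrink the
 * allocation when processors are mostly idle and grow it when they
 * are well used. The result is clamped to [min_procs, max_procs]. */
int next_proc_count(int current, double efficiency,
                    int min_procs, int max_procs)
{
    assert(min_procs >= 1 && min_procs <= max_procs);

    int next = current;
    if (efficiency < 0.5)
        next = current / 2;     /* low-parallelism region: release processors */
    else if (efficiency > 0.8)
        next = current * 2;     /* high-parallelism region: request more */

    if (next < min_procs)
        next = min_procs;
    if (next > max_procs)
        next = max_procs;
    return next;
}
```

A scheduler built along these lines would call such a function between program phases and return the released processors to the shared pool for other jobs.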
Performance Monitoring on Itanium: the PMU Counter Advantage
IPF is based on the EPIC paradigm, which exposes the parallelism of the machine to software. On IPF, application performance depends heavily on the parallelism extracted and the optimizations done by the compiler and linker. It is extremely important to have a good performance monitoring tool to help developers and compilers fine-tune application performance. The IPF hardware recognizes the need for performance monitoring and has a built-in hardware performance monitoring unit (PMU). The PMU is a piece of CPU hardware that collects a wide range of micro-architectural events and supports event counting as well as accurate sampling. Several performance monitoring tools utilize the PMU to get the performance profiles of applications. HP Caliper is a general-purpose performance analysis tool for applications running on IPF HP-UX and Linux. HP Caliper allows one to understand the performance of the application and to identify ways to improve it.
Presentation (pdf, 963 KB)
Compiling the Linux Kernel with the Intel Compiler
Feilong Huang, Intel
In general, compiling the Linux kernel with a compiler other than the GNU compiler requires some effort. Linux kernels require a compiler with very high compatibility with the GNU compiler, including command line compatibility and source compatibility, and even binary compatibility.
In this session, Feilong will introduce how to compile the 2.6.9 Linux kernel with the Intel C++ Compiler (ICC) and will talk about some interesting issues he solved while compiling the Linux kernel with ICC. He will also mention debugging and kernel performance benchmarks.
Presentation not available to the public
S7 Case Study: Porting 2 Million Lines of C++ Code to HP-UX Itanium
Pankaj Kulkarni, S7 Software Solutions
Wind/U is a complete implementation of Windows OS on UNIX & Linux. It supports the Win32 API, common controls and dialogs, MFC, COM, OLE, ATL, and third party libraries such as Stingray. Since Wind/U is designed to be source code level compatible, it offers the true Windows functionality without any emulation. You can make your Windows applications fully interoperable with UNIX with little or no change to your original source code.
This presentation is a case study of porting approximately 2 million lines of code to HP-UX on Itanium. It will cover the porting issues encountered and their resolutions along with valuable information gathered during the 12 man-month porting experience. The talk will also point out the benefits reaped by porting to Itanium.
Presentation (pdf, 642 KB)
Update on the Perfmon2 Interface
Stéphane Eranian, HP
In this presentation, we will report on the latest developments of the perfmon2 interface and our efforts to get it finally integrated into the mainline Linux kernel. We will also give an update on the current state of the user level tools (pfmon/libpfm/Caliper) on Montecito and other processors.
Presentation (pdf, 590 KB)
HP/OSLO Linux Scalability Tracking and Investigations
Lee Schermerhorn, HP
The HP/OSLO Scalability and Performance project has been tracking the scalability and performance of Linux kernels on HP's IA-64 platforms for the past couple of years. The project team investigates regressions, bottlenecks, and anomalies and, where applicable, generates patches for submittal to distros and upstream. This presentation will provide an update on the scalability of Linux up through recent kernels, and the results of some of the IA-64 specific investigations we have undertaken.
Presentation (pdf, 2.2 MB)
Experiences with Itanium Clusters in Grids
Lennart Johnsson, University of Houston
Our research is focused on methodologies, tools, and scientific libraries for high performance in scientific and engineering applications. In the last decade, the context of this research has been grids, specifically the GrADS, VGrADS, LCG, and TIGRE efforts in which our Itanium clusters have been and are used. We will report on our experiences with software development, tools, and performance achieved on our clusters and an Itanium-based SMP.
Presentation (pdf, 7.5 MB)
Hyper-Threading on Dual-Core Intel Itanium 2 Processors
Cameron McNairy, Intel
Hyper-threading on the dual-core Intel Itanium 2 processor provides what appears as two logical processors for each core. As a result, the benefits of hyper-threading are available to an application automatically. However, there are things that the application and operating system can do to optimize for hyper-threading. This presentation introduces hyper-threading and then transitions into what software can and should do to best realize performance.
Presentation (pdf, 424 MB)
Wednesday, October 4
Multi-Core Programming Seminar and Lab: Parallel Programming Concepts
Feilong Huang, Intel
This talk will present an introduction to parallel computing concepts.
Presentation (pdf, 2.3 MB)
Particle Simulation for Subcellular Dynamics and Localization of Biological Molecules
Akihiko Konagaya, RIKEN
Ryuzo Azuma, RIKEN
A particle model designed for three-dimensional simulation of the reaction diffusion dynamics within cells is proposed. Chemical reactions proceed through the interactions of particles in space, with activation energies determining the rates of these chemical reactions at each interaction. The energy-based model can include the cellular membrane, membranes of other organelles, and cytoskeleton. The simulation algorithm is tested for a reversible enzyme reaction model. Snapshot images taken from the simulation of molecular interactions on the cell surface reveal clustering domains (size ~0.2 µm) associated with rafts. Sample trajectories of raft constructs indicate hop diffusion. Corralled diffusion of membrane proteins is also observed on these domains. These findings demonstrate that our approach is promising for modeling the localization properties of biological phenomena.
Presentation (pdf, 2.7 MB)
Multi-Core Programming Seminar and Lab: OpenMP
Feilong Huang, Intel
OpenMP is a shared-memory parallel programming standard. It consists of a set of compiler directives, library routines, and environment variables that can be used in Fortran and C/C++ programs. It is portable, scalable, and easy to use. In this presentation, you will learn what OpenMP is, what it can do for you, and how to parallelize your code with OpenMP.
Presentation (pdf, 787 KB)
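As a small illustration of the compiler directives mentioned in the abstract above, the following C sketch parallelizes a summation loop with a single OpenMP pragma. It is a generic example, not material from the seminar, and the function name is hypothetical.

```c
#include <assert.h>

/* Sum of the squares 1..n. The directive asks the compiler to split the
 * loop iterations across threads; reduction(+:total) gives each thread a
 * private partial sum and combines them when the loop ends. */
long sum_of_squares(int n)
{
    assert(n >= 0);

    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++)
        total += (long)i * i;
    return total;
}
```

With GCC, OpenMP is enabled by the `-fopenmp` option. Without it the pragma is ignored and the loop runs serially with the same result, which is part of what makes incremental adoption practical.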
Xen and Intel Virtualization Technology for IA-64
Yaozu Dong, Intel
Xen is an open-source virtual machine monitor, originally developed for the x86 architecture, that has been extended to the Itanium Processor Family (IPF) to run "paravirtualized" operating systems on top of the monitor (Xen/IPF). These guest operating systems are limited to those for which the source code is available for modification. Intel has extended the Xen/IPF hypervisor to use the Intel Virtualization Technology for Itanium (VT-i) to support unmodified guest OSes.
This talk will present the architecture, design, implementation, and challenges involved in enabling full virtualization for Xen through VT-i. We will cover processor virtualization, memory virtualization, and extensions to the device model for architecture-specific issues, such as instruction cache and data cache coherency, which is specific to IPF.
Presentation (pdf, 461 KB)
Update on the Osprey Project, the Alternative GCC Backend for Itanium
Shin-Ming Liu, HP
The Itanium processor is moving towards a multi-core, multi-threaded design while most existing applications are written for single CPU systems. To extract the best performance out of the Linux/Itanium-based platform, a higher performance compiler tuned for Itanium is essential. GCC, the compiler of choice for all Linux developers, is currently undergoing a major enhancement to deliver this performance. To meet Linux/Itanium performance needs in the immediate future, the Osprey Project is a very viable solution. The Osprey Project weds the need for high-performance applications with GCC.
The Osprey Project gathers contributions from multiple universities. It is based on the Open64 compiler with enhancements made by PathScale and ORC. The compiler is available for Internet download in both source and binary form from . Beta testing has begun for the first release of the Osprey compiler. The official Open64 website also supports compiler research and development contributions by research institutes and the open-source community.
Presentation (pdf, 239 KB)
Multi-Core Programming Seminar and Lab: Intel Thread Checker
Rama Kishan V Malladi, Intel
Intel Thread Checker reduces time-to-market for threaded applications by speeding the development process. This tool helps find non-thread-safe functions and code sections where two threads might modify the same memory location without proper synchronization.
Presentation (pdf, 1.2 MB)
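The class of defect described in the abstract above, two threads modifying the same memory location without synchronization, can be sketched in a few lines of C with OpenMP. This example is not from the seminar and the function names are hypothetical; it only shows the kind of code a thread-checking tool is meant to flag, next to one possible fix.

```c
/* Racy version: every thread performs an unsynchronized
 * read-modify-write on the shared variable `count`. */
long count_racy(long n)
{
    long count = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        count++;                /* data race when more than one thread runs */
    return count;               /* may be less than n */
}

/* Fixed version: the atomic directive makes each increment indivisible. */
long count_safe(long n)
{
    long count = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        count++;
    }
    return count;               /* always n */
}
```

The racy version often produces the right answer on small inputs, so the bug can survive ordinary testing; that is the argument for tool support. In practice a reduction clause would usually be faster than an atomic update for this particular loop.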
Evaluating Xen IA-64 Security and Performance
César De Rose, Pontifical Catholic University of Rio Grande do Sul
Security and performance are among the main concerns related to virtualization. It is important to evaluate the impact of introducing the additional computation layers necessary to provide virtual resources to applications. Providing security assurance and consistent performance measures of such a complex system is a challenging task, due to the several variables that need to be controlled.
In this talk, we will describe some security testing techniques and a performance testing strategy for the IA-64 port of the Xen virtual machine monitor, and then will present some preliminary results.
Presentation (pdf, 1.6 MB)
The ISP RAS Activities for Improving GCC for Itanium
Arutyun I. Avetisyan, Institute for System Programming, Russian Academy of Sciences
This talk will present an overview of the ISP RAS activity on improving GCC for Intel Itanium processors. The presentation will cover the history of projects with HP on improving GCC instruction scheduling and will present current results of an ongoing project with HP on implementing a new VLIW-targeted instruction scheduler. Future plans of the ISP RAS, including preparation of the current patches for inclusion in GCC, will also be discussed.
Presentation (pdf, 166 KB)
Virtualization and User-Level Drivers
Peter Chubb, University of New South Wales
UNSW has been working for some time both on user-level drivers and a framework for using Linux as a hypervisor. These separate projects have been reported on in previous Gelato conferences.
Given that we have a way to run a complete Linux image in userspace on top of Linux, it would seem desirable to allow access to a subset of the PCI bus from a guest operating system. This would allow:
- A virtual machine to be given exclusive access to devices on the PCI bus.
- Easier device driver development
Myrto Zehnder, a Master's student from ETH Zurich, has been working with Gelato@UNSW for the past six months on allowing controlled access to the device-driver framework in the guest. This talk will present the mechanisms we have developed, and the pitfalls we have encountered in implementing the system.
Presentation (pdf, 139 KB)
Storage Layout Optimizations to Improve Parallel Distributed Filesystem Performance
Doug Johnson, Ohio Supercomputer Center
Every parallel filesystem is built on a combination of many instances of local filesystems. Their performance directly impacts the overall performance of the distributed system. We plan to study in detail the behavior of various operations in a parallel filesystem, such as bulk storage and metadata activity while varying features of the underlying local filesystem, including layout and journaling. In this talk, we will present intermediate results from these investigations.
Presentation (pdf, 139 KB)
Practical Experience with Performance Monitors on Xeon and Itanium
Ryszard Erazm Jurga, CERN
This presentation will provide insights into the performance monitors on Xeon and Itanium and their practical applications. It will share our experience with the different tools available for those processors, as well as our own attempt at implementing such tools. It will also introduce some results from our measurements on both CPUs.
Presentation (pdf, 2.2 MB)
Multi-Core Programming Seminar and Lab: Intel Thread Profiler
Rama Kishan V Malladi, Intel
Intel Thread Profiler helps developers understand the structure of threaded applications and maximize their performance. This tool helps analyze bottlenecks in threaded code and pinpoint problem areas and the reasons for slowdowns in application performance.
Presentation (pdf, 1 MB)