Yen-Kuang Chen
This tutorial covers algorithm design and algorithmic-level optimization for future many-core CPUs and GPUs. A multimedia application can be implemented with many different algorithms, so achieving the best performance on future CPUs and GPUs requires careful consideration of the interplay between processors and algorithms/applications. Beyond increases in clock frequency, the performance of personal computers has improved significantly because of the introduction of multiple cores (e.g., the latest Intel Core 2 Quad processors, IBM Cell processors, and the Nvidia GeForce 9800 GX2). Moving forward, we expect the number of processing cores in a single personal computer to keep increasing (e.g., the Nvidia GeForce 9800 GX2 already has 256 stream processors). One of the best ways to harness the computational capability of multi-core processors is to exploit the data-level and thread-level parallelism in applications. Because computation and memory performance are symbiotic, reaching the highest level of computational performance requires ensuring the best memory performance. Hence, we must design or choose algorithms that maximize thread-level parallelism, data-level parallelism, and cache locality.
Yen-Kuang Chen received his Ph.D. from Princeton University and is a Principal Engineer in the Corporate Technology Group, Intel Corporation. His research interests include developing innovative multimedia applications, studying performance bottlenecks in current computers, and designing next-generation microprocessors and platforms. In particular, he is currently analyzing emerging multimedia applications and providing input to the definition of next-generation many-core CPUs and GPUs. He is one of the key contributors to Supplemental Streaming SIMD Extensions 3 (SSSE3) in Intel® Core™ 2 Duo processors. He has 10+ US patents, 25+ pending patent applications, and 75+ technical publications. He is an associate editor of the Journal of VLSI Signal Processing Systems (including special issues on “System-on-a-Chip for Multimedia Systems”, “Design and Programming of Signal Processors for Multimedia Communication”, and “Multi-core Enabled Multimedia Applications & Architectures”), of IEEE Transactions on Circuits and Systems for Video Technology, and of IEEE Transactions on Circuits and Systems I. He has served on the program committees of 20+ international conferences and workshops on multimedia, video communication, image processing, VLSI circuits and systems, parallel processing, and software optimization. He was an invited participant in the 2002 Frontiers of Engineering Symposium (National Academy of Engineering) and the 2003 German-American Frontiers of Engineering Symposium (Alexander von Humboldt Foundation). He is an IEEE Senior Member and an ACM Senior Member.