SPRINT: A Tool to Generate Concurrent Transaction-Level Models from Sequential Code
Research Article

Johan Cockx, Kristof Denolf, Bart Vanhoof, and Richard Stahl

Interuniversity Micro Electronics Center (IMEC vzw), Kapeldreef 75, 3001 Leuven, Belgium

Received 1 September 2006; Accepted 23 February 2007

Recommended by Erwin de Kock

A high-level concurrent model such as a SystemC transaction-level model can provide early feedback during the exploration of implementation alternatives for state-of-the-art signal processing applications like video codecs on a multiprocessor platform. However, the creation of such a model starting from sequential code is a time-consuming and error-prone task. It is typically done only once, if at all, for a given design. This lack of exploration of the design space often leads to a suboptimal implementation. To support our systematic C-based design flow, we have developed a tool to generate a concurrent SystemC transaction-level model for user-selected task boundaries. Using this tool, different parallelization alternatives have been evaluated during the design of an MPEG-4 simple profile encoder and an embedded zero-tree coder. Generation plus evaluation of an alternative was possible in less than six minutes. This is fast enough to allow extensive exploration of the design space.

Copyright © 2007 Johan Cockx et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
State-of-the-art applications such as multimedia codecs must deliver high computational power with minimal energy consumption. Given these conflicting constraints, multiprocessor implementations not only deliver the necessary computational power, but also provide the required power efficiency. Multiple processors operating at a lower clock frequency can provide the same performance as a single processor at a higher clock frequency, but with lower energy consumption [1, 2]. A multiprocessor implementation can be further optimized by selecting a specialized processor for each task, providing a better power-performance trade-off than a single general-purpose processor.

An efficient implementation of these applications on such a platform raises two key challenges. First, parallel tasks must be identified and extracted from the sequential reference specification. The extracted tasks must match the architecture resources closely: any significant mismatch results in performance loss, lower resource utilization, and reduced energy efficiency of the implementation. Second, the memory and the bus/communication network on the platform consume a major part of the energy [3–6], so optimizations that reduce this power dissipation are crucial.
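The claim that several slower processors can match one fast processor at lower energy can be illustrated with a short derivation. This is only a sketch under assumptions not stated in the text: the standard first-order model of dynamic CMOS power, a perfectly parallelizable workload split over N processors, a supply voltage that can be lowered when the clock frequency is lowered, and no leakage or communication cost. The symbols α (switching activity), C (switched capacitance), V and V_low (supply voltages), f (clock frequency), and N are introduced here for illustration.

```latex
% Requires amsmath. Assumed first-order model, not taken from the paper:
% dynamic power of one processor is P = \alpha C V^2 f.
\begin{align*}
P_{\text{single}} &= \alpha C V^{2} f \\
% N processors, each clocked at f/N and supplied at V_low < V,
% deliver the same aggregate throughput:
P_{\text{multi}} &= N \cdot \alpha C V_{\text{low}}^{2}\,\frac{f}{N}
                  = \alpha C V_{\text{low}}^{2} f \\
\frac{P_{\text{multi}}}{P_{\text{single}}} &= \left(\frac{V_{\text{low}}}{V}\right)^{2}
\end{align*}
% Illustrative numbers only: V_low = 0.7\,V gives a ratio of 0.49,
% i.e. roughly half the dynamic power for the same throughput.
```

The saving in this model comes entirely from the lower supply voltage, not from the lower frequency itself. The model also leaves out the memory and communication energy named in the second challenge above, which is why a real partition has to be evaluated rather than estimated this way.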
However, the task of exploring various program partitions presents one of the major bottlenecks in current design environments. To evaluate a given partition, a concurrent executable mo