Automatic Transformation of Bit-Level C Code to Support Multiple Equivalent Data Layouts

Portable low-level C programs must often support multiple equivalent in-memory layouts of data, due to the byte or bit order of the compiler, architecture, or external data formats. Code that makes assumptions about data layout often consists of multiple

PDF / 406,740 Bytes
15 Pages / 430 x 660 pts Page_size
82 Downloads / 211 Views

DOWNLOAD

REPORT

Abstract. Portable low-level C programs must often support multiple equivalent in-memory layouts of data, due to the byte or bit order of the compiler, architecture, or external data formats. Code that makes assumptions about data layout often consists of multiple highly similar pieces of code, each designed to handle a diﬀerent layout. Writing and maintaining this code is diﬃcult and bug-prone: Because the diﬀerences among data layouts are subtle, implicit, and inherently low-level, it is diﬃcult to understand or change the highly similar pieces of code consistently. We have developed a small extension for C that lets programmers write concise declarative descriptions of how diﬀerent layouts of the same data relate to each other. Programmers then write code assuming only one layout and rely on our translation to generate code for the others. In this work, we describe our declarative language for specifying data layouts, how we perform the automatic translation of C code to equivalent code assuming a diﬀerent layout, and our success in applying our approach to simplify the code base of some widely available software.

1

Introduction

C is sometimes referred to as a “portable assembly language” because it is implemented for almost every platform and allows direct bit-level access to raw memory. It remains the de facto standard for writing low-level code such as debuggers and run-time systems that manipulate low-level data representations or process external ﬁle formats. It is therefore quite ironic that C is not well-suited to writing portable bit-level code. In practice, for such code to be portable it must support multiple equivalent data formats. A ubiquitous example is big-endian and little-endian byte order. The order of bit-ﬁelds in a struct can also vary with compilers. Additional idiosyncratic examples arise with each unusual ﬁle format, compiler, or architecture. Though such code is occasionally performance-critical (e.g., network-packet processing), it usually is not (e.g., ﬁle-header processing). Writing portable code is time-consuming and error-prone. Simple web searches reveal hundreds of (known) endianness bugs. Even for bug-free code, datalayout portability often leads to large amounts of code duplication; using Google L. Hendren (Ed.): CC 2008, LNCS 4959, pp. 85–99, 2008. c Springer-Verlag Berlin Heidelberg 2008

86

M. Nita and D. Grossman

Code Search we found approximately one thousand open-source packages with near-identical code segments based on byte- or bit-order. Such code segments are often poorly documented and are diﬃcult to maintain consistently, particularly because the code is inherently low-level. 1.1

Conventional Approaches

We believe the most common approach to supporting multiple equivalent data layouts is, indeed, code duplication and conditionals (typically with the preprocessor) to choose the correct variant. In our experience, a common approach to such duplication is that one version of the code is developed ﬁrst (e.g., for a big-endian machine), then — perhaps much later

Data Loading...

Automatic Transformation of Bit-Level C Code to Support Multiple Equivalent Data Layouts

Recommend Documents

Evaluating ISA Support and Hardware Support for Recursive Data Layouts

NOBRAINER: A Tool for Example-Based Transformation of C/C++ Code

Sparse Code Multiple Access

Reading C++ Code

Automatic Code Motion to Extend MPI Nonblocking Overlap Window

Automatic optimized support vector regression for financial data prediction

WPF Controls, Layouts, Events, and Data Binding

A Framework to Support Digital Transformation

Automatic Source Code Generation from Owl Pseudocode

Automatic Container Code Recognition Using MultiDeep Pipeline

Approaches to Good Data Governance in Support of Public Sector Transformation Through Once-Only

Multiple Objective Programming Support