An Improved Process for the Creation, Maintenance, and Documentation of Analysis-ready Data


Allan Glaser, Associate Director, Scientific Programming, Merck & Co., Inc., Blue Bell, Pennsylvania

Key Words: Analysis data sets; Process improvement; Data dictionary

Correspondence Address: Allan Glaser, Merck & Co., Inc., Mail Stop UNA-102, 785 Jolly Road, Blue Bell, PA 19422.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the United States and other countries. ® indicates US registration.

INTRODUCTION
The collection, analysis, and reporting of clinical trial data are inherently complex. As data move through various stages of the process, they typically reside in two disparate repositories. The first repository is usually based on a commercially available database management system and provides a foundation of robust data structures that are well defined and documented. The structures in this repository tend to reflect the source data. The second repository is intended to contain manipulated data (ie, data more amenable to analysis and reporting). It is not uncommon for that repository to be incompletely defined or poorly documented, and it may not provide an optimal foundation for later work. It is essentially a nonstandardized and minimally controlled environment. Inherent weaknesses may manifest themselves through technical errors, frustration, rework, and ultimately reduced productivity. The intent of this article is to present a general methodology for at least partially improving this situation.

Drug Information Journal, Vol. 40, pp. 331-335, 2006. Printed in the USA. All rights reserved. Copyright © 2006

The development of analysis-ready data for clinical trials is a complicated process that is often problematic. Not only are the data inherently complex, but also analytical and reporting requirements tend to evolve over a period of time. Additional difficulties arise because of the number of individuals working with the data and the varied uses of the data. A new approach is described that greatly facilitates the creation, maintenance, and documentation of these data. Results are encouraging and include contributing to high quality and improved productivity.

ANALYSIS DATA SETS
The second repository is usually developed with SAS® and related computer programming tools and typically consists of a collection of SAS data sets. The constituent data sets may include information at the project, protocol, patient, visit, and event levels, and the overall structure of these data sets is critical for the proper understanding and analysis of the data. Each data set has intrinsic attributes, such as its name, a label that identifies its function, and its size. Similarly, each variable within the data sets has intrinsic attributes, including its name, label, type (eg, numeric or character), length, format, and row position. For example, the variable "age" may have the label "Patient Age," be defined as a 4-byte numeric, have a for