A study of learning likely data structure properties using machine learning models

  • PDF / 917,586 Bytes
  • 15 Pages / 595.276 x 790.866 pts Page_size
  • 65 Downloads / 223 Views

DOWNLOAD

REPORT


STTT Special Issue: SPIN 2019

A study of learning likely data structure properties using machine learning models Muhammad Usman1 · Wenxi Wang1 · Kaiyuan Wang1 · Cagdas Yelen1 · Nima Dini1 · Sarfraz Khurshid1

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Data structure properties are important for many testing and analysis tasks. For example, model checkers use these properties to find program faults. These properties are often written manually which can be error prone and lead to false alarms. This paper presents the results of controlled experiments performed using existing machine learning (ML) models on various data structures. These data structures are dynamic and reside on the program heap. We use ten data structure subjects and ten ML models to evaluate the learnability of data structure properties. The study reveals five key findings. One, most of the ML models perform well in learning data structure properties, but some of the ML models such as quadratic discriminant analysis and Gaussian naive Bayes are not suitable for learning data structure properties. Two, most of the ML models have high performance even when trained on just 1% of data samples. Three, certain data structure properties such as binary heap and red black tree are more learnable than others. Four, there are no significant differences between the learnability of varied-size (i.e., up to a certain size) and fixed-size data structures. Five, there can be significant differences in performance based on the encoding used. These findings show that using machine learning models to learn data structure properties is very promising. We believe that these properties, once learned, can be used to provide a run-time check to see whether a program state at a particular point satisfies the learned property. Learned properties can also be employed in the future to automate static and dynamic analysis, which would enhance software testing and verification techniques. Keywords Data structure invariants · Machine learning · Korat · Learnability

1 Introduction Data structure invariants play an extensive role in the verification and validation of software systems. These are properties that data structures should satisfy, and are also termed as

B

Muhammad Usman [email protected] Wenxi Wang [email protected] Kaiyuan Wang [email protected] Cagdas Yelen [email protected] Nima Dini [email protected] Sarfraz Khurshid [email protected]

1

University of Texas at Austin, Austin, TX 78712, USA

class invariants in object-oriented programming [39,45]. For example, a binary tree should satisfy properties such that there is no cycle in the tree and that all nodes are reachable from the root node. We believe that data structure invariants play a vital role in the field of software testing. Model checkers usually use these invariants as assertions and try to violate these invariant properties in order to find bugs in programs. If an invariant is violated, it exposes a bug in the software system [25,44,67]. Data structure repair t