Machine-Learning Models for Combinatorial Catalyst Discovery


JJ11.5.1

Gregory A. Landrum, Julie Penzotti and Santosh Putta
Rational Discovery LLC
555 Bryant St. #467
Palo Alto, CA 94301, USA

ABSTRACT

Standard machine-learning algorithms were used to build models capable of predicting the molecular weights of polymers generated by a homogeneous catalyst. Using descriptors calculated from only the two-dimensional structures of the ligands, the average accuracy of the models on an external validation data set was approximately 70%. Because the models show no bias and perform significantly better than equivalent models built using randomized data, we conclude that they learned useful rules and did not overfit the data.

INTRODUCTION

Industrial and scientific interest has driven enormous amounts of experimental and theoretical research into polymer catalysis.[1] Despite all that we have learned, it remains impossible (under most circumstances) to design effective new catalysts from first principles, either using a computer or on a piece of paper. As with the rest of chemistry, experimental search and refinement remain an integral part of catalyst discovery. Though first-principles design of a new catalyst may not be feasible, we can take a page from the field of drug discovery and use computational models, built from existing experimental data, in a decision-support role to accelerate the discovery process by screening huge collections of potential catalysts in silico. Machine-learning algorithms are ideally suited to developing the highly efficient models required for this approach.
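
The control mentioned in the abstract, comparing a model against equivalent models built using randomized data, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' procedure: the toy two-descriptor data set, the 1-nearest-neighbor classifier, the leave-one-out scoring, and the function names are all hypothetical. The idea is only that a model which has learned a real relationship should score well on the true labels and near chance once the labels are shuffled.

```python
import random


def nearest_neighbor_loo_accuracy(features, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (illustrative model)."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(features, labels)):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(features):
            if j == i:
                continue  # hold out the point being predicted
            d = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] == yi:
            correct += 1
    return correct / len(labels)


def randomization_control(features, labels, n_shuffles=20, seed=0):
    """Return (accuracy on real labels, mean accuracy over shuffled labels)."""
    rng = random.Random(seed)
    real = nearest_neighbor_loo_accuracy(features, labels)
    scrambled = []
    for _ in range(n_shuffles):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break any descriptor/label relationship
        scrambled.append(nearest_neighbor_loo_accuracy(features, shuffled))
    return real, sum(scrambled) / len(scrambled)


def make_toy_data(n_per_class=20, seed=1):
    """Hypothetical data: two well-separated classes in a 2-descriptor space."""
    rng = random.Random(seed)
    features, labels = [], []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            features.append([center + rng.uniform(-1, 1),
                             center + rng.uniform(-1, 1)])
            labels.append(label)
    return features, labels


features, labels = make_toy_data()
real_acc, scrambled_acc = randomization_control(features, labels)
print(f"real labels: {real_acc:.2f}, shuffled labels (mean): {scrambled_acc:.2f}")
```

A large gap between the two numbers suggests the model is using structure in the data; similar numbers would suggest it is fitting noise.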
Machine learning has been applied successfully to a variety of materials problems, including superconductivity,[2] ferromagnetism,[3] structure prediction,[4] and heterogeneous catalysis.[5] There are also numerous examples of the application of learning methods to organometallic compounds; these include conformational analysis[6] and mining crystal-structure databases.[7]

The major stumbling block when using machine-learning approaches is that they are hungry for data: a large collection of consistent experimental measurements is required to develop a useful model. The application of high-throughput (or combinatorial) methods to molecular catalysis[8,9] has begun to produce data sets that are large enough to apply machine-learning algorithms to the prediction of catalyst properties.

For this work, we use a combinatorial catalysis data set published earlier this year by a group from Symyx Technologies.[9] The data set consists of 96 related ligands, the general form of which is sketched in Figure 1.

Figure 1: General form of the ligands used in this study.

These ligands were combined with Hf(CH2Ph)4 and an activator in solution, and the ability of the resulting catalysts to polymerize a mixture of ethylene and 1-octene was then measured. Four different values were recorded for each of the resulting polymers: yield, molecular weight, polydispersity and percent 1-octene incorporation. This data set, conta