Computer-Assisted Organic Synthesis W. Todd Wipke, EDITOR
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.fw001
University of California, Santa Cruz W. Jeffrey Howe, EDITOR The Upjohn Company
A symposium cosponsored by the Division of Chemical Information and the Division of Computers in Chemistry at the Centennial Meeting of the American Chemical Society, New York, N.Y., April 7-8, 1976.
61
ACS SYMPOSIUM SERIES
AMERICAN
CHEMICAL
SOCIETY
WASHINGTON, D. C. 1977
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.fw001
Library of CongressCIPData Computer-assisted organic synthesis. (ACS symposium series; 61 ISSN 0097-6156) Includes bibliographical references and index. 1. Chemistry, Organic—Synthesis—Data processing— Congresses. I. Wipke, W. Todd, 1940. II. Howe, William Jeffrey, 1946. III. American Chemical Society. D i vision of Chemical Information. IV. American Chemical Society. Division of Computers in Chemistry. V . American Chemical Society. VI. Series: American Chemical Society. ACS symposium series; 61. QD262.C54 ISBN 0-8412-0394-6
547'.2'0285 ACSMC8
77-13629 61 1-239
Copyright © 1977 American Chemical Society All Rights Reserved. N o part of this book may be reproduced or transmitted in any form or by any means—graphic, electronic, including photocopying, recording, taping, or information storage and retrieval systems—without written permission from the American Chemical Society.
PRINTED IN THE UNITED STATES OF AMERICA
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
ACS Symposium Series
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.fw001
Robert F. Gould, Editor
Advisory Board Donald G. Crosby Jeremiah P. Freeman E. Desmond Goddard Robert A. Hofstader John L. Margrave Nina I. McClelland John B. Pfeiffer Joseph V. Rodricks Alan C. Sartorelli Raymond B. Seymour Roy L. Whistler Aaron Wold
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.fw001
FOREWORD The ACS S Y M P O S I U M SERIES was founded in 1974 to provide a medium for publishing symposia quickly in book form. The format of the SERIES parallels that of the continuing A D V A N C E S I N C H E M I S T R Y SERIES except that in order to save time the papers are not typeset but are reproduced as they are submitted by the authors in camera-ready form. As a further means of saving time, the papers are not edited or reviewed except by the symposium chairman, who becomes editor of the book. Papers published in the ACS S Y M P O S I U M SERIES are original contributions not published elsewhere in whole or major part and include reports of research as well as reviews since symposia may embrace botfe types of presentation.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.pr001
PREFACE Hphe application of digital computers to the design and study of organic syntheses has stimulated the interest of chemists and computer scientists alike. The field not only promises practical benefits in synthesis planning and chemical education but also contributes to many areas of theoretical interest such as synthetic strategies, reaction indexing, structure representation, substructural perception, computer graphics, molecular modelling, and artificial intelligence. Many concepts and techniques now being used in structure elucidation, structure-activity, and other chemical information systems werefirstdeveloped in the computer synthesis planning field. Since 1969 when thefirstpaper in this area appeared, several groups have pursued approaches to computer synthesis planning. The symposium from which this volume derives was the first meeting of all the major research groups in this field. The papers in this volume describe the state of the art of computer synthesis as viewed by the major research groups working in the area. The editors acknowledge the Petroleum Research Fund for partial support of the symposium through a travel grant to Ivar Ugi and thank Cynthia O'Donohue for her help as program chairman. W. TODD WIPKE W. JEFFREY HOWE
Santa Cruz, August 1977
CA
Kalamazoo, MI
vii In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1 LHASA—Logic and Heuristics Applied to Synthetic Analysis DAVID A. PENSAK
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
Central Research and Develop. Dept., Ε. I. du Pont de Nemours and Co., Wilmington, Del. 19898 E. J. COREY Dept. of Chemistry, Harvard University, Cambridge, Mass. 02138
Despite the wealth o f knowledge about various chemical r e a c t i o n s , there e x i s t s no formal framework of i n t e r r e l a t i o n s h i p s t o guide the chemistint h e synthesis o f even moderately complex molecules. The LHASA (Logic and H e u r i s t i c s Applied t o Synthetic A n a l y s i s ) p r o j e c t is an attempt t o c o d i f y and organize the techniques used in organic synthesis. One important aspect o f the p r o j e c t has been t h e w r i t i n g o f a general purpose computer program which will a i d the laboratory chemist and will employ both the b a s i c and more complex techniques f o r s y n t h e t i c design as e l u c i d a t e d by this study. The program (hereafter also c a l l e d LHASA) is intended t o propose a v a r i e t y o f s y n t h e t i c routes t o whatever molecule it is given. The responsibility f o r final e v a l u a t i o n o f the merit o f the routes lies with the chemist. The program is t o be an adjunct t o the laboratory chemist as much as any analytical t o o l . Since LHASA is incapable o f proposing any routes the chemist could not have thought o f by himself, i . e . , it does not suggest new reactions that have never been tried, there needs t o be some justification for the massive e f f o r t involved in writing the program. It is well known that humanity and creativity b r i n g w i t h them c e r t a i n unavoidable shortsightedness and p r e j u d i c e s . There will be particular reactions with which 1 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
each chemist i s most f a m i l i a r and i t i s to these that he w i l l f i r s t look to f i n d h i s s y n t h e t i c r o u t e ( s ) . I t i s p r e c i s e l y t h i s f a i l u r e to consider a l l p o s s i b l e routes t h a t makes a program l i k e LHASA both u s e f u l and necessary. Computers are w e l l known f o r t h e i r a b i l i t y to perform rote tasks a great number of times without complaint. Examination of p o t e n t i a l s y n t h e t i c path ways may be broken down i n t o s u f f i c i e n t l y small steps as to be amenable to computer implementation. How should one go about designing a synthesis? One of the most b a s i c techniques i s to work back wards, the f i n a l product or t a r g e t molecule being the u l t i m a t e g o a l . This t a r g e t i s examined to f i n d any or a l l compounds which can be transformed i n t o i t i n a s i n g l e chemical step. Each of these precursors may then be s i m i l a r l y analyzed u n t i l s a t i s f a c t o r y s t a r t i n g m a t e r i a l s are obtained. This method of a n a l y s i s i s c a l l e d r e t r o s y n t h e t i c (or, e q u i v a l e n t l y , a n t i t h e t i c ) . A s t r u c t u r a l m o d i f i c a t i o n that i s being performed i n the r e t r o s y n t h e t i c d i r e c t i o n i s c a l l e d a transform and i s g r a p h i c a l l y depicted as a double arrow.
CH
3
When applied i n i t s most general form, r e t r o s y n t h e t i c a n a l y s i s could be applied to every pre cursor of the t a r g e t molecule and then, i n t u r n , to each of the new s t r u c t u r e s . The g r a p h i c a l represen t a t i o n of such an a n a l y s i s i s c a l l e d a synthesis t r e e . (A complex example i s shown below). I t i s worthwhile to note that s t r u c t u r e s tend to be l e s s complex the f u r t h e r away they are from the t a r g e t molecule and no c o n s t r a i n t s are placed on the choice of r e a c t i o n s . The s t a r t i n g m a t e r i a l s are the l a s t to be generated, thereby maintaining f l e x i b i l i t y i n the choice of route u n t i l the end of the a n a l y s i s .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK A N D COREY
LHASA
3
WRITE-THRU
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
MAKE
KILL
SAVE
RESTORE
TYPE NODE
-LINEAGE
-FAMILY
GET NODE
-LINEAGE
-FAMILY
SCOPE1
PROCESS
RESTART
MENU2
-ALL
Of considerable importance t o the way a synthesis i s analyzed i s the d e t a i l e d plan o f execution. The ordering o f the a d d i t i o n (or removal) o f i n d i v i d u a l f u n c t i o n a l i t i e s , the manipulation o f stereocenters, and the c l o s u r e o f rings can be o f c r u c i a l importance i n terms o f i n t e r f e r e n c e s o r competing r e a c t i o n s . The procedures f o r choosing the sequence i n which these d i s c r e t e steps are performed are c a l l e d s t r a t e g i e s . At present there are three broad categories o f s y n t h e t i c strategy that LHASA i s capable o f employing. They are 1) Opportunistic o r F u n c t i o n a l Group Based Strategies 2) S t r a t e g i c Bond Disconnections f o r P o l y c y c l i c Targets 3) S t r a t e g i e s Based on S t r u c t u r a l Features
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
4
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
a) Appendages b) Rings (small, common, medium-large) c) Masked F u n c t i o n a l i t y The aim o f the LHASA p r o j e c t has been the c r e a t i o n o f a computer program which employs the s t r a t e g i e s gleaned from the study o f s y n t h e t i c design. Such a program now e x i s t s though i t i s c o n t i n u a l l y undergoing m o d i f i c a t i o n and expansion as new s t r a t e gies are e l u c i d a t e d and implemented. The remainder o f t h i s paper describes the o v e r a l l o r g a n i z a t i o n o f the LHASA program and the implementation o f these s t r a t e g i e s . P a r t i c u l a r a t t e n t i o n i s paid t o those aspects of LHASA which are o f s p e c i a l i n t e r e s t t o s y n t h e t i c chemists o r t o computer s c i e n t i s t s working i n chemical areas. ORGANIZATION OF LHASA The LHASA program i s exceedingly complex - about 400 subroutines, 30,000 l i n e s o f FORTRAN code and a data base o f over 600 common chemical r e a c t i o n s . To describe i t i n d e t a i l i s w e l l beyond the scope o f t h i s paper. A general overview i s relevant as i t puts the f u n c t i o n s o f the data base i n a reasonable perspective. Figure 1 shows a g l o b a l view o f LHASA. 1
•See f o r example Corey, E. J . , W. J . Howe and D. A. Pensak, J . Amer. Chem. S o c , 2& 7724 (1974). Corey, E. J . , Quart. Rev. Chem. S o c , 25, 455 ( 1 9 7 1 ) · Corey, E. J . , W. T. Wipke, R. D. Cramer I I I and W. J . Howe, J . Amer. Chem. S o c , £ 4 , 421 (1972). Corey, E. J . , W. L. Jorgensen, J . Amer. Chem. S o c , 28,
189 (1976).
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK AND COREY
5
LHASA
PERCEPTION
GRAPHICS
\
EXECUTIVE
CHEMISTRY PACKAGES
STRUCTURE
TRANSFORM EVALUATION
MANIPULATION
CHEMISTRY DATA BASE
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
Figure 1 Sample recognition questions
GRAPHICS There i s no doubt that the part o f LHASA which makes the most immediate impact on chemists i s the graphics. He may draw i n the s t r u c t u r e that he i s i n t e r e s t e d i n analyzing using standard chemical con ventions and a l l communication from the program t o him i s v i a s t r u c t u r a l diagrams. This manner o f i n t e r f a c i n g w i t h the chemist-user was chosen because i t has been shown t h a t the rate a t which he can a s s i m i l a t e chemical data i s maximized i f i t i s i n the n o t a t i o n w i t h which he i s most f a m i l i a r . As the chemist s i t s f a c i n g the CRT (cathode ray tube) with which he communicates w i t h LHASA, on h i s r i g h t i s a d i g i t i z i n g data t a b l e t . This i s a device which measures the two dimensional coordinates o f t h e s t y l u s o r pen which i s used f o r i n p u t t i n g s t r u c t u r e s . As the s t y l u s i s moved along the surface o f the t a b l e t , LHASA i s t o l d by the t a b l e t where the s t y l u s touched the surface and where i t had moved t o when i t was l i f t e d . A l i n e i s displayed on the CRT between these two p o i n t s e x a c t l y l i k e a bond being drawn on a sheet o f paper. The LHASA graphics routines recognize t h a t two atoms are required t o make t h i s bond and makes appropriate i n t e r n a l e n t r i e s i n the program data base. Conforming t o standard s t r u c t u r e conventions, an atom
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
6
COMPUTER-ASSISTED ORGANIC SYNTHESIS
i s carbon unless otherwise i n d i c a t e d and s u f f i c i e n t hydrogens are assumed to f i l l out valence* In short, LHASA graphics l e t the chemist enter the s t r u c t u r e exactly as he would draw i t on a sheet o f paper.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
When a m u l t i p l e bond i s d e s i r e d , i t i s necessary only t o t r a c e over the s i n g l e bond one (or two) a d d i t i o n a l times. LHASA responds by redrawing the bond as double (or t r i p l e ) . I n d i c a t i n g stereochemistry i s e s s e n t i a l l y the same, the appropriate i n d i c a t o r (wedged o r dotted) i s chosen and the d e s i r e d bond i s traced w i t h the s t y l u s (see below). STORE
END
REPLAY
SCAN
SC0PE2
PROCESS
ORAM
MOVE
DELETE
WIPE
STEREO
MENU2
H UP
O
N
C
OOMN
P
S
LEFT
F
C
RIGHT
L
B SHALL
R
I
X
-
SINGLE-CRT
•
ic
ATOMNO
A
Ε BONONO
Considerable e f f o r t was expended t o insure that no a r t i s t i c t a l e n t i s required i n s t r u c t u r a l i n p u t . A reasonable amount o f inaccuracy i s permitted i n p o i n t ing t o an atom o r bond. LHASA determines what was intended and acts a c c o r d i n g l y . There i s no need t o worry about consistency o r p r e c i s i o n o f bonds o r angles. With the one exception o f i n d i c a t i n g c i s trans isomerism around double bonds, i t makes no d i f f e r e n c e how d i s t o r t e d the s t r u c t u r e i s i n p u t . LHASA works s o l e l y from information about connec t i v i t y . This f l e x i b i l i t y i s q u i t e u s e f u l f o r drawing
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK AND COREY
7
LHASA
s t r u c t u r e s i n a p a r t i c u l a r conformation. The program w i l l process i t c o r r e c t l y and d i s p l a y a l l o f f s p r i n g i n the same conformation.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
The only other times t h a t the chemist must com municate w i t h LHASA are when d e c i s i o n s are to be made about which s t r u c t u r e i s to be analyzed and what method i s to be sed. I n a l l cases a t a b l e o f choices (or menu) i s displayed and the chemist merely p o i n t s to the one (or ones) that he wishes. A sample menu i s shown below. SINGLE GROUP
GROUP PAIR
DEBUG
FULL SEARCH
FULL SEARCH
TREE
BOND MODE
EXIT
BOND MODE
NARROW MODE SUBGOALS MANUAL MODE
KEY SUBSTRUCTURES
SEQUENTIAL FGI DIELS-ALDER ISOLATED STRAT BOND
APPENDAGE CHEMISTRY ROBINSON k+2 RING APPNDG ONLY
DISCONNECTIVE
ROBINSON X-3 BRANCH APPNDG ONLY
RECONNECTIVE UNMASKING
SMALL RINGS PERCEPTION ONLY
STEREOSPECIFIC C=C
At no time i s he forced to l e a r n any s p e c i a l command formats or memorize l i s t s o f o p t i o n s . The LHASA graphics and c o n t r o l s t r u c t u r e s were s p e c i f i c a l l y designed to be as easy and n a t u r a l f o r the chemist t o use as p o s s i b l e . PERCEPTION Inherent i n a s t r u c t u r a l diagram i s a wealth o f information - r i n g s , f u n c t i o n a l groups, stereo chemistry, e t c . To be as e f f e c t i v e as p o s s i b l e , LHASA must recognize a l l o f these and u t i l i z e them i n i t s planning processes. The procedures by which these are c a l l e d p e r c e p t i o n . By v i r t u e of having very s o p h i s t i c a t e d perception r o u t i n e s , LHASA can avoid f o r c i n g the chemist t o input
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
8
COMPUTER-ASSISTED ORGANIC SYNTHESIS
any a r t i f i c i a l (to him) i n f o r m a t i o n . An unexpected adjunct o f t h i s has been a guarantee of perceptual completeness. For example, consider the f o l l o w i n g s t r u c t u r e (the non-indole p o r t i o n o f the a l k a l o i d ajmaline).
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
HO
There are three six-membered r i n g s , one five-membered r i n g and one seven-membered r i n g yet few chemists perceived a l l of them. I f the s t r u c t u r e were redrawn as HO
a d i f f e r e n t , though s t i l l incomplete set of r i n g s i s recognized by the human. The point of t h i s example i s that LHASA must perceive r i n g s s o l e l y on the b a s i s of c o n n e c t i v i t y , not how the s t r u c t u r e i s drawn. The program would be useless i f i t missed syntheses based on c y c l i c substructures because the chemist had f a i l e d to i n d i c a t e a l l the r i n g s i n the molecule. RING PERCEPTION Many researchers have attacked the problem of f i n d i n g the set of c y c l e s i n a graph. T h e i r work 2
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK AND COREY
LHASA
9
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
has p r i m a r i l y been d i r e c t e d towards i d e n t i f y i n g the smallest set rings o f rings i n the network. For chem i c a l purposes, t h i s i s not s u f f i c i e n t however. F o r example, the s t r u c t u r e below
i s best synthesized by the D i e l s Alder a d d i t i o n shown, but the six-membered r i n g formed i s not part of the minimal c y c l i c b a s i s o f the molecule. I t i s necessary, t h e r e f o r e , t o redefine our problem as that of f i n d i n g the set o f cycles i n a graph which are o f chemical s i g n i f i c a n c e . For s y n t h e t i c purposes, rings must be s p l i t i n t o two classes - r e a l and pseudo. For each bond i n a molecule, the smallest r i n g c o n t a i n i n g that bond i s c a l l e d a r e a l r i n g . I t i s q u i t e p o s s i b l e that the number o f r e a l rings w i l l be greater than the c y c l i c order of the molecule (/ bonds - / atoms + 1). Pseudo r i n g s are the pairwise envelopes o f r e a l rings with the r e s t r i c t i o n that the s i z e o f the envelope be seven or l e s s . Real rings are u s e f u l because the chemistry of a bond i s best r e f l e c t e d by the s i z e o f the s m a l l est r i n g c o n t a i n i n g i t - f o r example, the f u s i o n bond i n the s t r u c t u r e below.
2 See f o r example Paton, K., Commun. Assoc. Comput. Mach., 12, 514 (1969).
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
10
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Pseudo rings are u s e f u l because they are the r i n g s which are o f t e n formed i n the c o n s t r u c t i o n o f bridged molecules. STRATEGIC BONDS - CYCLIC
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
There are u s u a l l y c e r t a i n bonds i n a molecule whose disconnection i n the r e t r o s y n t h e t i c d i r e c t i o n leads t o a s i g n i f i c a n t s i m p l i f i c a t i o n o f the c y c l i c s t r u c t u r e . These are termed s t r a t e g i c bonds. Since these have been described i n d e t a i l elsewhere, we s h a l l consider them here only b r i e f l y . ^ The f i r s t premise o f s t r a t e g i c bonds i s that the chemical a c t i v i t y o f a bond i s a d i r e c t f u n c t i o n o f the s i z e o f the smallest r i n g c o n t a i n i n g i t . This leads t o the requirement that a s t r a t e g i c bond must be i n a r i n g o f f i v e , s i x , o r seven members and not i n or exo t o a c y c l o p r o p y l r i n g . A s t r a t e g i c bond must also be i n the r i n g ( i f any) with the maximum number of bridges on i t . This insures t h a t i t s disconnection w i l l s i m p l i f y the c y c l i c network as much as p o s s i b l e . The s t r u c t u r e below shows the power o f t h i s h e u r i s t i c . 0
HO
5 Corey, E. J . , W. J . Howe, H. W. Orf, D. A. Pensak and G. A. Petersson, J . Amer. Chem. Soc, 2L 6ll6 (1975).
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK A N D COREY
11
LHASA
Other r e s t r i c t i o n s on s t r a t e g i c bonds prevent them from being aromatic and from l e a v i n g c h i r a l side chains. These requirements are also based on current chemical technique. FUNCTIONAL GROUP PERCEPTION
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
F u n c t i o n a l i t y i n molecules i s w e l l defined. There i s no argument about whether a group i s a ketone o r not. The problem i n LHASA i s t o perceive a l l groups and j u x t a p o s i t i o n s o f groups which can be chemically meaningful. To t h i s end, a context dependent grammar has been developed t o unambiguously represent the p h y s i c a l domain o f a group and the s i t e ( s ) at which i t can be expected t o r e a c t . This grammar defines which atoms i n the group are considered o r i g i n s . Each f u n c t i o n a l group i s c h a r a c t e r i z e d by at l e a s t one carbon atom which i s i t s attachment t o the r e s t o f the molecule. This i s c a l l e d an o r i g i n atom and i t i s around these o r i g i n s t h a t many o f the data t a b l e s i n LHASA are organized. As an example, an alpha-beta unsaturated ketone has two o l e f i n i c p o s i t i o n s w i t h s i g n i f i c a n t l y d i f f e r e n t a f f i n i t y t o e l e c t r o p h i l i c reagents. To consider the double bond as one group w i t h constant r e a c t i v i t y i s an unreasonable s i m p l i f i c a t i o n , but r e c o g n i z i n g i t as two o r i g i n atoms each w i t h o l e f i n i c character makes s u i t a b l e d i f f e r e n t i a t i o n p o s s i b l e . To accomplish t h i s r e c o g n i t i o n e f f i c i e n t l y , a t a b l e d r i v e n approach was chosen since the types o f groups t o be recognized change from time t o time, depending on the needs o f those using the program ( s i x t y - f o u r d i f f e r e n t groups are c u r r e n t l y recognized by LHASA, see Table 1). This c o n s i s t s o f a s e r i e s o f questions which can be answered w i t h e i t h e r a y e s o r "no" · Depending on the answer, the type o f the next question t o be asked i s s p e c i f i c a l l y determined. F o r n
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
,!
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
Table 1 KETONE ALDEHYDE ACID ESTER AMIDE*1 AMIDE*2 AMIDE*? ISOCYANATE ACID*HALIDE THIOESTER AMINE*} AZIRIDINE AMINE*2 AMINE*1 NITROSO DIAZO HALOAMINE HYDRAZONE OXIME IMINE THIOCYANATE ISOCYANIDE NITRILE AZO HYDROXYLAMINE NITRO AMINEOXIDE THIOL EPISULFIDE SULFIDE SULFOXIDE
SULFONE C*SULFONATE LACTAM PHOSPHINE PHOSPHONATE EPOXIDE ETHER PEROXIDE ALCOHOL NITRITE 0*SULFONATE FLUORIDE CHLORIDE BROMIDE IODIDE DIHALIDE TRIHALIDE ACETYLENE OLEFIN HYDRATE HEMIKETAL KETAL HEMIACETAL ACETAL AZIDE DISULFIDE ALLENE LACTONE VINYLW VINYLD ESTERX AMIDZ
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK A N D COREY
LHASA
example, suppose a carbon-nitrogen t r i p l e bond has been found the questions would look, l i k e C=N
?
yes >
isocyanide
yes
-C ? Ino
thiocyanate
yes
c-s-•CsNf
?
no v
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
nitrile
The a c t u a l data t a b l e which d r i v e s t h i s r e c o g n i t i o n process i s reproduced below. A22 A25 A24 A25 A26 A27 A28 A29
AJO A31 A32 A3 3 A34 A35 A3 6 A37 A38 A39
A40 A4l A42 A43
A44 A45 A46 A47
A48 A49
A50 A51 A52 A53
A54 A55
A56 A57 A58
A59 A60
A6l A62 A63 A64 A65
LOC A 2 3 NULL LOC A 2 5 NULL LOC A 2 7 LOC A 2 5 LOC A29 LOC A30 LOC A31 NULL LOC A33 LOC A34 LOC A35 LOC A36 NULL LOC A38 LOC A 3 9 LOC A40 LOC A4l NULL NULL LOC A44 NULL LOC A46 LOC A 4 7 LOC A48 NULL LOC A50 LOC A51 LOC A52 LOC A 5 3 LOC A54 LOC A55 LOC A56 NULL LOC A 5 8 LOC A59 LOC A 6 0 LOC A6l LOC A62 LOC A63 NULL LOC A65 NULL
LOC A24 NULL LOC A 2 6 NULL LOC A 2 8 LOC A 2 8 LOC A32 NULL LOC A31 NULL LOC A33 LOC A43 LOC Α35· LOC A37 NULL NULL LOC A 3 9 LOC A40 LOC A42 NULL NULL LOC A 4 5 NULL LOC A66 LOC A 4 7 LOC A 4 9 NULL LOC A 5 7 NULL LOC A52 LOC A 5 3 LOC A56 LOC A55 LOC A56 NULL NULL LOC A59 LOC A 6 0 LOC A64 LOC A62 LOC A63 NULL LOC A63 NULL
SHIFT + IF CARBON*COUNT IS TWO IDENTIFIED AS KETONE IF HYDROGEN*COUNT IS TWO IDENTIFIED AS ALDEHYDE IF HYDROGEN*COUNT IS ONE IF CARBON*COUNT IS ONE SEARCH FOR C**N SHIFT + SEARCH FOR C*N NONORIGIN ENTRY IDENTIFIED AS ISOCYANATE ENTRY BOND*SHARED SEARCH FOR C*0 BOND*SHARED SHIFT + IF HYDROGEN*COUNT IS ONE IDENTIFIED AS ACID SEARCH FOR C*0 SHIFT + NONORIGIN BOND*SHARED SHIFT + IF IN RING OF ANY SIZE IDENTIFIED AS LACTONE IDENTIFIED AS ESTER SEARCH FOR C*X IDENTIFIED AS ACID*HALIDE SEARCH FOR C*N BOND*SHARED SHIFT + IF HYDROGEN*COUNT IS TWO IDENTIFIED AS AM3DE*1 SHIFT + IF IN RING OF ANY SIZE SHIFT + SEARCH FOR C*N SHIFT + NONORIGIN SHIFT + BOND*SHARED SEARCH FOR C*N SHIFT + NONORIGIN BOND*SHARED IDENTIFIED AS LACTAM IDENTIFIED AS LACTAM BOND*SHARED SHIFT + NONORIGIN SHIFT + SEARCH FOR C*N BOND*SHARED SHIFT + NONORIGIN IDENTIFIED AS AMIDE*3 IF HYDROGEN*COUNT IS ONE IDENTIFIED AS AMIDE*2
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
14
COMPUTER-ASSISTED ORGANIC SYNTHESIS
FUNCTIONAL GROUP REACTIVITY
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
Frequently, during the experimental r e a l i z a t i o n of a s y n t h e t i c plan, c e r t a i n f u n c t i o n a l groups w i l l i n t e r f e r e w i t h the performance o f desired r e a c t i o n s . When t h i s happens, i t becomes necessary t o protect the offending group ( r e v e r s i b l y modify i t t o some other f u n c t i o n a l i t y that i s s t a b l e t o t h e r e a c t i o n c o n d i t i o n s ) . The extension o f computer a s s i s t e d synthetic analysis to sophisticated levels necessi t a t e s the d e t e c t i o n o f p o s s i b l e i n t e r f e r e n c e s . Such s i t u a t i o n s must be presented t o t h e chemist i n a g e n e r a l l y u s e f u l manner. This problem has been attacked i n LHASA by the separation o f f u n c t i o n a l group i n t o d i f f e r e n t c l a s s e s based on t h e i r e l e c t r o n i c and s t e r i c environment. At the same time a l i b r a r y o f standard reagents (cur r e n t l y 60) has been prepared c o n t a i n i n g the s t a b i l i t y of each o f the c l a s s e s o f f u n c t i o n a l groups t o each reagent. By t h i s mechanism the program can decide whether groups o f an i d e n t i c a l o r s i m i l a r type w i l l i n t e r f e r e w i t h the transform. F o r example, i n the s t r u c t u r e below i t i s p o s s i b l e t o s e l e c t i v e l y hydrogenate bond A i n the presence o f Β
From a computational point o f view, f u n c t i o n a l group r e a c t i v i t y i s s t r a i g h t f o r w a r d . Associated w i t h each group o r i g i n i s a number which unambiguously defines the environment o f the o r i g i n . These i n c l u d e s t e r i c hindrance and a c c e s s i b i l i t y , s t r a i n , and e l e c t r o n i c environment. I t i s important t o note that a group can be i n s e v e r a l subclasses simultaneously, a l l o f which must be encoded i n t o the one number. This number i s used t o assign r e a c t i v i t y l e v e l s t o each o r i g i n r e l a t i v e t o each reagent.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK AND COREY
15
LHASA
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
At the time of attempted r e a c t i o n execution, the program reads the d e s i r e d conditions from the data base. The r e a c t i v i t y l e v e l s of a l l n o n - p a r t i c i p a t i n g groups are examined. I f a l l are l e s s r e a c t i v e than the p a r t i c i p a t i n g group(s), then the transform i s allowed to proceed. I f t h i s i s not the case, the offending group(s) i s examined to see i f i t i s g e n e r a l l y p r o t e c t a b l e . I f i t i s then a s o l i d r e c tangle i s drawn around the group as the transform i s displayed to the chemist. Unstable, unprotectable groups are g r a p h i c a l l y i n d i c a t e d by a dashed box around t h e i r bonds.
LHASA does not t r y to assign s p e c i f i c p r o t e c t i n g groups. There i s j u s t so much chemical d e t a i l that would have to be programmed that the i n t e r a c t i v e aspect of LHASA would be severely degraded. An addi t i o n a l problem e n t a i l s evaluating when i n the syn t h e t i c route i t would be best to protect and then deprotect the group(s). The program c u r r e n t l y deals w i t h the synthesis t r e e on a node by node b a s i s . A g l o b a l o p t i m i z a t i o n of the i n d i v i d u a l steps i n the t r e e i s one a d d i t i o n a l l e v e l of s o p h i s t i c a t i o n which has not yet been attempted. APPENDAGE BASED STRATEGIES The vast majority of m u l t i s t e p syntheses i n v o l v e e i t h e r the disconnection, reconnection, or m o d i f i c a t i o n of what are l o o s e l y c a l l e d 'appendages . One p a r t i c u l a r l y u s e f u l r e t r o s y n t h e t i c strategy c o n s i s t s of fragmenting a r i n g and then disconnecting the r e s u l t i n g appendages, as shown below. 1
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
16
COMPUTER-ASSISTED ORGANIC SYNTHESIS
.0
S i m i l a r l y , s t r a t e g i e s i n v o l v i n g reconnection of appendages are e x c e p t i o n a l l y u s e f u l i n s t e r e o s p e c i f i c s y n t h e s i s . These reconnections have a l s o proven valuable i n the synthesis of medium s i z e r i n g s .
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
?T
5
OH I t i s important to note t h a t a l l stereochemistry i n these examples i s perceived by LHASA and used i n i t s strategies. There are two c l a s s e s o f appendages - r i n g appendages and branch appendages. A r i n g appendage i s a group o f atoms attached t o r i n g t h a t i s not i n a r i n g i t s e l f . A branch appendage may only o r i g i n a t e on a non-ring atom and must have three o r more attachments other than hydrogens. Non-terminal o l e f i n s and acetylenes are a l s o considered as o r i g i n s o f branch appendages f o r chemical reasons. Signifi cant i n the use of appendages i s the c o m b i n a t o r i a l problem o f determining i d e n t i c a l i t y o f appendages. This has been solved q u i t e e l e g a n t l y by Jorgensen. Appendage based s t r a t e g i e s may be d i v i d e d i n t o disconnective and reconnect!ve. The l a t t e r may be f u r t h e r p a r t i t i o n e d i n t o r i n g appendage - r i n g Corey, E. J . , W. L. Jorgensen, J . Amer. Chem. S o c , 28, 189 (1976).
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK AND COREY
LHASA
17
appendage, r i n g appendage-ring, and a c y c l i c recon n e c t i o n s . LHASA c u r r e n t l y knows about twenty d i f f e r e n t c l a s s e s of reconnective transforms - a small subset of the group p a i r chemistry data base (vide i n f r a ) . When i n a mode where these transforms are s p e c i f i c a l l y being executed, they are empowered to make s e v e r a l small s t r u c t u r a l m o d i f i c a t i o n s to achieve the d e s i r e d reattachment. THE CHEMISTRY PACKAGES OF LHASA
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
F u n c t i o n a l Group Based Transforms As we have already seen, LHASA has a wide v a r i e t y of s t r a t e g i e s which i t can employ, e i t h e r of i t s own v o l i t i o n or by d i r e c t i v e of the chemist-user. In order t o f a c i l i t a t e the use of these s t r a t e g i e s , the chemical data base i n LHASA i s broken down i n t o s e v e r a l separate c a t e g o r i e s , two group transforms, one group transforms, f u n c t i o n a l group interchange, f u n c t i o n a l group a d d i t i o n and r i n g o r i e n t e d t r a n s forms. This s e c t i o n w i l l describe each of these and g i v e b r i e f examples of t h e i r use. Two group transforms are keyed s p e c i f i c a l l y by two f u n c t i o n a l groups w i t h a path of predetermined l e n g t h between them. Examples of these are shown below.
One group transforms are s i m i l a r but are keyed by one s p e c i f i c group w i t h an a s s o c i a t e d path (not as
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
18
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
chemically meaningful as above). below.
Examples are shown
The world of chemistry would indeed be rosy i f there were always p r e c i s e matches against t h i s data base. Unfortunately, t h i s i s not o f t e n the case. Frequently, one of the group ( i n a two group s i t u a t i o n ) does match, but the other one does not. I f the i n c o r r e c t group could r e a d i l y be converted i n t o a matching group, then the transform would become acceptable. As i n the A l d o l Condensation, i f the molecular fragment present were the ether, the match would not be found, yet the e t h e r i f i c a t i o n of the a l c o h o l can o f t e n be q u i t e s t r a i g h t f o r w a r d . I f the
«0^00 °^o^OO ^o*00 performance of the A l d o l i s considered a chemical ' g o a l , then the conversion of the ether to the a l c o h o l i s a 'subgoal . In t h i s case, the subgoal c o n s i s t e d of modifying a group or F u n c t i o n a l Group Interchange (FGI). 1
1
A more complicated case would e x i s t i f the second group necessary to key the transform was t o t a l l y absent. In the example below, the only f u n c t i o n a l substructure capable of keying a t r a n s form would be the o l e f i n . To perform the A l d o l
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK AND COREY
19
LHASA
CO-XO-,00 transform i t would be necessary to add ( i n the r e t r o s y n t h e t i c d i r e c t i o n ) the a l c o h o l group. Such subgoals are c a l l e d F u n c t i o n a l Group A d d i t i o n (FGA). Obviously one does not want to always introduce a l l p o s s i b l e f u n c t i o n a l groups at a l l a v a i l a b l e p o s i t i o n s o r do i n d i s c r i m i n a n t group conversions without some guiding purpose or s t r a t e g y . As such, FGI's and FGA s are only executed i n response to a request from a higher l e v e l chemistry package.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
1
Subgoal requests can be combined and mixed according to the s i t u a t i o n . Next to be added i s FGI then INTRO since not a l l groups may be INTRO ed. 1
Ring Oriented Transforms I t was recognized t h a t of great s i g n i f i c a n c e to LHASA type analyses was the i n c l u s i o n of chemistry packages whose sole purpose was the c o n s t r u c t i o n of r i n g s . These transforms could not and should not be keyed by the presence or absence of any p a r t i c u l a r f u n c t i o n a l i t y . Since they had s p e c i f i c long range goals, they were given considerable power i n the type and number of subgoals that they could request. This i s i n c o n t r a s t to the two group or one group chem i s t r i e s where only one FGI or FGA could be performed before the f i n a l d i s c o n n e c t i o n . Four r i n g forming transforms have been con sidered at length by the LHASA development group - the D i e l s A l d e r a d d i t i o n , the Robinson Annelation, the Simmons-Smith r e a c t i o n , and i o d o - l a c t o n i z a t i o n . The f i r s t three of these have been f u l l y implemented i n LHASA and the f o u r t h i s completely flow charted and awaits only coding i n t o the chemistry data base language.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
20
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
A l l r i n g chemistry t a b l e s are organized i n t o what i s c a l l e d b i n a r y search t r e e s . Queries are posed about the existence o f c e r t a i n s t r u c t u r a l f e a t u r e s . Each o f these questions i s answerable w i t h a yes o r a no. Based on the answer one o f two d i f f e r e n t f o l l o w up questions i s s e l e c t e d . Embedded w i t h i n the t a b l e may be requests f o r subgoals, e i t h e r those already i n the FGI o r FGA t a b l e o r f o r s p e c i a l r e a c t i o n s which are needed only f o r these transforms and are not o f general s y n t h e t i c i n t e r e s t . The f i r s t step i n implementation o f a r i n g transform i s the preparation o f a chemical flow c h a r t . This defines a l l the questions about the s t r u c t u r e and describes i n a graphic r e p r e s e n t a t i o n the s y n t h e t i c steps t h a t w i l l be taken. I t i s q u i t e s t r a i g h t f o r w a r d f o r a chemist having no f a m i l i a r i t y w i t h LHASA t o read and make use o f these c h a r t s . A number o f graduate students and p o s t d o c t o r a l f e l l o w s i n the Corey group at,Harvard U n i v e r s i t y made s i g n i f i c a n t input t o the chemistry i n the t a b l e s without ever having t o worry about the computer implementation. The example below shows some o f the s y n t h e t i c routes generated by the D i e l s A l d e r transform f o r the i n d i c a t e d precursor. I t i s important t o note that while some o f the chemistry may look somewhat naive, i t can be q u i t e thought provoking.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
PENSAK A N D COREY
LHASA
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
1.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
21
22
COMPUTER-ASSISTED ORGANIC SYNTHESIS
I t i s c l e a r that designing a synthesis o f a r i n g with so many stereocenters presents a formidable challenge f o r most s y n t h e t i c chemists. I t i s f a i r t o say that the r i n g transforms are a g e n e r a l i z a t i o n o f the concepts derived f o r the group o r i e n t e d c h e m i s t r i e s . Work i s c u r r e n t l y underway t o g e n e r a l i z e t h i s s t i l l f u r t h e r , t o permit generation o f a r b i t r a r i l y complex molecular patterns, always s p e c i f i a b l e i n a n o t a t i o n e a s i l y readable by the chemist.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
CHMTRN - CHEMICAL DATA BASE LANGUAGE The chemical transforms are the heart and s o u l o f LHASA. Without good chemistry i n the data base, a l l the s o p h i s t i c a t e d perception would be e s s e n t i a l l y use l e s s . The f i r s t requirement i n the design o f the data base was that i t be m o d i f i a b l e without having t o recompile any other part o f the program. The second requirement was that i t require no knowledge o f FOKIRAN or how LHASA i s organized on a subroutine by sub r o u t i n e b a s i s . The t h i r d requirement was that the data base be e a s i l y readable by chemists with no t r a i n i n g i n LHASA and m o d i f i a b l e a f t e r only a l i t t l e i n t r o d u c t i o n t o the language. To meet these c o n d i t i o n s a s p e c i a l chemical pro gramming language CHMTRN (Chemical T r a n s l a t o r ) ^ was developed. By use o f a s p e c i a l assembler - TBLTRN (Table T r a n s l a t o r , w r i t t e n by Dr. Donald E. B a r t h ) , i t was p o s s i b l e t o convert the CHMTRN t a b l e s i n t o s p e c i a l l y encoded FORTRAN BLOCK DATA statements which could be loaded with LHASA o r read i n at run time. The b a s i c approach o f CHMTRN i s that there are keywords ( c u r r e n t l y s e v e r a l hundred) that have
I f t h i s name c o n f l i c t s o r d u p l i c a t e s that o f some other chemical program, I apologize. The d u p l i c a tion i s unintentional.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
P E N S A K AND
COREY
LHASA
23
s p e c i f i c numerical values assigned to them. A l l of these keywords that are typed on a s i n g l e l i n e i n the data base are l o g i c a l l y 'or' ed together i n t o one computer work. (The t r a n s l a t i o n and combination are handled by TBLTRN). Each such l i n e i s c a l l e d a q u a l i f i e r as i t l i m i t s or modifies the scope of the transform.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
LHASA contains an i n t e r p r e t e r c a l l e d EVLTRN (Evaluate Transform) which decodes the b i t patterns and performs the requested queries about the current s t r u c t u r e or performs a s p e c i f i e d operation. As an example, consider a l i n e from the t a b l e s which says SUBTRACT 20 FOR EACH PRIMARY HALIDE ALPHA TO CARB0N*1 OFFPATH ONRING. This q u a l i f i e r decrements the base r a t i n g of the transform f o r each primary h a l i d e that i s on a c y c l i c carbon which i s not a part of the path keying the transform. From t h i s example, i t i s p o s s i b l e to see how densely the chemical data i s packed (one q u a l i f i e r takes up only one computer word - 32 b i t s ) . There i s a t a r g e t to be searched f o r (the h a l i d e ) , a domain or l o c a t i o n to which the search i s r e s t r i c t e d (alpha to the carbon but on a r i n g and o f f the path), and an i t e r a t i o n command i n d i c a t i n g that the operation (the s u b t r a c t i o n ) i s to be performed f o r each occurrence. CHMTRN has s e v e r a l other c o n s t r u c t i o n s worthy of mention. The f i r s t i s the a b i l i t y to make m o d i f i c a t i o n s to the s t r u c t u r e according to r e s u l t s of q u a l i f i e r e v a l u a t i o n s . One can say, f o r example, ATTACH AN ALCOHOL TO CARB0N*2 CIS TO CARB0N*4 This command also shows j u s t one case where s t e r e o chemical considerations can be i n c l u d e d . Complete block s t r u c t u r i n g (as i n P L / l or ALGOL) has been incorporated. This i s u s e f u l where a com plex s e r i e s of queries should be a p p l i e d i n c e r t a i n
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
24
COMPUTER-ASSISTED ORGANIC
SYNTHESIS
repeatable circumstances, f o r example, FOR EACH KETONE ANYWHERE DO BEGIN
END.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
A l l q u a l i f i e r s between the BEGIN and END are executed f o r each KETONE. These may be nested t o any d e s i r e d depth. S i m i l a r l y IF-THEN-ELSE c o n s t r u c t i o n s are a l s o allowed. Software subroutines have a l s o been implemented i n CHMTRN/EVXTRN. Suppose there i s one group o f q u a l i f i e r s which needs o f t e n be applied t o d i f f e r e n t l o c a t i o n s i n the molecule at varying times. This can be handled by the c o n s t r u c t i o n CALL FGIW AT CARB0N*3 AND B0ND*2 In the subroutine FGIW these arguments are addressable as SPECIFIED*ATOM and SPECIFIED*BOND. I t i s p e r m i s s i b l e t o apply stereochemical c o n s t r a i n t s t o arguments at the time o f execution o f the CALL. There i s no p r a c t i c a l l i m i t on the depth o f subroutine c a l l s . Subroutines may a l s o r e t u r n a value t o i n d i c a t e whether o r not they succeeded i n the task they were assigned t o do. The Robinson Annelation transform has received d e t a i l e d examination by the LHASA group. One sub r o u t i n e i n the t a b l e s p e c i f i c a l l y checks t o see i f there i s any f u n c t i o n a l i t y alpha t o the ketone i n the cyclohexane and i f there i s , remove i t by exchanging i t f o r something non-offensive. This subroutine i s reproduced below as an example o f the CHMTRN language.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
P E N S A K A N D COREY
LHASA
25
...THIS SUBROUTINE IS CALLED TO CLEAR AWAY ANY UNDESIRABLE FUNCTIONALITY ...ALPHA TO A KETONE ON THE RING ALPHCHK IF NO HYDROGEN ON THE SPECIFIED ATOM THEN GO TO 19 IF THERE IS NOT A WITHDRAWING GROUP ON THE SPECIFIED ATOM THEN GO TO l8 IF THE SPECIFIED ATOM IS THE SAME AS CARB0N*2 THEN RETURN SUCCESS IF BOND*5 IS A FUSION*BOND THEN RETURN SUCCESS IF THERE IS NOT A NITRO ON THE SPECIFIED ATOM THEN GO TO l8 EXCHANGE THE GROUP FOR AN AMINE IF SUCCESSFUL THEN GO TO J2 OTHERWISE RETURN FAIL 18 IF THERE IS A HALIDE ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A KETONE ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A WITHDRAWING GROUP ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A DONATING GROUP ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS AN OLEFIN ALPHA TO THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A FUNCTIONAL GROUP ON THE SPECIFIED ATOM THEN RETURN FAIL IF THERE IS NOT A FUNCTIONAL GROUP ALPHA TO THE SPECIFIED ATOM THEN RETURN SUCCESS EXCHANGE THE GROUP FOR A WITHDRAWING GROUP IF SUCCESSFUL THEN GO TO J2 OTHERWISE RETURN FAIL
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
19
J2
IF THE SPECIFIED ATOM IS A QUATERNARY^ENTER THEN RETURN FAIL IF THERE IS AN ALCOHOL ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A HALIDE ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS NOT AN ETHER ON THE SPECIFIED ATOM THEN RETURN FAIL CALL DET THE GROUP AT THE SPECIFIED ATOM AND GO TO RET
Two examples o f how these constructs are a p p l i e d together w i l l demonstrate t h e i r u t i l i t y and f l e x i b i l i t y . A number o f r e a c t i o n s , such as Michael a d d i t i o n depend on the conformation o f the i n t e r mediate enolate f o r t h e i r s p e c i f i c i t y . I t i s p o s s i b l e to make i n i t i a l queries about the s t r u c t u r e , generate the enolate, ask about i t , then generate the f i n a l precursor and ask questions about i t . At each stage of t h i s process, i t i s p o s s i b l e t o detect a f a t a l c o n d i t i o n and terminate e v a l u a t i o n o f t h e transform. This same language i s being used t o s u c c e s s f u l l y c a l c u l a t e p r e f e r r e d conformations o f cyclohexanes f o r e v a l u a t i o n o f r e g i o s p e c i f i c i t y and i n f u n c t i o n a l group reactivity analysis. ORGANIZATION OF THE DATA BASE The one group and two group and the subgoal t a b l e s are queried very f r e q u e n t l y during a t y p i c a l a n a l y s i s s e s s i o n . A data s t r u c t u r e has been developed which i s extremely e f f i c i e n t f o r these searches - the r e t r i e v a l time being independent o f e i t h e r the s i z e o f the data t a b l e o r the number o f s u c c e s s f u l h i t s i n the t a b l e . Because o f the general a p p l i c a b i l i t y o f t h i s
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
26
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
technique, we s h a l l describe i t i n more d e t a i l - using the two group t a b l e s as an example*
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
At present there are s i x t y - f o u r d i f f e r e n t func t i o n a l groups which are capable o f keying a transform i n some manner. This can e i t h e r be as the f i r s t key ing group o r the second group ( f o r example, both a ketone and an o l e f i n key the A l d o l Condensation)
The keying mechanism must point t o the a p p l i c a b l e transform regardless o f the o r d e r i n g o f the groups and i t must a l s o handle s i t u a t i o n s where the keying groups are the same. The f i r s t element i n the r e p r e s e n t a t i o n i s a ' d i r e c t o r y s e t - a Boolean set w i t h a b i t on f o r each group t h a t can p a r t i c i p a t e i n any transform a t the d e s i r e d path l e n g t h . I f the group i s not marked i n t h i s set, then there i s no need t o f u r t h e r i n t e r r o gate the t a b l e - there w i l l not be any acceptable entries· 1
For each group t h a t does p a r t i c i p a t e i n t r a n s forms, there are two a d d i t i o n a l multi-word s e t s . A b i t i s on f o r each transform i n which the group takes part - i n the f i r s t set i f i t i s the f i r s t keying group and i n the second i f i t i s the second. L o g i c a l l y 'AND'ing the f i r s t group ε f i r s t set w i t h the second group's second set y i e l d s a set w i t h b i t s on f o r only these transforms. These b i t p o s i t i o n s are used as indexes i n t o a t a b l e o f addresses o f the q u a l i f i e r s for those a p p l i c a b l e transforms. While i t sounds r a t h e r complicated, i t r e a l l y i s not. What has been done i s t o generate an e x t e r n a l addressing s t r u c t u r e at assembly time. 1
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
PENSAK A N D COREY
27
LHASA 2 word*
—
Address o f Path 0 D i r e c t o r y Address o f Transform Address Table Count o f S p e c i a l Sets f o r t h i s Table Address o f Path 1 D i r e c t o r y
Group P a r t i c i p a t i o n S e t Symmetrical Trans fori Re connective Transforms Special s t r a t e g i c Directory
Subgoal Trans forms Simplifying
Sets
Trans fori
Disconnective Transforms 1st group as GROUP»1 1st group as GROUP*2
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
2nd group as GROUP*1 2nd group as GROUP*2 Group-transform sets
f
Transform Address Table
y
F i r s t Transform's
Qualifiers
X \y
Second Transform's
Qualifiers
Q u a l i f i eQrusa l i f i e r s TRemaining h i r d Transform's Group P a r t i c i p a t i o n S e t
The f i g u r e above shows the o v e r a l l s t r u c t u r e of the t a b l e . I t should be noted that i f you wish t o r e s t r i c t your search t o , f o r example, those transforms which break carbon-carbon bonds a l l that i s necessary i s t o define, at assembly time, a set t o i n d i c a t e t h i s c h a r a c t e r i s t i c and i n d i c a t e which transforms are a p p l i c a b l e . At run time, AND ing t h i s set with otherwise allowed transforms applies the r e s t r i c t i o n i n p a r a l l e l . This technique o f generating an e x t e r n a l addressing s t r u c t u r e when coupled w i t h Boolean opera t i o n s i s a q u i t e powerful and u s e f u l technique. ,
,
HOW DO YOU ADD TO THE DATA BASE The c r i t i c i s m has o f t e n been l e v e l l e d at LHASA that i t takes considerable time t o add t o the data
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
28
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
base. The D i e l s Alder t a b l e , f o r example, took: almost s i x man-months t o prepare and debug. This s e c t i o n w i l l describe the process o f b u i l d i n g a s o p h i s t i c a t e d data t a b l e l i k e the D i e l s Alder o r the Robinson Annelation and demonstrate that the s o p h i s t i c a t i o n o f the r e s u l t s obtained i s a d i r e c t f u n c t i o n o f the exhaustiveness and s p e c i f i c i t y o f the t a b l e s . ( I t i s worthwhile t o point out, however, that there are o f t e n times when naive chemistry proposed by LHASA i n s i t u a t i o n s where i t was not o r i g i n a l l y envisioned, has turned out t o be e x c e p t i o n a l l y i n t e r e s t i n g . ) A l l the r i n g transform packages i n LHASA employ binary search techniques. This means that a l l s t r u c t u r a l questions are t o be answered w i t h a yes o r a no. Preparation o f the sequence o f questions r e l a t i n g t o s t r a i g h t f o r w a r d chemical s i t u a t i o n s poses no r e a l problems. I t i s the i d e n t i f i c a t i o n and r e s o l u t i o n o f the e x t r a o r d i n a r y cases t h a t are d i f f i c u l t . For example, i n a Robinson disconnection f o r t h e sequence below, the geminal dimethyl s u b s t i t u t i o n i s a f o r m i dable problem.
0
I t i s up t o the chemist designing the t a b l e s t o f i r s t perceive that t h i s s i t u a t i o n might occur. Second, decide whether he wishes t o have the t a b l e s salvage the d i f f i c u l t s i t u a t i o n and i f he does, he has t o manually determine what kind o f chemistry should be attempted. The above example i s a c l e a r b l a c k o r white s i t u a t i o n . Unless the dimethyl s u b s t i t u e n t i s r e moved, the transform j u s t cannot proceed. The grey areas cause j u s t as much o f a dilemma f o r the chemist. In M a r s h a l l ' s synthesis o f isonootkatone two p o s s i b l e stereoisomers could have r e s u l t e d .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
1.
PENSAK A N D COREY
29
LHASA
à P r i o r i p r e d i c t i o n o f the stereochemical course o f a r e a c t i o n , even knowing the three dimensional s t r u c t u r e of the reagents i s q u i t e d i f f i c u l t , i f not impossible. When preparing a s e c t i o n o f the t a b l e d e a l i n g w i t h such an ambiguity the chemist i s faced w i t h two a l t e r n a t i v e s , disregard stereochemistry e n t i r e l y (and make sure that LHASA does not imply that any stereochemistry i s being s p e c i f i e d ) o r go i n t o the l a b o r a t o r y and run an experiment. The r e c o g n i t i o n o f t h i s s i t u a t i o n can o f t e n get the t a b l e w r i t i n g chemist t h i n k i n g and has sometimes even suggested s p e c i f i c reactions that should be run. As we have been adding t o the data base at Du Pont (to the one and two group t a b l e s ) , the ques t i o n has o f t e n been r a i s e d how much d e t a i l should we go i n t o i n the q u a l i f i e r s ? " This i s somewhat o f a dilemma, many o f the i n d u s t r i a l reactions we are d e a l i n g with have only been considered f o r a l i m i t e d number o f substrates. I t i s not c l e a r whether q u a l i f i e r s should be incorporated t h a t r e s t r i c t the transforms t o only those cases where i t i s known t o work o r whether only those which are known t o f a i l should be s p e c i f i c a l l y excluded. Both ways take an immense amount o f l i t e r a t u r e work t o do c o n s i s t e n t l y . M
We a l l know that butadiene can be dimerized under c a t a l y t i c c o n d i t i o n s t o a wealth o f d i f f e r e n t products. A d d i t i o n o f a l k y l substituents changes the product mix and introduces a v a r i e t y o f d i f f e r e n t stereoisomers as w e l l . What happens i f we put f u n c t i o n a l groups on butadiene. Do a l l the reactions s t i l l proceed - do you get any new ones, etc.? We do not know and s e r i ous doubt whether many experiments have ever been run
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
30
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
on such systems. The second a l t e r n a t i v e above has been chosen - q u a l i f i e r s are being used only t o exclude r e a c t i o n s known not t o work. As such, a l o t of naive chemistry comes out o f our v e r s i o n LHASA, some o f i t e x c e p t i o n a l l y poor but at l e a s t the analyses are reasonably comprehensive. In summary, adding simple r e a c t i o n s t o LHASA i s simple. I n c o r p o r a t i n g s o p h i s t i c a t e d r e a c t i o n s can be as complicated as you wish t o make i t . (Work i s c u r r e n t l y underway t o prepare a general package o f subgoal transforms which w i l l serve t o remove i n t e r ferences - r e l i e v i n g the chemist from having t o work them out separately f o r each super transform.) CONCLUSION Why i s Du Pont i n t e r e s t e d i n LHASA? The program was c l e a r l y designed f o r c a r b o c y c l i c n a t u r a l products synthesis i n mind - an area i n which the Company has only l i m i t e d i n t e r e s t . - Our attempt t o add i n d u s t r i a l s y n t h e t i c chem i s t r y t o LHASA i s f o r c i n g us t o organize our thoughts along l i n e s heretofore not done. We are being required t o look a t our r e a c t i o n s i n terms o f t h e i r known and unknown g e n e r a l i t y . This i n and o f i t s e l f i s highly b e n e f i c i a l . - Our pharmaceutical and agrichemical chemists have been using the n a t u r a l products aspects o f LHASA to generate new ideas, o f t e n not o f i n d u s t r i a l syn t h e t i c merit but c e r t a i n l y o f i n t e r e s t when l o o k i n g f o r ways t o make d e r i v a t i v e s , e s p e c i a l l y commonality of routes. - L a s t l y , we have already seen t h a t some o f our i n d u s t r i a l knowledge i s t u r n i n g out t o be u s e f u l t o organic synthesis i n other areas. For example, very few chemists o u t s i d e o f those who are a c t u a l l y using i t d a i l y are aware t h a t , given s u i t a b l e c a t a l y s t s , butadiene can dimerize i n t o the compound below - a
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
P E N S A K A N D COREY
LHASA
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
q u i t e a t t r a c t i v e (and inexpensive) f o r prostaglandin s y n t h e s i s .
J.
31
starting material
Our hope i s that LHASA w i l l help us t o insure that we have considered a l l reasonable routes t o our major products. ACKNOWLEDGMENTS The LHASA p r o j e c t has been i n existence since 1968. During the years a number o f e x t r a o r d i n a t e l y able graduate students and post d o c t o r a l f e l l o w s came t o work w i t h Professor Corey, almost a l l laboratory s y n t h e t i c chemists. Whether i t i s the i n f e c t i o u s enthusiasm f o r the p r o j e c t o r j u s t an enjoyment o f using the computer as a research t o o l , not one alumnus of the LHASA p r o j e c t has abandoned h i s involvement w i t h computers and returned t o bench chemistry. They are (with current l o c a t i o n s ) W. D. W. R. W. G.
J. Howe - Upjohn E. Barth - Harvard Business School L. Jorgensen - Purdue D. Cramer - Smith K l i n e and French Todd Wipke - Santa Cruz A. Petersson - Wesleyan W. Vinson - Harvard H. W. Orf - Harvard
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
32
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001
ABSTRACT Design o f complex organic syntheses is a task w e l l s u i t e d t o computer implementation. For a molecule o f moderate s i z e the number o f p o t e n t i a l s y n t h e t i c pathways is extremely l a r g e . Furthermore the number o f u s e f u l l a b o r a t o r y r e a c t i o n s is growing e x p l o s i v e l y . The LHASA program is a tool for syn t h e t i c chemists to aid in choosing the most reasonable routes t o any d e s i r e d molecule without exhaustive enumeration. The b a s i c s t r u c t u r e o f the program and the chemistry it employs are discussed g i v i n g s p e c i a l c o n s i d e r a t i o n t o the s t r a t e g i e s employed in s e l e c t i o n o f routes.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2 Computer Programs for the Deductive Solution of Chemical Problems on the Basis of a Mathematical Model of Chemistry
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
JOSEF BRANDT, JOSEF FRIEDRICH, JOHANN GASTEIGER, CLEMENS JOCHUM, WOLFGANG SCHUBERT, and IVAR UGI Organisch-Chemisches Institut, Technische Universität München, Arcisstr. 21, D-8 München-2, Germany
1.
Introduction
1.1
Computers and P r o g r e s s in Chemistry
In t h e p a s t decades t h e r e has been d r a m a t i c p r o g r e s s in c h e m i s t r y . The c o n c e p t u a l and theoretical approaches t o c h e m i s t r y have been decisively shaped by quantum mechanics. The r e s e a r c h t o p i c s and t e c h n i q u e s o f c h e m i s t r y have been changed in a p r o f o u n d manner by the advent o f modern s e p a r a t i o n and purification methods as w e l l as advances in structural d e t e r m i n a t i o n by t h e v a r i o u s types o f s p e c t r o s c o p y and X-ray crystallography. The use o f computers in c h e m i s t r y has a l s o p l a y e d an essential r o l e in this c o n t e x t . However, it is s a f e t o p r e d i c t t h a t t h e major c o n t r i b u t i o n s o f computers t o c h e m i s t r y a r e still t o be expected i n the future. Computers a r e a l r e a d y an important t o o l o f c h e m i s t r y . C o m p u t e r - a s s i s t e d documentation, the collection and e v a l u a t i o n o f e x p e r i m e n t a l d a t a , and quantum m e c h a n i c a l c a l c u l a t i o n s o f m o l e c u l a r p r o p e r t i e s predominate. The importance o f t h e solution o f c h e m i c a l problems such as the d e s i g n o f syntheses w i t h t h e aid o f computers is not y e t w i d e l y r e c o g n i z e d , but will have f a r - r e a c h i n g consequences and may w e l l l e a d t o r a t h e r fundamental changes in the activity o f c h e m i s t s . In
33 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
34
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
p a r t i c u l a r , the computer programs f o r the d e d u c t i v e s o l u t i o n o f c h e m i c a l p r o b l e m s on t h e b a s i s o f l o g i c a l s t r u c t u r e models and m a t h e m a t i c a l r e p r e s e n t a t i o n o f c h e m i s t r y have g r e a t p o t e n t i a l f o r the f u t u r e .
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
1.2
The S o l u t i o n o f C h e m i c a l P r o b l e m s on t h e o f T h e o r e t i c a l P h y s i c s and M a t h e m a t i c s
Basis
The m o l e c u l a r s y s t e m s , t h e s u b j e c t o f c h e m i s t r y , consist o f a r a t h e r s m a l l number o f b u i l d i n g b l o c k s , n a m e l y t h e c a . 100 c h e m i c a l e l e m e n t s w h i c h a r e c o m b i n e d a c c o r d i n g to r u l e s d e r i v e d from m a t h e m a t i c a l l y s t a t e d p h y s i c a l p r i n c i p l e s . The r e p r e s e n t a t i o n o f chemistry i n m a t h e m a t i c a l t e r m s seems t h e r e f o r e q u i t e n a t u r a l . T h i s has not yet been u t i l i z e d to the f u l l c o n c e i v a b l e extent. The e n e r g y h y p e r s u r f a c e d e s c r i b e s t h e e n e r g y o f c h e m i c a l s y s t e m s w i t h a g i v e n s e t o f atoms as a f u n c t i o n o f t h e s p a t i a l d i s t r i b u t i o n o f t h e a t o m i c n u c l e i . I f one c o u l d compute t h e c o m p l e t e e n e r g y h y p e r s u r f a c e f o r any s e t o f atoms and a n a l y s e t h e c o r r e s p o n d i n g d a t a i n a s u i t a b l e m a n n e r , one c o u l d p r e d i c t m o s t o f t h e relevant p r o p e r t i e s o f m o l e c u l a r systems w i t h the methods o f t h e o r e t i c a l p h y s i c s and m a t h e m a t i c s . T h i s , however, i s not f e a s i b l e i n f u t u r e , because the computational extremely l a r g e , even i n the case molecular systems.
the f o r e s e e a b l e e f f o r t w o u l d be of rather small
F o r t h e s o l u t i o n o f many c h e m i c a l p r o b l e m s t h e t h e o r e t i c a l t r e a t m e n t o f a few s e l e c t e d p o i n t s a n d t h e i r vicinity on t h e e n e r g y h y p e r s u r f a c e w o u l d s u f f i c e . Y e t , i n many c a s e s , o n e d o e s n o t know w h i c h p o i n t s a r e t h e r e l e v a n t o n e s . F o r c h e m i s t r y i t w o u l d be r a t h e r u s e f u l t o have a method f o r p r o v i d i n g a s u r v e y o f t h o s e p o i n t s and pathways on an e n e r g y h y p e r s u r f a c e which are e s s e n t i a l f o r the s o l u t i o n of a g i v e n problem, w i t h o u t t h e n e c e s s i t y o f a quantum m e c h a n i c a l t r e a t ment o f w i d e a r e a s o f a n e n e r g y h y p e r s u r f a c e . In t h e p r e s e n t p a p e r a s i m p l e m a t h e m a t i c a l model o f t h e c h e m i s t r y o f a g i v e n s e t o f atoms i s p r e s e n t e d w h i c h a f f o r d s p r e c i s e l y t h e l a t t e r , a n d may s e r v e a s a b a s i s o f computer programs f o r the d e d u c t i v e solution of a great v a r i e t y of chemical problems.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
2.
BRANDT
E T
AL.
Computer
Programs
2·
The C h e m i s t r y
2.1
Chemical Equivalence
for Chemical
of a Fixed
Set o f
Problems
35
Atoms
Classes
Any p r o g r e s s i n c h e m i s t r y may b e i n t e r p r e t e d a s t h e r e c o g n i t i o n o f new e q u i v a l e n c e r e l a t i o n s a n d c l a s s e s . U n t i l r e c e n t l y , a u n i v e r s a l model o f c h e m i s t r y could n o t b e d e f i n e d , b e c a u s e we l a c k e d t h e c o n c e p t s f o r a mathematical treatment of the c o n s t i t u t i o n a l aspect of c h e m i s t r y . T h e r e f o r e i t i s n e c e s s a r y t o c o n s i d e r any c o n s t i t u t i o n a l l y d i f f e r e n t m o l e c u l a r systems as e q u i v a l e n t r e g a r d l e s s o f t h e number o f m o l e c u l e s w h i c h they c o n t a i n , i f , i n p r i n c i p l e they are i n t e r c o n v e r t i b l e through chemical r e a c t i o n s .
2.2
Isomeric
Ensembles
of Molecules
The s e t o f a l l m o l e c u l e s c a n be p a r t i t i o n e d into e q u i v a l e n c e c l a s s e s w h o s e members h a v e a l l t h e same molecular formula, i . e . the equivalence classes of isomers. Isomerism i s here the r e l e v a n t equivalence relation. Extending the concept o f isomerism from molecules t o e n s e m b l e s o f m o l e c u l e s (EM) l e a d s t o a m a t h e m a t i c a l m o d e l o f c o n s t i t u t i o n a l c h e m i s t r y ^ . A n EM i s d e s c r i b e d by i t s l i s t o f t h e m o l e c u l a r s p e c i e s i n t h e f o r m o f s u i t a b l e c h e m i c a l f o r m u l a s . F o r a n EM o n e c a n d e f i n e two t y p e s o f e m p i r i c a l e l e m e n t a r y f o r m u l a s , o n o n e hand t h e e n s e m b l e f o r m u l a , on t h e o t h e r hand a p a r t i t i o n e d e m p i r i c a l ensemble formula which c o n s i s t s o f the m o l e c u l a r formulas o f t h e m o l e c u l e s i n t h e EM. 1
L e t A b e a s e t o f a t o m s . T h e n a l l E M ( A ) , i . e . a l l EM w h i c h c a n b e made f r o m A , h a v e t h e same e n s e m b l e f o r m u l a
. T h e p a r t i t i o n e d e m p i r i c a l f o r m u l a o f a n EM i s a p a r t i t i o n i n g { < A > , . . . < A >} o f A i n w h i c h e a c h i s t h e f o r m u l a o f a m o l e c u l e . Accordingly, a n E M ( A ; c o n s i s t s o f o n e o r more m o l e c u l e s , w h i c h a r e o b t a i n e d f r o m A , i f e a c h atom o f A i s u s e d e x a c t l y o n c e . t
S i n c e i s o m e r i c m o l e c u l e s may d i f f e r constitutionally and s t e r e o c h e m i c a l ^ , a p a r t i t i o n e d e n s e m b l e f o r m u l a g e n e r a l l y c o r r e s p o n d s t o more t h a n o n e E M ( A ) . T h e c o n s t i t u t i o n a l f o r m u l a o f a n EM(A) c o n t a i n s t h e c o n s t i t u t i o n a l formulas of a l l molecules i n that EM(A). An F I E M ( A ) , t h e f a m i l y o f i s o m e r i c e n s e m b l e s o f m o l e c u l e s o f an atom s e t A i s t h e c o l l e c t i o n o f a l l
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
36
E M ( A ) . An F I E M i s underlying set of
g i v e n by t h e atoms A .
empirical
ORGANIC SYNTHESIS
formula
of
the
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
A c h e m i c a l r e a c t i o n or a sequence of c h e m i c a l r e a c t i o n s , i s t h e t r a n s f o r m a t i o n o f a n FM i n t o a n i s o m e r i c E M . T h u s t h e w h o l e c h e m i s t r y o f a s e t o f atoms A i s g i v e n by t h e EM(A) and t h e i r i n t e r c o n v e r s i o n s w i t h i n t h e FIEM(A). An e n e r g y h y p e r s u r f a c e d e s c r i b e s a l l c h e m i c a l s y s t e m s w h i c h c o n t a i n a g i v e n s e t o f a t o m s . An F I E M ( A ) o f s t a b l e EM(A) c o r r e s p o n d s t o a f a m i l y o f e n e r g y m i n i m a on t h e e n e r g y h y p e r s u r f a c e o f t h e a t o m s e t A . S e q u e n c e s o f EM(A) w h i c h a r e c h e m i c a l l y i n t e r c o n v e r t e d correspond t o pathways on t h e e n e r g y h y p e r s u r f a c e o f A .
3.
BE-Matrices
In m o l e c u l a r systems t h e a t o m i c c o r e s a r e h e l d t o g e t h e r by v a l e n c e e l e c t r o n s w h i c h o c c u p y m o l e c u l a r o r b i t a l s i n v o l v i n g t w o , o r more a t o m s . A c o v a l e n t bond i s a p a i r o f v a l e n c e e l e c t r o n s i n a m o l e c u l a r o r b i t a l a b o u t two c o r e s . G e n e r a l l y , the chemical c o n s t i t u t i o n of a m o l e c u l a r system i s d e s c r i b e d by i t s c o v a l e n t l y b o u n d p a i r s o f a t o m i c c o r e s . The d i s t r i b u t i o n o f t h o s e v a l e n c e e l e c t r o n s w h i c h a r e n o t c o n t a i n e d i n c o v a l e n t c h e m i c a l b o n d s c a n be i n c l u d e d i n the d e s c r i p t i o n of a chemical c o n s t i t u t i o n . A c h e m i c a l c o n s t i t u t i o n i s u s u a l l y r e p r e s e n t e d by a c o n s t i t u t i o n a l f o r m u l a i n which the c h e m i c a l element symbols c o r r e s p o n d to the r e s p e c t i v e atomic c o r e s , the c o v a l e n t b o n d s a r e i n d i c a t e d by l i n e s b e t w e e n t h e e l e m e n t s y m b o l s , a n d t h e f r e e v a l e n c e e l e c t r o n s by d o t s at the atomic symbols. The c h e m i c a l c o n s t i t u t i o n o f m o l e c u l e s h a s a l s o b e e n r e p r e s e n t e d by v a r i o u s t y p e s o f m a t r i c e s . F o r t h e present purpose the BE-matrices are p a r t i c u l a r l y suitable. An η χ η
BE-matrix
Β =
\
(13)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
BRANDT
ET AL.
Computer
Programs
for Chemical
Problems
37
r e f e r s t o a m o l e c u l e o r EM w h i c h c o n t a i n s a n a t o m s e t A = { A ^ , . . . , A } w i t h η i n d e x e d a t o m s . T h e i - t h row a n d i - t h column of A.
of a BE-Matrix
An o f f - d i a g o n a l
entry
b..
belongs
to the i - t h
atom
i n t h e i - t h row a n d j - t h
-J-J
column
is
the formal
covalent
bond
order
between
the
atoms
A. and Α . . S i n c e t h i s i m p l i e s a l s o a bond from ι J A. to A . we h a v e b . . = b . . . T h u s B E - m a t r i c e s a r e J ι ^-J J ι symmetric. Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
3
T h e i - t h d i a g o n a l e n t r y o f a B E - m a t r i x i s t h e number f r e e valence e l e c t r o n s which belong to A^. T h e number a t o m A^ i s
of valence e l e c t r o n s which belong g i v e n b y t h e r o w - c o l u m n sum
to the
T h e c r o s s sum b . o v e r a d i a g o n a l e n t r y c o m p r i z e s e n t r i e s i n t h e i - t h row a n d c o l u m n s a n d i s e q u a l b.
=
2
b
i
- b
i
±
of
all to
.
T h e c r o s s sum b . i s t h e number o f v a l e n c e which occupy the v a l e n c e o r b i t a l s o f A^.
electrons
A t a b l e which c o n t a i n s f o r a l l ^ c h e m i c a l elements the a l l o w a b l e c o m b i n a t i o n s o f b^, b^, c o o r d i n a t i o n numbers and b o n d o r d e r s a f f o r d s a q u i c k c h e c k w h e t h e r a B E matrix represents a valence chemically stable molecular system. The
sum
S
=
£
b
i j
o v e r a l l e n t r i e s o f a B E - m a t r i x i s t h e t o t a l number v a l e n c e e l e c t r o n s i n t h e E M . F o r a l l EM o f a n F I E M t h i s number i s t h e s a m e .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
of
COMPUTER-ASSISTED
38
Unless the some r u l e , represented equivalent the atomic indices.
ORGANIC SYNTHESIS
i n d e x i n g o f t h e atoms i s f i x e d a c c o r d i n g t o a g i v e n chemical c o n s t i t u t i o n i s not only by one B E - m a t r i x , b u t t h e r e a r e up t o η ! B E - m a t r i c e s w h i c h d i f f e r by p e r m u t a t i o n s of i n d i c e s and t h e r e s p e c t i v e row/column
Any B E - m a t r i c e s Β a n d B w h i c h d i f f e r o n l y b y row/ c o l u m n p e r m u t a t i o n s d e s c r i b e t h e same E M . T h e t r a n s f o r m a t i o n o f a n η χ η B E - m a t r i x Β by row/column p e r m u t a t i o n i n t o an e q u i v a l e n t B E - m a t r i x B c a n be a c h i e v e d by a n η χ η p e r m u t a t i o n m a t r i x Ρ a n d i t s i n v e r s e , P^, a c c o r d i n g t o f
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
f
B
t
_
p
t
.
. ρ
B
F o r v a r i o u s p u r p o s e s , s u c h a s d o c u m e n t a t i o n , one n e e d s an u n a m b i g u o u s c o r r e s p o n d e n c e b e t w e e n a n EM a n d i t s B E - m a t r i x . T h e r e f o r e , we h a v e d e v e l o p e d p r o c e d u r e s for t h e a s s i g n m e n t o f a t o m i c i n d i c e s w h i c h l e a d t o an canonical BE-matrix. Among t h e e q u i v a l e n t of several molecules
B E - m a t r i x o f a n EM w h i c h c o n s i s t s i>**«> > there exist BE-matrices
M
M
m
i n b l o c k f o r m , s u c h t h a t e a c h b l o c k r e p r e s e n t s one c o n t i g u o u s m o l e c u l e . The b l o c k f o r m o f B E - m a t r i c e s u s e f u l f o r t h e s e p a r a t i o n o f m o l e c u l e s o f EM w h o s e B E - m a t r i c e s have b e e n g e n e r a t e d by a c o m p u t e r .
4.
The R e p r e s e n t a t i o n R-Matrices
of
Electron
Relocation
is
by
A c h e m i c a l r e a c t i o n i s t h e c o n v e r s i o n o f a n EM i n t o i s o m e r i c EM by r e l o c a t i o n o f v a l e n c e e l e c t r o n s .
an
S i n c e t h e t o t a l number o f v a l e n c e e l e c t r o n s S d o e s n o t c h a n g e i n a c h e m i c a l r e a c t i o n , i t must be t h e same f o r t h e i n i t i a l EM(B) and t h e f i n a l EM(E) o f a c h e m i c a l r e a c t i o n . I t f o l l o w s t h a t t h e sum o f e n t r i e s o f B E m a t r i c e s must b e i n v a r i a n t u n d e r t h o s e B E - m a t r i x t r a n s f o r m a t i o n s Β -> Ε w h i c h r e p r e s e n t c h e m i c a l r e a c t i o n s EM(B) + EM(E). L e t Β and Ε be t h e B E - m a t r i x o f t h e s t a r t i n g EM(B) and t h e f i n a l E M ( E ) o f a c h e m i c a l r e a c t i o n . T h e n a n Rm a t r i x ( r e a c t i o n m a t r i x ) i s d e f i n e d by t h e transfor-
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
Computer
BRANDT E T A L .
The
sum o f
the
entries
Programs
for Chemical
r..
an R - m a t r i x
of
Problems
39
is
because
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
s
Σ
e..
=
Σ .. b
r
Σ b..
T h e m a t r i x R = Ε - B must b e s y m m e t r i c b e c a u s e m a t r i c e s Β a n d Ε a r e s y m m e t r i c by d e f i n i t i o n .
the
BE-
T h e m a t r i x R = -R i s t h e i n v e r s e o f R. T h e t r a n s f o r m a t i o n Ε + R = Β r e p r e s e n t s t h e r e a c t i o n E M ( Ε ) -> E M ( Β ) , i . e . t h e r e t r o - r e a c t i o n o f EM(B) EM(E) . An R - t r a n s f o r m a t i o n Β + R = Ε r e p r e s e n t s a c h e m i c a l r e a c t i o n i f , and o n l y i f i t obeys t h e m a t h e m a t i c a l f i t t i n g condition e.. = b.. + r.. > 0 IJ IJ i J f o r a l l e n t r i e s b e c a u s e , by d e f i n i t i o n , a B E - m a t r i x rrust have n o n - n e g a t i v e e n t r i e s . F u r t h e r m o r e , the r e s u l t Ε o f an R - t r a n s f o r m a t i o n o f a B E - m a t r i x Β o f a s t a b l e EM(B) w i l l represent a s t a b l e EM(E), i f the e n t r i e s e . . of Ε have v a l u e s t h a t are p e r m i s s i b l e f o r atoms A . and A . ( " c h e m i c a l f i t t i n g " ) .
the
respective
A c c o r d i n g l y , the p o s i t i v e e n t r i e s b . . of a given BE m a t r i x Β may b e u s e d t o d e t e r m i n e the negative e n t r i e s of mathematically f i t t i n g R-matrices. The p o s i t i v e e n t r i e s must be s e l e c t e d t o y i e l d a n R - m a t r i x 1
with can of
. .
r
i j
= 0.
Moreover,
the
J
entries
of
an
be s e l e c t e d t o meet t h e v a l e n c e c h e m i c a l t h e c h e m i c a l e l e m e n t s i n Ε = Β + R.
R-matrix restrictions
T h u s i t i s p o s s i b l e t o f i n d f o r a g i v e n EM a l l o f i t s c h e m i c a l r e a c t i o n s a n d t h e i r p r o d u c t s by t h e t r a n s f o r m a t i o n p r o p e r t i e s o f the B E - m a t r i c e s , and t h e v a l e n c e c h e m i c a l c o n s t r a i n t s o f t h e l a t t e r . The a p p l i c a t i o n o f a l l m a t h e m a t i c a l l y and c h e m i c a l l y f i t t i n g R-matrices to a BE-matrix generates the BE-matrices o f the whole FIEM.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
40
COMPUTER-ASSISTED
4.2
R-Categories
The t r a n s f o r m a t i o n o f a B E - m a t r i x by corresponds to a chemical r e a c t i o n . The
off-diagonal
the
number
and
a negative
electrons entries Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
ORGANIC SYNTHESIS
the
r..
broken
atoms
bonds
entries
covalent
diagonal
= r..
-LJ
covalent i i numbers
of
negative
A^
are
entry lose.
the
J -L
Α.-Α.,
r..
bonds r^ The
= r..
how
positive of
the
positive
R-matrix
indicate
between
tells
numbers
and the
a fitting
A.
and
many
A.,
free
off-diagonal newly
made
diagonal
entries
to the i n c r e a s e s i n f r e e e l e c t r o n atoms Α . . ι G e n e r a l l y , a n R - m a t r i x d o e s n o t o n l y f i t o n e , b u t many B E - m a t r i c e s . A c c o r d i n g l y , an R - m a t r i x does not o n l y r e p r e s e n t an i n d i v i d u a l c h e m i c a l r e a c t i o n , b u t a whole c a t e g o r y o f r e a c t i o n s w h i c h h a v e i n common t h e e l e c t r o n r e l o c a t i o n p a t t e r n r e p r e s e n t e d by t h e R - m a t r i x ) . An R - c a t e g o r y i s an e q u i v a l e n c e c l a s s o f c h e m i c a l r e a c t i o n s w h i c h h a v e i n common t h e same e l e c t r o n r e l o c a t i o n p a t t e r n and c e r t a i n f e a t u r e s o f t h e p a r t i c i p a t i n g bond s y s t e m s . The row/column p e r m u t a t i o n e q u i v a l e n c e o f B E - m a t r i c e s i m p l i e s t h a t R - m a t r i c e s r e p r e s e n t t h e same r e a c t i o n s when t h e y a r e i n t e r c o n v e r t e d b y row/column r
c
o
r
r
e
s
P ° ^ at the n
<
2
permutation
a c c o r d i n g to
P EP t
= P (B t
+ R)
Ρ = P BP + t
P RP. t
R - m a t r i c e s b e l o n g t o t h e same R - c a t e g o r y i f t h e y a r e i n t e r c o n v e r t e d b y r e d u c t i o n o r e x p a n s i o n , i . e . by r e m o v a l o r a t t a c h m e n t o f rows and columns c o n t a i n i n g only zeros. The e l i m i n a t i o n o f a l l z e r o s f r o m an R - m a t r i x irreducible R-matrix.
rows and columns c o n t a i n i n g y i e l d s the c o r r e s p o n d i n g
Any two R - m a t r i c e s R a n d R b e l o n g t o t h e same c a t e g o r y i f t h e r e e x i s t s an R - m a t r i x R which t r a n s f o r m e d int,o R a c c o r d i n g t o !
f t
R
= P R t
T
only
Rc a n be
*P,
and f r o m w h i c h R c a n be o b t a i n e d by r e m o v a l o r a t t a c h m e n t o f rows and columns c o n t a i n i n g z e r o s T
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
only.
2.
Computer
BRANDT E T A L .
Programs
for Chemical
Any two c h e m i c a l r e a c t i o n s w h i c h a r e t h e same i r r e d u c i b l e R - m a t r i x b e l o n g R-category.
41
Problems
represented t o t h e same
by
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
W i t h few e x c e p t i o n s , t h e s y n t h e t i c a l l y important organic chemical reactions proceed with a r e l o c a t i o n of e l e c t r o n s i n v o l v i n g up t o s i x a t o m s . I n s u c h r e a c t i o n s up t o t h r e e b o n d s a r e b r o k e n , a n d / o r n e w l y m a d e , i n some c a s e s a c c o m p a n i e d by a s i m u l t a n e o u s c h a n g e o f t h e f o r m a l e l e c t r i c a l c h a r g e by +1 a t one a t o m a n d - 1 a t a n o t h e r one. Such r e a c t i o n s b e l o n g t o R - c a t e g o r i e s w h o s e R - m a t r i c e s h a v e up t o t h r e e o f f - d i a g o n a l p a i r s o f p o s i t i v e and n e g a t i v e e n t r i e s r . . = r.. = +1, and
r^j
r^
= ± 2,
placed zero,
= r.^
in
= -1.
The
non-zero
corresponding
to
s u c h a manner
except
one
T a b l e 1 shows zero e n t r i e s .
non-radical
that
row/column
all pair
such R-matrices
as
T h e f o l l o w i n g r e a c t i o n s (-•) o r (<=) r e s p e c t i v e l y , are examples l i s t e d i n T a b l e 1. R-Cat.
1:
R-Cat.
2:
Li -C
R-Cat.
3:
J C
R~Cat.
4:
C H N=C:
iθ
Ρ: L
L
k e
J
+
H
J
+ Η
+
3
0
Ϊ
Η
Θ
entries
reactions,
row/column with
r.
lists
sums
=£ r . .
of
their
j®
—
+ H C H
:J
J
3
k
+ H C O H
t e
i
i
k
j
R-Cat.
fc
+ ci-ci «= c H fe
Ϊ,Ο + : 0 - Η
V
θ
\
5
Is
P - H
5: H , C - c f · '
>-Cl
l
HSCJTN'
H C£-N=C:
6:
HC = C Η I I Hp — C H
HC— C Η
+ C1
5
J
R-Cat.
.CI
k
5
L G
J
z
H^C
are
are = t non
their retro-reactions f o r the R - c a t e g o r i e s
—Li*"*
H
:
diagonal
C
H
7
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
42
3
ο
j
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
Ο
J
I
\
ν>ι
ι
1
)
W
μ* I
1-3 Ρ
?
ο φ
μ^ M h-^
\
to
I
\
J
P
W
I
\
)
P
P
K
P
O
ΓθΓ\)θο<ιμ*Γ\)θμ* I
I
Broken
No. o f Made
Bonds
No. o f P a r t i c i p a t i n g Free Electrons C_i.
μ*
+ "Τ" μ*
No. o f Bonds
? Ο
h3
cr
Φ
ι
μ*
et Η· Ο
+ μ*
Φ
W
μ^
Ο Μ) C/3 Ο
et
Φ
<<: Ο CD W
+ μ*
ο S5 Ο
I
α. ο ρ
ι μ*
—*—+—+~ + Γ Ο Γ Ο [V)
IV)
IV)
+ IV)
+ rv)
IV)
I IV)
s: hΗ· Ρ e t OQ
I
rv) s-S
Ρ Ω et Η· Ο
1
CD e t
IV)
Φ
CO
w μ. Φ
I
IV)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
BRANDT
ET
ClH2
II.
and
R-Cat.
Computer
AL.
CJH2
7:
Brk
+
I
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
HCX n
L
HC-—N-C-H5 II· I
-
ο
θ
li
C
"ι ^V X c i
EC
Ο
y
lt/
+ :0-H " ~ * " I
C 1
HC^
x
n
Η and
θ
Ι;
Η
HCI
d
HzCJBrL
HC=N-Ct,H5 I; .
Ο
Problems
H2CLBrk
BrL
H
a
Programs for Chemical
Η
C l - C ^ r C H j - C H î N Î C H j ) 2 — • C l : % CH=CHZ + C H = N " ( C H P Z I^-N=Ct
:C=N-C^3 φ =
I^N=C—Ci-N-qp,
0-C(CHj) 2 and
|k
+ .-P^OCH^
HjC
|| k
X)J Φί
HjC^ ^
J
C=N-0-SOp H t
R - C a t . 11:
c!. HC'
I
«
HC.
V
CH H
^
( C H ) N - C H HrC ^ \y H-C-C-H ?
_
s
v
φ +
"
H
0-H
+
^ CH
Ο
#·1 j
η
( H £ l N = C - H H C-H \ , / H-C=C-H Z
R - C a t . 10:
^(OCHPJ
:C=N I
+
J
:0 S0 C% L
2
A
HC^ CH
I
ι
HC^ CH
XT
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
44
COMPUTER-ASSISTED
R - C a t . 1 2 : ^
^
A
H
Hi
+
—
^
C
II L
ORGANIC SYNTHESIS
V
li
Ho Reaction
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
4.3
Types
C u s t o m a r i l y t h e term " r e a c t i o n " does not r e f e r t o an i n d i v i d u a l c h e m i c a l r e a c t i o n , but a c o l l e c t i o n of r e a c t i o n s whose members h a v e i n common a c e r t a i n p a t t e r n o f e l e c t r o n f l o w i n v o l v i n g c e r t a i n k i n d s o f atoms and b o n d s , d i f f e r i n g o n l y by t h o s e m o i e t i e s o f t h e r e a c t i n g m o l e c u l e s w h i c h do n o t p a r t i c i p a t e i n t h e p r o c e s s i n a d i r e c t manner. Thus t h e t e r m 3 - e l i m i n a t i o n c o m p r i s e s a l a r g e number o f i n d i v i d u a l r e a c t i o n s , s u c h a s
CH
3
OH I -CH-CH.
;>
CH^-CH^l
+ CH -CH=CH
3 T
-> C H = C H 2
CH^-CH=CC1-CH
3
0
C
\
T
3
H
0
+
Ho0
2
+ HC1
2
-> CH - C = C - C H . , + HC1
3
OH H I I C H — C — N-CH
d. ο
T
2
3
3
+ C H -C=N-CH n
c
2 5 I
H
T
+
3
Ho0
2
In o r d e r t o c l a s s i f y c h e m i c a l r e a c t i o n s i n a c o n s i s t a n t m a n n e r w h i c h i s s u i t a b l e f o r c h e m i c a l p u r p o s e s , we d e f i n e the f o l l o w i n g h i e r a r c h y of equivalent classes: R - c a t e g o r y ^ R A - t y p e ^ R B - t y p e r > R l - t y p e z> R 2 - t y p e 1)
Any
two
R-matrices
R-category
if
their
R
1
and R
2
belong
irreducible
to
forms
the
. . . etc .
same
R ° i and
R
0
2
d i f f e r o n l y by r o w / c o l u m n p e r m u t a t i o n s . 2) Any two r e a c t i o n s b e l o n g t o t h e same R A - t y p e , i f t h e y b e l o n g t o t h e same R - c a t e g o r y , a n d i f t h e i r n o n - z e r o e n t r i e s r e f e r t o t h e same c o m p o n e n t s o f a n a s s o c i a t e d v e c t o r of atoms. 3) Any two r e a c t i o n s o f t h e same R A - t y p e b e l o n g t o t h e same R B - t y p e , i f t h e y h a v e t h e same p o s i t i v e entries
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
BRANDT
ET
Computer
AL.
Programs
for Chemical
Problems
45
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
i n the a f f e c t e d B E - m a t r i c e s , where t h e R - m a t r i c e s have n o n - z e r o entries. 4) Any two r e a c t i o n s o f t h e same R B - t y p e b e l o n g t o t h e same R l - t y p e , i f t h o s e e n t r i e s o f t h e B E - m a t r i c e s whose e n t r i e s a r e a f f e c t e d by t h e R - m a t r i x r e f e r t o t h e same c o m p o n e n t s o f t h e a t o m v e c t o r . 5) R 2 - , R 3 - j . . . t y p e s may b e d e f i n e d i n a n a l o g y t o t h e R l - t y p e s by f u r t h e r c l a s s i f y i n g a c c o r d i n g t o t h e second, t h i r d e t c . sphere of n e i g h b o r i n g atoms. T h i s h i e r a r c h i c c l a s s i f i c a t i o n o f c h e m i c a l r e a c t i o n s by t h e i r R- a n d B E - m a t r i c e s may n o t o n l y s e r v e a s a means o f f o r m a l o r d e r i n g o f r e a c t i o n s and as a b a s i s o f d o c u m e n t a t i o n s y s t e m s , b u t can a l s o s e r v e as a d e v i c e in the systematic c o m p u t e r - a s s i s t e d deductive search f o r new c h e m i c a l r e a c t i o n s , by a n a l g o r i t h m w h i c h f i n d s all o f t h e m a t h e m a t i c a l l y and c h e m i c a l l y f i t t i n g p a i r s ( Β , E) o f B E - m a t r i c e s f o r a r e p r e s e n t a t i o n R - m a t r i x o f an R - c a t e g o r y .
G e o m e t r i c and G r o u p - T h e o r e t i c a l t u t i o n a l Chemistry
5.
Aspects
of
Consti-
The g e o m e t r i c and g r o u p t h e o r e t i c a l a s p e c t s o f t h e B E and R - m a t r i c e s a r e i m p o r t a n t f o r t h e s o l u t i o n o f c h e m i c a l problems. 1)
. in will
S i n c e these a s p e c t s have been d i s c u s s e d elsewhere d e t a i l , a b r i e f o u t l i n e of the e s s e n t i a l f e a t u r e s suffice here.
The B E - a n d t h e R - m a t r i c e s b e l o n g t o S ( n ) , t h e additive group of a l l η χ η symmetric m a t r i c e s w i t h i n t e g r a l e n t r i e s . L e t P . . be a n η χ η m a t r i x i n w h i c h a l l entries
are
zero,
except
a
"one"
in
the
υ ί
ϋ Ι ί
= 1,
S(n)
whose r a n k
(i,
j)-position.
Then {
is
P
ij
+
1
V' "^'
a basis
of
n(n
the -
η
}
group
1)
ρ
+ η = n(n
2
is
+ 1) 2
We d e n o t e
by
points
an e u c l i d e a n
of
..., η}
Z
s
the
group
whose
elements
s-dimensional
are
the
space R , s
lattice
the
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
46
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
group o p e r a t i o n b e i n g v e c t o r a d d i t i o n . Then, Z , i s a f r e e a b e l i a n group of rank s w i t h ( 1 , 0 . . . 0 ) , ( 0 , 1 , 0 . . . 0 ) ( 0 , . . . , 0 , 1 ) a s a b a s i s . Two f r e e a b e l i a n g r o u p s a r e i s o m o r p h i c i f , a n d o n l y i f , t h e y h a v e t h e same rank. Therefore, Z a n d S ( n ) a r e i s o m o r p h i c . The i m b e d d i n g o f zs i n R i s a geometric i n t e r p r e t a t i o n of S(n). s
s
s
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
The B E - m a t r i c e s a r e c o n t a i n e d i n B ( n ) < S ( n ) , t h e s e t o f a l l η χ η symmetric m a t r i c e s w i t h n o n - n e g a t i v e integral e n t r i e s . The R - m a t r i c e s b e l o n g t o R ( n ) S ( n ) , t h e g r o u p o f a l l η χ η s y m m e t r i c i n t e g r a l m a t r i c e s whose sum o f entries is zero. In the mapping π : Β ( η ) R , π | B(n) | i s v i s u a l i z e d as a cone i n R w i t h t h e v e r t e x a t t h e o r i g i n , and π | R ( η ) | l i e s on a l i n e a r s u b s p a c e g o i n g t h r o u g h t h e o r i g i n and h a v i n g no o t h e r p o i n t i n common w i t h π | Β ( η ) | . s
s
The
mapping P:B(n) b
matrices dean The of
)}
2 1
...,
yields
an FIEM i n
R
n 2
b
l
n
;
b
2 1
,
t> , n l
an i m b e d d i n g o f ,
the
BE-
an n - d i m e n s i o n a l
eucli-
2
space. entries
a vector
nates of
Β of
-> { ( b
of
the
b..
of
Β c a n be
in
R
, or
a point
BE-matrix
n
P(B)
c o n s i d e r e d as t h e
also
in
R
n 2
as t h e
.
cartesian
We c a l l
P(B)
the
component? coordi BE-point
B. 2
n S i m i l a r l y , an R - m a t r i x R r e p r e s e n t s a v e c t o r i n R . A c h e m i c a l r e a c t i o n w h i c h i s r e p r e s e n t e d by t h e t r a n s f o r m a t i o n Β + R = Ε c a n be g e o m e t r i c a l l y interpreted by t h e v e c t o r R f r o m t h e B E - p o i n t P(B) t o t h e B E - p o i n t P(E). The
sum o f D(B,E)
the = g
absolute |b±J
-
values e..|
=
of
the
entries
of
R,
£.|r..|
i s e q u a l t o t w i c e t h e number o f v a l e n c e e l e c t r o n s t h a t p a r t i c i p a t e i n t h e r e a c t i o n EM(B) E M ( E ) . We c a l l D ( B , E ) t h e c h e m i c a l d i s t a n c e between P(B) and P ( E ) , o r EM(B) and E M ( Ε ) , r e s p e c t i v e l y . Note t h a t c h e m i c a l distances r e f e r to R , and t o R . n
s
The o r i g i n o f t h e c o o r d i n a t e s y s t e m i n R corresponds t o t h e z e r o m a t r i x . S i n c e t h e sum o f e n t r i e s S = r i j i j > n
b
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
BRANDT E T A L .
Computer
Programs
for Chemical
Problems
47
t h e number o f v a l e n c e e l e c t r o n s , i s t h e same f o r a l l B E - m a t r i c e s o f an F I E M , a n d F I E M i s mapped o n t o a l a t t i c e of points with non-negative i n t e g r a l coordinates in R , l y i n g on a s e g m e n t o f a " h y p e r s h e r e s u r f a c e " whose r a d i u s i s S = D ( B , 0 ) . n
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
T h i s r e m a r k a b l e t o p o l o g i c a l o r d e r o f t h e FIEM i s not only t h e o r e t i c a l l y a p p e a l i n g , but a l s o d i d a c t i c a l l y u s e f u l , because t h i s order r e v e a l s w e l l - d e f i n e d u n i v e r s a l l o g i c a l s t r u c t u r e s i n t h e immense w e a l t h o f individual facts. In p a r t i c u l a r , the o r d e r p r o v i d e s a b a s i s f o r t h e computer-oriented r e p r e s e n t a t i o n of molecular systems, and a l l o w s t o f o r m u l a t e c h e m i c a l f a c t s and p r o b l e m s i n a manner w h i c h i s w e l l - s u i t e d f o r t h e i r m a n i p u l a t i o n by c o m p u t e r s . 6.
MATCHEM
The m a t h e m a t i c a l m o d e l o f c o n s t i t u t i o n a l c h e m i s t r y which has been d e s c r i b e d i n the p r e c e e d i n g s e c t i o n s can be u s e d as a b a s i s f o r a m o d u l a r s y s t e m o f c o m p u t e r programs f o r the d e d u c t i v e s o l u t i o n of c h e m i c a l problems. I n i t i a l l y , t h e m a t h e m a t i c a l m o d e l was u t i l i z e d f o r C I C L O P S ^ , a p i l o t program f o r s y n t h e t i c d e s i g n . The i n s i g h t t h a t t h e m a t h e m a t i c a l m o d e l may s e r v e f o r the c o m p u t e r - a s s i s t e d s o l u t i o n o f a wide v a r i e t y of c h e m i c a l p r o b l e m s l e d t o t h e d e s i g n o f MATCHEM, a m o d u l a r s y s t e m o f c o m p u t e r p r o g r a m s , whose i n d i v i d u a l p a r t s may b e c o m b i n e d i n d i f f e r e n t ways t o s u i t d i f f e r e n t p u r p o s e s . The o r i g i n a l s y n t h e t i c d e s i g n p r o g r a m C I C L O P S was m o d i f i e d v i a t h e i n t e r m e d i a t e s t a g e MATSYN t o y i e l d t h e p r e s e n t s y n t h e t i c d e s i g n p r o g r a m EROS ( E l a b o r a t i o n o f R e a c t i o n s f o r O r g a n i c S y n t h e s i s ) w h i c h s e r v e s a s i m i l a r p u r p o s e as SECS ) and t h e o t h e r r e a c t i o n l i b r a r y o r i e n t e d s y n t h e t i c d e s i g n programs LHASA \ SYNC HEM °) , e t c . I n c o n t r a s t t o t h e p i l o t p r o g r a m C I C L O P S , i n EROS t h e m o d u l a r t y p e s t r u c t u r e i s emphasized more, i n o r d e r to enable the c o m b i n a t i o n w i t h o t h e r p a r t s o f t h e MATCHEM s y s t e m . 6
7
T h e s y n t h e t i c d e s i g n p r o g r a m EROS a n d i t s p r e d e c e s s o r s g e n e r a t e a t r e e o f B E - m a t r i c e s s t a r t i n g f r o m one B E matrix which r e f e r s to the s y n t h e t i c t a r g e t . This t r e e may b e i n t e r p r e t e d a s a t r e e o f s y n t h e t i c p a t h w a y s . If, however, the i n i t i a l m a t r i x does not r e p r e s e n t a
American Chemfeaf Society Library 1155 16th St. N. W. In Computer-Assisted Organic Synthesis; Wipke, W., el al.; Washington, D. C. 20031 ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
48
ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
s y n t h e t i c t a r g e t and i t s b y - p r o d u c t s , but a g i v e n c h e m i c a l compound i n c o m b i n a t i o n w i t h a l i s t o f p o t e n t i a l r e a c t a n t s , the generated t r e e of B E - m a t r i c e s can be i n t e r p r e t e d i n t e r m s o f s e q u e n c e s o f p r o d u c t s o b t a i n a b l e f r o m t h e i n i t i a l r e a c t a n t s . In t h i s f o r m t h e p r o g r a m f i n d s new u s e s f o r c h e m i c a l c o m p o u n d s , e . g . industrial by-products, or permits the p r e d i c t i o n of t h e c o n c e i v a b l e f a t e o f a c h e m i c a l compound i n t h e environment. The m a t h e m a t i c a l m o d e l a f f o r d s a l s o e n t i r e l y different a p p r o a c h e s t o c h e m i c a l p r o b l e m s . I f t h e i n i t i a l and f i n a l e n s e m b l e o f m o l e c u l e s , EM(B) a n d E M ( Ε ) , o f a chemical r e a c t i o n , or a sequence of r e a c t i o n s are k n o w n , o r a t t a i n e d by a n e d u c a t e d g u e s s , s u c h a s a comparison of substructures ( s e e s e c t i o n 5.1.3) t h e d i f f e r e n c e of the c o r r e s p o n d i n g B E - m a t r i c e s E-B = R c a n be a n a l y z e d t o y i e l d t h e p a t h w a y s o f individual s t e p s w h i c h l e a d f r o m EM(B) t o E M ( E ) . T h e s e p a t h w a y s may be s e q u e n c e s o f i n t e r m e d i a t e s i n a r e a c t i o n mechanism, or a l s o sequences of s y n t h e t i c inter m e d i a t e s . The p r e c o n d i t i o n f o r t h i s a p p r o a c h i s t h a t t h e i n d i c e s o f t h e a t o m s i n EM(B) a n d E M ( Ε ) a r e a p p r o p r i a t e l y a s s i g n e d . The r e q u i r e d a s s i g n m e n t o f a t o m i c i n d i c e s i s d i s c u s s e d i n s e c t i o n 5.3.1). A l l o f t h e s e a p p l i c a t i o n s o f the m a t h e m a t i c a l m o d e l , as w e l l as t h e s e a r c h o f B E - m a t r i x p a i r s ( Β , E) w h i c h f i t a g i v e n R - m a t r i x w i l l be c o n t a i n e d i n MATCKEM. I n t h e now f o l l o w i n g s e c t i o n s some r e c e n t l y p a r t s o f MATCHEM w i l l b e p r e s e n t e d .
6.1
Manipulation
6.1.1 C a n o n i c a l
of
implemented
BE-Matrices
Ordering
A p r e c o n d i t i o n f o r an e f f i c i e n t m a n i p u l a t i o n o f B E m a t r i c e s i n the computer i s a c a n o n i c a l i n d e x i n g o f t h e atoms i n a m o l e c u l e . In o r d e r t o g e n e r a t e a u n i q u e n u m b e r i n g we u s e t h e c o n n e c t i v i t y m a t r i x o f t h e m o l e c u l a r g r a p h and t h e l a b e l s a l r e a d y a s s i g n e d t o its v e r t i c e s , i . e . the c h e m i c a l symbols. An atom k i s c o n s i d e r e d t h e j - t h n e i g h b o r o f i i f j i s t h e m i n i m a l number o f b o n d s w h i c h s e p a r a t e i a n d k. All a t o m s k w h i c h meet t h i s c o n d i t i o n a r e c a l l e d t h e j - t h n e i g h b o r h o o d o f i . The z e r o t h n e i g h b o r h o o d o f i i s a t o m i i t s e l f . We d e f i n e a n a t o m i c d e s c r i p t o r
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
as
BRANDT E T A L .
Computer
Programs
for Chemical
Problems
49
follows
11 a 1 2 a., is J neighborhood
where
descending
the
...|a22..•a31l 32 a
atomic
and a l l
order.
The
a.,
number for
of
the
operator
11
atom k i n same j
are
[" means
the
j-th
put
in
concatenation.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
The d . a r e c o n s t r u c t e d i n s u c h a manner t h a t a l l j - t h n e i g h b o r h o o d s a r e a l i g n e d . T h i s i s a c h i e v e d by i n s e r t i n g dummy n e i g h b o r s w i t h a t o m i c number o f z e r o i f necessary. T h e s e q u e n c i n g o f a t o m s i s d o n e by p u t t i n g t h e a t o m i c d e s c r i p t o r s i n d e s c e n d i n g o r d e r . Atoms w h i c h c a n n o t b e a s s i g n e d a u n i q u e number by t h i s p r o c e d u r e a r e c o n s t i t u t i o n a l l y e q u i v a l e n t . A f i n a l numbering of the a t o m s i s r e a c h e d by i n t r o d u c i n g a r b i t r a r y information. One a t o m i s p i c k e d o u t o f t h e h i g h e s t r a n k i n g g r o u p o f c o n s t i t u t i o n a l l y e q u i v a l e n t atoms and i t s a t o m i c number s e t h i g h e r t h a n t h o s e o f t h e o t h e r a t o m s . The a t o m i c d e s c r i p t o r s a r e m o d i f i e d a c c o r d i n g l y and s o r t e d a g a i n . The g r o u p o f c o n s t i t u t i o n a l l y e q u i v a l e n t atoms i s thus s p l i t up. If a unique numbering i s s t i l l not p o s s i b l e the m o d i f i c a t i o n of the atomic d e s c r i p t o r s i s r e v e r s e d a n d a n o t h e r a t o m o u t o f t h e now h i g h e s t r a n k i n g g r o u p o f e q u i v a l e n t atoms i s p i c k e d , and t h e p r o c e d u r e j u s t d e s c r i b e d i s r e p e a t e d u n t i l a l l atoms a r e a s s i g n e d a u n i q u e number. I t s h o u l d be n o t e d t h a t i t d o e s n o t m a t t e r w h i c h a t o m i s p i c k e d o u t o f a g r o u p o f a t o m s w h i c h c a n n o t be a s s i g n e d a u n i q u e number i n t h e f i r s t p l a c e b e c a u s e t h e s e atoms a r e i n f a c t c o n s t i t u t i o n a l l y equivalent. I n t h i s a s p e c t o u r new i n d e x i n g r o u t i n e d i f f e r s f r o m o u r p r e v i o u s a p p r o a c h ^ . Our new p r o c e d u r e d o e s n o t have t h e drawbacks o f o t h e r methods ) known, like i n s t a b i l i t y of the c a l c u l a t i o n s , o s c i l l a t o r y behaviour, n o n c o n v e r g e n c e and i n d e t e r m i n a n c y . 9
6.1.2
Direct
Access
to
Structures
and
Substructures
The B E - m a t r i x , t o g e t h e r w i t h i t s a s s o c i a t e d v e c t o r s (atoms, s t e r e o c h e m i c a l p a r i t y b i t s ) c o n t a i n s a l l s t r u c t u r a l i n f o r m a t i o n about a m o l e c u l e . In a v a r i e t y o f a p p l i c a t i o n s , s u c h i n f o r m a t i o n has t o be s t o r e d and
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
50
COMPUTER-ASSISTED
a c c e s s e d r a t h e r f r e q u e n t l y . Examples c a t i o n s are documentation systems or r e a c t i o n graphs (trees or networks).
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
6.1.2.1
Inter-Machine
ORGANIC SYNTHESIS
of such a p p l i the a n a l y s i s of
Manipulation
T h e s t o r a g e r e q u i r e m e n t s o f t h e f u l l B E - m a t r i x make i t r a t h e r u n s u i t a b l e f o r d i r e c t s t o r a g e and r e t r i e v a l . C o m p r e s s i o n c a n b e a c h i e v e d i n two s t a g e s , d e p e n d i n g upon t h e a p p l i c a t i o n . F o r d a t a exchange between d i f f e r e n t computer systems (with p o s s i b l y different c o d e s a n d f i l e s t r u c t u r e s ) , c o m p r e s s i o n down t o t h e l e v e l of characters (purely numerical) i s a useful c o m p r o m i s e . The c o m p r e s s i o n m e t h o d i s straightforward i n t h a t i t f i r s t a s s i g n s e a c h s t r u c t u r e ( i . e . an i n t a c t m o l e c u l e or a m o l e c u l a r fragment) a unique number, w h i c h may b e d e r i v e d f r o m some s o r t o f r e g i s t r y number, o r may b e g e n e r a t e d by t h e s y s t e m i t s e l f , i n w h i c h c a s e i t u s u a l l y w i l l c o n t a i n some g r a p h i n f o r m a t i o n , s u c h a s n o d e number o r l i n k p o i n t e r . The i n f o r m a t i o n t h a t i s c o n t a i n e d i n t h e B E - m a t r i x and its a s s o c i a t e d atom v e c t o r i s c o m p r e s s e d t o t h e mininum number o f b y t e s w h i c h i s n e c e s s a r y f o r t h e e x p e c t e d range of values f o r each of these items. Such a d a t a format a l l o w s r a t h e r e c o n o m i c a l machine i n d e p e n d e n t p a c k i n g and u n p a c k i n g o f B E - m a t r i c e s and i s u s e d f o r d a t a t r a n s f e r on s e q u e n t i a l l y o r g a n i z e d , byte oriented media.
6.1.2.2
Intra-Machine
Manipulation
I n t e r n a l h a n d l i n g o f l a r g e amounts o f s t r u c t u r a l d a t a p u t s more s e v e r e r e q u i r e m e n t s o n d a t a f o r m a t a n d o n d a t a o r g a n i s a t i o n ( i n t h i s p a r a g r a p h , use o f random a c c e s s m e d i a , s u c h a s c o r e memory b a n k s - p o s s i b l y w i t h page o r b l o c k o r g a n i s a t i o n - o r m u l t i - t r a c k d i s c s is assumed). S t r u c t u r a l i n f o r m a t i o n c a n be - a n d i n f a c t i s f u r t h e r c o m p r e s s e d by a . ) r e p l a c i n g s e q u e n c e s o f z e r o e n t r i e s by p o i n t e r s , b . ) r e d u c i n g t h e s t o r a g e s p a c e f o r t h e r e m a i n i n g e n t r i e s (atom s y m b o l : 7 b i t , d i a g o n a l elements: 4 b i t , o f f d i a g o n a l upper t r i a n g l e : 2 b i t ) .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
BRANDT E T A L .
Computer
Programs
for Chemical
51
Problems
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
F u r t h e r , the f i l e s c o n t a i n r e f e r e n c e s of d i f f e r e n t o r i g i n t o o t h e r e n t r i e s i n t h a t f i l e . These appear i n the form of n u m e r i c a l p o i n t e r s and, i f improperly o r g a n i z e d , c a n u s e up l a r g e s t o r a g e s p a c e . We d e v e l o p e d a f i l e o r g a n i z a t i o n to minimize storage requirement w h i c h makes u s e o f r e l a t i v e p o i n t e r s a n d p o i n t e r length c l a s s e s . Furthermore, short access paths from p o i n t i n g element (son) to p o i n t e d - a t element ( f a t h e r ) , are p o s t u l a t e d , i . e . minimization of core page-switching, o r d i s k h e a d movement respectively. I t h a s b e e n f o u n d t h a t t h e a b o v e two p o s t u l a t e s l e a d t o a d a t a o r g a n i z a t i o n , whereby sons a r e a s s i g n e d s t o r a g e i n t h e v i c i n i t y o f t h e i r f a t h e r . The a d d r e s s t o be a s s i g n e d i s d e r i v e d f r o m t h e d a t a t h e m s e l v e s by a s u i t a b l e h a s h - f u n c t i o n , which a s s i g n s linearly i n c r e a s i n g d i s t a n c e s f o r cases without c o n f l i c t , (addressed storage n o n - o c c u p i e d ) , but switches to f u n c t i o n s o f s e c o n d d e g r e e when c o n f l i c t s ( o v e r f l o w s ) a r e t o be h a n d l e d . S i n c e t h e r e a r e o n l y m o d e r a t e c o m p u t i n g r e q u i r e m e n t s , and s i n c e a c c e s s t i m e s on t h e storage media determines p r o c e s s i n g speed, a r a t h e r moderate computer i s needed f o r h a n d l i n g f a i r l y large a m o u n t s o f d a t a . We a r e a t p r e s e n t i n v e s t i g a t i n g t h e behaviour of a system with s e v e r a l thousand s t r u c t u r e s and t h e i r s u b s t r u c t u r e s ( s e e 5·1-3) on a " m i n i " c o m p u t e r o f t h e w e l l - k n o w n PDP11 f a m i l y t h a t i s equipped w i t h moving head d i s k packs t h a t have a s t o r a g e c a p a c i t y o f a b o u t two mega b y t e e a c h .
6.1.3
Hierarchically
Organized
Substructure
Files
I n a g r e a t many a p p l i c a t i o n s , a g i v e n s t r u c t u r e i s t r e a t e d as a combine o f i t s s u b s t r u c t u r e s . D e p e n d i n g on t h e c o n t e x t , s u b s t r u c t u r e s a p p e a r u n d e r t h e name of " f u n c t i o n a l group", "fragment", " r e a c t i o n i n v a r i a n t " "building block" etc. T y p i c a l a p p l i c a t i o n s are optimization strategies in synthesis planning, avoidance o f p r o h i b i t e d c o m b i n a t i o n s o f f u n c t i o n a l groups in generating chemical r e a c t i o n s , structure-activity correlations etc. Computer g e n e r a t i o n , s t o r a g e and r e t r i e v a l o f s u b s t r u c t u r e s i s g r e a t l y f a c i l i t a t e d , when t h e m a t h e m a t i c a l model of s t r u c t u r a l c h e m i s t r y i s employed to g e n e r a t e a h i e r a r c h i c a l l y organized substructure f i l e in a s y s t e m a t i c f a s h i o n . Such h i e r a r c h i c a l o r d e r i n g i s not only a p r e r e q u i s i t e i n avoiding d u p l i c a t i o n of
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
52
COMPUTER-ASSISTED
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
i n f o r m a t i o n and m i s s i n g l i n k s , i t f o r s e t t i n g up d a t a o r g a n i s a t i o n s previous chapter (5.1.2).
is as
ORGANIC SYNTHESIS
also indispensable d e s c r i b e d i n the
The a p p r o a c h t a k e n h e r e t r e a t s t h e o f f d i a g o n a l e n t r i e s o f a B E - m a t r i x as an o r d e r e d s e t o f numbers. E a c h e l e m e n t i n t u r n i s l o w e r e d by o n e , t h e r e s u l t i n g m a t r i x s e p a r a t e d i n t o b l o c k s and t h e d a t a f i l e i s t e s t e d f o r the presence of each r e s u l t i n g b l o c k . If the t e s t i s p o s i t i v e , then only a p o i n t e r to the g e n e r a t i n g s t r u c t u r e ( " f a t h e r " ) i s a d d e d t o t h e f i l e , a n d no f u r t h e r f r a g m e n t a t i o n i s n e e d e d , s i n c e a l l s u c c e s s o r s must b e i n the f i l e . If the t e s t i s n e g a t i v e , the son i s e n t e r e d i n t o t h e f i l e and i t s f a t h e r (and b r o t h e r if a n y ) a r e p u s h e d on a s t a c k f o r f u r t h e r t r e a t m e n t . E a c h e l e m e n t o f t h e s t a c k i s t r e a t e d i n t h e same way (thus becoming the f a t h e r of the next g e n e r a t i o n ) a f t e r the f i r s t son i s p r o c e s s e d . T h i s d e p t h - f i r s t fragmentation y i e l d s d a t a - f i l e s that f u l f i l the requirement of the p r e v i o u s c h a p t e r , i n t h a t sons a p p e a r as c l o s e as p o s s i b l e to t h e i r f a t h e r i n the g e n e r a t i o n p r o c e s s , thereby l e a d i n g to minimal pointer lengths. With such a f i l e s t r u c t u r e q u e r i e s take the form p a t h s w i t h i n a d i r e c t e d g r a p h and t h e r e f o r e a r e p r o c e s s e d w i t h a minimum o f c o m p u t i n g e f f o r t a n d extremely short response time.
6.2
Operating
6.2.1
The
with
Synthetic
of
R-Matrices Design
Program
EROS
Our p r e v i o u s s t u d i e s i n d e v e l o p i n g C I C L O P S , t h e p r o t o t y p e o f a s y n t h e t i c d e s i g n p r o g r a m , h a v e now l e d t o a new s t a g e . We a r e now p r e s e n t i n g EROS ( E l a b o r a t i o n o f R e a c t i o n s f o r O r g a n i c S y n t h e s i s ) which has matured to the p o i n t of becoming a r o u t i n e t o o l f o r the s y n t h e t i c chemist. A number o f o p t i o n s a l l o w s t h e u s e r t o d i r e c t t h e p r o g r a m t o meet h i s s p e c i f i c p r o b l e m s a n d n e e d s . Rather than t a k i n g the complete set of a l l mathematically possible r e a c t i o n matrices a s e l e c t i o n of only three R - m a t r i c e s i s c o n t a i n e d i n the program. These t h r e e R - c a t e g o r i e s i n c l u d e t h e m a j o r i t y o f a l l known s y n t h e t i c r e a c t i o n s and i t i s b e l i e v e d t h a t most o f t h e r e a c t i o n s w h i c h w i l l be d i s c o v e r e d i n t h e f u t u r e w i l l a l s o f a l l i n t h e s e t h r e e R - c a t e g o r i e s . So t h e a d v a n t a g e s o f t h e
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
BRANDT
E T
Computer
AL.
Programs
for Chemical
53
Problems
m a t h e m a t i c a l model o f p r o v i d i n g a l s o u n p r e c e d e n t e d r e a c t i o n s i s l a r g e l y r e t a i n e d . On t h e o t h e r h a n d s u b s t a n t i a l r e d u c t i o n of the output of u n l i k e l y intermediates i s achieved.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
EROS c a n b e a p p l i e d t o t h e s t u d y o f c h e m i c a l r e a c t i o n s i n two b a s i c a l l y d i f f e r e n t w a y s : One c a n i n p u t several m o l e c u l e s and g e n e r a t e t h e p r o d u c t s t o be e x p e c t e d i n t h e r e a c t i o n s o f t h e s e m o l e c u l e s . Or o n e c a n l o o k f o r all s y n t h e t i c r e a c t i o n s l e a d t o a t a r g e t compound w h i c h is input. R e a c t i o n s mechanism c a n be s t u d i e d by a l l o w i n g e l e c t r o n i c c o n f i g u r a t i o n s f o r d i s t i n c t atoms.
certain
The r e a c t i o n s i t e i s d e t e r m i n e d by s t a t i n g w h i c h b o n d s a r e b r e a k a b l e . T h e s e c a n be f o u n d by a s t a n d a r d r o u t i n e o r p r e s e l e c t e d b y t h e u s e r . F o r e a c h r e a c t i o n t h e energy i s c a l c u l a t e d u s i n g parameters obtained from thermoc h e m i c a l d a t a - ' ) . By d e f i n i n g e n e r g y l i m i t s , r e a c t i o n s can be r e j e c t e d on thermodynamic grounds. 1
6.2.2
0
Predictor
1 1
Systems
F r o m t h e p r e v i o u s c h a p t e r i t b e c o m e s o b v i o u s t h a t Rm a t r i c e s , when c o m b i n e d w i t h s u i t a b l e s e l e c t i o n r u l e s , are a general t o o l f o r p r e d i c t i n g chemical r e a c t i o n products. I n some a p p l i c a t i o n s , c h e m i c a l s e l e c t i o n r u l e s may b e l e s s s t r i n g e n t b e c a u s e some o t h e r e x t r a n e o u s selection a l l o w s f o r f u r t h e r r e d u c t i o n o f output to manageable size. One g r o u p o f a p p l i c a t i o n s a r e p r e d i c t o r s y s t e m s . S u c h systems have, i n g e n e r a l , a c c e s s t o s t r u c t u r e f i l e s , i . e . f i l e s of i n t a c t molecules or molecular fragments o f t h e n a t u r e d e s c r i b e d i n c h a p t e r s 5.1.2) and 5.1.3) whose e n t r i e s a r e s e l e c t e d u n d e r a g i v e n a s p e c t . A s p e c t s may b e t o x i c i t y , p h a r m a c e u t i c a l a c t i v i t y , environm e n t a l i m p a c t , a v a i l a b i l i t y as an unwanted by-product i n an i n d u s t r i a l environment e t c . In such cases query e n t r i e s a r e p r o c e s s e d through r e a c t i o n g e n e r a t o r s . T h e s e may b e t h e c o m p l e t e s e t o f R - c a t e g o r i e s , b u t o f c o u r s e a s u b s e t o f t h o s e c a n be s e l e c t e d i f t h e a p p l i c a t i o n j u s t i f i e s i t . Output o f t h e r e a c t i o n g e n e r a t o r s i s t h e n c h e c k e d a g a i n s t structure f i l e s and matches a r e o u t p u t t o t h e u s e r .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
54
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
One s u c h s y s t e m i s a t p r e s e n t b e i n g i m p l e m e n t e d u n d e r a r e s e a r c h c o n t r a c t w i t h a p u b l i c agency w i t h the purpose of d e t e c t i n g sources of environmentally active c h e m i c a l s among c h e m i c a l s r e g a r d e d a s h a r m l e s s , themselves. A l t h o u g h an e s s e n t i a l p a r t o f s u c h s y s t e m s , the r e a c t i o n g e n e r a t o r s , due t o t h e i r a l g e b r a i c n a t u r e , t a k e up o n l y a n e g l i g i b l e amount o f t h e t o t a l c o m p u t i n g e f f o r t . It has t h e r e f o r e been r e m a r k e d , not w i t h o u t j u s t i f i c a t i o n , t h a t i n many a p p l i c a t i o n s p r e d i c t o r s y s t e m s a p p e a r t o t h e u s e r a s a mere e n h a n c e m e n t o f the c a p a b i l i t i e s of a documentation system. It should be s t r e s s e d , h o w e v e r , t h a t d o c u m e n t a t i o n s y s t e m s w i l l be a b l e t o s u p p o r t p r e d i c t o r m o d u l e s o n l y i f t h e y m a i n t a i n the complete s t r u c t u r a l i n f o r m a t i o n i n a form t h a t i s c o m p a t i b l e w i t h , o r c o n v e r t i b l e t o an a l g e b r a i c a l l y d e f i n e d B E - m a t r i x . D e s i g n , or r e d e s i g n of systems should take t h i s f a c t i n t o a c c o u n t .
6·3
Generation
and C l a s s i f i c a t i o n o f
6.3.1
Determination
of
the
R-Matrices
Minimal Chemical
Distance
L e t EM(B) and EM(E) be t h e i n i t i a l and f i n a l e n s e m b l e of molecules of a c h e m i c a l r e a c t i o n or sequence of r e a c t i o n . This chemical conversion determines i n a c h a r a c t e r i s t i c manner a c o r r e l a t i o n o f t h e atoms i n EM(E) a n d E M ( B ) . We c o n j e c t u r e t h a t c h e m i c a l r e a c t i o n s p r o c e e d i n s u c h a m a n n e r t h a t a minimum o f v a l e n c e e l e c t r o n s i s r e l o c a t e d . This i s c r i t e r i o n f o r the c o r r e l a t i o n o f t h e atoms o f EM(E) t o t h e atoms o f E M ( B ) . We c a l l t h e number o f v a l e n c e e l e c t r o n s w h i c h must b e r e l o c a t e d d u r i n g t h e t r a n s f o r m a t i o n f r o m E M ( B ) t o EM(E) w i t h a g i v e n c o r r e l a t i o n o f atoms t h e c h e m i c a l d i s t a n c e D ( B , E) between t h e s e EM. M a t h e m a t i c a l l y the chemical d i s t a n c e D(B, f o r m u l a t e d i n t h e f o l l o w i n g way Let
Then
(b..)
be
the
BE-matrix
of
A
(e..)
be
the
BE-matrix
of
Z.
E)
c a n be
and
i> j = l
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
BRANDT
ET
Computer
AL.
Programs
for Chemical
Problems
55
F o r t h e c h e m i c a l l y m e a n i n g f u l r e p r e s e n t a t i o n E M ( B ) •> EM(E) t h e a t o m s o f E M ( B ) must b e a s s i g n e d t o a t o m s o f EM(E) i n s u c h a manner t h a t t h e c h e m i c a l d i s t a n c e D has i t s minimal value. This 1)
problem
several
ways
E x h a u s t i v e e n u m e r a t i o n : a l l a t o m s w i t h t h e same a t o m i c number a r e p e r m u t e d a n d t h e p e r m u t a t i o n corresponding to the minimal chemical d i s t a n c e i s taken. Let
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
c a n be a t t a c k e d i n
n
EM(E) atoms
2
atomic (n !)
number
0^.
k
method
of
atomic
,..(n !)
2
this
consist of
is
n^ a t o m s
number Then
of
0^,
atomic
. . .,
one h a s
to
carry
permutations.
Since
suitable
for
only
number
n^. a t o m s
10!
very
out
0^,
of (n^!)«
= 3 628
800
small
molecules. 2)
The c r i t e r i o n f o r m i n i m i z a t i o n o f t h e c h e m i c a l d i s t a n c e c a n be p a r t i t i o n e d i n t o t h e t h r e e c o m p l e mentary c r i t e r i a ( i ) , ( i i ) and ( i i i ) . (i) Atoms w i t h t h e same a t o m i c number n u m b e r s f r o m t h e same s e t o f n u m b e r s .
Example
In
I
(i)
I:
H^-C^N:.
^2
suffices
1
to
+
are
given
the
H-N^C:
3 12
minimize
the
chemical distance.
(ii) The atoms a r e a s s i g n e d i n s u c h a manner t h a t , f o r a t o m s w h i c h a r e a s s i g n e d t o e a c h o t h e r , a s many spheres o f n e i g h b o r s as p o s s i b l e c o r r e s p o n d . T h i s c r i t e r i o n f o l l o w s from the assumption, t h a t r e a c t i o n e a c h a t o m w i l l t r y t o m a i n t a i n a s many s p h e r e s o f n e i g h b o r s as p o s s i b l e .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
in
a
56
COMPUTER-ASSISTED
Example
ORGANIC SYNTHESIS
II:
H—
H—CjpH H.
- C l . + H - 0 H -* H - ^ C — C ^ — 0 - H H-^C 1 2 H / 5 |3 2 |3 H• ^ 5 H àr H H - Ç - H o
C
o
+ H-Cl.
1
6
H
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
H
In II ( i ) s u f f i c e s t o a s s i g n 0 and C l ; but f o r t h e assignment of the carbon-atoms ( i i ) i s necessary. (iii)
If there are c o n s t i t u t i o n a l l y symmetric molecules i n EM(B) a n d ( o r E M ( Ε ) ) t h e n a n a t o m A i n E M ( B ) c a n be a s s i g n e d t o more t h a n o n e a t o m Z ^ , · · · * Zk i n E M ( E ) e v e n i f one t a k e s ( i i ) i n t o a c c o u n t . T h e n t h e r e e x i s t two p o s s i b i l i t i e s : l 2 )
1) to to
A i s t h e f i r s t atom i n EM(B) w h i c h s h a l l be a s s i g n e d an atom i n E M ( E ) . Then A can be a s s i g n e d a r b i t r a r i l y one o f t h e a t o m s Ζ ^ , . . . Ζ ^ . .
2) One h a s a l r e a d y a s s i g n e d a t o m s Aj_ -> Z±9...9 A -> T h e n A c a n b e a s s i g n e d t o Z^ i n s u c h a way t h a t t h e
Z .
e
atoms to
Α ^ , - . , , Α
A as
the
are
atoms
Z,
in
the ,Z
same s p h e r e s e
to
In by
neighbors
Z..
T h i s m e t h o d w i l l be i t e r a t e d u n t i l neighbors correspond to each other Example
of
e
a l l spheres of as good as p o s s i b l e ,
III:
example III one c a n f i n d criterion (iii).
the
optimal
assignment
only
I n t h e c o m p u t e r p r o g r a m w h i c h we h a v e d e v e l o p e d f o r t h e m i n i m i z a t i o n o f t h e c h e m i c a l d i s t a n c e we u s e a l g o r i t h m s f r o m t h e f i e l d o f o p e r a t i o n s r e s e a r c h known as " o p t i m a l a s s i g n m e n t " - m e t h o d s . ^ 13
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2.
BRANDT
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
6.3.2
ET
AL.
Computer
Documentation
Programs
Systems
for Chemical
for
Problems
Chemical
57
Reactions
E n c o d i n g and r e t r i e v a l p r o c e d u r e s f o r c h e m i c a l s t r u c t u r e s h a v e b e e n d e v i s e d a n d u s e d s i n c e t h e t i m e s when s t r u c t u r a l o r g a n i c c h e m i s t r y became known. Although t h e y had t o be t a i l o r e d t o t h e t e c h n i c a l n a t u r e o f t h e d a t a c a r r y i n g m e d i a - be i t s p o k e n o r p r i n t e d w o r d s , manually processed card f i l e s , s e q u e n t i a l l y processed punch h o l e cards or magnetic t a p e s , - they a l l f u l f i l l e d t h e i r p u r p o s e a t t h e i r t i m e . The s i t u a t i o n i s r a t h e r d i f f e r e n t f o r c h e m i c a l r e a c t i o n s . We s t i l l r e l y o n s u c h a l c h e m i s t i c means a s n a m i n g - b y - a u t h o r or citing-by m a j o r - p r o d u c t e t c . T h i s i s symptomatic f o r our s t a t e of k n o w l e d g e . A p p a r e n t l y we h a v e no s y s t e m a t i c way o f d e s c r i b i n g r e a c t i o n s u n t i l now. (The n e e d , o f c o u r s e , was l e s s p r e s s i n g , s i n c e t h e number o f r e a c t i o n s i s much s m a l l e r t h a n t h e number o f s t r u c t u r e s . ) A p p r o a c h e s t h a t use s t a r t i n g and/or f i n a l p r o d u c t s as a v e h i c l e f o r d e s c r i b i n g a n d c l a s s i f y i n g r e a c t i o n s must b e regarded u n s u i t a b l e i n the l i g h t of our mathematical m o d e l : A r e a c t i o n i s a p a t h b e t w e e n p o i n t s on t h e s u b s p a c e , t h a t i s d e f i n e d by a n F I E M . A l t h o u g h a n i n d i v i d u a l p a t h may be d e s c r i b e d by i t s s t a r t i n g , f i n a l ( a n d p o s s i b l y some i n t e r m e d i a t e p o i n t s ) e q u i v a l e n c e c l a s s e s o f s u c h p a t h s c a n n o t be a d e q u a t e l y d e s c r i b e d and c l a s s i f i e d t h a t way. What we n e e d i s a d e s c r i p t i o n o f the path i t s e l f . An R - m a t r i x i s s u c h a d e s c r i p t i o n . I n i t s most g e n e r a l f o r m i t c o n t a i n s no r e f e r e n c e t o a p a r t i c u l a r F I E M . Therefore ( a s e x p l a i n e d i n s e c t i o n 4.3) e a c h i r r e d u c i b l e R - m a t r i x d e f i n e s a n R - c a t e g o r y . The m i n i m a l r e f e r e n c e to a s p e c i f i c FIEM, t h e n , i s through those elements of t h e atom v e c t o r , t h a t c o r r e s p o n d t o t h e rows and columns i n t h e i r r e d u c i b l e R - m a t r i x , i n o t h e r words to the nonzero e n t r i e s i n the f u l l R - m a t r i x . Increasing d e g r e e s o f s p e c i f i c i t y a r e o b t a i n e d by a r e f e r e n c e f r o m t h i s f i r s t s e t o f atoms ("the r e a c t i o n c o r e " ) t o t h e i r corresponding elements i n the BE-matrix, that i n t u r n e n l a r g e t h e atom v e c t o r ( " f i r s t s p h e r e " ) e t c . A h i e r a r c h y i s thereby a c h i e v e d , that w i l l lead to a r e a c t i o n f i l e s t r u c t u r e , where e a c h e n t r y i s r e f e r e n c e d t h r o u g h t h e v a r i o u s l e v e l s o f s p e c i f i c i t y up t o t h e R - c a t e g o r i e s . Q u e r i e s c a n be e n t e r e d a t any d e g r e e o f generality.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
58
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
In a p r a c t i c a l system, a c a n o n i c a l o r d e r i n g has t o be imposed onto R - m a t r i c e s , which i s most e a s i l y a c h i e v e d by s e l e c t i n g t h e l e x i c o g r a p h i c a l l y s m a l l e s t i r r e d u c i b l e R-matrix among t h e n! e q u i v a l e n t ones. L e x i c o g r a p h i c a l p r i o r i t y i s given to the d i a g o n a l elements. T h i s , a c t u a l l y , i s an a r b i t r a r y measure, but i t i s j u s t i f i e d on t h e ground, t h a t non-zero d i a g o n a l elements cause changes i n v a l e n c e s t a t e o f the a f f e c t e d atom, thereby earmarking i t as a key atom i n t h e r e a c t i o n c o r e . Remaining symmetries i n R - m a t r i c e s , t h e n , a r e r e s o l v e d on t h e f i r s t occurence o f d i f f e r e n t i a t i o n i n the course of i n c r e a s i n g s p e c i f i c i t y . I t i s worth remarking t h a t the d i f f e r e n t R - c a t e g o r i e s govern w i d e l y d i f f e r e n t p o p u l a t i o n s o f known r e a c t i o n s . One obvious example i s t h e simple f o u r c e n t e r r e a c t i o n of t h e k i n d A-B + C-D = A-C + B-D, which i s i n c i d e n t a l l y r e p r e s e n t e d by t h e s i m p l e s t R-matrix i n c l o s e d s h e l l c h e m i s t r y t h a t has a zero d i a g o n a l . We have s t a r t e d t h e d e s i g n o f a model program t o e x t r a c t i r r e d u c i b l e R-matrices from known r e a c t i o n s , t o put them i n t o c a n o n i c a l o r d e r , and t o s e t t h e sequence o f references i n a h i e r a r c h i c a l f i l e , that i s organized i n a s i m i l a r way as s u b s t r u c t u r e f i l e s d e s c r i b e d i n (5.1.3).
References 1.
J . D u g u n d j i , I . U g i , T o p i c s C u r r . Chem. 39, 19 (1973) see a l s o : I . U g i , P. Gillespie, Angew. Chem. 83, 980, 982 (1971), I b i d . I n t e r n a t . Edit. 10, 914, 915 (1971).
2.
I . U g i , P.D. Gillespie, C. Acad. Sci. 34, 4 l 6 (1972).
3.
J . Blair, J . Gasteiger, C. Gillespie, P. Gillespie, I . U g i in "Computer R e p r e s e n t a t i o n and M a n i p u l a t i o n o f Chemical I n f o r m a t i o n " , Ed. W.T. Wipke, S. H e l l e r R. Feldmann, E. Hyde, W i l e y , N.Y. (1974).
4.
J . B l a i r , J . Gasteiger, C. Gillespie, P.D. G i l l e s p i e I . U g i , T e t r a h e d r o n 30, 1845 (1974).
Gillespie,
Trans.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
N.Y.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002
2. BRANT ET AL. Computer
Programs
for Chemical
Problems
59
5.
I . U g i , J. G a s t e i g e r , J. B r a n d t , J . F . B r u n n e r t , W. S c h u b e r t , IBM-Nachr. 24, 185 (1974) I. U g i , IBM-Nachr. 24, 180 (1974).
6.
W.T. Wipke, T.M. D y o t t , J. Amer. Chem. Soc. 96, 4825 (1974), and earlier r e f e r e n c e s .
7.
E . J . Corey, W.J. Howe, H.W. O r f , D.A. Pensak, G. P e t e r s s o n , J. Amer. Chem. Soc. 97, 6116 (1975), and earlier references.
8.
H. G e l e r n t e r , N.S. S r i d h a r a n , H.J. H a r t , S.C. Yen, F.W. F o w l e r , H.J. Shue, T o p i c s C u r r . Chem. 41, 113 (1973).
9.
M. R a n d i ć , J o u r n . o f Chem. Inform. and Comp. Sci., V o l . 15, No. 2, 1975 (105).
10. S.W. Benson, Thermochemical Kinetics, W i l e y , N.Y. (1968), S.W. Benson e t al. Chem. Rev. 69, 279 (1969). 11. T.L. Allen, J . Chem. Phys. 31, 1039 (1959), A . J . K a l b , A.L.M. Chung, T.L. Allen, J . Amer. Chem. Soc. 88, 2938 (1966). 12. J . G a s t e i g e r , P. Gillespie, D. Marquarding, T o p i c s C u r r . Chem. 48, 1 (1974).
I. Ugi,
13. R.E. B u r k a r d , Methoden d e r g a n z z a h l i g e n Optimierung, S p r i n g e r - V e r l a g , Wien-New York (1972).
Acknowledgements
We acknowledge g r a t e f u l l y or work by Deutsche Fonds d e r Chemischen
the f i n a n c i a l
support o f
F o r s c h u n g s g e m e i n s c h a f t and Industrie.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
3 An Organic Chemist's View of Formal Languages H. W. WHITLOCK, JR.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
Dept. of Chemistry, University of Wisconsin, Madison, Wisc. 53706
I t seems g e n e r a l l y recognized t h a t , except f o r the most difficult of cases, the i n t r o d u c t i o n of mathema tics tends to obscure problems r a t h e r than make t h e i r s o l u t i o n e a s i e r . Be that asitmay we are q u i t e intri gued by the r e l a t i o n between formal languages and chemical s t r u c t u r e s , particularly when c o n s i d e r i n g the area of computerization of organic s y n t h e s i s . The pur pose of this paper is to o u t l i n e some of our thoughts on the above s u b j e c t , showing that one can apply common f a c t s derived from language theory to questions of chemical i n t e r e s t . For the chemist readers we will de fine languages and grammars. We will then discuss a number of "theorems" of organic chemistry (formal proof will not be attempted). F i n a l l y , we will point out that a c e r t a i n subset of organic s y n t h e s i s , the Func tional Group Switching Problem, i s amenable t o attack by viewing it as a problem i n the context of formal languages. Molecules as S t r i n g s of Symbols As we will see, the tenets of language theory assume that one is d e a l i n g with s t r i n g s of symbols: one dimensional l i n e a r assays. I f we wish t o analyze molecules according to t h i s theory it behooves us t o i n q u i r e as to what extent we may consider molecules t o be l i n e a r entities. We immediately note that mole cules, in particular the set of all s t r u c t u r e s con t a i n i n g r i n g s , are i n h e r e n t l y nonlinear in nature. On the other hand we recognize that we can represent any t h i n g i n a l i n e a r manner. I f we make the r a t h e r i n t e r e s t i n g equation of r e p r e s e n t a t i o n w i t h s t r u c t u r e then it f o l l o w s that s t r u c t u r e s , as represented, may be l i n e a r . Now one can c a r r y this as far as one likes but it seems t o the author that f o r most purposes one 60 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
3.
WHITLOCK
Organic
Chemist's
View
of Formal
61
Language
should r e s t r i c t ones a t t e n t i o n to l i n e a r representa t i o n s that make a reasonable amount of i n t u i t i v e chemi c a l sense. For t h i s reason we w i l l not consider the case of c y c l i c s t r u c t u r e s . What types of molecules can be n a t u r a l l y represen ted i n a l i n e a r manner? C l e a r l y s t r a i g h t - c h a i n e d s t r u ctures can. The representation of n-hexane, CH^CHgCHgCI^CHgCHg,
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
i s i n f a c t what one t h i n k s of when considering t h i s molecule (representation equals s t r u c t u r e ) . In p a r t i c u l a r t h i s " s t r u c t u r e " i s i n t u i t i v e l y a s t r i n g of s i x symbols, namely
S i m i l a r l y e t h y l crotonate i s CH CH=CHCOOCH CH . This example p o i n t s up two d i f f e r e n c e s between a l i n e a r notation of a s t r u c t u r e and the a c t u a l s t r u c ture i t s e l f . The above i s a s t r i n g of symbols so we have to s p e c i f y what symbols are involved. P o s s i b i l i t i e s f o r e t h y l crotonate are: CH ,CH,=,CH,CO,OCH ,CH ; CH=CH,COO,CH ,CH ; and CH ,CH=CHCO,0,CH CH . Since i n general one w i l l attach meanings to the symbols i n volved, these d i f f e r e n t representations may have d i f ferent meanings. For example, the l a s t above might be described as s e q u e n t i a l l y a methyl, an αβ-unsaturated carbonyl, a dicoordinate oxygen, and an e t h y l . This i s a p e r f e c t l y good d e f i n i t i o n of t h i s molecule and, as a s t r i n g , i s somewhat more informative than a mere t a b u l a t i o n of the i n d i v i d u a l p a r t s . The second major d i f f e r e n c e between l i n e a r n o t a t i o n and a c t u a l s t r u c tures l i e s i n the observation that s t r i n g s have an i n herent ordering from l e f t to r i g h t while s t r u c t u r e s do not. This leads to a many i n t o one mapping of ordinary l i n e notations i n t o s t r u c t u r e s . Having seen that the ordinary l i n e n o t a t i o n of unbranched s t r u c t u r e s may be more or l e s s equated with the s t r u c t u r e i t s e l f we next turn to the case of branched s t r u c t u r e s . Just as an unbranched s t r u c t u r e corresponds to a s t r i n g of symbols, a branched s t r u c ture has as i t s counterpart a t r e e . The nodes of the t r e e are part s t r u c t u r e symbols and the edges are bonds.* Thus 2,2,4-trimethylhexane may be represented by a number of t r e e s , one of which i s 3
2
3
2
3
3
• M u l t i p l e bonds may be represented example as a n o n s t r u c t u r a l node.
3
2
2
3
s e v e r a l ways, f o r
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
3
62
COMPUTER-ASSISTED
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
CH
3
CH
3
CH
ORGANIC
SYNTHESIS
3
One can define " g l o b a l " part s t r u c t u r e s as was done above f o r e t h y l crotonate and one notes that again there w i l l be many tree representations per s t r u c t u r e . The point of t h i s , of course, i s t h a t , as i s w e l l knownQ), trees ( i n p a r t i c u l a r binary t r e e s ) may be r e presented as l i s t s . The above tree has a l i s t represen t a t i o n , CH (C(CH3)(CH3)CH3)CHCCH2CH )CH3. Now t h i s doesn't look too appealing t o the chemists t r a i n e d eye but CH C(CH3)CCH3)CH2CHCCH3)CH CH does. This i s a l i s t representation of the tree 2
3
3
2
3
and i s s u s p i c i o u s l y close t o our usual l i n e notation of branched s t r u c t u r e s . We w i l l show below that the set of a l l a c y c l i c s t r u c t u r e s , wherein by " s t r u c t u r e " we mean that i n t u i t i v e l y w e l l defined s t r u c t u r a l n o t a t i o n used by organic chemists, comprises a context f r e e language. But f i r s t we must define the concept of languages and grammars. Grammars and Languages The f o l l o w i n g i s a very b r i e f i n t r o d u c t i o n t o the subject. We r e s t r i c t ourselves t o those aspects that are d i r e c t l y r e l a t e d t o the problem of applying the theory of languages to organic chemistry. For a more complete i n t r o d u c t i o n the reader i s r e f e r r e d to a num ber of e x c e l l e n t texts.(2-5) As conceived by Chomsky(6) the f o l l o w i n g are the
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
3.
Organic
WHiTLOCK
Chemist's
View
of Formal
Language
63
c e n t r a l aspects of t h i s subject. Symbols. There i s some (normally f i n i t e ) set of symbols from which s t r i n g s (sentences) are made of. This set (Vx, the t e r m i n a l vocabulary) would be {CH , CH } f o r the unbranched alkanes above and would be {CH , CH ,CH,C,(,)} f o r the l i n e a r representation of branched alkanes. As a grammar embodies the concept of d e r i v a t i o n of some sentence i n a language there i s a l s o defined a set of symbols ( V N , nonterminal vocabulary). These are used i n d e r i v a t i o n s but do not appear i n the f i n a l sen tences of the language defined by the grammar of i n t e r est. The b a s i c act of d e r i v a t i o n i n v o l v e s replacement of a nonterminal symbol i n a s t r i n g by a s t r i n g . For example the s t r i n g CH CH(R)R may be turned i n t o the s t r i n g CH CH(CH R)R by r e p l a c i n g the nonterminal R by the s t r i n g CH R. On the other hand i t might be changed i n t o CH CH(CH )R by r e p l a c i n g R by CH , de pending on what our r u l e s are f o r e f f e c t i n g these changes. F i n a l l y , there i s some unique member of V J J , the " s t a r t " symbol, from which a l l sentences may be derived. 3
2
3
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
2
3
3
2
2
3
3
3
Productions. A production i s j u s t a r u l e f o r making the above changes. Replacement of R by CH R i s symbolized by R •> CH R; replacement of R by CH , by R •> CH . As c o n v e n t i o n a l l y t r e a t e d a p p l i c a t i o n of pro ductions i s permissive i n the sense that there are no r u l e s s t a t i n g what production of some set must be a p p l i e d to a given s t r i n g . The problem of determining what s e r i e s of productions w i l l turn the unique s t a r t symbol S i n t o some s p e c i f i e d sentence then becomes an o c c a s i o n a l l y i n t r i c a t e puzzle. The general form of a production i s αΧβ + αω$, where X i s some nonterminal symbol and a, 3, and ω are a r b i t r a r y s t r i n g s . Produc t i o n s of t h i s form w i t h no r e s t r i c t i o n s on a, 3, and ω are of type 0. Those with the r e s t r i c t i o n that ω not be the empty symbol are of type 1. An equivalent statement i s that a type 1 production may be of the form αγ3 αω3, length (ω) >_ length γ, γ being an a r b i t r a r y s t r i n g c o n t a i n i n g at l e a s t one nonterminal sym b o l . The presence of the context α and 3 leads to the term context s e n s i t i v e i n d e s c r i b i n g these productions. Productions of the Form X •> ω are of type 2. The absence of a context, α and 3 leads to the term context f r e e f o r t h i s type of production. F i n a l l y , the sim p l e s t type of production, type 3 or r e g u l a r , i s r e s t r i c t e d to be of e i t h e r the form X + a (X i n V N , a i n V ) or X ·> aY (X and Y i n V N , a i n V ) . 2
2
3
3
T
T
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
64
Note that a l l context s e n s i t i v e productions are of type 0, a l l context f r e e productions are context s e n s i t i v e , and a l l r e g u l a r productions are context f r e e . Note a l s o the d i r e c t analogy between productions as de f i n e d above and chemical r e a c t i o n s . The context s e n s i t i v e production CH=CH CHOH -> CH=CH CO i s e q u a l l y viewed as a r e a c t i o n . A p p l i c a t i o n of t h i s production t o the s t r i n g GH CHOHCH CH2CH=CHCHOHCH3 may produce the new s t r i n g CH CHOHCH CH CH=CHOOCH3 but not CH COCH CH CH= CHCH0HCH . C l e a r l y i f s t r u c t u r e s are equated w i t h s t r i n g s , chemical r e a c t i o n s have as t h e i r counterpart productions. The term "context s e n s i t i v e " has very s i m i l a r meanings i n both cases. Along these l i n e s the production CH(OCH ) -> CHO i s of type 0 as i t leads t o a decrease i n the length of the s t r i n g . * The context free production CHO •> CH(R)0H, corresponds t o Grignard a d d i t i o n t o an aldehyde, while examples of r e g u l a r pro ductions are CH 0H •> CH Br, CHOH -> CO, e t c . Just as type 0 productions lead t o a r i c h e r language the analogous chemical r e a c t i o n s lead t o a r i c h e r more com plex chemistry as we go from the simple r e g u l a r "func t i o n a l group s w i t c h i n g " r e a c t i o n s t o those i n v o l v i n g b l o c k i n g and deblocking r e a c t i o n s . 3
2
3
2
2
3
2
2
3
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
3
2
2
3
Grammar. A grammar i s j u s t a defined set of non t e r m i n a l and t e r m i n a l symbols, a s p e c i f i e d member of V N (the s t a r t symbol) and a set of productions. Viewing r e a c t i o n s as productions we may define a chemistry as a set of molecular p a r t s , a s p e c i f i e d s t a r t i n g m a t e r i a l , and a set of r e a c t i o n s . This assumes that the chemis t r y can be thrown i n t o the proper grammatical form as discussed above. Languages. For a defined grammar, i t s attendant language i s the set of a l l s t r i n g s (over V T ) that can be generated by repeated a p p l i c a t i o n of the grammar's pro ductions, s t a r t i n g w i t h the s t a r t symbol. Pursuing our analogy between grammars and s y n t h e s i s , the langu age defined by some chemistry (chemical grammar) i s the set of a l l molecules that can be synthesized from the s p e c i f i e d s t a r t i n g m a t e r i a l by repeated a p p l i c a t i o n of the r e a c t i o n s — a language of s y n t h e s i z a b l e s t r u c t u r e s f o r that chemistry. Just as grammars may be of type 0, 1, 2 or 3 according t o the most complex type of pro duction present, a language i s of type 0 i f s p e c i f i e d by a type 0 grammar, e t c . Note that while a grammar i s •Assuming our symbols are CH, 0CH , (,), 2, CHO. 3
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
3.
Organic
WHITLOCK
Chemist's
View
of Formal
65
Language
an exact (although sometimes opaque) d e f i n i t i o n of a language, a language does not i n general s p e c i f y a un ique grammar. Chemical questions which are a d i r e c t t r a n s l i t e r a t i o n of t h e i r corresponding language theory counterparts are: Given two chemical grammars: do they define the same language of s y n t h e s i z a b l e s t r u c tures. Given a p a r t i c u l a r chemical grammar (chemistry) of say type 0, i n v o l v i n g various blocking-deblocking sequences, i s there a simpler chemistry that defines the same language of s y n t h e s i z a b l e s t r u c t u r e s . Given a chemistry and a p a r t i c u l a r molecule i s the molecule a member of the chemistry's language (can i t be synthe s i z e d ? ) . Since t h i s membership question becomes more and more complicated as we go from type 3 to type 0 languages, can we place some sort of upper and lower l i m i t s on the complexity of t h i s problem w i t h i n the context of language types? These questions w i l l be dealt w i t h below. Examples 1) Grammar 1 V = {CH , CH , CEU} V = {ALKANE, R} S t a r t symbol = ALKANE Productions: ALKANE •> CH* ALKANE + CH R R + CH R -> CH R T
3
2
N
3
3
2
Pl.l PI.2 PI.3 PI.4
This i s an example of a s t r u c t u r a l grammar. The productions correspond to r u l e s f o r generating n-alkane s t r u c t u r e s rather than to chemical r e a c t i o n s . The language s p e c i f i e d by t h i s grammar i s the set of a l l nalkanes. D e r i v a t i o n of η-butane i s achieved t h u s l y : P1
2
P1
ALKANE ' >
4
P 1
CH R ' >
4
CH CH R ' )
3
3
2
P1
3
'>
CI^CHgCHgR CH CH CH CH 3
2
2
3
The sentence CH CH(CH )CH i s not i n t h i s language^ nor are CH CH OH or CH R ( t h i s l a t t e r i s a s e n t e n t i a l form). Since a l l productions are of type 3 (X + a or X •> aY) the set of a l l alkanes comprises a r e g u l a r language. The membership question f o r r e g u l a r langu ages i s exceedingly simple. From the r e g u l a r grammar above we may construct the algorithm or "machine" shown i n Figure 1. Rules f o r c o n s t r u c t i n g machines such as t h i s from r e g u l a r grammars are described elsewhere.(7) One s t a r t s i n the s t a r t s t a t e and makes s t a t e 3
3
2
3
3
3
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
66
Figure 1. Finite language defined ALKANE.
ORGANIC SYNTHESIS
state machine for recognizing members of the by grammar 1. The start state is that labelled The accept state is that one labelled F.
t r a n s i t i o n s as one reads the s t r i n g of i n t e r e s t from l e f t t o r i g h t . I f one i s i n the accept s t a t e when no more symbols are l e f t the s t r i n g i s a member of the language. I f not, not. Note that the machine requires only a f i n i t e amount of memory; hence the term f i n i t e s t a t e machine. 2) GRAMMAR 2 V = {CH , CH , OH, MgBr, Br} V = {S, OH, Br, MgBr} S t a r t State = S Productions: S CH OH OH ·* Br Br + MgBr MgBr CH OH T
3
2
N
3
2
P2.1 P2.2 P2.3 P2.4
This i s a chemical grammar, the productions correspond ing t o r e a c t i o n s . * The language i s the set of a l l 1a l k a n o l s , 1-alkylmagnesium bromides and 1-bromoalkanes. *The reader w i l l note that V and V»r are not d i s j o i n t as they are supposed t o be. This was done f o r c l a r i t y ' s sake as t h i s grammar can be e a s i l y r e w r i t t e n t o con form t o the standard d e f i n i t i o n . N
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
3.
Organic
WHiTLOCK
Chemist's
View
of Formal
Language
67
Although the productions are not of a l l of the r e g u l a r form X a or X •> aY i t i s e a s i l y r e w r i t t e n to conform to t h i s format. The language i s thus a regular langu age and a f i n i t e s t a t e machine f o r recognizing members of the language i s e a s i l y constructed. Note that a d e r i v a t i o n of a sentence corresponds d i r e c t l y to i t s synthesis. D e r i v a t i o n of e t h y l bromide proceeds as: 2
1
S Ρ · ; CH OH
P
2
3
'
2
A
P 2
3
P 2
4
CH^Br ' > CH^MgBr ' > CH^CHgOH
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
P
2
'
2
j
CH CH Br 3
2
Theorem. The set of a l l wellformed a c y c l i c s t r u c tures comprises a d e t e r m i n i s t i c context free language. We have defined above what we mean by "structures," the ordinary l i n e n o t a t i o n used by organic chemists wherein we r e l y on a s t r i n g representation with branch ing i n d i c a t e d by p a r e n t h e s i z a t i o n . The reader w i l l r e c a l l that context free grammars allow more complicated nested productions, e.g. S •> aSb than are allowed f o r by r e g u l a r productions. Consider grammar 3 below: GRAMMAR 3 V = {CH , CH , CH, V = {S, R} S t a r t Symbol = S Productions: S S •> R ·> R + R + T
3
2
CH*,
(,)}
N
CH^ CH R CH CH R CH(R) 3
3
2
P3.1 P3.2 P3.3 P3.4 P3.5
This i s j u s t Grammar 1 with the a d d i t i o n of production P3.5. This production i s not regular but i s context f r e e . I t i s t h i s production that allows us to generate a r b i t r a r i l y branched s t r u c t u r e s such as CH CH(CH(CH CH )CH )CH . That the language defined by t h i s grammar i s not regular ( i . e . cannot be generated by a regular grammar) f o l l o w s from i t s being homomorphic a l l y equivalent with the set of balanced parentheses. The r e c o g n i t i o n problem f o r context f r e e languages i s i n h e r e n t l y more d i f f i c u l t than that f o r regular l a n guages i n that one needs u n l i m i t e d memory ( e s s e n t i a l l y for the reason i n t h i s case of remembering what branch one i s c u r r e n t l y examining). A complete grammar that handles a large subset of a c y c l i c s t r u c t u r e s i s pre sented i n Figure 2. The language i s not regular but i s context f r e e , as can be v e r i f i e d by observing the form of the productions. This grammar i s the b a s i s of a d e t e r m i n i s t i c algorithm f o r parsing ( i . e . recogniz3
2
3
3
3
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC
68
V
T
SYNTHESIS
= {CH CH 0 HC0 H HCN CC> CO CH OH COOH 4
2
2
2
3
CHO CN C l Br HO HOgC OHC NC CH Ο 2
0 C H C CH HC C = 2
V
N
2
Ξ
( ) N}
= {S LMV RMV CHN DB RDB TB RTB SM SRP SLP DV LDV RDV TV LTV QV}
S t a r t Symbol:
S
PRODUCTIONS:
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
Structure S -> SM / SLP RMV / (LMV)2 TV RMV / (LMV)3 QV RMV / LDV(RMV)2 / LDV RBD / TLV RTB / LTV(RMV3 / QV(RMV)4 / (LMV3 TV / (LMV)4 QV / (LMV)2 QV RDB L e f t Monovalent LMV -> SLP / SLP CHN / (LMV)2 TV CHN / (LMV)2 TV / (LMV)3 QV / (LMV)3 QV CHN / (LMV)2 QV DB/ LDV DB / LTV TB Right Monovalent RMV + SRP / DV RMV / TV(RMV)RMV) / TV(RMV)2 / TV RDB / QV(RMV)2 RMV / QV(RMV)3 / QV RTB / QV(RMV)RDB / (CHN)N RMV / QV(RMV)(RMV)RMV / QV(RDB)RMV Continued Figure
2a.
Context Free structural grammar for computer input of linear notation via teletype. Meanings of nonterminal symbols are as above.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
structure
3.
Organic
WHITLOCK
Chemist's
View
of Formal
Language
69
Chaining CHN -> DV / DV CHN / TV(RMV) / TV(RMV)CHN / QV TB / QV(RMV)(RMV) / QV(RMV)(RMV)CHN / QV(RMV)2 CHN / TV DB / QV(RMV)DB Double Bond DB -> TV / = TV CHN / = QV(RMV) / = QB(RMV)CHN / = QV DB Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
Right Double Bond RDB •> RDV / « TV RMV / - QV(RMV)RMV) / = QV(RMV)2 / = QV RDB T r i p l e Bond TB -> RTB +
Ξ
QV /
Ξ
AV CHN
Ξ
TV /
Ξ
QV RMV
SM + C H 4 / CH 0 / HC0 2 H / HCN / C 0 2
2
/ CO
SRP + C H 3 / OH / C0 2 H / CHO / CN / CI / Br SLP -> C H 3 / HO / H0 2 C / OHC / NC / CI / Br DV + C H
2
LDV + C H
2
RDV TV
/ Ο / CO / C 0 2 / 0 2 C /
H2C
C H 2 / Ο / CO CH
LTV •> CH / HC QV -> C Figure
2a. Continued
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
70
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Figure 2b.
Parse tree of (CHs),C=CHCH2OH, sentative terpenoid
a repre
ing) s t r u c t u r e s input i n t o a LISP organic synthesis program w r i t t e n by P. Blower (8). This grammar gener ates most common f u n c t i o n a l groups (those not included such as -N0 were l e f t out f o r nongrammatical reasons) and includes chaining, the representation of repeated subunits by the enclosing of them w i t h i n brackets. The representations of η-butane, C H 3 C H 2 C H 2 C H 3 , C H ( C H ) 2 C H , C 2 H 5 C 2 H 5 , C H 3 C H 2 C 2 H 5 , and others are a l l accepted by t h i s program, parsed, and converted i n t o i n t e r n a l r e presentations, p o i n t i n g up the f a c t that there i s a many i n t o o&e mapping of s t r u c t u r a l representations into structure. We have seen above a simple s y n t h e t i c grammar wherein the productions are d i r e c t l y derived from chemi c a l r e a c t i o n s and the r e s u l t i n g language of s y n t h e s i z able s t r u c t u r e s i s the set of a l l molecules that can be synthesized from some s p e c i f i e d precursor by a p p l i c a t i o n of the r e a c t i o n s . In the simple case of 1a l k a n o l s and 1-bromoalkanes the r e c o g n i t i o n problem i s t r i v i a l — w e can answer i t i n a time p r o p o r t i o n a l to the length of the molecule. We are n a t u r a l l y curious as to what i s the minimally complex grammar needed to 2
3
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
2
3
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
3.
WHiTLOCK
Organic
Chemist's
View
of
Formal
Language
71
mimic organic s y n t h e s i s of the more c o n v e n t i o n a l l y complex type. This i s not a s i l l y question f o r the f o l l o w i n g two reasons. Although one can quibble as to the extent to which a c y c l i c molecules may be equated with s t r i n g s there i s no question that the s i m i l a r i t y i s marked. Moreover, at l e a s t from a complexity sense i t i s c l e a r that algorithms f o r r e c o g n i z i n g r e g u l a r s t r u c t u r e s are i n h e r e n t l y simpler than those f o r r e cognizing context f r e e ones. S i m i l a r l y the r e c o g n i t i o n of members of context s e n s i t i v e languages i s more d i f f i c u l t yet. This i s true regardless of whether one i s t a l k i n g about the r e c o g n i t i o n of w e l l formed or synthes i z a b l e s t r u c t u r e s , or whether one i s doing t h i s i n a formal mathematical sense or v i a computer programs. Secondly i t i s not obvious that the question of synthes i z a b i l i t y i s i n f a c t answerable at a l l f o r a l l w e l l defined organic molecules. By answerable we mean having a r e c o g n i t i o n procedure that w i l l terminate i n some (not n e c e s s a r i l y short) p e r i o d of time w i t h the answer yes or no. The set of context s e n s i t i v e l a n guages are recognized by the s o c a l l e d l i n e a r bounded automata and the question of membership of some s t r i n g i n a defined context s e n s i t i v e language i s known to be answerable. Type 0 languages on the other hand are r e cognized by Turing machines and the question of member ship i s not answerable f o r type 0 languages as a s e t , although i t may be f o r some subset. Thus i f we cannot develop cogent arguments f o r the s u f f i c i e n c y of con t e x t s e n s i t i v e languages as a model f o r oganic syn t h e s i s we are l e f t w i t h the p o s s i b i l i t y that organic synthesis cannot be "solved" by computer. No guaran tees though, since organic chemistry i s not a c l o s e d science and what may be an a c c e p t i b l e s y n t h e s i s under some circumstances w i l l be unacceptible under others. We f i r s t show by a counterexample that context f r e e languages are i n s u f f i c i e n t model f o r organic s y n t h e s i s . We then argue ( a l a s we cannot prove, f o r the above reasons) that context s e n s i t i v e languages are a suf f i c i e n t model. Theorem. There e x i s t s a language of s y n t h e s i z a b l e s t r u c t u r e s that i s not context f r e e . I t i s w e l l known that languages such as ww, where w i s some s t r i n g over V T , are not context f r e e . For example the set of a l l alkanes CH CH(Ri)R where Ri=R2 i s not a context f r e e language. The set of a l l t e r t i a r y a l c o h o l s derived by Grignard a d d i t i o n to methyl acetate, represented as CH COH(Ri)R , Ri=R i s thus not context f r e e . That i t i s context s e n s i t i v e f o l l o w s from c o n s t r u c t i o n of a context s e n s i t i v e grammar f o r 3
3
2
2
2
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
72
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
generating t h i s set (grammar 4, Figure 3). GRAMMAR 4 = {CH , CH, OH, C H , (,)} V N = {S, R, Δ, V, #, Χ, Y, Ζ, { W i | i c { C H , C H , CH (,)}}} S t a r t symbol = S S -* CH CHOH Δ V R # R -> CH X|CH R|CH(R)R i X + X i , i e { C H , C H , CH, (,)} RX + R XX -> X VX -> VY Yi iY YR -*• R YX + X Y# + Ζ iZ WiZi JW + WiJ, j e { C H , C H , CH, (,)} VW •> WiV AWi + Δ ί Δ - ( V -> ) Ζ Language = C H C H O H ( R i ) R , R =R
VT
3
2
3
2
3
3
2
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
3
±
2
3
2
±
3
2
x
2
Figure 3
Now i f we consider f u r t h e r the r e l a t i o n s h i p be tween d e r i v a t i o n of a sentence and i t s s y n t h e s i s , e s p e c i a l l y the r e l a t i o n s h i p between the intermediate s e n t e n t i a l forms and the precursor s t r u c t u r e s i n the syntheses we are l e d t o the f o l l o w i n g theorem. Theorem ( ? ) . For a chemistry derived from func t i o n a l group switching r e a c t i o n s , condensation reac t i o n s , demasking of masked f u n c t i o n a l groups, and b l o c k i n g and deblocking r e a c t i o n s : the language so de f i n e d i s a context s e n s i t i v e one ( i . e . there e x i s t s an equivalent context s e n s i t i v e grammar that generates the same language). The question of s y n t h e s i z a b i l i t y i s thus answerable. This f o l l o w s from the s o c a l l e d workspace theorem of context s e n s i t i v e languages.(9) The "proof" of t h i s simply e n t a i l s s t a t i n g i n f o r m a l l y the proof of the work space theorem, l e t t i n g s y n t h e t i c intermediates correspond t o i n d i v i d u a l steps i n the d e r i v a t i o n of the target molecule (sentence). 1) Consider a synthesis D of the form: S + Xο + X, 1 + '*' X„ η = target ° where S i s the s t a r t i n g m a t e r i a l (symbol), X
n
is a
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
3.
WHiTLOCK
Organic
Chemist's
View
of
Formal
73
Language
sentence i n the language of synthesizable s t r u c t u r e s and X i X±+i i s some conversion. Assume moreover that there i s some estimate SIZE ( X i ) of the s i z e of each X i . We define the complexity C(X ,D) of the target X f o r t h i s p a r t i c u l a r synthesis to the max{SIZE(Xi), 0 <_ i £ n}, the s i z e of the l a r g e s t compound involved i n the synthesis. For a reverse salami s y n t h e s i s , C(Xn) i s j u s t the s i z e of X . I f the l a s t step of the syn t h e s i s involves a deblocking r e a c t i o n , C(X ,D) = SIZE ( X n - l ) . C(X ,D) i s thus a measure of the complexity of a p a r t i c u l a r synthesis (D) of a p a r t i c u l a r member X of the language. 2) Considering a l l p o s s i b l e syntheses of X , t h e workspace of X , WS(X ), i s min{C(X ,D )|D i s some synthesis of X }. WS(X ) i s thus a measure of the complexity of the l e a s t complex synthesis of X . While some syntheses may be a r b i t r a r i l y complex i t ' s the l e a s t complex synthesis that i s important. In p a r t i c u l a r the r a t i o WS(X )/SIZE(X ) i s a measure of the d i f f i c u l t y of the synthesis of X . 3) The workspace theorem simply s t a t e s that i f f o r some a r b i t r a r y type 0 grammar there i s some one number ρ such that WS(X) SIZE(X) n
n
n
n
n
n
n
n
n
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
n
n
m
m
n
n
n
n
n
p
f o r a l l X's i n the language, the language i s context s e n s i t i v e and thus there e x i s t s an equivalent context s e n s i t i v e grammar. The argument f o r the existence of some ρ i s as follows : a) I f we use f u n c t i o n a l group switching r e a c t i o n s as discussed below, ρ i s obviously u n i t y . b) Condensation r e a c t i o n s w i l l have a ρ greater than u n i t y . Consider f o r example f o r the s y n t h e t i c problem CH OH 3
>
—
CH3CH2COOH
Assume that the l e a s t complex route i s ^COOCHa CH3OH > CH Br > CH CH —» 3
3
CH3CH2COOH
\00CH3
SIZE
2
3
8
3
The r a t i o WS(X )/SIZE(X ) =2.7. However as the mole cules get l a r g e r , assuming that the condensing agents (e.g. malonic e s t e r ) stay constant i n s i z e , t h i s r a t i o n
n
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
74 decreases. CH3CH2OH
Thus f o r CH CH Br 3
2
—>
^COOCKU CH3CH2CH > CH (CH )2COOH \:000Η 9 4 3
2
3
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
SIZE
3
3
the r a t i o decreases to 2.25. For syntheses i n v o l v i n g condensation r e a c t i o n s i t would seem that there w i l l e x i s t some p. c) The same argument may be a p p l i e d to b l o c k i n g deblocking sequences and to operation i n v o l v i n g masked f u n c t i o n a l groups. I t seems i n e v i t i b l e that i f we con s i d e r b l o c k i n g groups to incrementally increase the complexity of some precursor the r a t i o WS(Xn)/SIZE(X ) must approach some number as s i z e ( X ) gets l a r g e r . In f a c t the only s i u t a t i o n wherein t h i s would not be the case would seem to be one wherein the complexity of some necessary b l o c k i n g group depended e x p o n e n t i a l l y on the complexity of the intermediate being blocked. This type of exponential b l o c k i n g group i s unknown to organic chemistry and indeed seems f o r e i g n to the very concept of i s o l a t e d and i n t e r a c t i n g f u n c t i o n a l groups. F u n c t i o n a l groups, even i n a relaxed d e f i n i t i o n are i n herently l o c a l a f f a i r s . d) Consideration of a number of published syntheses of complex n a t u r a l products suggests a ρ of approximately 1.8, a remarkably low number. The author f i n d s i t s t r i k i n g indeed that those syntheses i n v o l v i n g the s e l e c t i v e reagents c h a r a c t e r i s t i c of " s y n t h e t i c methods" chemistry so much i n the vogue l a t e l y seem to have a smaller workspace than those c a r r i e d out i n the grand t r a d i t i o n a l manner. We conclude that the workspace theorem i s probably v a l i d f o r organic s y n t h e s i s , although i t i s c e r t a i n l y true that the above a n a l y s i s ignores questions d e a l i n g w i t h the formal r e l a t i o n s h i p between s y n t h e t i c schemes and d e r i v a t i o n s . It would be nice to be able to w r i t e a program that would take as input two sets of chemical reactions^ the second being the f i r s t augmented with some new r e a gent, and give as output the answer to the question: "Does t h i s new s y n t h e t i c method allow us to do anything we couldn't do i n i t s absence?". This chemical c r i t i que program would be at l e a s t u s e f u l to e d i t o r s of chemical j o u r n a l s . We note a l a s that f o r the set of context s e n s i t i v e languages the problem of equivalence of two grammars i s not answerable. Thus the above undertaking would seem to be a dubious one. Now of n
n
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
3.
WHiTLOCK
Organic
Chemist's
View
of Formal
Language
75
course the f a c t that the equivalence problem i s un answerable f o r the set of a l l context s e n s i t i v e l a n guages does not mean that i t i s so f o r some subset, f o r example augmenting a chemistry c o n t a i n i n g sodium hydroxide by a d d i t i o n of the reagent potassium hydr oxide. On the other hand we note that the equivalency problem i s not answerable f o r even the simpler set of context f r e e languages so i t seems u n l i k e l y that one could w r i t e a general program that would compare two chemistrys based on context s e n s i t i v e r e a c t i o n s and that would h a l t i n some f i n i t e time w i t h the e q u i valence answer.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
The F u n c t i o n a l Group Switching Problem. (10) One frequently has occasion when d e v i s i n g a s y n t h e t i c scheme t o adjust f u n c t i o n a l i t y i n a molecule i n a manner that i s only i n d i r e c t l y r e l a t e d t o the s y n t h e t i c problem at hand. For example i f one d e s i r e s the transformation
i t i s necessary t o block the ketone toward a c t i o n of the organometallic reagent employed. Bearing i n mind the assumptions b u i l t i n t o t h i s problem we may define a "molecule" as being simply an ordered set of func t i o n a l groups. Chemically t h i s i s equivalent t o the idea of a molecule s being a set of f u n c t i o n a l groups imbedded i n a s t a t i c molecular framework. D e f i n i t i o n of a r e a c t i o n as an ordered t r i p l e t , (reagent, precursor f u n c t i o n a l group, product func t i o n a l group) then corresponds t o the assumption that r e a c t i o n s only i n t e r c o n v e r t f u n c t i o n a l groups; they do not lead t o increments of the carbon skeleton, or i f they do, i t i s only t o f i n i t e and l i m i t e d degree. Now t h i s sounds l i k e a rather r e s t r i c t e d p i c t u r e of organic chemistry, and i t i s , but i t i s s u r p r i s i n g l y c l o s e to the way one t h i n k s about b l o c k i n g group i n t e r conversions. We can pose the f u n c t i o n a l group s w i t c h ing problem w i t h i n the context of t h i s s t r u c t u r a l no t a t i o n i n the f o l l o w i n g manner. What i s the shortest sequence of reagents that w i l l transform a defined s t a r t i n g m a t e r i a l S = ( S i , S 2 , * ' ' S ) , where S i i s the i t h f u n c t i o n a l group of compound S, i n t o the target Τ = (Ti,T2,···Τ ). The reagent sequence turns S i i n t o T i , S2 i n t o T2. etc. With t h i s d e f i n i t i o n of the f
n
η
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
76
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
f u n c t i o n a l group switching problem we are lead to the following. Theorem. The set of a l l sequences of reagents that a f f e c t s a s t a t e d f u n c t i o n a l group switching pro blem comprises a r e g u l a r language. The proof of t h i s theorem i s f a i r l y obvious and not too i n t e r e s t i n g . What i s i n t e r e s t i n g i s the con sequence of the theorem. The proof f o l l o w s from the r e c o g n i t i o n that our f u n c t i o n a l group r e a c t i o n d i c t i o n a r y i s a f i n i t e d i r e c t e d graph wherein the nodes are l a b e l l e d with f u n c t i o n a l groups and the edges with reagents. A small r e a c t i o n graph i s shown i n Figure 4. This i s of the same form as the f i n i t e s t a t e ma chine i n Figure 1 and i f we define a s t a r t s t a t e (e.g. CO) and an accept s t a t e (e.g. CHOAc) i t i s a f i n i t e s t a t e machine f o r r e c o g n i z i n g a l l sequences of reagents that w i l l turn a ketone i n t o a secondary ace t a t e . The reagent sequence (NaBH^ DHP H 0 NaOH A c 0 ) i s a member of the language so defined while (NaBHi* DHP 3
H0 3
C r 0 / p y Α ο 0 ) i s not. 3
2
2
The r e l a t i o n s h i p between r e
gular languages, r e g u l a r grammars, and f i n i t e s t a t e machines i s such that most i n t e r e s t i n g questions d e a l ing w i t h them are answerable. Whether a p a r t i c u l a r s t r i n g i s i n the language i s answerable ( i . e . does t h i s reagent sequence do the t r i c k ) . More i n t e r e s t i n g how ever i s the f o l l o w i n g which represents a general s o l u t i o n to the f u n c t i o n a l group switching problem. Our r e a c t i o n d i c t i o n a r y defines, f o r the f u n c t i o n a l group switching problem Sx •> Τχ (S^ and Τχ s i n g l e f u n c t i o n a l groups) a language of s u f f i c i e n t s y n t h e t i c sequencesL For a problem S2 T2 a language L i s defined. I f we want to convert the binary compound ( S i S 2 ) i n t o the compound (Τι T ) , the language f o r t h i s i s the i n t e r s e c t i o n of L i and L , i . e . those members common to L i and L are sequences that convert S i i n t o T i and S into T . I t i s known that the i n t e r s e c t i o n of two r e gular languages i s i t s e l f regular so the problem of f i n d i n g the shortest sequence of reagents f o r e f f e c t i n g ( S i S ) —^> ( T i T ) i s that of f i n d i n g the shortest s t r i n g i n a r e g u l a r language. The d e t a i l s of construc t i n g the algorithm are presented elsewhere ( 1 0 ) but t h i s p o i n t s up what the author f e e l s to be one of the r e a l l y p r e t t y aspects of language theory. The conven t i o n a l proof of the statement that the membership ques t i o n i s answerable f o r regular languages i s construc t i v e i n that a proveably c o r r e c t algorithm f o r doing so i s developed. One may then s t a r t from t h i s f a c t and develop more e f f i c i e n t ways of achieving t h i s end. We o f f e r as evidence that the l i n g u i s t i c approach to 2
2
2
2
2
2
2
2
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
WHITLOCK
Organic
Chemist's
View
of Formal
Language
a e{DHP, NABH^, Cr0 Py, Ac 0, NaOH} 3
2
b e{DHP, NaBH , Cr0 Py, Ac 0, H 0} 4
3
2
3
c είϋΗΡ, Cr0 py, Ac 0, HgO, NaOH} 3
2
Figure 4. A small reaction graph for the four functional groups CO (ketone), CHOH (secondary alcohol), CHOTHP (tetrahydropyranyl ether), and CHOAc (secondary acetate)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
78
ORGANIC SYNTHESIS
organic synthesis i s of more than j u s t i d l e i n t e r e s t by presenting i n Figure 5 some representative problems w i t h t h e i r s o l u t i o n s . These examples, although pre sented i n the rather disembodied order n-tuplet s t r u c t u r a l n o t a t i o n , c l e a r l y show that the r e s u l t s of t h i s approach represent nonobvious answers t o n o n t r i v i a l problems. I t would seem that the b a s i c feature of t h i s approach t o organic synthesis i s a p p l i c a b l e t o more complicated s t r u c t u r a l models as long as s e v e r a l condi t i o n s are met. F i r s t l y the s y n t h e t i c problem of i n t e r e s t must be capable of d i s s e c t i o n i n t o some inde pendent subproblems. This i s so because the s o l u t i o n procedure i n v o l v e s at l e a s t i m p l i c i t c o n s t r u c t i o n of the part s o l u t i o n s and working with t h e i r i n t e r s e c t i o n . Secondly the f i n i t e n e s s of the various sub-reaction d i c t i o n a r i e s i s important since one i s guided i n s o l v ing a problem by the exhaustive s o l u t i o n of subproblems For r e a l molecules r e a c t i o n d i c t i o n a r i e s are i n f i n i t e . A p p l i c a t i o n of ones chemists i n t i u t i o n suggests that only a f i n i t e part of a r e a c t i o n d i c t i o n a r y i s chemi c a l l y i n t e r e s t i n g however. While there i s an i n f i n i t e number of s t r u c t u r e s that can be involved as i n t e r mediates i n the conversion 1
only a small number are r e a l i s t i c i n nature when one has t h i s p a r t i c u l a r end i n mind. One problem of im mediate concern i s that of automating the making f i n i t e of the r e a c t i o n d i c t i o n a r y f o r f u n c t i o n a l i t i e s such as ketones that are destined f o r annulation. I n t e r a c t i o n of the computer w i t h the chemist i s c l e a r l y necessary i n t h i s respect. A r e l a t e d c o n d i t i o n deals w i t h the very s t a t e na ture of t h i s approach. The problem RCH=CH-COR—£-> RCH=CH-CHOHR i s not properly viewed as (CH=CH, CO) ^ (CH=CH, CHOH) since t h i s e n t a i l s the i n c o r r e c t assump t i o n that one i s d e a l i n g w i t h an i s o l a t e d double bond and ketone. The i n t e r s e c t i o n of the two r e a c t i o n d i c t i o n a r i e s would i n c o r r e c t l y have "no r e a c t i o n " f o r the reagent (CH )2CuLi. One thus must t r e a t i n t e r a c t i n g f u n c t i o n a l groups as l a r g e r e n t i t i e s . This i s an acceptable p r i c e t o pay except f o r two consequences. Reaction d i c t i o n a r i e s of aggregate f u n c t i o n a l groups can be very large and t h e i r generation by hand i s a tedious and time consuming a f f a i r . I t appears worth9
3
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
3.
WHITLOCK
Organic
Chemist's
View
of Formal
START:
[RCHOEE RCO COOH]
TARGET:
[RCHOEE RCO RCHOH]
SEQUENCE:
Language
79
( G l y c o l , EVE, R L i , NaBH^, Ac 0, H 0, EVE, 2
3
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
OH) START:
[CH OH RCO COOH]
TARGET:
[CH^OH RCO R COH]
2
2
SEQUENCE: (GLYCOL, R L i RMgX H 0) 3
START:
[CH OH CHgOAc COOMe CHgOEE]
TARGET:
[CHgOAc CHgOH COOMe CHgBr]
2
SEQUENCE: (Cr0 /py H 0 TsCl NaBr OH EVE NaBH^ ACgO 3
CH N 2
3
2
H 0) 3
START:
[CH OH CH OAc COOMe CHgOEE]
TARGET:
[CH OH CH OAc COOMe CHgBr]
2
2
2
2
SEQUENCE: (Cr0 /py HgO TsCl NaBr NaBH^) 3
Figure 5. Functional is the ethoxyethyl
Group Switching problems solved as in Ref. 10. ether of a secondary alcohol; EVE is ethylvinyl
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
RCHOEE ether.
80
COMPUTER-ASSISTED ORGANIC SYNTHESIS
while t o solve t h i s reaction dictionary generation pro blem by a c o m b i n a t i o n o f c h e m i s t and computer w i t h t h e c h e m i s t e x e r c i s i n g h i s judgement i n p a r i n g t h e growth of t h e r e a c t i o n d i c t i o n a r y and making r a t h e r d e l i c a t e v a l u e judgements on e x a c t l y what r e a c t i o n i s e x p e c t e d of some a g g r e g a t e f u n c t i o n a l group under some s e t o f reaction conditions.(11)
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003
Literature Cited
Parsing, Prentice
(1) McCarthy, J. Abrahams, P. W., Edwards, D. J., H a r t , T. P., Levin, M. I., "LISP 1.5 Programmer's Mannual", MIT P r e s s (1965). (2) H o p c r o f t , J. Ε., and U l l m a n , J. D., "Formal Languages and Their Relation t o Automata", A d d i s o n -Wesley, R e a d i n g , MA, 1969. (3) G i n s b u r g , S., "The M a t h e m a t i c a l Theory o f C o n t e x t F r e e Languages", M c G r a w - H i l l , New Y o r k , 1966. (4) K a i n , R. Υ., "Automata Theory: Machines and Languages", M c G r a w - H i l l , New Y o r k , 1972. (5) Aho, Α. V., and U l l m a n , J. D., "The Theory o f Translation and Compiling", two volumes, Hall, Englewood Cliffs, N.J., 1973. (6) Chomsky, Ν., Handbook o f Math. P s y c h . (19 ), 2 W i l e y , New Y o r k , pp. 323-418. (7) Hennie, F. C., "Finite-State Models for Logical M a c h i n e s " W i l e y , New Y o r k , 1968. (8) B l o w e r , P., Ph.D. Thesis, University o f W i s c o n s i n -Madison, 1975. (9) Salomaa, Α., "Formal Languages", Academic P r e s s , New York, 1973. (10) T h i s s u b j e c t is d i s c u s s e d a t l e n g h t in, W h i t l o c k , H. W., Jr., J. Am. Chem. Soc. ( 1 9 7 6 ) , 98, 0000. (11) Support o f this work by t h e National S c i e n c e F o u n d a t i o n is gratefully acknowledged.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
4 A Chemical Engineering View of Reaction Path Synthesis RAKESH GOVIND and GARY J. POWERS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
Dept. of Chemical Engineering, Carnegie-Mellon Univ., Pittsburgh, Pa. 15213
The synthesis, analysis, and evaluation of reaction paths i s of fundamental importance i n the chemical process industries. Research and development chemists and chemical engineers are often confronted with problems which are best solved by u t i l i z i n g chemical reactions. Figure 1 shows a matrix of the common problems and possible solutions encountered i n the chemical process industries. Problems associated with products may be classified into those which deal with compounds or mixtures of compounds which are a new market for the company and those which are existing products. Separation problems which might be solved by chemical reactions (e.g. use chemical reactions to convert one or more of the species i n a mixture
m
fH Q Ο
α, ο W OS
«
m
1
55
55
M pH
M C-l
m
Χ
<<
a, Ό OS i
1
55 Μ
C/Î
r/î M
« 55 REACTION PATHS
<
23
ο :=>
Μ
Μ 55
55
EXISTING _NEW EXISTING
REACTIONS-
NEW REACTION NEW CONDITIONS
Figure 1. Problems commonly encountered in the chemical process industries which could be solved by use of reactions or reaction paths
81 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
82
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
to species which are more easily separated or need not be separated at a l l ) may also be c l a s s i f i e d by whether they are new problems or existing ones. New problems arise with new products or new legislation related to pollution or product purity. Problems may also arise due to changing a v a i l a b i l i t y of raw materials. For example, a new reaction path may make an existing process and associated raw materials highly available. The problem i s to find new uses for the raw materials and the process. In addition, i t might be desirable to develop reaction paths which u t i l i z e (or avoid) certain technology, patents, etc. Each of these problems may be attacked by inventing new reaction paths. The paths might be composed of known reaction steps i n a new combination. Or, a new reaction step could be developed and u t i l i z e d i n a sequence of reactions. The new reaction could be a reaction of functional groups which was previously unknown or a previously known reaction which i s made to go under new conditions. Evaluation of Industrial Reactions The evaluation of industrial reaction paths depends on a detailed understanding of the costs associated with each reaction step. A simple cost function i s Cost of Reaction = f(Overall Raw Material Cost, U t i l i t y Path Costs, Equipment Costs, Safety, Reliability, Flexibility) (1) Raw Material Costs
= g(Raw Material Cost/mole, Stoichiometry, Yield)
U t i l i t y Costs
= h(Thermochemistry, Reaction Conditions, Separation Difficulty)
(2) (3)
Separation D i f f i c u l t y = i(Property Differences, Concentrations, Recovery, Separation Conditions) (4) Equipment Costs
= j(Reaction Kinetics, Separation D i f f i c u l t y , Corrosion, Conditions) (5)
Safety
= k(Species Properties, Conditions, Other Possible Reactions)
Reliability
Flexibility
(6)
= I (Knowledge of Reaction, Separation, etc.)
(7)
» m(Range of Operating Conditions)
(8)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
4.
GOviND
AND
POWERS
Reaction
Path
Synthesis
Obviously a great d e a l of i n f o r m a t i o n i s r e q u i r e d to a c c u r a t e l y compute the value o f t h i s f u n c t i o n . One of the major f u n c t i o n s of an i n d u s t r i a l r e s e a r c h and development e f f o r t i s to generate the knowledge and data necessary f o r decision-making r e l a t i v e to each r e a c t i o n step and p a t h . ( l )
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
I n d u s t r i a l R e a c t i o n Path Synthesis The challenge i n systematic r e a c t i o n path s y n t h e s i s i s t o generate r e a c t i o n paths and steps which could be v i a b l e a l t e r n a t i v e s i n an i n d u s t r i a l environment. The r e a c t i o n paths used i n i n d u s t r i a l environments have the f o l l o w i n g c h a r a c t e r i s t i c s : 1.
Small Target Molecules. The b a s i c petrochemical and f i n e chemical i n d u s t r i e s commonly d e a l w i t h molecules w i t h fewer than twenty h e t e r o - and carbon-atoms.
2.
M u l t i p l e Target Molecules (Reaction Networks). The chemical i n d u s t r y i s a network o f r e a c t i o n s fed by 3 to 5 b a s i c m a t e r i a l s and producing hundreds of t a r g e t molecules. The paths to each t a r g e t molecule i n t e r a c t w i t h each other by sharing raw m a t e r i a l s and byproducts. See F i g u r e 2 f o r p a r t of a r e a c t i o n network.
3.
Stoichiometry Must Be Known. The s t o i c h i o m e t r y f o r the path must be known i n d e t a i l . T h i s means t h a t a l l main r e a c t i o n products must be considered a t each r e a c t i o n s t e p . The f a c t that 2 moles o f NaCl or other simple r e a c t i o n products are produced duri n g a r e a c t i o n can have a major impact on the r e a c t i o n s economics. 1
4.
Y i e l d . The f r a c t i o n of the l i m i t i n g reagent which i s transformed i n t o the d e s i r e d t a r g e t molecule must be known.
5.
Byproducts. The amount and type of byproduct produced are necessary p i e c e s o f i n f o r m a t i o n . The main r e a c t i o n byproducts and side r e a c t i o n byproducts are needed f o r both the y i e l d c a l c u l a t i o n and the d e t e r mination of s e p a r a t i o n d i f f i c u l t y , c o r r o s i v e n e s s , safety, e t c .
6.
Impurity R e a c t i o n s . The l a r g e - s c a l e p r o d u c t i o n o f molecules demands a knowledge o f the f a t e o f impur i t i e s which enter w i t h the raw m a t e r i a l s or are produced i n the r e a c t i o n s t e p s . These i m p u r i t i e s can cause c o n s i d e r a b l e p o l l u t i o n and product q u a l i t y problems. I n many processes more equipment and energy i s devoted to c o n t r o l l i n g the i m p u r i t i e s than the main reagents and products. F i g u r e 3 i l l u s t r a t e s the types of r e a c t i o n s which could occur between i m p u r i t i e s and other species i n the r e a c t i o n mixture.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
83
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
METHACR YLIC ACIO
METHANOL
POLYESTER FIBER
— POLYSTYRENE PLASTIC
• STYRENE BUTADIENE RUBBER-
• NYLON FIBER
"OIMETHYL TEREPHTHALATE
ETHYLENE GLYCOL -
•ADiPlC ACID
URE A PHENOL FORM ALDEHYDE (ADHESIVE)
POLYETHYLENE • ACETYL SALICYLIC ACID "
IONOMERIC RESINS -
•ACETIC ANHYDRIDE
- CARBON BLACK ( • PETROLEUM SOLVENTS)-
•PARAXYLENE-
* ^ A M Î N E " ^ ^
•SALICYLIC ACID
CYCLOHEXANE
PHENOL
• ^
•FORMALDEHYDE
·» AMMONIA
» HYDROGEN CYANIDE
ETHYL ALCOHOL - ACETALDEHYDE
• ACETONE -
^•BUTADIENE—•ADIPONITRILE
•ETHYLENE -
- PROPYLENE-
CRUDE OIL 8 NAT. GAS «
XYLENE-
BENZENE
BUTANE-
METHANE•
ETHANE -
PROPANE-
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
COATING
PRINTING
INK
CABINETS
DRESSES
TV
TIRES
PLYWOOD
FERTILIZER
ASPIRIN
WIRE
GOLF BALL COVERS
4. GOviND
AND
Reaction
POWERS
Path
Synthesis
85
MAIN REACTIONS AT OTHER SITES PARALLEL
R^ + R^ = Ρ
SERIAL
R
+ Ρ
x
= BP
2
:
^
+ R
=
ΒΡ
χ
:
^
+ ΒΡ =
BP
3
2
χ
OTHER REACTIONS THAT OCCUR AT THE SAME CONDITIONS R
R
l
+
R
2
+
R
Ρ
2 2
R
l
R
+
R
+
R
l
P
R
2
+ Ρ
Ρ
+ ΒΡ
l
+
P
+
B
i
χ
P
i etc.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
IMPURITIES IN REAGENTS +
T
l
R
I
l
+
Figure species
7.
2
+ ΒΡ
R
+
h h I
+P
h h h h +
I 2
+
BP
1
etc.
χ
3. Reactions which could occur between the (R = reagents, Ρ = products, I = impurities, BP == byproducts) in a reaction mixture
R e a c t i o n Conditions. A l l of the features discussed above depend on the r e a c t i o n c o n d i t i o n s . I t i s nec essary to have i n f o r m a t i o n on a. b. c. d. e. f. g.
Phase(s) i n which the r e a c t i o n takes p l a c e . Solvents r e q u i r e d (the fewer the b e t t e r ) . Catalysts Temperature Pressure Concentrations Time (mixing)
Of course, t h i s i s asking f o r a great d e a l of i n f o r m a t i o n . Can modern t h e o r i e s and a p p l i c a t i o n s of r e a c t i o n path synthesis make any c o n t r i b u t i o n to t h i s problem? Are the i n d u s t r i a l l y impor tant molecules too "simple f o r current r e a c t i o n path synthesis programs? Is too much s p e c i f i c data required? 11
I n the f o l l o w i n g s e c t i o n s we present a b r i e f review of work i n computer-as s i s ted r e a c t i o n path s y n t h e s i s . The work i s com pared w i t h the needs of i n d u s t r i a l r e a c t i o n - p a t h problems. F i n a l l y , a program c a l l e d REACT which we have developed f o r r e a c t i o n path synthesis i s d e s c r i b e d and i l l u s t r a t e d .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
86
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Representation o f Molecules and Reactions One key element i n developing a r e a c t i o n path program i s the s e l e c t i o n o f r e p r e s e n t a t i o n s o f molecules and t h e i r r e a c t i o n s which a r e appropriate f o r a g i v e n problem. F i g u r e 4 conceptually compares the p r e d i c t i v e power and g e n e r a l i t y o f three r e p r e s e n t a t i o n s . P r e d i c t i v e power i s d e f i n e d as the a b i l i t y to p r e d i c t the c o n d i t i o n s , s t o i c h i o m e t r y , byproducts, y i e l d , e t c . , o f a given r e a c t i o n step i n each r e p r e s e n t a t i o n . G e n e r a l i t y i s the a b i l i t y t o generate " a l l possible reactions.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
1 1
The three r e p r e s e n t a t i o n s are shown s c h e m a t i c a l l y i n F i g ure 5. The most general types o f r e p r e s e n t a t i o n use a mathe m a t i c a l a b s t r a c t i o n o f r e a c t i o n which consider a l l the p o s s i b l e means f o r making and breaking bonds between atoms. Hendrickson (2) has i n v e s t i g a t e d a general r e p r e s e n t a t i o n i n which molecules are represented by the " c h a r a c t e r " o f the carbon s i t e s which occur w i t h i n the molecule. The character o f a s i t e i s a two d i g i t number which i n d i c a t e s whether a carbon atom i s a t tached t o σ other carbon atoms by sigma bonds or t o " f u n c t i o n a l " groups. The f u n c t i o n a l i t y o f a carbon s i t e i s defined as the number o f ττ bonds and heteroatoms attached to the carbon. The character o f a s i t e i s g i v e n by c = 10σ + f
(9)
where f - π + ζ
(10)
Valence c o n s t r a i n t s give σ + π + η + ζ £ 4
(11)
where h i s the number o f bonds to H or l e s s e l e c t r o n e g a t i v e atoms. With t h i s d e f i n i t i o n o f carbon s i t e , Hendrickson was able to c l a s s i f y a l l p o s s i b l e carbon s i t e s and the " r e a c t i o n s " which might i n t e r c o n n e c t them. F i g u r e 6 gives the r e a c t i o n t r i a n g l e for this representation. Camp and Powers extended t h i s r e p r e s e n t a t i o n by a l l o w i n g the f u n c t i o n a l i t y ( f ) to be separated i n t o π and ζ dimensions. The character o f a s i t e then becomes c = ΙΟΟσ + 10 f + ζ
(12)
I n a d d i t i o n , Camp (3) developed the s i t e c h a r a c t e r i s t i c s f o r heteroatoms. These extensions transform Hendrickson s t r i a n g l e 1
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
GOviND
AND
POWERS
Reaction
Path
Synthesis
PREDICTIVE POWER (ABILITY TO PREDICT THE YIELD, STOICHIOMETRY, CONDITIONS, BYPRODUCTS, ETC.)
ROTE
1
SUBHENDRICKSON S STRUCTURE METHOD TRANSFORMS (COREY e t a l . )
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
GENERALITY (ABILITY TO GENERATE "ALL" REACTION PATHS)
Figure
Figure
5.
4.
Ability to predict performance vs. generality three representations of reaction
A schematic
drawing of three representations their reactions
for
of molecules
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
and
ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
COMPUTER-ASSISTED
F = functionolity = 7T and /or ζ
f = 7Γ + Ζ
c = Ι Ο σ • f = character
Figure 6. Character triangle for carbon sites and interconversions. The char acter C, is shown beneath each carbon site. The shaded areas indicate possibte double bond sites (C = Ο.Υ///λ ; possible triple bond sites (C == C), GS3; and possible aromatic sites I Mill.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
4.
GOViND
AND
POWERS
i n t o a pyramid. F i g u r e 5.
Reaction
Path
Synthesis
89
I t i s t h i s r e p r e s e n t a t i o n t h a t i s shown i n
The advantage of t h i s r e p r e s e n t a t i o n i s that i t i s v e r y compact. Hendrickson s t r i a n g l e has only 15 s i t e s and 70 " r e actions." Camp's pyramid has 24 s i t e s and 132 r e a c t i o n s . The development of computer codes which u t i l i z e these r e p r e s e n t a t i o n s i s extremely simple. The r e p r e s e n t a t i o n i s a l s o able to generate " a l l " p o s s i b l e r e a c t i o n s which lead to a p a r t i c u l a r t a r g e t molecule. The procedure could s t a r t a t the t a r g e t mole cule and generate " a l l " p o s s i b l e p r e c u r s o r s by using the r e a c t i o n s on the pyramid. Each s i t e i n the t a r g e t i s developed independently and c e r t a i n dependencies must be checked (e.g. π bonds between atoms). I n essence, t h i s r e p r e s e n t a t i o n i n v o l v e s making (or breaking) every p o s s i b l e bond i n the molecule sub j e c t to the v a l e n c y and s i t e c o n s t r a i n t s .
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
1
T h i s g e n e r a l i t y i s obtained w i t h the l o s s o f power to p r e d i c t the consequences of each r e a c t i o n step. The r e a c t i o n s are not c l a s s i f i e d by the exact nature of the f u n c t i o n a l i t y . Hence r e a c t i o n s w i t h a f u n c t i o n a l i t y of f = 1 ( a l c o h o l , e t h e r , o l e f i n ) are a l l t r e a t e d s i m i l a r l y . I n a d d i t i o n , some of the r e a c t i o n s are not known to have analogs i n nature. These r e a c t i o n s j u s t don't occur (or a t l e a s t we haven't observed them y e t ) . 1
Hendrickson s r e p r e s e n t a t i o n i s an example o f a generateand-test problem-solving approach i n which the generator i s uninformed. I t simply generates p o s s i b l e r e a c t i o n s and i t i s up to the t e s t e r ( e v a l u a t o r ) to determine the f e a s i b i l i t y and value of each step or path. 1
REPAS: A Case Study Using Hendrickson s Representation Govind (4) developed a r e a c t i o n path synthesis program based on Camp's g e n e r a l i z a t i o n o f Hendrickson's r e p r e s e n t a t i o n . The program was used to generate paths f o r a wide range o f i n d u s t r i a l molecules. For even a simple molecule such as a c r y l i c 0 II
(C=C-C-0H), w e l l over 10,000 r e a c t i o n paths were generated. The e v a l u a t i o n of these paths was a d i f f i c u l t task. Many of the paths i n c l u d e d "strange" r e a c t i o n steps f o r which no meaningful mechanism could be e n v i s i o n e d . While a few o f these r e a c t i o n steps provided i n t e r e s t i n g a l t e r n a t i v e s , there was a general l a c k of correspondence between these paths and those which ex perienced chemists and chemical engineers might execute i n the l a b o r a t o r y , p i l o t p l a n t s , or i n d u s t r i a l process. Our i n a b i l i t y to be s p e c i f i c about each r e a c t i o n step g r e a t l y l i m i t s the use f u l n e s s of t h i s r e p r e s e n t a t i o n . I n an a b s t r a c t sense t h i s ap proach could produce " a l l " r e a c t i o n paths. However, the over whelming number o f combinations of f u n c t i o n a l groups and r e a c t i o n s s e v e r e l y l i m i t s the p r o b a b i l i t y o f f i n d i n g a new and
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
90
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
r e a l i z a b l e r e a c t i o n path. A r e p r e s e n t a t i o n o f r e a c t i o n i s needed which i s perhaps l e s s general but has more s p e c i f i c chemical data and theory i n i t . Substructure Representation E.J. Corey and h i s co-workers (5) have developed a r e p r e s e n t a t i o n o f molecules and r e a c t i o n s based on the combination of atoms i n the molecule which change during known r e a c t i o n s . Molecules are represented by l i n k e d data l i s t s . The l i s t s cont a i n the complete t o p o l o g i c a l and atom type i n f o r m a t i o n f o r the molecule. Reactions are represented by transforms which i n d i cate how substructures w i t h i n a molecule change during a spec i f i c type o f r e a c t i o n . C l a s s i f i c a t i o n o f the r e a c t i o n s by only the atoms which change allows the transforms to be a p p l i e d t o a wide range o f molecules which might c o n t a i n these s u b s t r u c t u r e s . F i g u r e 7 i l l u s t r a t e s the general steps i n applying a transform. FIND SUBSTRUCTURES WITHIN THE TARGET MOLECULE
CAN THE SUBSTRUCTURES BE •MADE BY REACTIONS IN THE REACTION LIBRARY?
NO
NEED MORE REACTIONS
NO SET UP SUBGOALS TO MAKE REACTION FEASIBLE
J
YES
IS EACH REACTION FEASIBLE GIVEN -THE ATOMS, FUNCTIONAL GROUPS, RINGS, ETC. LOCAL TO THE REACTION SITES?
I
YES
WHAT IS THE VALUE OF THE REACTION? CONSIDER GLOBAL REACTIONS RAW MATERIAL COSTS MECHANISM THERMOCHEMISTRY BYPRODUCTS, ETC.
IF THE VALUE IS ACCEPTABLE APPLY THE REACTION
CONTINUE WITH PRECURSORS AND SUBGOALS UNTIL AVAILABLE RAW MATERIALS ARE ENCOUNTERED Figure
7.
Steps
in applying a transform change during
based on the substructures reaction
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
which
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
4. GoviND
A N D POWERS
Reaction
Path
Synthesis
91
F i r s t , s t a r t w i t h the t a r g e t molecule and f i n d a l l the subs t r u c t u r e s w i t h i n the molecule f o r which transforms ( r e a c t i o n s ) are a v a i l a b l e . Then check to see i f the transform i s a p p l i cable f o r the s p e c i f i c molecule i n question. I n the transform there are three l e v e l s o f e v a l u a t i o n . The f i r s t l e v e l i s simp l y a check to ensure t h a t the c o r r e c t substructures are present. The second l e v e l o f e v a l u a t i o n checks the atoms " l o c a l " to the r e a c t i o n s i t e s . I f the l o c a l atoms p r o h i b i t the part i c u l a r transform the r e a c t i o n i s e i t h e r k i l l e d ( i . e . no longer considered) or a subgoal i s s e t up to change the atoms which are b l o c k i n g the r e a c t i o n . A t the t h i r d l e v e l o f e v a l u a t i o n the l o c a l atoms are checked to see how much they w i l l improve or d e t r a c t from the r e a c t i o n , (e.g. i f alpha i s e l e c t r o n withdrawing add 10 to score.) I n a d d i t i o n , i t i s important t o check the g l o b a l features o f the molecule f o r competing r e a c t i o n s i t e s f o r the same transform, or f o r other r e a c t i o n s which might take place a t the c o n d i t i o n s r e q u i r e d f o r the d e s i r e d r e a c t i o n . Corey and co-workers (5) have developed a l i b r a r y o f over 300 transforms o f t h i s type. Most o f the secondary checks on the r e a c t i o n s ( i . e . the a f f e c t o f l o c a l attachments) are knowledgeable g e n e r a l i z a t i o n s o f the reviews published by Marsh ( 6 ) , House ( 7 ) , and Buehler and Pearson ( 8 ) . This approach has considerably more p r e d i c t i v e power than the method o f Hendrickson. Of course, i t r e q u i r e s a l a r g e r data base. The g e n e r a l i t y o f the approach i s l i m i t e d by the transforms i n the data base and the secondary and t e r t i a r y checks made on each r e a c t i o n . I f the substructure transform i s not i n the data base i t w i l l not be i n c l u d e d i n any r e a c t i o n path generated using t h i s approach. I f the secondary and t e r t i a r y checks are too conservative the r e a c t i o n may not be used when i t a c t u a l l y might work. I t might be p o s s i b l e to systema t i c a l l y r e l a x the l o c a l c o n s t r a i n t s (and use a method l i k e Hendrickson s) i n some f u t u r e program. These r e a c t i o n steps would then be the subject o f a research and development program to see i f they can be made t o work. 1
Rote Memory The most powerful p r e d i c t o r and l e a s t general approach i s that o f simply s t o r i n g s p e c i f i c instances o f known r e a c t i o n s along with the conditions under which the r e a c t i o n s were performed. The p a r t i c u l a r t a r g e t molecule i s then simply patterned matched with r e a c t i o n s i n the data base. Search S t r a t e g i e s The main i s s u e i n search i s whether the synthesis i s planned by working backwards from the t a r g e t or forwards from the s t a r t i n g m a t e r i a l s . F o r many syntheses the t a r g e t i s a s i n g l e molecule and the s t a r t i n g m a t e r i a l s are many (commonly
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
92
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
10,000 or more). Hence working backwards i s an e f f i c i e n t s t r a t egy. However, when the molecules are s m a l l e r , the s e t o f s t a r t i n g m a t e r i a l s may not be n e a r l y so l a r g e . I n petrochemical syntheses n e a r l y a l l paths s t a r t w i t h molecules such as e t h y l ene, propylene, butane, e t c . , which o f t e n number l e s s than 100. For cases l i k e these i n which the s t a r t i n g m a t e r i a l s may be def i n e d , a forward s t r a t e g y may be more e f f i c i e n t . Powers and Jones (9) and Powers e t a l . (10) found that a forward-branching search s t r a t e g y ( d i s c r e t e dynamic programming) was undoubtedly the b e s t one f o r planning the chemical s y n t h e s i s o f b i h e l i c a l DNA. The advantage o f working forward from the raw m a t e r i a l s i s t h a t the e v a l u a t i o n may be more a c c u r a t e l y computed. When working backwards i t i s not p o s s i b l e to know the c o s t s o f p r e cursors s i n c e t h e i r paths have not y e t been developed. I n a d d i t i o n to search d i r e c t i o n i t i s necessary to d e t e r mine the order of enumeration o f the s y n t h e s i s t r e e . Depth, breadth and h y b r i d search procedures are a l l p o s s i b l e . For most current programs the e v a l u a t i o n i s so u n c e r t a i n that human i n t e r a c t i o n w i t h the program during e x e c u t i o n i s necessary and undoubtedly more f r u i t f u l . A p p l i c a t i o n s to I n d u s t r i a l R e a c t i o n Paths Most c u r r e n t programs have been aimed a t l a b o r a t o r y syntheses. The d e t a i l and data r e q u i r e d f o r i n d u s t r i a l paths i s not commonly i n these programs. Many o f these programs s t a r t w i t h raw m a t e r i a l molecules which are the t a r g e t s of i n d u s t r i a l chemists. We are i n t r i g u e d by the p o s s i b i l i t y o f a p p l y i n g these ideas to smaller molecules. To our knowledge, only two a p p l i c a t i o n s o f these techniques have been made to i n d u s t r i a l r e a c t i o n problems. The Chioda Chemical Engineering and C o n s t r u c t i o n Company i n Tokyo, Japan has used a program c a l l e d CHEMONICS i n which known s p e c i f i c r e a c t i o n s numbering approximately 100 are s t o r e d . The thermochemical p r o p e r t i e s , and i n some cases k i n e t i c parameters, are known f o r these r e a c t i o n s . The r e a c t i o n s are mixed and matched by the user w i t h the program performing the a n a l y s i s o f the r e a c t i o n network. I n the f o l l o w i n g s e c t i o n s we d e s c r i b e a prototype program, REACT, which i s c u r r e n t l y being, t e s t e d on a number of i n d u s t r i a l problems. REACT
—
A r e a c t i o n path s y n t h e s i s program f o r the p e t r o chemical i n d u s t r y .
We are developing a program f o r use i n the petrochemical i n d u s t r y . The f e a t u r e s o f the program are
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
4. GoviND A N D P O W E R S
1.
Reaction
Path
Molecule Representation:
Synthesis
93
Connection M a t r i x
For example,
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
0 II
READ TARGET MOLECULE
c=c=c=&
COMPUTE CONNECTION TABLE
6 0 0 2 0
FIND FUNCTIONAL GROUPS AND POSITIONS
0 4 2 0 0
0 2 4 0 0
42 24
2 0 1 4 1
0 0 0 1 8 C=C
The current s i z e l i m i t i s twenty carbon
sites.
2.
R e a c t i o n Representation Substructure Transforms Primary E v a l u a t i o n on Substructure Presence Secondary E v a l u a t i o n on L o c a l Atoms, Rings, F u n c t i o n a l Groups and C o n d i t i o n s . T e r t i a r y E v a l u a t i o n on G l o b a l Atoms, Rings, F u n c t i o n a l Groups, Competing S i t e s f o r the Desired Transform, Impurity Reactions Conditions Included Solvent, C a t a l y s t , Phase, C o n c e n t r a t i o n s ) , Temperature, Pressure, Time C u r r e n t l y 300 i n d u s t r i a l l y important substructure transforms are i n the data base.
3.
Search D i r e c t i o n :
Backwards from S i n g l e Target Molecule.
4.
Search S t r a t e g i e s
a. b. c.
5.
Evaulation
a. b. c. d. e.
D e p t h - f i r s t guided by transform scoring function B r e a d t h - f i r s t (exhaustive or w i t h breadth c o n t r o l h e u r i s t i c s ) User d r i v e n by i n t e r a c t i o n through a t e l e t y p e w r i t e r F e a s i b i l i t y by transform constraints H e u r i s t i c s c o r i n g a t transform level Stoichiometry w i t h raw m a t e r i a l costs Side r e a c t i o n s User
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
94
COMPUTER-ASSISTED ORGANIC SYNTHESIS
6.
Input/Output Input: Free format input o f the molecule i n a 15 row χ 15 column drawing box. Input by cards or t e l e t y p e w r i t e r Output: Synthesis tree or r e a c t i o n paths on l i n e p r i n t e r or t e l e t y p e w r i t e r
7.
Program Features: FORTRAN, some l i n k e d l i s t data s t r u c t u r e s , r e a c t i o n s are subroutines, i n t e r a c t i v e or batch, 2500 l i n e s of code.
Example: e-caprolactam The development o f new r e a c t i o n paths to nylon monomer i n termediates i s an a c t i v e i n d u s t r i a l r e s e a r c h area. M i l l i o n s of kilograms of these monomers are produced each year and even small f r a c t i o n a l improvements can have large economic impact. The u n c e r t a i n f u t u r e of feed stocks d e r i v e d from crude o i l has renewed i n t e r e s t i n a l t e r n a t e r e a c t i o n paths. I n a d d i t i o n , the search f o r r e a c t i o n paths w i t h lower energy costs i s i n t e n s i f y i n g . The f o l l o w i n g example i l l u s t r a t e s the use of the REACT program to generate r e a c t i o n paths which lead to e-caprolactam. Over 500 paths were generated i n l e s s than 10 CPU minutes on an IBM-360/67 computer. Several o f the paths are i l l u s t r a t e d on F i g u r e 8. T h i s study was performed as p a r t of an o v e r a l l r e view o f a chemical company's p o s i t i o n i n t h i s monomer area. The r e s u l t s of the study i n d i c a t e d s e v e r a l new r e s e a r c h d i r e c t i o n s which the r e s e a r c h and development teams are now pursuing.
c-c-c=o
Ç-C-C-C-&
+ cc-
1
H SO 2
4
1
+
NOHSO,
C-C-C
c-c 0 Ç-C-C-C-&
C-C-C-C-& 170__ 10_atm_. Pd. C a t . , , +
3 H
c-c-c C-C-C C-C-C-C-& . * .
H0
C-C-C-C
2
C-C-C 160 *
Figure
8.
Reaction
path
10 a t n . C
-
« * » C
output from the REACT (sigma bond), * = aromatic
-
+
C
program: ring
à- =
1
'
5
0
2
—OH,
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
$ ==
Reaction Path
A N D POWERS
GOviND
Synthesis
c-c-c=o •
«
C
Ν
c-c-c=o I
+
$ '
$ »
c-c
1
K_SO.(SOi
1
N-H.H S0 ** o
C-C-C=N-&
^_r.-^...f Bcckmann R e a r r .
4
1
$
C-C
C-C-C=N-& 1 ' +
*2
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
c-c-c=o
i i
+
*
c-c-c
z
^-^59_.??:-Ç?i:
J. *
C-C-C
1
» · c-c-c
c
„
n
t
^.152
c-c-c=o
ι
C Ν $ ' C-C
4 +
2HC1
C-C-C -N-&.HC1
t
t
+ l^SO
2HNOS04 + H 2 0
N
2°3
8.
+ 3 H
·
v
2°
2
HC1
4
* *
C-C-C-&
, * , c-c-c
4
Beckmann R e a r r .
C-C-C
N0C1
C-C-C-&
t
c-c-c
C-C-C-&
r
· ι C-C-C
C-C-C-0 ' ' + NH2OH.H2S04
(NH.) SO. + H.0
C-C-C
Figure
+ 2NH-
L
C-C
c-c-c=o C
I
N-H.H-ISO,
C
Light
+
C-C-C-N-&.HC1
» '
C-C-C
C-C-C 1 » + K0C1
c-c-c
HNOSO, + HC1 4 2 H
2S°4
2NH +
+ N
2°3
30 2
Continued
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
96
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004
Program Maintenance The a b i l i t y of programs like REACT to generate reasonable and potentially interesting reaction paths depends on the number and quality of reactions i n the data base. A strong commitment to reaction documentation needs to be i n i t i a t e d within a company i f the computer program i s going to be useful. The spei c a l reactions known to the company as well as reactions reported i n the open literature need to be considered. We have found that an intensive two-day training program i s commonly sufficient to teach research and development chemists and chemical engineers how to code reactions for use i n the program. A bimonthly review and generalization of each reaction should be performed by a group which includes the program spec i a l i s t s and chemists and chemical engineers drawn from across the company. Preliminary indications are that this approach i s an effective way of actively capturing and applying a wide range of reaction know-how that i s now sometimes lost i n company reports or i n the open literature. More effort needs to be made i n the area of systematic evaluation of reaction paths. These programs simply need to do more evaluation. To spend five minutes to generate 500 paths i s simply not enough time. We need to see i f i t i s possible to spend more time trying to find better paths. This need for systematic evaluation i s going to push our understanding of approximate quantum mechanical calculations.
LITERATURE CITED 1. 2. 3. 4. 5. 6.
7. 8. 9. 10.
Rudd, D.F., G.J. Powers, and J.J. Sirrola, "Process Synthesis," Prentice-Hall, N.J., 1973. Hendrickson, J.B., J. Am. Chem. Soc., (1971) 93, 6847. Camp, D.F., MSChE Thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1974. Govind, R., MSChE Thesis, Carnegie-Mellon University, Pittsburgh, Pennsylvania, 1976. Corey, D.J., W. Todd Wipke, R.D. Cramer III, and W.J. Howe, Science, (1972) 166, 440. Marsh, J . , "Advanced Organic Chemistry: Reactions, Mechanisms, and Structure," McGraw Hill, New York, N.Y., 1965. House, H., "Modern Synthetic Reactions," Benjamin, Menlo Park, California, 1972. Buehler, C.A. and D.E. Pearson, "Survey of Organic Syntheses," Wiley-Inter science, New York, N.Y., 1970. Powers, G.J. and R.L. Jones, AIChE J . , (1973) 19, No. 6. Powers, G.J., Russell Jones, George Randall, Marvin Caruthers, J . Van de Sande, and H.G. Khorana, JACS, (1975) 97, 875.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5 SECS—Simulation and Evaluation of Chemical Synthesis: Strategy and Planning W. T. WIPKE, H. BRAUN, G. SMITH, F. CHOPLIN, and W. SIEBER
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
Board of Studies in Chemistry, University of California, Santa Cruz, Calif. 95064
1.
INTRODUCTION
As the f i e l d of organic synthesis grows i n sophistication it is appropriate that some thought should be given to the problem solving processes used i n designing a synthesis and i n selecting a synthesis from among many. Just as mathematicians have found the computer useful i n studying the structure of f i n i t e groups and i n searching for and testing new proofs, so too the organic chemist i s finding the computer useful i n exploring the hypersurface of syntheses. Because actual execution of a synthesis i s time-consuming and costly, the selection of which synthetic sequence to use becomes very important. The "Eureka Syndrome" leads one to stop generating alternative approaches after the f i r s t attractive solution i s formulated, but for an informed selection of the "best" route to try i n the laboratory, one should examine several possible routes. Some of the advantages of computer-assisted design derive from the fact that the computer i s not subject to the "Eureka Syndrome" or other human biases unless it i s directed to be. A more complete discussion of these advantages i s given elsewhere. 1
2
2
Another important benefit of developing a program for computer-assisted design of synthesis i s that i n building such a program we have the opportunity to study the analytical processes chemists use i n synthesis design and i n some ways the computer program allows us to test our understanding of these processes and principles and the completeness of those principles. In this way research i n computer-assisted design of syntheses can be expected to contribute to manual analysis through better understanding of principles and processes. Our goal, given target T, i s to generate good syntheses by working backward frgm the target toward simpler compounds (problem type III). This paper i s concerned mainly with this
97 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
98 I II III
s t a r t —L->
?
S t a r t — 2 — » Target ?
— T a r g e t
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
type I I I problem, as are the other papers i n t h i s symposium. However techniques developed f o r type I I I problems are a l s o a p p l i c a b l e to types I and I I as we have p r e v i o u s l y demonstrated/
1
Our goal s t a t e , the t a r g e t molecule, i s a s p e c i f i c organ i z a t i o n of atomic n u c l e i and e l e c t r o n s . The t a r g e t molecule i s connected by chemical transformations to many other s t a t e s which we c a l l p r e c u r s o r s . Some of the precursors are more complicated than the t a r g e t ( i . e . , more d i f f i c u l t to synthesize than the target) and others are simpler than the t a r g e t (see F i g . 1). The precursor and t a r g e t are not r e q u i r e d to have the same number of atoms of each type s i n c e there may be a d d i t i o n of a reagent to Ρ or fragmentation of Ρ to form T.^ The p h y s i c a l laws of the universe
Figure
1
d e f i n e a l l p o s s i b l e precursors which by some chemical change can lead to the t a r g e t T, but we have l i t t l e knowledge of t h i s absolute space of s t a t e s and most of what we have i s gained by analogy and e x t r a p o l a t i o n from chemical transformations observed i n r e l a t e d systems. Laboratory e x p l o r a t i o n of the absolute space only one step away from Τ would r e q u i r e s u b j e c t i n g a l l compounds to a l l r e a c t i o n c o n d i t i o n s and r e c o r d i n g those which y i e l d Τ as a product!
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5. w i P K E E T A L .
SECS:
Strategy
and Planning
99
The space i s very l a r g e as the example below i l l u s t r a t e s .
R 1 o, —
2
-
>
2) Me S Ρ
Cf
+
RCHO
Τ
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
I f cyclohexanone i s the t a r g e t , by a r b i t r a r i l y v a r y i n g R one can create as many precursors as one wishes, a l l of which do form cyclohexanone under these c o n d i t i o n s , and hence are r e a l s t a t e s connected to Τ i n the absolute synthesis space. Now the purpose of the computer-assisted design program i s to help the chemist v i s u a l i z e , explore, and evaluate t h i s space. We would l i k e to explore i n the d i r e c t i o n of simpler compounds (toward the bottom of f i g u r e 1) r e a l i z i n g that some i n c r e a s e i n complexity of the precursors must be permitted because they —1
—9
may e v e n t u a l l y lead to even simpler compounds, eg. P.. ,P 1
. 6
The completeness of a s y n t h e s i s program i s the percentage of paths i n absolute space which are generated (considered) by the program. The accuracy of the program i s the percentage of paths generated by the program which are contained i n the absolute space, s i n c e i f a path does not belong to the absolute space i t i s an erroneous p r e d i c t i o n . Completeness assures no good s o l u t i o n s w i l l be missed, and accuracy assures that p r e d i c t e d s o l u t i o n s are r e a l i s t i c and have high p r o b a b i l i t y of working i n the l a b o r a t o r y . In a d d i t i o n we g e n e r a l l y want e f f i c i e n t s o l u t i o n s c o n s i s t i n g of few steps and high o v e r a l l y i e l d s . This paper describes how we approach these goals i n the SECS program. 2.
DESCRIPTION OF SECS SYSTEM
The SECS p r o j e c t f o r Simulation and E v a l u a t i o n of Chemical Syntheses was i n i t i a t e d i n 1969 to focus on stereochemistry, and the s p a t i a l and e l e c t r o n i c aspects of chemistry, p r o x i m i t y , s t e r i c e f f e c t s , and s t e r e o e l e c t r o n i c e f f e c t s , which the f i r s t synthesis program OCSS-LHASA^Ί d i d not consider. Our g o a l was a program which would be u s e f u l i n complex p o l y f u n c t i o n a l p o l y c y c l i c s y n t h e s i s , where success i s dependent on s e l e c t i v e r e a c t i o n s and s e l e c t i v i t y i s h i g h l y dependent on the s t e r i c and s t e r e o e l e c t r o n i c nature of the r e a c t i n g centers. SECS 1.0 was f i r s t demonstrated p u b l i c a l l y v i a t e l e t y p e at a Gordon Conference J u l y 1972, and SECS 1.5 was used with a DEC GT40 graphics t e r m i n a l over the t r a n s - A t l a n t i c cable a t the NATO Advanced Study I n s t i t u t e i n Holland, June 1973. Since 1975 a 3
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
100
COMPUTER-ASSISTED ORGANIC SYNTHESIS
v e r s i o n of SECS has been a v a i l a b l e to the general p u b l i c over TELENET and TYMNET. This paper d e s c r i b e s the current v e r s i o n as of A p r i l 1976, SECS 2.4.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
SECS 2.4 occupies 130K 32-bit words of memory, but when overlayed f i t s i n 48K. A d d i t i o n a l l y the program uses about 1M bytes of a u x i l i a r y d i s k storage f o r permanent and temporary files. SECS i s w r i t t e n i n FORTRAN IV and runs on PDP-10, PDP-20, UNIVAC 1108, IBM 370, and HONEYWELL-BULL systems. To run the SECS program one needs only a t e l e t y p e t e r m i n a l or a DEC GT40 graphics t e r m i n a l and access to a telephone. The chemist enters the t a r g e t molecule on a GT40 system using the l i g h t pen (an i n t e g r a l p a r t of the GT40 t e r m i n a l ) . The s t r u c t u r e i s simply drawn with the l i g h t pen j u s t as one would with a p e n c i l . D e t a i l s of the process has been given e a r l i e r . SECS i s c a r e f u l l y optimized to make such i n t e r a c t i v e g r a p h i c a l input e f f i c i e n t even over slow communication l i n e s of 300 baud. For example, as each new bond i s drawn i n with the l i g h t pen, only the new bond i s transmitted from the host computer r a t h e r than the e n t i r e updated molecular s t r u c t u r e . For p i c t u r e s of the input d i s p l a y the reader i s d i r e c t e d to r e f e r e n c e 3. A l l f u n c t i o n s of the SECS program can a l s o be invoked from a t e l e t y p e device or alphanumeric CRT t e r m i n a l . S t r u c t u r a l input of 1 from the t e l e t y p e i s i l l u s t r a t e d by the d i a l o g i n F i g u r e 2. O
H
When SECS begins the user i s given an opportunity to see what changes have r e c e n t l y been made to t h i s v e r s i o n . Then i t asks i f there i s a RESTART f i l e , t h i s i s to allow c o n t i n u a t i o n from some previous s e s s i o n . I f t h i s i s a new problem t h i s i s s i g n a l e d by a c a r r i a g e r e t u r n (CR). The colon prompt i n d i c a t e s the SECS executive i s executing. I f the user types "HELP, SECS l i s t s a l l commands a v a i l a b l e to the user at t h i s p o i n t . TTYIN i s then requested which asks f o r a name f o r the molecule and the number of non-hydrogen atoms. The user then enters the molecular c o n n e c t i v i t y by s p e c i f y i n g a path of connected atoms. Note the m u l t i p l e bond i s i n d i c a t e d by g i v i n g the path between atoms 4 and 5 twice. A l l atoms are assumed to be carbon u n t i l s p e c i f i e d otherwise. In the example, atom 1 was changed to oxygen. F i n a l l y the stereochemistry of the s t r u c t u r a l diagram i s conveyed. Approximate atomic coordinates are entered to complete the " s t r u c t u r a l diagram." Ζ values may be entered i f one wishes to f i x the conformation, but normally the SECS model b u i l d e r generates these when needed. The TPLOT command p r i n t s a 11
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
WIPKE E T A L .
SECS:
Strategy
and
Pfonning
101
.RUN SECS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
SECS VERSION 2.4, 4/1/76 LIST CHANGES (Y OR N)? RESTART FILE NAME (OR CR): :TTYIN MOLECULE NAME (A20) :TEST ENTER NUMBER OF NEW ATOMS: 5 TO CREATE BONDS, TYPE BOND PATHS;END WITH BLANK LINE BONDS: 1 2 3 4 5 2 BONDS: 4 5 BONDS [user types c a r r i a g e r e t u r n ] DO YOU WISH TO CHANGE ATOM TYPES, CHARGES, OR STEREO INFORMATION (Y OR N) :Y TO CHANGE ATOM TYPES, TYPE ATOM # AND NEW TYPE READY: 1 0 READY: CHARGES: TYPE ATOM // AND CHG(+, ,-) READY: TYPE CHIRAL ATOM //, CONNECTING ATOM //, AND DIRECTION (U OR D) READY: 2 1 U READY: TYPE ATOM # + Χ,Υ,Ζ COORDINATES READY: 1 20 10 READY: 2 10 10 READY: 3 0 10 READY: 4 0 0
Figure
2.
Input of 1 via teletype
(top) vs. display
(bottom)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
102
COMPUTER-ASSISTED ORGANIC
SYNTHESIS
s t r u c t u r a l diagram of the molecule on the TTY using the given r e l a t i v e coordinates. The ">" symbol i n d i c a t e s the OH i s above the plane d e f i n e d by the 4-membered r i n g . Any e r r o r s i n the s t r u c t u r e can be e a s i l y c o r r e c t e d by the s t r u c t u r e e d i t o r TTYED. TTYED can a l s o be used to modify p r e v i o u s l y defined molecule f i l e s .
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
:TTYED MOLECULE EDITOR - TYPE A COMMAND, OR HELP ED:HELP THE FOLLOWING ARE THE ONLY COMMANDS RECOGNIZED BY TTYED END ADDED DELBD ATYPE CHAGR STERE XYZ ADDAT DELAT Figure
EXIT FROM TTYED - ADD BONDS TO CURRENT STRUCT. - DELETE BONDS - CHANGE ATOM TYPES - CHANGE ATOM CHARGES - CHANGE STEREO INFORMATION GIVE ATOM COORDINATES - ADD NEW ATOMS AND BONDS TO STRUCT. - DELETE AN ATOM AND ALL BONDS TO IT 3.
Molecule
Editor Options command
listed
by
HELP
This method TTY of s t r u c t u r e input i s not as elegant as that developed by Feldmann,^ but i t gives the user complete c o n t r o l over the layout or p r e s e n t a t i o n of the s t r u c t u r a l diagram and includes s tereochemistry. The t a r g e t molecule i s now entered and ready f o r s y n t h e t i c a n a l y s i s . The molecule may be saved i n a d i s k f i l e with the WRITF command so the work of input w i l l not be l o s t i n the event of telephone d i s c o n n e c t i o n . I t can be read back i n by a READF command. To begin a n a l y s i s with the standard d e f a u l t s t r a t e g i e s (which w i l l be described l a t e r ) the user simply types RUN or pushes l i g h t button "PROCESS." This generates the f i r s t l e v e l of the r e t r o s y n t h e t i c t r e e . Each precursor i s shown to the user on h i s t e r m i n a l . He i s then f r e e to VIEW a precursor and PROCESS i t f u r t h e r . Thus the standard usage of SECS i s q u i t e simple f o r the user. With t h i s overview l e t us now examine the components or modules of the SECS system shown i n Figure 4. The executive i s always present i n memory together with one other module. T h i s modular c o n s t r u c t i o n not only s i m p l i f i e s the program and makes i t easy to maintain, but a l s o allows us t o keep memory requirements f o r execution of SECS below 48K 32 or 36 b i t words.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
wiPKE ET AL.
SECS:
Strategy
and
Planning
103
GRAPHICS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
GRAPH PERCEPTION
3-D
EVALUATION
ELECTRONIC MODEL
MODEL
SYNCOM COMPILER
ALCHEM TRANSFORMS Figure
4.
SECS
2.4
ALCHEM INTERPRETER J
R
CHM
(. L I B .
modules
A f t e r a s t r u c t u r e has been entered i n t o SECS, i t i s "perceived" f o r the presence of r i n g s , ^ f u n c t i o n a l groups,^*? a r o m a t i c i t y , and i s assigned a s t e r e o c h e m i c a l ! / unique name.10 Molecular symmetry i s a l s o recognized at t h i s t i m e . H This i s the l i m i t of information that can be derived from only a s t r u c t u r a l diagram. A chemist would not stop at t h i s p o i n t however, but would a l s o d e r i v e v a r i o u s perceptions from the shape of the molecule, e i t h e r from an i m p l i c i t imagined 3-dimensional molecular model i n the chemists mind or from an e x p l i c i t p h y s i c a l Dreiding model which the chemist might b u i l d . Consequently SECS also has a model b u i l d e r m o d u l e ^ * ^ which constructs by c o n s t r a i n t s a t i s f a c t i o n techniques a 3-D model of minimum energy. SECS then has a v a i l a b l e C a r t e s i a n coordinates and Van der Waals r a d i i of each atom, and thus "knows" the shape of the molecule. Now a second order p e r c e p t i o n determines the p r o x i m i t y of atoms, s t e r e o e l e c t r o n i c o r i e n t a t i o n of bonds, s t r a i n energy, and even steric congestion.^ SECS a l s o has an e l e c t r o n i c model b u i l d e r to p e r c e i v e s t a b i l i z a t i o n and l o c a t i z a t i o n energies of conjugated systems as w e l l as e l e c t r o n d e n s i t i e s . This i s discussed l a t e r i n t h i s paper.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
104
COMPUTER-ASSISTED ORGANIC SYNTHESIS
At t h i s p o i n t SECS has made q u i t e a study of the target molecule. Information gained from t h i s study i s used to s e l e c t and evaluate which chemical transforms should be a p p l i e d to the target to generate precursors. A l l other a n a l y s i s has been p r e p a r a t i v e to t h i s , the productive stage. To see how transforms are s e l e c t e d and evaluated l e t us f i r s t examine the r e p r e s e n t a t i o n of a transform.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
3.
DESCRIPTION OF A CHEMICAL TRANSFORM.
A chemical transform i s a chemical s t r u c t u r a l change, or r e d i s t r i b u t i o n of e l e c t r o n s , g e n e r a l l y described i n the a n a l y t i c a l d i r e c t i o n (the inverse of the s y n t h e t i c d i r e c t i o n . Note that i t i s not necessary that a transform correspond to a complete r e a c t i o n . We have defined three d i f f e r e n t l e v e l s of representation: the ab i n i t i o l e v e l , name r e a c t i o n l e v e l , and common r e a c t i o n sequence l e v e l . 3 At the ab i n i t i o l e v e l a transform represents an electron-pushing step or sequence (equations 1 and 2).
Ar"-A
2
ο -
A*
+
A"
2
ο
(1)
(2)
The l e f t hand s i d e represents a p a t t e r n which must e x i s t i n the target molecule and the r i g h t hand s i d e , the p a t t e r n as i t w i l l e x i s t i n the p r e c u r s o r ( s ) . The double shafted arrow means " i m p l i c a t i o n , i e , seeing the l e f t hand s i d e i n the target " i m p l i e s " by t h i s transform that i t can be transformed to the r i g h t hand s i d e . 11
The "name r e a c t i o n " l e v e l , i l l u s t r a t e d by the A l d o l (equation 3), corresponds more c l o s e l y with s y n t h e t i c steps, and the precursors produced are normally s t a b l e i s o l a b l e compounds. Of course a "name r e a c t i o n " i s a sequence of ab i n i t i o steps.
RCOOH
RCH C00H 2
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
(4)
5.
wiPKE ET AL.
SECS:
Strategy
and
Planning
105
S i m i l a r l y a sequence of "name r e a c t i o n s " can be combined i n t o a "common sequence" (equation 4).
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
Each l e v e l has i t s m e r i t s . The "ab i n i t i o " l e v e l i s u s e f u l f o r d i s c o v e r i n g new r e a c t i o n s by searching f o r new sequences of e l e c t r o n pushing steps. The "name-reaction" l e v e l i s most usef u l f o r novel t o t a l s y n t h e s i s whereas the " r e a c t i o n sequence" l e v e l i s u s e f u l f o r r a p i d l y generating c l a s s i c a l syntheses, but has l i t t l e i n n o v a t i v e power. 3.1 TRANSFORM LEVEL AND COMPLETENESS. In order f o r a synt h e s i s program to consider the complete space of problem s o l u t i o n s i t i s necessary to have a "complete" set of chemical transforms. I f the "ab i n i t i o " l e v e l i s used, then "complete ness" i s achieved with very few transforms l i k e equation 1. The problem with the "ab i n i t i o " l e v e l i s that the accuracy of p r e d i c t i o n s i s q u i t e low and very few paths i n the s y n t h e s i s t r e e generated are l i k e l y to succeed i n the l a b o r a t o r y . One could improve the accuracy by adding to equation 1 c o n s t r a i n t s r e l a t i n g the f e a s i b i l i t y of the process to the e l e c t r o n e g a t i v i t y and environment of atoms A-^ and A2. At the "name-reaction" l e v e l , thousands of r e a c t i o n s might be needed f o r reasonable completeness. The exact number i s not known—we do not know how many "name r e a c t i o n s " there are s i n c e there i s no agreed upon indexing system f o r r e a c t i o n s . The "name r e a c t i o n " l e v e l has the disadvantage of r e q u i r i n g a l a r g e r knowledge base, but the advantage of p r o v i d i n g the s y n t h e s i s program with more accurate p r e d i c t i v e c a p a b i l i t i e s . Except f o r mechanistic s t u d i e s , ^ our s y n t h e t i c work has been based p r i m a r i l y on the "name-reaction" l e v e l , because our i n t e r e s t i s i n generating accurate s y n t h e t i c p l a n s . By s e l e c t i n g t h i s l e v e l we are saving the program the e f f o r t of r e d i s c o v e r i n g r e a c t i o n s that are already known, and we are preventing the generation of sequences of ab i n i t i o steps f o r which there i s no known analogy. This i s an example of t r e e r e d u c t i o n through g r e a t e r chemical accuracy. 3.2 ALCHEM - A LANGUAGE FOR CHEMISTRY. In the SECS p r o j e c t we view a transform as a concise s c i e n t i f i c statement of what a p a r t i c u l a r chemical transformation i s and the f a c t o r s which a f f e c t when the transformation w i l l or w i l l not occur. T h i s packet of knowledge i s s t r u c t u r e d through the language ALCHEM, an E n g l i s h - l i k e language readable to chemist and computer. A transform contains the f o l l o w i n g types of i n f o r m a t i o n :
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
106
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
NAME REFERENCE SUBSTRUCTURE PRIORITY CHARACTER CONDITIONS SCOPE AND LIMITATIONS MANIPULATIONS "END" The NAME i s a text s t r i n g f o r i d e n t i f i c a t i o n o f the transform. REFERENCE i s a l e a d i n g l i t e r a t u r e reference s u b s t a n t i a t i n g the v a l i d i t y o f the transform. The SUBSTRUCTURE describes a p a t t e r n which must be present i n the target molecule f o r t h i s transform to " f i t . " I t describes the r e a c t i o n s i t e as i t appears i n the product, a f t e r completion of the s y n t h e t i c r e a c t i o n , g e n e r a l l y i n c l u d i n g a l l r e q u i r e d f u n c t i o n a l groups. For example, the subs t r u c t u r e f o r the a l d o l r e a c t i o n transform (eq. 3) could be "ALCHOHOL KETONE PATH 3", d e s c r i b i n g a p a i r of f u n c t i o n a l groups and a path o f three atoms between them. This can a l t e r n a t i v e l y be g e n e r a l i z e d using c l a s s e s o f f u n c t i o n a l groups: "ALCOHOL 0X0 PATH 3" where 0X0 = {KETONE, ALDEHYDE, ESTER}, or even more g e n e r a l l y t o : "DGR0UP WGROUP PATH 3" where DGROUP = {ALCOHOL, AMINE, THIOL} and WGROUP = {KETONE, ALDEHYDE, ESTER, CYANO, NITRO, ACID HALIDE, etc.}. These examples i n v o l v e a p a i r o f groups i n the t a r g e t . Other r e a c t i o n s (e.g., r e d u c t i o n o f a ketone t o an a l c o h o l ) only i n v o l v e one group i n the t a r g e t , hence the substructure only i n v o l v e s one group. The l a t t e r are c a l l e d FGIs f o r f u n c t i o n a l group interchange. ALCHEM provides an a l t e r n a t i v e r e p r e s e n t a t i o n which allows d e s c r i p t i o n of any subs t r u c t u r e , even where normal f u n c t i o n a l groups are not i n v o l v e d . A l i n e a r symbolic code has been developed f o r t h i s purpose.14 The a l d o l substructure could be represented as "0=C-C-C-0H/". PRIORITY i s a value representing the i n i t i a l p l a u s i b i l i t y of the transform, before c o n s i d e r a t i o n o f the r e a c t i o n s i t e context. Values range from -50 to 100. CHARACTER describes what kinds o f s t r u c t u r a l m o d i f i c a t i o n s the transform can make, e.g., ALTERS GROUP, BREAKS RING. CONDITIONS are g e n e r a l i z e d c l a s s e s of r e a c t i o n c o n d i t i o n s , r a t h e r than s p e c i f i c reagents. The advantage being that any new reagent can always be represented as belonging to a general c l a s s . These are described l a t e r i n t h i s paper. SCOPE and LIMITATIONS comprise the main body of the t r a n s form. They are " s i t u a t i o n - a c t i o n " r u l e s which explore the cont e x t o f the r e a c t i o n s i t e and a l t e r the PRIORITY or other values to r e f l e c t the i n f l u e n c e c e r t a i n s t r u c t u r a l features have on the p l a u s i b i l i t y o f t h i s r e a c t i o n . PRIORITY i s one o f eight v a l a b l e s
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
WIPKE E T A L .
SECS:
Strategy
and
Planning
107
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
to be a l t e r e d i n ALCHEM. Figure 5 shows excerpts taken from the a l d o l transform t o i l l u s t r a t e ALCHEM statements. Lines 2-5 represent the NAME, REFERENCE, SUBSTRUCTURE, PRIORITY, and CHARACTER, r e s p e c t i v e l y . Group 1 i s the DGROUP and Group 2 i s the WGROUP. Atom 1 r e f e r s to the f i r s t atom on the path, the l o c a t i o n of the DGROUP. L i n e 6 prevents the r e a c t i o n i f the DGROUP i s an a l c o h o l which i s e s t e r i f i e d . Line 7 says a n i t r i l e i s not a good e l e c t r o n withdrawing group i n t h i s r e a c t i o n (WGROUP), and lowers the p r i o r i t y by 20. Line 8 evaluates the a l k y l s u b s t i t u t i o n on the a t t a c k i n g anion, lowering the p r i o r i t y f o r every s u b s t i t u e n t . B a s i c c o n d i t i o n s are p r e f e r r e d ( l i n e 9 ) , but i f those c o n d i t i o n s i n t e r f e r e with other s e n s i t i v e groups i n the s t r u c t u r e , then a c i d i c c o n d i t i o n s can be used with a lowered p r i o r i t y ( l i n e 10). A manipulation statement ( l i n e 11) breaks bond 1 on the path connecting the DGROUP and WGROUP. L i n e 12 examines the h e t e r o atom i n the DGROUP and c a l l s i t ( 1 ) " . I f i t i s a quaternary n i t r o g e n atom the transform i s k i l l e d , but i f i t i s a t e r t i a r y n i t r o g e n a p o s i t i v e charge i s placed on the n i t r o g e n atom. T h i s shows that manipulation statements can a l s o be c o n d i t i o n a l as a r e s u l t o f a "IF...THEN" query. L i n e 13 puts i n the m u l t i p l e bond to the heteroatom. L i n e 14 increments the p r i o r i t y i f there are no e n o l i z a b l e hydrogens adjacent to atom 1. Lines 15-18 lower the p r i o r i t y i f the carbanion i s r e q u i r e d to a t t a c k the m u l t i p l e bond from the most congested s i d e . I n e v a l u a t i n g congestion, SECS b u i l d s a 3-D model and i n t e g r a t e s the a c c e s s i b i l i t y o f the r e a c t i o n s i t e . ^ Lines 15-18 a l s o i l l u s t r a t e the powerful Set operations a v a i l a b l e i n ALCHEM. The word END s i g n a l s the end of the transform. The a l d o l transform i n the SECS transform l i b r a r y contains many more d e t a i l e d scope and l i m i t a t i o n s statements. 11
1
The ALCHEM transforms are gathered together i n unordered s e q u e n t i a l source f i l e s . These source f i l e s are compiled by SYNCOM i n t o e f f i c i e n t b i n a r y d i r e c t access CHM f i l e s complete with v a r i o u s d i r e c t o r i e s . The CHM f i l e s are l a t e r i n t e r p r e t e d by SECS at run time. SYNCOM a l s o provides very important syntax checking of the ALCHEM transforms. Since the transform l i b r a r y i s not p a r t o f the SECS program, transforms can be added or modified without changing the SECS program. Since the transform l i b r a r y i s not s t o r e d i n main memory, the number and length o f transforms i n the l i b r a r y i s e f f e c t i v e l y u n l i m i t e d . The current l i b r a r y contains over 400 transforms. 1 6
A transform i s not viewed as a program, but r a t h e r as a statement of f a c t s . I t does not contain GO TO's, LOOPs, SUBROUTINE c a l l s , and does not reference any other transform. Each transform i s independent of any other transform. Further a transform contains no s t r a t e g i e s or h e u r i s t i c s about when i t s
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
108
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
1) 2) 3) 4) 5) 6) 7) 8) 9) 10) 11) 12)
13) 14) 15) 16) 17) 18)
19)
ORGANIC SYNTHESIS
; HA-C-C-W => A=C H-C-W ALDOL-COND ;REF: H.O. HOUSE, MOD. SYN. RXNS.', (1972) ρ 629. DGROUP WGROUP PATH 3 PRIORITY 100 CHARACTER BREAKS CHAIN BREAKS RING IF GROUP 1 IS ESTERX THEN KILL IF GROUP 2 IS A NITRILE THEN SUBT 20 IF AN RGROUP IS ALPHA TO ATOM 2 THEN SUBT 10 FOR EACH CONDITIONS BASIC AND NUCLEOPHILIC OR CONDITIONS ACIDIC AND NUCLEOPHILIC THEN SUBT 20 BREAK BOND 1 I F ATOM 1 IN GROUP 1 IS NITROGEN (1) THEN BEGIN IF (1) IS QUATERNARY THEN KILL ELSE I F (1) IS TERTIARY THEN ADD + TO (1) DONE MAKE BOND FROM ATOM 1 IN GROUP 1 TO ATOM 2 IN GROUP 1 IF ATOMS ALPHA TO ATOM 1 OFFPATH ARE ALL ATTACHED & TO 0 HYDROGENS THEN ADD 50 IF ATOM 1 IS A STEROCENTER (1) THEN BEGIN I F ATOM IS ALPHA TO ATOM 1 OFFPATH (2) PUT (1) OR (2) INTO (1) IF ATOM 2 IS ON MOST HINDERED SIDE OF (1) THEN SUBT 50 DONE END T
Figure
5.
Excerpts
from the Aldol
Transform
a p p l i c a t i o n w i l l l e a d to a s i m p l i f i c a t i o n of the s y n t h e t i c problem. S t r a t e g i e s and transforms are separated completely, with the r e s u l t that new transforms can be e a s i l y added without a l t e r i n g s t r a t e g i e s , y e t the s t r a t e g i e s w i l l use the new t r a n s forms when a p p r o p r i a t e . This s e p a r a t i o n lends c l a r i t y and m a i n t a i n a b i l i t y to the SECS knowledge base.l? ALCHEM i s a powerful d e s c r i p t i v e language. To date we have found no r e a c t i o n that couldn't be represented. P a r t of the power of ALCHEM stems from the r i c h p e r c e p t u a l i n f o r m a t i o n base i n SECS which i s accessed by terms such as STERIC HINDRANCE, and the c a p a b i l i t y of SECS to evaluate these expressions i n terms of models. I n f o l l o w i n g s e c t i o n s we show how these expressions and models are used to achieve greater chemical accuracy and thus e f f e c t i v e l y reduce the s i z e of the s y n t h e s i s t r e e generated. 4.
SYNTHESIS TREE REDUCTION THROUGH GREATER CHEMICAL ACCURACY
4.1 STERIC EFFECTS. A chemist p r e d i c t s s t e r i c e f f e c t s by e v a l u a t i n g the p o s s i b l e approach of a reagent to a t h r e e dimensional model of the r e a c t i n g molecule. SECS does the same
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
wiPKE ET AL.
SECS:
Strategy
and
Planning
109
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
t h i n g . The SYMIN module constructs a 3-D s p a c e - f i l l i n g model according to the r e q u i r e d stereochemistry using molecular mechanics.12 Any transform can invoke a s t e r i c c a l c u l a t i o n as i l l u s t r a t e d i n l i n e 18 of f i g u r e 5. Of p a r t i c u l a r importance are r e a c t i o n s which create new stereocenters by attack on a double bond. For example, i n the U c l a f synthesis of andrenosterone, i m p l i c a t i o n of the ketone precursor from the β - a l c o h o l i s v a l i d because i t a l s o implies hydride attack from the l e s s con gested s i d e of the carbonyl which i s p r e f e r r e d .
We developed a c a l c u l a t i o n a l model f o r t h i s type of r e a c t i o n to permit e v a l u a t i o n of s t e r i c a c c e s s i b i l i t y and i t s i n v e r s e . congestion, on any type of molecular s t r u c t u r e (see f i g . 6)A$
b Figure 6. Cone of preferred approach of R to χ allowed by atom i. The accessibility of χ on side 3L with respect to i is defined by this solid angle and numerically equals the area on a unit sphere cut by this cone (shaded area).
Using t h i s model to p r e d i c t the major product from ketone r e ductions, we found p r e d i c t i o n s c o r r e l a t e w e l l with experiment (n = 34, r = 0.924). We a l s o found t h i s model a p p l i c a b l e to epoxidation and hydroboration of o l e f i n s . ^ The s t e r i c ALCHEM statement ( l i n e 18, f i g . 5) i s evaluated v i a the s t e r i c c a l c u l a t i o n to determine i f the r e a c t i o n would create the r e q u i r e d 2
2
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
110
stereoisomer, and depending on the t r u t h v a l u e , the appropriate c o n d i t i o n a l phrase i s i n t e r p r e t e d . Note t h i s i s comparison of congestion between the two s i d e s of a double bond.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
Because the c a l c u l a t e d congestion i s absolute, one can a l s o make comparisons between f u n c t i o n a l groups w i t h i n a s t r u c t u r e . For example, i n Corey's l o n g i f o l e n e s y n t h e s i s i t was necessary to remove one ketone i n the presence of another as shown below. ^ Normally t h i s would r e q u i r e p r o t e c t i n g the ketone which i s to remain. Both ketones are s a t u r a t e d , but they d i f f e r i n s t e r i c environment. The t o t a l congestions c a l c u l a t e d by SECS i n d i c a t e that the ketone which i s to remain i s already p r o t e c t e d by the
Σ CONG. = 35
(4a)
Σ CONG. = 114 greater s t e r i c congestion present, hence r e a c t i o n s should occur on the a,α-dimethyl ketone s e l e c t i v e l y and t h i s was observed ex p e r i m e n t a l l y . This i s expressed i n ALCHEM by "IF STERIC AT GROUP 1 BETTER THAN KETONE ANYWHERE THEN..." This approach of e v a l u a t i n g s t e r i c e f f e c t s has proven e f f e c t i v e - ^ on a wide v a r i e t y of molecular skeletons and i s not r e s t r i c t e d to any l i b r a r y of s p e c i a l r i n g systems. We b e l i e v e SECS s p r e d i c t i v e c a p a b i l i t y i n an unknown system compares f a v o r a b l y with the p r e d i c t i v e c a p a b i l i t y of most chemists. 1
4.2 ELECTRONIC EFFECTS. The e l e c t r o n i c p r o p e r t i e s of con jugated systems could be s t o r e d i n a l i b r a r y according t o r i n g system and heteroatom s u b s t i t u t i o n . Such an approach however f a i l s when faced with a s k e l e t o n not contained i n the l i b r a r y . For d i r e c t i n g e f f e c t s on benzene- one could use the Hammett equation*** or t o p o l o g i c a l r u l e s , but these cannot e a s i l y be e x t r a p o l a t e d t o p o l y c y c l i c or h e t e r o c y c l i c aromatic systems. As a more general and f l e x i b l e approach, we decided SECS should construct i t s own molecular o r b i t a l (MO) model of the conjugated
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
WIPKE ET
AL.
SECS:
Strategy
and
Planning
111
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
system from which i t could derive whatever p r o p e r t i e s i t needed. The HUckel MO method was s e l e c t e d to minimize computational e f f o r t . Wtihin t h i s framework, l o c a l i z a t i o n energies20 have proved more s u c c e s s f u l than e l e c t r o n d e n s i t i e s ^ l i n p r e d i c t i n g aromatic substitution reactions. The HAREHM module recognizes s u b s t i t u e n t s on aromatic r i n g s and heteroatoms i n the r i n g s , and assigns to each of them jvparameters c h a r a c t e r i z i n g t h e i r e l e c t r o n withdrawing or donating properties. For each aromatic r i n g system HAREHM analyzes the p o s s i b l e s u b s t i t u t i o n r e a c t i o n s : f o r each s u b s t i t u e n t on the r i n g system, the s u b s t i t u e n t i s removed and the e l e c t r o n i c energy i s c a l c u l a t e d . Then f o r each f r e e p o s i t i o n on t h i s modified s t r u c t u r e the f r e e atom i s temporarily removed, breaking up the conjugated system, and the energy again c a l c u l a t e d . The d i f ference between these energies i s the l o c a l i z a t i o n f o r attack at that f r e e p o s i t i o n . This i s computed f o r anion, r a d i c a l , and c a t i o n i c intermediates when appropriate, and f o r a l l f r e e p o s i t i o n s . I f the p o s i t i o n having the lowest l o c a l i z a t i o n energy i s the same p o s i t i o n as the o r i g i n a l s u b s t i t u e n t X then i t i s c o r r e c t to i n f e r that k i n e t i c attack of X w i l l occur i n the c o r r e c t p o s i t i o n . See examples i n equations 9 and 10. HAREHM can t r e a t s t r u c t u r e s with m u l t i p l e conjugated systems. Within an aromatic s u b s t i t u t i o n transform the terms RLENERGY, NLENERGY, and ELENERGY r e f e r to the r a d i c a l , nucleop h i l i c , and e l e c t r o p h i l i c l o c a l i z a t i o n energies r e s p e c t i v e l y . Several types of statements use these terms, e.g.: IF ELENERGY ON ATOM 1 BETTER THAN ATOM 2 ELSE KILL
(5)
IF ELENERGY ON ATOM 4 BETTER THAN (1) THEN ADD
(6)
20
IF NLENERGY ON ATOM 4 BETTER THAN (1) THEN BEGIN ADÇ 40 CONDITIONS NUCLEOPHILIC AND BASIC DONE ELSE BEGIN IF ELENERGY ON ATOM 4 BETTER THAN (1) BEGIN ADD 20 CONDITIONS ACIDIC AND ELECTROPHILIC DONE DONE
(7)
THEN
IF ELENERGY ON ATOM 4 BETTER THAN (1) THEN CORRECT
(8)
Statement 5 shows comparison of e l e c t r o p h i l i c l o c a l i z a t i o n energy (LE) on 2 atoms, whereas i n statement 6 the LE of atom 4 i s compared to the LE's of each atom i n the set of atoms ( 1 ) . The compound statement 7 s e l e c t s n u c l e o p h i l i c conditions i f p o s s i b l e , otherwise e l e c t r o p h i l i c c o n d i t i o n s . In statement 8, "CORRECT"
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
112
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
adjusts the p r i o r i t y upward or downward according to whether the LE i s lower or higher than the LE of benzene. The magnitude of change i s p r o p o r t i o n a l to the d i f f e r e n c e between the two LE's. Factors such as s t e r i c e f f e c t s on thermodynamics are described by other statements i n the transforms. The examples below i l l u s t r a t e c o n t r o l by e l e c t r o n i c e f f e c t s . The numbers i n equation 10 show c a l c u l a t e d LE v a l u e s .
The accuracy of the HMO method decreases as the number of heteroatoms i n the r i n g system i n c r e a s e s , but the same can be s a i d f o r p r e d i c t i o n s by chemists i n complex cases. The HMO method has s a t i s f i e d our need f o r a r a p i d , f l e x i b l e , general approach. 4.3 RECOGNITION OF INTERFERING FUNCTIONALITY AND PROTECTING GROUPS. An important p a r t of e v a l u a t i n g the a p p l i c a b i l i t y of a given transform i s determining whether the c o n d i t i o n s r e q u i r e d f o r the transform w i l l a l s o cause unwanted r e a c t i o n s elsewhere i n the molecule. T h i s i s i n f a c t an important s e l e c t i o n r u l e which helps f u r t h e r reduce the p o t e n t i a l s o l u t i o n space and prevent some p o s s i b l e orderings of transforms. However, i t i s too harsh simply to k i l l any transform i n which there i s an i n t e r f e r e n c e when that i n t e r f e r e n c e could be avoided by modifying the group. We have also found that i t i s d e s i r e able f o r the user to be n o t i f i e d of such i n t e r f e r e n c e s , because he may not know the exact c o n d i t i o n s and can not evaluate i n t e r ferences e a s i l y simply l o o k i n g at the p r e c u r s o r g i v e n the 9
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
wiPKE ET A L .
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
name of a
SECS:
Strategy
and
113
Planning
transform.
Thus, i n 1973 we incorporated p r o t e c t i n g groups i n t o SECS 2.0, increased the number of f u n c t i o n a l groups recognized, and d i f f e r e n t i a t e d between f u n c t i o n a l group environments. (See Table I f o r a l i s t i n g of the s e n s i t i v i t i e s of d i f f e r e n t i a t e d f u n c t i o n a l groups to d i f f e r e n t r e a c t i o n conditions.) Each f u n c t i o n a l group i s c l a s s i f i e d i n t o e x a c t l y one category which best describes i t . The d e f i n i t i o n of r e a c t i o n conditions was a l s o changed i n SECS to cover more types of c o n d i t i o n s . There are many ways to desc r i b e r e a c t i o n c o n d i t i o n s , one can describe p h y s i c a l parameters such as temperature, time, pressure, a c i d i t y , redox p o t e n t i a l , p o l a r i t y , e t c . , and l e t a given r e a c t i o n be represented by a sum of such parameters; or one can d e f i n e reagent prototypes; or one can d e f i n e equivalences between reagent prototypes and p h y s i c a l parameters. The current categories i n SECS 2.4 are shown i n Table I I . Each category i s subdivided i n t o s l i g h t , normal, and strong, corresponding to the continuous p h y s i c a l p r o p e r t i e s (represented i n Table I by -, = and * r e s p e c t i v e l y ) , but f o r some chemical categories the s u b d i v i s i o n s represent d i f f e r e n t r e a c t i o n types. A group s e n s i t i v e to s l i g h t l y a c i d i c conditions i s u s u a l l y more s e n s i t i v e to s t r o n g l y a c i d i c condit i o n s , but a group s e n s i t i v e to s l i g h t l y halogenating c o n d i t i o n s may be s t a b l e to s t r o n g l y halogenating conditions because the two halogenations are m e c h a n i s t i c a l l y completely d i f f e r e n t . A l l r e a c t i o n conditions used i n a r e a c t i o n up to the next workup (one pot) are s p e c i f i e d i n one CONDITION statement i n ALCHEM. Thus i f there i s more than one step (workup) i n a transform, there would be separate c o n d i t i o n statements f o r each s e t of simultaneous c o n d i t i o n s . The reason i s that a f u n c t i o n a l group might be s t a b l e to every s i n g l e r e a c t i o n step, but not to an attack of a l l conditions at the same time. The current v e r s i o n takes the c o n d i t i o n categories as independent and a d d i t i v e , so one s p e c i f i e s a l l the pure p h y s i c a l p r o p e r t i e s (pH, temp., solvent) and only the r e l e v a n t chemical p r o p e r t i e s . The b a s i c chemical information used f o r f u n c t i o n a l groups, p r o t e c t i v e groups, s e n s i t i v i t i e s and r e a c t i o n c o n d i t i o n s i s represented by binary s e t s , and manipulated with l o g i c a l s e t operat i o n s , AND, OR, XOR, and SUBSET. A binary s e t containing the current f u n c t i o n a l groups i s e s t a b l i s h e d during p e r c e p t i o n of the t a r g e t : 3
GROUPS
1001000000
(group types)
Every b i t p o s i t i o n corresponds to one of the categories of funct i o n a l group, 1 means present, 0 means absent. Every transform contains a c o n d i t i o n statement. In the transform corresponding to the s y n t h e t i c r e d u c t i o n of a ketone shown below, the condit i o n s are a l t e r n a t i v e s from which only one s e t of conditions i s
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
CARBOXYLIC ACID DECARBOXYLASE ACID ACID ANHYDRIDE ACID HAL I DE PHENOL ENOL BENZYL,ALLYL ALCOHO PRIM ALCOHOL SEC ALCOHOL TERT ALCOHOL AROM ALDEHYDE A, Β UN S AT ALDEHYDE CH 3,R CH2 ALDEHYDE R2 CH2 ALDEHYDE R3C ALDEHYDE PRIM AMIDE SEC AMIDE TERT AMIDE AROM AMINE ENAMINE BENZYL,ALLYL AMINE PRIM AMINE SEC AMINE TERT AMINE C=C AROMATIC C=C CONJ TO C=C C=C CONJ TO W-GROUP C=C ISOLATED CH3,RCH2 ESTER
Table I
* * * =
*
*
= *
*
* * *
=* * =
*
*
* =*
= * *
=* * * =*
-=* -=*
ACI
=* -=* -=*
BAS
=
* * * *
*
*
= *
* *
=* =*
=* =*
* =*
* * =* =*
RED
*
-=* * * * =* =* *
*
*
* * = * *
-=* -=* -=*
-=*
0X1
_
=
=
=
*
*
=*
HOT
*
=* * -=* -=* * -=* -=*
-=* -=* -=* -=* -=* -=* -=* -=* -=* -=* -=* -=* -=* -=* =*
OMT
* =* =*
* * * *
PHO
* *
SOL
=* =*
=* *
HAL
*
* *
*
* * =* =*
NUC
=* =* = * = * _ = =* =*
=* =*
=* =* =* =* =*
=*
-=*
HYD
S e n s i t i v i t i e s of F u n c t i o n a l Groups to Reaction Conditions
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
* *
*
-=* -=*
=* =*
=*
=* =* =* =* =* =* * * * * *
=* =*
ELE
g g g "
w
g > | H α § £ 2
α
g £
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
R2 CH ESTER OTHER ESTER Α-HALOGEN ETHER PHENOL ETHER ENOL ETHER BENZYL, ALLYL ETHER ALKYL ETHER ARYL HAL I DE VINYL HALIDE BENZYL, ALLYL HALIDE PRIM HALIDE SEC HALIDE TERI HALIDE ARYL KETONE A,Β UNSAT KETONE CH3,RCH2 KETONE R2 CH KETONE R3C KETONE QUINONE NITRO NI TRILLE ALKYNE-H ALKYNE-R EPOXIDE CYCLOPROPANE PHENOL,ENOL ESTER-X BENZYL,ALLYL ESTER-X TERT ESTER-X SEC,PRIM ESTER-X IMINE
IMINE-Z
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
60
=* * -=* -=* * =* =*
=*
*
* *
=* * * =* * -=* -=* * =* =*
*
-=* * * -=*
*
*
-=* * * -=*
*
*
* * *
ACI
BAS
=
*
* *
*
=*
*
ΟΧΙ
*
*
*
*
* *
*
*
=* =* =* =* * -=* -=* -=*
=
RED
*
*
* =*
* *
*
*
=*
*
=* * =* -=* =* =* =* =* * =* -=* * = * * * =* =* * =* -=* * * =*
=*
=*
=* -=* * -=* -=* =* *
* =* * * *
=*
=*
=*
=*
=*
HYD
*
SOL
=*
=*
PHO
=* *
*
*
OMT
*
=
=
HOT
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
=*
=* =* * *
* *
HAL
* =*
=* =*
=*
*
=*
= =
-=* -=* -=* -=* =
* *
*
NUC
* *
* *
* * * * *
*
*
ELE
Κ
ι—
^ !" S. c§
^ t*i 3 ^ 3" | §§ &
w w H > Γ"
g
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
116
COMPUTER-ASSISTED
STRONGLY BASIC SLIGHTLY
pH(pKa)>18 12 - 18 8-12
SLIGHTLY ACIDIC STRONGLY
2 - 6 -2 - +2 < -2
STRONGLY OXIDIZING SLIGHTLY
KMNOi*, CR03 JONES REAGENT MN02, MILD SELECTIVE
SLIGHTLY REDUCING STRONGLY
ZN/H+, MILD SELECTIVE NABHi+, ALKYL HYDRIDES LIALHI+
STRONGLY COLD SLIGHTLY
-100 -19
SLIGHTLY HOT STRONGLY
30 101
ORGANIC SYNTHESIS
Table I I D e f i n i t i o n s of Reaction Conditions
< -100 - -20 - +10 >
100 200 200
SLIGHTLY ORGANOMETALLIC STRONGLY
ORGANOCADMIUM AND ZINC ORGANOMAGNE SIUM ORGANOLITHIUM
SLIGHTLY PHOTOCHEMICAL STRONGLY
UNSENSITIZED 350 - 700 NM UNSENSITIZED 250 - 700 NM SENSITIZED
PROTIC APROTIC SLIGHTLY HYDROGENATING STRONGLY
DEACTIVATED CATALYSTS MILD CATALYSTS STRONG CATALYSTS OR/AND HIGH PRESSURE
SLIGHTLY HALOGENATING STRONGLY
N-BROMOSUCCINIMID IN CCLi+ HALOGEN WITHOUT CATALYST OR R-X/SNCLi+ HALOGEN WITH ACID OR BASE CATALYSIS
SLIGHTLY NUCLEOPHILIC STRONGLY
SUBSTITUTION AT SATURATED CARBON AND MICHAEL ADDITION AT CARBONYLS SUBSTITUTION AT CARBOXYLS (ADDITION-ELIMINATION)
SLIGHTLY ELECTROPHILIC STRONGLY
ALKYLATION WITH R-X ACYLATION WITH RCOX, (RCO) 0, TSCL LEWIS ACID CATALYZED ALKYLATION OR ACYLATION 2
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
WIPKE E T A L .
SECS:
Strategy
and
117
Planning
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
selected. CH2 => C=0 WOLFF-KISHNER, THIOKETAL DESULFURIZATION, OR CLEMMENSON REDUCTION OF A KETONE KETHYDROGENOLYSIS KETONE PATH 1 PRIORITY 50 CHARACTER INTRODUCES GROUP IF ATOM 1 IS NOTE ATTACHED TO 2 HYDROGENS THEN KILL IF HETEROATOM IS ALPHA TO ATOM 1 THEN KILL IF ATOM 1 IS ALLYLIC THEN SUBT 70 CONDITIONS STRONGLY BASIC AND REDUCING OR CONDITIONS ACIDIC AND REDUCING OR CONDITIONS SLIGHTLY HYDROGENATING AND NUCLEOPHILIC ADD 0 OF ORDER 2 TO ATOM 1 END CONDIT compares the s e t of conditions f o r the r e a c t i o n (or r e a c t i o n step) with the s e t of s e n s i t i v i t i e s of each group to those c o n d i t i o n s , leading to a s e t of i n t e r f e r i n g groups (IGRPS) (acid i n t e r f e r e s ) : SENS (acid) SENS (ketone)
1000100010 0010001000
(condition
types)
COND
0001100000
(condition
types)
GROUPS IGRPS
1001000001 0001000000
(group types)
The program now looks f o r p r o t e c t i v e groups based on the f o l lowing information: NAME PROGR STABIL INTRO REMOV DEDUC
2-0XAZ0LINE 0001100000 1011010001 0000000001 0100000000 35
(group types) ( c o n d i t i o n types) ( c o n d i t i o n types) ( c o n d i t i o n types) ( p r i o r i t y change)
The group-set PROGR i n d i c a t e s the group types that can be protected by t h i s p a r t i c u l a r p r o t e c t i v e group, STABIL contains the r e a c t i o n conditions to which the p r o t e c t i v e group i s s t a b l e . INTRO and REMOV contain the r e a c t i o n conditions needed f o r i n t r o d u c t i o n and removal of the p r o t e c t i v e group. DEDUC i s a measure i n p r i o r i t y value u n i t s of the expenditure needed f o r an a p p l i c a t i o n of that c e r t a i n p r o t e c t i v e group, based on the number of steps, losses i n y i e l d , and d i f f i c u l t y i n experimental procedure. The algorithm f o r choosing the proper p r o t e c t i v e group i s based on the f o l l o w i n g c r i t e r i a :
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
118 a) b) c) d)
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
e)
the PG must f i t the FG the PG must be s t a b l e to r e a c t i o n conditions ( a l l steps) the PG should simultaneously p r o t e c t as many as p o s s i b l e of the i n t e r f e r i n g groups the PG should p r o t e c t none or as few as p o s s i b l e of the n o n - i n t e r f e r i n g groups the p r i o r i t y l o s s should be minimized
I f there remain any unprotected i n t e r f e r i n g groups, an a d d i t i o n a l penalty i s imposed which may k i l l the transform i f the p r i o r i t y c u t o f f i s high enough. I f the transform was s u c c e s s f u l i n a l l p a r t s , the names of the used p r o t e c t i v e groups are s t o r e d with the manipulation i n s t r u c t i o n s f o r the generation of the precursor from the current t a r g e t . The atoms bearing a p r o t e c t i v e group are marked i n the connection t a b l e . When the precursor i s d i s played a (P) appears near the s t a r t i n g atom of the f u n c t i o n a l group along with the name of the s p e c i f i c p r o t e c t i n g group (the name can be suppressed by the u s e r ) . (See equation 4a.) The p r o t e c t i n g groups are a u t o m a t i c a l l y removed from the precursor when i t i s processed as the current t a r g e t . Thus, the user i s n o t i f i e d g r a p h i c a l l y of the f a c t that i n t h i s given transform c e r t a i n groups must be p r o t e c t e d , but the d e c i s i o n of when to put them on or take them o f f must be done a f t e r the e n t i r e sequence has been generated. At that time one would p o s s i b l y s e l e c t a d i f f e r e n t group which s a t i s f i e s the needs of s e v e r a l transforms and minimizes operations. Work i s under way to optimize p r o t e c t i o n f o r sequences of r e a c t i o n s and to generate subgoals f o r f u n c t i o n a l group i n t e r c o n v e r s i o n f o r p a r t i c u l a r l y troublesome groups, using p r i n c i p l e s of l a t e n t and equivalent groups. Mutually r e a c t i v e f u n c t i o n a l group combinations l i k e a c i d h a l i d e and a l c o h o l i n the same precursor are recognized i n the precursor e v a l u a t i o n r o u t i n e . The user s p e c i f i e s to EVAL whether such precursors should be d e l e t e d . 5.
TREE REDUCTION THROUGH SYMMETRY
In the absence of symmetry information, a p p l i c a t i o n of a transform to a s t r u c t u r e possessing s e v e r a l elements of symmetry produces redundant p r e c u r s o r s . The precursors must be created, the redundancies recognized by graph-matching or c a n o n i c a l naming, and f i n a l l y the d u p l i c a t e precursors must be d e l e t e d . With symmetry information, i t i s p o s s i b l e to avoid generating such redundant p r e c u r s o r s . SECS perceives the complete s t e r e o chemical graph isomorphism groups of the target s t r u c t u r e and uses t h i s symmetry group i n applying a transform so that the transform i s a p p l i e d i n a l l p o s s i b l e unique ways without r e dundancy. The synthesis of a s t e r a n e ( f i g u r e 7) i l l u s t r a t e s how r e c o g n i t i o n of molecular symmetry reduces the number of 11
2 5
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
wiPKE ET AL.
pathways
SECS:
and
Planning
119
generated.
Figure
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
Strategy
7.
Redundancy
from molecular
symmetry
In a d d i t i o n to u t i l i z i n g molecular symmetry, i t i s a l s o necessary to u t i l i z e transform symmetry where i t e x i s t s . A transform possesses a symmetry element i f the SUBSTRUCTURE possesses the element and i t i s preserved by the MANIPULATION and SCOPE/LIMITATIONS statements, e.g., the D i e l s A l d e r transform:
3
This symmetry i s represented i n the s u b s t r u c t u r e , "C=C-C-C-C-C@1<2,1,6,5,4,3>/", by the operator enclosed w i t h i n angle brackets and prevents SECS from f l i p p i n g the SUBSTRUCTURE over on the target and applying the transform again, which would produce an i d e n t i c a l precursor. SECS a l s o uses the stereochemical graph isomorphism group w i t h i n a transform to c o r r e c t l y determine i f atoms or groups are chemically e q u i v a l e n t : "IF ATOM 1 AND ATOM 2 ARE EQUIVALENT THEN... Complete d e t a i l s of the symmetry r e c o g n i t i o n algorithm and i t s a p p l i c a t i o n are given e l s e w h e r e . ^ 11
6.
TREE REDUCTION THROUGH PRUNING
Precursors are screened independently by the EVAL module to e l i m i n a t e i l l e g a l s t r u c t u r e s which might have been created by a f a u l t y transform, and s t r u c t u r e s which are unstable or otherwise undesireable. The e v a l u a t i o n c r i t e r i a l i s t e d below can be enabled or d i s a b l e d by the user through the EVAL command. 1. 2. 3. 4. 5.
Bredt r u l e v i o l a t i o n Antiaromatic r i n g Cumulene T r i p l e bond i n s m a l l r i n g Trans o l e f i n i n s m a l l r i n g
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
120 6. 7.
8. 9. 10. 11.
ORGANIC SYNTHESIS
Trans bridged r i n g system Trans fused 3-membered r i n g Valency v i o l a t i o n D i - i o n of same charge Four-membered r i n g Unstable group combinations
The previous s e c t i o n s d e a l t with reducing the s i z e of synt h e s i s tree generated by i n c r e a s i n g the accuracy of p r e d i c t i o n s , i . e . , by making transforms more s e l e c t i v e i n t h e i r output. We now turn our d i s c u s s i o n to how the program s e l e c t s which t r a n s forms to apply.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
7. CONTROLLING TREE GROWTH THROUGH STRATEGIES AND PLANS FOR TRANSFORM SELECTION The o b j e c t i v e of the SECS program i s to f i n d sequences of transforms which not only are chemically accurate, but a l s o lead to "simpler" s t r u c t u r e s and form an " e f f i c i e n t " s y n t h e s i s . We have f o r many reasons s p e c i f i c a l l y avoided b i a s i n g SECS with preprogrammed sequences or t h e i r equivalent binary d e c i s i o n t r e e s . Such b i n a r y d e c i s i o n trees produce r i g i d orderings of transforms, must be updated as new transforms are discovered, and are n e c e s s a r i l y incomplete s i n c e they are manually programmed. Instead, our approach i s to have SECS dynamically generate i t s own s y n t h e t i c sequences guided by i t s current s t r a t e g i e s using i t s current l i b r a r y of transforms. Strategies are high l e v e l p r i n c i p l e s of molecular c o n s t r u c t i o n and m o d i f i c a t i o n , and are independent of the transform l i b r a r y . When a strategy i s a p p l i e d to a t a r g e t s t r u c t u r e , goals are generated which describe the s t r u c t u r a l changes necessary to implement the s t r a t e g y . Goals are a l s o independent of the transform l i b r a r y . The n u l l s t r a t e g y generates no goals, hence s e l e c t s no t r a n s forms . 7.1 THE OPPORTUNISTIC STRATEGY i s the opposite of the n u l l s t r a t e g y , f o r i t s e l e c t s every transform which " f i t s " the t a r g e t , i . e . , s e l e c t i o n by a p p l i c a b i l i t y , regardless of whether the precursor i s simpler than the t a r g e t . This generates a l l " l e g a l moves" i n chess terminology. A major problem with t h i s s t r a t e g y i s that the l a r g e c l a s s of f u n c t i o n a l group interchange (FGI) and f u n c t i o n group i n t r o d u c t i o n (INTRO) transforms, are n e a r l y always a p p l i c a b l e and are thus f r e q u e n t l y a p p l i e d , but do not normally lead to simpler p r e c u r s o r s . F G I s are the equivalent of the s u b s t i t u t i o n operators i n GPS. 1
27
7.2 GOAL-DIRECTED SELECTION of transforms operates q u i t e d i f f e r e n t l y — t h e f i r s t c o n s i d e r a t i o n becomes "does t h i s t r a n s form achieve one of my goals?", i . e . , i f i t were a p p l i e d would i t be a move i n the d e s i r e d d i r e c t i o n ? Goals define the " d e s i r e d
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
wiPKE E T A L .
SECS:
Strategy
and
"Planning
121
11
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
direction. A goal i s the d i f f e r e n c e between the current t a r g e t s t r u c t u r e and the d e s i r e d s t r u c t u r e or d e s i r e d s u b s t r u c t u r e . Transforms are then f i r s t s e l e c t e d as to whether they are r e l e v a n t to the g o a l ( s ) , and l a t e r are examined f o r a p p l i c a bility. Goals provide a sense of d i r e c t i o n and a j u s t i f i c a t i o n f o r expending computational resources. This enables a s e l e c t i v e e x p l o r a t i o n of the space of syntheses, and promises a higher p r o p o r t i o n of e f f i c i e n t syntheses among the s o l u t i o n s generated, the t r a d e o f f being l o s s of completeness and p o s s i b l e l o s s of s o l u t i o n s . However, with good goals t h i s l o s s i s n e g l i g i b l e . As the number of chemical transforms a v a i l a b l e to the program i n c r e a s e s , the b e n e f i t s of g o a l - d i r e c t e d s e l e c t i o n of transforms become even more obvious. 7.3 GOALS ARE CREATED BY STRATEGIES. An important o b j e c t i v e of the SECS p r o j e c t i s to determine what c o n s t i t u t e good s t r a t e g i e s f o r s y n t h e s i s . Many s t r a t e g i e s are s e l f evident but others are yet to be discovered. Our fundamental aim i s to obtain simple p r e c u r s o r s . Network o r i e n t e d s t r a t e g i e s focus on s i m p l i f i c a t i o n of the molecular s k e l e t o n , i n p a r t i c u l a r the r i n g systems. Fused r i n g systems are considered simpler than bridged or s p i r o systems owing to t h e i r greater ease of p r e p a r a t i o n . A good s y n t h e t i c s t r a t e g y i s t h e r e f o r e to create the complex r i n g system from a simple fused system. T r a n s l a t e d i n t o the backward d i r e c t i o n , t h i s means t r y to break those bonds which l e a d to fused or a c y c l i c r i n g systems with a minimum of appendages. The f i r s t statement of t h i s s t r a t e g y appeared i n Corey's l o n g i f o l e n e synthesis. Thus bonds 1-7, and 3-4 are s t r a t e g i c , but so i s 2 5
the p a i r of bonds {3-4,1-2}. In place of a set of s t r a t e g i c bonds used by other programs, SECS represents t h i s s t r a t e g y as a l o g i c a l l y s t r u c t u r e d goal l i s t shown above. Other s t r a t e g i e s focus on s i m p l i f y i n g the skeleton by r e connecting appendages to form a new r i n g or by j o i n i n g opposite s i d e s of l a r g e r i n g s to form common r i n g s . These s t r a t e g i e s
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
122
COMPUTER-ASSISTED ORGANIC SYNTHESIS
produce goals such as "MAKE 1-6". Note t h i s goal can not be r e presented as a s e t o f bonds s i n c e the bond 1-6 does not y e t exist. Symmetry based s t r a t e g i e s are i n d i c a t e d i n the syntheses of 3-carotene which constructed bonds and ID,28 ^ and
n
(
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
a
l i s t shown. These examples i l l u s t r a t e how s t r a t e g i e s create goals and why the goals are s t r u c t u r e d . There are of course many v a r i e d s y n t h e t i c s t r a t e g i e s , r e l a t i n g to many s t r u c t u r a l a t t r i b u t e s . The goal l i s t i s the common denominator that r e l a t e s these s t r a t e g i e s to s e l e c t i o n of transforms. 7.4 EXPLICIT GOAL REPRESENTATION. In SECS, goals are generated dynamically and represented e x p l i c i t l y , i n c o n t r a s t to being represented i m p l i c i t l y by a s t a t i c programmed procedure. E x p l i c i t r e p r e s e n t a t i o n of goals has the advantages that goals can be rearranged i n p r i o r i t i e s , the order of goal s a t i s f a c t i o n does not need to be the same as the order i n which the goals were created, and the chemist can see what the goals are and can modify them i n t e r a c t i v e l y .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
WIPKE ET
AL.
SECS:
Strategy
and
The g o a l l i s t i s a general l i s t r e n t l y four types of e n t i t i e s :
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
1. 2. 3. 4.
Phnning
123
s t r u c t u r e c o n t a i n i n g cur-
L o g i c a l connectives F u n c t i o n a l group m o d i f i c a t i o n s Structural modifications Attention f o c i
The l o g i c a l connective can be OR, XOR, or AND, together with NOT, and contains an a c t i o n which w i l l occur i f the goal beneath i t i s achieved. C u r r e n t l y the a c t i o n i s a m o d i f i c a t i o n to the PRIORITY value or KILL. F u n c t i o n a l group m o d i f i c a t i o n s are requests f o r interchange or i n t r o d u c t i o n of a f u n c t i o n a l group at a p a r t i c u l a r l o c a t i o n on the t a r g e t , or the converse, to leave a group unchanged. S t r u c t u r a l m o d i f i c a t i o n s c u r r e n t l y s t a t e bonds to be made or broken or the converse. Attention f o c u s s i n g i n s t r u c t i o n s r e q u i r e a transform to use s p e c i f i e d atoms, bonds, or groups or the converse, don't use the s p e c i f i e d item. A g o a l i s a l o g i c a l connective j o i n i n g a l i s t of one or more of the other e n t i t i e s which may i n c l u d e another g o a l . The l o n g i f o l e n e example would be goal 1: g o a l 2:
(OR, BREAK 1-7, BREAK 3-4, GOAL 2) (AND, BREAK 3-4, BREAK 1-2).
The s t r a t e g y module can be i n t e r r u p t e d p r i o r to s e l e c t i o n of transforms, but a f t e r the r e l e v a n t s t r a t e g i e s have s e t up t h e i r goals. The g o a l l i s t can be p r i n t e d out, new goals added, or e x i s t i n g goals modified or deleted by the user. This i n t e r a c t i v e c a p a b i l i t y to add goals f a c i l i t a t e s t e s t i n g s t r a t e g i e s and allows the user to supply h i s / h e r own s t r a t e g i e s i f they are not i n c l u d e d i n the SECS s t r a t e g y module. The goal l i s t can be used to s e l e c t only those transforms s a t i s f y i n g the goals or i t can be used to simply r a i s e the p r i o r i t i e s of those s a t i s f y i n g the g o a l s . Note that the goals may not r e f e r by name to any transform, i n s t e a d goals are s t r i c t l y defined i n terms of the target s t r u c t u r e . 7.5 TRANSFORM SELECTION by relevance to the goal l i s t r e quires that the goal l i s t i n t e r p r e t o r have some knowledge of what kinds of things a transform might do. This i s described g e n e r a l l y by the transform CHARACTER (see s e c t i o n 3) which cont a i n s any number of the phrases given below. Thus a transform whose s o l e character was ALTERS GROUP would not be considered r e l e v a n t to a goal of breaking a carbon-carbon bond. The t r a n s form character d e s c r i p t i o n s are c o l l e c t e d together i n a d i r e c t o r y for r a p i d determination of p o s s i b l e relevancy to the g o a l s . I f the character i s s u i t a b l e then the SUBSTRUCTURE of the transform i s examined s i n c e i t a l s o helps d e f i n e the character of the transform. At t h i s p o i n t i f the transform s t i l l appears to be
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
124
COMPUTER-ASSISTED ORGANIC
SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
BREAKS CHAIN, RING, 3RING, 4RING, or 6RING MAKES BOND, 3RING, 4RING INCREASES or DECREASES CHAIN EXPANDS or CONTRACTS RING INTRODUCES or REMOVES GROUP, STEREO, or AROMATICITY ALTERS GROUP INVERTS/RETAINS STEREO MIGRATES BOND
r e l e v a n t , the transform i s checked f o r a p p l i c a b i l i t y , i . e . , the SUBSTRUCTURE i s compared to the t a r g e t . I f the transform " f i t s , " then i t i s i n t e r p r e t e d i n d e t a i l as described i n s e c t i o n 3.1. The transform manipulation i n s t r u c t i o n s are compared to the goal l i s t to determine i f the transform a c t u a l l y was r e l e v a n t , and app r o p r i a t e a c t i o n s are taken. The transform may be aborted, i n which case no precursor i s ever generated. When the t a r g e t molecule contains elements of symmetry the goal i n t e r p r e t e r a l s o checks to see i f the a c t i o n of a transform i s equivalent by symmetry to a s p e c i f i e d g o a l although the a c t u a l atom numbers may be d i f f e r e n t . 7.6 SUBGOAL GENERATION. I f the SUBSTRUCTURE does not " f i t " the t a r g e t , y e t the transform appears r e l e v a n t , a subgoal i s generated to c o r r e c t the mismatch so the transform w i l l " f i t . " The technique of generating subgoals as a r e s u l t of mismatch was even used i n the very f i r s t s y n t h e s i s program. The d i f f e r e n c e between the r e q u i r e d SUBSTRUCTURE and the t a r g e t i s used to create a subgoal, normally of the f u n c t i o n a l group m o d i f i c a t i o n type. This i s an example of decomposing a problem i n t o subproblems, and approaching each subproblem by Means-Ends a n a l y s i s . ' The r e s u l t of t h i s g o a l - d i r e c t e d approach i s that the troublesome FGI and INTRO transforms are only a p p l i e d when they are needed i n order to enable a p p l i c a t i o n of another transform r e l e v a n t to the goal l i s t . This i s an extremely e f f i c i e n t method f o r reducing the number of poor pathways generated without l o s i n g any good pathways. Subgoals are i d e n t i c a l i n form and f u n c t i o n as g o a l s , but are kept on a separate l i s t and become a c t i v e only a f t e r the o r i g i n a l g o a l l i s t r e l i n q u i s h e s c o n t r o l . Subgoals are not created e x p l i c i t l y by transforms, thus the transform w r i t e r need never consider what p o s s i b l e subgoals might be, or where they may be invoked. This i s another aspect of our e f f o r t s to maintain complete s e p a r a t i o n between s t r a t e g i e s and transforms. 6
2
8.
CONCLUSION
We have discussed two important methods used i n SECS f o r improving the chemical accuracy and e f f i c i e n c y of syntheses generated i n the s y n t h e s i s t r e e . The f i r s t method i n v o l v e s improving the accuracy of chemical i n f e r e n c e through a n a l y s i s of
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
5.
wiPKE E T AL.
SECS:
Strategy
and
Planning
125
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
various models and through accurate scope and l i m i t a t i o n s i n the transforms. As i l l u s t r a t i o n s we described the e v a l u a t i o n o f s t e r i c e f f e c t s , e l e c t r o n i c d i r e c t i n g e f f e c t s , and i n t e r f e r i n g f u n c t i o n a l groups i n c l u d i n g p r e s c r i p t i o n of p r o t e c t i n g groups. The second method i n v o l v e s s e l e c t i o n o f transforms by relevance t o goals created by s y n t h e t i c s t r a t e g i e s r a t h e r than s e l e c t i o n by a p p l i c a b i l i t y . This was made p o s s i b l e by the development o f a uniform r e p r e s e n t a t i o n of goals as dynamic l o g i c a l l i s t s t r u c t u r e s d e s c r i b i n g d e s i r e d changes i n the t a r g e t molecule, and by i n c l u d i n g i n each transform a d e s c r i p t i o n of the t r a n s form's character. Of course i n v o l v i n g the user-chemist i n dec i s i o n s concerning which precursor t o process next i s another important method, fundamental to our i n t e r a c t i v e approach. Add i t i o n a l l y we s t r e s s e d the importance of separating s t r a t e g i e s from transforms, not only to s i m p l i f y the a d d i t i o n of new t r a n s forms, but a l s o t o minimize b i a s and maximize c r e a t i v i t y i n the s y n t h e t i c sequences generated. ACKNOWLEDGEMENT. This work was supported i n p a r t by NIH grant RR00578, and by an a l l o c a t i o n o f the SUMEX-AIM resource a t Stanford (RR00785, J . Lederberg, p r i n c i p a l i n v e s t i g a t o r ) . The authors a l s o wish to thank IBM and Merck, Sharp and Dohme f o r f e l l o w s h i p s to G.S., the CNRS f o r a f e l l o w s h i p to F.C., the Deutsche Forschungs-gemeinschaft f o r a f e l l o w s h i p t o H.B., and Sandoz, Basle f o r support of W.S. We a l s o g r a t e f u l l y acknowledge the c o n t r i b u t i o n s of Dr. S. Krishnan to the goal l i s t e d i t o r and i n t e r p r e t e r and of P r o f . C l a r k S t i l l t o the f i r s t v e r s i o n of the goal l i s t i n t e r p r e t e r .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
126
References 1.
D.B. Lenat, "AM: An A r t i f i c a l Intelligence Approach to Discovery i n Mathematics as Heuristic Search," Ph.D. Thesis, Computer Science Dept., Stanford University, 1976.
2. W.T. Wipke, "Computer Planning of Research i n Organic Chemistry," Proceedings of Third I n t l . Conf. on Computers i n Chemical Education, Research and Technology, Plenum Press, July 1976.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
3. W.T. Wipke, "Computer-Assisted Three-Dimensional Synthetic Analysis," i n Computer Representation and Manipulation of Chemical Information, ed. W.T. Wipke, J . Wiley, 1974, pp. 147-174. 4. T.M. Gund, P.v.R. Schleyer, P.H. Gund, and W.T. Wipke, J . Am. Chem. Soc., 97, 743 (1975). 5.
This i s i n contrast to the FIEM of Ugi described elsewhere i n this volume.
6. E.J. Corey and W.T. Wipke, Science, 166, 178 (1969). 7.
E.J. Corey, W.T. Wipke, R.D. Cramer I I I , and W.J. Howe, J. Am. Chem.Soc.,94, 421, 431 (1972).
8. R.J. Feldmann, "Interactive Graphical Chemical Structure Searching," i n Computer Representation and Manipulation of Chemical Information, ed. W.T. Wipke, J . Wiley, 1974, pp. 55-82. 9. W.T. Wipke and T.M. Dyott, J . Chem. Info. and Computer S c i . , 1 5 , 140 (1975). 10.
W.T. Wipke and T.M. Dyott, J . Am. Chem.Soc.,96,4825, 4834 (1974).
11.
W.T. Wipke and H. Braun, submitted for publication.
12. W.T. Wipke, P. Gund, T.M. Dyott, and J.G. Verbalis, i n preparation. 13.
W.T. Wipke and P. Gund, J . Am. Chem.Soc.,96,299 (1974); 98, 8107 (1976).
14.
W.T. Wipke, T.M. Dyott, C. preparation.
Still,
and P. Friedland, i n
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005
5. WIPKE ET AL.
SECS: Strategy and Planning
127
15.
W.T. Wipke and P.H. Gund, J . Am. Chem. Soc., 96, 299 (1974); 98, 8107 (1976).
16.
W.T. Wipke and P. Friedland, unpublished.
17.
E.J. Corey, W.J. Howe, and D.A. Pensak, J . Am. Chem. Soc., 96, 7724 (1974).
18.
H.H. Jaffe, Chem. Rev., 53, 191 (1953).
19.
J.B. Hendrickson, J . Am. Chem. Soc., 93, 6854 (1974).
20.
G.W. Wheland, J . Am. Chem. Soc., 64, 900 (1942).
21.
Physical Methods in Heterocyclic Chemistry I, Academic Press, N.Y., 1963, p. 140.
22.
A. Streitweiser, J r . , Molecular Orbital Theory for Organic Chemists, J . Wiley, N.Y., 1961, p. 117.
23.
L. Velluz, G. Nominé, J . Mathieu, Angew. Chem., 72, 725 (1960).
24.
W.T. Wipke, T.F. Brownscombe, G. Birkhead, and P. Gund, i n preparation.
25.
E.J. Corey, M. Ohno, P.A. Vatakencherry and R.B. Mitra, J. Amer. Chem. Soc., 86, 478 (1964).
26.
U. Biethan, U.v.Gizycki, and H. Musso, Tetrahedron Letters, 1477 (1965).
27.
G. Ernst and A. Newell, GPS: A Case Study i n Generality and Problem Solving, Academic Press, N.Y. 1969.
28.
P. Karrert and C.H. Eugster, Helv. Chim. Acta., 33, 1172 (1950).
29.
H.H. Inhoffen, F. Bohlmann, K. Bartram, G. Rummert, H. Pomner, Ann., 570, 54 (1950).
30.
N.A. Milas, P. Davis, I. Belic, and D. Fles, J . Am. Chem. Soc., 72, 4844 (1950); See also Carotenoids, Otto Islor, ed., Birkhaüser Verlag, Basel, 1971.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
6 Rapid Generation of Reactants in Organic Synthesis Programs MALCOLM BERSOHN
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
Dept. of Chemistry, University of Toronto, Toronto, Canada M5S 1A1
The question o f e f f i c i e n c y of reactant generation has not r e c e i v e d primary a t t e n t i o n and this is as it should be s i n c e i n a problem r e q u i r i n g more than about four steps it is more important to develop b e t t e r h e u r i s t i c s to r e s t r a i n the generation of r e a c t a n t s than it is to f i n d ways to generate the l a t t e r more r a p i d l y . Furthermore, the program of the future may w e l l spend as much o f its time searching an e x t e r n a l data base as it does generating r e a c t a n t s . The e x t e r n a l data base would be, i n essence, the s y n t h e t i c p a r t of Chemical A b s t r a c t s . I t would contain the s o l u t i o n s of standard problems so whenever the molecule at hand is recognized to be s i m i l a r to a standard problem then the s t o r e d s o l u t i o n , a sequence of r e a c t i o n s , i s retrieved. In t h i s f u t u r e s i t u a t i o n the r e p r e s e n t a t i o n of molecular s t r u c t u r e used i n t e r n a l l y by the s y n t h e s i s program may have to be the same as t h a t of the e x t e r n a l data base. Hence the molecular s t r u c t u r e r e p r e s e n t a t i o n might have to be chosen p r i m a r i l y from t h i s p o i n t o f view r a t h e r than to optimize the speed o f r e a c t a n t generation. All this being s a i d , reactants still have to be generated and the r a p i d generation of reactants is an economic b e n e f i t . Speeding up the generation o f reactants means speeding up the component r o u t i n e s . Of these the most time consuming are 1, c a n o n i c a l i z a t i o n of the molecular s t r u c t u r e r e p r e s e n t a t i o n , 2, f i n d i n g the r i n g s , 3, f i n d i n g the f u n c t i o n a l groups and 4, r e t r i e v i n g the r e a c t i o n s and performing the t e s t s to decide whether the p a r t i c u l a r product molecule a t hand i s s u i t a b l e as a product o f the r e a c t i o n . Comparatively speaking, the a c t u a l generation o f the s t r u c t u r e of the r e a c t a n t molecule(s) is a b r i e f o p e r a t i o n . In my programs the most time consuming s i n g l e r o u t i n e is the c a n o n i c a l i z a t i o n of the molecular s t r u c t u r e r e p r e s e n t a t i o n and t h e r e f o r e approaches to the a c c e l e r a t i o n o f t h i s r o u t i n e will be d i s c u s s e d first. I.
C a n o n i c a l i z a t i o n of the Molecular
Structure
Representation
B a s i c a l l y , c a n o n i c a l i z a t i o n c o n s i s t s of numbering the
128 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
6.
BERSOHN
Reactants
in
Organic
Synthesis
Program
129
non-hydrogen atoms o f the molecule according to a s e t of r u l e s . T h i s means t h a t a molecule can be represented i n only one way. Having numbered the atoms according t o the r u l e s we are rewarded with s e v e r a l advantages, namely: 1. We can s p e e d i l y recognize whether the molecule a t hand i s the same as a p r e v i o u s l y generated r e a c t a n t molecule or the same as a molecule known i n the program as being a v a i l a b l e . Without c a n o n i c a l i z a t i o n we would be f o r c e d t o do some k i n d o f atom by atom matching (1) i n order to determine i f two s t r u c t u r e s are the same. 2. In the course o f d e c i d i n g the precedence o f the atoms we n e c e s s a r i l y have t o d i s c o v e r any equivalence between atoms t h a t e x i s t i n the molecule. Atoms are s a i d to be e q u i v a l e n t i f they are c a r r i e d i n t o each other's p o s i t i o n by g l o b a l or l o c a l symmetry operations o f the molecule. Global symmetry operations are r o t a t i o n s or r e f l e c t i o n s or combinations of these with respect to i n f i n i t e l y long axes or i n f i n i t e planes passing through the center of the molecule. L o c a l symmetry operations are r o t a t i o n s about bounded bounded axes and r e f l e c t i o n s i n bounded planes. In the f i g u r e below we see a molecule
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
130
l,l-Dimethyl-3-trichloromethylcyclohexane, which has l o c a l symmetry, A l o c a l C 3 a x i s terminates i n atom 1 and a l o c a l C2 a x i s terminates i n atom 2. (The p o i n t group of the molecule i s the d i r e c t product group C 3 ^ ) C 2 L r where the s u f f i x L means l o c a l . ) The equivalence o f the c h l o r i n e s and the two methyl groups can be discovered by c a n o n i c a l i z a t i o n without having to b u i l d a model and a c t u a l l y perform a l o c a l r e f l e c t i o n or r o t a t i o n to see i f the r e s u l t i s the same molecule. S i m i l a r l y , with molecules t h a t have g l o b a l symmetry, such as methylcyclohexane, the presence of two p a i r s o f equivalent methylene carbon atoms can be found without performing a g l o b a l r e f l e c t i o n . Knowing which atoms are equivalent to each other i s necessary f o r concluding t h a t c h i r a l i t y i s absent i n the common l i g a n d of e q u i v a l e n t atoms. Under v a r i o u s c o n d i t i o n s , many r e a c t i o n s produce two products i n s i g n i f i c a n t y i e l d and are t h e r e f o r e not to be recommended when a c e r t a i n p a i r of r e a c t i n g atoms are not e q u i v a l e n t . When t h i s p a i r i s equivalent the r e a c t i o n produces one product and the r e a c t i o n i s to be recommended. I t i s t h e r e f o r e a b s o l u t e l y necessary f o r a synthesis program to know which atoms i n the molecule a t hand are e q u i v a l e n t . The reactions include e l e c t r o c y c l i c reactions, reactions involving the carbon atom alpha to a ketone when both alpha atoms have the same number o f attached hydrogen atoms, W i t t i g type r e a c t i o n s etc. 3. Having numbered the atoms c a n o n i c a l l y , these numbers provide a g l o b a l o r d e r i n g of the atoms which can be used l o c a l l y a t a c h i r a l center to determine whether the center should be c a l l e d R or S. Thus, i f the l i g a n d atoms o f a c h i r a l atom are c a n o n i c a l l y numbered 3,7,9 and 14 and atoms 3,7,9 are arranged i n a counterclockwise f a s h i o n when viewed from the s i d e opposite atom number 14 then we can mark the atom with an S. This i n t e r n a l R,S n o t a t i o n may d i f f e r f o r some centers from those provided by the Cahn-Ingold-Prelog procedure (2_) but there i s no d i f f i c u l t y i n t r a n s l a t i n g between the systems since the absolute c o n f i g u r a t i o n i s embodied i n the molecular r e p r e s e n t a t i o n . (Normally t h i s w i l l not be necessary as most reactants never see the l i g h t o f day: i n my n o n i n t e r a c t i v e s y n t h e s i s programs, tens o f thousands o f molecular s t r u c t u r e s are o f t e n generated before an acceptable s y n t h e t i c pathway i s produced.) Thus we are spared the t r o u b l e of determining the sequence of the four l i g a n d s i n the Cahn-Ingold-Prelog sense. 4. I f the atoms are numbered, other things being equal, i n order of t h e i r atomic number and t h e r e a f t e r i n order o f t h e i r degree of unsaturation, then choosing the descending orders, oxygen atoms precede n i t r o g e n atoms which precede carbon atoms, unsaturated oxygen atoms precede saturated oxygen atoms e t c . Now i f we f u r t h e r order the l i s t o f the l i g a n d s of each atom i n ascending order of the numbers of the atoms then i t i s o f t e n p o s s i b l e f o r a subprogram to know " i n advance" which l i g a n d s are which. For example, i f a program i s examining the d e s c r i p t i o n V
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
v L
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
6.
BERSOHN
Reactants
in
Organic
Synthesis
Program
131
of the n i t r o g e n o f an amine oxide, then the f i r s t number encountered i n the l i s t o f l i g a n d s of the n i t r o g e n i s t h a t of the oxygen atom and the next number i s t h a t o f the carbon atom l i g a n d . Again, i f the program i s examining an e s t e r carbonyl carbon, the f i r s t l i g a n d encountered i n the l i s t w i l l a u t o m a t i c a l l y be the unsaturated carbonyl oxygen, the next l i g a n d w i l l be the saturated ether oxygen and the l a s t l i g a n d w i l l be the alpha carbon atom. The program can p i c k up these numbers and use them i n other contexts without examination o f the data about the atoms to which the numbers r e f e r . Accepting t h a t the a p p l i c a t i o n of a set o f r u l e s f o r numbering the atoms o f each molecule considered by the program i s necessary, one might ask why not use the book f u l l of IUPAC r u l e s ? (_3) The problem here i s t h a t s i n c e the IUPAC numbering r u l e s depend upon the r i n g system being considered, the programming o f t h i s book f u l l o f s p e c i a l cases i s an enormous task, not worth while. I t i s much e a s i e r to have b r i e f l y s t a t e d rules. Computerized Procedures
f o r Numbering the Atoms o f a Molecule
W.T. Wipke {4) f i r s t achieved the c a n o n i c a l i z a t i o n o f a molecular s t r u c t u r e r e p r e s e n t a t i o n i n a computer program t h a t i n c l u d e s a l l aspects o f stereochemistry. In some other schemes, stereochemistry i s not invoked to a i d i n the numbering o f the atoms. Such schemes are incomplete. I f we r e l e g a t e stereochemistry to a footnote and the numbering of the atoms and the connection t a b l e s o f two isomers can be the same, then we l o s e most o f the above-stated advantages of c a n o n i c a l i z i n g . H. G e l e r n t e r ' s program (5_) i s unique i n t h a t the s o r t i n g i s done v i a the Wiswesser l i n e n o t a t i o n . The Wiswesser scheme (6) s o r t s the groups and th~ numbering o f the atoms f o l l o w s from t h e i r p o s i t i o n s i n the groups and the order i n which the groups are given i n the Wiswesser symbols t h a t convey the s t r u c t u r e of the given molecule. A l l other schemes reported i n the l i t e r a t u r e Ç7'8y2.'i2.) c a n o n i c a l i z i n g a molecular s t r u c t u r e r e p r e s e n t a t i o n r e q u i r e a d i r e c t o r d e r i n g o f the non-hydrogenic atoms o f the molecule. In what f o l l o w s we w i l l omit the m o d i f i e r "non-hydrogenic" and ask the reader to note t h a t the hydrogen atoms are not numbered but are considered to be p r o p e r t i e s o f t h e i r l i g a n d s . We can d i v i d e the methods already i n use i n t o t r e e algorithms and sum algorithms, depending on how the e x t e r n a l environment of each atom i s represented. The atoms of a molecule can be p a r t i t i o n e d i n t o equivalence c l a s s e s on the b a s i s of a s i n g l e property or a set of p r o p e r t i e s . The value of each c l a s s f o r the p r o p e r t y ( s ) w i l l be c a l l e d the i n i t i a l c a n o n i c a l v a l u e . The set o f p r o p e r t i e s could i n c l u d e the atomic number, predominant i n the Cahn-IngoldP r e l o g system i f t h i s i s used to number the atoms o f the molecule, or the number o f l i g a n d atoms, which i s the property f
o
r
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
132
COMPUTER-ASSISTED
ORGANIC
SYNTHESIS
o f the g r e a t e s t use i n the Morgan algorithm. We can a l s o i n c l u d e the degree o f unsaturation, the number of attached hydrogen atoms, the charge and information about the s i z e o f the r i n g ( s ) o f which the atom i s a member. I t i s evident t h a t the more o f these p i e c e s o f information t h a t comprise the i n i t i a l c a n o n i c a l value, the more equivalence c l a s s e s we w i l l have. Thus i f we use only the atomic number we have a l a r g e equivalence c l a s s with the value 6. I f we add t o t h i s a number r e p r e s e n t i n g the degree o f unsaturation, the number o f equivalence c l a s s e s i s increased and the s i z e o f each c l a s s i s reduced. S t i l l there may be many atoms with the i n i t i a l c a n o n i c a l value 6.0, i . e . saturated carbon atoms. There are many o t h e r s , perhaps with the value o f 6.1, aromatic carbon atoms o r 6.2, doubly bonded carbon atoms, e t c . I f we f u r t h e r i n c l u d e the number o f attached hydrogen atoms we can have the equivalence c l a s s e s o f 6.0.2 and 6.0.1 r e f e r r i n g t o saturated methylene and methinyl carbon atoms r e s p e c t i v e l y . Adding the r i n g s i z e we then d i s t i n g u i s h the c l a s s with the i n i t i a l c a n o n i c a l value 6.0.2.5 from the c l a s s with the i n i t i a l c a n o n i c a l value 6.0.2.6. D i f f e r e n t schemes o f c a n o n i c a l i z a t i o n can be d i s t i n g u i s h e d by the p r o p e r t i e s s e l e c t e d t o e s t a b l i s h the i n i t i a l c a n o n i c a l value and whether the l i g a n d s o f each atom are c h a r a c t e r i z e d by a t r e e o f such i n i t i a l c a n o n i c a l values o r the sum o f such values obtained by a successive summation process t o be d e t a i l e d later. Tree Algorithms
and Sum
Algorithms
We i l l u s t r a t e the t r e e approach with an example, u s i n g the molecule o f Figure 1. In t h i s f i g u r e the atoms are numbered arbitrarily,
13
9
Figure
1.
A molecular structure with arbitrarily bered atoms
num-
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
6.
BERSOHN
Reactants
in Organic
Synthesis
133
Program
for reference i n the d i s c u s s i o n , not c a n o n i c a l l y . We w i l l take as the s e t o f c a n o n i c a l p r o p e r t i e s the s i n g l e property o f atomic number, and t r e a t double bonds as meaning the double occurrence o f the atom concerned, both as i n the Cahn-IngoldPrelog system. We consider atoms 2 and 6. The i n i t i a l c a n o n i c a l values f o r these are both 6. Hence i n the e f f o r t to d i s t i n g u i s h them, we walk out i n a l l d i r e c t i o n s and compare the c a n o n i c a l values along the paths from both the atoms. The paths of length one, i . e . t a k i n g account only o f the l i g a n d s , give s t r i n g s o f 6.6 f o r both atoms. The paths o f length two give s t r i n g s o f 66.66.66 f o r both atoms. The paths o f length three give s t r i n g s o f 668.668.666.666.666 f o r atom 2 and 668.668.668.666.666 f o r atom 6. Hence we can conclude that atom 6 outranks atom 2. The t r e e algorithm terminates under the f o l l o w i n g v a r i o u s c o n d i t i o n s : 1. No two atoms have i d e n t i c a l t r e e s . 2. The p a i r s o f t r e e s d e s c r i b i n g the environment o f a l l p a i r s o f atoms whose equivalence i s i n doubt e i t h e r converge to a common atom o r e l s e i n v o l v e every other atom o f the molecule. In a r e a l s i t u a t i o n , a t t h i s p o i n t a t r e e o f stereochemical values has to be b u i l t i f there are equivalent atoms i n the molecule. Now l e t us examine the behaviour o f the corresponding sum algorithm. Here we w i l l use the same i n i t i a l c a n o n i c a l value. The second c a n o n i c a l values are the sum o f the l i g a n d s i n i t i a l c a n o n i c a l v a l u e s . In general the i t h c a n o n i c a l value f o r an atom i s the sum o f the i - 1th c a n o n i c a l values o f i t s l i g a n d s . In t h i s way we o b t a i n a second c a n o n i c a l value o f 12 f o r both atoms 2 and 6 o f Figure 1. The t h i r d c a n o n i c a l value i s 30 f o r both atoms. The f o u r t h c a n o n i c a l value i s 76 f o r atom 2 and 78 f o r atom 6. Thus i t i s on the t h i r d i t e r a t i o n o f the summing process t h a t the c a n o n i c a l value f o r atom 6 f i n a l l y r e c e i v e s the information t h a t atom 12 i s an oxygen atom. This information i s not conveyed d i r e c t l y but i t i s mixed i n t o the sum along with the p r o p e r t i e s o f other atoms. The sum method terminates when Canonical Value Number 4 3 row number 1 2 60 1 24 12 6 2 76 6 12 30 114 3 6 52 18 108 4 12 36 6 102 5 54 6 18 78 6 12 6 30 7 48 192 28 6 8 96 8 12 56 48 9 28 6 6 212 10 54 6 30 108 11 12 60 8 12 38 66 8 12 38 13 12 8 6 the number o f equivalent atoms cannot be reduced between two 1
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
134
ORGANIC
SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
s u c c e s s i v e i t e r a t i o n s (method o f r e f s . 9,10) o r when the atom of h i g h e s t c o n n e c t i v i t y has been found (Morgan's a l g o r i t h m ) . In a r e a l case, i f there are e q u i v a l e n t atoms i n the molecule then sums i n v o l v i n g stereochemistry have to be computed. The advantage o f the sum method i s ease o f computation. I t i s e a s i e r t o compare simple numbers than long s t r i n g s o f numbers. I t i s a l s o f a s t e r t o generate the sums. The c a n o n i c a l i z a t i o n procedure o f P r o f e s s o r Ugi's group i s a t r e e algorithm, i n which the i n i t i a l c a n o n i c a l value i s composed o f the atomic number and the c o o r d i n a t i o n number o f the atom. ( I f one o f these two numbers makes the atom unique i n the molecule then only the one number i s used as the i n i t i a l c a n o n i c a l value.) Consider the molecule p a r t l y d e p i c t e d below, i n Figure 2.
Figure
2
By the dotted l i n e s we w i l l mean any s t r i n g o f atoms without side chains which are e q u i v a l e n t with r e s p e c t to atoms 1 and 2. I t i s c l e a r t h a t atoms 1 and 2 are nonequivalent. But they w i l l send out sums o f v a r i o u s c a t e g o r i e s which are the same. The sums o f the unsaturations are a l l zero, a l l the atoms concerned a r e a c y c l i c , the sum o f the atomic numbers are the same as w e l l as the sum o f the number o f the attached hydrogen atoms. Morgan's a l g o r i t h m on encountering a " t i e " l i k e t h i s would examine the l i g a n d s t o decide which atom should be chosen, according t o i t s r u l e s , as the atom to r e c e i v e the lowest number. The beginning atom i s the only one which i s numbered because o f the precedence o f i t s f i n a l c a n o n i c a l value. Other atoms are numbered according to t h e i r r e l a t i o n to i t . The i n i t i a l c a n o n i c a l value i s the number o f l i g a n d s . The summing p a r t o f the a l g o r i t h m terminates when we can no longer i n c r e a s e the number o f equivalence c l a s s e s . There i s no attempt to use the c a n o n i c a l values themselves as the b a s i s f o r o r d e r i n g a l l the atoms. In my a l g o r i t h m the f i n a l c a n o n i c a l values are used as the c r i t e r i o n f o r determining the precedence o f atoms with i d e n t i c a l i n i t i a l canonical values.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
6.
BERSOHN
Reactants
in
Organic
Synthesis
Other Algorithms Not C l a s s i f i a b l e as Tree or Sum
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
1.
135
Program
Algorithms
The Repeated Renumbering Algorithm
I t i s p o s s i b l e t o achieve the purpose o f a t r e e a l g o r i t h m without d i r e c t l y comparing s t r i n g s . We number the atoms according t o t h e i r i n i t i a l c a n o n i c a l values, u s i n g an a r b i t r a r y numbering f o r atoms with the same i n i t i a l c a n o n i c a l value but making sure t h a t i f i<j then the i n i t i a l c a n o n i c a l value of atom i i s greater than o r equal t o the i n i t i a l c a n o n i c a l value o f atom j . Next we renumber the atoms, ranking them f i r s t on the b a s i s o f t h e i r i n i t i a l c a n o n i c a l values and then on the b a s i s of the previous numbering o f t h e i r l i g a n d s . We keep r e p e a t i n g the renumbering process u n t i l the connection t a b l e or other molecular r e p r e s e n t a t i o n i s unchanged by the renumbering. This method appears elegant but i n our hands i t has always been slower than a sum method. The renumbering i s extensive i n the f i r s t few i t e r a t i o n s and the consumption o f time i n t r a n s l a t i n g between s u c c e s s i v e numberings proved e x c e s s i v e . T h i s a l g o r i t h m s t i l l seems worthy o f f u r t h e r i n v e s t i g a t i o n . I t i s p o s s i b l e that improved programming o f the subalgorithms can markedly improve t h i s scheme. The example below shows the f i r s t i t e r a t i o n of t h i s a l g o r i t h m when used t o order the atoms of the molecule shown
2.
C a n o n i c a l i z a t i o n Based on Atomic Coordinates
T h i s i d e a i n v o l v e s f i r s t numbering the atoms according to t h e i r i n i t i a l c a n o n i c a l value. Atoms having the same c a n o n i c a l value would have t h e i r r e l a t i v e precedence decided by t h e i r d i s t a n c e s from the center o f mass of the molecule. The d i s t a n c e would be i n nanometers, not bonds, so there would have to be a uniform way of b u i l d i n g the model and d e c i d i n g on the average or most s t a b l e conformation. T h i s method has the merit of doing away w i t h the need f o r i t e r a t i o n s , whether they be i t e r a t i o n s o f the summing process or i t e r a t i o n s of the t r e e growing process. On the other hand the d i s c o v e r y of equivalences o f atoms under l o c a l symmetry operations presents a problem.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
136
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
A Sum
Algorithm Described i n D e t a i l
My programs use a sum a l g o r i t h m to be d e s c r i b e d below. The time f o r c a n o n i c a l i z i n g the s t e r o i d 6a-methyl-17ot -acetoxyprogesterone i s 5063 times the time f o r executing a s t o r e i n s t r u c t i o n on the same computer, the IBM 370/165. (The a c t u a l time i s 1.61 + 0.02 microseconds. I deliberately place t h i s number i n parentheses, s i n c e a l l such comparisons should be r e l a t i v e t o the computer.) A search o f the l i t e r a t u r e d i d not r e v e a l other such reported times, so there i s no reason to conclude t h a t the sum a l g o r i t h m d i s c u s s e d below i s l e s s or more r a p i d than any other c a n o n i c a l i z a t i o n algorithms used elsewhere. H o p e f u l l y t h i s paper presents some s t a r t i n g p o i n t s f o r c o n s i d e r i n g the question of e f f i c i e n c y and how to improve i t . The r e a c t a n t generation time f o r d i f f i c u l t problems, e.g. s t e r o i d s , i s 4-5 m i l l i s e c o n d s , hence the programs spend as much as 40% of t h e i r time c a n o n i c a l i z i n g the molecular s t r u c t u r e r e p r e s e n t a t i o n . E v i d e n t l y , here i s the most important p l a c e t o think about e f f i c i e n c y . I t i s necessary f i r s t t o present e x p l i c i t l y the molecular s t r u c t u r e r e p r e s e n t a t i o n used. The s t r u c t u r e i s d e s c r i b e d by two t a b l e s . The f i r s t i s a connection t a b l e ; the second i s a stereochemical r e l a t i o n t a b l e . Connection t a b l e s and t h e i r a l t e r n a t i v e s are w e l l d i s c u s s e d i n the book o f M.F. Lynch et a l . (11). Both t a b l e s have to be c a n o n i c a l i z e d ; i n both t a b l e s the i t h row d e s c r i b e s atom number i . The connection t a b l e contains 12 columns or f i e l d s t h a t d e s c r i b e p r o p e r t i e s of atom i and these are followed by columns 13-16 i n which are l i s t e d the row numbers o f the atoms which are l i g a n d s of atom i . (For the b i t o r i e n t e d reader, I e x p l a i n t h a t the f i r s t 12 columns occupy 32 b i t s and the l a s t four columns a l s o occupy 32 b i t s . T h i s imposes a l i m i t of 254 non-hydrogen atoms i n the molecule: the value 255 i s reserved to mean a blank.) The f i r s t twelve columns o f the connection t a b l e are as f o l l o w s : 1. the atomic number, 2. the degree o f u n s a t u r a t i o n , 3. four minus the number o f hydrogen atoms, 4. the elementary f u n c t i o n a l group s e r i a l number, 5. member o f a r i n g o f s i z e other than 3,5, or 6, 6. member o f a r i n g o f s i z e 3, 7. member o f a r i n g o f s i z e 5, 8. member o f a r i n g o f s i z e 6, 9. p o s i t i v e charge, 10. negative charge, 11. R c h i r a l i t y , 12. S c h i r a l i t y . The connection t a b l e o f 2-Butanone-3-ol (R) i s given on the next page. A
CH,C ° J)
—
0H
CH 3
CH, 5
°
0°
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
6.
BERSOHN
Reactants in Organic Synthesis Program
137
atomic u n s a t u r a t i o n four minus elementary r i n g +-RS Ligands number # H atoms group 8 4 4 0000 0000 2 2 0 8 0 3 0000 0000 3 0 6 4 4 0000 0000 0 0 3 4 19 6 0 7 0000 0000 1 2 5 3 6 0 0000 0000 2 1 0 6 0 0000 0000 3 1 0 The "degree of u n s a t u r a t i o n " i s 0 f o r saturated atoms, 1 f o r aromatic atoms, 2 f o r carbon atoms doubly bonded t o carbon atoms, 4 f o r atoms i n v o l v e d i n a double bond t o a heteroatom, 6 f o r ketene carbonyl carbons, 8 f o r t r i p l y bonded atoms. Atoms which are i n some way c e n t r a l to an elementary and heteroatom-containing f u n c t i o n a l group, such as the oxygen o f an ether or the carbonyl carbon o f a ketone e t c . are l a b e l l e d with a number t h a t corresponds i n our program t o that elementary f u n c t i o n a l group. These numbers, l i s t e d i n Table I are fundamental t o the d i s c u s s i o n i n p a r t II o f t h i s paper. Here we observe t h a t the value f o r column 4 i s zero f o r a l l atoms which are not c e n t r a l to a heteroatom-containing elementary group o f Table I . Going back t o F i g u r e 2, we note that the c a n o n i c a l i z a t i o n system used here cannot regard the atoms 1 and 2 as being e q u i v a l e n t s i n c e the i n i t i a l c a n o n i c a l values i n c l u d e the values of column 4. The sum o f these l a t t e r values f o r atom 1 i s d i f f e r e n t from t h a t f o r atom 2. Columns 5 through 12 c o n t a i n e i t h e r the value 0, meaning the atom l a c k s the r e l e v a n t property o r 1, meaning the atom posseses the r e l e v a n t p r o p e r t y . The stereochemical r e l a t i o n t a b l e c o n t a i n s i n the i t h row the s t e r i c r e l a t i o n s o f atom i to other atoms. The e n t r i e s i n the t a b l e c o n s i s t s of p a i r s o f symbols, the f i r s t i n d i c a t i n g the s t e r i c r e l a t i o n and the second i n d i c a t i n g the atom concerned. 1.9 means t h a t atom i i s a x i a l i n the r i n g o f which t h i s l i g a n d atom 9 i s a member. 2.9 i s the corresponding e q u a t o r i a l i n d i c a t i o n . 5.6 would mean t h a t atom i i s a s u b s t i t u e n t o f a double bond and i s c i s to atom 6. 6.8 would mean t h a t atoms i and 8 are t r a n s across some double bond. The symbols 7 and 8 r e f e r to c i s and trans r e l a t i o n s h i p s w i t h respect t o a r i n g . The f u l l l i s t of symbols i s given i n r e f e r e n c e 9. The i n i t i a l c a n o n i c a l value i s based on the values of a l l o f the f i r s t 10 columns of the connection t a b l e . Successive c a n o n i c a l values are sums o f the values of these ten p r o p e r t i e s , i n c l u d i n g data from s u c c e s s i v e l y remote atoms. (To avoid overflow of, f o r example, the t h i r d column onto the second column, the ten numbers may be expanded i n t o numbers with more d i g i t s and the ensemble can be kept i n a 64 b i t number manipulated by f l o a t i n g p o i n t r e g i s t e r s . This time consuming expansion can be avoided by f o l l o w i n g the procedure o f the appendix, making sure to i n c l u d e step 6.) E v i d e n t l y the l a r g e r the s e t o f p r o p e r t i e s from which the i n i t i a l c a n o n i c a l value i s d e r i v e d , the more
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
138
COMPUTER-ASSISTED ORGANIC
SYNTHESIS
Table I . S e r i a l Numbers o f Elementary F u n c t i o n a l Groups
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
1 add one f o r CH 3 add 3 f o r phenyl t o make b e n z y l i c d e r i v a t i v e 5 CH20H primary a l c o h o l 7 CHOH secondary a l c o h o l 9 COH t e r t i a r y a l c o h o l 11 s u l f o x i d e 13 sulfone 15 n i t r o s e o 17 n i t r o 19 ketone 21 e s t e r methyl o r other t r i v i a l e s t e r 23 aldehyde 25 n i t r i l e 27 d i t h i a n e 29 c a r b o x y l i c a c i d 31 k e t a l 33 a c e t a l 35 hemiacetal 37 ether 39 epoxide 41 CI 43 hydrazine 45 primary amine 47 secondary amine 49 t e r t i a r y amine 51 imine 53 urea 55 primary amide 57 secondary amide 59 t e r t i a r y amide 61 CCI 63 t h i o l 65 C-S-C s u l f i d e 67 C-S-S-C d i s u l f i d e 69 CF 71 ketene; the middle carbon i s a c e n t r a l atom 73 bromide 35 a c y l h a l i d e 77 carbodiimide 79 isocyanate 81 amine oxide 83 oxime 85 s u b s t i t u t e d hydroxylamine 87 azo C-N=N-C 89 s u l f o n i c a c i d 91 sulfonamide 93 s u l f i n i c a c i d 95 mesylate
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
6.
BERSOHN
Reactants
in
Organic
Synthesis
Program
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
Table I. Continued 97 C=C; add 97 to O X s e r i a l number to o b t a i n s e r i a l # of a l l y l i c C-X 99 C=CH 101 CH=CH 103 C=CH2 105 CH=CH2 107 cyclohexene 109 C-C t r i p l e bond 111 phosphine oxide 113 phosphinic a c i d 115 anhydride 117 imide 119 primary e s t e r C02CH2R 121 secondary e s t e r C02CHRR 123 t e r t i a r y e s t e r C02CRRR 125 primary phosphine 127 secondary phosphine 129 t e r t i a r y phosphine 131 add 131 to C-X s e r i a l # to o b t a i n aromatic C-X s e r i a l # 133 secondary borane 135 t e r t i a r y borane 137 peroxide 139 hemiketal 141 enol l a c t o n e 143 v i n y l h a l i d e 145 primary v i n y l amine 147 secondary v i n y l amine 149 t e r t i a r y v i n y l amine 151 cyclopropane r i n g 153 cyclobutane r i n g 155 aliène middle carbon i s c e n t r a l atom 194 diene, a complex f u n c t i o n a l group, (97 + 97)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
140
d i s t i n c t i o n s can be made between atoms. Such d i s t i n c t i o n s i n c l u d e the d i s t i n c t i o n between r i n g atoms and a c y c l i c atoms, between an e s t e r carbonyl carbon and a ketone carbonyl carbon etc. The more such d i s t i n c t i o n s can be made the fewer i t e r a t i o n s of the summing process are r e q u i r e d . A c e n t r a l idea here i s t h a t a synthesis program, u n l i k e a program processing manual input, knows a l l about the molecule from which the molecule a t hand i s d e r i v e d hence matters l i k e what s i z e r i n g i f any the atom i s s i t u a t e d i n or what f u n c t i o n a l group the atom i s c e n t r a l t o , are data a v a i l a b l e to a s y n t h e s i s program before c a n o n i c a l i z a t i o n of the molecular s t r u c t u r e r e p r e s e n t a t i o n . When we procède from the k t h to the k + 1th c a n o n i c a l value and there i s no decrease i n the number of equivalent atoms, then the c a l c u l a t i o n o f the c a n o n i c a l values terminates. A l l atoms are then numbered i n the order of the value of the f i r s t 10 columns o f the connection t a b l e . Whenever these are equal the f i n a l c a n o n i c a l values are consulted to determine which atom precedes which. I f the f i n a l c a n o n i c a l values are the same we have good grounds to suspect t h a t the atoms are e q u i v a l e n t but they must meet two more requirements t o be judged e q u i v a l e n t : 1. I f there are a t l e a s t two nonterminal double bonds i n the molecule the a l g o r i t h m continues and c a l c u l a t e s a "stereochemical c a n o n i c a l value", i n which the i n i t i a l c a n o n i c a l value i s a number meaning c i s o r trans with respect to a double bond. I f the atoms concerned are s t i l l e q u i v a l e n t with r e s p e c t to the steroeochemical c a n o n i c a l value, then they are t r u l y e q u i v a l e n t unless they are both c h i r a l . T h i s step i s r e q u i r e d to r e v e a l the inequivalence of atoms 1 and 2 o f the Figure below.
OH
H
>
CH:
\
CH
y
'\
/ 3
CH
CH:
Η
3
2. I f the atoms are c h i r a l the program must determine the c h i r a l i t y and i f they are both R or both S they are a c t u a l l y not e q u i v a l e n t , ( c f . reference 9). I f , l e t us say, atoms 9 and 10 are e q u i v a l e n t , then which should be numbered 9 and which should be numbered 10? How, f o r
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
6.
BERSOHN
Reactants
in
Organic
Synthesis
141
Program
example, shpuld we number the atoms of cyclohexane? The answer i s t h a t i f i<j then the row numbers o f the l i g a n d s of i which are not j should be l e s s than or equal to the row numbers of the ligands of j which are not i . A p p l i c a t i o n o f t h i s r u l e gives I, 2,4 and 3 f o r the numbering o f the atoms o f cyclobutane, reading around the r i n g . S i m i l a r l y , reading around the r i n g , t h i s r u l e gives 1,2,4,6,5 and 3 f o r the atoms o f benzene. We r e q u i r e t h a t the l i g a n d row numbers o f i and the l i g a n d row numbers o f j being compared must not include i or j . I f we drop t h i s s t i p u l a t i o n then we would have an i n f i n i t e loop i n the case o f ethane or symmetrically s u b s t i t u t e d ethanes or r i n g s of equivalent atoms. We r e t u r n once more to Figure 2. Thanks to the presence o f column 4 the procedure here cannot regard the two atoms as being e q u i v a l e n t . But what i f a more complicated s i t u a t i o n arose such that two atoms which are not e q u i v a l e n t had the same u l t i m a t e c a n o n i c a l value? I consider t h i s most u n l i k e l y and have not been able to f i n d such an example but i n any case i f t h i s i s encountered i t can e a s i l y be managed by i n c l u d i n g i n the algorithm the requirement that equivalent atoms must have pairwise equivalent l i g a n d s . On d e t e c t i n g the nonequivalence of the l i g a n d s the precedence would f o l l o w that of the l i g a n d s . ( I n c l u s i o n of t h i s requirement removes the need to make sure t h a t the various columns from the i n i t i a l c a n o n i c a l value do not overflow onto columns to the l e f t of them. In the very r a r e case i n which i n e q u i v a l e n t atoms appear to be equivalent because of such overflow, the atoms w i l l be detected as i n e q u i v a l e n t by the above s t a t e d requirement t h a t equivalent atoms must have e q u i v a l e n t ligands.) I t i s necessary to order the l i g a n d atoms i n column 13-16. We happen to use ascending order. In the process of s o r t i n g the row numbers of the f o u r l i g a n d atoms i t i s easy to detect i n v e r s i o n s of c h i r a l i t y t h a t have taken p l a c e i n a r e a c t i o n . I f an even number o f exchanges i s r e q u i r e d to order the numbers o f the l i g a n d atoms, then the c h i r a l i t y o f the atom i s the same i n the reactant as i n the product. I f an odd number of exchanges i s r e q u i r e d to order these row numbers then the c h i r a l i t y o f t h i s atom i n the reactant i s opposite to that o f the atom i n the product. I f two atoms of the r e a c t a n t are e q u i v a l e n t any common l i g a n d cannot be c h i r a l even i f i t i s c h i r a l i n the product so a zero i s placed i n columns 11 and 12. S i m i l a r l y i f a saturated carbon or quaternary n i t r o g e n atom has no two equivalent l i g a n d s and the atom was not c h i r a l i n the product then the c h i r a l i t y i s a problem to be determined, using the methods o f reference 9. The previous manipulations o f c h i r a l i t y are i n c l u d e d i n the time r e q u i r e d f o r c a n o n i c a l i z a t i o n of the connection t a b l e but the problem o f d e c i d i n g c h i r a l i t y when the atom i s not c h i r a l i n the product i s handled by a d i f f e r e n t r o u t i n e . II.
The Numbering o f S y n t h e t i c a l l y I n t e r e s t i n g
Substructures
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
142
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
and the Use of These Numbers f o r Rapid Discovery o f F u n c t i o n a l Groups Most s y n t h e t i c r e a c t i o n s produce a w e l l d e f i n e d substructure i n the product. I t i s t h e r e f o r e n a t u r a l t o c l a s s i f y the r e a c t i o n s by the substructure produced i n the product. T h i s i s done i n the standard compendia such as those o f Buehler and Pearson, (12) and Harrison and H a r r i s o n . (13) I t i s a l s o n a t u r a l f o r a programmer to number these substructures and use these numbers as i n d i c e s for r e t r i e v a l o f r e a c t i o n s t h a t produce t h i s substructure. The problem o f r e c o g n i z i n g the substructures becomes one o f d e r i v i n g these numbers i n some way from the molecular s t r u c t u r e representation. In the r e p r e s e n t a t i o n which I use, the numbers c h a r a c t e r i z i n g elementary f u n c t i o n a l groups are b u i l t i n t o the r e p r e s e n t a t i o n so the d e r i v a t i o n of the elementary f u n c t i o n a l group numbers i s an immediate e x t r a c t i o n . (14_, 15) As mentioned above, column 4 o f the connection t a b l e has the value zero except f o r atoms which are c e n t r a l t o some elementary f u n c t i o n a l group t h a t contains a heteroatom. Elementary f u n c t i o n a l groups which do not c o n t a i n a heteroatom, i . e . carbon-carbon m u l t i p l e bonds, are e a s i l y c o l l e c t e d from the contents o f column 2, the degree o f u n s a t u r a t i o n , which p l a c e s such atoms ahead o f a l l other carbon atoms. A loose d e f i n i t i o n of an elementary f u n c t i o n a l group i s t h a t i t i s a carbon-carbon m u l t i p l e bond or a heteroatom and i t s carbon l i g a n d ( s ) together with t h e i r ligands. A more p r a c t i c a l d e f i n i t i o n o f an elementary f u n c t i o n a l group i s that i t i s any f u n c t i o n a l group t h a t appears i n Table I! T h i s l a t t e r d e f i n i t i o n r e v e a l s the e m p i r i c a l and a r b i t r a r y nature of this classification. Complex f u n c t i o n a l groups are sets of atoms i n which we can f i n d more than one elementary f u n c t i o n a l group. As a simple example, we can look a t the connection t a b l e of 2-butanone-3-ol given above. As we scan column 4 we see t h a t atom 2 has the l a b e l 19, meaning ketone and atom 3 has the l a b e l 7, meaning secondary a l c o h o l . There are no other elementary f u n c t i o n a l groups i n the molecule, s i n c e , as shown i n column 2, there are no v i n y l or a c e t y l e n i c carbons. Having our l i s t of elementary f u n c t i o n a l groups we then look for alpha, beta, gamma, d e l t a and e p s i l o n r e l a t i o n s h i p s between them. T h i s search r e v e a l s t h a t the ketone and secondary a l c o h o l are adjacent, g i v i n g us the complex f u n c t i o n a l group with s e r i a l number 26, the sum o f 19 and 7, and alpha k e t o l . Complex f u n c t i o n a l groups are thus found s t a r t i n g from the elementary f u n c t i o n a l groups. The numbering of complex f u n c t i o n a l groups i s not one t o one s i n c e , f o r example, t h e r e are other ways o f forming the number 26 from the sums o f two odd i n t e g e r s . Consequently future use o f the f u n c t i o n a l group has to be preceded by a t e s t on the f i r s t c e n t r a l atom to a s c e r t a i n i f i t i s a secondary a l c o h o l . The format o f the f u n c t i o n a l group r e p r e s e n t a t i o n i n a l i s t o f f u n c t i o n a l groups prepared by the program i s : s e r i a l number o f the f u n c t i o n a l group, row number of
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
6.
BERSOHN
Reactants
in Organic
Synthesis
Program
143
the f i r s t c e n t r a l atoms, row numbers o f the ligands o f the f i r s t c e n t r a l atom, row number o f the second c e n t r a l atom, row numbers of the ligands o f the second c e n t r a l atom, row numbers o f other relevant atoms. An elementary f u n c t i o n a l group with a heteroatom has only one c e n t r a l atom. A carbon-carbon m u l t i p l e bond i s w r i t t e n so that the carbon with fewer attached hydrogen atoms i s i n the p o s i t i o n o f the f i r s t c e n t r a l atom and the number o f the other atom i s i n the p o s i t i o n o f the second c e n t r a l atom. Below we show the f u n c t i o n a l group l i s t o f 2-butanone-3-ol.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
26 7 19
2 0 0 4 3 1 2 5 2 0 0 3
3 1 5 4
°0 II
*0H I
C H , C — CH C H ,
4 3 2 3 5 0 The ambiguity o f the numbering o f the complex f u n c t i o n a l groups i n which a t e s t must be performed on column 4 o f the f i r s t mentioned atom before the i d e n t i t y o f the complex f u n c t i o n a l group i s c e r t a i n , i s the r e s u l t o f a temporary compromise with the hardware. A t r u l y high speed v i r t u a l storage w i l l enable the complex f u n c t i o n a l groups t o be numbered unambiguously, so t h a t a t e r t i a r y a l c o h o l next t o an e s t e r w i l l be numbered d i f f e r e n t l y from the complex f u n c t i o n a l group c o n s i s t i n g o f a secondary a l c o h o l next t o a ketone. These unambiguous numbers w i l l be l a r g e r than the present ones. The number o f atoms i n t e r v e n i n g between the c e n t r a l atoms of the two f u n c t i o n a l groups i s given as a p r e f i x t o the number of the complex f u n c t i o n a l group. For example, an alpha keto e s t e r i s numbered 0.40, a beta k e t o e s t e r i s numbered 1.40, a gamma k e t o e s t e r 2.40 e t c . I t i s unnecessary t o f i n d a l l the f u n c t i o n a l groups o f every reactant molecule. Many r e a c t i o n s simply a l t e r , i n s e r t o r remove one f u n c t i o n a l group. A s u i t a b l e f l a g posted on the d e s c r i p t i o n of such r e a c t i o n s should a l e r t the synthesis program t h a t the r e a c t i o n i s o f t h i s type and the f u l l f u n c t i o n a l group f i n d i n g r o u t i n e need n o t be c a l l e d . Instead the f u n c t i o n a l groups are copied from the product t o the reactant, with s u i t a b l e renumbering o f the atoms i n v o l v e d . Next the alpha, beta gamma e t c . r e l a t i o n s o f the new o r d i f f e r e n t f u n c t i o n a l group with the other f u n c t i o n a l groups are looked f o r . I f the r e a c t i o n c o n s i s t s of i n s e r t i n g a new f u n c t i o n a l group then the f u n c t i o n a l group i n question i s simply removed from the l i s t o f f u n c t i o n a l groups copied from the product. F a i l u r e t o use such short cuts w i l l add n o t i c e a b l y t o reactant generation time. Some substructures o f s y n t h e t i c i n t e r e s t are n e i t h e r r i n g s nor f u n c t i o n a l groups but are j u s t hydrocarbon fragments. F o r example, i f the r e a c t i o n i s d e s u l f u r i z a t i o n o f a 2-substituted thiophene d e r i v a t i v e then the product contains a s t r i n g o f four saturated carbons, which are o r i g i n a l l y p a r t o f the thiophene r i n g , along with the removed s u l f u r atom. The predominant theme
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
144
COMPUTER-ASSISTED ORGANIC
SYNTHESIS
of the author's approach to substructure d i s c o v e r y i s that the data should be s t r u c t u r e d so t h a t the group o f i n t e r e s t should simply be c o l l e c t e d and not searched f o r s p e c i f i c a l l y o r i n q u i r e d about by questions i n some l o g i c a l sequence t h a t narrow down the p o s s i b i l i t i e s . But i n t h i s case I cannot f i n a a way t o avoid a s p e c i f i c i n q u i r y , and h o p e f u l l y the i n g e n u i t y o f others w i l l provide some s o l u t i o n t o t h i s kind o f problem. An even more important problem i s the problem o f d i s c o v e r i n g what I w i l l c a l l h y p e r s t r u c t u r e s . T y p i c a l l y they w i l l i n v o l v e r i n g systems and/or three o r more f u n c t i o n a l groups. What has t o be r e t r i e v e d , once these are discovered, i s not an o r d i n a r y s y n t h e t i c r e a c t i o n but a h i g h l y s p e c i a l i z e d r e a c t i o n , perhaps using an enzyme o r microorganisms or e l s e a s p e c i a l i z e d sequence o f r e a c t i o n s , the performance o f which w i l l produce the discovered hyperstructure i n good y i e l d . The r e c o g n i t i o n o f such hyperstructures w i l l probably r e p l a c e , t o a n o t i c e a b l e extent, the search process o f generating r e a c t a n t s . The d i s c o v e r y o f such hyperstructures procèdes i n our subprogram as f o l l o w s . I f one has discovered a c o n s t i t u e n t p a r t of t h i s h y p e r s t r u c t u r e , a subroutine i s c a l l e d which i n q u i r e s about v a r i o u s h y p e r s t r u c t u r e s o f which t h i s could be a p a r t . Thus the d i s c o v e r y o f fused aromatic r i n g s leads t o the i n q u i r y sequence f o r f i n d i n g phenanthrene. An o r d i n a r y aromatic molecule would not be quizzed f o r the phenanthrene p o s s i b i l i t y . S i m i l a r l y , the d i s c o v e r y o f fused nonaromatic r i n g s leads t o questions searching toward the existence o f a s t e r o i d s k e l e t o n . An example i s the 11-alpha-hydroxylated t r a n s - a n t i - t r a n s s t e r o i d s . The oxygenation a t the 11 p o s i t i o n o f such s t e r o i d s i s w e l l known to be a b l e t o be e f f e c t e d by microorganisms. (16) My subprogram t h a t d i s c o v e r s f u n c t i o n a l groups r e q u i r e d a time e q u i v a l e n t t o 1250 s t o r e i n s t r u c t i o n s t o f i n d a l l the f u n c t i o n a l groups o f PGE2 and 2844 store i n s t r u c t i o n s t o f i n d a l l the f u n c t i o n a l groups o f t e t r a c y c l i n e . (The a c t u a l times were 0.40 and 0.91 ms, r e s p e c t i v e l y , the standard d e v i a t i o n being l e s s than 1%). Comparative data i n the l i t e r a t u r e were not found. The process i s q u i t e d i r e c t and both the need and the p o s s i b i l i t y o f improving i t appear l e s s than f o r the canonical i z a t i o n process. Support f o r t h i s work from the N a t i o n a l Research C o u n c i l o f Canada i s g r a t e f u l l y acknowledged.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
6.
BERSOHN
Reactants in Organic Synthesis Program
145
Appendix. Steps of the sum a l g o r i t h m i n d e t a i l s u f f i c i e n t f o r programming. 1. C a l c u l a t e the i n i t i a l c a n o n i c a l values f o r a l l the atoms by copying the f i r s t 32 b i t s of each connection t a b l e row and s h i f t i n g r i g h t two b i t s to e l i m i n a t e the c h i r a l i t y i n f o r m a t i o n . 2. C a l c u l a t e the second and t h i r d c a n o n i c a l values f o r a l l the atoms. The i t h c a n o n i c a l value of an atom i s the sum of the i - l t h c a n o n i c a l values of i t s l i g a n d atoms. 3. Make a l i s t o f a l l p a i r s o f atoms which are e q u i v a l e n t i n t h e i r t h i r d c a n o n i c a l values as w e l l as t h e i r f i r s t c a n o n i c a l values. C a l l the l i s t S, f o r Suspected e q u i v a l e n t p a i r s . 4. Check through the l i s t S to f i n d the f o l l o w i n g s p e c i f i c cases: gem dimethyl p a i r s , gem d i h a l o p a i r s , methylenes or oxygens of the 1,3-dioxolane r i n g of a k e t a l or a c e t a l of ethylene g l y c o l , p a i r s of corresponding atoms i n the carboalkoxy groups of a malonic e s t e r , ortho or meta carbons of monosubstituted benzenes. (Almost a l l cases I have encountered are covered by t h i s l i s t . ) Remove these cases from the l i s t of suspected equivalences and put them on a new l i s t c a l l e d C. I f S i s now empty then go to step 7. 5. C a l c u l a t e the next set of c a n o n i c a l values f o r a l l the atoms. Then examine the l i s t S of suspected e q u i v a l e n t p a i r s of atoms and check t o see i f t h e i r l a t e s t c a n o n i c a l values are s t i l l the same. I f the c a n o n i c a l values are now d i f f e r e n t remove the p a i r from S. I f S i s not empty then go back to the beginning of t h i s step. 6. F i n d out whether the l i g a n d atoms of each p a i r o f e q u i v a l e n t atoms on the l i s t S are p a i r w i s e e q u i v a l e n t . I f not then rank the p a i r o f atoms according to the c a n o n i c a l value of the highest ranking l i g a n d . I f there are nonterminal double bonds i n the molecule then c a l c u l a t e the stereochemical c a n o n i c a l values using the s t r i n g of f i r s t numbers of each p a i r i n the stereochemical t a b l e as i n i t i a l c a n o n i c a l v a l u e s . Next c a l c u l a t e the second and t h i r d stereochemical c a n o n i c a l v a l u e s . I f a l l the p a i r s of S are s t i l l equivalent then go to step 7. Otherwise remove the i n e q u i v a l e n t p a i r s from S and continue to c a l c u l a t e stereochemical c a n o n i c a l values and check f o r inequivalences o f p a i r s i n S u n t i l the length of S no longer decreases. 7. S o r t the rows o f the connection t a b l e and the s t e r e o r e l a t i o n t a b l e according t o the precedence determined f i r s t by the i n i t i a l c a n o n i c a l values and i n case where the i n i t i a l c a n o n i c a l values are equal, by the f i n a l c a n o n i c a l v a l u e s . In t h i s process prepare a t r a n s l a t i o n t a b l e such t h a t the i t h entry o f the t a b l e i s the new number of o l d atom number i . 8. Renumber the l i g a n d atoms i n each row using the t a b l e j u s t created. 9. S o r t the l i g a n d s i n each row. I f an o l d number of interchanges are r e q u i r e d i n the s o r t i n g process f o r the row o f
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
146
COMPUTER-ASSISTED
ORGANIC
SYNTHESIS
an atom which i s c h i r a l i n the product, then i n v e r t the chirality. In t h i s case the c h i r a l i t y o f the atom i n the reactant i s opposite to that which i t has i n the product. 10. For a l l p a i r s on the l i s t s S and C, f i n d any common neighbors and erase any c h i r a l i t y derived from the product connection t a b l e . 11. For a l l p a i r s o f atoms on S o r C see i f the f o l l o w i n g r u l e i s obeyed: I f i and j are equivalent and i < j then the f i r s t l i g a n d o f i must have the same or a lower row number than the f i r s t l i g a n d of j . I f the f i r s t l i g a n d o f i i s the same as the f i r s t l i g a n d of j , i . e . they have a common l i g a n d , then the second l i g a n d o f i must have the same o r a lower row number than the second l i g a n d o f j . I f the r u l e i s not obeyed, interchange the renumber atom i as atom j , rename the former atom j as atom i and change the t r a n s l a t i o n t a b l e , the connection t a b l e and the stereorelation table accordingly. Note: I use a s e l e c t i o n s o r t f o r s o r t i n g the rows o f the connection t a b l e i n step 7. For the number of atoms i n v o l v e d , u s u a l l y 25-40, the s t r a i g h t s e l e c t i o n s o r t was n o t i c e a b l y f a s t e r than the t r e e s e l e c t i o n s o r t s discussed by Knuth. (1^7) The optimum s o r t i n g procedure i n t h i s case i s very much an open question. The numbers o f the l i g a n d s i n each row were sorted by the bubble s o r t (17) because these numbers are u s u a l l y i n order before c a n o n i c a l i z a t i o n , s i n c e they are u s u a l l y i n the same order i n the reactant molecule a t hand as they are i n the already c a n o n i c a l i z e d product molecule connection t a b l e . The most e f f i c i e n t s o r t f o r an almost sorted l i s t i s the bubble s o r t . Abstract The speed o f generating reactants i s discussed from the standpoint o f improvement in two key r o u t i n e s , the c a n o n i c a l i z a t i o n of molecular s t r u c t u r e and the discovery o f f u n c t i o n a l groups. Algorithms i n use f o r c a n o n i c a l i z a t i o n o f molecular s t r u c t u r e representations i n a computer are c l a s s i f i e d i n t o t r e e algorithms and sum algorithms. A p a r t i c u l a r sum algorithm i s d e s c r i b e d . The advantages o f c a n o n i c a l i z a t i o n are presented, i n c l u d i n g the discovery o f atoms equivalent under l o c a l o r g l o b a l symmetry operations. A method f o r f i n d i n g f u n c t i o n a l groups i s a l s o described, the essence of which is the s p e c i a l l a b e l l i n g of the connection t a b l e rows o f atoms t h a t are c e n t r a l to simple f u n c t i o n a l groups. The product f u n c t i o n a l group information is used when examining the r e a c t a n t ( s ) , so that groups can be copied, where p o s s i b l e , and not rediscovered.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
6. BERSOHN
Reactants in Organic Synthesis Program
147
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch006
Literature Cited 1. Susseguth, E.H., J . Chem. Doc., 5, 36 (1965). A more rapid procedure using a connection table with much information about each atom, i s to be found i n the appendix of reference 10. 2. Cahn, R.S., Ingold, C.K. and Prelog, V., Angew. Chem. Informatn. Edn., 5, 385, 511 (1966). 3. International Union of Pure and Applied Chemistry, "Nomenclature of Organic Chemistry", Sections A, Β & C, Third Ed., Butterworths, London, 1971. 4. Wipke, W.T. and Dyott, T.M., J . Amer. Chem. Soc., 96, 4825, 4834 (1974). 5. Gelernter, H., Sridharan, N.S., Hart, A.J., Yen, S.C., Fowler, F.W., and Shue, H., Topics Current Chem., 41, 113 (1973). 6. Smith, E.G., "The Wiswesser Line-Formula Chemical Notation", McGraw-Hill, New York, 1968. 7. Morgan, H.L., J. Chem. Documentation, 5, 107 (1965). 8. Gasteiger, J., Gillespie, P., Marquarding, D., Ugi, I., Topics i n Current Chemistry, 48, 1 (1974). 9. Esack, A. and Bersohn, M., J . Chem. Soc., Perkin I, 1124 (1975). 10. Bersohn, M. and Esack, Α., Chemica Scripta, 6, 122 (1974). 11. Lynch, M.F., Harrison, J.M., Town, W.G. and Ash, J.E., "Computer Handling of Chemical Structure Information", McDonald, London, 1971. 12. Buehler, C.A. and Pearson, D.E., "Survey of Organic Synthesis", John Wiley & Sons, New York, 1970. 13. Harrison, I.T. and Harrison, S., "Compendium of Organic Synthetic Methods", Vols I and II, John Wiley & Sons, New York, 1971, 1974. 14. Bersohn, M. and Esack, Α., J . Chem. Soc., Perkin I, 2463 (1974). 15. Bersohn, M. and Esack, Α., Chemica Scripta, i n press. 16. Fonken, G.S. and Johnson, R.A., "Chemical Oxidations with Microorganisms", Marcel Dekker Inc., New York, 1972. 17. Knuth, D.E., "Sorting and Searching", Addison-Wesley, Reading, Mass., U.S.A., 1973.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7 An Artificial Intelligence System to Model and Guide Chemical Synthesis Planning by Computer: A Proposal N. S. SRIDHARAN
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
Rutgers University, Computer Science Dept., New Brunswick, N.J. 08903
One o f t h e central problems i n a p p l y i n g computer methods t o c h e m i c a l s y n t h e s i s p l a n n i n g is S e a r c h Guidance. The ability o f a c h e m i s t t o g u i d e an interactive computer in its s e a r c h f o r s y n t h e s e s and the possibilities f o r self-guidance in Artificial Intelligence programs a r e b o t h limited by t h e form and content o f the search information that is made available as the exploration p r o c e e d s . A system f o r S e a r c h M o d e l l i n g is proposed in this paper which c a n augment existing systems f o r s y n t h e s i s p l a n n i n g and s e r v e t o gather, a n a l y z e and amplify t h e i n f o r m a t i o n generated d u r i n g controlled exploration. The s e a r c h management model is specified in a s i m p l e descriptive form and two example models a r e i n c l u d e d in t h e p a p e r . The g u i d a n c e o f s e a r c h u s i n g t h e i n f o r m a t i o n g a t h e r e d is specified by a r u l e s e t in t h e s i m p l e s y n t a x o f Condition=>Action pairs. A chemist can interactively m o d i f y this rule set. F u r t h e r , an advanced s e a r c h model is p r e s e n t e d w h i c h , by i n t r o d u c i n g t h e powerful concept o f a P l a n n i n g Space, a l l o w s t h e s e a r c h f o r s y n t h e s e s t o go f o r w a r d , backward and t o leap i n t o t h e m i d d l e under controlled conditions.
148 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical
Synthesis
Planning
by
Computer
149
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
INTRODUCTION Planning chemical synthesis routes for known molecular s t r u c t u r e s i s a r i c h problem area o f f e r i n g a challenge that i s being met in inventive and imaginative ways not only by chemists, but a l s o by computer s c i e n t i s t s and mathematicians. I b r i n g to t h i s task the perspective of a s p e c i a l i s t i n the methods of developing A r t i f i c i a l I n t e l l i g e n c e w i t h i n a computing system, and a p e r s i s t e n t concern for using the mechanisable aspects of human knowledge and human problem s o l v i n g techniques i n the medium of a machine. I b r i n g to t h i s task no e x p e r t i s e i n organic chemistry or i n s y n t h e s i s . But I do have the b e n e f i t of several years of intimate contact with the problem of mechanizing the search for syntheses and with expert chemists, e s p e c i a l l y Prof. W.F. Fowler, who foresaw such p o s s i b i l i t i e s and who worked with us. My a p p l i c a t i o n to t h i s task under the guidance of Prof. Gelernter culminated during 1971 i n the running v e r s i o n of SYNCHEM I , the f i r s t computer program to perform s u c c e s s f u l m u l t i - s t e p synthesis e x p l o r a t i o n s automatically without on-line guidance or intervention. This f i r s t v e r s i o n of SYNCHEM employed several key ideas and techniques that were discovered by others e a r l y i n the research on mechanical problem s o l v i n g . These key ideas w i l l be reviewed below. Since my main i n t e r e s t l i e s i n the d i r e c t i o n of artificial i n t e l l i g e n c e , I have spent the f i v e years f o l l o w i n g my work i n SYNCHEM i n working on the a p p l i c a t i o n of a r t i f i c i a l i n t e l l i g e n c e methods to problems i n Mass Spectrometry with the H e u r i s t i c Dendral p r o j e c t at Stanford U n i v e r s i t y and a l s o with Professor Ivar Ugi at Munich i n a s s i m i l a t i n g h i s a l g e b r a i c approach to the representation of r e a c t i o n s . Upon Todd Wipke's i n v i t a t i o n to p a r t i c i p a t e i n t h i s Symposium I have e l e c t e d to set f o r t h i n a very s p e c i f i c manner my proposals on how I would t a c k l e the task of synthesis planning i n the l i g h t of my current understanding of the advances that have been made i n the methods of a r t i f i c i a l i n t e l l i g e n c e . Thus, I have three aims i n w r i t i n g t h i s paper:
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
150
COMPUTER-ASSISTED
ORGANIC
SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
a) Propose a system to augment e x i s t i n g synthesis search systems by focusing on the issues of search management. To t h i s end the concept o f search modelling i s introduced and two search models are presented. b) C l a r i f y the f i n e d i s t i n c t i o n between s e l e c t i o n of transform by Relevance criteria and by Applicability criteria. S e l e c t i o n by A p p l i c a b i l i t y defines what i s u s u a l l y c a l l e d the State Sgace and search i n the s t a t e space u s u a l l y grows the synthesis path uniformly from one end only. S e l e c t i o n by Relevance i s not used by any e x i s t i n g system which when used y i e l d s the powerful Planning Space. The search i n a planning space can take "leaps" along the synthesis sequence. c) I n d i c a t e that a combination of search i n both the s t a t e space and the planning space i s p o s s i b l e and that t h i s a f u n c t i o n of the search model employed. The second search model described i n the paper allows the search for synthesis to go forward, backward and to leap into the middle under c o n t r o l l e d c o n d i t i o n s . I t i s hoped that the advantages of t h i s way o f developing a system w i l l include: upgrading the r o l e of the chemist from one of r a t i n g , pruning and s e l e c t i n g subgoals or precursors to that of g i v i n g i n j u n c t i o n s to the system i n the form of r u l e s added, removed or modified as the search proceeds a few steps at a time; the i n t r o d u c t i o n o f the s t r a t e g i c and judgmental knowledge i n the program i n a manner that decoupled from the knowledge o f r e a c t i o n chemistry; and the a b i l i t y t o experiment e a s i l y with v a r i o u s models of search management that are q u a l i t a t i v e l y d i f f e r e n t from each other. The payoffs foreseen are of genuine importance to those of us concerned with the techniques o f a r t i f i c i a l i n t e l l i g e n c e and when proven p r a c t i c a b l e should be e x c i t i n g and c h a l l e n g i n g to chemists as w e l l . I t must be stated at the outset that these new ideas on search modelling are an outgrowth of my attempts to develop a system for common sense reasoning about human a c t i o n s using a p s y c h o l o g i c a l theory of a c t interpretation [Schmidt, 1976; Sridharan, 1975; 1976]. This system has c a p a b i l i t i e s
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRIDHARAN
Chemical
Synthesis
Planning
by
Computer
151
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
of s e l e c t i n g and applying transforms associated with act names, s t r u c t u r i n g them i n t o a plan and reasoning both forward and backward along the plan s t r u c t u r e . The strategy of act i n t e r p r e t a t i o n i s coded i n the form of r u l e s e t s . The framework adapted for t h i s work is called Meta D e s c r i p t i o n System (MDS) [ S r i n i v a s a n , 1973; 1976], designed and developed by S r i n i v a s a n . I s h a l l explore here i n d e t a i l the use of t h i s framework i n the management of search for chemical s y n t h e s i s . The connections between the p s y c h o l o g i c a l theory and synthesis planning s t r a t e g i e s w i l l be l e f t for l a t e r e x p o s i t i o n . There are three major conceptual approaches [See Sridharan, 1974 f o r a review] that have been taken on the issue of a computer mediated design of chemical synthesis plans and they share a c e n t r a l idea among them - that of searching a space of p o s s i b i l i t i e s i n a systematic manner using e m p i r i c a l knowledge as appropriate. The very power of these approaches stems from the s y s t e m a t i z a t i o n of search of the space of possibilities. For those approaching synthesis as fertile grounds for designing and b u i l d i n g computer programs that solve d i f f i c u l t i n t e l l e c t u a l problems [Sridharan, 1971, 1973, 1974; G e l e r n t e r , 1973, 1976] the main problems are twofold. F i r s t , the a c q u i r i n g and packaging of knowledge of the r e a c t i o n s i n a form s u i t a b l e for use w i t h i n a program has to be done c a r e f u l l y and the success of the program depends upon the correctness and extent of the knowledge base. Second, the techniques of conducting search with an incomplete, uncertain and possibly inconsistent knowledge base have to be customized to the task of chemical s y n t h e s i s . The interactive problem solving concepts developed [Corey, 1969; Wipke, 1973, 1976] are a t t r a c t i v e because of t h e i r thoroughness and the chemist user tends to approach the system with hopes that the d i s c i p l i n e d e x p l o r a t i o n of pathways w i l l ensure him that he has not overlooked any reasonable alternative.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
152
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
The t h i r d approach taken t o computer methods i n chemical synthesis i s one of f o r m a l i z i n g the set of p o s s i b l e r e a c t i o n s [Ugi, 1976], l i m i t i n g the use of e m p i r i c a l knowledge to the s e l e c t i o n r u l e s for these reactions. This method seeks out novel reaction schemes and can suggest to the chemist routes that could not be found by the other methods. However, a reasonable synthesis program i s yet to emerge from t h i s approach. The a l g e b r a i c c h a r a c t e r i z a t i o n of the search space however o f f e r s the p o t e n t i a l of h i g h l y i n t e r e s t i n g s t r u c t u r e s to be defined on the search space and might be very successful i n the long run. A l l of these methods c u r r e n t l y u t i l i z e a structure generally c a l l e d the H e u r i s t i c Method. THE
search Search
HEURISTIC SEARCH METHOD
The H e u r i s t i c Search Method [ N i l s s o n , 1971; Newell & Simon, 1972] i s a usual f i r s t approach t o problem s o l v i n g i f the s p e c i f i c a t i o n of the problem i t s e l f i s given p r e c i s e l y as a Goal S i t u a t i o n , and the s o l u t i o n required i s some sequence of Transforms i . e . Operators that can e f f e c t i v e l y transform the current s i t u a t i o n to the goal s i t u a t i o n . I t i s important that the allowable operators are f i n i t e and are w e l l - d e f i n e d . The problem s o l v i n g procedure involves the use of a Search Model that i s used i n guiding the search for a s o l u t i o n . The h e u r i s t i c nature of the procedure a r i s e s from the use of approximate methods of evaluating progress towards a s o l u t i o n and of assessing the merit and p o t e n t i a l of any p a r t i a l s o l u t i o n sequence. The user gives up in p r i n c i p l e the completeness and o p t i m a l i t y o f the search process gaining i n f a c t more frequent demonstrations o f s u c c e s s f u l problem s o l v i n g [Newell & Simon, 1972]. In chemical synthesis the goal s t a t e i s the target molecular s t r u c t u r e and the operators t o consider for c o n s t r u c t i n g a s o l u t i o n sequence are molecular r e a c t i o n s . The search for a synthesis plan proceeds backwards from the target molecular s t r u c t u r e by considering and applying r e a c t i o n s i n the r e t r o - s y n t h e t i c d i r e c t i o n . The process i s a s e r i a l one and has an "inner loop" that repeatedly asks
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical Synthesis Planning by Computer
153
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
i t s e l f the questions "Should the process stop now?" and "What next?". The process can be s u c c e s s f u l , i n t e r e s t i n g and powerful depending on the use of a proper Search Model to answer these questions. The answer to the question "What next?" comes i n two p a r t s : the choice of a molecular s t r u c t u r e that can be s e t up as a subgoal to complete the s y n t h e s i s , and the choice of the operator to t r y on that subgoal. The information a v a i l a b l e to the process i n a r r i v i n g at the answers i s of two kinds, both of which must be included i n the search model. The first kind comprises the catalog of s t a r t i n g m a t e r i a l s , the l i b r a r y of r e a c t i o n s , t e s t s of a p p l i c a b i l i t y of the r e a c t i o n s , the a p r i o r i merit r a t i n g f o r the goodness ( y i e l d , s p e c i f i c T t y etc.) o f the r e a c t i o n and includes i n general information that i s made a v a i l a b l e to the system p r i o r to the a c t u a l statement of the problem to solve. The nature ancf extent oT t h i s pr i o r information determines the s t r u c t u r e of the search possibilities. The other kind includes information that becomes a v a i l a b l e as the search proceeds and constitutes a r i c h body o f information that i s s p e c i f i c to the problem at hand. Only limited examples of the use of the second kind of information can be shown i n current programs, perhaps the c l e a r e s t one i s the placement of p r o t e c t i o n r e a c t i o n s f o r s e n s i t i v e f u n c t i o n a l groups. The manner i n which such problem s p e c i f i c Information i s c o l l e c t e d , organized and used t o guide search determines the character of the a c t u a l search performed for a given problem. THE PROBLEM SOLVING GRAPH MODEL.
AS
A
MINIMAL SEARCH
The Problem S o l v i n g Graph (PSG) [Gelernter, 1962] i s a g r a p h i c a l model of the dynamics of the search that i s conducted on a given problem, and c o n s i s t s of a root node representing the target molecule l i n k e d to a cascade of descendants which are one-level precursors to those at a l e v e l higher i n the PSG. The H e u r i s t i c Evaluation function assigns numerical weights (or i n some cases elements of an ad hoc s c a l e of d i s c r e t e values) t o each node i n the PSG. T y p i c a l l y , the node evaluation i s based not only on p r o p e r t i e s of the s i t u a t i o n designated by the node and
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
154
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Q
NODE
A N D - 0 R P R O B L E M SOLVING GRAPH MODEL Figure
1.
And-or
problem
solving
graph
model
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical
Synthesis
Planning
by
155
Computer
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
the operator a p p l i e d to generate the node, but a l s o on the e n t i r e path from the goal node to the node i n question. The PSG model i n combination with the h e u r i s t i c value assignments provides the search mechanism ready answers to the "What next?" question. The s t r u c t u r e of the PSG f o r the chemical synthesis problem, shown i n Figure 1, c o n s i s t s of AND-nodes when operators generate m u l t i p l e precursors a l l of which need to be made a v a i l a b l e f o r the r e a c t i o n to be s u c c e s s f u l l y executed, and OR-nodes designating the choice a v a i l a b l e among s e v e r a l operators a p p l i c a b l e to a node. The selection c r i t e r i a that are w e l l entrenched i n the Theory of H e u r i s t i c Search [ S l a g l e , 1971] f o r handling such AND-OR problem s o l v i n g graphs are: At an OR-node s e l e c t the most promising subgoal; At an AND-node s e l e c t subgoals s t a r t i n g from the most l i k e l y to f a i l to the l e a s t l i k e l y to f a i l . INFORMATION-GATHERING HEURISTIC SEARCH.
AS
COMPLEMENTARY
ACTIVITY
TO
The planning a c t i v i t y involves a v a r i e t y of d e c i s i o n s that r e q u i r e information not customarily included i n the i n i t i a l s p e c i f i c a t i o n s of à transform. Such d e c i s i o n s i n v o l v i n g reasoning about the search process beyond the information provided by the PSG model c a l l f o r a more elaborate search model and i s considered i n t h i s s e c t i o n . The PSG model may be viewed as a c o l l e c t i o n ready answers to the f o l l o w i n g set of questions: a) Does a node have subgoals? they?
How
many?
What
of are
b) What i s the s t a t u s of a node? Was a s u c c e s s f u l path found? Was i t t r i e d and f a i l e d on a l l paths? Are there descendant subgoals s t i l l open? c) Does the s i t u a t i o n ( i . e . molecule)
represented
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
in
COMPUTER-ASSISTED ORGANIC SYNTHESIS
156
this node occur elsewhere i n the PSG? Is the s i t u a t i o n c i r c u l a r i . e . c a l l i n g for s y n t h e s i z i n g X to synthesize X higher up? d) At an OR-node what i s the best subgoal to take up? At an AND-node what i s the precursor that should be tackled f i r s t ?
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
Now consider the f o l l o w i n g set of questions that go beyond the information maintained by the PSG model: a) When there i s an operator whose relevance to synthesizing a molecule is clear, but the a p p l i c a b i l i t y c o n d i t i o n s are not s a t i s f i e d , under what set of circumstances should the unsatisfied preconditions be made into subgoals? Under what c o n d i t i o n s should the operator be r e j e c t e d altogether? b) For a given molecule what i s the most s t r a t e g i c sequence for the i n t r o d u c t i o n of f u n c t i o n a l groups? c) When i s the sequence i n t r o d u c t i o n immaterial?
of
functional
group
d) Given a subgoal, can the paths explored for another s t r u c t u r a l l y homologous s t r u c t u r e be considered v a l i d here? e) Given a synthesis route (or partial route) involving a protection/unprotection reaction p a i r , should an attempt be made to d e r i v e a r e v i s e d route not i n v o l v i n g p r o t e c t i o n by resequencing some of the reactions? Answering these questions c a l l s for maintaining a r i c h e r and b e t t e r organized base of information about the search paths than that provided by the PSG and i t s h e u r i s t i c value assignments to the nodes. As an a l t e r n a t i v e to the Heuristic Search process, consider the f o l l o w i n g two-stage process: a) Perform some e x p l o r a t i o n i n the search space (be i t the State Space of the H e u r i s t i c Search described above, or the Planning Space to be discussed below). b) Gather the search information, analyze, amplify and
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical
Synthesis
Planning
by
Computer
157
use i t to guide f u r t h e r e x p l o r a t i o n .
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
A great deal of f l e x i b i l i t y and i n v e s t i g a t i v e power comes to us i f we separate the t o t a l system i n t o two components g i v i n g explicit charge of the e x p l o r a t i o n by search to one, and the a n a l y s i s and a s s i m i l a t i o n o f the search information to the other. We gain conceptual c l a r i t y i n t h i n k i n g about the r u l e s for search guidance and set about designing novel Search Models with a new ease and v i g o r . I w i l l describe b r i e f l y the information gathering system which has been developed a t Rutgers and show by example a novel form of search model i n the remaining sections of the paper. ΜΕΤΑ-DESCRIPTION SYSTEM: A PARADIGM FOR INFORMATION-GATHERING The s t r u c t u r e of a system described i n the f a c i l i t y of MDS [ S r i n i v a s a n , 1973 & 1976; Sridharan, 1975] concepts i s radically d i f f e r e n t from the procedure based systems to which we are accustomed. I t i s more f a v o r a b l e , t h e r e f o r e , to introduce the system d i r e c t l y by an example. Table I presents the STRUCTURAL DESCRIPTIONS of the c l a s s e s of e n t i t i e s that are involved i n b u i l d i n g the PSG model. The c l a s s PSGRAPH designates the Problem S o l v i n g Graph whose ELEMENTS are NODES. There are two important c l a s s e s that help structure c o l l e c t i o n s of NODES i n t o OR-NODES and AND-NODES. The l a t t e r three c l a s s e s have MERIT and STATUS r e l a t i o n s associated with them, shown i n the Table r e l a t i n g these nodes to INTEGER values for MERIT, and a c l a s s c a l l e d STATUS f o r the STATUS r e l a t i o n . There are three values of STATUS defined as constants v i z . , OPEN, FAILED and SUCCEEDED. The PSGRAPH has a GOAL which i s a NODE and a c o l l e c t i o n of open nodes and f a i l e d nodes. The TRIALNODE designates the node to take up as subgoal when the search process i s set to explore the space again. The PSGRAPH has a r e l a t i o n STATUS which i s intended to i n d i c a t e the c o n d i t i o n s under which the process should terminate. NODES are r e l a t e d t o the OR-NODEs v i a the SUBGOAL r e l a t i o n , the OR-NODEs i n d i c a t e CHOICES of AND-NODEs and the AND-NODES i n turn INCLUDE any number of NODES
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC
158 (*
TABLE
(* (CONSTANTS (CONSTANTS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
(*
I
*)
CONSTANTS
OF T H E DOMAIN
(YESNO (YES NO))) (STATUS (SUCCEEDED
STRUCTURAL
PSG
FAILED
DESCRIPTIONS
*)
OPEN)))
*)
(TON:
[PSGRAPH
(element N O D E elementof) (goal N O D E goalof) (opennodes NODE opennodeof) (failednodes NODE failednodeof) (status STATUS statusof) (trynode NODE|AND-NODEIOR-NODE trynodeof)])
(TDN:
[NODE (subgoal OR-NODE subgoalof) (subnode NODE subnodeof) (descendant NODE descendantof) (status STATUS) ( s i t u a t i o n MOLECULE) (merit INTEGER meritof) (repeated NODE repeatedby) ( c i r c u l a r NODE)])
(TDN:
[OR-NODE (subgoal AND-NODE) (status STATUS) (merit INTEGER)])
(TDN:
[AND-NODE (subgoal NODE) (status STATUS) (merit INTEGER)])
(TDN:
[MOLECULE ( s t r u c t u r e CHEMICAL-GRAPH) ( a v a i l a b l e YESNO)]) (* SENSE DEFINITIONS *)
(QSCC: [((NODE Ν) | (X elem Ν) (N status OPEN)) PSGRAPH opennodes]) (* Flags s p e c i f i a b l e on the r e l a t i o n s have been l e f t out f o r s i m p l i c i t y *)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
SYNTHESIS
7.
Chemical
SRIDHARAN
Synthesis
Planning
by
Computer
Table I. Continued I
(QSCC:
[ ( ( N O D E N) PSGRAPH
(P elem Ν )
(N s t a t u s
FAILED))
(QSCC:
[ ( ( S T A T U S S) I (X ( g o a l s t a t u s ) S ) )
failednodes] ) PSGRAPH
status]) (QSCC:
[((STATUS
S) I
((X
(subgoal s t a t u s ) SUCCEEDED) => (X s t a t u s SUCCEEDED)) ( ( ( N O T [X (subgoal s t a t u s ) F A I L E D ] ) AND
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
(NOT
[X S t a t u s S U C C E E D E D ] ) ) (X S t a t u s OPEN))
=>
( ( ( A L L NODE N)(X subgroup Ν) (N s t a t u s
FAILED))
(X S t a t u s F A I L E D ) ) ) OR-NODE
status]) (QSCC:
[((STATUS
S) I
((X
(subgoal s t a t u s ) F A I L E D ) => (X s t a t u s F A I L E D ) ) (((X (subgoal s t a t u s ) OPEN) (NOT [X S t a t u s F A I L E D ] ) ) => (X s t a t u s OPEN)) (((NOT
(NOT
[X S t a t u s
FAILED])
[X S t a t u s O P E N ] ) ) => (X S t a t u s S U C C E E D E D ) ) )
AND-NODE
status]) (QSCC:
[ ( ( N O D E A) I
(A (subgoal
subgoal
subgoal)
X
NODE
subgoalof]) (QSCC:
[ ( ( N O D E A) |
(X s u b g o a l o f A) OR
(X (descendantof s u b g o a l o f ) A ) ) NODE
descendantof]) (QSCC:
[((STATUS
((X
S) I
( s i t u a t i o n a v a i l a b l e ) YES)
=>
(X s t a t u s SUCCEEDED)) ((X (subgoal s t a t u s ) SUCCEEDED) => (X S t a t u s SUCCEEDED)) ((X c i r c u l a r Y) => (X s t a t u s F A I L E D ) ) ) NODE
status]) (QSCC:
[ ( ( N O D E R) I
(X ( s i t u a t i o n
s i t u a t i o n o f ) R))
NODE
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
160 Table I.
ORGANIC SYNTHESIS
Continued
repeated]) C) I (X repeated C) (X descendantof C))
(QSCC:
[((NODE NODE
(QSCC:
[ ( ( I N T E G E R I)
circular]) |
(X (subgoal merit) I ) (NOT [X (subgoal merit >=) I ] ) )
OR-NODE
merit]) (QSCC: [ ( ( I N T E G E R I ) | (X (subgoal merit) I ) (NOT [ I (>= meritof subgoalof) X])) Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
AND-NODE
merit]) (* PRODUCTION RULES GUIDING SEARCH *) [INITIALIZE (IT (PSGRAPH P)) (INPUT (P goal G)) (IR (P goal G)) (IR (P opennodes G))] (G s t a t u s SUCCEEDED) => (OUTPUT G)(HALT) (G Status FAILED) => (OUTPUT G)(HALT) (G element Χ)(X c i r c u l a r Y) => (ASSERT (X Status FAILED)) (NOT [G trynode X]) => (ASSERT (G trynode (G g o a l ) ) ) (G trynode X)(NOT [X subgoal Y]) => (ASSERT (G trynode NIL))(SPROUT X)(EVALUATE X) (G trynode Χ) (X subgoal Y) (X (merit meritof) Y) => (ASSERT (G trynode Y)) (* End of Table I *)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
thus c l e a r l y graph.
Synthesis
exhibiting
Planning
by
Computer
the AND/OR
161
nature
of the
Let us turn our a t t e n t i o n b r i e f l y to the SENSE DEFINITIONS which are s p e c i f i c a t i o n s of the L o g i c a l c o n d i t i o n s that are to be met f o r a s s e r t i n g the various relations and are at the same time s p e c i f i c a t i o n s of the computations to be performed i f the system i s given the r e s p o n s i b i l i t y to f i l l i n values for c e r t a i n r e l a t i o n s . Simple d e f i n i t i o n s are given f o r the OPENNODES and FAILEDNODES r e l a t i o n s of the PSGRAPH c l a s s . The OPENNODES are the set of a l l NODES Ν which are ELEMENTS of X (denoting the PSGRAPH) whose STATUS i s OPEN. The STATUS of the PSGRAPH i s defined to be the status of the goal node of the PSGRAPH w r i t t e n [(STATUS S) I (X (goal status) S ) ] . The nature of the AND-NODES and the OR-NODES i s c l e a r l y s p e l l e d out i n the d e f i n i t i o n s for the STATUS r e l a t i o n s on these c l a s s e s . The d e f i n i t i o n f o r the status of the AND-NODE may be paraphrased i n t o E n g l i s h as f o l l o w s : a) I f X includes a node whose status then the status of X i s also FAILED; b) I f X status i s not FAILED and X OPEN node then the status of X i s OPEN;
i s FAILED
includes
f i n a l l y , c) I f X i s neither OPEN nor FAILED i t must be SUCCEEDED.
an
then
The information displayed i n Table I i s the description the user provides to the Information-gathering system as the s p e c i f i c a t i o n of the Search Model. The system accepts the S t r u c t u r a l D e s c r i p t i o n s and sets up Data Structures and access functions f o r each of the r e l a t i o n s . The Sense D e f i n i t i o n s are analyzed t o compile a Network of Information Flow [Sridharan, 1976] that p r e s c r i b e s the data flow paths when a new piece of information i s made a v a i l a b l e to t h i s system. This much could be termed the "compile-time" a c t i v i t y of the system.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
162
(* TABLE I I *) (* STRUCTURAL DESCRIPTIONS *) (TDN: [SEARCHGRAPH (* This i s the search graph of FLEXI) (elements NODE) (goal NODE) (trynode RNODEISNODEIDNODEIFNODE)])
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
(TDN: [SNODE (* State Space Structure (Backward Search)) (node NODE snode) (subgoal OR-NODE) (merit MERIT) (status STATUS) (subgoal NODE) ( c i r c u l a r SNODE) (features FEATURE) (descendant SNODE)]) (TDN: [RNODE (* Planning Space Structure) (node NODE mode) (redgoal RNODE) ( d i f g o a l DNODE) (merit INTEGER meritof) (status STATUS) (relevantfeatures FEATURE)]) (TDN: [DNODE (* State Space Structure (Forward Search)) (node NODE dnode) (tonode NODE) (fromnode NODE) ( d i f f e r e n c e s FEATURE) (merit MERIT) (status STATUS)])
(TDN: [FNODE (* Non-Goal-Directed Forward Search) (node NODE fnode)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRIDHARAN
Table II.
Chemical
Synthesis
Planning
by
Computer
163
Continued
(transform TRANSFORM appliedto) (reactivegroup FEATURE) (canproduce FNODE canbeproducedfrom) (merit MERIT meritof) (status STATUS)]) (TDN: [TRANSFORM
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
(* One i s needed f o r every growth) (appliedto FNODE transform) (product FNODE) (reagents MOLECULE)]) (* SAMPLE PRODUCTION RULES *) (ADDED (SNODE S)) => (ASSERT ((S node) fnode NIL)) (* No forward e x p l o r a t i o n i f S was given as a r e t r o s y n t h e t i c goal) (ADD (FNODE D))=> (FILLIN (INSTANTIATE (DNODE (fromnode &(N node)) (tonode &(N (canbeproducedfrom node dnode tonode)))))) (* I f Ν was generated by forward e x p l o r a t i o n t r y a s s e r t i n g a DNODE subgoal i f the information needed i s there) (* SENSE DEFINITIONS *) (QSCC: [((DNODE D) | (D tonode (X goalnode))) RNODE difgoal]) (QSCC: [((DNODE D) | ((X (node status) SUCCEEDED) => (X status SUCCEEDED)) (((X ( d i f g o a l status) SUCCEEDED) (X (redgoal status) SUCCEEDED)) => (X status SUCCEEDED)) (((X ( d i f g o a l status) FAILED)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
164
COMPUTER-ASSISTED ORGANIC SYNTHESIS
OR (X (redgoal status) FAILED)) => (X status FAILED))) RNODE status])
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
(QSCC: [((NODE Y) | (X (difnodeof goalnode) Y)) DNODE tonode]) (QSCC: [((STATUS S) | ((X (fromnode canproduce* node) (X tonode)) => (X Status SUCCEEDED)) ( (X (goalnode goalnode descendant node) (X fromnode)) => (X Status SUCCEEDED))) DNODE status] ) (QSCC: [((STATUS S) | ( (X (node status) SUCCEEDED) => (X status SUCCEEDED)) ((X (subgoal status) SUCCEEDED) => (X Status SUCCEEDED))) SNODE status] ) (QSCC: [((STATUS S) | (((X (mode status) SUCCEEDED) OR (X (snode status) SUCCEEDED) OR (X ( s i t u a t i o n a v a i l a b l e ) YES)) => (X s t a t u s SUCCEEDED) ) ) NODE status])
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical
Synthesis
Planning
by
Computer
165
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
The planning and problem s o l v i n g i s i n i t i a t e d and c o n t r o l l e d by a s e t of r u l e s that we s h a l l examine p r e s e n t l y . The i n i t i a l i z a t i o n i s s t r a i g h t f o r w a r d and involves c r e a t i n g an instance of PSGRAPH and f i l l i n g i t s goal node. This i n d i c a t e s that i t i s the s p e c i f i c a t i o n of the goal that t r i g g e r s the process. The s p e c i f i c a t i o n of the goal node involves submitting the s t r u c t u r e of the molecule to be synthesized. Turning our a t t e n t i o n away from the Rule s e t that c o n t r o l s the problem s o l v i n g process, l e t us consider the information gathering a c t i v i t y caused by the a d d i t i o n of a subgoal node f o r some operator explored by the search component of the system. This may be s p e c i f i e d as a conjunction of precursor molecules which are INCLUDED i n an AND-NODE. Consider the a c t i o n taken when one of the precursors i s asserted as a SUBGOAL of the GOAL node. The check of the c o n d i t i o n f o r the (NODE subgoalof NODE) r e l a t i o n i n d i c a t e s that t h i s AND-NODE needs to be introduced as a choice i n the OR-NODE pointed to by the GOAL node [ t h i s i s i n d i c a t e d by the expression (A (subgoal choice includes) X ) ] . The consequent a s s e r t i o n of the CHOICE r e l a t i o n flows along i t s data flow path to the STATUS relation of the OR-NODE i n question* It is appropriate to point out here that t h i s data flow l i n k was "compiled" when the d e f i n i t i o n of STATUS was scanned and i t was e s t a b l i s h e d then that any additions/changes to the CHOICE of an OR-NODE was to take e f f e c t i n turn on the STATUS r e l a t i o n . A v e r b a l d e s c r i p t i o n such as t h i s one cannot describe a l l the data flow that takes place but h o p e f u l l y the above explanation conveys the concept of the data flow and consequent " informations-gather ing" proceeding as per the s t r u c t u r a l and sense d e f i n i t i o n s of the Search Model given by the user f o r t h i s problem domain. At t h i s p o i n t , i f the reader w i l l grant that as the r e s u l t s o f the e x p l o r a t i o n conducted by the search component gets fed t o the Information-Gathering component the r e q u i s i t e search model w i l l be created or updated as appropriate, we can turn our a t t e n t i o n back again to the Search Guidance Rules. The Rules are w r i t t e n i n what i s known i n the computer science l i n g o as the "Production Rule Form" [Davis & King, 1975].
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
166
COMPUTER-ASSISTED ORGANIC SYNTHESIS
The production system takes a sequence c o n d i t i o n a l a c t i o n s p e c i f i c a t i o n s of the form:
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
RULE :
of
(Condition to check)=>(Action to take)
In applying one of these r u l e s the l e f t - h a n d side of the Rule i s f i r s t tested against the current s t a t e of the model and i f the t e s t i s s a t i s f i e d the a c t i o n s of the r i g h t side of the r u l e are performed. There are a v a r i e t y of r u l e sequencing methods conceivable, but we s h a l l use only the simplest of them here. The c o n t r o l s t a r t s from the beginning of the r u l e sequence and t r i e s each r u l e and c y c l e s back to the f i r s t r u l e a f t e r the l a s t r u l e i s t r i e d . The execution of a (HALT) i n some a c t i o n component of a r u l e terminates the e n t i r e process. The r u l e set for the PSG model i s very simple. I n i t i a l l y , the status of the PSGRAPH i s examined to see i f the process should terminate. The r u l e s w r i t t e n here s p e c i f y that i f the status of the PSGRAPH i s FAILED or SUCCEEDED then the graph i s output and the process h a l t s . Otherwise, I f there i s a node marked as a p o s s i b l e node to sprout, the goal node i s picked f i r s t . On the other hand, the presence of a trynode which has no subgoals i n d i c a t e s that the s e l e c t i o n i s completed and the a c t i o n to take i s to SPROUT the trynode and EVALUATE it. The responsibility of SPROUT is to choose a transformation, t e s t i t s a p p l i c a t b i l i t y and give the set of precursors to Modelling System. The Modelling System then updates the model and posts the tree h i e r a r c h y , the c i r c u l a r i t y and s t a t u s r e l a t i o n s . The evaluation a l s o submits i t s merit r a t i n g of the nodes to the Modelling System which i n t u r n , reassigns the merits of the a f f e c t e d AND-NODES and OR-NODES. The i m p l i c a t i o n s of the new information so entered are followed by the Modelling System and the information needed by the r u l e set i s provided i n a ready form. The control repeatedly flows in a SELECT-SPROUT-EVALUATE loop u n t i l h a l t e d . The r u l e set i s f l e x i b l e and can be changed i f a s p e c i f i c synthesis problem suggests a d i f f e r e n t form of c o n t r o l . I t i s not d i f f i c u l t to enter s y n t a c t i c guidance based on the model graph using the distance of a node from the g o a l , the number of conjuncts at an
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical Synthesis Phnning by Computer
167
AND-NODE e t c . . I t i s also p o s s i b l e to guide the search based on chemical information, f o r example, to disregard subgoals involving a seven-membered heteratomic r i n g . I t i s conceivable that when the system runs i n t e r a c t i v e l y the guidance w i l l be changed and experimented with as the search proceeds.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
SEARCH IN A PLANNING SPACE The s t r u c t u r e of the search space i s determined not only by the c o l l e c t i o n of transforms a v a i l a b l e to the system but also by the r u l e s f o r s e l e c t i n g the transforms to be t r i e d f o r any subgoal. There are two basic ways f o r s e l e c t i n g transforms. a) S e l e c t i o n by A p p l i c a b i l i t y . I f the transforms are selected because they guarantee that the target molecular s t r u c t u r e w i l l be produced upon their a p p l i c a t i o n , the required precursors become subgoals. This i s the method of s e l e c t i n g transforms by t h e i r a p p l i c a b i l i t y i n the r e t r o s y n t h e t i c d i r e c t i o n . The synthesis sequence grows a step at a time ensuring that the target molecular s t r u c t u r e w i l l be a product of the r e a c t i o n sequence developed so f a r , once the precursors are made available. The space of p o s s i b i l i t i e s determined by this criterion of a p p l i c a b i l i t y i s termed the STATE SPACE. b
) S e l e c t i o n by Relevance. In some cases a transform i s selected because i t produces a molecule only s i m i l a r to the target molecular s t r u c t u r e but not e x a c t l y the same. When such a transform i s applied to a target s t r u c t u r e T, i t may synthesize some s t r u c t u r e S that i s s i m i l a r to T, from a set of precursors P. This breaks the o r i g i n a l problem of synthesis i n t o two subproblems, i) Synthesize P. S(P) i i ) Transform S==>T. TR(S,T) The transform selected s p e c i f i e s a r e a c t i o n that converts Ρ to S, and t h i s transform w i l l , i n general, c o n s t i t u t e an intermediate step i n the synthesis sequence. The space of p o s s i b i l i t i e s determined by the c r i t e r i o n of relevance i s termed the PLANNING SPACE and i n several s i t u a t i o n s the search toward a
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
168
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
Scheme I.
Reaction used in
transform
a.
TRANSFORM
SYNTHESIZE
b.
TRANSFORM
SYNTHESIZE Scheme II.
Pfonning
space
maneuvers
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical
Synthesis
Planning
by
Computer
169
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
s o l u t i o n can be b r i e f e r i n t h i s space than i n the State Space. This d e f i n i t i o n of a Planning Space i s a v a r i a t i o n of the concept introduced i n the General Problem Solver system (GPS) [Ernst, 1969]. The d i s t i n c t i o n between the S e l e c t i o n of transforms by A p p l i c a b i l i t y and by Relevance i s an important one when considering the s t r a t e g i e s one might employ to search f o r synthesis sequences. The search i n a Planning Space has the c h a r a c t e r i s t i c that the search "leaps" i n t o some intermediate point i n the synthesis sequence and e s t a b l i s h e s an " i s l a n d " and the s o l u t i o n search could then proceed from the i s l a n d to the target molecule i n the forward d i r e c t i o n or from the i s l a n d backward i n the r e t r o s y n t h e t i c d i r e c t i o n toward a v a i l a b l e molecules. The s i g n i f i c a n c e of t h i s a b i l i t y to leap has been explored i n other task areas than synthesis search and has been found to be a powerful t o o l i n converging on s o l u t i o n s r a p i d l y . I t s u t i l i t y f o r synthesis search remains to be shown and for now can be i l l u s t r a t e d only i n terms of examples. The f o l l o w i n g sketch of an example i s o f f e r e d to i l l u s t r a t e the idea of planning space. The example has not been checked by any chemist and thus i t s chemical correctness cannot be assured. Wipke [Wipke, 1976] uses the example of a r e a c t i o n that synthesizes an a l c o h o l group i n 1,4 r e l a t i o n to an e l e c t r o n withdrawing group, say C=0, by the opening of epoxide by a s t a b i l i z e d i o n . Consider the target s t r u c t u r e shown i h Scheme I . The c r i t e r i o n of a p p l i c a b i l i t y would require that the target s t r u c t u r e contain the -C(OH)-C-COsubstructure and would not be a p p l i c a b l e i n the State Space search. In search conducted i n the Planning Space, i f the transform i n d i c a t e d i s considered relevant to the target s t r u c t u r e (e.g., i f the presence of the a l c o h o l and an unsubstituted 1,4 carbon i s s u f f i c i e n t ) then t h i s transform may be used. The o r i g i n a l synthesis problem i s replaced with two subproblems shown i n Scheme I I .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
170
The above example could be s u c c e s s f u l l y completed by working forward reducing the d i f f e r e n c e between S and T, and working backward i n s y n t h e s i z i n g P. COMBINING SEARCH IN PLANNING AND STATE SPACE The search i n a planning space should be conducted by t a k i n g repeatedly the synthesize S type problem for each Ρ s p l i t t i n g i t each time i n t o a 'Synthesize and a 'Transform' type problem, d e f e r r i n g a l l the Transform TR type problems t i l l some a v a i l a b l e molecular s t r u c t u r e i s reached along a path. This w i l l generate a S k e l e t a l Plan where many of the intermediate steps are l a c k i n g i n d e t a i l but each one i s given as a TR type problem. I f an e v a l u a t i o n function can designed for these s k e l e t a l plans much useless search can be avoided by using the planning space.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
1
The search i n the s t a t e space i s conservative and takes small steps attempting to make steady progress towards completing a set of s o l u t i o n s . This can cause e i t h e r aimless wandering because small changes i n the merit values assigned to subgoals cause no s i g n i f i c a n t s h i f t s of a t t e n t i o n or because the changes i n the merit values cause l a r g e abrupt changes i n behaviour. This has been given the graphic name of the Mesa Phenomenon by Minsky [Minsky, 1963]. The planning space s t r u c t u r e s the solution sequendes q u i t e d i f f e r e n t l y and causes a more g o a l - d i r e c t e d search to proceed. The method of transform s e l e c t i o n allows the program to "leap" i n t o the s o l u t i o n sequence and decide upon one of the intermediate r e a c t i o n s and permits the s o l u t i o n to grow i n both d i r e c t i o n s . It appears that a j u d i c i o u s combination of both the State Space and Planning Space search methods might be able to overcome some of the d i f f i c u l t i e s found i n each of the methods [Amarel, 1969] With these c o n s i d e r a t i o n s i n view the next s e c t i o n introduces a framework i n which to combine the two spaces, l e t t i n g the chemist user supply the search guidance r u l e s customized to the p a r t i c u l a r problem at hand. FLEXI:
A FLEXIBLE ADVANCED SEARCH MODEL
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
7.
SRiDHARAN
Chemical
Synthesis
Planning
by
Computer
171
Given a c o l l e c t i o n of nodes designating molecules there are the f o l l o w i n g types of tasks one can generate (Table I I ) : a) For a given node whose molecule i s not yet synthesized, develop a synthesis by working backwards in the s t a t e space using a p p l i c a b l e transforms. b) For a given node whose molecule i s not yet synthesized, develop a synthesis by problem reduction using relevant transforms. c) For a given ordered p a i r of nodes develop a synthesis route that transforms one molecule to the other. d) For a given node execute a b r i e f non-goal-directed exploration forwards using reactions i n their conventional d i r e c t i o n s . A search model i s introduced here, c a l l e d FLEXI, in which four types of s t r u c t u r e s are used to symbolize the above four categories of tasks and these four nodes form the BUILDING BLOCKS of the search management model. Figure 2 shows that a NODE designates a molecule and i n d i v i d u a l l y can be set up as an SNODE for searching r e t r o s y n t h e t i c sequence, or as an RNODE f o r search for a planning route. The FNODE i s used t o s e t up the node f o r forward e x p l o r a t i o n without a goal guidance. The sprouting of an SNODE generates a piece of the f a m i l i a r AND/OR problem s o l v i n g graph and the status r e l a t i o n s on SNODE, OR-NODE and AND-NODE are posted s i m i l a r to that given e a r l i e r . The sprouting of an RNODE R l , see Figure 3, c o n s t i t u t e s a step i n the planning space and generates two tasks by problem reduction - an RNODE R2 and a DNODE D l . R2 c a l l s for the synthesis of a molecule by further problem reduction and the DNODE sets up a problem of transforming one molecule i n t o another. The transform used i n the sprouting of R l i s used to e s t a b l i s h that i t canproduce the fromnode N3 of the DNODE from the node N2 of R2. This i s e x h i b i t e d i n Figure 3. The new task set up i n the RNODE could of course be r e s t r u c t u r e d as an SNODE task by some r u l e i n the production system. The RNODE w i l l be considered successful as soon as the NODE connected t o i t has
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
172
COMPUTER-ASSISTED ORGANIC
SYNTHESIS
FNODE
Ο
fnode situation
NODE
MOLECULE
RNODE
SNODE
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
Figure
2.
NODE
building
block
RNODE Rl node
redgoah
\difgoal
(RNODE R2 (
)DNODE D1
-K
) NODE N1 fnode
\snode
Ç)FNODE F3
(
)SNODE S2
SNODE SI
Figure
3.
The RNODE
building,
block
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical Synthesis Planning by Computer
173
SUCCEEDED, whether by i t s RNODE or the SNODE tasks or even by i t s designating a molecule which i s a v a i l a b l e i n the Catalog of S t a r t i n g Compounds.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
Under c e r t a i n circumstances a synthesis f o r the fromnode N3 of the DNODE could be attempted independently and i t s success w i l l d i s a l l o w the RNODE R2 from further c o n s i d e r a t i o n during search. In that case, the success of the DNODE i s s u f f i c i e n t to guarantee the success of RNODE R l and thereby of NODE Nl. The DNODE can succeed e i t h e r when a path i s found by working forwards from FNODE F2 t o F3 or by working backwards from SNODE S2 t o SI. Figure 4 i l l u s t r a t e s the canproduce relation among p a i r s of FNODES. Working forwards from F l i f a molecule M2 r e s u l t s from a transformation i n v o l v i n g Ml then the corresponding FNODE F l can produce F2. The s t r u c t u r e s e x h i b i t e d here are only the b u i l d i n g blocks of the search model. By s u i t a b l y c o n t r o l l i n g and guiding the e x p l o r a t i o n , the search can take on great v a r i e t y t r a v e r s i n g the s t a t e space forwards or backwards or t r a v e r s i n g the planning space. The r u l e s e t can be made to contain broad i n j u n c t i o n s such as "Do not explore a node i n the forward d i r e c t i o n i f i t was created by i n s t a n t i a t i n g a SNODE, i . e . a node to proceed i n the r e t r o s y n t h e t i c d i r e c t i o n " ' or "When you add an FNODE immediately i n s t a n t i a t e a revised DNODE type task" as shown i n Figure 5. Of course, these two r u l e s are used here only as examples and i n given s i t u a t i o n s one might wish to advise the system otherwise. The e s s e n t i a l point i s that a f l e x i b l e form of search guidance s p e c i f i c a t i o n i s a v a i l a b l e and can be used t o bring t o bear on a given problem a wide v a r i e t y of h i n t s , suggestions and advice that would be d i f f i c u l t i n a standard H e u r i s t i c Search program. Overcoming some d i f f i c u l t i e s of H e u r i s t i c Search. One cause of the Mesa Phenomenon i n the case of chemical synthesis i s the use of f u n c t i o n a l group substitution reactions while working i n the
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
Ο
node
ORGANIC SYNTHESIS
FNODE F2 product
\
canproduce transform
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
FNODE Fl Figure
4.
a p p l i e d
FNODE
t 0
building
T
R
A
N
S
F
0
R
M
block
( A D D E D ( S N O D E S)) => ( A S S E R T ((S node) fnode N I L ) ) (* No forward exploration if S was given as a retrosynthetic goal) (ADDED (FNODE D)) => (FILLIN (INSTANTIATE ( D N O D E (fromnode &(N node)) (tonode & ( N (canbeproducedfrom node dnode tonode ) ) ) ) ) ) (* If Ν was generated by forward exploration try asserting a D N O D E subgoal if the information needed is there)
Figure
5.
Sample guidance
rules
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
7.
SRiDHARAN
Chemical
Synthesis
Vlanning
by
Computer
175
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
retrosynthetic direction. I f a molecule containing several f u n c t i o n a l groups i s selected f o r sprouting then performing f u n c t i o n a l group s u b s t i t u t i o n s often yields precursors that have nearly the same merit as the target molecule. The presence of several f u n c t i o n a l groups only aggravates the s i t u a t i o n . Within FLEXI t h i s c l a s s of r e a c t i o n s could be used mainly f o r transforming a molecule to another when they are s t r u c t u r a l l y s i m i l a r , i . e . a DNODE type task which might be c a r r i e d out most favorably using f u n c t i o n a l group s u b s t i t u t i o n s . Thus, avoiding the use of these reactions i n the r e t r o s y n t h e t i c d i r e c t i o n should prevent the problem. As further s p e c i f i c problems are i s o l a t e d and solved, the framework of FLEXI may help us to submit the proper r u l e s of search guidance to the system. CONCLUSION The H e u r i s t i c Search method i s b a s i c a l l y very simple. I t involves a s e r i a l processor working backwards that s e l e c t s subgoals and transforms by asking "What next?". A good implementation of the h e u r i s t i c method includes a SEARCH MODEL which i s a symbolic representation of the progress of search. The Problem Solving Graph that i s commonly used to guide search i s a u s e f u l but l i m i t e d search model. Symbolization i s the key to Reasoning and the computer can reason only about things i t can handle s y m b o l i c a l l y . Furthermore, the richness of the search model c o n t r i b u t e s to the conduct of an i n t e l l i g e n t search. In t h i s paper a search modelling system i s described by examples. This system allows the user to describe rather than to program the search model and to associate c o n s t r a i n t s that govern the growth of the model. The system provides a Rule Language based on the user described search model and the user may then p r e s c r i b e r u l e s i n the form of Production Rules. The r u l e s the user w r i t e s can be general, being v a l i d over a wide v a r i e t y of task s p e c i f i c a t i o n s , or can be b i t s of advice and h i n t s about a given problem. The paper concludes by showing how some of the standard d i f f i c u l t i e s with h e u r i s t i c search such as g e t t i n g locked into a plateau can be overcome by s u i t a b l e
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
176
techniques o f s e a r c h m o d e l l i n g . The m o d e l l i n g i d e a s shown h e r e c a n be used t o c o n t r o l and specify p r o t e c t i o n r e a c t i o n s and f o r sequencing f u n c t i o n a l group i n t r o d u c t i o n . I t i s also possible to carry out different s t y l e s o f e x p l o r a t i o n v a r y i n g t h e emphasis on t h e d i r e c t i o n a l i t y and space i n w h i c h t h e s e a r c h i s conducted. The applicability o f t h e model t o complement a h e u r i s t i c search process i s now made c l e a r , however i t s a c t u a l use i n c h e m i c a l s y n t h e s i s planning awaits the p a r t i c i p a t i o n of a chemist collaborator !
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
ACKNOWLEDGMENT I wish t o thank Prof. Srinivasan for h i s stimulating discussions of h i s modelling ideas r e p r e s e n t e d i n MDS w i t h o u t which my u n d e r s t a n d i n g o f MDS would be meager. A s u b s e t o f MDS adequate t o d e a l w i t h PSG and FLEXI i s b e i n g programmed and made a v a i l a b l e b u t so f a r no s y n t h e s i s problems have been t r i e d o u t i n t h e s e frameworks. The p r e s e n t system i s implemented i n t h e language FUZZY [ L e F a i v r e , 1974]. I look forward t o the continued cooperation of Prof. L e F a i v r e i n t h e f u t u r e and h i s h e l p i n t h e p a s t i s hereby g r a t e f u l l y acknowledged. I w i s h t o thank P r o f . S a u l Amarel f o r h i s e x p e r t a d v i c e upon r e a d i n g t h i s paper.
REFERENCES 1.
C h a r l e s Schmidt [1976] U n d e r s t a n d i n g Human Action: R e c o g n i z i n g t h e P l a n s and M o t i v e s o f Other P e r s o n s , i n Cognition and Social B e h a v i o r , J. Carroll & J. Payne (editors), Lawrence Earlbaum P r e s s , 1976.
2.
N.S. S r i d h a r a n [1975] The Architecture o f BELIEVER: A System for Intepreting Human Actions. T e c h n i c a l Report RUCBM-TR-46, Department o f Computer S c i e n c e , University, New Brunswick N J .
3.
Rutgers
N.S. S r i d h a r a n [1976] The Architecture of BELIEVER-Part II. The Frame and Focus Problems in AI. T e c h n i c a l Report RUCBM-TR-47, Department o f Computer S c i e n c e , R u t g e r s University, New Brunswick N J .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
7. SRIDHARAN
Chemical Synthesis Planning by Computer
177
4.
Chitoor Srinivasan [1973] The Architecture o f a Coherent I n f o r m a t i o n System. Advanced P a p e r s o f t h e T h i r d International Joint C o n f e r e n c e on Artificial Intelligence, S t a n f o r d , 1973.
5.
Chitoor Srinivasan [1976] An I n t r o d u c t i o n t o t h e M e t a - D e s c r i p t i o n System. T e c h n i c a l Report SOSAP-TR-18, Department o f Computer Science, Rutgers University, New Brunswick NJ.
6.
N.S. S r i d h a r a n [1974] A Heuristic Program t o D i s c o v e r S y n t h e s e s f o r Complex O r g a n i c M o l e c u l e s . P r o c e e d i n g s o f the IFIP74 (International Federation for Information Processing) C o n g r e s s , S t o c k h o l m , August 1974.
7.
N.S. S r i d h a r a n [1971] An Application of Artificial Intelligence t o the D i s c o v e r y o f Complex O r g a n i c S y n t h e s i s . D o c t o r a l Dissertation, Department o f Computer S c i e n c e , SUNY a t Stony B r o o k , 1971.
8.
N.S. S r i d h a r a n [1973] Search Strategies f o r the t a s k o f O r g a n i c C h e m i c a l S y n t h e s i s . Advanced Papers o f the T h i r d International Joint Conference on Artificial Intelligence, Stanford, 1973.
9.
H. G e l e r n t e r [1973] The D i s c o v e r y o f O r g a n i c S y n t h e t i c Routes by Computer. T o p i c s i n C u r r e n t C h e m i s t r y , Volume 41, S p r i n g e r - V e r l a g , 1973.
10.
H e r b e r t G e l e r n t e r [1976] Empirical E x p l o r a t i o n s o f SYNCHEM, An Application o f Artificial Intelligence t o the Problem o f Computer-Directed Organic S y n t h e s i s D i s c o v e r y . This volume. 11.
E . J . Corey & W.T. Wipke [1969] C o m p u t e r - a s s i s t e d D e s i g n o f Complex O r g a n i c S c i e n c e , Volume 166 (178), 1969.
Synthesis,
12.
Todd Wipke [1973] C o m p u t e r - A s s i s t e d Three D i m e n s i o n a l S y n t h e t i c Analysis. I n Computer R e p r e s e n t a t i o n and M a n i p u l a t i o n o f C h e m i c a l I n f o r m a t i o n . W.T. Wipke et. al. (editors), John W i l e y (New York) 1974.
13.
Todd Wipke [1976]
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
178
COMPUTER-ASSISTED ORGANIC SYNTHESIS S E C S — S i m u l a t i o n and evaluation o f C h e m i c a l S y n t h e s i s : S t r a t e g y and P l a n n i n g . T h i s volume.
14.
I v a r U g i [1976] The S y n t h e t i c D e s i g n Program MATSYN as a p a r t o f MATCHEM, A System o f Computer Programs f o r t h e D e d u c t i v e Solution o f C h e m i c a l P r o b l e m s . T h i s volume.
15.
Nils Nilsson [1971] Problem S o l v i n g Methods in McGraw Hill (New York) 1971.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007
16.
Artificial
A l l e n N e w e l l & H e r b e r t Simon [1972] Human Problem Solving. Prentice-Hall 1972.
Intelligence.
(New J e r s e y )
17.
H e r b e r t G e l e r n t e r [1962] Machine-Generated Problem S o l v i n g Graphs. P r o c e e d i n g s o f a Symposium on t h e M a t h e m a t i c a l Theory o f Automata Polytechnic Institute o f B r o o k l y n P r e s s (New York) 1962. 18. James Slagle [1971] Artificial Intelligence: The Heuristic Programming Approach. M c G r a w - H i l l , New Y o r k , 1971. 19.
R a n d a l l D a v i s and J o n a t h a n K i n g [1975] An Overview o f P r o d u c t i o n Systems, S t a n f o r d Artificial Intelligence L a b o r a t o r y Memo AIM-271, October 1971.
20.
George E r n s t & Allen N e w e l l [1969] GPS: A Case Study in Generality and Problem S o l v i n g . Academic P r e s s (New York), 1969.
21.
M a r v i n M i n s k y [1963] S t e p s Toward Artificial Intelligence. I n Computers and Thought, by Edward Feigenbaum and Julian Feldman (editors), M c G r a w - H i l l (New York) 1963.
22.
S a u l Amarel [1969] P r o b l e m - S o l v i n g and D e c i s i o n - M a k i n g by Computer: An Overview. In Cognition: A Multiple V i e w , G a r v i n (editor), S p a r t a n Books (Washington) 1970.
23.
R i c h a r d L e F a i v r e [1974] F u z z y P r o b l e m - S o l v i n g , Ph.D. Dissertation, Computer S c i e n c e Department, University o f W i s c o n s i n , Madison, 1974. The R e p r e s e n t a t i o n o f F u z z y Knowledge, J o u r n a l o f C y b e r n e t i c s , volume 4, p57-66, 1974.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
8 Computer-Assisted Synthetic Analysis in Drug Research P. GUND, J. D. ANDOSE, and J. B. RHODES
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008
Merck Sharp & Dohme Research Laboratories, Dept. of Scientific Information, and Corporate Management Information Systems, MSDRL Systems and Programming Dept., Merck & Co., Inc., Rahway, N.J. 07065
It was recognized at Merck some time ago (1) that the analytical and data-handling capabilities of a computer could f a c i l i t a t e organic synthesis design, just as spectroscopic methods have revolutionized structure determination. Indeed, the chemist's approach to synthetic design resembles a computer program (2). As outlined in Figure 1, the chemist begins by defining his synthetic problem; he collects relevant knowledge about chemical reactions (note that he never sequentially searches through all available literature); and he designs a synthesis. If the synthesis f a i l s or if he wants additional p o s s i b i l i t i e s , he iterates (repeats the process) to generate additional syntheses. If initial attempts are unsatisfactory, the chemist may enlarge his store of relevant knowledge, e.g. by reading the literature or talking to an expert; or he may redefine the problem - i . e . , find new keys so that more of his knowledge of reactions becomes relevant. In fact, chemists have been known to search exhaustively for a way to implement a preferred route, only to f i n a l l y re-analyze their problem and find an entirely different - and ultimately successful - one. If sufficiently desperate, the chemist may browse (i.e., perform a random search) through the literature for ideas. This method has occasionally succeeded when more rational approaches failed, and might be taken as an indication that our reaction classification and retrieval methods are imperfect. We may envision two levels of computer support of the chemist's analytical process. One approach - which we may c a l l a reaction retriever - organizes and retrieves relevant reaction information. The other, which we here c a l l a synthesizer, simulates a large part of the synthetic process, as shown by the frame in Figure 1. Both computer approaches require a data base of chemical reactions, and it is conceivable that they could share the same data base. We w i l l return to this point; but first, we should consider whether either method would find use by the practicing chemist. 179 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
LEME PDROEFBIM PAKOTETYRERNS CONCEPTS ν
>
>
Figure
1.
Organic
e n
synthesis: Problem
^LRTIEEARCATTOIUNRE f
"retriever
synthesizer"
CHEMSIT)
analysis
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008
8.
Synthetic
GUND ET AL.
Analysis
in
Drug
Research
181
Categories of Pharmaceutical Synthetic A n a l y s i s We i d e n t i f y four types of pharmaceutical s y n t h e s i s : (1) s y n t h e s i s of analogs of a l e a d " compound, (2) s y n t h e s i s of n a t u r a l products, (3) process development» and (4) new r e a c t i o n discovery (Figure 2). A lead analog program normally attempts 11
Synthesis
Type
Synthesis
Analogs of "Lead"
Class
Retriever
SM ?
Synthesizer f o r D i f f i c u l t Compounds; R e t r i e v e r Retriever
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008
SM Ρ Ρ
Synthesizer; Retriever
Retriever
Process Development
? ·* Ρ SM + Ρ
Synthesizer; Retriever
Retriever
New
SM ·> ?
R e t r i e v e r ; Forward Operating Synthesizer
Natural
? SM
Computer A i d
Product
Reactions
Figure
2.
Computer-assisted
synthetic
analysis types and
classes
to f i n d short syntheses of a s e r i e s of r e l a t e d compounds, o f t e n from a s i n g l e intermediate which can be obtained i n l a r g e q u a n t i t i e s - f o r example, c o n s t r u c t i o n of various s i d e chains s t a r t i n g from p e n i c i l l a n i c a c i d . O c c a s i o n a l l y , however, a d e s i r e d analog must be made by q u i t e d i f f e r e n t chemistry - f o r example, p r e p a r a t i o n of dethiacephalosporins (J3). Also o c c a s i o n a l l y , analogs w i l l be made by applying s t r a i g h t f o r w a r d chemistry to an a v a i l a b l e s t a r t i n g m a t e r i a l . Thus, i f we d i f f e r e n t i a t e three c l a s s e s of s y n t h e t i c analyses - (a) s t a r t i n g m a t e r i a l and product s p e c i f i e d (SM •> P); (b) product s p e c i f i e d (? •+ P) ; and (c) s t a r t i n g m a t e r i a l s p e c i f i e d (SM ?) , then lead analog syntheses may belong to any of the c l a s s e s . When a n a t u r a l product with i n t e r e s t i n g b i o l o g i c a l a c t i v i t y i s i s o l a t e d , s y n t h e s i s i s used to confirm the s t r u c t u r e and to o b t a i n s u f f i c i e n t m a t e r i a l f o r biochemical and s t r u c t u r e m o d i f i c a t i o n s t u d i e s . Synthetic a n a l y s i s i s g e n e r a l l y of the "product s p e c i f i e d (? P) type, except when a r e l a t e d m a t e r i a l i s known and a v a i l a b l e (SM Ρ synthesis) · For process development, a n a l y s i s tends to be exhaustive, s i n c e the optimal commercial s y n t h e s i s o f t e n i s d i f f e r e n t from the "best" laboratory method. When a cheap r e l a t e d compound i s a v a i l a b l e , the a n a l y s i s may be of the SM ·* Ρ type. F i n a l l y , chemists apply new r e a c t i o n s to known compounds i n order to generate new drug leads; t h i s r e q u i r e s s y n t h e t i c f f
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
182
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008
analyses of the SM ? o r , o c c a s i o n a l l y , SM Ρ types. A r e a c t i o n r e t r i e v e r program i s a p p l i c a b l e to a l l f o u r types of syntheses. A s y n t h e s i z e r program a p p l i e s p r i m a r i l y to the ? Ρ c l a s s o f problem, although a forward working s y n t h e s i z e r would apply to SM ? problems. The a p p l i c a b i l i t y of these programs by s y n t h e s i s type i s summarized i n F i g u r e 2. As a g e n e r a l i z a t i o n , a s y n t h e s i z e r i s most u s e f u l f o r " c r e a t i v e syntheses, while a r e a c t i o n r e t r i e v e r may i d e n t i f y optimal c o n d i t i o n s a f t e r a r e a c t i o n pathway has been chosen. Therefore both methods should be v a l u a b l e .
11
Program D e s c r i p t i o n s A r e a c t i o n r e t r i e v e r program permits o r g a n i z i n g and e n l a r g i n g the s t o r e of r e a c t i o n knowledge a v a i l a b l e to each individual. Chemists have long used manual systems f o r o r g a n i z i n g r e a c t i o n knowledge, such as i n d i v i d u a l card f i l e s , T h e i l h e i m e r s famous s e r i e s of volumes, Reactiones Organicae, and r e c e n t l y the Derwent Chemical Reactions Documentation S e r v i c e . Computer o r g a n i z a t i o n of such c o l l e c t i o n s enables r e t r i e v a l by r e a c t a n t s , products, r e a c t i o n type, r e a c t i o n c o n d i t i o n s , and/or mechanism (4)· C r e a t i o n of such a computer program i s g e n e r a l l y considered an information r e t r i e v a l a p p l i c a t i o n (4)· A s y n t h e s i z e r program t r a d i t i o n a l l y begins with a t a r g e t s t r u c t u r e and a p p l i e s chemical r u l e s to generate and evaluate p o t e n t i a l precursors. I t o f f e r s the c a p a b i l i t y of f a s t , exhaustive, unbiased s y n t h e t i c analyses. As i n d i c a t e d i n many of the other c o n t r i b u t i o n s to t h i s symposium, t h i s approach i s u s u a l l y considered an a r t i f i c i a l i n t e l l i g e n c e a p p l i c a t i o n . Since we had concluded that both program types were d e s i r a b l e , we wondered i f the same r e a c t i o n data base could serve for both computer approaches, as the flow diagram of Figure 1 suggests. We t h e r e f o r e embarked upon a f e a s i b i l i t y study to t e s t t h i s dual use concept. 1
Proposed Computerized Chemical Reaction C o l l e c t i o n We conceived a system where r e a c t i o n s coded by Merck chemists could serve three purposes - current awareness, r e a c t i o n r e t r i e v a l , and s y n t h e s i z e r input (Figure 3). We i d e n t i f i e d the f o l l o w i n g system development stages: (I) development of r e a c t i o n coding sheet; (II) d e s c r i p t i o n o f r e a c t i o n s on sheets by chemists ( c o n t i n u i n g ) ; ( I I I ) t r a n s l a t i o n to computer readable r e a c t i o n information by i n f o r m a t i o n s c i e n t i s t s and t y p i s t s ( c o n t i n u i n g ) ; (IV) development of software f o r computer storage and r e t r i e v a l of r e a c t i o n i n f o r m a t i o n . In the f e a s i b i l i t y study, we a c t u a l l y c a r r i e d out phases I and I I , and performed a l i m i t e d systems a n a l y s i s o f phases I I I and IV. We designed a r e a c t i o n coding sheet t o c o n t a i n most of the information needed by both r e a c t i o n r e t r i e v e r and s y n t h e s i z e r programs (Figure 4 ) . Data on r e a c t a n t s , products, intermediates, reagents, c o n d i t i o n s , y i e l d s , work-up procedures, mechanism, and
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
8.
GUND E T A L .
Synthetic
Analysis
in Drug
CHEMISTS WRITE REACTION DESCRIPTIONS
DEVELOP REACTION CODING SHEET
»
CODERS INPUT REACTIONS
' REACTION I ' RETRIEVAL/ SERVICE/
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008
Figure
3.
Reaction
1.CODER: P. G-u* A
file
6 . REFERENCES:
Rfc+cl.^e
T^WKcdrov 7. AFFILIATION:
I INPUT FOR / SYNTHESIZER/ / PROGRAM /
development
2 .LOCATION(ΜΟΝΕ) : RIO (Wli)
4.REACTION NAME: C c p K e J # e p * r i i * S y n ^ e c i S
}
CAr^nC**,
*
3.DATE:
Zlz7/rtf
5.SOURCE: ^*MERCg>
V * * ^ V 6 * 9 , * · Γ3 Λ * 7 . ϊ )
LeH
183
Research
^LITERATURES OTHER
M ^ trc
8.REACTION (show a l l reactants:
A • M 3 CH z COCI
I; I + R3 @ V
R l + R2
PI + P2)
-SL
G-e*er«.| Syn-/<e$lS «-p
-
^-L
Ccpkal*sp*rinÇ
9.(circle crucial fragments In both starting materials and products) 10.STERE0SPECIFICITY: H e j i o S p e c . ' i i c , 11. GENERALITY:
û»ej
12. REAGENTS :
K C0 t
3
s+«reo$r« « ^ c .
cyclo^dd.'4» # κ
c
YIELD RANGE : 5T> -*0/ RATING: #
/
Ac«-»»~e
1 2 3 «(sjist)
NaH/ï>*F
or
13. REACTION CONDITIONS: ® - 7 8 ' C 14. WORKUP PROCEDURES : (Σ) CKro~*.-tD«|<xf 15. MECHANISM: "
o~
s./.c*.
16.INTERFERING GROUPS(refer to specific structure, e.£. R l , I, P2, etc.) 17. PROMOTING GROUPS(refer, as above, to specific structure) 18. CODER'S COMMENTS(including, i f desired, specific compounds prepared and yield)
Figure
4.
Merck reaction
coding
sheet
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
184
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008
i n t e r f e r i n g and p r o t e c t i n g groups was sought. The r e a c t i o n was expected to be described i n terms of s t r u c t u r a l diagrams; t h i s not only would make the sheets readable f o r current awareness c i r c u l a t i o n , but a l s o would enable the diagrams to be coded d i r e c t l y f o r computer input using software developed f o r the Merck Chemical S t r u c t u r e Information System (5). This i n turn would provide high q u a l i t y l i n e p r i n t e r output of chemical s t r u c t u r e s (see Figure 5 ) , and a l s o allow substructure coding by computer program instead of by l a b o r - i n t e n s i v e , error-prone manual coding. In phase I I , we asked Merck chemists to code r e a c t i o n s i n t h e i r area of e x p e r t i s e . We received 44 coded sheets from 31 chemists, a g r a t i f y i n g response. However, while these were eminently s u i t a b l e f o r current awareness c i r c u l a t i o n and f o r a r e a c t i o n r e t r i e v a l s e r v i c e , they were g e n e r a l l y u n s u i t a b l e f o r a s y n t h e s i z e r . Thus, as summarized i n Table I, over 50% of the Table I Summary of Coded 18 9 7 2 2 2 2 1 JL 44
Reactions
F u n c t i o n a l Group Interchange C-C Bond Formation H e t e r o c y c l i c Syntheses Cycloa'dditions Group P r o t e c t i o n F u n c t i o n a l Group I n t r o d u c t i o n F u n c t i o n a l Group Removal C-Hetero Bond Formation Hetero-Hetero Bond Formation Total
r e a c t i o n s c o l l e c t e d were f u n c t i o n a l group manipulations probably the c l a s s of r e a c t i o n most commonly performed i n the l a b o r a t o r y , but the l e a s t u s e f u l f o r generating m u l t i s t e p , n o n t r i v i a l syntheses. Moreover, few of the r e a c t i o n s were described i n s u f f i c i e n t d e t a i l to ensure generation of v a l i d pathways by a s y n t h e s i z e r program. Reactions f o r a R e t r i e v e r v s . Reactions f o r a Synthesizer Although i n p r i n c i p l e the same r e a c t i o n data base could be used f o r both r e t r i e v e r and s y n t h e s i z e r programs, our f e a s i b i l i t y study i n d i c a t e d that t h i s was d i f f i c u l t to achieve i n p r a c t i c e , f o r s e v e r a l reasons. A s y n t h e s i z e r r e t r i e v e s r e a c t i o n s according to f u n c t i o n a l i t y or substructure of the t a r g e t molecule, so fewer keys are needed tnan f o r a r e t r i e v e r program which r e q u i r e s entry by product, r e a c t a n t s , r e a c t i o n type, e t c . Therefore f i l e o r g a n i z a t i o n f o r one program i s not n e c e s s a r i l y optimal f o r the other. Furthermore, while an incompletely described r e a c t i o n may be u s e f u l l y included i n a r e t r i e v e r program, i t could be
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
8.
GUND E T A L .
Synthetic
Analysis
CHEM STRUCS SEARCH REPORT MARCH OW.R.GALL/DR.P.GUND SEARCH FIND
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008
O •1
in Drug
1 0 ,
C
w
3
185
ΐ θ 7 β
BD SEARCH
30Θ6
Research
-OB
14
S TEST
C l - ^
8(a)
O1 O2 * AND
MERCK CHEMICAL STRUCTURE INFORMATION SYSTEM
*************************************
L-590,225-00S C20H18NO4CI
MOL.
WT.
371.923
NS
CH3O--
2-/-, -P-CHLOROBENZOYL-s-METHOXY-a-METTIYL-s-1NDOLYL/PROPI ON IC AC ID
Figure
5.
Line printer
output of chemical
structures
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
186
disastrous for a synthesizer. The l a t t e r i n t e r p r e t s each r e a c t i o n as a s e r i e s of chemical i n s t r u c t i o n s f o r generating precursors. Each r e a c t i o n d e s c r i p t i o n t h e r e f o r e must be tested and debugged to prevent p r o l i f e r a t i o n of numerous improbable s y n t h e t i c pathways. F i n a l l y r e a c t i o n d e s c r i p t i o n s should i d e a l l y be d i f f e r e n t f o r these two purposes. While a r e t r i e v e r can handle many very s p e c i f i c r e a c t i o n s , a s y n t h e s i z e r r e q u i r e s generalized r e a c t i o n s f o r e f f i c i e n c y . Thus, a s y n t h e s i z e r - l i k e the chemist - must i n i t i a l l y decide whether, e.g., reducing a ketone to an a l c o h o l i s d e s i r a b l e ; i t i s u s u a l l y i n a l a t e r step of the a n a l y s i s that the chemist decides which of the hundreds of r e a c t i o n c o n d i t i o n s for ketone r e d u c t i o n should be t r i e d . A s y n t h e s i z e r program which attempted to apply each of s e v e r a l hundred r e t r o - r e d u c t i o n transforms each time an a l c o h o l appeared i n a t a r g e t molecule, would not be very e f f i c i e n t . In c o n c l u s i o n , r e t r i e v e r and s y n t h e s i z e r programs are complementary i n a i d i n g the s y n t h e t i c chemist. A s y n t h e s i z e r excels i n d e r i v i n g m u l t i s t e p routes to designated products, w h i l e a r e t r i e v e r i s s u p e r i o r f o r choosing s p e c i f i c r e a c t i o n c o n d i t i o n s . The r e a c t i o n data bases should r e f l e c t these d i f f e r e n c e s .
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008
9
Current Status We a r e e v a l u a t i n g the Derwent Chemical Reactions Documentation S e r v i c e as the b a s i s of a r e a c t i o n r e t r i e v a l system. I f Derwent succeeds i n t h e i r o b j e c t i v e o f coding ca. 20,000 r e a c t i o n s from Theilheimer, Organic Syntheses, e t c . - plus coding 4000 new r e a c t i o n s per year - they w i l l create a data base which we could not hope t o d u p l i c a t e o u r s e l v e s . For a s y n t h e s i z e r , we are reimplementing the Simulation and E v a l u a t i o n of Chemical Synthesis (SECS) program on our IBM computer i n c o l l a b o r a t i o n with Professor Wipke. Once the program i s running a t Merck, we w i l l i n s t r u c t the chemists i n program o p e r a t i o n , and then ask them t o code and t e s t transforms. While reimplementation of SECS on IBM equipment has proven to be n o n t r i v i a l , we have made s u b s t a n t i a l progress. We have s u c c e s s f u l l y i n t e r f a c e d our GT42 graphics terminal with our IBM 370/158 computer under time shared o p t i o n (TSO), and the major p o r t i o n of the program i s converted. In the meantime, we have u t i l i z e d the v e r s i o n of SECS operat i n g on F i r s t Data Corporation's time-sharing system t o gain f a m i l i a r i t y with the program, t o a i d i n debugging our IBM v e r s i o n , and to evaluate the program's c a p a b i l i t i e s . In one study, performed with the p a r t i c i p a t i o n of E. Grabowski and R. Czaja, the program demonstrated that i t was r e l a t i v e l y easy f o r the chemist t o use; that i t could produce novel s y n t h e t i c routes; and that costs were w i t h i n reason. I t proved to be h i g h l y d e s i r a b l e to have a knowledgeable chemist guide the program. The a n a l y s i s a l s o revealed some gaps i n the program's chemical knowledge, which c l e a r l y represents an area f o r f u t u r e development.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
8.
GUND ET AL.
Synthetic Analysis in Drug Research
187
Acknowledgements We wish to thank the Merck chemists who participated i n the f e a s i b i l i t y study, and our consultant, W. T. Wipke, for his detailed assistance i n program conversion.
Literature Cited
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008
1.
Sarett, L. H.; "Synthetic Organic Chemistry: New Techniques and Targets", presented before the Synthetic Manufacturers Association, June 9, 1964. 2. Corey, E. J., Pure Appl. Chem. (1967), 14, 19. 3. Guthikonda, R. N., Cama, L. D., and Christensen, B. G., J. Amer. Chem. Soc. (1974), 96, 7584. 4. Valls, J., i n "Computer Representation and Manipulation of Chemical Information", Wipke, W. T., Heller, S. R., Feldmann, R. J . , and Hyde, E., Edits., J. Wiley, New York, 1974, p. 83. 5. Brown, H.D., Costlow, M., Cutler, F. A., J r . , DeMott, A. N., Gall, W. B., Jacobus, D. P., and Miller, C. J., J. Chem. Info. Comp. S c i . (1976), 16, 5.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
9 Computer-Assisted Structure Elucidation: Modelling Chemical Reaction Sequences Used in Molecular Structure Problems1,2
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
TOMAS H. VARKONY, RAYMOND E. CARHART, and DENNIS H. SMITH Departments of Chemistry, Computer Science, and Genetics, Stanford Univ., Stanford, Calif. 94305 Our research i n the a p p l i c a t i o n s of computer techniques to c h e m i c a l problems has focused on elucidation of molecular structures o f unknown compounds. We have been applying problem solving methods derived from research on artificial intelligence to c r e a t e a program which emulates c e r t a i n phases of manual approaches to s t r u c t u r e e l u c i d a t i o n . This program, called "CONGEN", provides a general mechanism f o r assembly o f c h e m i c a l atoms and structural fragments i n f e r r e d from any of a v a r i e t y of s o u r c e s . Such fragments ("superatoms" ) a r e i n f e r r e d manually and s u b s e q u e n t l y s u p p l i e d to the program. Statements about structural fragments and c o n s t r a i n t s on the ways in which they may be assembled a r e i n p u t to CONGEN using a graphical language for representation of structures. T h i s language has important r a m i f i c a t i o n s i n e x t e n s i o n s to CONGEN, as we o u t l i n e s u b s e q u e n t l y . 3
3
There a r e , however, several other important phases of s t r u c t u r e e l u c i d a t i o n , all o f which a r e amenable to c o m p u t e r - a s s i s t a n c e . A r e p r e s e n t a t i o n o f major milestones in typical structure elucidation problems is presented i n F i g . 1. For the purposes of the subsequent d i s c u s s i o n we c o n s i d e r two different categories of s t r u c t u r e problems, both of which fit i n t o the scheme of F i g . 1. The first c a t e g o r y we view as the g e n e r a l problem of structure elucidation, wherein an unknown compound is isolated and characterized. The second category we term "mechanistic" structure elucidation. Into this c a t e g o r y fall s y n t h e t i c r e a c t i o n s where the p r e c u r s o r , or s t a r t i n g m a t e r i a l , i s known but the p r o d u c t ( s ) and the p r e c i s e r e a c t i o n pathways a r e not. Of course there can be some o v e r l a p between these c a t e g o r i e s , and CONGEN handles both i n a s i m i l a r way, but we will
188 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
9.
VARKONY
ET AL.
Computer-Assisted Structure
Elucidation
189
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
d i s c u s s them a s separate topics* Thus. the "chemical h i s t o r y " c o l l e c t e d ( F i g * 1 ) may, i n t h e f o r m e r c a s e , be actual chemical tests or reactions carried out to c h a r a c t e r i z e t h e unknown* For mechanistic s t u d i e s , the h i s t o r y may i n c l u d e a s p e c i f i c r e a c t i o n e x e r c i s e d on a known structure* Data interpretation and s t r u c t u r e a s s e m b l y w i l l be i n t h e f o r m e r c a s e a c t u a l assembly of i n f e r r e d fragments, while i n the mechanistic case i t usually will involve manipulations of, or slight modifications t o a known s t r u c t u r e * Until recently, CONGEN p e r f o r m e d only structure a s s e m b l y and some d a t a i n t e r p r e t a t i o n and provided the facilities to assist examination of structural possibilities and elimination of inconsistent structures* We are now actively pursuing other elements of Fig* 1 * F o r example, examination of s t r u c t u r e s and s u b s e q u e n t d e s i g n o f new experiments i s an interesting chemical and artificial intelligence problem c u r r e n t l y under i n v e s t i g a t i o n * Although actual c o l l e c t i o n of s p e c t r o s c o p i c and c h e m i c a l d a t a i s b e y o n d the scope of our current i n t e r e s t i n s o f a r as p r o g r a m s for symbolic reasoning are concerned, the element of d a t a i n t e r p r e t a t i o n i s a l s o of major importance t o us* The use of chemical transforms, or reaction sequences, i n s t r u c t u r e e l u c i d a t i o n ( F i g . 1 ) i n both the g e n e r a l and m e c h a n i s t i c s e n s e s m e n t i o n e d previously i s the subject of t h i s report* Reactions or sequences of reactions may be carried o u t on an unknown f o r several reasons* The reaction may a) test for a s p e c i f i c f u n c t i o n a l group; b) s i m p l i f y t h e p r o b l e m by decomposing t h e unknown into smaller, more easily characterizable m o l e c u l e s ; c) modify the s k e l e t o n or functional groups to define more accurately their respective environments or make the unknown more amenable t o a n a l y s i s (e_*&* , i n c r e a s e i t s volatility); o r d) u n a m b i g u o u s l y r e l a t e t h e unknown t o a previously characterized compound* Mechanistic studies, usually involving rearrangements or eyelizations, employ reaction sequences to help characterize reaction pathways and establish relationships among sets of related structures* The m u l t i p l i c i t y o f p a t h w a y s o p e n t o such processes frequently prevents establishing structures of products without additional collection and examination of data. In such cases, the chemical t r a n s f o r m which forms p a r t o f the c h e m i c a l h i s t o r y can be c a r r i e d o u t t o y i e l d t h e s e t o f c a n d i d a t e structures
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Figure
1.
PHYS.HISTORY
CHEM./BIOL./
SPECTROSCOPY
milestones
PLAN NEW EXPERIMENTS
I N F E R E N C E S AND CONSTRAINTS
NEW S T R U C T U R A L
OTHER DATA
SPECTRA
UNIQUE FEATURES
COMMON AND
INTERPRETATION
DATA
structure
or on
FINAL STRUCTURES
CANDIDATE STRUCTURES
STRUCTURE ASSEMBLY
out on a known
EXAMINE STRUCTURES
STRUCTURES
ELIMINATE INCONSISTENT
AND CONSTRAINTS
INFERENCES
in the elucidation of molecular structure. Reaction sequences carried candidate structures for an unknown are the topic of this paper.
SEQUENCES
REACTION
CHEMICAL TRANSFORMS
MORE SPECTROSCOPY
Major
KNOWN COMPOUNDS
REARRANGED
OR
COMPOUND
UNKNOWN
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
9.
VARKONY
E T AL.
Computer-Assisted Structure Elucidation
191
w h i c h may t h e n be t r e a t e d j u s t a s candidate structures for an unknown* However, the application of constraints on the r e a c t i o n products has d i f f e r e n t meaning i n m e c h a n i s t i c s t u d i e s as c o n t r a s t e d to general unknowns a s we d e s c r i b e i n t h e M e t h o d s s e c t i o n *
PURPOSE
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
Similarities Assisted Synthesis *
to
and
Contrasts
wi t h
Computer-
As we w i l l i l l u s t r a t e i n s u b s e q u e n t s e c t i o n s , o u r work on chemical reaction sequences has several important s i m i l a r i t i e s t o and c o n t r a s t s with current e f f o r t s d i r e c t e d toward c o m p u t e r - a s s i s t e d s y n t h e s i s »-\ The s i m i l a r i t i e s f a l l p r i m a r i l y i n areas of structure and reaction d e f i n i t i o n and m a n i p u l a t i o n * We s h a r e common problems of user interaction and i n t e r f a c i n g between the outside world and the more rigidly s t r u c t u r e d domain o f the computer program* Reactions and s t r u c t u r a l c o n s t r a i n t s on them must be d e f i n e d and s u c h d e f i n i t i o n s must be s a v e d t o p r o v i d e a knowledge base which c a n be called upon by future users* Internal t o the programs a r e common problems of perceiving important molecular features and e x e c u t i n g the reaction by appropriate manipulations of the structure representation according to the d e f i n i t i o n of the reaction* Algorithms common t o both problems include ring perception ("cycle finding"), "path f i n d i n g " t o d e t e r m i n e c o n n e c t i v i t y , s o m e form of graph matching t o detect given substructures, recognition of symmetry p r o p e r t i e s o f r e a c t a n t s o r p r o d u c t s , a v o i d a n c e o f o r d e t e c t i o n and e l i m i n a t i o n o f d u p l i c a t e structures and representation o f c h e m i c a l as opposed to grapht h e o r e t i c a l c o n c e p t s o f s t r u c t u r e , s u c h as a r o m a t i c i t y * x
y
Despite the f a c t that both our e f f o r t s and t h o s e of c o m p u t e r - a s s i s t e d s y n t h e s i s d e s i g n i n v o l v e executing a representation of a chemical reaction i n the computer, there are fundamental philosophical and m e t h o d o l o g i c a l d i f f e r e n c e s , as summarized i n F i g * 2 and d e t a i l e d as f o l l o w s : I) A s y n t h e s i s problem has a specific target molecule* The g o a l i n developing a synthesis i s to define precursors w h i c h a r e i n some s e n s e s i m p l e r * The precursors become the t a r g e t s f o r the next l e v e l and the procedure i s recursive until the terminating condition of sufficiently simple precursors is
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
IN
A R E
DIRECTION
ROUTES
"GOOD"
M O L E C U L E
SYNTHESIS
2) F I N D I N G
I)TARGET
VARIABLES
2) R E A C T I O N S
ANTITHETIC
I) D E V E L O P T R E E
MATERIALS
PLANNING
PRODUCTS
G O A L S (A)
2) PATHWAYS
IDENTIFY
REACTION
PRODUCTS
UNKNOWN(S)
FROM AMONG
I) I D E N T I F Y
METHOD
•
2
g
PRODUCTS REACTION PRODUCTS
*
ί\
*
IN
ώ
A R E
Γ*
i \
g
•
BY
FROM
-
x
j' 4
PRODUCTS
CANDIDATES CONSTRAINING
AMONG
I) I D E N T I F Y
ύ
UNKNOWN
; \
g
Q
CONSTANTS
DIRECTION
G O A L S (Β)
2) R E A C T I O N S
SYNTHETIC
J
g
Π
T R E E
6 I)DEVELOP
PRODUCTS
REACTIONJL+jVi
IJ
g
STRUCTURES
Β
Ρ3 φ g g
CANDIDATE
REACTION
SEQUENCES
-ù-
REACTION
Figure 2. Contrasts between the methods and goals of development of a synthesis tree in computer-assisted synthesis and development of a reaction sequence tree in structure elucidation. We denote two applications of reaction sequences: A) conversion of known pre cursors into unknown compounds, a problem encountered in mechanistic studies; B) conversion of candidate structures for an unknown into a series of products, usually employed in the general problem of structure elucidation.
GOALS
METHOD
STARTING
SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
9.
VARKONY E T A L .
Computer-Assisted
Structure
Elucidation
193
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
achieved* R e a c t i o n s e q u e n c e s h a v e no p r e d e f i n e d target molecules* The known p r e c u r s o r s (Case A, F i g * 2) o r structural candidates (Case B, Fig* 2) represent reactants. A given reaction transforms each s t r u c t u r e which can undergo the reaction into one or more products* The p r o d u c t s t h e m s e l v e s may be subjected to further reaction* The goals in these reaction sequences a r e : Case
A) identify unknown s t r u c t u r e s i n the set of p r o d u c t s and by so doing, elucidate the r e a c t i o n pathway(s)•
Case
B) i d e n t i f y t h e unknown c a n d i d a t e s by c o n s t r a i n t s
s t r u c t u r e from among t h e applied to the products*
II) Reaction sequences operate i n the s y n t h e t i c rather than the retrosynthetic, or antithetic direction* This d i f f e r e n c e i s i l l u s t r a t e d i n F i g * 2» Each e x p a n s i o n of the s y n t h e s i s tree represents a set of reactions applied i n the reverse, or a n t i t h e t i c direction* An a c t u a l s y n t h e s i s w o u l d p r o c e e d b a c k w a r d s a l o n g one p a t h * Each e x p a n s i o n o f the CONGEN r e a c t i o n t r e e , however, r e s u l t s from t h e a p p l i c a t i o n o f a s i n g l e r e a c t i o n ( a p p l i e d i n t h e s y n t h e t i c d i r e c t i o n ) on one o r more s t a r t i n g materials. III) Expansion of the synthesis tree is c o n t r o l l e d by c o n s t r a i n t s on the r e a c t i o n s which are applied* Expansion of the r e a c t i o n sequence tree in CONGEN i s controlled by constraints on s t r u c t u r e s , i * e.. , p r o d u c t s . Differences ( I I ) and ( I I I ) a r e r e f l e c t i o n s o f t h e fact that synthesis programs use reactions as variables* Several reactions from a library of possible reactions may apply to any target* We c o n s i d e r r e a c t i o n s as c o n s t a n t s * A reaction i s defined ( s e e M e t h o d s ) and a p p l i e d t o a l i s t o f s t r u c t u r e s * The p r o d u c t s a t any l e v e l a r e o b t a i n e d from s t r u c t u r e s at t h e p r e c e d i n g l e v e l t h r o u g h one o r more a p p l i c a t i o n s o f that reaction. A l t h o u g h many r e a c t i o n s may be a p p l i e d to a given l i s t of s t r u c t u r e s , leading to branches i n the r e a c t i o n sequence t r e e (see Methods) our b a s i c task is the exhaustive e x p l o r a t i o n or evaluation, not o f r e a c t i o n s , but of s t r u c t u r a l possibilities*
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
194
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
Relationship Elucidation*
of
Reaction
Sequences
to S t r u c t u r e
In t h e c o u r s e of determination of the s t r u c t u r e o f an unknown compound, r e a c t i o n s may be c a r r i e d o u t on the unknown to gather additional structural information. I n many c a s e s s u c h new i n f o r m a t i o n c a n be expressed directly as constraints on the possible s t r u c t u r e s f o r the unknown* F o r example, i f the base catalyzed exchange of e n o l i z a b l e hydrogen atoms w i t h d e u t e r i u m atoms y i e l d s a new compound whose m o l e c u l a r w e i g h t i s t h r e e amu g r e a t e r t h a n t h e unknown, then the candidate structures f o r the unknown can be t e s t e d directly for the presence of three hydrogens exchangeable under these c o n d i t i o n s without c o n s i d e r i n g the s t r u c t u r e of the transformed material* In f a c t , one c r i t e r i o n f o r useful chemical transforms designed to yield new structural information is that observations on the resulting products be easily translated back to the starting structural possibilities• There i s , however, an important class of r e a c t i o n s i n which the t r a n s l a t i o n of o b s e r v a t i o n s on t h e p r o d u c t s i n t o d i r e c t c o n s t r a i n t s on the s t r u c t u r a l p o s s i b i l i t i e s i s d i f f i c u l t i f not i m p o s s i b l e * In these cases i t i s e s s e n t i a l to c o n s i d e r the application of the reaction to each structural candidate and t h e relationship of these candidates to their respective products* The most common e x a m p l e s o f t h i s class are r e a c t i o n s i n which a given product or s e t of products may be o b t a i n e d f r o m d i f f e r e n t c a n d i d a t e s t r u c t u r e s f o r the unknown* Or, stated slightly d i f f e r e n t l y , the c l a s s o f r e a c t i o n s i n w h i c h t h e r e i s more t h a n one way for a given product or s e t of products to undergo the reverse, or antithetic reaction* The following are some s i m p l e , b u t i l l u s t r a t i v e e x a m p l e s * v
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
9.
Computer-Assisted Structure Elucidation
VARKONY ET AL.
195
In the absence of additional information concerning the original relationships of the a l k y l groups i n ! t o t h e d o u b l e bond and k e t o g r o u p s o f J., t h e r e a r e f o u r ways o f r e a s s e m b l i n g 2 and 3. ( f o u r ways of c a r r y i n g out the a n t i t h e t i c r e a c t i o n ) * This i s an e x a m p l e w h e r e t h e f u n c t i o n a l g r o u p w h i c h was i n t r o d u c e d is a group already present i n the molecule* Thus, there is ambiguity in referring data measured on products back to the starting structures* Similar p r o b l e m s a r i s e i n any f r a g m e n t a t i o n r e a c t i o n where more t h a n two f r a g m e n t s are produced, the worst case being mass s p e c t r a l f r a g m e n t a t i o n s , where t h e fragment ions can be reassembled in many consistent ways. Transformations exemplified by 4. 5. and £L 1 represent removal or i n t r o d u c t i o n of m u l t i p l e bonds w h e r e t h e r e i s a m b i g u i t y , b a s e d on t h e s t r u c t u r e s o f 5. and J., concerning the structures of 4. and 6., respectively* The p r o b l e m becomes r a p i d l y more c o m p l e x if a sequence of reactions is used on a set of candidate structures* Keeping the data and structural possibilities organized is a difficult job to do manually• Other important members of this class of reactions include the c y c l i z a t i o n s and rearrangements mentioned previously with respect to mechanistic reactions* B e c a u s e t h e s e r e a c t i o n s g e n e r a l l y h a v e many ways t o occur, i n perhaps s e v e r a l consecutive steps, they are not n o r m a l l y used t o help solve an unknown structure* Rather, such r e a c t i o n s are c a r r i e d o u t on known m a t e r i a l s and the problem i s to determine the s t r u c t u r e s of o b s e r v e d p r o d u c t s based a t l e a s t in part on knowledge of the r e a c t i o n itself* Carbonium i o n rearrangements and cyclization reactions such as cyclization of squalene epoxide and congeners to lanosterol and r e l a t e d compounds are two important examples• 5
We d e s i g n e d t h e r e a c t i o n s e q u e n c e c a p a b i l i t i e s o f CONGEN t o make i t s i m p l e f o r a u s e r of the program to carry out reactions in either the general or mechanistic category of a p p l i c a t i o n * For the g e n e r a l c a t e g o r y , m e a s u r e m e n t s made on p r o d u c t s o f a reaction c a n be u s e d d i r e c t l y to t e s t the p r o d u c t s without the necessity for translation of each o b s e r v a t i o n back t o the starting materials* These tests have direct e f f e c t s on t h e i m m e d i a t e p r e c u r s o r s o f t h e p r o d u c t s and e v e n t u a l l y on t h e c a n d i d a t e s t r u c t u r e s i n a multi-step sequence; removal of one product can result in e l i m i n a t i n g whole branches of the r e a c t i o n t r e e (see
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
196
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
Methods section)* The c o m p l e x i t i e s of dealing with many s t r u c t u r a l candidates and s e v e r a l r e a c t i o n s and associated products are handled by the i n t e r n a l b o o k k e e p i n g o f t h e program* In the case o f m e c h a n i s t i c studies, the r e a c t i o n c a p a b i l i t i e s are important i n d e f i n i n g t h e a l t e r n a t i v e s t r u c t u r e s which can a r i s e a t each step o f c y c l i z a t i o n or rearrangement* This guarantees that a l l plausible products will be considered i n d e c i d i n g t h e outcome o f such r e a c t i o n s * Note that the ambiguities of r e l a t i n g observed p r o d u c t s t o s t a r t i n g m a t e r i a l s a r e n o t r e m o v e d by u s i n g a computer program* The a d v a n t a g e o f u s i n g a program is that a l l alternatives will be systematically considered* I t i s easy t o h y p o t h e s i z e one p a r t i c u l a r s t r u c t u r e which obeys a l l observed d a t a ; t h e program provides a straightforward s u p p o r t o r d e n i a l o f s u c h an hypothesis *
METHODS Reaction
Definition *
The i n i t i a l s t e p i n c a r r y i n g o u t a r e a c t i o n on a structure or group of structures i s to define the reaction and a l l its characteristics* This includes definition o f a) the reaction site. or local e n v i r o n m e n t o f a m o l e c u l e w h i c h w i l l be a f f e c t e d by t h e r e a c t i o n ; b) t h e t r a n s f o r m * o r t h e m o d i f i c a t i o n s of the reaction site which yield the product(s); and c ) c o n s t r a i n t s on the r e a c t i o n s i t e , or features of the l o c a l o r remote environment which a r e e i t h e r n e c e s s a r y for the r e a c t i o n to occur or w i l l prevent i t from ο ccurring• The definition of the r e a c t i o n i s under t h e i n t e r a c t i v e c o n t r o l o f t h e u s e r o f CONGEN (unless the r e a c t i o n was p r e v i o u s l y d e f i n e d t o h i s / h e r s a t i s f a c t i o n and input from a library of reactions)* This introduces t h e need f o r a g e n e r a l , f l e x i b l e and s i m p l e language o f m o l e c u l a r s t r u c t u r e i n which the reactions can be expressed* We have adopted our general structure editor (EDITSTRUC^), together with the constraints mechanisms of CONGEN, for reaction definition* Thus, the language used i n CONGEN t o describe reactions i s an e a s i l y learned extension of the language needed t o c o n s t r u c t the o r i g i n a l setof candidate structures*
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
9.
VARKONY ET AL.
Computer-Assisted
Structure
Elucidation
EDITSTRUC i s a g r a p h i c a l language for structure description* Each statement about rings, chains, b r a n c h e s , e t c * , r e s u l t s i n c o n s t r u c t i o n of a graphical representation of the statement, i n the form of a connection table* In t h i s r e s p e c t i t d i f f e r s from the ALCHEM language^ developed by Wipke, which allows (restricted) English language statements about a reaction; statements which are subsequently compiled i n t o an i n t e r n a l f o r m used to c a r r y out the r e a c t i o n * Our g r a p h i c a l r e p r e s e n t a t i o n has t h e a d v a n t a g e that i t c a n be u s e d d i r e c t l y t o c a r r y o u t t h e r e a c t i o n because all s t a t e m e n t s are u n d e r s t o o d by the current graphmatcher/pathfinder/cycle-finder and structure manipulation functions within CONGEN* Also, the i m p o r t a n t f e a t u r e of t h e symmetry o f t h e r e a c t i o n can be c o m p u t e d from t h e s e g r a p h i c a l representations (see below)* The full complement of the structural c o n s t r a i n t s a v a i l a b l e i n CONGEN c a n be b r o u g h t t o b e a r t o d e s c r i b e t h e r e a c t i o n i n more d e t a i l * This includes the capability of specifying substructures of any complexity, variable length bridges or chains, arbitrary atom names or bond orders, proton d i s t r i b u t i o n and r i n g s i z e s t o be good o r bad f o r the reaction* W i p k e ' s ALCHEM language i s c u r r e n t l y more complete, because it allows s p e c i f i c a t i o n of three d i m e n s i o n a l p r o p e r t i e s and t h e use o f c e r t a i n Boolean connectives, relating constraints, which we do not c u r r e n t l y have i n CONGEN. An example illustrating the interactive definition of a r e a c t i o n and related constraints is presented in F i g * 3* The t e x t t y p e d by t h e user i s underlined* The r e a c t i o n i s d e h y d r o c h l o r i n a t i o n (8. -> 3_) , a s s u m e d t o be c a r r i e d o u t i n b a s i c conditions with the r e l a t i v e l y b u l k y t.-butoxide ion* For illustrative purposes, assume t h a t the s k e l e t o n J_0 i s known, and o n l y the placement of a c h l o r i n e atom a t one o f the methylene groups i s in question (candidate structures t h e n d i f f e r by t h i s p l a c e m e n t ) .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
198
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
/ED ITREACT NAME: DEHYDROCHLORI NAT I ON (NEW REACTION) "SITE >CHAIN 3 >ATNAME I C L >HRANGE 3 I 3 >SH0W NAME * DEHYDROCHLOR INATION ATOM TYPE ARTYPE. NEIGHBORS HRANGE 1 CL NON-AR 2 2 C NON-AR I 3 3 C NON-AR 2 1-3 >ADRAW DEHYDROCHLOR INAT ION:
(HRANGES NOT INDICATED)
CL-C-C >DONE
"CONSTRAINTS >BADLIST BADLIST CONSTRAINTS CONSTRAINT NAME:CCTBU CONSTRAINT NAME: >DONE_
"SHOW NAME:
DEHYDROCHLOR I NAT I ON
SITE: ATOM TYPE ARTYPE NEIGHBORS HRANGE 1 CL NON-AR 2 2 C NON-AR I3 3 C NON-AR 2 1-3 DEHYDROCHLOR INAT ION: (HRANGES NOT INDICATED) NON-C ATOMS: I- CL 1-2-3
'TRANSFORM >NDRAW DEHYDROCHLOR INAT ION: NON-C ATOMS: l->CL
(HRANGES NOT
INDICATED)
TRANSFORM: UNJOIN I 2 JOIN 2 3 DELATS I CONSTRAINTS:
1-2-3 BADLIST NAME CCTBU
>UNJOIN I 2 >JOIN 2 3 >ADRAW DEHYDROCHLOR INAT ION: CL C=C
(HRANGES NOT INDICATED)
CONSTRAINTS
"•DONE (DEHYDROCHLOR INAT ION
DEFINED)
(DEHYDROCHLOR INAT ION ADDED TO THE REACTION
LIST)
>DELATS I >DONE
Figure
3. An interactive session with CONGEN including definition of the reaction site the reaction transform (TRANSFORM) and constraints on the reaction site (CONSTRAINTS) for the example reaction, dehydrochlorination. A summary of the complete reaction is provided by the SHOW command. User responses to CONGEN are underlined (carriage-returns terminate each command).
(SITE),
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
9.
VARKONY ET
AL.
Computer-Assisted
Structure
Elucidation
199
Reaction S i t e • The r e a c t i o n s i t e represents the segment o f a m o l e c u l e w h i c h w i l l be transformed* The segment i n c l u d e s the atoms a c t u a l l y involved i n the reaction transformation together with any other structural features necessary for the reaction to occur* The site is defined using the appropriate EDITSTRUC commands* The r e a c t i o n (8. -> 3_) i n v o l v e s t h e r e m o v a l of the e l e m e n t s o f HC1 from adjacent carbons. The CHAIN and ATNAME commands ( F i g . 3 ) d e f i n e t h e site, w h i c h i s drawn for i l l u s t r a t i o n (Fig. 3)* The HRANGE command requires that there be from one to t h r e e h y d r o g e n atoms on atom 3 , w h i c h i s obviously necessary f o r the e l i m i n a t i o n o f HC1. (The p r o g r a m actually is c a p a b l e of determining this i t s e l f by examination of the t r a n s f o r m , so t h e HRANGE i n f o r m a t i o n i s r e d u n d a n t . ) The atom numbers are critical parameters for the reaction. T h e s e numbers a r e " s t i c k y " i n t h e s e n s e t h a t they w i l l always be a s s o c i a t e d w i t h the same a t o m s . Subsequent d e f i n i t i o n of the r e a c t i o n transform itself w i l l make e x p l i c i t r e f e r e n c e t o t h e s e atom n u m b e r s . Reaction Transform. The reaction transform i s the actual series of s t r u c t u r a l m o d i f i c a t i o n s which, when a p p l i e d t o t h e atoms i n the r e a c t i o n site, yield the products. The user defines the transformation explicitly by m o d i f i c a t i o n s to the p r e v i o u s l y named site. The TRANSFORM command (Fig. 3) r e s t o r e s the actual connection table representing that site. Then, again u s i n g EDITSTRUC commands, t h e m o d i f i c a t i o n s to t h a t s i t e which e x p r e s s the r e a c t i o n are defined. In the example ( F i g . 3 ) , the r e a c t i o n i n v o l v e s l o s s of HC1 yielding a d o u b l e bond (8. -> 5 . ) , e x p r e s s e d as UNJOIN (break the C-Cl b o n d ) and JOIN t o form the new bond. The DELATS command deletes the c h l o r i n e atom as an inconsequential product. Reaction Site Constraints * These constraints r e f e r to f e a t u r e s of the m o l e c u l e ( o t h e r than those i n the r e a c t i o n s i t e ) which a f f e c t the reaction, either positively by a l l o w i n g i t to o c c u r or n e g a t i v e l y by p r e v e n t i n g i t from o c c u r r i n g * T h e s e f e a t u r e s may be i n the l o c a l environment of the r e a c t i o n s i t e or may be r e m o t e as i n the case o f an i n t e r f e r i n g or competing functionality elsewhere in the molecule. In the example ( F i g * 3), we know that this reaction w i l l be hindered by t h e e x i s t i n g t . - b u t y l g r o u p i n the skeleton (J_0 ) . We previously defined a s u b s t r u c t u r e named, a r b i t r a r i l y , CCTBU as t h e s t r u c t u r e JJ_* Placing this s u b s t r u c t u r e on Β A D L I S T ^ · i s i n t e r p r e t e d by CONGEN as " c a r r y out the r e a c t i o n t r a n s f o r m at the s i t e g i v e n by
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
200 the reaction encountered".
site,
except
when
ORGANIC
CCTBU
SYNTHESIS
(JJL) i s
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
12
The SHOW command ( F i g , 3 ) p r e s e n t s t h e u s e r w i t h a complete summary of the reaction i n i t s current definition• Carrying
Out t h e R e a c t i o n *
Product Constraints* I n many c a s e s , c e r t a i n ways of c a r r y i n g o u t a r e a c t i o n which a r e l e g a l a c c o r d i n g t o d e f i n i t i o n s of the reaction s i t e , constraints and t h e transform y i e l d p r o d u c t s which a r e u n d e s i r e d * In the e x a m p l e ( F i g * 3 ) we wish to avoid formation of double bonds a t t h e b r i d g e h e a d s ( B r e d t ' s r u l e ) * We s u p p l y , a s a BADLIST c o n s t r a i n t , t h e name of a superatom c a l l e d BREDT w h i c h i s p r e v i o u s l y d e f i n e d a s s u b s t r u c t u r e 1 2* The s t a r r e d a t o m s r e p r e s e n t "linknodes" and a r e used t o represent a p a t h o f atoms o f a g i v e n l e n g t h o r range o f lengths* The u n s t a r r e d atoms i n substructure J_2 a r e the bridgeheads, the linknodes the three associated paths* The d o u b l e bond i n J _ 2 i s t o one o f t h e bridgehead atoms, completing an e x p r e s s i o n of the constraint• Applying t h e T r a n s form• Actual use of the reaction transform i s straightforward with the exception o f some f e a t u r e s t o a l l e v i a t e t h e p r o b l e m s o f duplication ( s e e below)* The p r o g r a m examines each structure i n the l i s t of structures t o which the r e a c t i o n i s applied f o r the presence o f the r e a c t i o n site. If a site(s) i s f o u n d , and t h e s t r u c t u r e obeys a l l r e a c t i o n c o n s t r a i n t s then t h e r e a c t i o n t r a n s f o r m i s a p p l i e d t o t h e s t r u c t u r e , once f o r each unique s i t e , and a p r o d u c t i s c r e a t e d f o r each application* Then, i f t h e user has s p e c i f i e d a multi-step reaction, the p r o d u c t m o l e c u l e may be t e s t e d a g a i n f o r the presence of a d d i t i o n a l r e a c t i o n s i t e s and t h e r e a c t i o n c a r r i e d out again* This e f f e c t i v e l y allows us t o e m u l a t e a reaction w h i c h has been c a r r i e d out with a specific
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
9.
VARKONY ET AL.
Computer-Assisted
Structure
Elucidation
201
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
molar r a t i o of r e a g e n t to starting material; i f only one mole o f r e a g e n t was used, the procedure can be stopped after a single application of a transform; alternatively it can be applied to completion (exhaustively)• C o n s i d e r the r e a c t i o n summarized i n F i g u r e 3 and i t s e f f e c t s on s t r u c t u r e s J_3. - J_5* I n s t r u c t u r e J_3. t h e r e a c t i o n s i t e m a t c h e s t w i c e , o n c e a t C-6,7, o n c e a t C7,8* The r e a c t i o n i s n o t c a r r i e d o u t , h o w e v e r , b e c a u s e both r e a c t i o n s i t e s v i o l a t e the undesired environment r e p r e s e n t e d by JJ_. F o r J_4, t h e r e a c t i o n s i t e matches once at C-6,12 and t h e reaction is carried out* But the product c o n s t r a i n t BREDT on B A D L I S T (J_2) r e j e c t s t h e B r e d t ' s r u l e v i o l a t o r J_6, r e s u l t i n g i n no p r o d u c t s for s t r u c t u r e _1_4* The r e a c t i o n s i t e f i t s t w i c e i n 15. at and C-4,5, and b o t h f i t t i n g s yield products, 17 and J_8, r e s p e c t i v e l y *
Duplication
Among
Products
of
a. R e a c t i o n *
When a reaction is a p p l i e d to a given list of structures, i t is frequently true that some p r o d u c t s t r u c t u r e s o c c u r many t i m e s i n t h e "raw" products l i s t * In mechanistic studies, this is the desired result because each occurrence of a product represents a unique r e a c t i o n pathway (see R e s u l t s and Discussion)* In s t r u c t u r e e l u c i d a t i o n s t u d i e s , though, the i m p o r t a n t information is the chemical identity of, not the p a t h w a y s t o , e a c h p r o d u c t , and i n s u c h applications i t is necessary to e l i m i n a t e d u p l i c a t e structures* This i s not a s i m p l e m a t t e r b e c a u s e a l t h o u g h structures are chemically equivalent their representations within the
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED
202
ORGANIC
SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
program may be different (.§_•&• , t h e atoms may be numbered differently from one representation of a structure to the next)* One method of duplicate e l i m i n a t i o n which a v o i d s c o s t l y atom-by-atom structure comparisons between a l l p a i r s of structures involves casting each product into a standard representation ("canonical f o r m " ) * Duplicates t h e n c a n be d e t e c t e d e a s i l y by d i r e c t comparison of these representations* The canonicalization process is relatively timec o n s u m i n g , t h o u g h , and i t i s d e s i r a b l e t o e x p l o r e more efficient methods of duplicate e l i m i n a t i o n wherever possible• One type o f d u p l i c a t i o n which c a n be d e t e c t e d without recourse to canonicalization is symmetry d u p l i c a t i o n , w h i c h c a n a r i s e e i t h e r when the r e a c t i o n itself p o s s e s s e s some symmetry o r when the s t a r t i n g structure i s symmetrical* Our g r a p h - m a t c h i n g algorithm which i s r e s p o n s i b l e for locating possible f i t t i n g s of t h e r e a c t i o n s i t e w i t h i n a m o l e c u l e t a k e s no a c c o u n t o f symmetry* F o r example, suppose the r e a c t i o n i s the a d d i t i o n of one mole of hydrogen t o an alkene* The r e a c t i o n s i t e h e r e i s J_2. a n d t h e t r a n s f o r m a t i o n i s J_2. -> 20»
c=c 1 2 IS
22
c-c 1 2 20
21
23
C-C 1 2 19
C—(C N)
\/ Ν
28
2 1 24
C=C 1 2 25
C-C—OH 1 2 26
C—C
V Ν
29
C—Ν
\/ Ν
30
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
9.
VARKONY E T AL.
Computer-Assisted
Structure
Elucidation
203
B e c a u s e a t o m s 1 and 2 a r e e q u i v a l e n t i n both the reaction s i t e J_2. and t h e transformed site ( . 2 0 . ) , the reaction has 2-fold symmetry* If the reacting structure i s 2J_, w h i c h i t s e l f has a t w o - f o l d symmetry p l a n e , then the f o u r m a t c h i n g s 22. - 23. a l l y i e l d t h e same product, cyclohexene* These f o u r matchings are members of an equivalence class determined by t h e symmetries of the r e a c t i o n and t h e reacting molecule* If the reaction were unsymmetrical, say with a transform of J_2_ - > 2 J > (this would be a hydration r e a c t i o n ) , t h e n t h e r e w o u l d be two equivalence classes among t h e m a t c h i n g s 2 _ 2 - 23.9 one c o n t a i n i n g 22 and 23 (each of t h e s e w o u l d l e a d t o e y e l o h e x e n - 3 - o l ) and one c o n t a i n i n g 23. and 2Ά (each y i e l d i n g eyelohexen-4-ol)* If the reacting molecule also had an u n s y m m e t r i c a l s t r u c t u r e , s a y £2, t h e n e a c h m a t c h i n g would constitute a separate one-element equivalence class, and f o u r d i s t i n c t s t r u c t u r e s would result* The g e n e r a l problem, then, i s to eliminate a l l b u t one member o f e a c h e q u i v a l e n c e c l a s s p r e s e n t i n t h e complete set of matchings* T h i s i s a form of the socalled double coset problem of combinatorial m a t h e m a t i c s w h i c h has b e e n d i s c u s s e d p r e v i o u s l y i n the context of c o n s t r u c t i v e graph l a b e l i n g . Our s o l u t i o n c o n s i s t s o f two p a r t s * F i r s t , b e f o r e the matchings are c a l c u l a t e d , a c r i t e r i o n i s d e f i n e d f o r o r d e r i n g any s e t of matchings* This criterion provides for the comparison of two matchings and, based upon the c o r r e s p o n d e n c e o f r e f e r e n c e numbers o f t h e a t o m s i n t h e r e a c t i n g s t r u c t u r e t o r e f e r e n c e numbers i n t h e r e a c t i o n site, d e f i n e s one matching to be " s m a l l e r " than the other* Second, as each matching is o b t a i n e d , the symmetry g r o u p s of b o t h t h e r e a c t i o n and the r e a c t i n g m o l e c u l e a r e used t o form a l l p o s s i b l e symmetry i m a g e s of the matching* I f t h e m a t c h i n g i s " s m a l l e r " t h a n any of these symmetry images, i t is kept as the r e p r e s e n t a t i v e of i t s e q u i v a l e n c e c l a s s * Otherwise i t is discarded as being a duplicate of some representative elsewhere in the complete set of matchings * The symmetry of the reacting molecule is a p r o p e r t y o f i t s s t r u c t u r e and c a n be c o m p u t e d p r i o r to the matching* The symmetry o f the r e a c t i o n depends u p o n t h e p r o p e r t i e s (.e.g.* , atom names, a l l o w a b l e r a n g e s o f h y d r o g e n s ) and i n t e r c o n n e c t i o n s of the " k e y " atoms in the reaction site, and upon the transform modifications* H e r e , "key" atoms a r e t h o s e atoms w h i c h are actually altered by the application of the
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
STRUCTURES
Figure 4. Development and pruning of a reaction sequence tree. Candidate structures are 31-39. Interconnecting Unes and the size of a structure convey information on the fate of each candidate. Broken lines pointing to small structures mean that the product(s) and its predecessor(s) are invalid and would be removed by constraints. Regular lines mean the product(s) and its associated candidate structure remain after reaction 1; medium lines connect products and associated structures which are viable after reaction 2; and heavy lines indicate the products from the one structure, 38, which survives after all constraints are applied.
CANDIDATE
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
9.
VARKONY E T A L .
Computer-Assisted
Structure
Elucidation
205
t r a n s f o r m to the s i t e as o p p o s e d t o atoms which are n e c e s s a r y t o , but do n o t d i r e c t l y p a r t i c i p a t e i n , the reaction* I n some c a s e s the p r o p e r t i e s of t h e s e key a t o m s , o r t h e o r d e r s o f t h e b o n d s b e t w e e n them, c o v e r a r a n g e o f p o s s i b i l i t i e s a s , f o r e x a m p l e , i n t h e use o f a special atom name which w i l l match any non-hydrogen atom, or of a bond o r d e r o f "any" which w i l l match a bond o f any multiplicity* In such cases it i s not a l w a y s p o s s i b l e t o c o m p u t e an o v e r a l l symmetry f o r the r e a c t i o n as a s e p a r a t e e n t i t y , b u t o n l y t h e symmetry i n the context of a particular matching* For example, c o n s i d e r the hypothetical reaction s i t e 2_8 where the "polyname" (C N) represents an atom which can be either C or N* This site really represents two p o s s i b i l i t i e s , 2_9. and 3.0., e a c h of which has two-fold symmetry. O n l y a f t e r a m a t c h i n g has b e e n obtained can it be determined which of the two possibilities p e r t a i n s and thus which symmetry i s appropriate. In these cases i t i s p o s s i b l e at l e a s t to define, before matching, a set of p o s s i b l e r e a c t i o n symmetries which may be applicable* Then for each matching i t is necessary only to make t h e a p p r o p r i a t e s e l e c t i o n from this set, not to recompute completely the reaction symmetry * Development. Sequence Tree•
Indexing
and
Pruning
of
the
Reaction
A reaction sequence may be of arbitrary complexity* A convenient representation for describing a r e a c t i o n sequence i s a t r e e s t r u c t u r e * We illustrate the d e v e l o p m e n t and i n d e x i n g of a r e a c t i o n sequence tree in Figure 4 * We assume f o r this example t h a t there are nine candidate s t r u c t u r e s ( _ 3 J _ - 33.) f o r an unknown w h i c h i s a 1 , 1 ,-eyeloheptane diol, possessing no g e m - d i o l f u n c t i o n a l i t y . In the example (Fig* 4 ) we present the results (and their ultimate c o n s e q u e n c e s ) o f t h e a p p l i c a t i o n o f two r e a c t i o n s in a s t e p w i s e manner, a single-step oxidation (reaction 1 ) followed by a dehydration (reaction 2 ) * A third reaction, exhaustive dehydration, is also a p p l i e d to t h e s e t o f c a n d i d a t e s t r u c t u r e s (3A. - 3 9 ) * A r e a c t i o n sequence t r e e has several important features* B r a n c h i n g of the t r e e o c c u r s w h e n e v e r more than one reaction is applied to a single set of s t r u c t u r e s (or products), e..jg.* , r e a c t i o n s 1 and 3 , F i g * 4* There is not necessarily a one-to-one correspondence of s t r u c t u r e s to products* First, a g i v e n r e a c t i o n may p r o d u c e more t h a n one p r o d u c t f r o m a
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
206
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
given structure, e i t h e r : 1) b e c a u s e t h e r e a c t i o n s i t e applies more t h a n once to the s t r u c t u r e (§.*&* > two products a r e produced from ϋ , 3Λ a n d 3_6 39 by reaction 1 ) ; o r 2) because i t i s a fragmentation reaction* S e c o n d , i t may be p o s s i b l e to obtain the same p r o d u c t f r o m two d i f f e r e n t s t r u c t u r e s , 61 . 62 a n d .64 a r e e a c h produced from t h e same r e a c t i o n a p p l i e d t o two d i f f e r e n t s t r u c t u r e s , J*i a n d j>J_, Hfi a n d 54. and 4_0 and respectively)* It i s possible to develop the complete r e a c t i o n s e q u e n c e t r e e by a p p l y i n g a p l a n n e d s e r i e s o f r e a c t i o n s t o t h e c a n d i d a t e s t r u c t u r e s b e f o r e any l a b o r a t o r y work i s a c t u a l l y done* In r e a l a p p l i c a t i o n s , however, t h e tree would be developed in a stepwise manner by c a r r y i n g out a r e a c t i o n i n the laboratory, acquiring data on t h e p r o d u c t s a n d then t u r n i n g t o CONGEN t o explore the implications of this information* We a t t e m p t t o i l l u s t r a t e what i s a v e r y dynamic p r o c e s s w i t h t h e s t a t i c f o r m o f F i g u r e 4* Each r e a c t i o n y i e l d s a s e t o f p r o d u c t s which a r e indexed by pointers to t h e i r precursors* These pointers are maintained a t each step i n the e x p a n s i o n of the t r e e so t h a t i n f o r m a t i o n (constraints) applied to products a t any level automatically results i n appropriate a c t i o n ( p r u n i n g away undesired structures) a t a l l l e v e l s below and above t h e g i v e n level* There a r e s e v e r a l types of c o n s t r a i n t s which can be a p p l i e d t o s t r u c t u r e s i n t h e r e a c t i o n s e q u e n c e t r e e * One constraint is a minimum t o maximum number o f products* In the l a b o r a t o r y , o x i d a t i o n ( r e a c t i o n 1) of t h e unknown structure yielded two structures* Applying the oxidation to the s e t of candidate structures (3_1 - 3.2.) y i e l d s two products from each s t r u c t u r e e x c e p t ^ 2 , 23. a n d 23.9 w h i c h are, therefore, r e j e c t e d a s c a n d i d a t e s by CONGEN* ( T h e s t r u c t u r e s w h i c h a r e s i n g l e p r o d u c t s o f ^2., 23 a n d 23 p r o d u c e d by CONGEN p r i o r t o a p p l i c a t i o n o f t h e c o n s t r a i n t a r e .42., 4jJ a n d 46. respectively*) W i t h no further c o n s t r a i n t s , the remaining candidate structures are s t i l l viable* 1 2
Any of the e x i s t i n g structural constraints i n CONGEN c a n be a p p l i e d t o p r o d u c t s a t a n y s t e p * In the laboratory, t h e two products of reaction 1 were s e p a r a t e d and e a c h s u b j e c t e d to a dehydration (reaction 2)* A major component o b t a i n e d from e a c h p r o d u c t was an α , g-unsaturated ketone* Applying a GOODLIST" constraint expressing this observation, structures -
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
9.
VARKONY E T AL.
Computer-Assisted
Structure
Elucidation
207
60 a r e p r u n e d away by CONGEN, l e a v i n g o n l y the α , 3 u n s a t u r a t e d k e t o n e s 6J_ - _6_4• This pruning also r e s u l t s i n the r e j e c t i o n of products 4 J _ , Jj_4, A £ > Hi and JJ3 a t the p r e v i o u s l e v e l because they d i d not y i e l d α , 3 unsaturated ketones* Rejecting these leads, i n turn, t o r e j e c t i o n o f 2A> 2Λ a n d 3J5 candidate structures because t h e i r products of r e a c t i o n 1 d i d not both y i e l d an ot , 3 - u n s a t u r a t e d k e t o n e * T h i s l e a v e s o n l y 21 - 23 as c a n d i d a t e s t r u c t u r e s *
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
a
s
R e a c t i o n s c a n a l s o be c a r r i e d o u t e x h a u s t i v e l y by r e p e t i t i v e a p p l i c a t i o n o f the r e a c t i o n u n t i l there are no more r e a c t i o n s i t e s r e m a i n i n g i n t h e m o l e c u l e . This is illustrated i n t h e example by reaction 3, an exhaustive dehydration* In the laboratory, this reaction yielded three different dienes* Without further elaboration of the s t r u c t u r e s of these products, the correct s t r u c t u r e c a n be a s s i g n e d a s 2Â b e c a u s e 21 and 23 y i e l d o n l y one p r o d u c t ( t h e same o n e , 66) * I n c a r r y i n g o u t t h e r e a c t i o n w i t h CONGEN, 6_5. and 67 a r e r e j e c t e d by a BADLIST constraint forbidding aliènes* P r o d u c t s J56>, 6_8 and 6_2. a r e p r o d u c e d from 3 8 . the f i n a l s t r u c t u r e . When reactions are r e l a t i v e l y simple and w e l l u n d e r s t o o d , t h e k i n d o f p r u n i n g d e s c r i b e d above c a n be used* As l o n g a s one has c o n f i d e n c e i n t h e r e a c t i o n p r o c e e d i n g as d e f i n e d , t h e n one c a n u s e t h e p r e d i c t e d r e s u l t s a s p o w e r f u l c o n s t r a i n t s on t h e p r o d u c t s and a l l i n t e r r e l a t e d s t r u c t u r e s i n the tree* Otherwise, there would be no grounds f o r r e j e c t i n g a structure* A c h a r a c t e r i s t i c o f m e c h a n i s t i c s t u d i e s , however, i s t h a t t h e g e n e r a l d i r e c t i o n o f t h e r e a c t i o n i s known, but i n insufficient detail to rule out a m u l t i p l i c i t y of products* O t h e r w i s e one could p r e d i c t the products a p r i o r i and t h e r e w o u l d be no p r o b l e m * In a d d i t i o n , of c o u r s e , one b e g i n s w i t h a s i n g l e , known s t r u c t u r e and i t i s nonsense t o use the p r u n i n g mechanism described above. When we p r u n e t h e r e a c t i o n s e q u e n c e t r e e for a m e c h a n i s t i c problem we p r u n e o u t i n d i v i d u a l p a t h w a y s * Rejection of a p a r t i c u l a r product i n the tree prunes away a l l s t r u c t u r e s w h i c h p o i n t o n l y t o i t o r t o w h i c h only i t points. I f a s t r u c t u r e has a n o t h e r source or an a l t e r n a t i v e f a t e , i t i s r e t a i n e d . In the process o f f o c u s i n g i n on t h e s t r u c t u r e s o f unknown products of c y c l i z a t i o n s o r r e a r r a n g e m e n t s we a l s o f o c u s i n on t h e p o s s i b l e pathways o f f o r m a t i o n .
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
208 RESULTS AND
DISCUSSION
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
A) Ln Example of Application of Reaction Sequences to M e c h a n i s t i c Problems• There are at l e a s t two ways t o a p p l y t h e r e a c t i o n s e q u e n c e c a p a b i l i t i e s o f CONGEN t o m e c h a n i s t i c problems* Some o f t h e p r o c e d u r e s d i s c u s s e d s u b s e q u e n t l y c a n be c a r r i e d o u t with current computer-assisted s y n t h e s i s programs whenever a single compound r e p r e s e n t s t h e s t a r t i n g p o i n t * One a p p l i c a t i o n of reaction sequences i n v o l v e s detailed mechanistic s t u d i e s of p o s s i b l e r e a r r a n g e m e n t s of a particular compound* If a mechanism is t o be e l u c i d a t e d i n d e t a i l , i t i s i n s u f f i c i e n t t o know m e r e l y that one compound can c o n v e r t to another* One must a l s o know the i d e n t i t y of e a c h atom i n v o l v e d and i t s fate i n the reaction. This i s normally followed by tagging v a r i o u s atoms i n the starting m a t e r i a l with isotopic or s u b s t i t u e n t labels* This requirement i s t r a n s l a t e d i n the computer program to the f a c i l i t y for retaining structures ( a t t h e same level in the tree) which are f o r m a l l y d u p l i c a t e s but which are in fact d i f f e r e n t i n terms of the numbering of the atoms ( s e e Methods s e c t i o n ) * U s i n g t h i s a p p r o a c h the f a t e of each atom at each s t e p of a sequence c a n be traced* We think that this c a p a b i l i t y w i l l help a user to d e s i g n the best places for l a b e l l i n g the starting material, b a s e d on t h e c o m p u t e r ' s s i m u l a t i o n of the c o u r s e of a reaction* C o n s i d e r , as a b r i e f e x a m p l e , possible 1,2alkyl shifts in structure J_0, under constraints forbidding formation of 3 and 4 membered r i n g s and methyl groups* Although there are only three new structures (in the absence of l a b e l l i n g ) produced i n the first st^p, there are eight different ways o f p e r f o r m i n g the s h i f t to y i e l d four pairs of f o r m a l l y e q u i v a l e n t s t r u c t u r e s , two o f w h i c h , 70a and 70b. are formally t h e same as t h e starting material (10.) but have different numberings* Each of the three new skeletons appears as a pair of s t r u c t u r e s (71 a.b. 72a , b . and 73a b ) * The members o f e a c h p a i r could be d i s t i n g u i s h e d by a p p r o p r i a t e l a b e l l i n g o f £0* Because there i s a one-to-one c o r r e s p o n d e n c e between t h e atom numbers o f t h e p r o d u c t s and t h e s t a r t i n g material, 70 « it is simple to visualize the course of each rearrangement * t
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
9.
VARKONY ET
AL.
Computer-Assisted
Structure
Elucidation
209
Figure 5. The eight structures, unique in terms of the numbering of their atoms, are produced by a single 1,2-alkyl shift of 70. Transitions involving bridgehead carbonium ions were allowed, but three- and four-membered rings were forbidden.
A second a p p l i c a t i o n of reaction sequences to mechanistic s t u d i e s i n v o l v e s r e a c t i o n s where one needs to explore possible products and interconversion pathways, but without regard to preserving identities o f atoms. An e x a m p l e of t h i s type of r e a c t i o n i s the c l a s s i c problem of the i n t e r c o n v e r s i o n of isomers of 10 16 adamantane. This problem has been the s u b j e c t of s e v e r a l recent a r t i c l e s using computer-based approaches to help e l u c i d a t e the course of v a r i o u s interconversions 3 . A characteristic of such problems i s that they i n v o l v e r e a c t i o n s which are run to completion because the reaction and associated c o n d i t i o n s do n o t a l l o w s t o p p i n g the r e a c t i o n after a p r e c i s e number o f s t e p s have t a k e n place. Generally, c y c l i z a t i o n s take place u n t i l further plausible sites for cyclization are exhausted; rearrangements take place until there is no change in the r a t i o s of products. Reaction c a p a b i l i t i e s o f CONGEN can model such reactions. Cyclizations usually involve only a s m a l l number o f s t e p s , so this represents no s p e c i a l problem. Rearrangements, however, can proceed i n d e f i n i t e l y i f no s t o p p i n g c o n d i t i o n i s s p e c i f i e d . In the program we carry reactions to completion by a p p l y i n g t h e r e a c t i o n one s t e p a t a t i m e , s t o p p i n g when no new structures, compared to a l l those produced p r e v i o u s l y , are encountered. Any f u r t h e r s t e p s would be c i r c u l a r and y i e l d no new p r o d u c t s or pathways. C
H
t
o
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Figure
6.
Complete interconversion bridgehead carbonium
map for C10H16 ring systems ions (solid lines); pathways
devoid of three- and four-memhered rings. Pathways involving bridgehead carbonium ions (dashed lines).
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
involving
no
to
GO
V3
ΚΛ
Ω
hH
il
C/î
>
ι
ο S
Ο
h-» Ο
9.
VARKONY E T A L .
Computer-Assisted
Structure
Elucidation
211
Because f i v e structures ( i l , U , U> - i 8 ) were missing from t h e o r i g i n a l s e t o f adamantane i s o m e r s considered by W h i t l o c k and Siefkin ^, completion of t h e i r i n t e r c o n v e r s i o n map r e p r e s e n t s a good e x a m p l e f o r a mechanistic application. Although i t i s possible to do t h i s problem as o u t l i n e d above, b e g i n n i n g with a specific precursor and running the reaction to completion, i n fact t h e r e i s a much simpler way t o develop the complete i n t e r c o n v e r s i o n map* Under t h e structural constraints p r e s e n t e d ^, there a r e 21 possible isomers • Whenever the complete set of possibilities is available, the complete i n t e r c o n v e r s i o n map c a n be g e n e r a t e d by subjecting a l l p o s s i b i l i t i e s (21 i s o m e r s ) t o one s t e p o f t h e r e a c t i o n , i n t h i s case a 1,2-alkyl s h i f t * The r e v e r s e reaction i s i m p l i c i t i n t h i s s t e p and a l l p o s s i b l e pathways from one s t r u c t u r e t o o t h e r s a r e e s t a b l i s h e d * The c o m p l e t e i n t e r c o n v e r s i o n map i s shown i n F i g u r e 6* 5
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
1
1
Earlier work ^ demonstrated that conversion of tetrahydrodicyclopentadiene ( i O ) t o a d a m a n t a n e (_8Z_) was not p o s s i b l e w i t h o u t i n v o k i n g f o r m a t i o n o f a b r i d g e h e a d carbonium i o n . The e x i s t e n c e o f a d d i t i o n a l s t r u c t u r e s ( 7 1 , 74. 7 6 - 7 8 ) w h i c h l i e i n t h e p a t h o f c o n v e r s i o n o f 70 t o a d a m a n t a n e (£2_) meant t h a t t h i s q u e s t i o n must be reinvestigated* We c a r r i e d o u t t h e a b o v e r e a r r a n g e m e n t reaction under the c o n s t r a i n t s o f no formation of bridgehead carbonium ions, using as d e f i n i t i o n s o f bridgeheads those s e l e c t e d p r e v i o u s l y ^* The r e s u l t s are depicted i n Figure 6* We o b t a i n a f o u r component map i f t h e d a s h e d l i n e s ( p a t h w a y s i n v o l v i n g bridgehead c a r b o n i u m i o n s ) a r e removed* One c o m p o n e n t i s i8., t h e second i s U , t h e t h i r d i s 10 - Β a n d J3_ - ϋ , and t h e fourth is ϋ - 5_0* Although our complete r e s u l t s i n d i c a t e t h a t t h e r e a r e a l t e r n a t i v e pathways from i O t o a d a m a n t a n e (R7) n o t c o n s i d e r e d p r e v i o u s l y , the conclusion o f W h i t l o c k and S i e f k i n ^ t h a t a t l e a s t one b r i d g e h e a d carbonium i o n i s required f o r the conversion, under the given s t r u c t u r a l constraints, i s v e r i f i e d * 1
B* An Example of A p p l i c a t i o n of Reaction Sequences to a Structure Elucidation P r o b l e m * The structure elucidation of c o r i o l i n (whose p r o p o s e d structure is 2J_) , a sesquiterpene antibiotic, represents a problem where structural information i n f e r r e d from c h e m i c a l r e a c t i o n s p l a y e d a crucial role in the tentative s o l u t i o n * Although i t i s possible i n this case to translate a l l structural inferences d e r i v e d from o b s e r v a t i o n s on t h e r e a c t i o n p r o d u c t s b a c k to constraints on the complete set of s t r u c t u r a l
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
212
possibilities, i t is difficult to do using the c o n s t r a i n t s mechanism i n CONGEN* In fact i t i s much s i m p l e r and much more i n t u i t i v e , c h e m i c a l l y , t o use the reaction sequence f e a t u r e s o f CONGEN to e x p r e s s the reaction, obtain products, test the products with constraints and have the program automatically d e t e r m i n e w h i c h c a n d i d a t e s t r u c t u r e s a r e p l a u s i b l e as a result. Extensive spectroscopic data revealed that coriolin has an empirical formula ^15^20^5 * is c o m p o s e d o f f i v e s t r u c t u r a l f r a g m e n t s , jj_2 These fragments comprise a l l o f the atoms i n the empirical formula, so that the f r e e valences (bonds w i t h an u n s p e c i f i e d t e r m i n u s ^ ) o f 5_2 - 3_6 must a l l be c o n n e c t e d t o o t h e r , n o n - h y d r o g e n atoms* a n c
The s t r u c t u r a l a n a l y s i s o f c o r i o l i n u s i n g CONGEN and reaction sequence information provides some interesting examples of the different ways both s u b s t r u c t u r a l and c h e m i c a l i n f e r e n c e s can be utilized to help solve the problem* If chemical experiments have a l r e a d y been c a r r i e d out i n the laboratory, then f r e q u e n t l y some o f t h e i n f e r e n c e s c a n be u s e d i n CONGEN wjiile c o n s t r u c t i n g structures* For example, b a s e d on superatoms 2_6 with the constraint of no additional multiple bonds, t h e r e are more than 800 possible structures* But the chemical evidence'^ r e v e a l s t h a t t h e s t r u c t u r e p o s s e s s e s a t l e a s t one four, one f i v e and one s i x membered r i n g * Even though the precise environment of these rings cannot e a s i l y be s p e c i f i e d u n t i l the r e a c t i o n sequence i s c a r r i e d out i n CONGEN, t h e number o f each r i n g of each s i z e can be used as a constraint, resulting in 56 structural candidates p r i o r to chemical r e a c t i o n s * The NMR data do n o t r e v e a l the presence of c y c l o p r o p y l hydrogens* This constraint further reduces the number of p o s s i b i l i t i e s to 52* 9
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
9.
VARKONY E T A L .
In reactions H CORIOLIN
Computer-Assisted
Structure
the laboratory the was c a r r i e d o u t :
Elucidation
following
213 sequence
of
L i A l H j|
2
>DIHYDROCORIOLIN
>HEXAHYDR OCORIOL IN
i
CrO
ι
V triketone ο
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
/
3
OH \
CH—C 1 °-° 2 97
A
CH—CH 1
ζ>
°2
98
The first reaction reduced the ketone f u n c t i o n a l i t y i n c o r i o l i n ( 5 J . ) t o an a l c o h o l . Carrying out this reaction i n CONGEN y i e l d s t h e e x p e c t e d 52 products. The s e c o n d r e a c t i o n o p e n e d t h e two e p o x i d e f u n c t i o n a l i t i e s , y i e l d i n g two new h y d r o x y l g r o u p s , b o t h of which a r e t e r t i a r y . T h i s a l l o w s a r e a c t i o n s i t e and t r a n s f o r m t o be d e f i n e d w h i c h e x p r e s s t h i s observation, 2J. -> 23.) a more e f f i c i e n t p r o c e d u r e t h a n o p e n i n g b o t h e p o x i d e s b o t h ways f o l l o w e d by p r u n i n g t h e p r o d u c t list. The HRANGE r e s t r i c t i o n , ( i * e _ . , no hydrogens, or H Q _ ) , on atom 1 o f 3J_ r e s u l t s i n f o r c i n g the epoxide t o open to a t e r t i a r y a l c o h o l (£8.)» The e x p e c t e d 52 p r o d u c t s a r e o b t a i n e d i n CONGEN. The final reaction r e s u l t e d i n oxidation of the three secondary a l c o h o l functionalities to keto groups. Spectroscopic data suggested that one o f t h e k e t o g r o u p s was i n a f o u r membered r i n g , one i n a f i v e and one i n a s i x membered ring. This constraint c a n be i n v o k e d on the o r i g i n a l c a n d i d a t e s t r u c t u r e s f o r c o r i o l i n o n l y by a c o m p l i c a t e d case a n a l y s i s on the possible environment(s) of the o r i g i n a l ketone f u n c t i o n a l i t y . As a c o n s t r a i n t on t h e products of the f i n a l oxidation, however, i t isa straightforward t e s t whose r a m i f i c a t i o n s i n terms o f s t r u c t u r a l candidates are determined a u t o m a t i c a l l y by CONGEN. This r e a c t i o n , plus constraints, l e a v e s 20 s t r u c t u r a l candidates, the proposed s t r u c t u r e (JLL) A N D 19 others. We have examined these structures (automatically, using CONGEN) f o r the presence of > 0
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
214
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
naturally-occurring t r i c y c l i c sesquiterpane skeletons ' b e c a u s e i t was t h i s r e a s o n i n g by a n a l o g y w h i c h l e d to t h e p r o p o s a l o f 5_1 for coriolin. S t r u c t u r e 5_L is t h e only one o f the c a n d i d a t e s which possesses a known skeleton• In the group o f 20 s t r u c t u r e s there are f i v e , i n c l u d i n g 5J_, w h i c h obey a h e a d - t o - t a i l isoprene rule* The other four structures are - 102 * We h a v e recently investigated the scope of isomerism of terpenoid systems and find many examples i n the literature where additional structural possibilities exist b u t where structural assignment i s b a s e d on a n a l o g y w i t h known s y s t e m s * A l t h o u g h t h e r e may be good reasons f o r using a n a l o g y , no new terpenoid skeletons w i l l be d i s c o v e r e d t h i s way*
CONCLUSIONS We h a v e p r e s e n t e d an a p p r o a c h w h i c h i s c a p a b l e o f emulating many of the laboratory applications of sequences of chemical reactions* Integrated with the c a p a b i l i t i e s o f t h e CONGEN p r o g r a m t o s u g g e s t sets of candidate structures f o r an unknown compound, this a p p r o a c h has s i g n i f i c a n t l y i n c r e a s e d t h e power of the program to assist chemists in solving structure elucidation problems* We have used some b r i e f but i l l u s t r a t i v e e x a m p l e s t o show d i f f e r e n t a p p l i c a t i o n s o f r e a c t i o n sequences to both m e c h a n i s t i c and s t r u c t u r a l problems. As c o m p u t e r programs to aid i n structure elucidation develop further capabilities and become more widely available, we feel that they w i l l be utilized exactly as o t h e r analytical tools a r e used* CONGEN p r o v i d e s a means f o r v e r i f y i n g hypotheses about unknown s t r u c t u r e s and suggesting a l t e r n a t i v e s which m i g h t o t h e r w i s e be o v e r l o o k e d *
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
9. VARKONY ET A L .
Computer-Assisted Structure Elucidation
215
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
Our inability to utilize stereochemical i n f o r m a t i o n i s a shortcoming, but i t i s not a severe problem f o r many applications of reaction sequences. Most c h e m i c a l r e a c t i o n s u t i l i z e d t o p r o v i d e a d d i t i o n a l s t r u c t u r a l , as opposed t o s t e r e o c h e m i c a l , i n f o r m a t i o n a r e d e s i g n e d t o have broad a p p l i c a t i o n . The r e a c t i o n s must a p p l y i n a v a r i e t y o f p o s s i b l e s i t u a t i o n s because the environments of the f u n c t i o n a l i t i e s involved are u s u a l l y not p r e c i s e l y d e f i n e d . In addition, detailed relative or absolute stereochemical information i s seldom available until considerable detail of the structure i s known; r e a c t i o n s cannot under these c i r c u m s t a n c e s be s e n s i t i v e t o s t e r e o c h e m i s t r y .
LITERATURE CITED Part XXIII o f t h e series "Applications of Artificial Intelligence for Chemical Inference". F o r Part XXII s e e B.G. B u c h a n a n , D.H. S m i t h W.C. W h i t e , R . J . Gritter, E.A. Feigenbaum, J . Lederberg, and C. Djerassi, J. Amer. Chem. S o c . , in press. 2) We w i s h t o t h a n k t h e National Institutes of Health ( R R 0 0 6 1 2 - 0 5 ) for their g e n e r o u s financial support of t h e SUMEX c o m p u t i n g resource ( R R 0 0 7 8 5 - 0 3 ) on w h i c h CONGEN was d e v e l o p e d a n d is made available to outside users. 3) R.E. Carhart, D.H. S m i t h , H. Brown and C. Djerassi, J. Amer. Chem. Soc., 97, 5755 ( 1 9 7 5 ) . 4) E . J . C o r e y and W.T. W i p k e , Science, 166, 178 ( 1 9 6 9 ) . 5) " C a r b o n i u m I o n s " , Vol. I-IV, G.A. O l a h a n d P.v.R. Schleyer, eds., Wiley Interscience, New Y o r k , N.Y. (1968-1973). 6) E.E. v a n T a m e l e n , Accts. Chem. R e s . , 8, 152 ( 1 9 7 5 ) . 7) A d o c u m e n t describing t h e CONGEN program, including EDITSTRUC and all its associated structure manipulation capabilities, is available from t h e authors. 8) R.E. Carhart a n d D.H. S m i t h , C o m p u t e r s in Chemistry, in press. 9) W.T. Wipke and T.M. D y o t t , J. Amer. Chem. Soc., 96, 4825 ( 1 9 7 4 ) . 10) H.L. M o r g a n , J . Chem. Doc., 107 ( 1 9 6 5 ) . 11) L.M. Masinter, N.S. Sridharan, R.E. Carhart and D.H. S m i t h , J . Amer. Chem. Soc., 96, 7714 ( 1 9 7 4 ) . 12) This example is fictional and is used for illustrative purposes only. There are probably easier ways to distinguish the candidate structures in practice. Only structural isomers, as opposed to geometric isomers are presented. 1)
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
216 13)
COMPUTER-ASSISTED ORGANIC
H.W.
Whitlock
Soc., 90, 4929
and
M.W.
Siefken,
J.
Amer.
SYNTHESIS
Chem.
(1968).
14) E.M. Engler, M. Farcasiu, A. P.v.R. Schleyer, J . Amer.
Sevin, Chem.
J.M. Cense, and Soc., 95, 5769
(1973). 15)
R.E. Carhart, Sridharan, J.
D.H. Chem.
Smith, H. I n f . Comp.
Brown, Sci.,
and N.S. 15, 124
(1975).
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009
16) S. T a k a h a s h i , H. Iunuma, T. Takita, Κ. Maeda, a n d H. Umezawa, T e t . Lett., 4663 (1969). 17) T.K. D e v o n and A . I . Scott, "Handbook of Naturally Occurring Compounds. V o l . II. T e r p e n e s , " A c a d e m i c Press, I n c . , New Y o r k , N.Y., 1972. 18) D.H. S m i t h and R.E. Carhart, Tetrahedron, in press.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
10 Computerized Aids to Organic Synthesis in a Pharmaceutical Research Company D. R. EAKIN and W. A. WARR
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
Data Services Section, Imperial Chemical Industries Ltd., Pharmaceuticals Div., P.O. Box 25, Alderley Park, Macclesfield, Cheshire, England SK10 4TG
ICI Pharmaceuticals Division has been investigating the area of computerised aids to Organic Synthesis f o r many years. Our approach has been largely pragmatic and d i f f e r s from other systems i n two main areas. F i r s t l y , we have assumed the research chemist is the best innovator i n h i s specialised area we therefore want to provide facilities which will support h i s i n t e l l e c t u a l c a p a b i l i t i e s , hopefully by removing some of the hit and miss aspects of reaction synthesis planning by l e t t i n g computers do some of the more tedious operations. The second difference is our starting point. We felt many of the techniqu es already developed f o r the computerised manipulation of compounds should be capable of further exploitation in this area. In the process of designing a potential drug, a pharmaceut ical research chemist will have to make use of chemical information systems at various stages. He will have to f i n d answers to such questions as: Is my idea novel - is anyone else workinginthe same or similar areas? How can I make compounds of this type? What starting materials are available to prepare compounds of this type? This compound shows a c t i v i t y , what compounds o f s i m i l a r types are available in quantities large enough to test? Most in-house company information systems w i l l provide some computerised aids to help chemists with these types o f problem, mainly f a l l i n g into one of two classes. F i r s t l y , text searching f a c i l i t i e s on the general l i t e r a t u r e , patent 217 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
218
COMPUTER-ASSISTED ORGANIC SYNTHESIS
information, company reports, etc. enabling the chemist to eval uate the significance of an idea. In most systems, the chemist obtains a l i s t of, hopefully, pertinent references which he must then pursue and evaluate. Secondly, most compan ies provide some substructure search f a c i l i t i e s on their own compound files enabling the chemist to look for suitable starting materials or for compounds for biological evaluation.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
We, at ICI, have tried to extend these f a c i l i t i e s so that the chemists can readily obtain physical samples of compounds whether for synthetic purposes or not. A chemists chosen synthetic route often depends on the availability of intermed iates and we have tried to expose our chemists to a wide range of available coimpounds which are also readily accessible. In pharmaceutical research, chemists generally know what they want to make and often how they want to make i t . and so a good supply of compounds, whether from compound stocks or commercial suppliers, becomes a valuable support f a c i l i t y . However, the chemist i s s t i l l faced with problems i n reaction synthesis. In many cases, he cannot readily find answers to questions such as: How was A made? How can I make compounds similar to A? What examples have we of the type of reactions that make A? The chemist knows the product he wishes to make and must work backwards from there. He has various approaches open to him. Firstly, he can use conventional chemical information f a c i l i t i e s such as Chemical Abstracts. Beginning with the product he wishes to make, he can examine the name and molecular formula indexes to find papers relevant to his prod uct. By examining the abstract, he can find information on whether the compound was made by the author, and i f so, how. He must, however, always look for a specific compound and may involve himself i n a very time-consuming task. His second approach i s to recall a possible method of preparation and to conceive required molecular structures i n terms of pathways through known or even unknown precursors. Having decided on a chosen pathway, he f i r s t examines standard textbooks to find how a certain transformation may be achieved given the constraints imposed by his particular molecule. His next step i s to examine the availability of the starting material; i f necessary extending his pathway back until a suitable starting material i s found.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
10.
EAKiN
AND
WARR
Computerized
Aids to Organic
Synthesis
219
We at ICI Pharmaceuticals Division have been looking at ways i n which computerised chemical mforaiation systems can be used to help chemists plan reaction syntheses more easily. We feel the chemist needs more automatic ways of finding inform ation on: (1) Reaction pathways. (2) Reactant and product, particularly at the compound class level.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
We felt extensions to the existing CROSSBOW f a c i l i t i e s (1-5) could easily cope with the second factor, providing the correct data bases were available. We therefore set about examining ways of coding reaction pathways such that they could be analysed within a computer system. THE TOTAL DESCRIPTION APPROACH METHODOLOGY. Our f i r s t approach was to take as wide a variety of reactions as possible and code as much data about them as available, and then to draw conclusions as to how much of this data was in fact needed to satisfy user demands. We aimed to avoid applying chemical mechanistic knowledge when analysing a reaction, such that our analyses would be independ ent of the chemical knowledge of the encoder. To test our ideas, we chose about 700 literature-based reactions from a standard reference work (6) and recorded a wide variety of information for each reaction: (1) Compound details for reactants and products. WLN was used so that standard CROSSBOW f a c i l i t i e s could be used. (2) Bibliographic details, so that the chemists could locate the originating paper for f u l l details. (3) Reaction conditions, including information on reagents, catalysts, solvents, time, temperature, yield and so on. A dictionary of agreed abbreviat ions was set-up to code this type of data i n an essentially free text form. (4) Details of the reaction site, using a variety of coding systems. REACTION ANALYSIS. The reaction site was defined as the bonds formed or broken i n a reaction and the atoms which these bonds connect. Thus, in: 0
0
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
220
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
i t i s assumed that the O^CH^ bond i s broken, and our supposition would be unchanged even i f Ο*** labelling experiments showed that the COO bond were broken. We manually coded the following information for each reaction: (a) A summary reaction classification. The classific ation depends upon bonds and rings formed and was based on that used at Roussel-Uclaf (7). The twelve classes were not mutually exclusive and a reaction may f a l l into more than one class. For example, the Fischer Indole Synthesis:
This falls into two classes - one indicating the formation of the ring and the second the formation of the carbon-carbon atom. (b) Details of bonds broken and formed, excluding atomto-hydrogen bonds. No account i s taken of bonds which are merely modified, for example, carbon-carbon triple bonds result ing from carbon-carbon double bonds. Consider the reaction: 0
H
0
The carbon-carbon double bond across the ring fusion i s broken and two carbon-oxygen double bonds are formed. The bond inform ation would be coded as: -1C=C:+2C=0 (c) Details of individual rings broken and formed. Thus, the rings broken i n the above reaction would be expressed as: -(-55 BM and the ring formed as: + C-8VM EV The coding system i s based on modifications of WLN. The hyphen indicates that the ring i n question i s part of a larger ring system.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
10. E A K i N A N D WARR
Computerized
Aids to Organic
Synthesis
221
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
(d) Details of the reaction centre. Again, a system of coding was necessary to define the reaction centre, and again we used WLN with two basic modifications. Firstly, we needed to represent individual carbon atoms and we used a variety of symbols to represent the various alternatives. Secondly, we needed to indicate whether each atom was terminal, linking or branched. Using the same reaction, the reaction centre would be represented as: DD = &0Α/&0Α where "D" represents an unsaturated carbon atom at a ring fusion junction; "A" represents the ring carbon atom with an exocyclic double bond; "&" indicates the terminality of the oxygen atoms; "/" indicates the fragments are i n the same molecule (8). ASSESSMENT OF THE APPROACH. The obvious advantages of this approach l i e in the large amount of data held for each reaction, and the precise statement of the reaction site. We tried to avoid the need for subjective chemical judgements either i n coding or searching the f i l e . This i s not to say that chemical problems did not arise. We found that problems arose with tautomerism and mesomeric systems and with defining the reaction sites in certain re-arrangements. However, the reaction coding involved a substantial amount of labour and we had now to ascertain the value of this, and hence, the cost-effectiveness of the approach. The f i r s t stage was to compare the effectiveness of the system with the less labour-intensive system developed at Sheffield University (9, 10). In the latter case the recognit ion and generation of the reaction site is carried out by computer. A l l reactants and products involved in the reaction are input, and an algorithm analyses automatically the differen ce between reaction and product at the level of the small bondcentred fragments. The fragments which are found to be different are then re-assembled to give a skeletal reaction scheme. Six hundred of the reactions analysed manually were subjected to the automatic analysis of reactants and products. The automatic analysis dealt with 84$, but also suffered from some faulty analyses, particularly with acylation reactions. This was partly alleviated by using an extra routine in the analysis, although this produced further defective analyses. Also sufficient infoimation to allow precise searching was not always included. In contrast, the manual system dealt with a l l reactions, and did not suffer from undiscovered faulty analyses, but i t was found again that the coding does not always include
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
222
COMPUTER-ASSISTED ORGANIC SYNTHESIS
sufficient information to characterise the reaction. Neither of the methods was thought ideal, consistently including a l l iinportant details about the reaction. The most significant omission from a purely reaction site analysis was the position and effect of remote groups. The second stage of the evaluation was to look at the needs of the user population. Discussions with chemists led us to believe that chemists would prefer a chemical classification and a mechanistic approach.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
We therefore looked at a system based on reactant and product information coupled with a generalised reaction classification. A SYSTEM BASED ON REACTION CLASSIFICATION METHODOLOGY. Our second approach to reaction indexing was as down-to-earth and practical as the f i r s t was academic. We s t i l l stored the compound information on reactants and products, but reaction sites, broken bonds and so on were a l l discarded and a chemical/mechanistic basis was chosen for the reaction classific ation. The reactions studied were those which had been used by our Process Development Department for evaluation as manufactur ing processes. REACTION CLASSIFICATION. The reaction classification chosen was based on that used by the standard reference work, Organic Syntheses (11). The classes were simplified to suit the specific application. Twenty-one reaction classes were finally defined, many divided into further sub-classes. The object of the classification was to assign each stage of each reaction to only one category, unless i t was a complex reaction, in which case i t was assigned to as few categories as possible. Thus:
would be classed only as N-alkylation and not as a ring cleavage reaction, even though a ring i s cleaved. EVALUATING THE APPROACH. This simple approach worked reason ably well in the area to which i t was applied - a small number of reactions (about 900) for which there was no available standard
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
10. Ε A K I N A N D W A R R
Computerized
Aids to Organic
Synthesis
223
reference work. The system was slightly better than an equival ent published book since: (a) The CROSSBOW facilities could be used to interrogate the reactant and product information for classes of compound. (b) Desk-top tools could be generated to suit individual user needs, such as a molecular formula index of products, solvent indexes, etc.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
We did, however, feel the computer could do more to help the chemist and set about looking for a suitable method of supplementing the reaction classification approach, bearing in mind the p i t f a l l s of the f i r s t system. SEMI-AUTOMATIC REACTION ANALYSIS METHODOLOGY. We went back to think on how chemists them selves evaluate a possible synthetic route. We came to the conclusion that we needed a way to mark the reaction site information on the complete structure such that any part of the molecule can be accessed when necessary. We started off with the basic assumption that we would code the reactant and product information in WLN, hence whatever system we devised we could make use of the substructure search f a c i l i t i e s . We wanted to si^erimpose the reaction details on the reactant and product information in much the same way as a chemist would mark the reaction details on the structure diagr ams indicating the reaction pathway. For example: 0 0 H0^C-CF »NH -C-CF 3
2
3
We needed some way to indicate the reaction site and have devised a system based on the CROSSBOW connection table, which allows us to give a unique number for each node in the structure. For example, the above example gives us the following numbering: 35 0F « ι 6
HO-C-CTF
1 2
F.
i t ->
II
I
7
NL-C-C-F7
1
1 4
2p 2
s
and by quoting node numbers 1 and 2 i t i s possible to indicate simply the bonds fonned or broken and equally any rings formed or broken.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
224
COMPUTER-ASSISTED
ORGANIC SYNTHESIS
The above reaction could therefore be characterised as: -1,2 > +1,2
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
Adaptions to conventional atom-by-atom search programs enable us to algorithmically derive the reaction centre, and we have the f l e x i b i l i t y to make the reaction site as small or large as required by the question. Hence, i n the above example, the influence of the -CF^ group can be included or excluded. Similarly, on presenting the reactions as output from a search, we can represent them as the chemists would themselves. A simple modification to our existing structure display system enables us to indicate bonds formed and broken. EVALUATING THE SYSTEM. We are at present evaluating a system which combines the reaction classification approach with this semi-automatic site analysis. We are using the novel reactions as indicated in Index Chemicus and at the same time are evaluating the ICRS tapes as an automatic way of collecting the reaction and product information. ICRS particularly alerts novel reactions and syntheses, and the relevant abstracts can be found by searching the tapes or by scanning Index Chemicus. The f i r s t stage of the process i s to isolate a l l the WLNs given for the relevant abstract and to display the structures with relevant CROSSBOW connection table information. The total reaction i s examined in Index Chemicus and the reaction information added to the existing information and the appropriate class allocated. We have again used the Organic Synthesis (11) classification, but this time have l e f t i t essentially unmodified. Any relevant compounds not included in the ICRS tapes (because they did not represent novel compounds) are added to the reaction data base. At present we are building up an index of novel reactions reported i n 1975 and are pleased with the level of effort required to code any one reaction. The necessary modifications to the atom-by-atom search program and to the structure display program have been carried out. In the case of atom-by-atom search we have included the f a c i l i t y to indicate the bonds in the substructure which have been formed/broken. This means that a l l types of substructure can be searched for and any part of that sitostructure may be the reaction site. Both reactant and product molecules can be searched so that the following questions can be answered: (a) How can X be made?
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
10.
ΕAKIN A N D WARR
Computerized
Aids
to
Organic
Synthesis
225
(b) What happens when this bond i s broken i n Y? (c) What reactions have been used to convert group X into group Ζ i n the presence of Y? Modifications have been made to the structure display program. Here the bonds broken or formed are marked on the structure of the total molecule. Bonds formed are marked with a circle and bonds broken with a slash mark, so imitating a chemists normal presentation.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
CONCLUSIONS Research into the methods to be used i n reaction indexing and their adaptation to meet user needs has led to the develop ment of a potentially useful reaction indexing system. It uses techniques commonly available i n compound-oriented chemical information systems, with slight modifications, and yet presents data i n a form readily understandable by the chemists themselves. The resulting system should enable us to provide desk-top indexes, more up-to-date than the standard reference works and to combine these with more sophisticated computer methods. Hopefully, i n this way we w i l l improve the ways i n which chemists can find chemical information relevant to chemical reaction pathways.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
226
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010
LITERATURE CITED 1.
Hyde, E., Matthews, F.W., Thomson, L.H. and Wiswesser, W.J. - "Conversion of Wiswesser Notation to a Connectivity Matrix for Organic Compounds". J.Chem.Doc., V.7(4), p.200-204 (1967).
2.
Thomson, L.H., Hyde, E. and Matthews, F.W. - "Organic Search and Display Using a Connectivity Matrix Derived from the Wiswesser Notation". J.Chem.Doc., V.7(4), p.204-207 (1967).
3.
Hyde, E. and Thomson, L.H. - "Structure Display". J.Chem.Doc., V.8, p.138-146, 1968.
4.
Eakin, D.R. - "The ICI CROSSBOW System" in Ash, J.E. and Hyde, E. - "Chemical Information Systems". Horwood, 1975.
5.
Eakin, D.R., Hyde, E. and Palmer, G. - "The Use of Computers with Chemical Structural Information: ICI CROSSBOW System". Pesticide S c i . p.319-326, 1973.
6.
Harrison, I.J. and Harrison, S. - "Compendium of Organic Synthetic Methods". Wiley, Interscience, 1971.
7.
Valls, J . and Schier, O. - "Chemical Reaction Indexing" i n Ash, J.E. and Hyde, E. - "Chemical Information Systems". Horwood, 1975.
8.
Eakin, D.R. and Hyde, E. - "Evaluation of On-Line Techniques in a Substructure Search System" in "Computer Representation and Manipulation of Chemical Information", ed. Wipke, W.T., Heller, S.R., Feldman, R.J. and Hyde, E. Wiley.
9.
Harrison, J.M. and Lynch, M.F. - "Computer Analysis o f Chemical Reactions for Storage and Retrieval". Journal of the Chemical Society, p.2082-2087, 1970.
10.
Seddon, J.M. - "An Evaluation of Chemical Reaction Analysis and Retrieval Systems". University o f Sheffield M.Sc. Thesis, 1973.
11.
Organic Synthesis. Wiley.
Collective Volumes, Annual Volumes.
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
A Absolute space 98 Acessibility, steric 109 Acrylic molecule 89 Act interpretation theory 150 Action rules, situation106 Activity, compile-time 161 Activity, information-gathering 165 Acyclic structures, complete grammar for large subset of 67 Adamantane 211 Adding to the database 27 Addition, functional group 19 Advanced search model 148 Ajmaline, non-indole portion 8 ALCHEM language 105,197 Alcohols, tertiary 71 Aldol condensation 18 reaction 104 transform, excerpts for the 108 Algorithm(s) graph-matching 202 Morgan 132 repeated renumbering 135 sum 131,132 tree 131,132 Alkanols 66,70 1-Alkylmagnesium bromides 66 Alphanumeric CRT terminal 100 Analysis antithetic 2 (LHASA), Logic and Heuristics Applied to Synthetic 1 means-ends 124 reaction 219 retrosynthetic 2 semi-automatic reaction 223 AND-nodes 155 Andrenosterone, Uclaf synthesis of .... 109 Antithetic analysis 2 Antithetic reaction 194 Appendage-based strategies 15,16 Applicability criteria 120,150,167 Applicable transforms 171 Application of reaction sequences to mechanistic problems 208 Application of reaction sequences to a structure elucidation problem 211 Applying a transform 90, 200
Approach, total description Artificial intelligence system for chemical synthesis planning by computer Assignment methods, optimal Asterane, synthesis of Atom(s) chemistry of afixedset of equivalence between global ordering of key of a molecule, computerized procedures for numbering origin sequencing of Atom-by-atom search program Atomic coordinates, canonicalization based on Atomic descriptor Attention foci Β
BADLIST BE-matrices manipulation of tree of i-th diagonal entry of a Binary search techniques Binary search tree 20 Block data statements, FORTRAN Block structuring Blocking reactions 64 Bond(s) covalent cyclic, strategic disconnections for polycyclic targets fusion multiple 6 Boolean set Branch appendages Branched structures BREDT superatom Bredt's rule Bridgeheads 1-Bromoalkanes 66 Bubble sort Building blocks of the search management model Butadiene
229 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
230
COMPUTER-ASSISTED ORGANIC
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
η-Butane, derivation of η-Butane, representations of 2-Butanone-3-ol teri-Butoxide ion Byproducts
SYNTHESIS
65 Chemistry (Continued) of afixedset of atoms 35 70 geometric and group-theoretical 136,137,143 aspects of constitutional 45 197 mathematical model of 83 constitutional 47 packages of LHASA 17 C CHEMONICS program 92 141 Cahn-Ingold-Prelog procedure 130 Chirality files 107 Canonical indexing 48 CHM Canonical ordering 48, 58 CHMTRN—Chemical Data Base Langauge 22 Canonical value, initial 131,137,140 CICLOPS 47,52 Canonicalization based on atomic 35, 203 coordinates 135 Classes, chemical equivalence Classical syntheses 105 Canonicalization of the molecular structure representation 128 Classification of chemical reactions 44,220 Camp's pyramid 89 e-Caprolactam 94 Classification of R-matrices, generation and 54 Carbon site, functionality of a 86 Code, linear symbolic 106 Carbonium ion rearrangements 195 Coded reactions 182,184 ^-Carotene, syntheses of 122 Compile-time activity 161 Catalog of starting compounds 173 Command(s) Categories of pharmaceutical CHAIN and ATNAME 199 synthetic analysis 181 EDITSTRUC 199 Cathode ray tube 199 HRANGE 199 CCTBU 199 iteration 23 CHAIN and ATNAME commands .... 199 TRANSFORM 199 Chaining 70 Common reaction sequence level 104 Character of a site 86 Complete grammar for large subset Chemical Abstracts 218 of acyclic structures 67 Chemical(s) Compounds, catalog of starting 173 constitution 36 Compression method 50 data base in LHASA 17 Computer(s) detecting environmentally active .... 54 -assisted structure elucidation 188 distance 46, 54 -assisted synthesis 191 -assisted synthetic analysis in engineering view of reaction path drug research 179 synthesis 81 and chemistry 33 equivalence classes 35 program(s) 149 fitting 39 for deductive solution of chemical flow chart 20 problems 33 grammar 64, 66 modular system of 47 history 189 Computerized aids to organic synthe problems, computer programs for sis in pharmaceutical research .... 217 deductive solution of 33 problems, solution of 34, 45 Computerized chemical reaction collection 182 reaction(s) 38 classification of 44 Computerized procedures for num bering atoms of a molecule 131 documentation systems for 57 49 predicting the products of 53 Concatenation 18 proposed computerized 182 Condensation, aldol 72 synthetically important organic .. 41 Condensation reactions 106 synthesis planning by computer ... 148 Conditions reaction 85 synthesis: SECS—simulation and evaluation of 97 CONGEN, inability to use stereo chemical information 215 transform(s) 104,189 188 translator 22 CONGEN program CONGEN reaction tree, expansion Chemistry of the 193 computers and 33
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
231
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
INDEX
Congeners to lanosterol, cyclization of squalene epoxide and 195 Connection table, CROSSBOW 223 Connection tables 136 Constraint(s) GOODLIST 206 product 200 reaction site 196,199 structural 206 Constructive graph labeling 203 Contextdependent grammar 11 free language 67 free productions 64 sensitive languages, workspace theorem of 72 sensitive productions 63 Control by electronic effects 112 Controlling tree growth through strategies and plans for transform selection 120 Corey's longifolene synthesis 110,121 Coriolin 211 Cost function, simple 82 Covalent bond 36 CROSSBOW 219 connection table 223 CRT terminal, alphanumeric 100 Cycle finding 191 Cycles, set of 8 Cyclic strategic bonds 10 Cyclization of squalene epoxide and congeners to lanosterol 195 Cyclobutane, numbering atoms of 141 1,1-Cycloheptanediol 205 Cyclohexane, numbering atoms of .... 141 Cyclohexanone 99 Cyclohexene 203 Cyclohexen-3-ol 203 Cyclohexen-4-ol 203
D Database adding to the 27 Language, CHMTRN—CHEMICAL 22 LHASA 4 organization of 25 qualifiers 29 requirements 22 Data lists, linked 90 table of recognition process 13 tables in LHASA 11 Deblocking reactions 64, 72 DEC GT40 graphics terminal 100 Deductive solution of chemical prob lems, computer programs for 33
Defining the synthetic problem 179 Definitions of reaction conditions 116 Dehydrochlorination reaction 197 Demasking of masked functional groups 72 Derwent Chemical Reactions Documentation Service 182,186 Derivation 63 of ethyl bromide 67 of n-butane 65 Description of a chemical transform .. 104 Description of SECS system 99 Description system ( MDS ), meta 151 Descriptor, atomic 48 Design, pilot program for synthetic ... 47 Designing a potential drug 217 Dethiacephalosporins 181 Development indexing and pruning of the reaction sequence tree 205 Dictionary, reaction 78 Diels-Alder addition 19 Diels-Alder transform 20, 21,119 Digitizing data tablet 5 1, l-Dimethyl-3-trichloromethylcyclohexane 129,130 Direct access to structures and substructures 49 Directory set 26 Disconnection, Robinson 28 Discrete dynamic programming 92 Distance, chemical 46, 54 Documentation systems for chemical reactions 57 Double coset problem 203 Dreiding model 103 Drug research, computer-assisted synthetic analysis in 179 Dummy neighbors 49 Duplication among products of a reaction 201 Duplication, symmetry 202 Ε EDITSTRUC 196,197,199 Elaboration of Reactions for Organic Synthesis (EROS) 47 Electron relocation by R-matrices, representation of 38 Electronic effects 110,112 Electrophilic localization energy Ill Elementary functional group 138,142 ELENERGY Ill ^-Elimination 44 Energy, electrophilic localization Ill Energy hypersurface 34 Ensemble formula, partitioned empirical 35 Ensembles of molecules, isomeric 35
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
232
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Functional group Enumeration of synthesis tree, addition 19 order of 92 -based strategies, opportunistic 3 Environmentally active chemicals, -based transforms 17 detecting 54 elementary 138,142 Equivalence interchange 18,106 between atoms 129 list of 2-butanone-3-ol 143 class 203 manipulations 184 classes, chemical 35 modifications 123 EROS ( Elaboration of Reactions for perception 11 Organic Synthesis ) 47, 52 to reaction conditions, sensitivity of 114 EROS, synthetic design program 52 reactivity 14 Ethyl bromide, derivation of 67 switching reactions 64, 72, 75 Ethyl crotonate 61 86 "Eureka syndrome" 97 Functionality of a carbon site 9 EVAL module 119 Fusion bond Evaluation of chemical synthesis : SECS— G simulation and 97 function, heuristic 153 Geminal dimethyl substitution 28 of industrial reactions 82 General problem solver system 169 node 153 Generation and classification EVLTRN (Evaluate Transform) 23 of R-matrices 54 Exhaustive enumeration 55 Generators, reaction 53 Expansion of the CONGEN Geometric and group-theoretical reaction tree 193 aspects of constitutional Explicit goal representation 122 chemistry 45 External addressing structure 26 Global ordering of atoms 130 part structures 62 F symmetry operations 129 18,120 Facilities, substructure search 218 Goal(s) -directed selection of transforms ... 120 Failednodes 161 list, logically structured 121 Family of isomeric ensembles of representation, explicit 122 molecules 35 situation 152 File(s) 206 CHM 107 GOODLIST constraint 62,64-67 RESTART 100 Grammar(s) chemical 66 structure, reaction 57 types of 64 unordered sequential source 107 context-dependent 11 Finite state machine 66 simple synthetic 70 First sphere 57 structural 65 Fischer indole synthesis 220 Graph Fixed set of atoms, chemistry of a 35 labeling, constructive 203 FLEXI, aflexibleadvanced search matching 191 model 170 algorithm 202 Flexible advanced search model, problem solving 175 FLEXI, a 170 as a minimal search model 153 Flow chart, chemical 20 small reaction 76 FNODE 171 Graphics 5 Formal languages, organic chemist's terminal, GT42 186 view of 60 Grignard addition 64 Formula, ensemble 35 Group FORTRAN 4,22,100 addition, functional 19 Forward-branching search strategy .... 92 interchange, functional 18,106 Fragmentations, mass spectral 195 origin 14 Free valences 212 perception, functional 11 Function, heuristic evaluation 153 reactivity, functional 14
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
INDEX
233
Group (Continued) -theoretical aspects of constitu tional chemistry, geometric and
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
H
Κ
45
Key atoms Keywords, CHMTRN
203 22
L
HAREHM module I l l Language(s) 62,64,76 Hendrickson's triangle 89 ALCHEM 105,197 Heuristic(s) CHMTRN—chemical data base .... 22 Applied to Synthetic Analysis context-free 67 (LHASA), Logic and 1 EDITSTRUC .? 197 evaluation function 153 organic chemists' view of formal .... 60 search method 152,173,175 workspace theorem of context search, theory of 155 sensitive 72 n-Hexane 61 Lanosterol, cyclization of squalene Hierarchically organized substructure epoxide and congeners to 195 files 51 Lead analog program 181 HONEYWELL-BULL system 100 LHASA (Logic and Heuristics HRANGE command 199 Applied to Synthetic Hyperstructures 144 Analysis) 1,4,17,47 Hypersurface, energy 34 data tables in 11 Hypersurface of syntheses 97 global view of 4, 5 interpreter 23 perception routines 7 I Line notation, Wiswesser 131 60 IBM-360/67 computer 94 Linear assays, one-dimensional 106 IBM 370 computer 100,136,186 linear symbolic code 90 Impurity reactions 83 Linked data lists 200 Indexing, canonical 48 Linknodes LISP organic synthesis program 70 Indexing and pruning of the reaction sequence tree, development 205 Local symmetry operations 129 Industrial reactions 82,83,92 Localization energy, electrophilic Ill Informationflow,network of 161 Logic and Heuristics Applied to Syn Information gathering 155,165 thetic Analysis (LHASA) .. 1, 4,17,47 Information retrieval application 182 Logical connectives 123 Initial canonical value 131,137 Logically structured goal list 121 Inner loop 152 Longifolene synthesis, Corey's 110,121 Input, structural 6 Interchange, functional group 18 Interconversion map 211 M Interconversions 209 Machine,finitestate 66 Interfering functionality and protect 71 ing groups, recognition of 112 Machines, Turing Inter-machine manipulation 50 Manipulation(s) of BE-matrices 48 Interpreter, LHASA 23 functional group 184 Intra-machine manipulation 50 inter-machine 50 Iodo-lactonization 19, 21 intra-machine 50 Irreducible R-matrix 40, 58 Isomeric ensembles of molecules 35 Map, interconversion 211 Isomerism 35 Mapping 46 Isonootkatone, Marshall's synthesis of 28 Marshall's synthesis of isonootkatone 28 Iteration 179 Masked functional groups, demasking command 23 of 72 Mass spectral fragmentations 195 MATCHEM 47 J Mathematical model of constitutional chemistry 47 JOIN command 199
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
234
Mathematics, solution of chemical problems on the basis of theoretical physics and 34 Matrices, BE36 Matrix reaction 38 MATSYN 47 Means-ends analysis 124 Mechanistic problems, application of reaction sequences to 208 Mechanistic studies 189 Memory, rote 91 Merck chemical structure information system 184 Mesa phenomenon 170 Meta description system (MDS) ... 151,157 Method, heuristic search 175 Methods, optimal assignment 56 6«-Methyl-17a-acetoxyprogesterone .. 136 Michael addition 25 Minimal chemical distance, determination of 54 Minimal search model, problem-solving graph 153 Model advanced search 148 builder module 103 building blocks of the search management 171 FLEXI, a flexible advanced search . 170 molecular orbital (MO) 110 problem-solving graph as a minimal search 153 search 175 management 148 Modelling, chemical reaction sequences used in molecular 188 Modelling, search 148 Module(s) EVAL 119 HAREM Ill model builder 103 of the SECS system 102 strategy 123 SYMIN 109 Modular system of computer programs 47 Molecular modelling, chemical reaction sequences used in 188 orbital (MO) model 110 structure representation, canonicalization of the 128 structure, target 152 symmetry, redundancy from 119 systems 34 Molecule(s) computerized procedures for numbering atoms of a 131
Molecule(s) (Continued) editor options listed by HELP command 102 isomeric ensembles of 35 multiple target 83 and reactions, representation of 86 small target 83 as strings of symbols 60 target 2,72,98 Morgan algorithm 132,134 Multiple bonds 6, 61 Multiple target molecules 83 Ν
Name reaction level Network of information flow Network-oriented strategies Networks, reaction New problems NLENERGY Node evaluation Node, nonstructural Non-radical reactions, R-matrices of some typical Nonstructural node Notation, n-tuplet structural Novel total synthesis Null strategy Numbering the atoms of a molecule, computerized procedures for Nylon monomer intermediates
104 161 121 83 82 Ill 153 61 42 61 78 105 120 131 94
Ο
OCSS-LHASA One-dimensional linear assays One-group transforms Opennodes Operating with R-matrices Operators, substitution Opportunistic functional group-based strategies Opportunistic strategy Optimal assignment methods Order of enumeration of synthesis tree Ordering, canonical Organic chemical reactions, synthetically important Organic chemist's view of formal languages Organic synthesis EROS in a pharmaceutical research com pany, computerized aides to .... program, LISP programs, rapid generation of reactants in
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
99 60 17 161 52 120 3 120 56 92 58 41 60 1 47 217 70 128
235
INDEX
Organization of the database Origin atom Origin, group OR-nodes
25 11 14 155
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
Ρ
Packages, ring transform 28 Parsing 67 Partitioned empirical ensemble formula 35 Path finding 191 Path, reaction 57 PDP-10, 11, 20 51,100 Penicillanic acid 181 Perception 7 functional group 11 routines, LHASA 7 second-order 103 Petrochemical syntheses 92 Pharmaceutical research company, computerized aids to organic synthesis in a 217 Pharmaceutical synthetic analysis, categories of 181 Phenomenon, Mesa 170 Plan, skeletal 170 Planning space 148,167 and state space, combining search in 170 Polyfunctional polycyclic synthesis .... 99 Polyname 205 Potential drug, designing a 217 Power, predictive 86 Precursors 98 Predicting chemical reaction products 53 Predictive power 86 Predictor systems 53 Problem(s) application of reaction sequences to a structure elucidation 211 double coset 203 functional group switching 75 separation 81 solving graph 175 as a minimal search model 153 with products 81 Product(s) constraints 200 predicting chemical reaction 53 problems with 81 of a reaction, duplication among .... 201 Production(s) context-free 64 context-sensitive 63 regular 64 rule form 165
Program ( s ) atom-by-atom search CHEMONICS descriptions lead analog LISP organic synthesis maintenance noninteractive synthesis rapid generation of reactants in organic synthesis REACT structure display synthesis Programming, discrete dynamic Programming, steps of the sum algorithm for Proposed computerized chemical reaction collection Prostaglandin synthesis Protecting groups, recognition of interfering, functionality and Pruning of the reaction sequence tree, development indexing and .. Pruning, tree reduction through Pseudo rings PSG model, rule set for the PSGRAPH Pyramid, Camp's
224 92 182 181 70 96 130 128 92 224 140 92 145 182 31 112 205 119 9 166 157 89
Q Qualifiers, data base
29
R
R-category 40, 41,57 R-transformation 39 Random search 179 Rapid generation of reactants in organic synthesis programs 128 REACT program evaluation 93 input/output 94 molecule representation 93 program features 94 reaction representation 93 search direction 93 search strategies 93 Reactants in organic synthesis pro grams, rapid generation of 128 Reacting molecule, symmetry of the .. 203 Reaction(s) Aldol 104 analysis 219 antithetic 194 blocking 64,72 carrying out the 200 center, details of the 221
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
236
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
Reaction ( s ) ( Continued )
chemical 38 classification system 220, 222 coding sheet 182 conditions 85 definitions of 116 sensitivities of functional groups to 114 core 57 deblocking 64,72 definition 196 dehydrochlorination 197 dictionary 78 discovering new 105 duplication among products of a .... 201 evaluation of industrial 82 file structure 57 functional group switching 64, 72 generators 53 graph, small 76 impurity 83 indexing system 225 level, name 104 matrix 38 networks 83 for Organic Synthesis), EROS (Elaboration of 47 path(s) 57 applications to industrial 92 synthesis 83 chemical engineering view of 81 representation of molecules and 86 retriever 179,184 sequence(s) level, common 104 to mechanistic problems, application of 208 to a structure elucidation problem, application of 211 to structure elucidation, relationship of 194 tree 205 site constraints 196,199 transform 199 types 44 Reactivity, functional group 14 Real rings 9 Recognition of interfering functionality and protecting groups 112 Recognition questions, sample 13 Reduction through greater chemical accuracy, synthesis tree 108 Reduction through symmetry, tree .... 118 Redundancy from molecular symmetry 119 Reference 106 Regular language 76 Regular productions 64
Relationship of reaction sequences to structure elucidation 194 Relevance criteria 150 Relevant transforms 167,171 REPAS case study using Hendrickson's representation 89 Repeated renumbering algorithm 135 Representation ( s ) of n-butane 70 of electron relocation by R-matrices 38 level, ab initio
104
of molecules and reactions 86 substructure 90 RESTART file 100 Retriever vs. reactions for a synthesizer, reactions for a 184 Retrosynthetic analysis 2 Retrosynthetic tree 102 Ring appendages 16 Ring-forming transforms 19 Ring perception 8, 9 Ring transform packages 28 RLENERGY Ill R-matrices 40,57,58 generation and classification of 54 operating with 52 representation of electron relocation by 38 of some typical non-radical reactions 42 RNODE 171 Robinson annelation 19 transform 24 Robinson disconnection 28 Root node 153 Rote memory 91 Routes, synthetic 1 Routines, LHASA, perception 7 Rule form, production 165 Rule set 148 Rules, search guidance 165 S
Sample recognition questions Search guidance rules management model method, heuristic model FLEXI modelling in a planning space random space, structure of the strategies Second-order perception
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
13 148 165 148,171 152 152,175 170 148 167 179 167 91,92 103
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
INDEX
237
SECS—simulation and evaluation of chemical synthesis 97, 99,186 commands 100 modules of the 102 Selection by applicability 120 Semi-automatic reaction analysis 223 Sense definitions 161 Sensitivities of functional groups to reaction conditions 114 Sentential form 65 Separation problems 81 Sequencing of atoms 49 Serial numbers of elementary functional groups 138 Set, Boolean 26 Set of cycles 8 SHOW command 200 Similarities to and contrasts with computer-assisted synthesis 191 Simmons-Smith reaction 19 Simple cost function 82 Simulation and evaluation of chemical synthesis ( SECS ) program .... 186 Site, functionality of carbon 86 Site, reaction 196,199 Situation-action rules 106 Skeletal plan 170 Small reaction graph 76 Small target molecules 83 SNODE 171 Software subroutines 24 Solution of chemical problems 34, 45 Sort, bubble 146 Sourcefiles,unordered sequential 107 Space, absolute 98 Space, planning 167 Space, state 150,167 Squalene epoxide and congeners to lanosterol, cyclization of 195 Start symbol 63 Starting compounds, catalog of 173 State space 150,167,170 Steps of the sum algorithm for programming 145 Stereochemical canonical value 140 graph isomorphism groups 118 information, CONGEN, inability to use 215 relation tables 136 Stereochemistry 6 Steric accessibility 109 Steric effects 108 Stoichiometry 83 Straight-chained structures 61 Strategic bonds, cyclic 10 Strategy (ies) 3 appendage-based 15,16
Strategy (ies)
(Continued)
based on structural features 3 forward-branching search 92 module 123 network-oriented 121 null 120 opportunistic 120 and plans for transform selection, controlling tree growth through 120 search 91 symmetry-based 122 Strings of symbols, molecules as 60 Structural constraints 206 descriptions of problem solving graph model 157 grammar 65 input 6 modifications 123 notation, n-tuplet 78 Structure(s) branched 61 complete grammar for large subset of acyclic 67 display program 224 elucidation, computer-assisted 188 elucidation problem, application of reaction sequences to a 211 elucidation, relationship of reaction sequences to 194 external addressing 26 global part 62 problems 188 reaction file 57 of the search space 167 straight-chained 61 and substructures, direct access to .. 49 target molecular 152 Structuring, block 23 Studies, mechanistic 189 Subgoal 18 generation 124 transforms 30 Subroutines, software 24 Substitution, geminal dimethyl 28 Substitution operators 120 Substructure(s) 106 direct access to structures and 49 files, hierarchically organized 51 numbering of synthetically interesting 141 representation 90 search facilities 218 Sum algorithm ( s ) 131,132,136 for programming, steps of the 145 Summary reaction classification 220 Superatom, BREDT 200
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
238
COMPUTER-ASSISTED ORGANIC SYNTHESIS
Superatoms 188 Symbol(s) 63 molecules as strings of 60 start 63 terminal 64 Symbolic representation 175 Symbolization 175 SYMIN module 109 Symmetry -based strategies 122 duplication 202 operations 129 of the reacting molecule 203 redundancy from molecular 119 transform 119 tree reduction through 118 SYNCHEM 47,149 SYNCOM 107 Syndrome, Eureka 97 Syntheses of β-çarotene 122 classical 105 hypersurface of 97 petrochemical 92 Synthesis of andrenosterone, Uclaf 109 of asterane 118 chemical engineering view of reaction path 81 Corey's longifolene 110,121 Fischer indole 220 industrial reaction path 83 novel total 105 organic 1 planning by computer, artificial intelligence system for chemical 148 polyfunctional polycyclic 99 program(s) completeness and accuracy of a .. 99 noninteractive 130 prostagladin 31 reverse salami 73 SECS—simulation and evaluation of chemical 97 similarities to and contrasts with computer-assisted 191 tree 2,3,92,193 reduction through greater chemical accuracy 108 types of pharmaceutical 181 Synthesizer 179 reactions for a retriever vs. reactions for a 184 Synthetic analysis in drug research, computer-assisted 179,181
Synthetic
(Continued)
Analysis (LHASA), Logic and Heuristics Applied to design, pilot program for design program EROS grammar, simple pathways, tree of problem, defining routes generated by Diels-Alder transform Synthetically important organic chemical reactions Synthetically interesting substructures, numbering System for chemical synthesis planning by computer, artificial intelligence description of SECS general problem solver less labor-intensive modules of the SECS molecular predictor reaction classification reaction indexing
1 47 52 70 47 179 1 20,21 41 141
148 99 169 221 102 34 53 222 225
Τ Table translator 22 Tables, connection 136 Target 23 molecular structure 152 molecule 2,72, 83, 98,191 TBLTRN 22 Techniques, binary search 28 TELENET 100 Teletype terminal 100 Terminal alphanumeric CRT 100 symbols 64 vocabulary 63 Tertiary alcohols 71 Tetrahydrodicyclopentadiene 211 Text searching facilities 217 Theilheimer's series of volumes 182 Theoretical physics and mathematics, solution of chemical problems on the basis of 34 Theory of heuristic search 155 Transform(s) 2,14,105,196 by applicability, selection of 167,171 applying 90,200 chemical 189 description of a chemical 104 Diels-Alder 119 excerpts from the Aldol 108
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
INDEX
239
Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001
Transform ( s ) ( Continued )
functional group-based 17 goal-directed selection 120 level and completeness 105 one-group 17 reaction 199 relevant 167,171 ring-forming 19 ring-oriented 19 Robinson annelation 24 selection 123 selection, controlling tree growth through strategies and plans for 120 subgoal 30 symmetry 119 two-group 17 TRANSFORM command 199 Tree algorithms 131 approach, illustration of 132 development indexing and pruning of the reaction sequence 205 expansion of the CONGEN reaction 193 expansion of the synthesis 193 growth through strategies and plans for transform selection, controlling 120 reaction sequence 205 reduction through pruning 119
Triangle, Hendrickson's 2,2,4-Trimethylhexane n-Tuplet structural notation Turing machines Two-group transforms TYMNET
89 61 78 71 17 100
U
Uclaf synthesis of andrenosterone UNIVAC 1108 system UNJOIN command Unordered sequential source files
109 100 199 107
V
Valences, free Vocabulary, terminal
212 63
W
Wiswesser line notation 131 WLN (Wiswesser Line Notation) .... 220 Workspace theorem of context sensitive languages 72 Y
Yield
In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
83