,x)>),2). w.ight(P(l(x,i(y,x))),2). wight(P(i(i(i(x,y),z),i(y,z))) > 2). w.ight(P(i(x,i(i(x,y),y))),2). wltfit(p(i(i(x,i(y,z».i(y > i(x.z»)),2). »lglit(P(i(i(x,y) > i(i(z,x),i(z,y)))),2). w«igiit(P(i> ,u) ,i(i(y.l<x,z)) ,u») ,2) . «lght i(n(y).D<x)))),2). w«i^it l(y.n(x)))),2). wight(P(i(i(n(x) ,y) ,i(n(y) ,x))) ,2). w.ight ,n(y)).i(y,x)>) > 2). w»ifht(P(l(i(i(n(x),y).z).i(i(n(y) t x),z))),2). w»ight(P(i(i(x,i(y,z)).i(x,l(n(z),n(y))))),2). wi«Jit(P(i(i(x > i(y,B(z»).i(x.l(z > n(y))))),2). v.igbt(P I $*MS(nag.th.47). X -P(iU(n(p),q),i(n(q>,p») I *AMS(nag.th_48). -P(i(l(n(p),n(q)),i(q.p») I *ANS(nag_th_49). X -P(i(l(i(n(q).p),r),i(i(n(p),q),r))) I *MIS(nag.th.80). X -P(i(l(p>i(q,r)),i(prl(n(r)>n(q)))» I $ANS(nag.th.81). X -P(i(i(p.i(q.n(r))),i(p,i(r>n(q))))) I *ANS(nag.th.S2). X -PUU(n(p),q),iU(p,q).q))) I *AMS(nag.th.S3). -P(l(i(p,q),i(l(n(p),q),q))) I $ANS(nag_th_64). X -P(i(i(p,q),i(i(p.n(q)),n(p)))) I *ANS(nag.th.5S). X -P(i(i(i(i(p,q).q),r),i(i(n(p),q).r))> I $ANS(nag_th_S6). X -P(l(i(n(p),r),i(i(p,q).i(l(q,r),r)))) I $AHS(nag_th_57). X -P(i(i(i(i(p.q).i(l(q.r).r)),a),i(i(n(p),r),a))) I $ANS(nag_th_88). -P(i(l(n(p),r),i(i(q,r),i(i(p.q).r)))) I $ANS(nag.th.59). -P(i) I $AMS(nag_th_6S). X -P(i(n(i(p,q)),p)) I «AKS(nag.th_66). X -P(i(n(i(p,q)),n(q)>) I $AKS(nag_th.67). X -P(i(n(i(p,n(q))),q)) I $AHS(nag.th.68). X -P(i(p,i(n(q),n(l(p,q))))) I $JUfSCnag_tn_69). X -P(i(p,i(q,n(i(p,n(q)))))) I $ANS(nag_th_70). X -P(n(i(i(p,p),n(i(q,q))))) I $AKS(nag.th_71). and_of_liat. X X X X X X X ,q),q)» I -P(i(i(p,i(p,q)>,i . 25 0 - P ( i ( x , y » I -P(x) I P(y). 26 n P ( i ( i ( x , y ) , i ( i ( y . z ) , i ( x , z ) ) ) ) . 27 t] P ( i ( i ( n ( x ) , x ) . x ) > . 28 □ P ( l ( x . l ( n ( x ) . y ) ) ) . 29 33 34 35 36 37 38 39 40 41 43 44 45 47 50 S3 55 56 63 70 78 81 83 87
X walgut(P(l(x,i(y,n(i(x.n(y)))))).2). X walght(P(n(i(i(x,x),n(i(y,y))))>.2>. and_of_list. X Usad to coaplota application! of infaranca rulaa. liat(uMbla). X Tha following clausa ia uaad with hyparrasolution for condanaad datacnaant. -P(i(x,y» I -P(x) I P(y). X Tha following dlajunctions, axcapt thosa mantioning Scott, X ara tha nagatlon of known axiom systaas. -P(i(p,i(q,p))) I -PU(i(p,i(q,r)),i)) I -P(i
-P(i(p,p)) | -P(i(p,i(q,p))) | -P(i(i(p,i(q,r)),i(q,i(p,r)))) | -P(i(i(i(p,q),p),p)) | -P(i(i(p,i(q,r)),i(i(p,q),i(p,r)))) | -P(i(n(n(p)),p)) | -P(i(p,n(n(p)))) | -P(i(i(p,q),i(n(q),n(p)))) | $ANSWER(step_allScott_orig_16_18_21_24_36_39_40_46).
X -PUCp.p)) I -P(i(p,i(q,p))) I -P(i(i(p,i(q,r)),i(q,l(p,r)))) I X -P(iCl(i(p.q),p),p)> I -P(l(l(p,i(q,r)),i(i(p,q).i(p,r)))) I X -PCi(n(n(p)),p)> I -P(i(p,n(n(p)))) I -P(i(i(n(p),n(q» ,i(q,p))) I X *«IBWER(stap.allSeott.orig0.16.18.21_24.35.39.40.49). and.of.list. X Uaad to initiata applications of infaranca rolas. list(sos). X Tha following thraa ara Luka, 1 2 3. P(i(l(x,y).i(i(y,x).l(x.2)))). P(i(i(nCx),x),x)). P(i(x,i(n(x),y))). and.of.list. X Usad BSinly to datact proof completion and to monitor prograss. llst(pusiva). X -PU(i. X -P(i(i(p,i(q,r)),i(i(s.q),i(p,i(s,r))))) I $AHS(nag_th_06). X -P(i(i(p,q),i(i(i(p.r),s)>i(i(q>r).s)))) I $AMS(nag_th.06). X -P(i(i(t.i(l(p(r).s)),i(l(p,q).i(t,i(i(q,r),s))))) I »JUIS(nag_th_07). X -P(i(i(q,r)ll(i(p>q).l(i(r,s),l(p,s))))) I $JUIS(nag_th_08). X -P(iCi(i(n(p),q).r).i(p,r)>) I $ANS(nag.th_09). X -P(iCp,i(i(i(n(p),p).p),i(i(q,p).p)))) I «ANSCnag_th_10). X -P(i(i(q.i(i(n(p),p),p)),i(i(n(p),p),p))) I $JWS(nag.th.ll). X -P(i(t,i(l(n(p).p),p))) I »A*S(nag_th_12). X -P(l(i(n(p),q),i(t.i(i(q,p),p)))) I »AMS(nag_th_13). X -P(i(i(i(t.i(i(q,p).p)),r),i(i(n(p).q),r))) I »ANS(nag_th_14). X -P(i(i(n(p),q),i(i(q,p),p))) I »AMS(nag.tb_15).
X -PU(p.p)) I *ANS(nag_th_16). X -PUCp,iUCq,p),p>)> I $AMS(nag_th_17). -PU),i(q,l(p,r)))) I *ANS(nag.th.21). -P(i(i(q.r).i(i(p,q)>l(p,r)))) I $ANS(nag_th_22). X -P(l(i(i(q,i(p.r))la),l(l(p,i(q.r)).a))) I $ANS(nag_th_23). X -P(i(i(i(p,q),p),p)) I tAMS(nag_th_24). X -P(i(i(i(p,r),i),l(l(p.q),i(i(q.r),()))) I *ANS(nag_th.2S). X -P(i(i(l(p,q),r),i
% Following list can be used to purge unwanted formulas of various types.
list(demodulators).
(n(n(x)) = junk).
(n(n(n(x))) = junk).
(i(i(x,x),y) = junk).
(i(y,i(x,x)) = junk).
(i(n(i(x,x)),y) = junk).
(i(y,n(i(x,x))) = junk).
(i(junk,x) = junk).
(i(x,junk) = junk).
list(hot).
% The following clause is used with hyperresolution for condensed detachment.
-P(i(x,y)) | -P(x) | P(y).
% The following three are Luka 1 2 3.
P(i(i(x,y),i(i(y,z),i(x,z)))).
P(i(i(n(x),x),x)).
P(i(x,i(n(x),y))).
end_of_list.
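The proofs below tag each clause deduced with the help of this list as (heat-1). The command that activates the hot list is not visible in this extract; a hedged guess, consistent with those tags, is the default assignment:

assign(heat,1).  % assumption: heat level 1, matching the (heat-1) tags in the proofs that follow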
A 31-Step Proof of the Church Axiom System
Otter 3.0.2b+, Aug 1994. The job was started by wos on altair.mcs.anl.gov, Fri Mar 31 15:04:46 1995. The command was "otter302c". > EMPTY CLAUSE at 1.65 sec > 77 [hyper,4,49,71,46] $ANSWER(step_allBEH_Church_FL_18_35_49).
Length of proof is 21.
Level of proof is 15.
PHOOp 1 [] -P(i(x,y)> I -P(x) I P(y). 4 [] -P(i(p,i(q,p)>) I - P ( i ( i ( p , i ( q , r ) ) , i ( i ( p , q ) , i ( p . r ) ) ) ) I *ANSUEX(atap_allBER_Church_FL_18_35.49). 8 □ PU(l(x.y),i(i(y,z).i(x,z)))). 9 □ P(i(i(n(x),x).x>). 10 □ P ( i ( x , i ( n ( x ) , y ) ) ) . 28 [] - P ( i ( x , y ) ) I -P(x) I P(y). 26 □ P ( i ( i ( x , y ) . i ( i ( y . x ) , l ( x , z ) ) ) ) . 27 □ P ( l ( l ( i ( i ) , i ) , t » . 28 □ P ( i ( x , i ( n ( x ) , y ) ) ) . 29 33 34 35 36 37 38 39 40 42 46 46 48 49 51 53 56 60 65 69 71 77
| -P(i(i(n(p),n(q)),i(q,p)))
[hypar,1.8,8] P ( i ( i ( i ( i ( x , y ) , i ( z , y ) ) . i : ) , i ( i ( z , x ) ,u))) . [hypar. 1,8.10] P ( i ( i ( i ( n ( x ) ,y) , x ) , i ( x , « ) ) ) . [hypar,1,10,9] P ( l ( n ( i ( i ( n ( x ) , x ) , x ) ) , y ) ) . (haat-1) thypar,26,26,34) P ( i ( l ( x , y ) , i ( n ( i ( i ( n ( x ) , x ) , z ) ) , y ) ) ) . [hypar.1,29,29] P ( i ( l ( x , i ( y , x ) ) , i ( i ( u , y ) , i ( x , i ( u , z ) ) ) ) ) . (haat-1) Chypar.25.36,27] P ( i ( i ( x , y ) . i ( i ( n ( i ( y , z ) ) , i ( y , z ) ) , i ( x . z ) ) ) ) . [hypar,1,33,35] P ( i ( x , l ( n ( l ( i ( n ( y ) , y ) , y ) ) , z ) ) ) . [hypar,1,36,37] P ( i ( l ( x , i ( n ( l ( y , z ) ) , i ( y , z ) ) ) , i ( i ( u , y ) , i < x , l ( u , z ) ) ) ) ) . [hypar.l.39,38] P ( i ( l ( x , i ( n ( y ) , y ) ) , i ( z , i ( x , y ) ) ) ) . [hypar,1,29,40] P a ( i < n < x ) , y ) , i ( x , i ( i ( y , x ) , x ) ) ) ) . [hypar,1,39,42] P ( i ( i ( x , i ( y , z ) ) , i ( i ( n ( z ) , y ) , i ( x , z ) ) ) ) . (haat-1) [hypar,26,46,28] P ( i ( i ( n ( x ) , n ( y ) ) , i ( y , x ) ) ) . [hypar,1,36,45] P ( i ( i ( x , i ( n ( y ) , z ) ) , i ( l ( u , i ( z , y ) ) , i ( x , i ( u , y ) ) ) ) ) . [hypar,1,33,46] P ( i ( x , l ( y , x ) ) ) . (haat-1) [hypar,25,26,49] P ( i ( i ( i ( x , y ) , z ) , i ( y , z ) ) ) . [hypar,1,29,48] P ( i ( l ( n ( x ) , y ) , i ( i ( z , i ( u , x ) ) . i ( i ( y , u ) , i ( z , x ) ) ) ) ) . [hypar,1,51,46] P ( i ( n ( x ) , i ( x , y ) ) ) . [hypar,1,45,55] P ( i ( l ( n ( x ) , y ) , l ( n ( y ) , x ) ) ) . [hypar,1,60,55] P ( i ( n ( i ( x , y ) ) , x ) ) . [hypar,1,63,65] P ( i ( i ( x , i ( y . l ( z , u ) ) ) , i ( i ( z , y ) , i ( x , i ( z , u ) ) ) ) ) . (haat-1) [hypar,26,69,26] P ( i ( i ( x , i ( y , z ) ) , i ( i ( x , y ) , i ( x , z ) ) ) ) . [hypar,4,49,71,46] tANSVER(atap_allBEH.Church_FL.18_3S_49) .
A 28-Step Proof of the Frege Axiom System
Otter 3.0.2b+, Aug 1994. The job was started by wos on altair.mcs.anl.gov, Wed Mar 22 11:12:34 1995. The command was "otter302c". > EMPTY CLAUSE at 1.96 sec > 82 [hyper,2,44,72,60,67,77,46] $ANSWER(step_allFrege_18_35_39_40_46_21). Length of proof is 28.
Level of proof is 17.
PROOF
1 [] - P ( i ( x , y ) ) | -P(x) I P(y). 2 [] - P ( i ( p , i ( q . p ) ) ) I - P ( i ( i ( p , l ( q , r ) ) , l ( l ( p , q ) , i ( p , r ) ) ) ) I -P(i(n(n(p)>,p)) I -P(i(p,n(n(p)))) I - P ( i ( i ( p , q ) . l ( n ( q ) , n ( p ) ) ) ) I - P ( l ( l ( p , i ( q , r ) ) , l ( q , i ( p , r ) ) ) ) I $ANSWER(step_allFrege_18_3S_39_40_46_21). 8 [] P ( i ( i ( x , y ) , i ( i ( y , z ) . l ( x , z ) ) ) ) . 9 [] P ( i ( i ( n ( x ) , x ) , x ) ) . 10 □ P ( i ( x , l ( n ( x ) , y ) ) ) . 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 44 46 51 53 54 58 60 65 67 69 72 77 82
[hyper,1,8,8] P ( i ( i ( i ( l ( x , y ) , l ( z , y ) ) , u ) , i ( i ( z , x ) , u ) ) ) . [hyper,1,8,10] P ( l ( i ( i ( n ( x ) , y ) , z ) , l ( x , z ) ) ) . [hyper,1,10,9] P ( i ( n ( i ( l ( n ( x ) , x ) , x ) ) , y ) ) . Chyper.1,25,26] P ( i ( l ( x , l ( y , z ) ) , i ( l ( u , y ) , l ( x , i ( u , z ) ) ) ) ) . [hyper, 1,25,8] PU(i(x.y> , l ( l ( l ( x , z ) , u ) , l ( l ( y , z ) , u ) ) ) ) . [hyper,1,29,30] P ( l ( i ( x , l ( l ( y , z ) , u ) ) , l ( l ( y , v ) , l ( x , i ( i ( v , z ) , u ) ) ) ) ) . [hyper,1,30,28] P ( i ( l ( l ( n ( i ( l ( n ( x ) , x ) , x ) ) , y ) , z ) , i ( i ( u , y ) , z ) ) ) . Chyper.1,32,9] P ( i ( i ( x , l ( i ( n ( y ) ,y) , y » ,i(i(n
A 23-Step Proof of the Hilbert Axiom System
Otter 3.0.2b+, Aug 1994. The job was started by wos on altair.mcs.anl.gov, Fri Mar 31 17:12:56 1995. The command was "otter302c". > EMPTY CLAUSE at 2.43 sec > 87 [hyper,3,50,53,56,10,81,83] $ANSWER(step_allHilbert_18_21_22_3_54_30). Length of proof is 23. PROOF
Level of proof is 15.
1 [] - P U U . y ) ) I -P(x) I P(y>. 3 [] -PCi(p.iCq.p))) I - P ( l ( l ( p , i ( q . r ) ) . i ( q , i ( p , r ) ) ) ) I - P ( i ( i ( q , r ) , i ( i ( p . q ) , i ( p . r ) ) ) > I -P(i(p,i(n
[hypar,1,8.8] P U ( i ( i ( i ( x , y ) , i ( z , y ) ) ,u) , i ( l ( z , x ) ,u))) . Cb.yp»r,l,8,10] P ( i ( i ( i ( n ( x ) , y ) , z ) , i ( x , z ) ) ) . [hyper, 1,10,9] P ( i ( n ( l ( l ( n ( x ) , x ) , x ) ) , y ) ) . ( h u f l ) [hyper.25.33,27] P ( l ( x , x ) ) . (haat-1) [hypar. 25,26,34] P ( l ( l ( x , y ) , l ( n ( l ( l ( n ( z ) ,z) , z ) ) , y ) ) ) . [hyper,1,29,29) P ( i ( i ( x , i ( y , z ) ) , i ( i ( u , y ) , l ( x , i ( u , z ) ) ) ) > . (heat-1) [hypar.25,37,27] P ( i ( i ( x , y ) , i ( i ( n ( i < y , z ) ) , i ( y , z ) ) , l ( x , z ) ) ) ) . [hyper.l,33,36] P ( i ( x , i ( n ( i ( i ( n ( y ) , y ) , y ) ) , x ) ) ) . [hyper,1,37,38] P ( i ( l ( x , l ( n ( l ( y , z ) ) , i ( y , z ) ) ) , i ( i ( u , y ) , l ( x , i ( u , z ) ) ) ) ) . [hypar,1,40,39] P ( l ( l ( x , l ( n ( y ) , y ) ) , l ( z , i ( x , y ) ) ) ) . [hyper,1,29,41] P ( i ( i < n ( x ) , y ) , i ( z , i ( i ( y , x ) , x ) ) ) ) . [hyper.1,40,43] P ( i ( l ( x , l ( y , z ) ) , l ( l ( n ( z ) , y ) , l ( x , z ) ) ) ) . ( h u f l ) [hyper.26.44.28] P ( i ( i ( n ( x ) , n ( y ) ) , l ( y , x ) ) ) . [hyper,1,37,44] P < l ( i < x , l ( n < y ) , z » , l ( i ( u , i ( z , y ) ) , l ( x , i ( u , y ) ) ) ) ) . [hyper,1,33,45] P ( l ( x , l ( y , x ) ) ) . [hyper,1,47,60] P ( i ( i ( x , i ( y , z ) ) , i ( y , i ( x , z ) ) ) ) . (heat-1) [hyper, 25,53,28] P ( i ( n ( x ) , i ( x , y » ) . (heet-1) [hyper, 25, S3,26] P U ( i ( x , y ) , i ( i ( z , x ) , i ( z , y ) ) » . [hyper,1,44,65] P ( i ( i ( n ( x ) , y ) . i ( n ( y ) . x ) ) ) . [hyper,1,47,63] P ( i ( l ( x , i ( y , z ) ) , i ( i ( n ( y ) , z ) , i ( x , z ) ) ) ) . [hyper,1,70,36] P ( i ( i ( n ( x ) . y ) . i ( l ( x , y ) , y ) ) ) . [hyper,1,53,78] P ( i ( i ( x , y ) . i ( i ( n ( x ) , y ) , y ) ) ) . [hyper,1,78,65) P ( i ( i ( x , l ( x , y ) ) , i ( x , y ) ) ) . [hyper,3,50,53,56.10,81,83] llNSVER(stap_allHilbart_18_21_22_3_54_30).
A 24-Step Proof of the Alternate Lukasiewicz Axiom System
Otter 3.0.2b+, Aug 1994. The job was started by wos on altair.mcs.anl.gov, Fri Mar 31 18:33:12 1995. The command was "otter302c". > EMPTY CLAUSE at 2.64 sec $ANSWER(step_allLuka_x_19_37_59).
> 93 [hyper,5,51,63,85]
Length of proof is 24. Level of proof is 15. PROOF
1 [] -P(i(x,y)) | -P(x) | P(y).
10 [] P(i(x,i(n(x),y))).
25 [] -P(i(x,y)) | -P(x) | P(y).
26 [] P(i(i(x,y),i(i(y,z),i(x,z)))).
27 [] P(i(i(n(x),x),x)).
28 [] P(i(x,i(n(x),y))).
29 [hyper,1,8,8] P(i(i(i(i(x,y),i(z,y)),u),i(i(z,x),u))).
33 [hyper,1,8,10] P(i(i(i(n(x),y),z),i(x,z))).
34 [hyper,1,10,9] P(i(n(i(i(n(x),x),x)),y)).
35 36 37 38 39 40 42 44 45 47 49 61 54 55 57 68 63 67 72 78 85 93
(heat-1) [hyper,25,26,34] P ( l ( l ( x , y ) , i ( n ( i ( i ( n ( z ) , z ) , z ) ) , y ) ) ) . [hyper,1.29,29] P ( i ( l ( x , l ( y , z ) ) , l ( l ( u , y ) , i ( x . i ( u , z ) ) ) ) ) . (heat-1) [hyper.25,36,27] P < i ( i ( x , y ) , i < i ( a ( l ( y , z ) ) , l ( y , z ) ) , l ( x , z ) ) ) ) . [hyper,1,33,35] P ( i ( x , i ( n ( i ( i ( n < y ) , y ) , y ) ) , z ) > ) . [hyper,1.36.37] P ( i ( i ( x , l ( n ( i ( y , z ) ) , i ( y , z ) ) ) , l ( l ( u , y ) , i ( x , i ( u , z ) ) ) ) ) . [hyper,1,39,38] P ( i ( i ( x , i ( n ( y ) , y ) ) , l ( z , i ( x , y ) ) ) ) . [hyper,1,29,40] P ( i ( i ( n ( x ) , y ) , i ( z , i ( i ( y . x ) . x ) ) ) ) . [hyper,1,39,42] P ( i ( i ( x , i ( y , z ) > , i ( i ( n ( z ) , y ) , i ( x , z ) ) ) ) . (heat-1) [hyper,25,44.28] P(l(i(n(x) ,n(y)) , l ( y , x ) ) ) . [hyper, 1,36,44] P ( i ( i ( x , i ( n ( y ) , z ) ) , i ( i ( u , i ( z , y ) ) , i ( x . i ( u , y ) ) ) ) ) . Chypar.1,33,45] P ( i ( x , i ( y , x ) ) ) . (heat-1) [hyper.25,26.49] P ( i ( l ( l ( x , y ) , z ) , i ( y , z ) ) ) . [hyper,1,47,42] P ( l ( i ( x , i ( l ( i ( y , z ) , z ) , u ) ) , l ( l ( n ( z ) , y ) , i ( x , u ) ) ) ) . [hyper,1,47,49] P ( i ( i ( x , i ( y . z ) ) , l ( y , l ( x , z ) ) ) ) . (heat-1) [hyper,25,55,28] P ( l ( n ( x ) , i ( x , y ) ) ) . (heat-1) [hyper,26.55,26] P ( l ( l ( x , y ) , i ( l ( z , x ) , i ( z , y ) ) ) ) . [hyper,1,8,57] P ( i ( i ( l ( x , y ) , z ) , i ( n ( x ) , z ) ) ) . [hypar,1,8,58] P ( i ( i ( i ( i ( x , y ) , i ( x , z ) ) , u ) , l ( i ( y , z ) , u ) ) ) . [hypar,1,47,63] P ( i ( i ( x . i ( y , z ) ) , i ( i ( i ( z , u ) , y ) , i ( x , z ) ) ) ) . Chypar.1,67,72] P ( l ( i ( x . y ) , l ( l ( i ( y , z ) , u ) . l ( i ( u . x ) , y ) » ) . Chyp«r,l,54,78] P ( i ( i ( n ( x ) , y ) , i ( i ( z , y ) , i ( i ( x , z ) , y ) ) ) ) . [hypar.5,51,63,85] $*NSWER(lt«p.allLuka.x.l9.37.59).
A 25-Step Proof of the Wos Axiom System
Otter 3.0.2b+, Aug 1994. The job was started by wos on altair.mcs.anl.gov, Fri Mar 31 20:47:12 1995. The command was "otter302c". > EMPTY CLAUSE at 3.61 sec $ANSWER(step_allWos_x_19_37_60).
Length of proof is 25.
> 110 [hyper,6,61,63,101]
Level of proof is 16.
PROOF
1 [] - P ( i ( x , y ) ) I -P(x) I P(y). 6 [] - P ( l ( i ( i ( p . q ) . r ) , i ( q , r ) ) ) I - P ( l ( l ( l ( p , q ) , r ) , i ( n ( p ) , r ) ) ) I - P ( i ( i ( s , i ( n ( p ) , r ) ) , i ( s , i ( i ( q , r ) , i ( l ( p , q ) , r ) ) ) ) ) I *ANSVER(step_allVos_x_19_37_60). 8 [] P ( l ( l ( x , y ) , i ( l ( y , z ) , i ( x , z ) ) ) ) . 9 [] P ( i ( i ( n ( x ) , x ) , x ) ) . 10 □ P ( l ( x , l ( n ( x ) , y ) ) ) . 25 □ - P ( i ( x , y ) ) I -P(x) I P(y). 26 □ P ( l ( l ( x , y ) , i ( i ( y , z ) . i ( x . z ) ) ) ) . 27 [] P ( i ( i ( n ( x ) , x ) , x ) ) . 28 0 P ( i ( x , i ( n ( x ) , y ) ) ) . 29 33 34 36 36 37 38 39 40 42 44 45 47 49 51 54 55
[hyper,1,8,8] P ( i ( i ( l ( l ( x , y ) , l ( z , y ) ) , u ) , l ( i ( z , x ) , u ) ) ) . [hyper,1,8,10] P ( i ( l ( i ( n ( x ) , y ) , z ) , l ( x , z ) ) ) . [hyper,1,10,9] P ( l ( n ( l ( i ( n ( x ) , x ) , x ) ) , y ) ) . Cheat-1) [hyper,25,26,34] P ( i ( l ( x , y ) , l ( n ( l ( i ( n ( z ) , z ) , z ) ) , y ) ) ) . [hyper,1,29,29] P ( l ( i ( x , i ( y , z ) ) , l ( l ( u , y ) , i ( x , i ( u , z ) ) ) ) ) . (heat-1) [hyper,25,36,27] P ( i ( i < x , y ) , i ( l < n ( l ( y , z » , i < y , z ) ) , l ( x , z ) ) ) ) . [hyper,1,33,35] P ( i ( x , l ( n < i ( i ( n ( y ) , y ) , y ) ) , z ) ) ) . [hyper, 1,36,37] P U ( l ( x . i ( n ( l ( y , z ) > , i ( y , z ) ) ) , i ( i ( u , y ) , i ( x , i ( u , z ) ) ) ) ) . [hyper,1,39,38] P ( l ( l ( x , l ( n ( y ) , y ) ) , i ( z , i ( x , y ) ) ) ) . [hyper,1,29,40] P ( i ( l ( n ( x ) , y ) , i ( z , l ( l ( y , x ) , x ) ) ) ) . [hyper,1,39,42] P ( l ( l ( x , i ( y , z ) ) , l ( l ( n ( z ) , y ) , i ( x , z ) ) ) ) . (heat-1) [hyper,25,44,28] P ( l ( l ( n ( x ) , n ( y ) ) , i ( y , x ) ) ) . [hyper, 1,36,44] P ( l ( l ( x , i ( n ( y ) , i ) ) , i a ( u , i ( z , y ) ) , i ( x , i ( u , y ) ) ) ) ) . [hyper,1,33,45] P ( i ( x , i ( y , x ) ) ) . (heat-1) [hyper,25,26.49] P ( i ( i ( i ( x , y ) , z ) , i ( y , z ) ) ) . [hyper,1,47,42] P ( i ( i ( x , i ( i ( i ( y , z ) , z ) , u ) ) , i ( i ( n ( z ) , y ) , i ( x , u ) ) ) ) . [hyper,1,47,49] P ( i ( i ( x , i ( y , z ) ) , i ( y , l ( x , z ) ) ) ) .
67 ( h M t - l ) [hyp«r,25,55,28] P(i(n(x) , l ( x , y ) ) ) . 58 (h«at-l) Chyp«r.25,68,26] P U ( l ( x , y ) , i ( i ( z , x ) , i ( z , y ) ) ) ) . 63 fcypwr, 1,8,87] P ( i ( i ( i ( x , y ) , z ) , i < n ( x ) , z ) ) ) . 67 [hypw.l,8,660 P ( i ( i ( i ( i ( x , y ) , i ( x , z ) ) , u ) , l ( i ( y , z ) , u ) ) ) . 73 [hyptr.1,47,63] P ( i ( i ( x , i ( y , z > ) , i ( i ( i ( z , u ) , y ) , i ( x , z ) ) » . 81 thyp«:,l,67,73] P ( l ( i ( x , y ) , i ( i ( i ( y . z ) , u ) . i ( i ( u . x ) , y ) > > ) . 89 Chyp.r,l,M,81] P ( i ( l ( n ( x ) , y ) , i ( i ( z , y ) , l ( i ( x , z ) , y ) ) ) ) . 101 [hyp«r,l,58,89] P ( l ( i ( x , l ( n ( y ) , z ) ) , l ( z , i ( l ( u , z ) , i ( i ( y , u ) , z ) ) ) ) ) . 110 Chyp*r,6,61,63,101] *AMSVER(«t«p_*llVo»_i_19_37_60).
References
1. Kalman, J., A shortest single axiom for the classical equivalential calculus, Notre Dame J. Formal Logic 19 (1978) 141-144.
2. Kalman, J., Condensed detachment as a rule of inference, Studia Logica 42 (1983) 443-451.
3. Lukasiewicz, J., Elements of Mathematical Logic, Pergamon Press: Oxford, 1963.
4. McCharen, J., Overbeek, R., and Wos, L., Complexity and related enhancements for automated theorem-proving programs, Computers and Mathematics with Applications 2 (1976) 1-16.
5. McCune, W., Otter 3.0, Preprint MCS-P399-1193, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, November 1993.
6. McCune, W., and Wos, L., Experiments in automated deduction with condensed detachment, in D. Kapur (ed.), Proceedings of the Eleventh International Conference on Automated Deduction (CADE-11), Lecture Notes in Artificial Intelligence, Vol. 607, Springer-Verlag, New York, 1992, pp. 209-223.
7. Padmanabhan, P., and McCune, W., Automated reasoning about cubic curves, Computers and Mathematics with Applications 29 (1995) 17-26.
8. Padmanabhan, P., and McCune, W., Single identities for ternary Boolean algebras, Computers and Mathematics with Applications 29 (1995) 13-16.
9. Wos, L., Automated Reasoning: 33 Basic Research Problems, Prentice-Hall: Englewood Cliffs, NJ, 1987.
10. Wos, L., Automated reasoning and Bledsoe's dream for the field, in Automated Reasoning: Essays in Honor of Woody Bledsoe, R. S. Boyer (ed.), Kluwer Academic Publishers: Dordrecht, 1991, pp. 297-345.
11. Wos, L., The Automation of Reasoning: An Experimenter's Notebook with OTTER Tutorial, accepted for publication by Academic Press (1995).
12. Wos, L., The kernel strategy and its use for the study of combinatory logic, J. Automated Reasoning 10 (1993) 287-343.
13. Wos, L., Meeting the challenge of fifty years of logic, J. Automated Reasoning 6 (1990) 213-232.
14. Wos, L., The resonance strategy, Computers and Mathematics with Applications 29 (1995) 133-178.
15. Wos, L., Searching for circles of pure proofs, J. Automated Reasoning, accepted for publication (1995).
16. Wos, L., Overbeek, R., Lusk, E., and Boyle, J., Automated Reasoning: Introduction and Applications, 2nd edn., McGraw-Hill: New York, 1992.
The Hot List Strategy*
LARRY WOS and GAIL W. PIEPER
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439-4844, U.S.A. e-mail: wos@mcs.anl.gov; pieper@mcs.anl.gov
(Received: 15 September 1997; accepted: 13 October 1997)
Abstract. Experimentation strongly suggests that, for attacking deep questions and hard problems with the assistance of an automated reasoning program, the more effective paradigms rely on the retention of deduced information. A significant obstacle ordinarily presented by such a paradigm is the deduction and retention of one or more needed conclusions whose complexity sharply delays their consideration. To mitigate the severity of the cited obstacle, I formulated and feature in this article the hot list strategy. The hot list strategy asks the researcher to choose, usually from among the input statements characterizing the problem under study, one or more statements that are conjectured to play a key role for assignment completion. The chosen statements—conjectured to merit revisiting, again and again—are placed in an input list of statements, called the hot list. When an automated reasoning program has decided to retain a new conclusion C—before any other statement is chosen to initiate conclusion drawing—the presence of a nonempty hot list (with an appropriate assignment of the input parameter known as heat) causes each inference rule in use to be applied to C together with the appropriate number of members of the hot list. Members of the hot list are used to complete applications of inference rules and not to initiate applications. The use of the hot list strategy thus enables an automated reasoning program to briefly consider a newly retained conclusion whose complexity would otherwise prevent its use for perhaps many CPU-hours. To give evidence of the value of the strategy, I focus on four contexts: (1) dramatically reducing the CPU time required to reach a desired goal, (2) finding a proof of a theorem that had previously resisted all but the more inventive automated attempts, (3) discovering a proof that is more elegant than previously known, and (4) answering a question that had steadfastly eluded researchers relying on an automated reasoning program. I also discuss a related strategy, the dynamic hot list strategy (formulated by my colleague W. McCune), that enables the program during a run to augment the contents of the hot list. In the Appendix, I give useful input files and interesting proofs. Because of frequent
*This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38. Reprinted from the Journal of Automated Reasoning, Vol. 22, 1-44, 1999, with kind permission from Kluwer Academic Publishers.
requests to do so, I include challenge problems to consider, commentary on my approach to experimentation and research, and suggestions to guide one in the use of McCune's automated reasoning program OTTER. Key words: automated reasoning programs, hot list strategy, OTTER.
1. Paradigms and Motivation
Two distinctly different paradigms exist for automating logical reasoning. A key difference between the two paradigms concerns whether to accrue new deduced conclusions. Perhaps the most effective approaches where no new information is retained are based on Prolog technology. Among the more effective approaches in which new information is accrued are the computational logic paradigm and the clause language paradigm [17, 18, 21]. The former provides the basis for the Boyer-Moore program [1, 2] (mainly used for program verification), and the latter (in the Argonne variant) provides the basis for McCune's OTTER [7, 8]. Based on my own experiments spread over more than thirty years, and supported by comments from other researchers in automated reasoning, I currently have little doubt that an automated attack on deep questions and hard problems virtually requires the accrual of information—sometimes more than 300,000 deduced facts, relations, and lemmas. With the retention of so many deduced conclusions, however, the program needs a means for selecting where next to focus its attention. Typically, the researcher instructs the program to use either weighting (conclusion complexity) or level saturation. With level saturation, in the vast majority of cases, the size of the levels of retained conclusions grows so rapidly that the objective remains out of reach. Therefore, of the two cited direction strategies, weighting is ordinarily far more effective. However, with its use, one encounters a significant obstacle: A needed conclusion may have been retained, but, because of its complexity, the program delays focusing on it—sometimes forever. Why does the complexity of a conclusion have this effect? The answer rests with the fact that (1) the program typically chooses as the focus of attention retained conclusions by less complex first and (2) the program typically lacks effective means for looking ahead and identifying good choices from among very complex conclusions. To address just such an obstacle, I formulated the hot list strategy. 1.1. TYPES OF STRATEGY AND THE H O T LIST
For increasing the power of automated reasoning programs, the area of research that offers the greatest promise is strategy. My preference is for strategy that restricts reasoning, but clearly strategy that directs reasoning is also vital. The hot list strategy perhaps belongs in neither category. Its use rearranges the order in which conclusions are drawn.
Specifically, the hot list strategy asks the researcher to choose, from among the statements characterizing the question under attack, those that might be especially useful and merit revisiting again and again. The chosen statements are consulted by the program repeatedly, for completing (rather than initiating) applications of inference rules. Indeed, with the hot list strategy, members of the hot list are automatically and immediately considered with each newly retained clause, before another conclusion is chosen from list(sos) to be the focus of attention to drive the program's reasoning. 1.2.
A MOTIVATING EXAMPLE
Consider the following theorem from algebra. The theorem asserts that commutativity can be proved in rings in which, for every element x, the cube of x equals x. The mathematical proof begins with xxx = x and substitutes the sum v + w for x; in other words, instantiation is employed, an inference rule that is not offered by OTTER. (Instantiation should not be offered by a program, for currently no means is known for wisely applying it; in other words, one encounters an important difference between mathematics and logic on one hand and automated reasoning on the other.) The left side becomes the cube of v + w, and the right side is simply v + w; call this equation (1). After expanding and simplifying with the hypothesis xxx = x, the left side becomes v + w + vvw + vwv + vww + wvv + wvw + wwv. Call the resulting equation (2). Equation (3) is obtained from equation (2) by subtracting v + w from both sides. For equation (4), set v = w and simplify with the hypothesis xxx = x. One now has the key lemma 0 = 6x, taken from a proof shown to me by S. Winker. From the viewpoint of automated reasoning with paramodulation as the inference rule, the corresponding proof (in clause notation) begins by paramodulating (into the clause equivalent of) x(xx) = x, with the focus on the into term xx, from the clause equivalent of left distributivity, with the focus on the left-hand argument. After appropriate demodulation, the result is the clause equivalent of equation (2). No clause equivalent of equation (1) is produced, other than the intermediate result obtained by applying the unification to the into clause of the preceding paramodulation. Then, to obtain equation (4), the program can first deduce the clause equivalent of equation (3) by various means, for example, a nonstandard use of demodulation; see Section 2.2 showing how the hot list strategy can replace such an approach. At this point, one has an illustration of the obstacle under discussion: The clause equivalent of equation (3) has weight equal to 37, if measured purely in symbol count. Such a "heavy" clause will be delayed from consideration—if ever—for a substantial amount of CPU time, for many, many clauses will almost certainly be retained with weight less than 37. Therefore, the clause equivalent of equation (4), the desired lemma, will not be adjoined to the growing database of deduced conclusions for far, far too long.
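A minimal input file for the 0 = 6x lemma, in the clause notation just sketched, might look as follows. This is my own reconstruction under stated assumptions (the function names j, g, and f for sum, additive inverse, and product, and the parameter values, are illustrative), not the file used for the timings reported below.

set(para_into).
set(para_from).
assign(heat,1).
assign(max_weight,40).           % assumption: large enough to keep the weight-37 clause
list(usable).
x = x.
j(0,x) = x.                      % 0 is the additive identity
j(g(x),x) = 0.                   % g(x) is the additive inverse of x
j(j(x,y),z) = j(x,j(y,z)).       % sum is associative
j(x,y) = j(y,x).                 % sum is commutative
f(f(x,y),z) = f(x,f(y,z)).       % product is associative
f(x,j(y,z)) = j(f(x,y),f(x,z)).  % left distributivity
f(j(x,y),z) = j(f(x,z),f(y,z)).  % right distributivity
end_of_list.
list(sos).
f(x,f(x,x)) = x.                 % the special hypothesis: the cube of x is x
end_of_list.
list(hot).
f(x,f(x,x)) = x.                 % revisited with every newly retained clause
end_of_list.
list(passive).
j(a,j(a,j(a,j(a,j(a,a))))) != 0. % denial of the lemma 0 = 6x
end_of_list.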
This situation is not uncommon. Experiments repeatedly encounter the obstacle of the program's needing to consider some retained conclusion C whose complexity (weight) is so great that far too much CPU time is required before that conclusion is chosen as the focus of attention to initiate applications of some inference rule. When this occurs, the program is prevented from access to those conclusions that would otherwise be deduced. With the hot list strategy, however, the obstacle is far less formidable. Indeed, in the case under discussion, the obstacle is overcome. Specifically, when I attempted to obtain a proof of the 0 = 6x lemma with the hot list strategy, only 0.3 CPU-seconds were required, compared with approximately 17 CPU-seconds without the strategy.
1.3. A BRIEF GLIMPSE INTO HISTORY
Motivated by my wish to give a reasoning program the power to easily prove the cited lemma, 0 = 6x, in the context of the theorem that asserts that rings are commutative in the presence of xxx = x, I introduced the concept of the hot list in the mid-1980s. My notion focused on paramodulation and no other inference rule. Approximately a decade later (on November 2, 1993), McCune implemented the hot list strategy in OTTER 3.0. Significantly, he generalized my original notion by admitting the use of the hot list for all inference rules. He also generalized the notion of the hot list strategy by formulating a dynamic version to complement my static version; see Section 4.1. For historical interest, I now correct an error found in some of my earlier writings. Specifically, OTTER was not the first program to offer the hot list strategy. Rather, the strategy was first offered in the program ITP [6] in the mid-1980s. This information escaped me because of my lack of experimentation with ITP, in turn explained (in part) by the program being menu driven rather than file driven; I sharply prefer the latter.
1.4. AREAS BENEFITED AND CHALLENGES PRESENTED
To enable researchers to estimate the potential of the hot list strategy, I present (in Section 5) the results of various experiments in lattice ordered groups, in Robbins algebra, and in logic calculi. The evidence (given in Section 5) support ing the value of the hot list strategy focuses on four areas: dramatically reducing the CPU time required to produce proofs, finding proofs that had previously resisted all but the more inventive attacks, discovering proofs more elegant than had been known, and answering a question previously considered intractable for an automated reasoning program. To encourage researchers—especially mathematicians and logicians—to use the hot list strategy (as well as other strategies), I include (in the Appendix) var ious input files that are acceptable to the powerful automated reasoning program OTTER, and I include interesting proofs as well. (The input clauses and proofs
are also available on the Web; see the URL http://www.mcs.anl.gov/people/wos/hotlist-input.html.) Also, to increase the likelihood of success when using the hot list strategy, I include (in Section 6) diverse hints. Finally, I offer a challenge problem for researchers (see Section 7).
2. Relation to Other Strategies, Procedures, and Inference Rules
The hot list strategy shares several features with other strategies and inference rules implemented in the automated reasoning program OTTER. In this section, I focus first on the possible relevance of the set of support strategy, then (in order) on demodulation, AC-unification, and linked inference rules.
2.1. SET OF SUPPORT STRATEGY
Like the hot list, the set of support strategy was motivated by the study of a single (and very simple to prove) theorem: Commutativity can be proved for groups of exponent 2, those in which the square of x (for every element x) is the identity e. My notion (regarding the set of support strategy) was to restrict the applications of the inference rules in use and to force the search to key on information chosen by the researcher. To implement the strategy, the researcher places the key information in an input list, the initial set of support. Such is also the case for implementing the hot list strategy: The researcher places what is conjectured to be key information in an input list, the hot list. New conclusions for which the set of support strategy plays a role are recursively traceable to the initial set of support; new conclusions for which the hot list strategy plays a role are recursively traceable to the initial hot list. For a second way in which the two strategies are related, I have always recommended that the special hypothesis (clauses) be included in the initial set of support, and I typically recommend that such information also be placed in the initial hot list. For a third similarity, just as the newly retained conclusions in which the set of support strategy plays a role are added to the set of support list, so also can clauses be added to the hot list if the dynamic version of the hot list strategy, due to W. McCune, is in use (see Section 4.1 for details). Finally, the conclusions that are retained and that are traceable to the initial set of support can be used, without violating the set of support strategy, to deduce additional conclusions; such is also the case for clauses deduced with the hot list strategy, depending on the value assigned to the heat parameter. Of the cited similarities between the two strategies, perhaps the most im portant concerns keying the program's search on information selected by the researcher. As for a key difference, a member from the set of support is chosen to initiate an application of an inference rule (when the set of support strategy is in use); in
contrast, the members of the hot list come into play only after a clause has been chosen as the focus of attention to drive the program's reasoning. Indeed, the members of the hot list are used only to complete an application of an inference rule. For a second important difference, the main object of the set of support strategy is to restrict the program's reasoning; on the other hand, the object of the hot list strategy is to rearrange the order in which conclusions are drawn. 2.2.
DEMODULATION
Demodulation is used by many programs for simplification and canonicalization. A dramatic improvement in efficiency is often due directly to automatically applying various equalities (demodulators) to each deduced conclusion. In the mid-1980s, no doubt influenced by the effectiveness of using demod ulation, I conjectured that the automatic consideration by paramodulation of each newly retained clause with each member of a chosen set of equalities might also prove to be a powerful move. The chosen set of equalities would be placed on a list to be called the hot list. Of course, for the hot list strategy, two constraints on the actions of the program must be relaxed: (1) In contrast to demodulation, rather than requiring that no instantiation of variables be permitted in the into clause, full two-way unification must be permitted; and (2) in contrast to the usual use of inference rules, rather than having the program wait until the newly retained clause is chosen as the focus of attention, the program must be allowed to immediately use it as one of the parents in the attempt to draw additional conclusions. Indeed, with the hot list strategy, members of the hot list are automatically and immediately considered by paramodulation, if in use, and by any other inference rule in use (as McCune suggested by way of a generalization) with each newly retained clause, before another conclusion is chosen from list(sos) to be the focus of attention to drive the program's reasoning. As noted in Section 1.2, one can also use the hot list strategy to replace cer tain nonstandard uses of demodulation. In particular, the use of demodulation at the literal level can enable a program to apply extended cancellation. Instead, clauses that function as nuclei and that capture various types of cancellation can be placed in the (input) hot list, and hyperresolution can be used as one of the inference rules. Then, when the program decides to retain a new clause, before another clause is chosen as the focus of attention to initiate applications of inference rules, the clause will be processed with cancellation of the type present in the hot list. One might find this alternative more attractive than either (1) waiting for the clause to be chosen as the focus of attention to then be considered with included clauses for cancellation or (2) using demodulation in some usual form or some nonstandard form.
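For instance, a hedged sketch of this alternative might keep a left-cancellation nucleus in the hot list so that hyperresolution applies cancellation to every newly retained clause; the predicate EQ and the function f are illustrative assumptions, not taken from the article.

set(hyper_res).
assign(heat,1).
list(hot).
% Left cancellation, used only to complete inferences, never to initiate them.
-EQ(f(x,y),f(x,z)) | EQ(y,z).
end_of_list.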
2.3. AC-UNIFICATION
For this objective, one begins by including in the hot list a clause for associativity and a clause for commutativity, assigning the heat parameter the value 2, and invokes the use of paramodulation. Then, for each newly retained clause before another clause is chosen from list(sos) to be the focus of attention, the program will automatically apply paramodulation to the new clause together with that for associativity and also apply the inference rule to the new clause together with that for commutativity. For each newly retained clause, the two clauses that are so deduced will have heat level 1, and they will be retained depending on the other input parameters and subsumption and such. Then, because of the assignment of the value 2 to the heat parameter, the heat-level-1 clauses (under discussion) that are retained will each immediately be considered by paramodulation with associativity and also with commutativity. In the case under discussion, a limited form of associative-commutative unification is used. Of course, also in use in a limited way in this case is associative unification and commutative unification, at heat level 1 and at heat level 2. Also deduced at heat level 2 are clauses to which associativity has been applied twice and to which commutativity has been applied twice. Use of the hot list strategy in the described manner can produce clauses early in a run that, because of the reassociation and commuting of terms, admit further canonicalization. By choosing the appropriate assignment of the heat parameter, one has control over the amount of AC-unification that is used. I find this alternative to AC-unification appealing, for I have always been wary of a general and full use of that form of unification. Indeed, a full use of AC-unification can drown a program in unwanted conclusions; in general, for practical considerations, restrictions must be imposed. If associative unification without commutativity is desired, a clause for associativity is included in the hot list, and no clause for commutativity is included.
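A minimal sketch of the arrangement just described follows; the function symbol f and the choice of paramodulation flags are illustrative assumptions rather than settings taken from the article.

set(para_into).
set(para_from).
assign(heat,2).
list(hot).
f(f(x,y),z) = f(x,f(y,z)).  % associativity
f(x,y) = f(y,x).            % commutativity
end_of_list.

2.4. LINKED INFERENCE RULES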
Use of the hot list strategy can also (in effect) partially substitute for access to linked inference rules [11, 16]. Consider the case in which the clause C is deduced, the decision is to retain C, C has high weight, and, were it not for the fact that a particular term t in C was left associated, paramodulation would apply to C and a clause corresponding to the special hypothesis with the into term being the right association of t. With linked paramodulation, one could use associativity as a link to right associate t in C to then permit the result to unify appropriately with the special hypothesis clause. If one was using the hot list strategy with associativity and the special hypothesis clause in the hot list, and if the heat parameter was assigned the value 2, then the decision to retain C would immediately trigger the application of paramodulation to C and the clause for associativity. Then, if the decision was to retain the result,
immediately paramodulation would consider the reassociated version of C with the special hypothesis clause. Linked paramodulation does not work precisely as does the hot list strategy. Among the differences is that concerning so-called intermediate clauses. Specif ically, once the program begins an application of a linked inference rule, the weight of a clause that is temporarily deduced on the way to the linked conclu sion is ignored. Linked paramodulation cannot produce an intermediate clause whose weight prevents the continuation of the application of linked paramod ulation. In particular, in the example just discussed, the reassociation of the clause C because of using associativity as a link cannot prevent completion of the application of the linked inference rule because of the weight of the reasso ciated clause. In contrast, with the hot list strategy, the reassociated C must be retained in order to permit its consideration with the special hypothesis clause. (I am curious about the possible usefulness of considering each clause with mem bers of the hot list before the decision concerning retention is made.) A second difference concerns the possible use of demodulation. The intermediate clauses resulting from a partial application of a linked inference rule are not subject to demodulation, but their correspondents are with the hot list strategy.
3. Intuitive View of the Hot List Strategy
In the following sense, one sees that the use of the hot list strategy enables an automated reasoning program to "look ahead". Assume that paramodulation is the only inference rule in use and that a clause A has just been selected to be considered (by paramodulation) with each of the various clauses that have already been chosen as the focus of attention. Let H be a member of the hot list, and assume that the assignment to the appropriate input parameter (called heat) permits consideration of H with each new clause that the program decides to retain. Also assume that the program is choosing where next to focus its attention based purely on symbol count and that the weight (number of symbols) of A is 12. Let C be a clause with weight 25 that is deduced from A and some earlier-considered clause such that the program decides to retain C. Finally, assume that, prior to the decision to retain C, the consideration of A has resulted in the retention of ten new clauses each with weight 15. Ordinarily, without the intervention of the hot list strategy, the program would not consider applying paramodulation to C and another clause until after focusing first on the ten newly retained clauses each with weight 15. Quite likely the consideration of C would be delayed further, for the focus on the weight-15 clauses would probably result in the retention of additional clauses of weight less than 25. However—and here is how the use of the hot list strategy enables the pro gram to look ahead—before another clause is chosen as the focus of attention, paramodulation is applied to C and H. If the application yields a clause D with
weight 10 that is retained, then the program will have almost immediate access to the use of D for initiating applications of inference rules. Otherwise, without the hot list strategy and assuming that H was available to be used, the program would be forced to wait for D to be deduced when C is chosen as the focus of attention—if C is ever chosen. Indeed, as soon as A has completed its role as the focus of attention, D will be chosen to initiate applications of inference rules if all eligible clauses have weight greater than 10. In addition to the deduction of D, other low-weight clauses might be retained whose parents are C and H, and still others from C and some other member of the hot list. If the size of the hot list is small, the researcher need not in general worry about the program being forced to cope with an avalanche of clauses of the type under discussion. In effect, the program looks ahead to deduce just those immediate descendants of C whose other parent is a member of the hot list, assuming that the heat parameter is assigned the value 1. Such a deduced clause D has heat level equal to 1 (defined formally in Section 4). If the value the researcher assigns to the (input) heat parameter is 2, then after the program decides to retain a clause D with heat level 1 but before another clause is chosen as the focus of attention (in this case) paramodulation is applied to D and each member of the hot list, before that newly retained clause is used in its fullest to initiate applications of inference rules. Any clause that is so deduced has heat level equal to 2. Clauses of heat level 2 are immediate descendants of immediate descendants of a newly retained clause. If the program does retain clauses of heat level 2, then the program is looking even further ahead.
4. Formalism, a Powerful Option, and an Illustration
In this section, I give needed definitions, discuss parameters and options, and employ one or more of the notations acceptable to McCune's program OTTER. I then briefly turn to a discussion of the dynamic hot list strategy, and I close this section with an illustration of the use of the hot list strategy. When I use the phrase "clause or its equivalent", I am not restricting the definitions and terminology to OTTER-like programs. Rather, I intend that the appropriate adjustment(s) be made when the program in use requires input in some other form. Often, I use the term "clause" to mean "clause or its equivalent". DEFINITION, INITIAL AND EXTENDED HOT LIST. The initial hot list is a (possibly empty) set of clauses (or their equivalent) selected by the researcher and included in the input. Depending on the exercising of appropriate options, the initial hot list can be extended by adjoining new members to it during a run. Each member of the hot list (initial and extended) is eligible for automatic and immediate consideration to complete applications of the inference rules in
use, with the requirement that the application of one of those inference rules be initiated by focusing on a clause (or its equivalent) that the program has decided to retain. In particular, no inference rule is permitted to apply to a set of clauses all of which are members of the hot list. DEFINITION, HEAT LEVEL. The heat level of a clause is 0 if and only if no clauses of the hot list participate in the application of an inference rule; the heat level of a clause is 1 if and only if (1) clauses from the hot list participate and (2) the heat level of the clause initiating the application of the inference rule is 0; for n > 2, the heat level of a clause is n if and only if the heat level of the clause that initiates the application of the inference rule is n — 1. Regarding the eligibility of the members of the hot list, the (input) parameter known as heat must be assigned a value greater than or equal to 1 for members to be eligible for use. In other words, permission must be given to deduce clauses with heat level greater than or equal to 1. To instruct OTTER to attempt to deduce clauses with heat level 1, one adds a single command to the input file. If the value in the command of the following type is equal to 1, after OTTER decides to retain a newly generated clause A but before another clause is chosen as the focus of attention (to drive the program's reasoning), each inference rule in use is applied to A (as if A were the focus of attention) and the appropriate number of clauses H in the hot list, where that number is determined by the inference rule being applied. assign(heat,1). The default of the parameter heat is 1. For example, if paramodulation is being applied, then A is considered with each H in the hot list; if hyperresolution is being applied, then, consistent with the requirements of nucleus and satellites, all subsets of clauses from the hot list are considered with A. Any clause B deduced from A and one or more clauses H is treated as all deduced clauses are treated (with regard to subsumption, weighting, demodulation, and the like). If such a clause B is used in a proof, the proof will show for that clause (heat=l), meaning that its heat level is 1. To enable OTTER to deduce and possibly use clauses whose heat level equals 2, one modifies the preceding command to be the following. assign(heat,2). With this modified command, after OTTER decides to retain a newly generated clause B whose heat level equals 1 but before another clause is chosen as the focus of attention, each inference rule in use is applied to B (as if B were the focus
of attention) and the appropriate number of clauses H in the hot list, where that number is determined by the inference rule being applied. As expected, any clauses that are deduced whose heat level equals 2 are treated as all deduced clauses are treated.
Of course, one can assign to the heat parameter values greater than 2. An assignment of the value 0 instructs the program not to consult the hot list.
4.1. DYNAMIC HOT LIST STRATEGY
As a powerful option, OTTER can also be instructed to dynamically adjoin new clauses to the hot list during the run. This extension of the hot list, developed by McCune, relies on a command of the following type. assign(dynamic_heat_weight, 20). With this command as part of the input, OTTER will—during a run—adjoin to list(hot) any clause that (1) the program has decided to retain and (2) the program has assigned a weight less than or equal to 20. (Clauses adjoined to list (hot) during a run must have an assigned weight less than or equal to the max-weight currently in use.) The dynamic hot list strategy shares some similarity with the set of support strategy. Specifically, one expects or intends that clauses dynamically adjoined to list(hot) play a key role in a program's attack on the question or problem under study, just as one wishes, ideally, that clauses dynamically adjoined to list(sos) play a key role. Of course, as experimentation repeatedly shows, the ideal case (for the set of support strategy) is not even approximated: Typically, a few CPU-minutes suffices to produce a large and growing list(sos), many of whose members will never be chosen as the focus of attention to drive the program's reasoning. The most common cause for a clause not being chosen as the focus of attention is its high weight or complexity. Perhaps in the future, this deficiency will be sharply reduced by having the program automatically (and possibly self-analytically) move certain clauses from list(sos) to list (usable) before they are chosen as the focus of attention; see Section 13.4 of [21]. List(sos) is the name of the list of clauses that have not yet been chosen as the focus of attention but are recursively traceable to the initial set of support or were in the initial set of support. List(usable) consists of the input clauses that were part of the problem description but not placed in the initial set of support and also the clauses that were selected from list(sos) to be the focus of attention to drive the program's reasoning. However, an immediate move of a clause to list(usable) will permit its use only for inference rule completion, not for inference rule initiation. 4.2. AN ILLUSTRATION
For an example of a somewhat elaborate use of the hot list strategy, consider the following excerpt from an input file, where "|" denotes logical or, "-" denotes logical not, the predicate P can be interpreted as "provable", the function i can be interpreted as "implication", and the function n as "negation". (When a line contains a "%", the characters from the first "%" to the end of the line are treated by the program as a comment.)
assign(heat,3).
list(hot).
% Following is for condensed detachment.
-P(i(x,y)) | -P(x) | P(y).
% Following is Meredith's single axiom.
P(i(i(i(i(i(x,y),i(n(z),n(u))),z),v),i(i(v,x),i(u,x)))). % CN-CAM
% Following were proved in temp.otter3.meredith.hot.out
P(i(x,i(y,x))).
P(i(i(i(x,y),z),i(y,z))).
P(i(x,i(n(x),y))).
P(i(n(n(x)),x)).
P(i(n(n(x)),x)).
P(i(i(i(x,y),z),i(n(x),z))).
P(i(x,n(n(x)))).
P(i(x,n(n(x)))).
P(i(i(n(x),x),x)).
P(i(i(x,i(x,y)),i(x,y))).
end_of_list.
The theorem under study asserts that Meredith's axiom is a single axiom for two-valued sentential (or propositional) calculus; see [10]. I give Meredith's proof in the Appendix. As one sees, the inference rule used in the study is condensed detachment, used by Kalman in his landmark study of equivalential calculus [4, 5]; hyperresolution is used. Typically, in such investigations, the only clauses used to initiate applications of an inference rule are unit clauses. Therefore, for the hot list strategy to be usable, one must include at least one nucleus in the hot list; the cited nucleus is the only such clause in the study. After all, as commented earlier, with the hot list strategy, all but the initiating clause for the application of each inference rule must be members of the hot list. Similarly, because the nucleus contains two negative literals, the hot list must also contain at least one other positive clause to permit the use of the hot list strategy to complete applications of condensed detachment through the use of hyperresolution. In the given example, I included in the hot list Meredith's axiom and various members of known axiom systems, each of which had been proved in a prior experiment. The assignment of the value 3 to the heat parameter instructs OTTER to attempt to deduce clauses with heat level less than or equal to 3. Whether any clauses that are thus deduced are retained depends, of course, on other parameters, such as the max.weight parameter (which places an upper bound on the weight of a retained clause).
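A hedged sketch of how the excerpt above might sit inside a complete input file follows; the flag settings, the max_weight value, and the sample target in list(passive) are my assumptions and are not taken from the article.

set(hyper_res).
assign(max_weight,20).   % assumption
assign(heat,3).
list(usable).
-P(i(x,y)) | -P(x) | P(y).   % condensed detachment
end_of_list.
list(sos).
P(i(i(i(i(i(x,y),i(n(z),n(u))),z),v),i(i(v,x),i(u,x)))).   % Meredith's single axiom
end_of_list.
list(passive).
-P(i(i(p,i(q,r)),i(i(p,q),i(p,r)))) | $ANS(sample_thesis).  % one sample member of a known axiom system
end_of_list.
% list(hot) is exactly the excerpt shown in Section 4.2.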
5. Experiments
At this point, I present evidence of the power of the hot list strategy by dis cussing various experiments, comparing results with and without the use of this strategy. The experiments focus on four separate contexts: (1) reducing the amount of CPU time to complete a proof, (2) producing a proof of a theorem previously out of reach without much intervention by the researcher, (3) finding elegant proofs (in terms of length), and (4) answering challenging questions. 5.1. REDUCING THE CPU
TIME
In this subsection, I demonstrate that using the hot list strategy can reduce CPU time. However, since no panacea exists, in Section 5.3 one sees that using the hot list strategy can increase CPU time. I begin with a theorem concerned with lattice ordered groups. The theorem was brought to my attention by I. Dahn at the 1994 QED workshop. The the orem (which I call LOGTi for Lattice Ordered Groups Theorem 1) asks one to prove that, for each element x in a lattice ordered group, x is equal to the prod uct of its positive part pp(x) and its negative part np(x), where 1 is the identity of the group, pp(x) is the union of x and 1, and np(x) is the intersection of x and 1. The following (in another notation acceptable to OTTER) gives the axioms, needed definitions, and various members of a complete set of reductions for groups. Regarding the notation in the following, i denotes inverse, 1 the group identity, u union, n intersection, pp positive part, and np negative part; !» denotes "not equal". (The anomaly of having associativity of both union and intersection expressed as they are results from the manner in which I was given the problem; OTTER, when processing the input, interchanges their respective arguments so that the left-associated argument is on the left.) The significance of the line of dashes in the following will become clear when I focus on the set of support strategy. x * x. ( x * y ) * z - x* ( y * z ) . 1*X -
X.
X*l
X.
-
i(x)*x - 1. x*i(x) - 1. i ( l ) • 1. i(i(x)) - x. i(x*y) - i(y)*i(x). n(x,x) » x. u(x,x) ■ x. n(x,y) ■ n(y,x). u(x,y) - u(y,x). n(x,n
u(x,u(y,z)) - u(u(x,y),z). u ( n ( x , y ) , y ) - y. n ( u ( x , y ) , y ) » y. x*u(y,z) « u ( x * y , x * z ) . x*n(y,z) = n(x*y,x*z). u(y,z)*x « u(y*x,z*x). n(y,z)*x « n(y*x,z*x). pp(x) = u ( x , l ) . np(x) = n ( x , l ) . pp(a)*np(a) != a. Dahn noted that the theorem had been proved in 30 CPU-seconds on a SPARCstation-10 by a program called Discount. (Actually, in addition to the SPARCstation-10, simultaneously two other computers were used, each a SPARCstation-2; after 30 CPU-seconds, Discount announced the theorem provable, but an additional 90 CPU-seconds was required to return the proof.) For my study of LOGTl with OTTER, my colleague McCune in part paved the way; he chose a Knuth-Bendix approach with the following symbol ordering. lex([l,a,u(_,_),n(_,_),*(_,_),i(_),pp(_),np(_)]). McCune assigned max.weight the value 15 (for a bound on the complexity of retained conclusions, measured in symbol count) and assigned the value 4 to the pick-given-ratio. (This latter assignment instructs OTTER to focus on four conclusions based on their complexity, one by first come first serve, then four, then one, and the like.) Finally, McCune placed all clauses in list(sos), in effect instructing OTTER not to use the set of support strategy. On a SPARCstation2, OTTER found a proof (given in the Appendix), of length 33, in approximately 19,280 CPU-seconds. Following McCune's lead, I then took up the attack, using a SPARCstation10 (approximately two times faster than the SPARCstation-2). I report here (not in chronological order) the results of two experiments. In the first, I chose for the hot list precisely the positive clauses in the initial (input) set of support—the clauses given earlier that follow the line of dashes— assigned the heat parameter the value 1. The experiment produced a proof (given in the Appendix), of length 32, in approximately 3148 CPU-seconds. When the run was terminated, after choosing as the focus of attention 3122 clauses, 12,187 (of which 1714 were hot) were retained and 4,670,281 (of which 61,980 were hot) were generated. This experiment provides the inexperienced researcher with a simple rule to follow for choosing clauses for the hot list and for assigning the heat parameter. The second experiment provided more dramatic evidence of the power of the hot list in reducing CPU time. The hot list consisted of the members of the set
of support used in the just-cited experiment (the positive clauses found after the line of dashes given earlier) augmented by the clauses for commutativity of union and of intersection and the clauses for left and right inverse. I assigned a value of 2 to the heat parameter. The experiment produced a proof, of length 42, in approximately 347 CPU-seconds.
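In OTTER's input notation, and consistent with the list(hot) given with the LOGT1 input file in the Appendix, the second experiment amounts to roughly the following sketch (the comments here are added for the reader).

assign(heat, 2).
list(hot).
n(x,y) = n(y,x).            % commutativity of intersection
u(x,y) = u(y,x).            % commutativity of union
i(x)*x = 1.                 % left inverse
x*i(x) = 1.                 % right inverse
u(n(x,y),y) = y.
n(u(x,y),y) = y.
x*u(y,z) = u(x*y,x*z).
x*n(y,z) = n(x*y,x*z).
u(y,z)*x = u(y*x,z*x).
n(y,z)*x = n(y*x,z*x).
pp(x) = u(x,1).
np(x) = n(x,1).
end_of_list.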
5.2. BRINGING A THEOREM WITHIN RANGE
In the preceding subsection, I provided evidence of the value of using the hot list strategy to sharply reduce the CPU time required to complete an assignment. Here, my focus is on the use of this strategy to bring within range theorems whose proof resisted various automated attempts. The area is Robbins algebra, an area that I find fascinating mainly for three factors. First, just three axioms suffice to study this algebra, the following expressed in yet one more notation acceptable to OTTER, where one can interpret the function n as complement and the function + as union.

EQ(+(x,y),+(y,x)).                    % commutativity
EQ(+(+(x,y),z),+(x,+(y,z))).          % associativity
EQ(n(+(n(+(x,y)),n(+(x,n(y))))),x).   % Robbins axiom
Second, at least on the surface, Robbins algebra is a natural target for automated reasoning; indeed, one can easily study this field by using paramodulation and choosing various options to control this inference rule or by using paramodulation within a Knuth-Bendix approach. Third, and so intriguing, the question of whether every Robbins algebra is a Boolean algebra was open until McCune with his program EQP answered it in the affirmative [9]; in fact, Tarski and his students failed to answer the question. The question is posed in [3].

My focus here is on RAJZ, a theorem that provides a splendid challenge for automated reasoning programs, especially those that do not offer AC-unification or induction. In terms of the "ordering relation" on the elements of Robbins algebra, the theorem says that the existence of two elements c and d with d less than or equal to c together with the Robbins axioms is all that is needed to imply Boolean. The theorem was first proved by Winker using induction [12, 13]; McCune later obtained a proof with AC-unification. My goal, for years, has been to prove the theorem without induction and without AC-unification.

In 1996, I made yet another attempt. I assigned heat the value 1 and placed in the (input) hot list only the clause corresponding to the special hypothesis, c + d = c. I assigned max_weight the value 30, the pick_given_ratio the value 3, and max_distinct_vars the value 3 and included clear(eq_units_both_ways). Regarding weight templates, in weight_list(pick_given), I included two to respectively purge associative variants in four and five variables, one to purge expressions in which n(n(n(t))) terms occur, and the template for the tail strategy. The inclusion of the template for the tail strategy causes OTTER to prefer clauses in the equality predicate whose
right-hand argument is short. Success: In approximately 44,926 CPU-seconds, OTTER produced a proof of length 80 and level 18, with retention of clause (66147). In fairness, I must admit that I also succeeded without the hot list strategy. Indeed, in approximately 9770 CPU-seconds, OTTER produced a proof (given in the Appendix) of length 78 and level 16, with retention of clause (48308). Nevertheless, my delight at finding such a proof—for the first time, with and without the hot list—remains unbounded.
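For the reader who wishes to repeat the attempt, the options just cited correspond, in OTTER's input notation, to roughly the following sketch; the Knuth-Bendix setting is taken from the Robbins input file in the Appendix, and the weight templates are omitted here.

set(knuth_bendix).
clear(eq_units_both_ways).
assign(heat, 1).
assign(max_weight, 30).
assign(pick_given_ratio, 3).
assign(max_distinct_vars, 3).
list(hot).
EQ(+(c,d),c).   % the special hypothesis, c + d = c
end_of_list.
% weight_list(pick_given), containing the templates that purge associative variants,
% purge n(n(n(t))) terms, and implement the tail strategy, is omitted from this sketch.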
5.3. FINDING ELEGANT PROOFS

One measure of the elegance of a proof is its brevity. In this subsection, I show how the hot list strategy has proved useful in the search for elegant proofs, in the context of proof length. Note that no practical algorithm appears to exist for searching for short proofs, and note that numerous obstacles, some of which are indeed subtle, are encountered in such a search; see [21], which takes the form of an experimental notebook. I focus on the formulas known as XHK and XHN, each of which alone is strong enough to provide a complete axiomatization for equivalential calculus.

P(e(x,e(e(y,z),e(e(x,z),y)))).    % XHK
P(e(x,e(e(y,z),e(e(z,x),y)))).    % XHN
To prove either of the corresponding theorems (by deducing one of the other known single axioms) provides an excellent test for ideas and for programs. Indeed, whether either is a single axiom was an open question until Winker obtained proofs with excellent insight, many computer runs, much time, and considerable assistance from one of Argonne's automated reasoning programs [14, 15]. As an indication of the difficulty offered by the two benchmark theorems, (not counting the predicate P) Winker's 84-step proof for XHK relies on the use of a formula of length 71, and his 159-step proof for XHN relies on the use of a formula of length 103. To enable researchers to conduct similar experiments, here is a complete list of the shortest single axioms for equivalential calculus, each expressed in clause notation.

% Following are all of the shortest single axioms
% for equivalential calculus.
P(e(e(x,y),e(e(z,y),e(x,z)))).    % P1_YQL
P(e(e(x,y),e(e(x,z),e(z,y)))).    % P2_YQF
P(e(e(x,y),e(e(z,x),e(y,z)))).    % P3_YQJ
P(e(e(e(x,y),z),e(y,e(z,x)))).    % P4_UM
P(e(x,e(e(y,e(x,z)),e(z,y)))).    % P5_XGF
P(e(e(x,e(y,z)),e(z,e(x,y)))).    % P7_WN
P(e(e(x,y),e(z,e(e(y,z),x)))).    % P8_YRM
P(e(e(x,y),e(z,e(e(z,y),x)))).    % P9_YRO
P(e(e(e(x,e(y,z)),z),e(y,x))).    % PYO
P(e(e(e(x,e(y,z)),y),e(z,x))).    % PYM
P(e(x,e(e(y,e(z,x)),e(z,y)))).    % XGK
P(e(x,e(e(y,z),e(e(x,z),y)))).    % XHK
P(e(x,e(e(y,z),e(e(z,x),y)))).    % XHN
From various experiments, I had found a 27-step proof showing that XHK is a single axiom and a 24-step proof showing that XHN is also a single axiom. I decided next to use the hot list strategy to attempt to find even shorter (more elegant) proofs.

I began with XHN, using a level-saturation approach, assigning the value of 36 to max_weight and the value of 2 to each of 24 resonators corresponding to the steps of the 24-step proof I had obtained. I included in list(passive) the negations of each of the other twelve shortest single axioms, expecting a deduction of UM only. I assigned the value 1 to the heat parameter and placed in the hot list the clauses corresponding to XHN and the condensed detachment nucleus. In approximately 38 CPU-seconds (on the equivalent of a SPARCstation-2), OTTER deduced UM with a proof of length 22 and level 11, with retention of clause (864). When I then deleted the use of the resonance strategy [19, 23], in approximately 770 CPU-seconds OTTER deduced UM with a proof of length 20 and level 14, with retention of clause (9777).
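In OTTER's input notation, the run just described amounts to roughly the following sketch; the set of support, the list(passive) of negated axioms, and the 24 resonators (each assigned the value 2) are omitted, and the hyperresolution setting follows the input files given in the Appendix.

set(hyper_res).
set(sos_queue).                  % level saturation (a breadth-first search)
assign(heat, 1).
assign(max_weight, 36).
list(hot).
-P(e(x,y)) | -P(x) | P(y).       % nucleus for condensed detachment
P(e(x,e(e(y,z),e(e(z,x),y)))).   % XHN
end_of_list.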
Why did the hot list strategy succeed? A key rests with the effect the strategy has when level saturation is being used. For an illustration of how this combination causes the program to look ahead, assume the heat parameter is assigned the value 1, that condensed detachment is in use, and that the hot list contains the needed clauses (such as that for condensed detachment and, say, the shortest single axiom candidate under study). When a level-1 clause A is deduced and retained and the hot list strategy is in use, A will immediately be used to initiate applications of condensed detachment with the clauses needed to complete the applications chosen from the hot list. If condensed detachment succeeds, yielding a clause B, B will have level 2 (and heat level 1). If B is retained, it will be placed among the level-1 clauses, even though it has level 2. Keep in mind that, very likely, the program is still generating level-1 clauses and simply paused, because of the use of the hot list strategy. Then, when the program is using the level-1 clauses to deduce those of level 2, B (of level 2) will be used; but its use will generate level-3 clauses, which, if retained, will be placed among the level-2 clauses. In addition, because of the use of the hot list strategy, such a level-3 clause C will be used immediately. If clauses are deduced and retained, they will be of level 4, but be placed among those of level 2, just as B was deduced and placed among the level-1 clauses. Regarding the successful completion of the cited 20-step proof, a glance at the output file shows that the program was deducing and retaining clauses of level 11 when it found and used a level-14 clause to complete the proof. In other words, the program, because of using the hot list strategy in conjunction with level saturation, was able to look ahead into higher levels. One thus sees how the program found a different proof, one of length 20, by traversing a sharply different search path.

To determine the effect on CPU time, on a computer that is perhaps 1.3 times as fast as a SPARCstation-10, I conducted two experiments. Approximately 38 CPU-seconds suffices with level saturation and without the hot list strategy, in contrast to approximately 306 CPU-seconds with the combination of level saturation and the hot list strategy. Again, for part of the explanation, one need only glance at the corresponding output files, finding that level 10 completes with clause (87) when the hot list strategy is not in use, and level 10 completes with clause (1488) when it is in use. These two figures further illustrate how the hot list strategy, when level saturation is in use, causes the program to look ahead into higher levels. The figures also illustrate a disadvantage of using this combination of strategies, for the size of the levels can grow far more rapidly.

One final experiment merits discussion. In the spirit of cursory proof checking (as opposed to rigorous proof checking), both covered in [21], I used as resonators the 20 steps of the just-cited proof, assigned a value of 2 to each, and assigned to the max_weight the value 2. Again, I used the hot list strategy, motivated by a distantly related experiment in another logic calculus, an experiment that yielded under similar conditions an even shorter proof; see [24] and Section 3.4 of the technical report [26] that is a far longer version of this article. I was not rewarded: OTTER merely returned the 20-step proof already discussed. On a whim, I repeated the experiment with one change, that of omitting the use of the hot list strategy. I was more than startled, for OTTER completed a 19-step proof of level 14 (given in the Appendix), showing that a shorter proof can be found with cursory proof checking either by adding the use of the hot list strategy or (in this case) by removing its use. I know of no shorter proof establishing XHN to be a single axiom for equivalential calculus, a fact that implicitly poses a possible research question.

Next I turned to a study of XHK, applying a similar approach to that which yielded the 20-step proof that XHN is indeed a shortest single axiom for equivalential calculus. In one of several experiments, I assigned max_weight the value 48, used ancestor subsumption (and, therefore, used back subsumption), assigned the pick_given_ratio the value 3, reassigned the max_weight to the value 20 after 30 clauses were chosen as the focus of attention, and used the hot list strategy with the heat parameter assigned the value 1. I placed in the hot list the clauses corresponding to XHK and the condensed detachment nucleus. I used the pick_and_purge weight_list and included, for the resonance strategy, weight templates corresponding to the steps of the earlier-mentioned 27-step proof, which completed with a deduction of YRO. OTTER succeeded in finding a 26-step proof. Four of the steps reflect the use of the hot list strategy, each showing (heat=1).
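The XHK experiment just cited translates, again roughly, into the following option settings; the parameter names follow those used in the input files in the Appendix, and the resonators in weight_list(pick_and_purge) are omitted.

set(hyper_res).
set(ancestor_subsume).           % ancestor subsumption (back subsumption in use as well)
assign(max_weight, 48).
assign(change_limit_after, 30).  % after 30 given clauses ...
assign(new_max_weight, 20).      % ... reduce the max_weight to 20
assign(pick_given_ratio, 3).
assign(heat, 1).
list(hot).
-P(e(x,y)) | -P(x) | P(y).       % nucleus for condensed detachment
P(e(x,e(e(y,z),e(e(x,z),y)))).   % XHK
end_of_list.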
As with XHN, one additional experiment merits citing because of the progress that resulted. However, rather than cursory proof checking providing the key, parameter changes proved to be crucial. Regarding the changes, I assigned the pick_given_ratio the value 2 rather than 3, instructed the program to reduce the max_weight from 48 to 24 after 50 clauses were chosen as the focus of attention, and used for resonators weight templates that correspond to the 26 steps of the just-cited proof. OTTER succeeded in finding a 23-step proof of level 19 (given in the Appendix), but, rather than deducing YRO, the proof completed with the deduction of YQL. Again, four of the steps reflect the use of the hot list strategy, each showing (heat=1).
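Relative to the preceding sketch, the changes that produced the 23-step proof are few.

assign(pick_given_ratio, 2).     % rather than 3
assign(change_limit_after, 50).  % reduce the max_weight after 50 given clauses ...
assign(new_max_weight, 24).      % ... from 48 to 24
% The resonators now correspond to the 26 steps of the just-cited proof.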
5.4. ANSWERING CHALLENGING QUESTIONS

In Section 5.1, I focused on the theorem LOGT1 to show how the hot list strategy can be used to sharply reduce the time required to find a proof. In this subsection, I focus on another problem in lattice ordered groups, LOGT2, which provides evidence of how the hot list strategy can be used to obtain a solution to an interesting question whose answer had eluded researchers. Again, Dahn brought the original theorem to my attention. In this case, however, his program had not been able to obtain a proof. One is asked in LOGT2 to prove a relation among inverse, intersection, and union, a relation whose negation is the following, where i denotes inverse, n denotes intersection, and u denotes union.

i(n(a,b)) != u(i(a),i(b)).
As the problem was proposed to me, one is permitted to use essentially the entire underlying theory. I began by discarding all nonunit clauses and all new (not in the input for LOGT1) equalities but two, the following.

u(x,n(y,z)) = n(u(x,y),u(x,z)).
n(x,u(y,z)) = u(n(x,y),n(x,z)).

I added in list(sos) the two positive equalities to the input for LOGT1 (of course, omitting the denial of its conclusion) and added the negative equality to list(passive). I chose a level saturation approach, using the following command.

set(sos_queue).

I assigned the value 2 to the heat parameter and used the following hot list.

list(hot).
n(x,y) = n(y,x).
u(x,y) = u(y,x).
i(x)*x = 1.
x*i(x) = 1.
u(n(x,y),y) = y.
n(u(x,y),y) = y.
x*u(y,z) = u(x*y,x*z).
x*n(y,z) = n(x*y,x*z).
u(y,z)*x = u(y*x,z*x).
n(y,z)*x = n(y*x,z*x).
u(x,n(y,z)) = n(u(x,y),u(x,z)).
n(x,u(y,z)) = u(n(x,y),n(x,z)).
end_of_list.

I also used resonators from an earlier success, each assigned the value 2. With the hot list strategy, OTTER produced a proof in approximately 1826 CPU-seconds with length 37 and level 15, with retention of clause (6698). By comparison, without the hot list strategy, OTTER produced no proof. The explanation for the success rests with the following. With level saturation and the heat parameter assigned the value 2, when a new clause is retained at, say, level 4, the hot list strategy will first immediately generate clauses of level 5 (and heat level 1) and then, if any of them are retained, use them to immediately generate clauses of level 6 (and heat level 2). So, in one sense, the use of the hot list strategy with a breadth-first search enables a program to look ahead; see Section 6.2 for more discussion.

Rather than simply turning to another topic, I mention here one additional set of experiments concerning LOGT2. The results of the experiments nicely illustrate how narrow can be the window of opportunity to answer a difficult question and how intertwined various procedures often are. Whereas one of the experiments yielded the shortest proof (given in the Appendix) of LOGT2 of which I know—a proof of length 22—the other experiments yielded no proof of any type. The 22-step proof was found by dropping the use of level saturation, assigning the value 10 rather than 6 to the pick_given_ratio, assigning the value 3 rather than 2 to the heat parameter, and using a hot list consistent with the recommendation (given in Section 6.1) concerning "short and simple" clauses that occur in the input set of clauses. The hot list consisted of the following ten clauses (collected, with the changed assignments, in the sketch at the end of this subsection).

1*x = x.
x*1 = x.
i(x)*x = 1.
x*i(x) = 1.
i(1) = 1.
i(i(x)) = x.
n(x,x) = x.
u(x,x) = x.
u(n(x,y),y) = y.
n(u(x,y),y) = y.

The other options and assignments were the same as were used in the cited successful level-saturation run for LOGT2. Although some of the other experiments
from the set failed to yield a proof, they were each most valuable, for their respective failure provides evidence of how narrow is the window of opportunity. In one of the experiments that failed to yield any proof, except for dropping the use of level saturation, the experiment was identical to that which yielded the 37-step proof; in other words, the hot list consisted of the elements used in the level-saturation experiment that succeeded. In another experiment that failed, the hot list consisted of just the following two clauses.

i(x)*x = 1.
x*i(x) = 1.

In my view, the narrowness of the window of success is not a weakness of the hot list strategy; rather, it simply reflects the depth of mathematics and the fact that no panacea exists.
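As noted earlier, the changed assignments and the ten-clause hot list that yielded the 22-step proof amount, in OTTER's input notation, to roughly the following sketch; the other options match the successful level-saturation run, except that set(sos_queue) is omitted because level saturation was dropped.

assign(pick_given_ratio, 10).    % rather than 6
assign(heat, 3).                 % rather than 2
list(hot).
1*x = x.
x*1 = x.
i(x)*x = 1.
x*i(x) = 1.
i(1) = 1.
i(i(x)) = x.
n(x,x) = x.
u(x,x) = x.
u(n(x,y),y) = y.
n(u(x,y),y) = y.
end_of_list.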
6 Recommendations and Hints for Using the Hot List Strategy
This section is devoted to guidance and notions about using the hot list strategy most effectively. I begin with a few recommendations, then follow with more specific hints for choosing parameters.

6.1. RECOMMENDATIONS
I recommend that an input clause placed in list(hot) also be placed in some other list. (Among the exceptions was that discussed earlier, at the end of Section 2.2, in the context of cancellation.) For example, if an input clause would ordinarily be included to complete the application of an inference rule rather than initiate the application, then I recommend that, if the clause is placed in list(hot), it also be placed in list(usable). As an aside, and independent of the hot list strategy, clauses one suspects are best used to complete rather than initiate inferences belong, in my view, in list(usable). As another aside, I conjecture that the effectiveness of an automated reasoning program would be increased if, when such a (completion) clause is retained, it were immediately placed in list(usable) rather than being placed in list(sos). This option is not offered by OTTER or, for that matter, from what I know by any program, and it might make an interesting research problem.

When one is studying logic calculi (in which condensed detachment is used in the presence of the inference rule hyperresolution), I recommend placing a clause of the following type both in list(usable) and in list(hot).

-P(i(x,y)) | -P(x) | P(y).

This clause is best used to complete applications of an inference rule, and almost never to initiate them.
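As a minimal fragment (the remainder of the input file is omitted), the recommended dual placement looks as follows.

list(usable).
-P(i(x,y)) | -P(x) | P(y).   % condensed detachment; completes applications of hyperresolution
end_of_list.

list(hot).
-P(i(x,y)) | -P(x) | P(y).   % also hot, so each newly retained clause is immediately tried with it
end_of_list.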
On the other hand, if I were using OTTER to apply the hot list strategy to study rings in which the cube of (every element) x is x, I would place the clause equivalent of xxx = x both in list(sos) and in list(hot). I would take this action even though such a clause is best used to initiate applications of an inference rule, rather than complete an application. One might be puzzled by this recommendation, for clauses in list(sos) are best used to initiate applications of an inference rule, while clauses in the hot list are used only to complete applications. Nevertheless, my experience with the hot list strategy suggests that the inclusion (in the hot list) of such clauses adds to the effectiveness of this strategy. (A small sketch of this dual placement appears at the end of this subsection.)

As a global recommendation, I suggest including in the hot list those clauses that correspond to the special hypothesis of the proposed theorem under attack. Such clauses also, in my view, are wisely placed in list(sos). For a related global recommendation, I suggest the hot list consist of those equations from the input set of support having eight or fewer symbols (ignoring parentheses and commas) whose right-hand argument is a single symbol, constant or variable. Of course, I have in mind that predicates, functions, and the like are represented with single letters. The hot list can also be augmented with all similar "short and simple" clauses taken from the usable list.

I recommend using the dynamic hot list strategy when one suspects that some of the clauses adjoined during the run merit repeated visiting as hypotheses for completing applications of an inference rule. In particular, I recommend the assignment of a small value for the dynamic_heat_weight, enough to permit new clauses to be adjoined to the hot list during the run, but not so big as to cause the hot list to become large. Even a small value can drown the hot list, if the weight_list contains templates that both have smaller values assigned to them and are frequently matched during the run. Indeed, one must exercise care when combining the dynamic hot list strategy with the resonance strategy. For example, if one includes resonators corresponding to formulas from equivalential calculus, because many formulas can match a single resonator, havoc may be the result. A clue is provided when one sees that OTTER is spending substantial CPU time on a single clause that is chosen as the focus of attention.

Regarding assignments for the values for the heat and the dynamic_heat_weight parameters—with the exceptions just noted and those discussed in Section 6.2—I can only suggest experimentation. One might profitably glance at some of the experiments I feature in Section 5; see also [20, 21, 22, 23] and especially [24] and [25]. (Of the various references, [24] is the choice for the researcher wishing far more detail concerning tendencies exhibited by the options offered by OTTER.)

Before I turn to hints for using the hot list strategy, the following observation needs utterance. The use of the hot list strategy, as is the case for various options offered by OTTER that affect the search space, can produce unexpected results. For example, an assignment of the value 2 to the heat parameter can yield for a given problem a shorter proof than previously in hand, where an assignment of the value 1 may yield no proof. Indeed, in the latter case, the program might
inform the user that the set of support has gone empty. The explanation rests with the reordering of the space of drawn conclusions and canonicalizations that can occur with procedures such as demodulation and subsumption. For a second example, a small hot list may produce no proof, a slightly larger one may produce the best proof one has seen, and an even larger hot list may produce a proof of little interest. See Section 5.4 for examples of the type just discussed. In general, when one takes actions that change the search, one can expect that a longer clause might be needed to get a proof or expect other odd occurrences.
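The sketch promised earlier in this subsection: for a ring study in which the cube of every element x is x, the dual placement can be written as follows; rendering the hypothesis as x*x*x = x is merely one natural clause form (an assumption of this sketch), and the rest of the input is omitted.

list(sos).
x*x*x = x.   % the special hypothesis; best used to initiate inferences
end_of_list.

list(hot).
x*x*x = x.   % nevertheless also placed in the hot list, as recommended above
end_of_list.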
6.2. HINTS REGARDING THE HOT LIST STRATEGY

To complement the cited general recommendations, I offer the following more specific suggestions and examples. The first bulleted item concerns early experimentation; the remaining items pertain to use of the hot list at any time in one's research.

• Especially in the beginning of one's experiments, I recommend that the heat parameter be assigned the value 1, and I recommend that the (input) hot list consist of the clause or clauses that correspond to the special hypothesis of the theorem under attack. For example, if the theorem concerns groups in which the cube of x is the identity e, then the special hypothesis is the equation xxx = e. When studying some logic calculus, for a second recommendation, I suggest the axioms of the theory (if fewer than eight in number) and the nonunit clause (if such is used) corresponding to condensed detachment. For a third recommendation, if the theorem under study offers no special hypothesis, or if the special hypothesis is messy (consisting of several nonunit clauses, for example), then I suggest putting in the hot list the elements of the (input) set of support that take the form of positive unit clauses. On the other hand, especially when no special hypothesis exists (as when one is studying some logic calculus), I repeat my second recommendation.

• A value of 2 or greater for the heat parameter is suggested when one wishes a recursively heavier emphasis on the members of the hot list. The inclusion of an input hot list (and, of course, the use of the hot list strategy) is suggested when one conjectures that certain input clauses have been identified as meriting repeated consideration as hypotheses for drawing conclusions by completing applications of an inference rule.

• By placing in the (input) hot list clauses for associativity and commutativity, one can use the hot list in place of a limited form of AC-unification (a small sketch follows the last item in this list). The greater the value assigned to the heat parameter, the more AC-unification that occurs. However, effectiveness can be severely impaired with associativity in the hot list if the heat parameter is assigned the value 3 or greater.
• Combining a level saturation search with the hot list strategy often produces impressive results. For a taste, when the program is adjoining clauses at level 4, with the hot list strategy in use and the heat parameter assigned the value 2, the program also is deducing (for possible retention) clauses at level 6. In the obvious sense, the cited combination permits the program to look ahead, and the distance is greater than or equal to the value assigned to the heat parameter. For example, although the heat parameter was assigned the value 1 when studying XHN, a proof of level 14 was completed as the program was deducing clauses of level 11.
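As the sketch promised in the third bulleted item, the following fragment, written in the Robbins-algebra notation used in Section 5.2 (the same two clauses appear, commented out, in the hot list of the Robbins input file in the Appendix), provides a limited substitute for AC-unification of the function +.

assign(heat, 2).
list(hot).
EQ(+(x,y),+(y,x)).             % commutativity
EQ(+(+(x,y),z),+(x,+(y,z))).   % associativity
end_of_list.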
7 Conclusions and a Challenge
In this article, I have featured the hot list strategy, presenting numerous pertinent experiments. I have also discussed briefly the dynamic hot list strategy. The hot list strategy asks the researcher to provide an input list (called the hot list) of statements (clauses) that the automated reasoning program uses to complete, in contrast to initiate, applications of the inference rules in use. Ordinarily, one chooses for the members of the hot list clauses that are conjectured to merit revisiting repeatedly, clauses on which to key the program's attack. The dynamic hot list strategy (formulated by McCune) extends the hot list strategy to permit the program to adjoin members to the hot list during the run. Both formulations address the inaccessibility of certain retained clauses, for far too long, because of their complexity. The evidence presented in this article shows that the hot list strategy can be used successfully in at least four contexts: reducing CPU time, finding proofs of theorems previously out of reach without much intervention of the researcher, finding more elegant proofs, and answering a question whose answer had steadfastly eluded researchers relying on an automated reasoning program.

Continuing in the tradition begun at Argonne National Laboratory approximately three decades ago (in the early 1960s), I close with a challenging problem for interested researchers.

• Evaluate the effectiveness of the hot list strategy, using as members of the hot list generated clauses, rather than retained clauses. This incarnation or modification of the hot list strategy would permit the program to draw conclusions that are children of clauses that might be discarded because of being too complex as measured in weight.
Appendix

As promised, here I present input files and proofs. The input files are intended to facilitate the further study by researchers of the areas touched on in this article. They also are intended to serve as templates for research in other areas
of mathematics and logic. In some cases, I include lines preceded with "%", which McCune's program OTTER treats as a comment. The input files, as well as the proofs given here, provide the merest taste of what one can do with OTTER; more is found in my new book [24]. One of my main reasons for including specific proofs is my strong convic tion that likelihood of experimentation producing valuable results is sharply increased. Indeed, when the objective is the formulation of, say, a new strategy or new inference rule or the testing of a reasoning program, I have always been more than puzzled at the nonchalance of some regarding the value of having in hand a proof of the theorem under attack. Few (if any) means are better for measuring progress than seeing how many proof steps of a given proof have been produced with the new approach or the program under evaluation. I begin with an input file that can be used to initiate one's attack on finding a means for an automated reasoning program to prove, definitely not in a proof checking mode, that Meredith's single axiom suffices for an axiom system for two-valued sentential calculus. To aid one's research, I also include (essentially) Meredith's proof; it was produced with a cursory proof-checking run, using his steps as resonators and assigning max-weight the value 2. Input File for Studying Meredith's Single Axiom (at (hypar.ras). assignCaax.vaight, 28). aaaign(chanfa_liait.aftar, 2000). assignGiavjux.waight, 20). usign(aax_proofs, - 1 ) . claar(print.kapt). claar(back.sub). u « l g n ( u x _ B M , 110000). assign(r •port, 1800). assign(aax_dlstinct_vars, 7 ) . assign(piek_givan_ratio, 3 ) . asslgn(haat,l). sat(ordsr.hlstory). sat(lnput_sos_iirst). vaight.list(pick.glvan). X Tha following i s Maradith's singla axioa. »aight(PU(i,i(y,z)>),2). vaight(P,i(i(z,x),l(z,y)))),2). saight(P(i(i(x,i(x,y)).i(x.y))),2). »aight(P(i(iCx,i(y,z)).i(l(x.y),i(x,z)))),2). waight(P(i(i(i(x,y).z),i(n(x),z))),2). «aignt(P(i(n(n(x)),x)),2). »alght(P(i(x,n(n(x)))),2). waight(P(i(i(x,y),i(n(y),n(x)))).2). »aijat(P(i(i(n(x) ,n(y» ,i(y,x))) ,2) . »aight(P(i(i(x,y)(l(i(n<x),y),y))),2). waifht(P(i(i(n(x),y),i(i(z,y),l(i(x.z),y)))),2).
weight(P(i(i(x,i(n(y).z)),i(x,i(i(u,z),l(i(y,u),z))))),2). X Following Is for recursive tail strategy. veight(i($U),t(2)),l>. end.of.list. list(usable). X Following is for condensed detachment. -P(i(x,y)) I -P(x) I P(y). X
The following disjunctions sxs known axiom systems.
-P(i(q,i(p,q))) I -P(i(i(p,i(q,r)).l(i(p>q),i(p,r)))) I -P(i(n(n(p)),p)) I -P(i(p,n(n(p)))) | -P(i(i(p,q),l(n(q).n(p)))) I -P(i(i(p.i(q,r)),i(q.i(p,r)))) I *ANSVER(step_allFrege_18_35.39.40_46_21). X 21 is dependent. -P(i(q,i(p.q))) I -P(i(i(p,i(q.r)).i(q,i(p,r)))) I -P(i(i(q,r) ,i(i(p.q) ,i(p,r)))) I -P(i(p,i(n(p),q))) I -P(i(i(p,q),i(i(n(p).q),q») I -P(i(i(p,i(p.q)) ,i(p.q))) I *AMSWER(stsp.«llHilbsrt. 18.21.22.3.54.30). X 30 is dapsndant. -P(i(q,i(p,q))) I -P(i(l(p,l(q,r)),l(i(p,q),i(p,r)))) *ANSUER(stsp.allBEH.Church.FL.18.35.49).
I -P(i(i(n(p) ,n(q)) ,i(q,p))) I
-P(i(i(i(p,q),r),i(q,r))) I -P(i(i(i(p,q),r),i(n(p),r))) $ANSVER(step_allLuka.x.l9_37_S9).
I -P(i(i(n(p),r),l(i(q,r),l(i(p.q),r)))) I
-P(i(i(i(p,q),r),i(q,r))) I -P(l(l(i(p.q),r),i(n(p),r))) I -P(i(i(s,i(n(p),r)),i(s,iz),i(i,i))». X P(i
X CN-CAM
list(passive). - P ( i ( i ( p , q ) . i ( i ( q , r ) , i ( p . r ) ) ) ) I $ANSVER(step_Ll). - P ( i ( i ( n ( p ) , p ) , p ) ) I *ANSWER(stap.L2). - P ( i ( p , i ( n ( p ) , q ) ) ) I t«(SWER(stap.L3). - P ( i ( q , i ( p . q ) ) ) I $JU(SWER(stap.l8). - P ( i ( l ( l ( p , q ) , r ) , i ( q , r ) ) ) I «JWSWER(stap.l9). - P C K K p . K q . r ^ . K q . K p . r ) ) ) ) I tlHSVER(step_21). - P ( i ( i ( q , r ) . i ( i ( p , q ) , l ( p , r ) ) ) ) I M»SVER(step_22). - P ( i ( i ( p , i ( p , q ) ) . l ( p , q ) ) ) I «*MSWER(stap.30). - P ( i ( i ( p , i ( q , r ) ) , i ( i ( p . q ) , l ( p . r ) ) ) ) | $»NSWER(step_35). - P ( i ( i ( l ( p , q ) , r ) , l ( n ( p ) . r ) ) ) I »ANSyER(atep.37). - P ( i ( n ( n ( p ) ) , p ) ) I *AHSVER(atep_39). - P ( i ( p , n ( n ( p ) ) ) ) I ti(NSWER(stap.40). - P ( i ( i ( p , q ) , i ( n ( q ) , n ( p ) ) ) ) I *AHSWER(step_46). - P ( i ( i ( n ( p ) , n ( q ) ) , i ( q , p ) ) ) I $AMSWER(step_49). - P ( i ( i ( p , q ) . i ( i ( n ( p ) , q ) , q ) ) ) I *ANSVER(step_54). - P ( i ( i ( n ( p ) , r ) , i ( i ( q , r ) , i ( i ( p . q ) , r ) ) ) ) I $A»SWER(stap.S9). -P(i(iCs,i(n(p> , r ) > , i t s , i ( i ( q , r ) , i ( i ( p , q ) , , . ) ) ) ) ) | $jutSWER(step_60). - P ( i ( n ( n ( a ) ) , a > ) I $ANSWE»(lama_24). - P ( l ( a , n ( n ( a ) ) ) ) I $ANSVER(lemma_29). - P ( i ( i ( a , b ) , i ( i ( c , a ) , i ( c , b ) ) ) ) I *ANSUER(lajuna.25). - P ( i ( i ( a , b ) , i ( n ( b ) , n ( a ) ) ) ) I $ANSVER(lemaa_36). and.of.list.
list(demodulators). X (n(n(x)) - junk). (o
X CH-CAM
Meredith's Proof

> EMPTY CLAUSE at 1.35 sec > 58 [hyper,6,57,47,38] $ANSWER(Luka,[1,2,3]).
Length of proof is 41. Level of proof is 30.
PROOF 1 [] - P U ( x . y ) ) I -P(x) IP(y). 6 [] - P ( i ( i ( p , q ) , i ( i ( q , r ) , i ( p , r ) ) ) ) I - P ( i ( i ( n ( p ) , p ) , p ) ) I - P ( i ( p , i ( n ( p ) , q ) ) ) I $ANSVER(Luka,[1,2,3]). 7 D P(i(i(i(i(i(x,y),i(n(z),n(u))),z),v),i(i(v.x).i(u,x)))). 8 [hyper,1,7,7] P ( i ( i ( i ( i ( x , y ) , i ( z , y ) ) , i ( y , u ) ) , i ( v , i ( y , u ) ) ) ) . 9 [hyper,1,7,8] P ( i ( i ( i ( x , i ( n ( y ) , z ) ) , u ) . i ( y , u ) ) ) . 10 [hyper,1,7,9] P ( i ( i ( i ( x , x ) , y ) , i ( z , y ) ) ) . 11 [hyper,1,10,10] P ( i ( x , l ( y , i ( z , z ) ) ) ) . 13 [hyper,1,7.11] P ( l ( i ( i ( x , i ( y , y ) ) , z ) , i ( u , z ) ) ) . IS [hyper,1,7,13] P ( i ( i ( i ( x , y ) , z ) , i ( y , z ) ) ) . 17 [hyper,1,15,7] P ( i ( x , i ( i ( x , y ) , i ( z , y ) ) ) ) . 18 [hyper.1,15,17) P ( i ( x , i ( i ( i ( y , x ) , z ) , i ( u , z ) ) ) ) . 19 [hyper, 1,17,9] P ( l ( l ( i ( i ( i ( x , i ( n ( y ) , z ) ) , u ) , i ( y , u ) ) , v ) , i ( w , » ) ) ) . 20 [hyper, 1,7,18] P ( i ( l ( i ( i ( i ( x , i ( i ( i ( y . z ) , i ( n ( u ) , n ( v ) ) ) , u ) ) , » ) , l ( v 6 , v ) ) , y ) , i ( v , y ) ) ) . 21 [hyper, 1,7,19] P ( l ( l ( i ( x , y ) , i ( z , i ( n ( n ( y ) ) ,u))) , i ( v . l ( z , i ( n ( u ( y ) ) , u ) ) ) ) ) . 22 [hyper. 1,7,20] P ( l ( l ( i ( x , y ) , i ( z , i ( i ( i ( y , u ) , i ( n ( v ) , n ( x ) ) ) , » ) ) ) , i(w,l(z,i(i(l(y.u),l(n(r),n(x))),v))))). 23 [hyper,1,21,7] P ( l ( x . i ( l ( y , z ) , i ( n ( n ( y ) ) , z ) ) ) ) . 24 [hyper.1,22,17] P ( l ( x , l ( l ( l ( y , z ) , u ) , i ( i ( i ( z , v ) , i ( n ( u ) , n ( y ) ) ) , u ) ) ) ) . 25 [hyp«r,1,23,23] P ( l ( l ( x . y ) , l ( n ( n ( x ) ) , y ) ) ) . 26 [hyper, 1,7,23] P ( i ( i ( i ( i ( x , y ) , i ( n ( n ( x ) ) , y ) ) , z > , i ( u , z ) » . 27 [hyper, 1,24,24] P ( i ( i ( i ( x . y ) , z ) , i ( i ( i ( y , u ) , i ( n ( z ) , n ( x ) ) ) , z ) ) ) . 29 [hyper,1,10,25] P ( i ( x . i ( n ( n ( y ) ) , y ) ) ) . 30 [hyper,1,27,18] P ( i ( i ( i ( x , y ) , i ( n ( l ( l ( i ( z , i ( u , x ) ) , » ) , i ( w , T ) ) ) , n ( u ) ) ) , i(i(i(z,i(u,x)).v).i(if.v)))). 31 [hyper,1,17,29] P ( l ( i ( i ( x . i ( n ( n ( y ) ) , y ) ) , z ) , l ( u , z ) ) ) . 32 [hyper,1,7,30] P ( l ( i ( i ( i ( i ( x , l ( y . i ( z , u ) ) ) , » ) , i ( w , v ) ) , z ) , i ( v 6 , z ) ) ) . 33 [hyper.1,7,32] P ( i ( i ( i ( x , y ) , i ( z , i ( u , l ( y , v ) ) ) ) , i ( w , i ( z , i ( u , i ( y , v ) ) ) ) ) ) . 34 [hyper,1,33,7] P ( i ( x . i ( i ( y , i ( y , z ) ) , i ( u , i ( y , z ) ) ) ) ) . 35 [hyper,1,34,34] P ( i ( i ( x , i ( x , y ) ) , i ( z , l ( x , y ) ) ) ) . 36 [hyper,1,35,36] P ( i ( x , i ( i ( y , i ( y , z ) ) . i ( y , z ) ) ) ) . 37 [hyper,1,36,36] P ( i ( l ( x , i ( x , y ) ) , i ( x , y ) ) ) . 38 [hyper.1,9,37] P ( i ( x , l ( n ( x ) , y ) ) ) .
39 40 42 43 46 47 48 49 60 62 53 54 55 57 58
Chyp«r,l,37,31] P ( i ( i ( i < x , l ( n ( n ( y ) ) . y ) ) , z ) , z ) ) . B»yp«r,1,37,263 P ( i ( i ( l ( l < x , y ) , i ( a ( n ( x ) ) , y ) ) , z ) , z ) ) . [hyp«r,l,26,39] P < i ( n ( n ( i U ( x , l ( n ( n ( y ) ) , y ) ) , z ) ) ) , z ) ) . [hyp«r,1,7,40] P ( i { i ( n ( x ) , x ) , l ( y , r ) ) ) . [hyp«r,1,18,42] P ( i ( i ( i ( x , i ( n ( n ( i ( i ( y , i C n ( n ( z ) ) , z ) ) , u ) ) ) , u ) ) , v ) , i ( w , v ) ) ) . [hyp«r,l,37,43] P ( i ( i ( n ( x ) , x ) , x ) > . [hyp«r,1.7,46) P ( l ( i ( i ( x , n ( i ( i ( y , i ( n { o ( z ) ) , z ) ) , n ( u ) ) ) ) , v ) , i ( u , v ) ) ) . Chyp«r, 1,48,47] P ( i ( x , n ( i ( l ( y , i ( n ( i i < z ) ) . z ) ) , n ( x ) ) ) ) ) . Chyp«r,1,18,49] P ( i ( i U ( x . i ( y , n ( i ( i ( z . i ( n ( n ( u ) ) ,u)) , n ( y ) ) ) ) ) , v ) . i ( w , v ) ) ) . Chypwr.1.37,60] P ( i ( i ( i ( x , i ( y , n ( i ( i ( z , i ( n ( n ( u ) ) , u ) ) , n ( y ) ) ) ) ) , v ) , v ) ) . Chyp«r,l,7,52] P ( l ( i ( x , y ) , i ( i ( i ( z , i ( n ( n ( u ) ) , u ) ) , n ( n ( x ) ) ) , y ) » . Diyp«r,1,53,53] P ( i ( i ( i ( x . i ( n ( n ( y > ) , y ) ) , n ( n ( i ( z . u ) ) ) ) , l ( l ( i ( v , l ( n ( n ( w ) ) , v ) > , n ( n ( z ) ) ) , u ) ) ) . [hyp«r,l,7,54] P ( i ( i ( i ( i ( i ( x , i ( n ( n ( y ) ) , y ) ) , n ( n ( z ) ) ) , u ) , v ) , i ( l < z , n ) , r ) ) ) . Chyp.r.1,55,7] P ( i ( i ( x , y ) , i ( i ( y , z ) , i ( x , z ) ) ) ) . Chypcr,6,57,47,38] $JWSVER(Luka, [1,2,3]) .
I now give an input file for proving LOGTl. Through appropriate modifica tion, one can use the file to study lattice ordered groups. The 32-step proof I give immediately after the file is obtained by assigning the heat parameter the value 1 (rather than 2) and by commenting out in the hot list the clauses for commutativity of union and of intersection and those for inverse. I then give McCune's 33-step proof of LOGTl. Input File for Studying LOGTl ••tOmuth.bondil). l«x([l,a,u(_,_),n<..._),«(_._).i(_),pp(_),npO]). usign(m*x_v*icht, 15). mssign(B&x_proof*, 36). u«lgn(Bax.B«s, 40000). uiign(pick_giv«n_r«tio, 6 ) . a»«lgn(h«»t, 2 ) . u«ign(r«port, 900). X s«t(r«ftlly_
X.
(x»y)«z - x» (y»z). l«x - x. x«l - x. i(x)»x - 1. x«i(x) - 1. i ( l ) - 1. iUCx)) - x. l(x«y) - i ( y ) « i ( x ) . n(x,x) - x. u(x,x) - x. o(x,y) - n ( y , x ) . u(x,y) • u ( y , x ) . n(x,n(y,z)) - n ( n ( x , y ) , z ) . u(x,u(y,z)) - u ( u ( x , y ) , z ) . •nd.of.liit. liit(»o»). u(n(x,y),y) - y. n(u(x,y),y) - y. x*u(y,z) ■ u(x«y,x*z). z*n(y,z) - n ( x » y , x » z ) . u(y,z)«x - u(y»x,z«x).
n(y,z)»x - n(y*x,z»x). pp(x) - u ( x , l ) . np(x) - n ( x , l ) . pp(»)«np(») ! - a. •nd_of_list. llst(passivs). pp(a)«np(a) !- a I $ANSWER(st*p.d33). i(q)*q*r !- r I $UI3UER(st*p.d01). u(n(q,r).q) I - q I tANSUHl(stsp_d02). u(q,n(r,q)) I- q I $ANSWER(«t*p.d03). n(u(q,r),q) !- q I *iHSUEK(st«p.d04). u(q,u(r,q)) !- u(r,q) I tANSWERdtap.dOS). u(q,n(q,r)) I- q I $ANSWER(st«p_d06). n(q,n(r,q)) !- n(r.q) I *AHSWER(»t«p.d07). u(q,u(q,r)) !- u(q,r) I »AJtSVER(«tap_d08). u(n(q,r),n(q,n(r,»))) !• n(q,r) I IAHSWER(.t.p.d09). u ( i ( u ( q , r ) ) * q , i ( u ( q , r ) ) * r ) !- 1 I «AHSWER(.t.p.dlO). n ( i ( n ( q , r ) ) « q , l ( n ( q . r ) ) « r ) I- 1 I lAMSWERdt.p.dll). n(u(x,x»x),u(x,l)) - x I *AMSWHl(stap_dl2). n(u(x.x*x),u(l,x)) - x I t*MSUn(st*p.dl3). u(l(tt(q,r))*q,l) ! - 1 I »AISVER(.t.p_dl4). u ( l ( u ( l . q ) ) , l ) !• 1 I ♦ANSWHK.t.p.dlB). u(l(u(q,r))»q»s,s) ' - * I UNSVHl(it.p_dl6). n ( l , l ( u ( l , q ) ) ) I- Ktt(l.q)) I «JKSVER(stsp_dl7) . n ( i ( n ( q , r ) ) * r , l ) !- 1 I UMSVERUt.p.dlS). n ( i ( n ( q , l ) ) , l ) !- 1 I *ANSVER(stap_dl9). u ( l , i ( n ( q , l ) ) ) I- Kn(q,l>) I «AHSKER(stap_d20). n ( u ( l , x ) , u ( x , x * x ) ) - x I *JUrSKER<«t«p_d21>. u ( a ( q , l ) , D ( q , i < u ( l , r ) ) ) ) !- n(q,l) I *ANSWER(st«p_d22). u ( i ( u ( q , r ) ) , i ( q ) ) I- i(q) I $ANSWER(stap_d23). u ( i ( u ( i ( q ) , r ) ) , q ) !- q I $ANSWER(stap.d24) . u ( i ( u ( q . l ( r ) » , r ) I- r I *ANSWER(.t.p.d25). u(q.i(u(r,i(q)>>) !- q I *AKSWER(«t.p.d26). n ( q , l ( u ( r , i ( q ) » ) !- i ( u ( r , t ( q ) ) ) I «ANSWER(stap_d27). u ( i ( q ) , i ( n ( q , r ) » !- l(n(q,r)) I *ANSVER(st«p_d28). u ( n ( q . l ) , i ( u < l , l ( q ) ) ) ) !- n(q,l) I $ANSWER(stap_d29). i ( n ( q , D ) !- u ( l , i ( q ) ) I $»NSVER<st«p_d30). n(u(q»r,r),u(r,l(q)*r)) !- r I $ANSVER(stap_d31). n(u(l,q).u(q.q«q>) !- q I *ANSVER(stap_d32). •nd.of.list. list(hot). X (x»y)»z - x« (y»z). n(x,y) - n(y,x). u(x,y) - u(y,x). Xn(n(x,y),z) - n(x,n(y,z)). % u(u(x,y),z) - u(x,u(y,z)). pp(x) - u ( x , l ) . np(x) • n ( x , l ) . i(x)*x - 1. x»i(x) - 1. u(n(x,y) ,y) - y. n(u(x,y) ,y) - y, x»u(y,z) - n ( x » y , x * z ) . x*n(y,z) ■ n(x*y,x*z). u(y,z)*x - u ( y * x , z » x ) . n(y,z)*x - n ( y » x , z » x ) . •nd.of.list.
A Quicker Proof of LOGT1

> UNIT CONFLICT at 3148.67 sec > 23567 [binary,23565.1,86.1] $F.
Length of proof is 32. Level of proof is 13.
PROOF
36 [] 37 [] 40 [] 41 [] 44,43 46,45 48,47 50,49 51 [] 64,53 56,55 63 0 64 □ 66,65 68,67 70,69 71 [] 74,73 76,75 78,77 82,81 84,83
u(n(x,y),y) - y. n(u(x,y),y) - y. u(y,z)«x - u(y«x,z«x). n(y,z)»x - n(y»x,z»x). [] (x«y)»z • x«y«z. [] l»x - x. [] x»l - x. [] i(x)«x - 1. x«i(x) - 1. [] 1(1) - 1. [] i ( i ( x ) ) - x. n(x.y) - n(y,x). u(x,y) - u(y,x). [] n(n(x,y),z) - n ( x , n ( y , z ) ) . [] u(u(x,y),z) - u(x,u(y,z>). [] u(n(x.y).y) - y. n(u(x,y),y) - y. [] x«u(y,z) - u ( x « y . x » z ) . [] x»n(y,z) - n(x«y,x»z). [] u(x,y)*z - u(x«z,y«z). [] pp(x) - u ( x , l ) . t] np(x) - n ( x , l ) .
85 [d«mod,82,84,76,78,46,48] n(u(a»a,a) ,u(», 1) ) !- a. 88 [para.into,69.1.1.1,63.1.1] u(n(x,y),x) « x. 94 (haat-1) tpara_into,88.1.1.1,37.1.1] u(x,u(y,x)) - u(y,x). 100 [para_froB,69.1.1,67.1.1.1] u(n(x,y),u(y,z)) - u ( y , z ) . 103,102 (haat-1) [para.into,100.1.1.1,37.1.1] u(x,u(x,y)) - u(x,y). 107 [para_into,71.1.1.1,64.1.1] n(u(x,y),x) - x. 111,110 [para_lnto,71.1.1,63.1.1] n(x,u(y,x)) - x. 123 [para_xroB,88.1.1,67.1.1.1] u(n(x,y),u(x,z)) - u(x,z). 141 [para_into,107.1.1,63.1.1] n(x,u(x,y)) - x. 152,151 [para_into.73.1.1,49.1.1] u(i(u(x,y))»x,i(u(x,y))»y) - 1. 154 (haat-1) [para_iroB,151.1.1,37.1.1.1] n ( l , l ( u ( x , y ) ) » y ) - i(u(x,y))»y. 307,306 [para_lrOB,123.1.1,141.1.1.2,daBod,66] n(x,n(y,u(x,z))) - n ( x , y ) . 818 [para_into,151.1.1.1.1.1,102.l.l.daaod,103,74,152] u ( i ( u ( x , y ) ) » x , l ) - 1. 833 (haat-1) [para_iroa,818.1.1,40.1.l.l.daaod,46,44,46] u(i(u(x,y))«x«z,z) - z. 866 [para_into,818.1.1.1,47.1.1] u ( i ( u ( l , x ) ) , 1 ) - 1. 931 [para_iron,866.1.1,306.1.1.2.2] n ( i ( u ( l , x ) ) , n ( y , l > ) - n ( i ( u ( l , x ) ) , y ) . 957 (haat-1) [para_xroa,931.1.1,36.1.1.1] u ( n ( i ( u ( l , x ) ) ,y) , n ( y , D ) - n ( y , l ) . 1413 [para.into,833.1.1.1.2,51.l.l.d.mod,48] u(l(u(x,y)) , i ( x ) ) - i d ) . 1416 [para_into,833.1.1.1.2,49.l.l.daaod,48] u ( i ( u ( i ( x ) , y ) ) , x ) - x. 1443 [para.into,1416.1.1.1.1,94.1.1] u ( i ( u ( x , i ( y ) ) ) , y ) - y. 1460 [para.into,1416.1.1,64.1.1] u ( x , i ( u ( i ( x ) , y ) ) ) - x. 1567 [para.iroa,1443.1.1,141.1.1.2] n ( i ( u ( x , l ( y ) ) > , y ) - i ( u ( x , l ( y ) ) ) . 1662,1661 [para_into,1413.1.1.1.1,88.1.1] u ( i ( x ) , i ( n ( x , y ) ) ) - i(n(x,y)>. 1684 [para.iroa,1413.1.1,141.1.1.2] n ( l ( u ( x , y ) ) , i ( x ) ) - i ( u ( x , y ) ) . 1705 ( h a a f 1) [para.into, 1684.1.1.1.1,36.1.1,daaod,70] n(l(x) , l ( n ( y , x ) } ) - i d ) . 1910 [para.into,1705.1.1.1,S3.1.1,damod,S4] n ( l , i ( n ( x , l ) ) ) - 1. 1933,1932 (haat-1) [para_froB,1910.1.1,36.1.1.1] u ( l , l ( n ( x , l ) ) ) - i ( n ( x , l ) > . 7237,7236 [para.into,957.1.1.1,1567.1.1] u ( i ( u ( l , i ( x ) ) ) , n ( x , 1 ) ) - n < x , l ) . 7252,7251 [para_froB,7236.1.1,1460.1.1.2.l.damod,68,1662,1933] i ( n ( i , D ) - u ( l , l ( x ) ) . 7302 [para.from,7236.1.1,lS4.1.1.2.1.1,daBod,7252,76,78,46,60,48,307,lll,7237,72S2,76,78,46 > S0,48] n ( u ( x , l ) , u ( l , i ( x ) ) ) - 1. 7338 (haat-1) [para.froB.7302.1.1,41.1.1.1,daaod,46,78,46,78,46] n(u(x«y,y),u(y,i(x)»y)) - y. 23565 [para.into,7338.1.1.2.2,61.l.l,d«B0d,56,56,56,56] n(u(x»x,x),u(x,l>) - x. 23567 [binary,23565.1,85.1] *F.
McCune's 33-Step Proof of L0GT1 > UNIT CONFLICT *t 19280.69 H C > 34171 [binary,34169.1,1674.1] 8F. Lanjta of proof i» 33. Laval of proof Is 10. PROOF 3,2 [] (x»y)«z - x»y»z. 5,4 [] l»x - x. 7,6 [] x«l - x. 8 CJ i(x)»x - 1. 10 □ x»l(x) - 1. 16,14 [] i(i(x)> - x. 20 t) u(x,x) - x. 22 C] n(x,y) - n ( y , x ) . 23 [) u(x,y) - u(y,x). 24 [) n(n(x,y),z) - n ( x , n ( y , z ) ) . 27,26 C] u ( u ( x , y ) , i ) - u ( x , u ( y , z ) ) . 28 0 u(n(x,y) ,y) - y. 30 (3 n(u(x,y) ,y) - y. 33,32 [] x«u(y,z) - u(x«y,x»z). 35,34 C] x*n(y,z) - n(x»y,x»z). 37,36 [] u(x,y)»z - u(x»z,y»z). 39,38 C] n(x,y)»z - n(x»z,y«z). 41,40 CJ pp(x) " u ( x , l ) . 43,42 [] np(x) - o ( x , l ) . 44 [d«Bod,41,43,35,37,6,7] n(u(a*a,&) ,u(a, 1)) !- a. 46,45 [para_xroa,8.1.1.2.1.1.1,d*nod,5] i(x)*x*y - y. 60 [para_into,28.1.1.1.22.1.1] u(n(x,y),x) - x. 62 [par*.into,28.1.1,23.1.1] u(x.n(y,x>) - x. 64 [para.into.30.1.1.1,23.1.1] n(u(x,y).x) - x. 70 [para.into,60.1.1.1,30.1.1] u(x,u(y,x)) - u ( y , x ) . 74 [para.lnto,60.1.1,23.1.1] u(x,n(x,y)) - x. 79,78 Cpara_frco,62.1.1.30.1.1.1] n(x,n(y,x)) - n ( y , x ) . 88,87 Cpara_into,26.1.1.1,20.1.1] u(x,u(x,y)) - u ( x , y ) . 107 [para_lnto,74.1.1.2,24.1.1] u ( n ( x , y ) , n ( x , n ( y , z ) » - n ( x , y ) . 120,119 Cpara.into,32.1.1,8.1.1] u(l(u(x,y))>x,l(u(x,y))«y) - 1. 134,133 tpara.lnto.34.1.1,8.1.1] n. 19584 [para.into,1783.1.1.2,2699.1.1] u ( n ( x , l ) , i ( u ( l , i ( x ) ) ) ) - n ( x , l ) . 19808 Cpara_froa, 19584.1.1,2567.1.1.2.l.draod,27,2649,1169] i ( n ( x , D ) - u ( l , l ( r ) ) . 20017 Cpara.froa,19808.1.1,45.1.1.1,daaod,39,6,3S,37,5,46,37,5] n(u(x»y,y) , u ( y . i ( x ) * y » - y. 34169 [para_lnto,20017.1.1.1.1,8.1.1,d«aod,15] n(u(l,x),u(x,x*x)> - x. 34171 [binary,34169.1,1674.1] $F.
The following input file can be used, with suitable modifications, to study Robbins algebra. It was used to prove VL47T5. The commented-out weight tem-
plates illustrate, if comments are removed, how one can used the resonance strategy by keying on the positive proof steps from a related theorem. Input File for Studying Robbins Algebra sst(knuth_bandix). claar(aq_units_both_ways). sat(ind«x_for_back_deBOd) . sat (dynaBic_daatod_l8x_d8p). X aat(lax_rpo). sst(procaas.input). X sat (display.taraa). ast(input_soa_first). daar(print_kapt). claar(print_nav_daaod). clsar(print_back_daaod). assign(Bax_proofa, 2 ) . asaignCreport, 1800). assign(Bax_vsight, 30). assign(pick_givan_ratio, 3 ) . assign(Bax_aaa, 80000). X a»eign(haat,l). X assign(dyna«ic_haat_waight, 2 ) . assigsdax.diatinct.vars, 3). l a x ( [ a , b, c, d, «, f. g ( x ) , n(x), X lrpo_lr_status([*(x,x)]).
«-<x,x)]).
waight_list(pick_givan). X Following is hypothesis. X waight(EQC+(c,d),c),2). X Following ars positive staps from a 31-atap proof of +(c,c) " ct X tha union of c and c • c, aodifiad to not msntion constants. X wsight(EC.(n(f(n(+(x,y)) ,n(+(y.n(x))))),y),2). X waight(Eq(n(*(n(+(x,y)) ,n(f(n(y) ,x)))) ,x) .2). X waight(Eq(n(*(n(+(x,+(y,z))),n(*(z.n(+(x.y)))))).z).2). X waight(EQ(n<*(n(+(x,y)).n(-»(n(x),y)))),y).2). X waight(EQ(n(+(D(*(x,n(y)))>n(*(y,x)))),x),2). X walght(EQ(n(+(n(t(n(x),y)),n(t(y,x)))).y),2). X waight(EQ(n(*(n(*(x,t(y.z))),n(+(y,n(*(x,z))))))>y)>2). X waight(EQ(D(*(n(*(x,+(y,z))),n(+(x,n(*(z,y)))))),x),2). X waight<EQ(n(+(n(*(x,-»(y,z))),n(-Kn(+(x,z)).y)))).y),2). X walght(EQ(n(*(n(»(D) ,n(+($(l) ,n(«(l)))))) ,«(D) ,-2) . X walght(EQ(+(«U) ,♦(*■<» ,x)), HK1). x)),-l). X walght(EQ(n(*(t(l).n(*(n<«(l)),♦(«(!),n($(l))))))).n(«(l))).-3). X waight(EQ(n(*(n(+(*a),x)).n(+(n($(l)).+(x.n(*($(l),n($(l))))))))).x),-2). X waight(EQ(+(«(l),+<x,*(l))),♦(«(»,x)),-l). X waight(Eq(n(*(n(«
«i^it(E<)<*(x,+(x.+(x>+(x,x))))l + (x,t(i>*(x,t(x>x)))))t 600). X Following Is for tall stratagy. vaight(Eq($a),»(2», 1), X Folloving 1* for discarding tripla s. vaight(n(n(nU))>, 500). •nd.of.list. list(usabla). EQ(x,x). EQ(*(x,y),*(y,x)). EQC*(+(x,y).x),+(x, + ( y . z » ) . •nd.of.list. list(sos). Eq(n(»(n(+(x.y)),n(*(x,n(y))))),x). X Robbins axiom EQ(*(c,d),e). X hypothasis -EQ(+(n(+(a.n(b))).n(+(n(a),n(b)))),b). X denial of Huntington axiom and.of.list. list(passiva). -EQ(+(x,x),x) I *ANS(stap.thm). -EQ(n(n(n(n(a)*b)+n> I $ANS(st«p.m03). -EQ((n(n(n(n(a)+b)+n(a+b)+a)+n(b*a))),«) I »ANS(st»p.m04). -EQ<(n(n
X X X X X
list(daaodulators). EQ(*(x,y),+(y,x)). EQ(+(*(x,y).x),»(x.+(y.x))). EQ(n(*(n(*(x,y)).n(i-(x.n(y))))).x). and.of.list.
X X X X X X
list(hot). EQ(+(e,d),c). X nypothasls Eq(n(»(n(*(x,y)).n(+(x,n(y))))).x). EQ(+(x,y),+(y,x>). EQ(+(*(x,y),x),+(x.+(y,z))). and.of.list.
X Bobbins axiom
X Robbins axiom
In view of the historical significance of finding a proof of RAT§ without induction and without AC-unification, I include the following proof.
A Historically Significant Proof of RATH

> UNIT CONFLICT at 9770.64 sec > 48310 [binary,48308.1,1.1] $ANS(step_thm).
Length of proof is 78. Level of proof is 16.
PROOF 1 [] -Eq(x*x,x) I *ANS(stsp_th«). 29,28 [] H}(x+y,y*x). 31,30 [] Eq((x+y)+z,x+y+z). 32 D EQ(n(n(x*y)tn(x-m(y))),x). 35,34 [] E q ( o d , c ) . 37 [para_iato,32.1.1.1.1.1,30.1.1,da«od,31] EQ(n(n(x*yfz)*n(x+y+n(z))),x+y). 39 [para.iato,32.1.1.1.1.1,28.1.1] Eq(n(n(x*y)+n(y+n(x))) ,y) . 41 [para_iBto,32.1.1.1.1,32.1.1] Eq(D(x+n(n(x*y)*n(n(x+n(y))))),n(x*y)). 43 [para.into,32.1.1.1.2.1.2,32.l.l.daaod,29] EQ(n(n(xty)*n(xtn(y*z)*n(y*n(z)))),x). 45 [para_iato,32.1.1.1.2.1,28.1.1] EQ(n(n(x+y)+n(n(y)*x)) ,x) . 52,51 [para_fro«,34.1.1,30.1.1.1] EQ(c*d*x,c+x). 61 [para.into,37.1.1.1.1.1.2,28.1.1] Eq(n(n(x+y+z)-m(x+z*n(y))),x+z). 63 [para.iato,37.1.1.1.1.1,28.l.l.daaod,31] EQ(n(n(x*y+z)+n(z+x+n(y))),z+x). 71 [para.from,37.1.1,32.1.1.1.2,daaod,29,31] BJ(n(x+y*n(n(x+y+z)+x+y+n(z))),n(x*y*z)). 75 [para.into,51.1.1.2,28.1.1] Eq(c*x+d,c+x). 79 Epsxa.froB.51.1.1,32.1.1.1.1.1] EQ(n(n(c»x)*n(c*n(d+x))),c). 81 [para.iato,75.1.1.2,30.1.1] Eq(ctxty+d,c+x*y). 90,89 [para.froa,75.1.1,30.1.l.l.daaod,31,31] EQ(cfx*d+y,c->x+y). 95 [para.into,39.1.1.1.1.1,34.1.1] EQ(n(n(c)+n(d+n(c))) ,d) . 97 [ p a r a . i a t o , 3 9 . 1 . 1 . 1 . 1 . 1 , 3 0 . 1 . 1 ] EQ(n(n(x+yfz)+n{z*n<x*y))) ,z) . 101 [para_into,39.1.1.1.1,39.1.1] EQ(n(x+n(n(x+n(y))+n(n(y+x)))),a(x+a(y))). 115 [para.into,39.1.1.1.2.1,28.1.1] EQ(n(n(xty)+n(n(x)*y)),y). 119 [para.iato.39.1.1.1,28.1.1] EO(n(n(x+n(y))+n(y+x)),x). 126,125 [para.froa,39.1.1,32.1.1.1.1] EQ(n(x*n(a(y+x)+n(aCx+n(y))))),n(y*x)). 131 [para.iato,41.1.1.1.2.1.1.1,28.1.1,das»d,126] EQ(a(x*y),n(y+x)). 166,165 [para_iato,131.1.1.1.30.1.1] Eq(a(x+y*z),a(z*x*y)). 173 [para_from,131.1.1,39.1.1.1.2.1.2,d»aod,31] EQ(n(n(x<-y+z)+a(z+n(y+x))),z). 195 [ p a r a . i a t o . 4 5 . 1 . 1 . 1 . 2 . 1 . 1 , 3 9 . 1 . 1 ] EQ(a(a(x*n(yH)+n(z+a(y)))-fn(z+x)) , x ) . 199 [para_iato,45.1.1.1.2,95.l.l.daaod,29,29] EQ(a(d+a(c»B(d+a(c)))) ,a(d*n(c))) . 203 [para_iato,45.1.1.1.2,45.l.l.daaod,166,29,29] EQ(n(x+n(y+x*a(n(y)»x))),n(n(y)*x)). 209 [para.iato,45.1.1.1,28.1.1] EQ(a(a(a(x)*y)*n(y*x)).y). 229 [ p a r a . i n t o , 4 3 . 1 . 1 . 1 . 1 , 9 S . l . l ] EQ(a(d*B(n(c)+a(n(d*n(c))*x)-Hi(n(d*n(c))*a(x)))),a(c)). 267 [para.into,115.1.1.1.1.1,30.1.1] EQ(n(n(x+y*z)+n(n(x*y)*z)),z). 325 [para_into,119.1.1.1.1.1,30.1.1] EQ(n(n(x+y+n(z))+a(z+x»y)),x*y). 329 [para_from,119.1.1,43.1.1.1.2.1.2.2,damod,166,29,29] EQ(n(n(x*n(y*n(r)))♦n(i+y+n(y*z ♦n(y«a(z))))).x). 361 [para_into,209.1.1.1.2.1,30.1.1] EQ(n(a(a(x)*y+z)*n(y*z+x)),y*z). 411 [ p a r a . i a t o . 7 9 . 1 . 1 . 1 . 1 . 1 , 2 8 . 1 . 1 ] EQ(n(n(x+c)*a(c«i(d*x))),c). 651 [para_iBto,411.1.1.1,28.1.1] EQ(n(n(c*n(d*x))+n(x+c)),e). 739 [para.into,81.1.1.2,28.1.1,daaod,31,90] EQ(c«x+y,c+y*x). 755 [para_lato,739.1.1.2,739.l.l.daaod,29] EQ(ctc*x«y,c*c*y+x). 862,861 [para_from,61.1.1,32.l.l.l.2,damod,29,31] Eq(a(x*y+n(a(x*z+y)+x*y+n(z))),n(x*z+y)). 889 [para_iBto,97.1.1.1.1.1.2.78.1.1,d«mod,31] EQ(a(n(x*c*y)tn(y*d*a(x*c))),y+d). 903 [para_iato,97.1.1.1.1.1,28.1.1,damod,31] Bq(n(a(x*y»z)«i(y+B(z+x))),y). 1000,999 [para.lnto,63.1.1.1.1.1.2,739.l.l.danod,31,31] EQ(o(a(x+e+y+z)+n(z+y+x4n(e))),z+y+x). 1047 [para.into,63.1.1.1,28.1.1] EQ(n(n(x+y+n(z))+a(y*z+x)),x+y) . 1829 [para.into,71.1.1.1.2.2.1.1.1.2,28.l.l.danod,862] EQ(a(x+y+z),a<x+z+y)). 1997 [para_into,173.1.1.1.2.1.2.1,75.1.1.damod,31] EQ(a(n(x+d+c*y)«n(y+a(c*x))),y). 2087,2086 [para.from,173.1.1,119.1.1.1.2,damod,29] EQ(n(x+a(n(x+n(y*z))ta(n(z*y+x)))), n(x*n(y+z))). 2410 [para.into,267.1.1.1.1.1.2,28.1.1] EQ(n(n(xtyfz)-fn(n(xtz)+y)),y). 2416 [para.iato,267.1.1.1.1.1,28.1.1,damod,31] EQ(a(B(x+y*z)*a(n(z*x)+y)),y). 
3178 [para.iato,903.1.1.1.2.1.2.1,51.l.l.daaod,31] HKa(a(d+x+y+c)+a(y+a(c*x))),y). 3294 [para_iBto,101.1.1.1.2.1.1.1.2,131.1.1.damod,31,2087] EQ(n(x*n(y+z)),n(x*a(z+y))). 4032 [para.iato,2410.1.1.1.2.1.1,32.l.l.daood,29] EQ(n(n(i*y)*n(n(x-»z)*y+n(x*n(z)))),y). 4064 [para.iato,2416.1.1.1.2.1.1.1,75.1.1,damod,31] EQ(a(B(x+d*y*c)+B(a(c+x)+y)),y). 4887,4886 [para.into,165.1.1.1.2,28.1.1] Eq(a(x*y*z),a(y+xtz)).
5674 Cpar«_lnto, 1829.1.1.1,28. l . l . d u o d , 3 1 ] H}(n(x+y+z) ,n(z+y+x)). 7392 [para_lnto,1997.1.1.1.2.1.2.1,34.1.1] EQ(n(n(d+d*c*x)*n(xtn(c))),x). 7662 tp«ra_into,7392.1.1.1.1.1.2.2,28.1.1] EQ(n(n(d+d*x+c)+n(xtn(c))) , x ) . 7710 Cp*»_lnto,7562.1.1.1,28.1.1] H}
Next, for those interested in equivalential calculus, I give two short proofs. The first is the shortest of which I know that the formula XHN implies the formula UM, and the second is the shortest of which I know that the formula XHK implies the formula YQL. A Short Proof for XHN Implies UM > UIIT CONFLICT mt Length of proof la 19.
0.30 HC
> 38 Cbinary,37.1,6.1] UXSVBUM.UM).
Loral of proof i s 14.
PROOF
1 □ -P(o(x,y)> I -P(l) I P(y). 2 (] PCa(x.a(a
22 [hyper, 1,18,20] P(«(.(x.«(y.«(.(x,u),.(.(u,y).x)))),.(.(.(v.v),«(.(y,i),»)),.(v€,.(«(»7,»8). O(O(T«,»«>,T7)»»).
23 [hyper,1.21.2) P(e(.(.(x,y),e(e(y,e(x..(
27 [hyp«r,l,26,22] P ( « ( « ( « < « ( x . « ( « ( y , z ) , « ( « ( z , x ) , y ) ) ) , u ) , u ) , « ( y . « ( « ( w , y 6 ) , « < a ( y 6 , y ) , w ) ) ) ) ) . 28 [hypar,1,22,27] P ( « ( a < « ( x , y ) . « < « ( y , a ( a ( a ( z . « ( « ( u , v ) , a < « ( y , z ) , u » ) ,v) , » ) > , x ) ) , • (v6,a<«
A Short Proof for XHK Implies YQL > UWT C0KFL1CT at 380.90 w
> 4789 [binary, 4788.1,3.1] 8AMSUEll(Pt_YQL> .
Langth of proof i i 23. Laval of proof i s 19. PK00F 1 U -P(a(z,y)) I -P(z) I P(y). 2 [] P(a(z.a(a(y,z),a(a(z,z).y)))>. 3 Cl - P ( a ( a ( a . b ) . a ( a ( c . b ) , a < a , e ) ) ) ) | lAMSVERCPl.YQL). 16 C] -P(a(z,y)) I -P(z) I PCjr). 17 [J P,y>)». 18 [hypar. 1,2,2] P ( a ( a ( * , y > . a ( a ( a ( z . a ( a ( u . y > , a ( a ( z . y ) , u ) ) ) . y ) , x ) ) ) . 19 (haat-1) ttyp.r.16,17,18) P(a(a(z.y),a(a(a(a(z.u).a(a(a(y,a(a(v.y6),a(a(v,v6),v)>),u),z)>,y) , z ) ) ) . 20 (haat-1) [hypar.16,18,17] P ( a ( a ( a ( z . a ( a ( y , z ) , a ( a ( x . z ) , y ) ) ) , a ( a ( u , y ) , a ( a ( w , y > , u ) ) ) , v ) ) . 21 [hypar,1,18,18] P(a(a(a(z.a(a(y,z) , . ( . ( z , z ) . y ) ) ) , a ( a ( a ( u , a < a ( y . v ) , a ( a ( u , y ) , v ) ) ) , v 6 ) ,»7)) , . ( y 7 , v 6 ) ) ) . 27 [hypar, 1,19,18] P ( a ( a ( a ( a ( z , y ) , a ( a ( a ( z , a ( a ( u , y ) , . ( . ( z , y ) , u ) ) ) , y ) , z ) ) , a ( a ( a ( v , a ( a ( y 6 , y 7 ) . a(a(v,y7).y6))),y8),y9)),a(y9,y8))). 35 (haat-1) [hypar,16,27,17) P ( a ( a ( a ( a ( a ( x , y ) . a ( a C a ( z , a ( a ( u , y ) , . ( . ( z . y ) , u ) ) ) , y ) , z ) ) , w ) , a(v6,a(a(y7,y8),a(a(y6.y8),y7)))).w)). 47 [hypar,1,21,20] P ( a ( a ( a ( x . y ) , a ( a ( a ( a ( z , a ( a ( u , y ) , a ( a ( z . y ) . u ) ) ) , v ) , y ) . » ) ) , » ) ) . 58 [hypar,1,35,47] P ( a ( a ( a ( a ( x , a ( a ( y , z ) , a ( a ( z , z ) , y ) ) ) , a ( u , a ( a ( v , v ) , a ( a ( u , v ) , y ) ) ) ) , a ( a ( a ( v 6 , a ( a ( y 7 , y 8 ) , a(a(y6.y8),v7))),v9),vl0)),a(yl0,y9>)). 77 [hypar,1.S8.21) P ( a ( z , a ( a ( a ( a ( y , a ( a ( z . u ) . a ( a ( y , u ) , z ) ) ) , z ) , a ( a ( y , w ) , a ( a ( y 6 . v ) , v ) ) ) , v 6 ) ) ) . 81 [hypar, 1,77.47] P ( a C a ( a ( a ( x , a ( a ( y , z ) , a ( a ( z , z ) , y ) ) ) , a ( a ( a ( u , y ) , a ( a ( a ( a ( w , a ( a ( y 6 , v 7 ) , a(a(v,v7).y6))),y8),y).u)).y8)),a(a(v9,yl0),a(a(yll,yl0),v9))),yll)). 91 D>ypar,1,20,81] P ( . ( . ( . ( x , y ) , a ( a ( z , a ( a ( u . y ) , a ( a ( z , y ) , » ) ) ) , x ) ) , y ) ) . 93 [hypar,1,77,91] P ( a C a ( a ( a ( x , a ( a ( y , z ) , . ( . ( x , z ) , y ) ) ) . . ( . ( . ( u . y ) , . ( . ( v . . ( . ( v 6 . v 7 ) , a(a(w,v7),y6))),u)),v)).a(a(y8,y9),a(a(yl0,y9),y8))),vl0)). 97 [hypar.1,91.47] P ( a ( a ( a ( a ( x , a ( a ( y , z ) , a ( a ( x , z ) , y ) ) ) , a ( a ( u , a ( a ( v , v ) , a ( a ( u , v ) , v ) ) ) , a ( v 6 , v 7 ) ) ) , v 7 ) , v 6 ) ) . 121 [hypar,1,20,93] P ( a ( a ( a ( z , a ( a ( y , z ) , a ( a ( u , z ) , y ) ) ) . u ) , z ) ) . 133 [hypar,1,121,121] P(a(x,a(a(y.z),a(a(a(a( , a ( a ( a ( a ( v , y 6 ) , a ( a ( y 7 , v 6 ) , v ) ) , v ) ,u))) , y 7 ) ) . 177 [hypar,1,97,154] P ( a ( a ( a ( x , y ) , a ( a ( z , y ) , z ) ) , a ( a ( u , y ) , a ( a ( z , y ) , u ) ) ) ) . 181 [hypar,1,121,177] P ( a ( a ( a ( a ( a ( z , y ) , a ( a ( z , y ) , z ) ) , u ) , z ) , u ) ) . 197 [hypar,1,121,181] P ( a ( a ( a ( z , y ) , a ( a ( a ( a ( z . u ) , a ( a ( y , u ) , z > ) , y ) , z ) ) , v » . 219 [hypar, 1,181,197] P ( a ( a ( a ( a ( z , y ) , a ( a ( z . y ) , x ) ) , a ( a ( z , u ) , v ) ) , a ( y , u ) ) ) . 240 [hypar, 1,18,219] Pta(a(a(z.a(a(y,2) , a ( a ( z , z ) , y ) ) ) ,a(u,v)) ,a(a(a(v,y6) , a ( a ( y 7 , v 6 ) , » ) ) ,a(a(v7,v) , u ) ) ) ) . 4776 [hypar, 1,219,240] P ( a ( a ( a ( z , y ) , a ( a ( j , u ) , a ( a ( y , v ) ,a(a(y,v> , v ) ) ) ) , a ( y ( x , u ) , z ) ) ) , 4788 (haat-1) [hypar, 16,4776,17] P ( a ( a ( z , y ) . a ( a ( z . y ) ,aCz,z)))) . 4789 [btnary.4788.1,3.1] *»)iSVE».(Pl_YQL).
To close this article, I give the shortest proof of which I know for LOGT2.
A Short Proof of LOGT2

UNIT CONFLICT at 3566.71 sec ----> 19786 [binary,19784.1,1.1] $ANSWER(step_thm).
Length of proof is 22.
Level of proof is 10.

PROOF
1 [] i(n(a,b)) != u(i(a),i(b)) | $ANSWER(step_thm).
4 [] i(x)*x = 1.
5 [] x*i(x) = 1.
10 [] u(n(x,y),y) = y.
11 [] n(u(x,y),y) = y.
13 [] x*(y*z) = (x*y)*z.
16,14 [copy,13,flip.1] (x*y)*z = x*(y*z).
17,15 [] 1*x = x.
19,18 [] x*1 = x.
20 [] i(x)*x = 1.
34 [] n(x,y) = n(y,x).
35 [] u(x,y) = u(y,x).
39 [] u(x,u(y,z)) = u(u(x,y),z).
40 [copy,39,flip.1] u(u(x,y),z) = u(x,u(y,z)).
46 [] u(n(x,y),y) = y.
51,50 [] x*u(y,z) = u(x*y,x*z).
64 [] u(x,y)*z = u(x*z,y*z).
65 [] n(x,u(y,z)) = u(n(x,y),n(x,z)).
66 [para_into,46.1.1.1,34.1.1] u(n(x,y),x) = x.
70,69 (heat=1) [para_into,66.1.1.1,11.1.1] u(x,u(y,x)) = u(y,x).
73 [para_from,46.1.1,40.1.1.1,flip.1] u(n(x,y),u(y,z)) = u(y,z).
76,75 (heat=1) [para_into,73.1.1.1,11.1.1] u(x,u(x,y)) = u(x,y).
151,150 [para_into,50.1.1,20.1.1,flip.1] u(i(u(x,y))*x,i(u(x,y))*y) = 1.
164 [para_into,150.1.1.1.1.1,75.1.1,demod,76,51,151] u(i(u(x,y))*x,1) = 1.
166 [para_into,150.1.1.1.1.1,69.1.1,demod,70,51,151] u(i(u(x,y))*y,1) = 1.
342 [para_into,64.1.1.1,166.1.1,demod,17,16,17,flip.1] u(i(u(x,y))*y*z,z) = z.
344 [para_into,64.1.1.1,164.1.1,demod,17,16,17,flip.1] u(i(u(x,y))*x*z,z) = z.
354 (heat=1) [para_into,342.1.1.1.2,4.1.1,demod,19] u(i(u(x,i(y))),y) = y.
356 (heat=1) [para_into,344.1.1.1.2,5.1.1,demod,19] u(i(u(x,y)),i(x)) = i(x).
371,370 (heat=2) [para_into,356.1.1.1.1,10.1.1] u(i(x),i(n(y,x))) = i(n(y,x)).
514 [para_into,344.1.1,35.1.1] u(x,i(u(y,z))*y*x) = x.
518 (heat=1) [para_into,514.1.1.2.2,4.1.1,demod,19] u(x,i(u(i(x),y))) = x.
526 (heat=2) [para_from,518.1.1,11.1.1.1] n(x,i(u(i(x),y))) = i(u(i(x),y)).
599,598 [para_into,356.1.1.1.1,66.1.1] u(i(x),i(n(x,y))) = i(n(x,y)).
1009 [para_from,354.1.1,65.1.1.2,flip.1] u(n(x,i(u(y,i(z)))),n(x,z)) = n(x,z).
1033 [para_into,518.1.1,40.1.1] u(x,u(y,i(u(i(u(x,y)),z)))) = u(x,y).
13284 [para_into,1009.1.1.1,526.1.1] u(i(u(i(x),i(y))),n(x,y)) = n(x,y).
19784 [para_from,13284.1.1,1033.1.1.2.2.1,demod,371,599] i(n(x,y)) = u(i(x),i(y)).
19786 [binary,19784.1,1.1] $ANSWER(step_thm).
References
[1] Boyer, R. S., and Moore, J S., A Computational Logic, Academic Press, New York, 1979.
[2] Boyer, R. S., and Moore, J S., A Computational Logic Handbook, 2nd edn., Academic Press, New York, 1998 (also Web information: ftp://ftp.cs.utexas.edu/pub/boyer/nqthm/index.html).
[3] Henkin, L., Monk, J., and Tarski, A., Cylindric Algebras, Part I, North-Holland, Amsterdam, 1971.
[4] Kalman, J., "A shortest single axiom for the classical equivalential calculus", Notre Dame J. Formal Logic 19 (1978) 141-144.
[5] Kalman, J., "Condensed detachment as a rule of inference", Studia Logica 42 (1983) 443-451.
[6] Lusk, E., and Overbeek, R., The Automated Reasoning System ITP, Technical Report ANL-84-27, Argonne National Laboratory, Argonne, IL, 1984.
[7] McCune, W., OTTER 2.0 Users Guide, Technical Report ANL-90/9, Argonne National Laboratory, Argonne, IL, 1990.
[8] McCune, W., OTTER 3.0 Reference Manual and Guide, Technical Report ANL-94/6, Argonne National Laboratory, Argonne, IL, 1994.
[9] McCune, W., "Solution of the Robbins problem", J. Automated Reasoning, accepted for publication, 1997.
[10] Meredith, C. A., "Single axioms for the systems (C,N), (C,O), and (A,N) of the two-valued propositional calculus", J. Computing Systems 1 (1953) 155-164.
[11] Veroff, R., and Wos, L., "The linked inference principle, I: The formal treatment", J. Automated Reasoning 8(2) (1992) 213-274.
[12] Winker, S., "Robbins algebra: Conditions that make a near-Boolean algebra Boolean", J. Automated Reasoning 6(4) (1990) 465-489.
[13] Winker, S., "Absorption and idempotency criteria for a problem in near-Boolean algebras", J. Algebra 153(1) (1992) 414-423.
[14] Wos, L., Winker, S., Veroff, R., Smith, B., and Henschen, L., "Questions concerning possible shortest single axioms in equivalential calculus: An application of automated theorem proving to infinite domains", Notre Dame J. Formal Logic 24 (1983) 205-223.
[15] Wos, L., Winker, S., Veroff, R., Smith, B., and Henschen, L., "A new use of an automated reasoning assistant: Open questions in equivalential calculus and the study of infinite domains", Artificial Intelligence 22 (1984) 303-356.
[16] Wos, L., Veroff, R., Smith, B., and McCune, W., "The linked inference principle, II: The user's viewpoint", in R. E. Shostak (ed.), Lecture Notes in Computer Science, Vol. 170, Springer-Verlag, New York, 1984, pp. 316-332.
[17] Wos, L., Automated Reasoning: 33 Basic Research Problems, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[18] Wos, L., Overbeek, R., Lusk, E., and Boyle, J., Automated Reasoning: Introduction and Applications, 2nd edn., McGraw-Hill, New York, 1992.
[19] Wos, L., "The resonance strategy", Computers and Mathematics with Applications (special issue on automated reasoning) 29(2) (1995) 133-178.
[20] Wos, L., "Searching for circles of pure proofs", J. Automated Reasoning 15(3) (1995) 279-315.
[21] Wos, L., The Automation of Reasoning: An Experimenter's Notebook with OTTER Tutorial, Academic Press, New York, 1996 (see http://www.mcs.anl.gov/people/wos/index.html for input files and information on shorter proofs).
[22] Wos, L., "OTTER and the Moufang identity problem", J. Automated Reasoning 17(2) (1996) 215-257.
[23] Wos, L., "The power of combining resonance with heat", J. Automated Reasoning 17(2) (1996) 23-81.
[24] Wos, L., "Automating the search for elegant proofs", J. Automated Reasoning, accepted for publication, 1997.
[25] Wos, L., "Experiments concerning the automated search for elegant proofs", Technical Memorandum ANL/MCS-TM-221, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, 1997.
[26] Wos, L., "Experiments with the hot list strategy", Technical Memorandum ANL/MCS-TM-232, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, 1997.
OTTER and the Moufang Identity Problem

LARRY WOS*
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439-4844, U.S.A. e-mail: [email protected]

(Received: 26 September 1995)

Abstract. This article provides additional evidence of the value of using an automated reasoning program as a research assistant. Featured is the use of Bill McCune's program OTTER to find proofs of theorems taken from the study of Moufang loops, but not just any proofs. Specifically, the proofs satisfy the property of purity. In particular, when given, say, four equivalent identities (which is the case in this article), one is asked to prove the second identity from the first, the third from the second, the fourth from the third, and the first from the fourth. If the proof that 1 implies 2 does not rely on 3 or 4, then by definition the proof is pure with respect to 3 and 4, or simply the proof is pure. If for the four identities one finds four pure proofs showing that 1 implies 2, 2 implies 3, 3 implies 4, and 4 implies 1, then by definition one has found a circle of pure proofs. By finding the needed twelve pure proofs, this article shows that there does exist a circle of pure proofs for the four equivalent identities for Moufang loops and for all orderings of the identities; however, for much of this article, the emphasis is on the first three identities. In addition—in part to promote the use of automated reasoning programs and to answer questions concerning the choice of options—featured here is the methodology that was employed and a discussion of some of the obstacles, some of which are subtle. The approach relies on paramodulation (which generalizes equality substitution), on demodulation, and—so crucial for attacking deep questions and hard problems—on various strategies, most important of which are the hot list strategy, the set of support strategy, and McCune's ratio strategy. To permit verification of the results presented here, extension of them, and application of the methodology to other unrelated fields, a sample input file and four proofs (relevant to a circle of pure proofs for the four identities) are included. Research topics and challenges are offered at the close of this article.

Key words: automated reasoning, OTTER, Moufang loops, circle of pure proofs.

*This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.
Reprinted from the Journal of Automated Reasoning, Vol. 17, 215-257, 1996, with kind permission from Kluwer Academic Publishers.
1  The Problem, the History, and the Objectives
Automated reasoning programs are now (in 1995) used to answer open questions and prove interesting theorems from a wide variety of disciplines; see [12, 13, 18, 19]. Two programs that have proved most useful in this context and that are readily available to researchers are Bill McCune's OTTER [10] and Robert Boyer and J Moore's Nqthm [1]. The fields in which successes have occurred include universal algebra, algebraic geometry, logic calculi, program verification, and chip design. At least in mathematics and in logic—with added insight regarding the use of OTTER—far more is within reach.

One of the objectives of this article is to add to that needed insight by illustrating various features of OTTER; see the sample input file and four proofs given in the Appendix. In that context, some light will be shed on making wise choices from among the numerous options offered by McCune's program. By exploring some of OTTER's options, this article will acquaint the researcher with various techniques for using this program as an assistant. Additional knowledge and understanding regarding OTTER and, more generally, regarding automated reasoning can be gleaned from consulting three books [17, 15, 23]. The first of the three books provides a thorough introduction to automated reasoning and, therefore, serves well for learning more about various inference rules, strategies, and procedures cited in this article, for example, paramodulation, the set of support strategy, and demodulation. The second book offers research problems in automated reasoning, and its fourth chapter provides a rather extensive review of the field. The third book (to be published shortly) contains a diskette of OTTER, demonstrates in detail how research in various fields was conducted with this program, and (in Chapter 8) gives guidelines for option choosing.

This article has for a second objective the demonstration of the usefulness of OTTER to answer questions that have remained unanswered for many years; see the cited history given shortly. Such demonstrations may encourage researchers totally unfamiliar with automated reasoning to experiment with a reasoning program. The final and most pressing objective is to present the specific problem under study, the methodology for attacking the problem, and the resulting successes.

With the objectives stated, next in order is the promised history. In the mid-1960s, I first heard (from Wayne Cowell, a colleague at Argonne National Laboratory) about Moufang loops [11], formally defined in Section 2. Informally, a Moufang loop is an algebraic structure in which multiplication is the operation; however, one's usual intuition is interfered with because certain expected laws do not hold. Indeed, reasoning about such structures is made more difficult because associativity is replaced by an identity of the following type, where multiplication is the implicit operator.

Axiom, Moufang 1: (xy)(zx) = (x(yz))x
I was informed that, in the presence of the axioms for a loop (given in Section 2), Moufang 1 is equivalent to Moufang 2, which is in turn equivalent to Moufang 3.

Axiom, Moufang 2: ((xy)z)y = x(y(zy))
Axiom, Moufang 3: x(y(xz)) = ((xy)x)z

Such in fact is the case. (A fourth equivalent Moufang identity, given later in this section, will also be of concern in this article; the fourth Moufang identity is the mirror image of the first Moufang identity with the lefthand and righthand arguments interchanged, and the third identity is the mirror image of the second.) When presented with n equivalent identities (such as the three Moufang identities just given), properties, or definitions, mathematicians and logicians sometimes seek an aesthetic property that can be termed purity of proof [22]. Intuitively (with the formal definitions given in Section 2), purity asks for n proofs showing that identity 1 implies 2, 2 implies 3, ..., identity n implies identity 1 such that, other than the hypothesis and conclusion, no identity among the n is present in a proof. For the (first) three Moufang identities, three proofs would be required. However, such was not available. Instead, the proof of the equivalence of the (first) three identities first showed that 1 implies 2, next that 2 implies 3, then—and here is the so-called flaw in the context of purity—that 3 implies 2, and finally that 2 implies 1. (To be faithful to the proof given by Bruck [2], I note that he proves Moufang 1 if and only if Moufang 2, then Moufang 2 if and only if Moufang 3; hence, purity is absent. See also [3].) In other words, the proof consisted of four, rather than three, subproofs. To show how the implied open question was answered—that focusing on the possible existence of three pure proofs for the equivalence of the (first) three Moufang identities—is central to this article.

Without access to OTTER, such proofs might have remained out of reach. Indeed, although I made some effort in the mid-1960s to use an automated reasoning program to study Moufang loops, too little power was at that time offered by such programs. Clearly, as one learns here, the situation has changed, in part because of the significant advances in the design of such programs—OTTER is a fine example—and in part because of the formulation of powerful strategies. Of the strategies that did not exist in the mid-1960s, the ones that play the more important role in this article are McCune's ratio strategy [10, 23] and the hot list strategy [20, 22, 23]. Both for the ordering 1, 2, 3 (of the Moufang identities) and the ordering 1, 3, 2, this article heavily emphasizes showing how the needed proofs were obtained. However, after learning of the fourth Moufang identity (given almost immediately), I tested the methodology (featured here) by successfully applying it to finding the needed twelve pure proofs, the twelve pure proofs that show that all orderings of the four Moufang identities admit a circle of pure proofs; see Section 8.
Axiom, Moufang 4: x((yz)x) = (xy)(zx).

(Only a historical accident caused the emphasis to be placed on the first three identities.) Before turning to the formal elements of this article, one might wish the answers to three questions. How does the methodology used here relate to that used in the study of pure proofs for the thirteen shortest single axioms for equivalential calculus [22]? Why are Moufang loops of interest? What makes the property of purity significant?

Regarding the first question, in [22], the object is to produce (if possible) a circle of pure proofs for some ordering of all thirteen shortest single axioms for equivalential calculus. Where that area of logic is not concerned with equality, the study of Moufang loops is, thus introducing various complications in that the use of demodulation is virtually required; see Section 3. Naturally, therefore, a different inference rule was used; instead of hyperresolution, paramodulation (which generalizes equality substitution) was used. Regarding strategy, the set of support strategy played a key role here, the hot list strategy and the ratio strategy played a key role here and in the study of equivalential calculus, but level saturation and the dynamic hot list strategy were not used here. For this study of Moufang loops, the methodology did not rely on a phase designed to see how many of the sought-after conclusions could be deduced given an equation as hypothesis. Such a phase was included in the equivalential-calculus study because many possible orderings of the thirteen shortest axioms exist and the idea was to partially determine which were more likely to succeed in yielding a circle of pure proofs.

As for the second question (for whose answer I thank Ken Kunen), Moufang loops are of interest in part because of their relation to group theory, which itself is of substantial interest to mathematicians and physicists. For example, every subloop generated by two elements is in fact a group. For a second example, every subloop generated by three elements such that (ab)c = a(bc) is a group. Because of having so many subloops that are groups, much nontrivial structure is present in a Moufang loop. Chein found all 159 non-group Moufang loops of order less than 64. An attempt at finding all such non-group Moufang loops of order 64 may be computationally difficult indeed. Various open questions are of interest. For example, does every finite Moufang loop contain Sylow p-subgroups for all primes p that divide its order? (For those interested in loops and related structures such as quasigroups, see [4, 5, 6, 14]; the two papers by Fenyves might provide interesting ideas for future research in automated reasoning, and the other two cited references provide a good picture of the modern uses of loops, Moufang loops, and quasigroups, each formally defined in Section 2.)

Regarding the third question, purity of proof is related to proof elegance, as are proof length and proof structure (with respect to the type of term that is
present or absent). An interest in purity is also distantly related to an interest in the independence of axioms that characterize a set of structures. For example, in group theory, one can dispense with the axioms of right inverse and right identity, for they are provable from the remaining axioms. Then, to perhaps add to one's intuition concerning the significance of purity of proof, one might keep in mind the sometimes-expressed preference for proofs that avoid the use of certain lemmas (such as the inverse of the inverse of x equals x) or avoid the use of a law such as commutativity, and the seeking of a proof avoiding some type of term (such as n(n(t)) for any term t, where n denotes negation). Finally, though only distantly related, an interest in purity (to me) is somewhat reminiscent of the seeking of single axioms for some variety, for example, the seeking of a single axiom for all groups such that (for all elements x) the 17th power of x equals the identity e. It must be noted that the notion of pure proof is indeed sensitive to the particular formal framework that is in use.
2  Definitions and Notation
A quasigroup is a set in which multiplication is defined and in which unique left and right solutions exist. For example, regarding right solution, for all x and all y, there exists a unique z such that xz = y. A loop is a quasigroup in which an identity exists, a 1 with 1x = x1 = x. A Moufang loop is a loop satisfying any one of the four equivalent Moufang identities given in Section 1; of course, satisfaction of one of the four means satisfaction of all of the four, the following given in a notation acceptable to the program OTTER. (When a line contains a "%", the characters from the first "%" to the end of the line are treated by OTTER as a comment.)

% Axiom, Moufang 1:
(x * y) * (z * x) = (x * (y * z)) * x.
% Axiom, Moufang 2:
((x * y) * z) * y = x * (y * (z * y)).
% Axiom, Moufang 3:
x * (y * (x * z)) = ((x * y) * x) * z.
% Axiom, Moufang 4:
x * ((y * z) * x) = (x * y) * (z * x).
The Moufang identities just given are expressed in one of the notations acceptable to OTTER, McCune's automated reasoning program used for the experiments and the successes reported here. Although such expressions, in the strict sense that I normally use the term, are not clauses (see [15, 17]), throughout this article, I shall be cavalier and use the term clause loosely to include such expressions. Also, I am somewhat cavalier when referring to a Moufang identity, sometimes meaning the equation
and sometimes meaning its encoding for OTTER. If Moufang 1, 2, 3, and 4 were expressed in clause form (in the notation I often use), they would appear in the following manner; of course, as expected, one could use EQUAL as the predicate.

% Axiom, Moufang 1:
EQ(prod(prod(x,y),prod(z,x)),prod(prod(x,prod(y,z)),x)).
% Axiom, Moufang 2:
EQ(prod(prod(prod(x,y),z),y),prod(x,prod(y,prod(z,y)))).
% Axiom, Moufang 3:
EQ(prod(x,prod(y,prod(x,z))),prod(prod(prod(x,y),x),z)).
% Axiom, Moufang 4:
EQ(prod(x,prod(prod(y,z),x)),prod(prod(x,y),prod(z,x))).
When equality is present in a problem, typically I choose as the inference rule paramodulation [15, 17] because this inference rule builds in equality-oriented reasoning (in fact, it generalizes equality substitution). If that is the choice—because paramodulating from or into nonunit clauses can sharply reduce the effectiveness of a program—I prefer a set of unit clauses, clauses (or their equivalent) free of the (logical) or symbol, in OTTER the symbol "|". Logical not is denoted in OTTER by "-". The following units can be used to study Moufang loops, using any one of (or all four) Moufang identities.

x = x.
x * rs(x,y) = y.      % right solvable
rs(x, x * y) = y.     % right solution is unique (implies left cancellation)
ls(x,y) * y = x.      % left solvable
ls(x * y, y) = x.     % left solution is unique (implies right cancellation)
1 * x = x.            % left identity
x * 1 = x.            % right identity
(x * y) * (z * x) = (x * (y * z)) * x.    % Axiom, Moufang 1
((x * y) * z) * y = x * (y * (z * y)).    % Axiom, Moufang 2
x * (y * (x * z)) = ((x * y) * x) * z.    % Axiom, Moufang 3
x * ((y * z) * x) = (x * y) * (z * x).    % Axiom, Moufang 4
The study reported here relied on the use of the first seven cited units and, taken one at a time, a Moufang identity; until Section 8, the focus is on the first three identities. The researcher interested in studying Moufang loops with the aid of OTTER might begin by proving the following useful property of inverses. For all x, there exist a left and a right inverse of x, and, further, they are equal. Next, one might prove that cancellation, left and right, follows from the given units; see
the sample input file given in the Appendix for the two laws. Finally, one might explore the use of cancellation in place of the uniqueness units. If the cancellation laws are employed, which take the form of nonunit clauses, then the inference rule UR-resolution [15, 17] is used in addition to paramodulation.

Different from the approach taken in this article, Kunen studied Moufang loops briefly, using unit clauses for left and right inverse (not assuming the two are equal), for left and right identity, and (one at a time) for one of the first three Moufang identities, and using two nonunit clauses for (extensions of) the cancellation laws. Regarding the presence of the unit clause x = x (reflexivity of equality), standard practice strongly recommends its inclusion when paramodulation is in use (explicitly or, as with a Knuth-Bendix approach, implicitly), for proofs often terminate with the deduction of a statement of the form a != a for some a where a is a constant.

Next, needed are the definitions of circle of proofs and pure proof, for the problem to be featured asks one to find, if such exists, a circle of pure proofs for the (first) three Moufang identities.

Definition, circle of proofs. For a set of k equivalent elements—formulas, equations, properties, conditions, or definitions—a circle of proofs is a sequence of proofs such that the first proof shows that the first element implies the second, the second proof shows that the second element implies the third, ..., and the k-th proof shows that the k-th element implies the first.

Definition, pure proof with respect to a set of elements. For a set of elements—formulas, equations, properties, conditions, or definitions—a proof of element j from element i is pure with respect to the set of elements if and only if it does not rely on the use of any of the elements but the j-th and the i-th. The presence of a proper instance of an element other than the j-th or i-th does not render the proof impure. If such instances are absent, by definition, the proof is instance pure. Further, if none of the deduced steps contains as a proper subterm an instance of an unwanted element, by definition, the proof is subterm pure.

In response to a natural query, note that purity is not lost when the proof contains an element derived from that which is unwanted. For example, if Moufang 3 is unwanted, purity is not lost in the presence of an equation that is derived from Moufang 3 by multiplying one of its terms by the identity 1. For a second and more interesting example, purity is not lost if one of the intermediate steps (resulting from demodulation) is an equation derivable from Moufang 3 by applying left division. Somewhat related, in Section 3.1, the subtlety of purity in the context of reasoning backward (from the denial of the desired conclusion) is addressed with an example.
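Readers who wish to try the cancellation route may find it convenient to see the two laws written out as nonunit clauses. The following encoding is a sketch of my own devising, consistent with the notation used above; the authoritative form appears in the input file in the Appendix, which is not reproduced in this excerpt.

% Cancellation as nonunit clauses (a sketch; see the Appendix for the form
% actually used).  With these present, UR-resolution joins paramodulation.
x * y != x * z | y = z.    % left cancellation
y * x != z * x | y = z.    % right cancellation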
3  Obstacles and Subtleties
Ordinarily my approach to attacking a new problem is to make one or more runs and see what I can learn by examining the corresponding output file(s). For but one example, I might discover that a type of term is present in many, many clauses and conjecture that such terms, and hence the clauses containing them, are best purged immediately upon generation. In this study, however, I first discussed the problem with my colleague McCune, for I thought it likely that various obstacles could be recognized and circumvented early. Indeed, our discussions did help to identify certain obstacles and to shape the approach (presented in this article) that was subsequently used to obtain the desired proofs. Also, though indirectly, those discussions set the stage for and perhaps even dictated the number and nature of the experiments central to this article; see Section 4 in relation to Sections 3.1 and 3.2.

3.1. POSSIBLE APPROACHES
Although finding any proof regardless of its properties might prove more than challenging, the most obvious—and perhaps most formidable—obstacle is the requirement of purity. Three approaches to deriving pure proofs are possible: an approach based on a forward search, using the denial of the conclusion only to detect proof completion; an approach based on a backward search, using the axioms and hypotheses of the theorem only to complete applications of an inference rule; and an approach based on a bidirectional search, reasoning both forward and backward. In [22] where the focus is on circles of pure proofs for thirteen shortest single axioms of equivalential calculus, forward proofs are a virtual necessity in that condensed detachment is the inference rule in use (from the viewpoint of logic). (In the study of this area of logic, a backward search is ordinarily impractical, for reasoning backward from the denial of some desired conclusion typically forces the program to cope with very long clauses.) However, for the current study of Moufang loops, having the program produce a proof relying solely on reasoning forward—thus making its examination for purity straightforward (on the sur face) although perhaps painful—might be impractical from the perspective of automated reasoning. (Indeed, one learns in Section 3.2 that, at least when de modulation is used, one cannot merely examine the deduced steps of the proof to see if an unwanted equation is present and, if not, then accurately claim that the proof is pure; in particular, an unwanted equation might be present as an in termediate step that results from demodulation.) In fact, as frequently occurred in the experiments featured here, a strictly forward search failed to yield any proof. (However, if one insists with the understanding that no proofs may be completed, one can instruct OTTER to make a forward search by placing the negation of the target equation in the passive list of clauses. Clauses in that list are used only for detecting unit conflict, which signals proof completion, and for
applying forward subsumption.) If a forward proof is out of reach, one is then forced to instruct the program to seek a backward proof or a bidirectional proof. The following syntactic example of a backward proof (included simply for illustration) shows that a mere reading of the proof does not suffice where purity is concerned. The axioms of the example are, in clause form, (1) through (3). (1)
P.
(2)
-P | Q.
(3)
-Q | R.
The theorem to prove asserts that P implies R. The following proof is one of contradiction, beginning with clause (4), the negation or denial of the conclusion of the theorem, and reasoning backward. (4)
-R.
The assignment is to produce a pure proof, pure with respect to the axiom represented with clause (5). (5)
Q.
The proof uses clause (4) to deduce clause (6), which is then used to deduce clause (7), which contradicts clause (1).
(6)
-Q.
(7)
-P.
The claim that the proof consisting of clauses (1) through (4) followed by clauses (6) and (7) is pure with respect to clause (5) is, as my colleague McCune has (in effect) pointed out, total sophistry. Of course, the backward proof does not explicitly contain clause (5), but, if it is transformed into a forward proof, clause (5) is indeed present. Put another way, clause (5) is "really" present in the proof, even if not explicitly. One thus sees that the cited obstacle of finding a proof that is indeed pure is subtler than it might first appear. The third possible approach, a bidirectional proof, proceeds forward and backward, using the negative equality (arising, from assuming, say, that Moufang 2 is not deducible from Moufang 1) to deduce additional negative equal ities. Unfortunately, when a bidirectional search produces a proof—just as is the case when a backward search produces a proof—one must transform the proof into a strictly forward proof to be then checked to see that an unwanted equation is not present. Such checking is error prone if done by hand; one can, of course, have OTTER do the work instead, but some effort and some additional instructions to the program will be required. I shall give an example of how to proceed in that regard at the close of Section 7.
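To make the passive-list device mentioned earlier in this subsection concrete, the fragment below sketches how the denial of a target identity, here Moufang 2, might be placed in list(passive) so that it serves only to signal proof completion and to apply forward subsumption. It is an illustration in the spirit of the text, not a quotation from the input files used in this study, and the answer label is an arbitrary name of my choosing.

list(passive).
% Denial of Moufang 2, with constants a, b, c; used only to detect unit conflict.
((a * b) * c) * b != a * (b * (c * b)) | $ANSWER(negation_of_Moufang_2).
end_of_list.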
3.2. THE MEANS TO SEEK PURITY
Independent of the approach taken, there is the problem of devising a method (for the program to apply) to avoid the use, implicitly or explicitly, of some unwanted equation. OTTER offers two mechanisms to block the use of some unwanted equation (or clause): weighting [8, 15, 17] and demodulation [15, 17]. The following command and weight_list illustrate how weighting is used.

assign(max_weight,20).
weight_list(purge_gen).
% Blocking use of Moufang 3.
weight((x * (y * (x * z)) = ((x * y) * x) * z),1000).
end_of_list.

The inclusion of the cited command and weight_list in the input file will prevent OTTER from retaining the clause equivalent of Moufang 3, for it will be assigned a purge_gen weight strictly greater than the max_weight. (The purge_gen weight is used for deciding whether or not to retain a clause, in contrast to the pick_given weight which is used to decide where next to focus attention. The max_weight places an upper bound on the weight, or priority, of retained clauses. When no weight template applies, the purge_gen weight and the pick_given weight are measured solely in terms of symbol count.) Because variables in a weight template are treated as indistinguishable, the use of weighting to block, say, the retention of Moufang 3 will also block the retention of any clause that resembles Moufang 3, where the differences rest with the particular occurrences of variables.

Whether or not the preceding property is a disadvantage, there does exist a far more serious problem in using weighting to seek pure proofs if, because of other considerations, demodulation is also in use. (The use of demodulation might be virtually required to prevent the program from retaining a huge number of clauses that would otherwise be simplified and then purged with subsumption.) Regarding the more serious problem, for but one example, the program may apply an inference rule to a set of hypotheses, then apply a sequence of demodulators, and one of the intermediate clauses (before the demodulation is complete) may be Moufang 3. Weighting would have no effect in the situation under discussion. In such an event, although Moufang 3 would not be explicitly present in the proof, it would be used implicitly; therefore, the proof (in a true sense) would not be pure with respect to Moufang 3.

In place of weighting to block the use of an unwanted equation, such as Moufang 3, OTTER offers demodulation used in the following manner.

list(demodulators).
% Blocking use of Moufang 3.
EQ((x * (y * (x * z)) = ((x * y) * x) * z), $T).
EQ((((x * y) * x) * z = x * (y * (x * z))), $T).
end_of_list.
(Note that this use of demodulation focuses on demodulating an entire clause, in contrast to the far more common use that focuses on a term. Also note that, rather than a single demodulator, two demodulators are included, one to cope with Moufang 3 as given in this article, and one to cope with the case in which its arguments are interchanged.) When OTTER demodulates a clause to $T, true, the clause is automatically purged.

Just as the use of weighting to seek pure proofs has pitfalls, so also does the use of demodulation. With demodulation as the means for blocking the participation, explicitly or implicitly, of an unwanted equation, three pitfalls exist. The first pitfall, easily avoided but commonly encountered, concerns the orientation of demodulators. With OTTER, one can give a lexical ordering in the following way.

lex([$T,a,b,c,d,e,1,_*_,rs(_,_),ls(_,_),=(_,_)]).

As given, $T is treated as lighter than other terms. Were one to err (as I in fact did at one point early in the study) and place $T at the righthand end of the list, then the cited equalities (for blocking the use of Moufang 3) would not apply.

The second pitfall concerns the use of back demodulation, a procedure that is so often of value in increasing the effectiveness of a program. To instruct the program to back demodulate clauses stored in its database—which means to rewrite by simplifying and canonicalizing each, if possible, with every newly adjoined demodulator—one uses the following command.

set(back_demod).

In the presence of this command, when a newly retained clause is made into a demodulator, the program attempts to apply it to all clauses currently retained. The key to the possible pitfall is contained in the modifier "all", for, indeed, even existing demodulators can be back demodulated. Therefore, if misfortune occurs, the two input demodulators (used to block the retention of Moufang 3) can be back demodulated. As a matter of history, in some of the experiments preceding those reported in this article, such occurred, necessitating the commenting out of the command, thus avoiding the use of back demodulation.

The final pitfall concerns the choice of demodulating inside out or outside in. The default in OTTER is inside out, correctly suggesting that, in the majority of cases, that is the preferred choice. However, such demodulation can cause the program to present a proof in which, say, Moufang 3 is implicitly present. Indeed, when a sequence of demodulators is being applied inside out, one of the intermediate clauses might in fact be the clause equivalent of an unwelcome equation, the equation whose presence prevents the proof from being pure. At that point, it may well happen that neither of the two cited equalities (used to block, for example, Moufang 3) gets applied, because of the inside-out path. The appropriate action to take is to include the following command.

set(demod_out_in).
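Gathering the recommendations of this section in one place, a cautious configuration for blocking Moufang 3 by demodulation might look like the following; this is a sketch consistent with the commands just discussed, not a verbatim excerpt from the input files used in the experiments.

set(demod_out_in).     % outside-in demodulation, so a blocked identity cannot
                       % slip through as an intermediate rewriting step
% set(back_demod).     % deliberately left out: back demodulation could rewrite
                       % the two blocking demodulators themselves
list(demodulators).
% Blocking use of Moufang 3, in both orientations.
EQ((x * (y * (x * z)) = ((x * y) * x) * z), $T).
EQ((((x * y) * x) * z = x * (y * (x * z))), $T).
end_of_list.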
4  Theorems and Experiments
Two possible circles of pure proofs exist for the (first) three equivalent Moufang identities; the fourth Moufang identity comes into play in Section 8. Circle 1 (for the first three identities), if it exists, consists of three proofs: a proof that Moufang 1 implies Moufang 2, a proof that is pure with respect to Moufang 3; 2 implies 3, pure with respect to 1; and 3 implies 1, pure with respect to 2. In other words, for Circle 1, the identities are ordered 1, 2, 3, 1. The only other possible order is 1, 3, 2, 1, and corresponding to that order is Circle 2. Therefore, to explore the possible existence of either of the two circles, there exist six possibly true theorems to attack. For each of the six purported theorems—taking into account the discussion of Section 3 and my continued interest in the use of strategy—a sequence of five experiments are in order. For each of the six purported theorems, the first exper iment was designed to determine whether any proof can be found by OTTER, ignoring purity. This program offers a wide variety of options and strategies. However, at least currently, no algorithm or metarules exist for making effective choices; experience and intuition are generally the basis for option choosing. (In [23], Chapter 8 does offer some biased guidelines in that regard.) The fact that a theorem has already been proved by some means (typically by a researcher) is, at this time (in 1995), no guarantee that a proof of it is within range of an automated reasoning program. Since for all six theorems OTTER produced a proof, the other four experi ments (in the sequence of five) each focused on finding a pure proof. The second experiment (conducted mainly out of curiosity) focused on the use of weighting to block the unwanted Moufang identity. Again, in all six cases, the second ex periment produced a proof, but a proof not guaranteed to be pure. Therefore, as discussed in Section 3, to guarantee that proofs produced are pure, weighting was replaced by demodulation in the third experiment. The third experiment succeeded in all six cases. Actually, as already discussed, a caveat must be is sued: In particular, if the proof is bidirectional, then the lack of purity may be hidden within that part of the proof that proceeds backward from the assumed falseness of the theorem. In such an event, either some hand computation is needed, or some not-so-standard use of OTTER is required. In either situation, misfortune is possible in that purity may be absent from the bidirectional proof. In all six cases, a proof was produced by the third experiment, and therefore, for the fourth and fifth experiments, efficiency became an equal concern. The fourth experiment explored the use of the hot list strategy [20, 22, 23] to see what effect its use would have on efficiency and on proof length. Because the fourth experiment succeeded (in all six cases), the fifth experiment explored a heavier use of the hot list strategy. The fifth experiment in all cases succeeded. However, as it turned out, in four of the six cases (in the context of Experiment 5), an attempt at a (strictly) forward proof failed, necessitating action to (so to speak) convert a bidirectional proof into a forward proof; see Section 7 for
the actions that were taken. (Incidentally, I chose not to be concerned about converting bidirectional proofs for any of Experiments 1 through 4 because purity was proved for all six theorems in the context of Experiment 5, after the converting procedure was applied to the four cases where it was needed; see Section 7 for the procedure.)
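As a concrete illustration of how little the input changes across Experiments 3, 4, and 5, the fragment below sketches the additions for the case in which Moufang 3 is the hypothesis. The hot list contents follow the description given later in Section 5, and the heat assignment shown is an assumption of mine, since the article does not state the exact value used.

% Experiment 4: add a one-element hot list containing the hypothesis.
assign(heat,1).        % heat value assumed for illustration
list(hot).
x * (y * (x * z)) = ((x * y) * x) * z.    % Moufang 3, the hypothesis
end_of_list.

% Experiment 5: the loop axioms (all but reflexivity) join the hypothesis
% in list(hot), as described in Section 5.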
5  Successes and Options
To gain some understanding and some insight into the options that were used and the significance of the changes in them within a sequence of experiments, some background information is in order. In addition to providing a clearer picture of how the problem focusing on the (first) three Moufang identities was solved, this background information is intended to aid researchers in the use of OTTER in totally unrelated areas, to show how decisions were made, and to (in part) satisfy the frequent request for giving guidance for making choices from among the numerous options offered by McCune's program.

Although (as Ken Kunen did in some successful experimentation [7]) one can include cancellation laws among the axioms, my preference is for a representation in which no nonunit clauses (or their equivalent) are present. This preference is consistent with my goal of demonstrating that the use of an automated reasoning program often requires little guidance in the context of adding various lemmas. Because equality is the relation, paramodulation is the choice, a choice that (because of efficiency considerations) asks for the inclusion of few or no nonunit clauses in the input. (Whether a Knuth-Bendix approach, which relies on paramodulation, is effective is left to future research, possibly by others.) Also different from Kunen, no input demodulators were included with the purpose of using them for canonicalization, for I had no conjecture regarding which would be useful. Instead, OTTER was instructed to set(process_input), a command that makes some or all input clauses into demodulators. To permit new demodulators to be adjoined during the run, the command set(dynamic_demod) was included. The preceding two commands were included because of the conviction that efficiency would be severely decreased without the presence of demodulators. As expected from the discussion of Section 3, set(demod_out_in) was the command, to instruct demodulation to be from the outside in (rather than inside out).

As those familiar with my views expect, I emphasized the use of strategy: Three were used, part or all of the time. With the command assign(pick_given_ratio,3), McCune's ratio strategy [10, 17, 23] was used, chosen because numerous experiments in various areas strongly suggest that the likelihood of success is increased when some of the reasoning is driven by focusing on complex clauses retained early in the attack. With this command, OTTER was instructed to choose (for the focus of attention to direct the program's reasoning) three clauses by weight (symbol complexity, in this case), one by first come first serve
(or breadth first), then three, then one, and the like. The ratio strategy blends (subject to the assignment of the pick-givenj-atio parameter) a bit of level sat uration with (usually more) choosing of clauses for the focus of attention by weight. More important to the research was the use of the hot list strategy, chosen to cause the program to immediately consider the clauses in the hot list with each newly retained clause. Use of the hot list strategy typically requires the researcher to choose from among the input statements that present the question or problem those that are conjectured to merit immediate visiting (as hypothe ses) and, if the value assigned to the heat parameter is greater than 1, even immediate revisiting. The chosen statements are placed in the (input) hot list, and they are used one at a time or in combination (depending on the infer ence rule being applied) to complete inference rule applications, rather than to initiate applications. Members of the hot list are not subject to any restriction regarding unit and nonunit clauses. Potentially powerful choices for members of the hot list include the elements of the special hypothesis and elements of the initial set of support. (The special hypothesis of a theorem of the form if P then Q refers to that part of P, if such exists, that excludes the underlying axioms and lemmas. For example, in the study of the theory that asserts that rings in which the cube of x = x are commutative, the special hypothesis consists of the equation xxx = x.) Also, when an inference rule in use relies on the use of some clause as a nucleus (as in hyperresolution), then placing a copy of such a clause in the hot list is often profitable. Further, if a needed nucleus is absent from the hot list, then (with the rarest exceptions) the program cannot use the hot list strategy for that case, for the strategy requires that, other than the ini tiating clause, all remaining clauses used in an application of the corresponding inference rule be present in the hot list. The exceptions occur when a nucleus is used to initiate an application of an inference rule, which is indeed rare in the experiments with which I am familiar. When the program decides to retain a new conclusion, before another con clusion is chosen from list(sos) as the focus of attention to drive the program's reasoning, the hot list strategy causes the new conclusion to initiate applica tions of the inference rules in use, with the remaining hypotheses (or parents) that are needed all chosen from the hot list. Members of the hot list, whether input or adjoined during a run, are not subject to back demodulation or back subsumption; in other words, clauses in the hot list are never rewritten with the discovery of new demodulators (rewrite rules), nor are they purged because of being captured by a newly retained clause. If one wishes the program during the run to adjoin new members to the hot list, one uses the dynamic hot list strategy [20, 22] (which McCune formulated as an extension of the hot list strategy); see Section 6.3. If that is the choice, then one assigns an integer, positive or negative, to the dynamicJieat.weight parameter with the intention that conclusions that are retained and that have a pick-given weight (complexity) less than or equal to the assigned value will be
adjoined to the hot list. (A clause, or its equivalent, technically has two weights, its pick-given weight which is used in the context of choosing clauses as the focus of attention to drive the program's reasoning, and its purge-gen weight which is used in the context of clause discarding; often the two weights are the same.) The dynamic_heat_weight assignment places an upper bound on the pick-given weight of clauses that can be adjoined to the hot list during the run. When the heat parameter is assigned the value 1, conclusions that result from consulting the hot list (whether one is using the hot list strategy or the dynamic hot list strategy) have heat level 1. (By definition, input clauses have level 0, and a deduced clause has a level one greater than that of its parents.) If the program decides to retain a conclusion of heat level 1 and if the (input) heat parameter is assigned the value 2, then before another conclusion is chosen as the focus of attention, the heat-level-1 conclusion is used to initiate the search for conclusions of heat level 2 (with the hypotheses needed to complete the cor responding application of the inference rule chosen from the hot list). The heat parameter can be assigned to whatever positive integer the researcher chooses, with the objective of placing corresponding emphasis on the immediate use of the members of the hot list. If the parameter is assigned the value 0, then the hot list strategy will not be used. Finally, as expected if one is familiar with my research, the set of support strategy [15, 17] was used, contributing significantly to program efficiency (see Section 6.1). (This strategy was chosen and used in the manner discussed here to prevent the program from exploring the entire theory of loops.) Because attempts at completing forward proofs (each in a single run) failed the majority of the time, the denial of the conclusion to be proved was placed in list(sos), the initial set of support. The only other member of that list was the hypothesis. For example, when attacking the theorem that asserts the deducibility of Moufang 1 from Moufang 3, list(sos) contained two clauses, the equivalent of Moufang 3 and the equivalent of the negation of Moufang 1. (For the researcher wishing some information immediately regarding placing all clauses in the initial set of support, I can report with satisfaction that, other than in a few cases, the results for all five experiments, for each of the six theorems attacked, required less CPU time when the two-element set of support was used; see Section 6.1 for more detail.) Regarding option changes from experiment to experiment, the minimal changes were made in order to gain the most meaningful information in the context of effectiveness. For example, the only differences between the third and the fifth experiment when attacking the theorem that asserts that Moufang 2 implies Moufang 3, with the constraint that Moufang 1 not be used (explicitly or im plicitly), was the addition of a hot list (for the fifth experiment) and the setting of the heat parameter. For a second example, three differences existed between the fourth experiment focusing on proving that Moufang 1 implies Moufang 3 with Moufang 2 not participating at all and the fourth experiment focusing on proving that Moufang 2 implies Moufang 1 with Moufang 3 not participating
at all. First, regarding list(sos) (the initial set of support), Moufang 1 and the negation of Moufang 3 are replaced by Moufang 2 and the negation of Moufang 1. Second, regarding list(demod), which is the input list of demodulators, two equalities corresponding to Moufang 2 are replaced by two corresponding to Moufang 3; the choice of input demodulators is dictated by the choice of which identity to avoid being present, implicitly or explicitly, in a completed proof. Perhaps unexpected, two equalities are included, for the program must be in a position to cope with the deduction of the unwanted identity with the lefthand and righthand arguments interchanged. The third difference rests with the contents of list(hot), the initial hot list. When Moufang 1 is the hypothesis, a copy of it appears in list(hot) in addition to appearing in list(sos). When Moufang 2 is the hypothesis, then a copy of it appears in list(hot) as well as in list(sos). The choice of which clauses to place in the initial hot list is a reflection of the recommendation to place key clauses in list(hot), in the two cited cases the clause that corresponds to the hypothesis of the theorem under attack.

Regarding the five experiments taken as a sequence, for a given implication, say Moufang 3 implies Moufang 2, many options were shared. Among them, max_weight was assigned the value 20, pick_given_ratio was assigned the value 3, max_proofs was assigned the value 2, and max_mem was assigned the value 20000 (20 megabytes). Also in common were set(dynamic_demod) (to adjoin new demodulators during the run), set(demod_out_in) (for outside-inside demodulation), set(process_input) (to make some or all input clauses into demodulators), and set(para_into) and set(para_from) (both for the inference rule paramodulation). With respect to the latter two set commands, OTTER is instructed, for each clause chosen as the focus of attention to drive its reasoning, to paramodulate from that clause and also to paramodulate into that clause with all of the clauses available in list(usable). Because paramodulation's performance deteriorates markedly when a nonunit clause is involved, in all five experiments (for each of the six theorems under attack), the commands set(para_into_units_only) and set(para_from_units_only) are included. (As it turned out, the preceding two commands to cope with a possible need played no role, for nonunit clauses were never included.) Other than clauses corresponding to a Moufang identity or the negation of one, all input clauses were placed in list(usable), and were not placed in the initial set of support.

For each of the six theorems, the first experiment in the sequence of five focused merely on provability. Indeed, no actions were taken to block the presence in a proof of the unwanted Moufang identity. Despite the lack of a guarantee when using weighting to block an unwanted equation or formula (as discussed in Section 3), Experiment 2 extended Experiment 1 by including a single weight template in weight_list(purge_gen), an inclusion of the following type.

weight_list(purge_gen).
% Blocking use of Moufang 3.
weight((x * (y * (x * z)) = ((x * y) * x) * z),1000).
end_of_list.

In the third through the fifth experiment (for each theorem of the six), to guarantee (with one proviso) purity, weighting was replaced by demodulation, in the following manner.

list(demodulators).
% Blocking use of Moufang 3.
EQ((x * (y * (x * z)) = ((x * y) * x) * z), $T).
EQ((((x * y) * x) * z = x * (y * (x * z))), $T).
end_of_list.

As for the proviso, forward proofs were often difficult to complete, necessitating in many cases a bidirectional search. The example given in Section 3.1 (in terms of P, Q, and R) correctly suggests that proofs containing steps resulting from reasoning backward from a negative clause can "hide" the presence of an unwanted equation or formula. As (in effect) promised earlier, I show in Section 7 how OTTER was used to replace the so-called backward portion of the proof with a forward portion, maintaining the restriction of blocking the unwanted identity.

The fourth and fifth experiments in each of the six sequences extended the third experiment by adding the use of the hot list strategy. In the fourth experiment, the (input) hot list contained a single element. For example, when Moufang 3 was the hypothesis, then a copy of it was placed in list(hot). In the fifth experiment also included in list(hot) were the axioms of a loop, those clauses (excluding that for reflexivity) that were also present in list(usable).

5.1. THE FIRST CIRCLE
At this point, comparisons are in order. For the first circle (focusing on the first three identities), three proofs are of interest: a proof that Moufang 1 implies Moufang 2, pure with respect to Moufang 3; a proof that Moufang 2 implies Moufang 3, pure with respect to Moufang 1; and a proof that Moufang 3 im plies Moufang 1, pure with respect to Moufang 2. As commented in Section 1, (from what I know) the third proof was absent in prior studies of Moufang loops. Instead, the proof that Moufang 3 implies Moufang 1 relied on the use of Moufang 2. The first theorem for discussion asserts that Moufang 1 implies Moufang 2. However, purity is not relevant to the first of the sequence of five experiments, for that experiment is concerned merely with completing any proof. (All ex periments in this article were conducted on a SPARCstation-10. For brevity, although the CPU times that are cited are indeed approximate figures, each is given without the reminder of "approximately".) In 125 CPU-seconds, a forward proof was produced of length 73 (paramodulation steps) and level 20, complet ing with retention of clause (1876). (By definition, input clauses have level 0,
and a deduced clause has a level one greater than that of its parents; demodula tors do not count among parents of a deduced clause.) The use of demodulators in deducing a clause affects neither proof length nor proof level. The second and third experiments (of the sequence of five) produced the same proof with similar statistics. In the fourth experiment (in which the hot list strategy was introduced), some ground was gained, and some was lost. A shorter proof (based solely on reasoning forward) was found, one of length 58 and level 19, completing with the retention of clause (2707). However, the CPU time required to find the proof increased to 589 CPU-seconds. For the curious, quite often more CPU time is required to complete a shorter proof. The fifth experiment was most satisfying. A length 28 and level 14 proof was found, completing with retention of clause (611). In addition to the sharp reduction in proof length, less than 4 CPU-seconds were required. The only drawback at all was that the proof contains one deduced step produced by reasoning backward from the denial of the theorem. Two experiments sufficed to replace that step with a forward reasoning step. Of less interest, in 30 CPUseconds (in the same run) OTTER did complete a forward proof of length 82 and level 23, with retention of clause (1478). With the cited step replacement, each of the five experiments in fact yielded a pure proof. The cited successes with the study of Moufang 1 implies Moufang 2 im mediately called for the study of Theorem 2, Moufang 2 implies Moufang 3. The first experiment—testing the difficulty of obtaining any proof, regardless of purity—produced a proof in 771 CPU-seconds with retention of clause (2783). The proof has length 55 and level 16. Except for one step, the proof is for ward; a subsequent experiment produced the desired replacement step to yield a forward proof. The second experiment essentially imitated the first. The third differed somewhat, producing a proof in 866 CPU-seconds with retention of clause (2778). The proof has length 52, level 16, and, except for one step, is a forward proof. A single experiment provided the needed replacement step to yield a forward proof. Even the fourth experiment was similar, yielding a 57step proof of level 14, completing with retention of clause (2013). To obtain the proof, 591 CPU-seconds were required. Again, one additional experiment sufficed to replace one step to produce a forward proof. The breakthrough in the context of CPU time, as in the study of the first theorem of Circle 1, occurred with the fifth experiment. Indeed, a 50-step proof of level 17 was produced in 19 CPU-seconds with retention of clause (1294). The proof contains one step resulting from reasoning backward from the negation of the theorem, a step that was easily replaced with one experiment. It turned out that, again, (with the cited step replacements) the proofs yielded by the five experiments focusing on the second theorem of Circle 1 are pure (with respect to Moufang 1). To complete the study of Circle 1, the third theorem became the focus, the theorem asserting the deducibility of Moufang 1 from Moufang 3. Of course,
purity with respect to Moufang 2 was key. In 555 CPU-seconds, the first experiment yielded a 47-step proof of level 16 with retention of clause (2359). The proof contains two steps of so-called backward reasoning. The experiment to replace the two steps with forward reasoning succeeded, but, rather amusing, only one replacement step was needed. (On the other hand, as shown by example in Section 7, sometimes the replacement of one step of backward reasoning can require the use of two steps of forward reasoning.) The second and third experiments essentially mirrored the first. The fourth experiment produced a 55-step proof of level 16 in 1910 CPU-seconds with retention of clause (2491). As in the first experiment, the single experiment designed to replace two backward-reasoning steps with forward-reasoning steps found one step that was sufficient. Again, the fifth experiment was the most satisfying, but not as dramatically as with the first and second theorems. In 103 CPU-seconds, OTTER produced a 68-step proof of level 22, completing with retention of clause (1860). As was becoming the custom, one experiment produced a forward-reasoning step to replace the single step resulting from reasoning backward. With the cited step replacements, all five proofs are pure.
5.2. THE SECOND CIRCLE
Three theorems, 4, 5, and 6, are the focus for Circle 2. Theorem 4 asserts that Moufang 2 implies Moufang 1, of course, with the additional objective of finding a proof pure with respect to Moufang 3. Theorem 5 has Moufang 3 for its hypothesis and Moufang 2 as its conclusion, and purity is with respect to Moufang 1. Finally, Theorem 6 has Moufang 1 as its hypothesis and Moufang 3 as its conclusion, and purity is with respect to Moufang 2. Theorem 4 was proved in the first experiment in 193 CPU-seconds with retention of clause (1980). The proof has length 24 and level 12. A single experiment provided a forward-reasoning step to replace the one step that resulted from reasoning backward. The second and third experiments were essentially copies of the first. The fourth yielded a 37-step proof of level 12 in 202 CPU-seconds with retention of clause (1562). A single experiment provided a forward-reasoning step to replace the one step that resulted from reasoning backward. The fifth experiment was minutely pleasing, for 51 CPU-seconds sufficed to produce a proof, one of length 62 and level 19, completing with retention of clause (1640). A single experiment provided a forward-reasoning step to replace the one step that resulted from reasoning backward. With the cited step replacements, the proofs yielded by the five experiments focusing on the first theorem (Theorem 4) of Circle 2 are pure. Theorem 5 proved to be more difficult than did Theorem 4, requiring in the first experiment 573 CPU-seconds and the retention of clause (2351). The resulting 47-step proof of level 16 is a forward proof. The second and third experiments essentially mirrored the first. The fourth required 752 CPU-seconds and the retention of clause (2030) to complete a forward proof of length 66 and
level 17. The fifth experiment produced the desired proof in 205 CPU-seconds with retention of clause (2091), a proof of length 60 and level 22 that is a forward proof. All five proofs are pure and are also forward proofs. The study of Theorem 6 was more rewarding than was the study of either Theorem 4 or Theorem 5, as one sees from the following data. The first experiment produced a proof of length 59 and level 19 in 41 CPU-seconds with retention of clause (1622). One experiment sufficed to produce a step to convert the 59-step bidirectional proof to a forward proof. The second and third experiments essentially matched the first. The fourth experiment proved to be a singular failure, requiring 667 CPU-seconds and the retention of clause (2740) to complete a 93-step proof of level 25. The proof contains one step resulting from reasoning backward, easily replaced to produce a forward proof by conducting a single experiment. The fifth experiment, however, was indeed satisfying. Less than 3 CPU-seconds were required to produce a 30-step proof of level 15, and only 614 clauses were retained before the proof was completed. In addition, the proof offered a piquant property: To replace the only step resulting from backward reasoning, to produce a forward proof, required two forward-reasoning steps. With the cited step replacements, the proofs yielded by the five experiments focusing on the third theorem (Theorem 6) of Circle 2 are pure.
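The fifth experiment of each sequence relies on an expanded hot list, as noted above. As a rough illustration only (the exact clause set varies with the theorem under study, and the fragment below is a sketch rather than a file taken verbatim from these runs), the setup has the flavor of the following, here assuming Moufang 1 as the hypothesis.
% Sketch only: the hot list holds the special hypothesis and the loop
% axioms, and assign(heat,1) limits the recursion through the hot list.
assign(heat,1).
list(hot).
(x * y) * (z * x) = (x * (y * z)) * x.   % hypothesis: Moufang 1
1 * x = x.
x * 1 = x.
x * rs(x,y) = y.
rs(x, x * y) = y.
ls(x,y) * y = x.
ls(x * y, y) = x.
end_of_list.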
6 Insight Regarding the Use of Strategy
To complement the preceding material of this article, this section is devoted to a more detailed discussion of various strategies. Program performance can indeed be sharply enhanced by a judicious use of strategy. Even further, my more than thirty years of use of an automated reasoning program convinces me that relying heavily on strategy is essential when attacking deep questions and hard problems. The evidence is drawn, as expected, from experimentation.
6.1. THE SET OF SUPPORT STRATEGY
In the thirty experiments reported here, a natural question concerns the role of the set of support strategy, a strategy that restricts a program's reasoning to drawing conclusions that are recursively traceable to the input clauses placed in list(sos). In particular, what is the effect on program performance if this strategy is not used? Much of the answer is contained in a comparison of the results obtained from conducting experiments in pairs, each pair consisting of one experiment with and one without the set of support strategy. To conduct an experiment without the set of support strategy, (with OTTER) one places all clauses in list(sos). (As a practical matter, the clause for reflexivity of equality, x = x, can be placed in the input usable list, for almost never is the program permitted to apply paramodulation into or from a variable; such permission
would, with few exceptions, drown a program in drawn conclusions, for unification would always succeed.) In keeping with an accurate account of history, I note that I conducted thirty experiments without using the set of support strategy before conducting the thirty with its use, the latter being the basis of this article. To have done so is not in keeping with my usual approach of heavily relying on the set of support strategy, as anyone familiar with my research can attest. The explanation (for my omission of the use of the set of support strategy) rests either with inadequate attention to detail or with a preoccupation with the problem under study. Among the favorable comparisons, Theorem 2 provides relevant evidence, the theorem that asserts the deducibility of Moufang 3 from Moufang 2. In the first experiment (in which no attempt was made to block the use of Moufang 1), without the set of support strategy, a 52-step proof of level 18 was completed, requiring 1897 CPU-seconds. On the other hand, with the set of support strategy, a 55-step proof of level 16 was completed, only requiring 771 CPU-seconds. Why the shorter proof without the strategy, yet requiring roughly 2.5 times as much CPU time? First, in the unsupported proof, the first three steps are yielded by applying paramodulation to combinations of various axioms for a loop (see Section 2 for the definition), and none of the parents of any of these steps is Moufang 2 or the negation of Moufang 3. Nevertheless, amusing enough, two of the three steps are found in the proof obtained by using the set of support strategy with list(sos) containing only Moufang 2 and the negation of Moufang 3. When the axioms of a theory are placed in list(sos), OTTER is given permission to explore the underlying theory, seeking lemmas and such. To ensure that the members of list(sos) are considered before other clauses for driving the program's reasoning, the following command is included.
set(input_sos_first).
The second factor regarding the difference in CPU time and the difference in proof length rests with the fact that, by placing Moufang 2 and the negation of Moufang 3 in list(sos) and the axioms in list(usable), the program is prevented from exploring the basic theory. Although the result will usually be the inaccessibility to various lemmas, in general far fewer paths of inquiry are traversed. Indeed, without the use of the set of support strategy, to complete the proof, perhaps as many as 303,551 clauses were generated of which 1575 were retained. When the strategy was used, the proof was completed after generating perhaps as many as 289,428 clauses of which 1563 were retained. (Precision regarding how many clauses were generated is lacking, for the run was not terminated immediately after finding the first proof in either case.) More significant, to obtain a proof without the aid of the strategy, 363 clauses were chosen as the focus of attention to drive the program's reasoning. In contrast, with the strategy, 340 clauses were chosen. For a second example, still focusing on Theorem 2 (of the first circle), the fourth experiment also shows the value of using the set of support strategy.
Without the strategy, 1811 CPU-seconds were required to obtain a proof; in contrast, with the strategy, 591 CPU-seconds were required. As noted, the fourth experiment of each sequence of five relies on the use of the hot list strategy, where (the initial) list(hot) contains the hypothesis of the theorem and no other clauses. A third example is provided by staying with Theorem 2 but switching to the fifth experiment, that in which the (initial) hot list was expanded to include the axioms of a loop. Without the set of support strategy, 171 CPU-seconds were required to produce the desired proof; with the strategy, only 19 CPU-seconds were required. For additional evidence, the focus switches to Theorem 5, the theorem asserting the deducibility of Moufang 2 from Moufang 3. This theorem is in an obvious sense the converse of Theorem 2, the theorem that was just under scrutiny. In the context of the first of the five experiments, without the set of support strategy, 1615 CPU-seconds were required to obtain a proof; with the strategy, 573 CPU-seconds were required. Perhaps unexpected in view of the decrease in CPU time, 2311 clauses were retained in order to complete a proof without the strategy, in contrast to the requirement of retaining 2351 with it. The additional CPU time is mainly accounted for by the need to choose as the focus of attention 362 clauses when the strategy was not used, in contrast to choosing 331 clauses as the focus of attention when the strategy was used. With the strategy, OTTER generated roughly 250,000 clauses and retained 1344 to produce a proof. In contrast, without the set of support strategy, to complete a proof, the program generated roughly 277,000 clauses of which 1309 were retained. An example of proof lengthening by using the set of support strategy is found in the context of Theorem 2. Theorem 5 provides an example of proof shortening: In the context of Experiment 1, without the strategy, the first completed proof has length 54 and level 17; with the strategy, the first completed proof has length 47 and level 16. One might find it piquant to note that the shorter proof was obtained without the use of the three lemmas used as the first three deduced steps in the longer proof, lemmas obtained by applying paramodulation to pairs of axioms for a loop. In other words, the use of three lemmas from the theory of loops (ignoring the property of being Moufang) did not aid OTTER in finding a shorter proof. The fourth and, to some extent, the fifth experiments (focusing on Theorem 5) also provide evidence of the value (in the context of efficiency) of using the set of support strategy. In the fourth experiment, without the strategy, 2420 CPU-seconds were required to produce a proof; with its use, 752 CPU-seconds were required. The first of the two cited proofs has length 34 and level 11; despite the reduction in the required CPU time, the second has length 66 and level 17. In the fifth experiment (in which an extensive hot list was used), 256 CPU-seconds were required to produce a proof without the aid of the set of support strategy; with its aid, 205 CPU-seconds were required. Where the first of the two cited
proofs has length 35 and level 16, the second has length 60 and level 22. The story would hardly be complete without an example of an increase in CPU time when the set of support strategy is used. For that example, Theorem 1 suffices, in the context of the fourth experiment. As a reminder, the theorem asserts the deducibility of Moufang 2 from Moufang 1. The experiment's objective is to find a proof that is pure with respect to Moufang 3, using the hot list strategy with but one element in list(hot), namely, Moufang 1. Without the set of support strategy, 364 CPU-seconds is enough to yield a 68-step proof of level 17. However, although this strategy is considered by many researchers in automated reasoning to be the most powerful strategy for restricting a program's reasoning, 589 CPU-seconds were required to complete a proof with the use of the set of support strategy. The proof has length 58 and level 19. The cited data supports my usual recommendation, the following.
Recommendation. In the vast majority of cases, the power of a program's attack is significantly increased if one uses the set of support strategy. Without additional knowledge, ordinarily, the most effective choice for the initial set of support is the union of the special hypothesis and the negation (or denial) of the conclusion. In the presence of a set S of axioms, the special hypothesis of a theorem of the form P implies Q is P. For example, in ring theory, the additional hypothesis that the cube of x is x (for every x) is sufficient to prove commutativity. If the given recommendation is followed regarding which input clauses are to be placed in list(sos), then the elements of the (initial) set of support are the clauses corresponding to xxx = x and the assumed falseness of commutativity.
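As an illustration only (the exact clauses depend on how ring theory is axiomatized; the fragment below is a sketch under one standard clausification, not a file used in this study), the recommendation for the cube-of-x problem would be realized along the following lines, with the axioms in list(usable) and only the special hypothesis and the denial in list(sos).
% Sketch of the recommended initial set of support for xxx = x.
list(usable).
x = x.
x + y = y + x.
(x + y) + z = x + (y + z).
x + 0 = x.
x + (-x) = 0.
(x * y) * z = x * (y * z).
x * (y + z) = (x * y) + (x * z).
(y + z) * x = (y * x) + (z * x).
end_of_list.
list(sos).
(x * x) * x = x.        % special hypothesis
a * b != b * a.         % denial of commutativity
end_of_list.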
6.2. THE RESONANCE STRATEGY
In addition to the set of support strategy and the hot list strategy, OTTER offers other strategies that can sharply increase program performance, for example, the resonance strategy [21]. The objective of the resonance strategy is to enable the researcher to suggest equations or formulas expressed as unit clauses, called resonators, each of whose symbol pattern (all of whose variables are considered indistinguishable) is conjectured to merit special attention for directing a program's reasoning. To each resonator, one assigns a value reflecting its relative significance; the smaller the value, the greater the significance. My notion in formulating this strategy was that the steps of a known proof of one theorem might indeed be effective if used to direct an automated reasoning program in its search for a proof of a related theorem. Resonators are not lemmas and do not have true or false values. Instead, any equation or formula that matches an included resonator is given the value of the matching resonator and is, therefore, given a corresponding preference for being chosen as the focus of attention to direct the program's reasoning. A clause matches a resonator if and only if, when all variables in both are treated as the same variable, the two are identical. After completing the research on which this article is mainly based, it seemed
natural to experiment briefly with the resonance strategy to see what effect its use would have, leaving a more thorough investigation either to a future article or to other researchers. My notion was to choose one proof from among the proofs produced by the thirty experiments featured in this article, and use the (positive) deduced steps of that proof as resonators. Equations or formulas containing logical not or containing logical or are never used as resonators. What criteria would be reasonable for choosing the key proof? Although, other than aesthetics, no persuasive justification can be given, three criteria taken together appealed to me. First, narrow the choice to one from among those six experiments in which a substantial hot list was used, the fifth of each sequence of five for the six theorems. Intuitively, if the use of such a hot list could sharply reduce the CPU time required to obtain a proof—which it often did, as reported here—then perhaps the proofs thus obtained offered some special property. Second, narrow the list further by focusing on experiments that yielded a proof in less than 30 CPU-seconds. More intuition: perhaps such a degree of effectiveness would be inherited. Third, from the narrowed list, choose the experiment that reduced the required CPU time the most. Applying the criteria selects the fifth experiment focusing on Theorem 2, that in which Moufang 2 as hypothesis is used to deduce Moufang 3 while blocking the participation of Moufang 1. From the first proof yielded by the fifth experiment concerned with Theorem 2, forty-nine resonators were chosen, the positive deduced steps. Each was assigned a pick-given weight of 2 (to give any matching clause that is retained a high priority for being chosen as the focus of attention), and a weight_list(pick_and_purge) was used. The same set of support was used. In each of the following experiments, the only change from its original form was the addition of the use of the resonance strategy (as described). The third criterion for choosing the resonators to be used suggested which of the thirty experiments to revisit first, namely, the first experiment focusing on Theorem 2. From the viewpoint of CPU time, the target was 771 CPU-seconds, the time required to obtain a proof on the first visit to the experiment under consideration. How satisfying: With the addition of the resonance strategy, only 12 CPU-seconds (approximately) were required. Where the first visit completed a proof upon retention of clause (2783), the second visit completed a proof upon retention of clause (895). As for the explanation for the dramatic reduction in CPU time, the first visit required choosing 340 clauses as the focus of attention to drive the reasoning; in contrast, the second visit required but 63. When the proof was completed in the first visit, 29,000 clauses had been generated of which 1563 were retained. In contrast, in the revisiting, 13,000 were generated of which 519 were retained upon completion of the proof. Closer inspection of the two runs revealed additional riches. Where the proof obtained in the first visit has length 55 and level 16, the proof obtained in the second visit has length 40 and level 12. As one can see from the data, the use of 40 clauses (for the proof) out of 519 is more impressive than using 55 out of
1563. Of the 40 deduced steps in the shorter proof, 20 were not present in the longer proof. Immediately, one might wonder about the direct role of the resonators. In particular, how many of the 40 steps are among the 49 resonators? The answer is 31. In that resonators are treated as if their variables are indistinguishable, the natural question to answer next concerns how many of the 40 deduced steps match (in the sense discussed earlier in the context of the resonance strategy) a resonator. That question can easily be answered by taking the 49 resonators and making all of their variables the variable x, then applying the same actions to the 40 deduced steps, and finally making a comparison. Noting that 31 of the 40 steps are among the resonators, I fully expected that almost all of the 40 steps would match a resonator (in the sense used here). I was wrong: Indeed, the number increased merely by 1, from 31 to 32. Among other conclusions, one could envision having a strong preference for using the hot list strategy, which, therefore, dictated running Experiment 5 for Theorem 2 first before running Experiment 1, and then (out of curiosity, at least) using the corresponding resonators to run Experiment 1. Unfortunately, such a sequence of experiments (when compared with the sequence presented in this article) would have hidden the value of the hot list strategy. Still, all in all, the cited results hint at the power offered by combining the set of support strategy, the hot list strategy, and the resonance strategy. The impressive (to me) reduction in CPU time virtually demanded some additional experimentation. Of the other twenty-nine experiments that might be revisited, clearly the most charming and provocative focused on Theorem 5, for the following reason. Theorem 5 is the converse of Theorem 2, interchanging the hypothesis and conclusion of the latter. Therefore, to expect any significant decrease in CPU time to obtain a proof of Theorem 5 by using resonators from a proof of Theorem 2 is indeed counterintuitive. To provide the simplest comparison, the obvious choice from among the sequence of five experiments to revisit was the first of the five. Where in the first visiting 573 CPU-seconds were required to obtain a proof of Theorem 5, in the revisiting (with the use of the 49 resonators taken from a proof of Theorem 2) 238 CPU-seconds were required. In the first of the two runs, the completed proof has length 47 and level 16; in the second, the proof has length 33 and level 15. Of the 33 deduced steps, 6 are not present in the 47-step proof. Regarding the direct role of the resonance strategy, 18 of the 33 steps are not identical to one of the 49 resonators. In the context of resonator matching, 15 steps do not match a resonator, which (when compared with the preceding experiment in which 8 do not match) in part explains why the decrease in required CPU time was less dramatic. From among the remaining twenty-eight experiments meriting revisiting, I chose but three for testing the power of the resonance strategy, keying on the same forty-nine resonators; the other twenty-five must wait for future research by me or by another researcher. The first of the three focused on Theorem 4,
Experiment 1. (My emphasis on the first of the sequence of five experiments throughout this subsection is explained by the conjecture that the most interesting results would be found in that context.) The first visit to that experiment yielded a 24-step proof of level 12 in 193 CPU-seconds, with retention of clause (1980). The revisiting (with the resonance strategy) produced a proof of length 32 and level 12 in 14 CPU-seconds, with retention of clause (917). Of the 32 deduced steps, 19 are not found among the deduced steps of the proof produced in the first visiting. Only 6 of the 32 are not identical to one of the 49 resonators, and only 5 do not match one of the resonators (in the sense used in the context of the resonance strategy in which all variables are treated as indistinguishable). Again one finds a rather impressive decrease in CPU time in the context of the first visit and the (second) revisiting, and also finds that a large fraction of the deduced steps of the proof produced by the revisiting match one of the included resonators. One could hardly call such correlations merely coincidental. The second of the three experiments focused on Theorem 1, (in effect) the converse of Theorem 4. Again, the first of the sequence of five experiments was the target. In 125 CPU-seconds, a 73-step proof of level 20 was produced, with retention of clause (1876). With the resonance strategy, only 6 CPU-seconds were required, completing a proof of length 28 and level 12, with retention of clause (817). Of the 28 deduced steps, 11 are not present in the much longer proof. Regarding the role of the resonance strategy, 10 of the 28 steps are not identical to any of the 49 resonators, and 6 do not match any of the resonators. For the final experiment (of the three relying on the 49 resonators), I chose a so-to-speak self-referential experiment. Specifically, what would occur in the contexts of CPU time and proof length when the 49 resonators taken from the proof of Theorem 2, Experiment 5, are used to influence the search for a proof of that same theorem when revisiting the same experiment? The answers are that the CPU time increased from 19 CPU-seconds to 25 CPU-seconds, but the proof length decreased from 50 to 41; the level decreased from 17 to 15 also. In the revisiting, the proof completed upon retention of clause (1834), after 56 clauses had been chosen as the focus of attention to drive the reasoning. In the first visit, the proof completed upon retention of clause (1294), after 77 clauses had been chosen as the focus of attention. Where did the extra CPU time go, in view of finding a shorter proof and focusing on fewer clauses to drive the program's reasoning? The additional 6 CPU-seconds is almost equally divided between the weighing of clauses—which is to be expected in view of the inclusion of 49 weight templates as opposed to none—and the use of the hot list strategy, which I cannot explain. The preceding data strongly supports the thesis that, through the use of the resonance strategy, deduced steps of the proof of one theorem can be of substantial value in seeking the proof of a related theorem. However, as the following data shows—and as one would almost certainly predict—the choice of resonators sometimes aids the program not at all, or even interferes with its performance. When, for example, one takes as resonators the positive steps of
the first proof found for Theorem 1 in Experiment 5, of which there are 27, and uses them to seek a proof of Theorem 2, Experiment 1, the performance of the program is harmed. Indeed, without the resonators, OTTER completes a proof in 771 CPU-seconds; with the 27 resonators, the CPU time increases to 1404 CPU-seconds. Also, the proof length increases, from 55 to 64, and the level increases, from 16 to 18. But the situation often is confusing and often lacks consistency, as the following shows. When the same 27 resonators are used to seek a proof of Theorem 1, Experiment 1, the CPU time decreases to 4 CPU-seconds from 125, the proof length decreases to 33 from 73, and the level decreases to 15 from 20.
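For readers who wish to reproduce experiments of this kind, the mechanics are simple; the fragment below is a sketch only (the actual forty-nine resonators are the positive deduced steps of the proof in question, of which two representatives are shown), with each resonator given a pick-given weight of 2 in a weight_list(pick_and_purge).
% Sketch of the resonance-strategy setup; the templates shown are
% representative resonators, not the full set of forty-nine.
weight_list(pick_and_purge).
weight(((x * y) * x = x * (y * x)), 2).
weight((rs(rs(x,1),y) = x * y), 2).
end_of_list.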
6.3. THE DYNAMIC HOT LIST STRATEGY
Closely related to the hot list strategy, which plays a key role in obtaining the results featured in this article, is the dynamic hot list strategy. Although I did not use the latter strategy, its use might indeed prove valuable in a study of this type and might provide an interesting research topic. McCune, recognizing the value of enabling a program to adjoin elements to the hot list during a run, extended the hot list strategy to the dynamic hot list strategy. The same heat parameter governs the degree to which the hot list is revisited when a new clause is retained. On the other hand, unlike the hot list strategy, a second input parameter affects the dynamic hot list strategy, namely, the dynamic_heat_weight. The dynamic_heat_weight assignment places an upper bound on the pick-given weight of clauses that can be adjoined to the hot list during the run. For example, one might wish the program to add to the hot list during the run any retained clause whose pick-given weight (whether determined by a weight template or by symbol count) is less than or equal to 4. In that event, one includes the following command.
assign(dynamic_heat_weight,4).
A far more intriguing and intricate example concerns the case in which the researcher wishes the program to adjoin to the hot list during the run certain clauses if they are deduced and retained. As an illustration, one might wish the hot list to contain the clause equivalent of the property of commutativity of product if the corresponding equation is deduced and retained. The following actions suffice. First, one begins by relying on a strategy that combines the power of the resonance strategy with that of the hot list strategy (in its dynamic incarnation); see [20]. Second, one assigns, say, the value 2 to the dynamic_heat_weight (by using the just-cited assign command with 4 replaced by 2). Finally, one includes the following.
weight_list(pick_and_purge).
weight(EQ(prod(x,y),prod(y,x)),2).
end_of_list.
Of course, one must assign to the max_weight a value greater than or equal to 2 to permit the program to retain the clause encoding of commutativity of product, if that clause is deduced. If the cited equality clause is deduced, it will be assigned a weight of 2, of course assuming that no earlier weight template applies and that the clause is not demodulated. The clause will be retained if subsumption does not purge it. Because of having a weight of 2, which is also the value assigned to the dynamic_heat_weight parameter, the clause will be adjoined to the hot list during the run—as desired.
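Gathering the pieces just described into one place, a fragment of roughly the following kind (a sketch, not a file taken from this study) would have OTTER adjoin the commutativity clause to the hot list as soon as it is deduced and kept.
% Sketch only: combine the dynamic hot list strategy with a weight
% template so that a deduced commutativity clause joins the hot list.
assign(heat,1).
assign(dynamic_heat_weight,2).
assign(max_weight,20).          % must be at least 2 so the clause can be retained
weight_list(pick_and_purge).
weight(EQ(prod(x,y),prod(y,x)),2).
end_of_list.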
6.4. AN ALTERNATIVE STRATEGIC APPROACH
For studying Moufang loops and other areas in which equality plays a major role, one other strategy and one other approach merit mention. The strategy is the tail strategy [23], a strategy that gives preference to equations whose second argument is short. To instruct OTTER to use the tail strategy, one includes the following command.
weight(EQ($(1),$(2)),1).
The command can also be expressed in the following manner.
weight(($(1) = $(2)),1).
I found this strategy, formulated by McCune when he was studying equivalential calculus [16], useful in attacking problems from Robbins algebra and other areas. Regarding a sharply different alternative than used in this article, by including the following command, one can rely on a Knuth-Bendix approach.
set(knuth_bendix).
By including the cited command, the program is instructed to continually attempt to make all positive equalities into demodulators (rewrite rules) and, more generally, to rely (as much as possible) on seeking a complete set of reductions. Studies focusing on various aspects of group theory, for example, have profited from using Knuth-Bendix; see [9] and [18]. Unfortunately, too often, the use of this approach consumes an inordinate amount of CPU time in demodulation. Therefore, a most attractive (to me) but difficult research problem to attack concerns some compromise between full reliance on Knuth-Bendix and full reliance on paramodulation (as shown in the included input file).
7 Proof Anomalies and Proof Conversion
The oddities and anomalies that one finds in proofs can contain clues to means for increasing program performance. Three anomalies are cited here without any suggestion regarding their significance. First, the replacement of a step resulting from reasoning backward by one or more steps resulting from reasoning
forward often can require the use of paramodulation rather than the use of demodulation. Indeed, sometimes the backward reasoning step benefits by focusing on an equation that is free of variables, thus permitting demodulation to be used. When that part of the reasoning is replaced by reasoning forward from the appropriate equations, often the program must focus on equations in which variables abound, thus requiring paramodulation, for two-way unification can be required. Before turning to the second anomaly, an example of the preceding phenomenon is in order, an example showing how a single step of backward reasoning can require as a replacement two steps of forward reasoning. The example is taken from the replacement of steps resulting from reasoning backward by steps resulting from reasoning forward in the context of Experiment 5 focusing on the deducibility of Moufang 3 from Moufang 1 (Theorem 6). Perhaps a thorough study of the example can show how to proceed in obtaining forward proofs in a single run, rather than obtaining bidirectional or backward proofs that then are transformed into forward proofs. The first set of clauses is taken from that part of the proof where reasoning backward occurs.
-----> UNIT CONFLICT at 2.97 sec -----> 615 [binary,614.1,10.1] $ANS(m3).
10 [] x = x.
27 [copy,26,flip.1] ((a*b)*a)*c != a* (b* (a*c)) | $ANS(m3).
29,28 [para_into,24.1.1.1.2,19.1.1,demod,22] (x*y)*x = x* (y*x).
331,330 (heat=1) [para_into,316.1.1.2,4.1.1,flip.1] rs(rs(x,1),y) = x*y.
585,584 (heat=1) [para_into,573.1.1.1.2.1,6.1.1,demod,271,396,331,526,129] (x* (y*x))*z = x* (y* (x*z)).
614 [para_from,330.1.2,27.1.2.2.2,demod,29,585,331] a* (b* (a*c)) != a* (b* (a*c)) | $ANS(m3).
To replace the cited negative equality, clause (614), the following two-step proof was used (based on forward reasoning).
-----> UNIT CONFLICT at 0.17 sec -----> 25 [binary,23.1,7.1] $ANS(m3).
Length of proof is 2. Level of proof is 2.
PROOF
2 [] (x*y)*x = x* (y*x).
3 [] (x* (y*x))*z = x* (y* (x*z)).
5 [] rs(rs(x,1),y) = x*y.
7 [] ((a*b)*a)*c != a* (b* (a*c)) | $ANS(m3).
17,16 [para_into,5.1.2,2.1.2] rs(rs(x,1),y*x) = (x*y)*x.
23 [para_from,5.1.2,3.1.1.1,demod,17] ((x*y)*x)*z = x* (y* (x*z)).
Regarding a possible template for the choice and precise placement of the various clauses OTTER uses to produce the needed steps of forward reasoning, see the example given at the close of this section, just after the discussion of the third
anomaly. However, note that one may be forced to modify the template, for example, by commenting out dynamic_demod or by adding clauses to the input list of demodulators or by taking some other (not necessarily small) action. Regarding the second anomaly, when one closely examines a proof in which backward reasoning occurs, where the goal is to discover the obstacle to completing a proof in which only forward reasoning occurs, the following can be unearthed. The forward step merely requires the paramodulation from a clause A into a clause B to yield a clause C, which, if nothing interferes, would complete a proof by contradiction by providing unit conflict with the negation of the conclusion or with the negation of the conclusion with its two arguments interchanged. In such an event, one might theorize that the forward proof could be obtained merely by placing in list(passive) both the negation of the conclusion and its flip, with the two arguments interchanged. When I tried an experiment of the type under discussion, the sought-after forward proof was not produced. Instead, I found that indeed paramodulation was applied from A into B, but interference took place. Specifically, before unit conflict was detected, demodulation was applied to C, where the demodulators are a form of associativity (whose general and well-known form does not hold in Moufang loops), followed by demodulation with B. To my surprise, the result of the cited demodulation was a clause that was an instance of reflexivity of equality and was thus subsumed by x = x, which was present in list(usable). For the third anomaly, which was discovered directly as a result of the just-cited experiment, one might experience something like bewilderment by trying the following. One selects an experiment reported in this article or one selects an experiment from one's own research and revisits it with one small modification, the removal of the clause x = x for reflexivity. I was motivated to do so by the possibility that a forward proof would be discovered in the absence of reflexivity and its power for subsumption. My choice was to revisit one of the experiments featured in Section 6.2, the experiment in which forty-nine resonators were used in the study of proving Moufang 3 from Moufang 2, Experiment 5. Rather than rediscovering the 39-step proof that was found in the experiment, OTTER found a shorter proof, one of length 35. I do find such oddities piquant, disturbing, and, most of all, counterintuitive. At this point, I provide an example (as promised) of how steps resulting from reasoning backward from the negation of the conclusion can be replaced by steps whose nature is forward reasoning. The focus is on the proof of Theorem 2, Experiment 5. That proof contains one step resulting from reasoning backward from the flip of the negation of the theorem. The flip of an equality is produced by OTTER by simply interchanging the lefthand and righthand arguments. The following three clauses are present in which != occurs, the third being a deduced clause.
25 [] a* (b* (a*c)) != ((a*b)*a)*c | $ANS(m3).
26 [copy,25,flip.1] ((a*b)*a)*c != a* (b* (a*c)) | $ANS(m3).
138 [para_into,26.1.1.1.1,57.1.2,demod,58,28] (a* (b*a))*c != a* (b* (a*c)) | $ANS(m3).
Clause (138) is used to complete the proof when considered with the following clause, clause (1294), to obtain unit conflict.
1294 (heat=1) [para_from,1281.1.1,3.1.1.1,demod,1087] (x* (y*x))*z = x* (y* (x*z)).
The object is to replace clause (138) with a clause resulting from reasoning forward, a clause that gives unit conflict when considered with the negation of Moufang 3 or with its flip, clauses (25) and (26), respectively. The first step of the procedure is intended to prevent reasoning backward. Therefore, clauses (25) and (26) are placed in list(passive). Clauses in that list do not participate in the reasoning; they are used only to determine proof completion (by detecting unit conflict) and for forward subsumption. The next move is designed to have the paramodulation involve clause (1294) rather than clause (138); clause (138) must not be derived. With clause (26) in this converting procedure in list(passive) and from the facts that clause (57) was paramodulated into clause (26) and the result was demodulated with clause (58) and clause (28), two actions are taken. Clause (1294) is placed as the only clause in list(sos), and (the following) clauses (27) and (57) are placed in list(usable), along with the clause for reflexivity.
58,57 (heat=1) [para_into,29.1.1,9.1.1,demod,20,22,flip.1] ls(x,1)*y = x*y.
28,27 [para_into,23.1.1.1.1,19.1.1,demod,20] (x*y)*x = x* (y*x).
A pair of numbers, such as 58,57, says that clause (58) is a copy of clause (57), the former used as a demodulator, and the latter used for paramodulation (in this case). Of course, all of the history is stripped from the clauses when they are placed in the various input lists. Also, as is obvious, no need exists for placing clauses (57) and (58) both in list(usable), for they are identical; a similar remark holds for clauses (28) and (27). What is not obvious is that none of the cited clauses is placed in list(demodulators), the reason being that I conjectured that demodulation with these clauses would not be needed. (Sometimes, to find the needed forward steps, clauses may need to be inserted into the input demodulator list.) As for options, with a few exceptions, one can use those of the experiment that produced the proof being converted to a forward proof. One can drop the use of the hot list strategy, and one is advised to avoid the use of set(process_input), thus preventing any of the input clauses from being demodulated. What may surprise the researcher, as it did me, is the fact that clause (27) and clause (1294) were enough to produce (by applying paramodulation) the desired clause to unit conflict with clause (26).
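To summarize the conversion procedure in one place, an input arrangement of roughly the following form (a sketch assembled from the clauses cited above, not a verbatim file from the study) suffices for this particular replacement.
% Sketch of the backward-to-forward conversion run for Theorem 2, Experiment 5.
list(sos).
(x * (y * x)) * z = x * (y * (x * z)).               % clause (1294), history stripped
end_of_list.
list(usable).
x = x.                                               % reflexivity, for subsumption only
ls(x,1) * y = x * y.                                 % clause (57)
(x * y) * x = x * (y * x).                           % clause (27)
end_of_list.
list(passive).
a * (b * (a * c)) != ((a * b) * a) * c | $ANS(m3).   % negation of Moufang 3
((a * b) * a) * c != a * (b * (a * c)) | $ANS(m3).   % its flip
end_of_list.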
8 Extending Purity to the Fourth Moufang Identity
To test the possible value and scope of a new methodology, a profitable approach is to find a problem that was not in the set of problems that precipitated the study that led to formulating the methodology. This section features such a problem, brought to my attention by Ken Kunen. In particular, he notes that focusing on three equivalent Moufang identities is a historical accident, for a fourth identity [6] (the following expressed in a notation acceptable to OTTER) is of equal interest; the identity is the mirror image of Moufang 1.
% Axiom, Moufang 4:
x * ((y * z) * x) = (x * y) * (z * x).
When Kunen brought this identity to my attention, he also suggested that, rather than seeking circles of pure proofs, one might search for a set of pure proofs that, if found, shows that all possible orderings of the four Moufang identities admit a corresponding circle of pure proofs. Twelve proofs are needed. One could begin by seeking a proof that Moufang 1 implies Moufang 2 such that the proof is pure with respect both to Moufang 3 and to Moufang 4, in contrast to requiring (as was the case earlier in this article) purity with respect to Moufang 3 alone. Almost immediately upon learning of this fourth identity, I applied the methodology used in this article to searching for the desired twelve proofs. Here I give the results and briefly detail some of the highlights. Rather than a sequence of experiments, my choice was to attempt to go to the heart of the matter, in particular, by focusing on the fifth experiment. After all, the approach on which the fifth experiment is based had proved the most powerful, when compared with the other four elements of the sequence of five. My plan was simply to take the corresponding input file for attacking, say, Moufang 1 implies Moufang 2 and merely add demodulators to block the use of Moufang 4. For a second illustration, for the sought-after proof of Moufang 1 from Moufang 4, the plan called for taking the input file (for Experiment 5) that includes, say, Moufang 3 as hypothesis and the negation of Moufang 1 (as the denial of the sought-after conclusion) and replacing Moufang 3 with Moufang 4 and adding two demodulators for blocking the use of Moufang 3. In most respects, the plan worked beautifully. Indeed, ten of the twelve proofs were yielded quite readily. However, the proof that Moufang 2 implies Moufang 1 that is pure with respect to both Moufang 3 and Moufang 4 and the proof that Moufang 3 implies Moufang 1 that is pure with respect to both Moufang 2 and Moufang 4 gave trouble. The culprits had to be the two demodulators that were added to block the use (in a proof) of Moufang 4, for no other change was made from the corresponding two experiments that had succeeded (as discussed earlier in this article). One of the culprits in fact caused OTTER to deduce -$T, where $T denotes true; hence, the empty clause was deduced, where it was not wanted.
In order to circumvent this unfortunate occurrence, $T was replaced throughout the input file by the constant junk, and a weight_list was included containing one template, the following.
weight(junk,1000).
The idea is to demodulate any unwanted clause to the constant junk (rather than to $T), and then purge any clause containing junk. Therefore, weight_list(purge_gen) was used. A smaller value than 1000 would have sufficed, as long as the value exceeded the value assigned to the max_weight. The modification succeeded, and the final two proofs (of the desired twelve) were completed. In point of fact, many of the runs yielded two proofs of the sought-after result. Measured in CPU time, the most difficult to obtain required approximately 1084 CPU-seconds (on a SPARCstation-10). The theorem was that of proving Moufang 1 from Moufang 2, and it was the second proof that required the time. That proof was sought because its use made easier the task of converting the backward-reasoning fraction of a bidirectional proof into a forward proof; see Section 7 for the approach. Its length is 48, and its level is 17. Of the set of proofs that were produced, the longest consists of 83 deduced steps; its level is 23; and the theorem is that of deducing Moufang 2 from Moufang 1, the second proof. On the other hand, some proofs were produced almost immediately, having length 2 and level 2. Summarizing, the research featured in this article suggests that use of the hot list strategy is most effective for this type of problem. Also, the use of demodulation to prevent the participation of unwanted equations succeeds, although in a few cases its use is supplemented with the use of weighting. Finally, motivated by Kunen's suggestion, I applied the methodology for seeking pure proofs to the ensemble of the four Moufang identities, not only yielding their equivalence, but also yielding the desired twelve pure proofs. As a corollary, all orderings of the four Moufang identities admit a circle of pure proofs, for example, the ordering 1, 3, 2, 4, 1 and the ordering 1, 4, 2, 3, 1. To put totally to rest any notion that, with enough effort, CPU time, and ingenuity, pure proofs can always be found regardless of the ordering, a brief review of the study of the thirteen shortest single axioms for equivalential calculus suffices. Were the parallel situation to hold for equivalential calculus, then one could select any two of the thirteen shortest single axioms and prove that the first implies the second with a proof that is pure with respect to the other eleven. However, quite the opposite is true. Specifically, although one can of course select any of the thirteen and prove any of the remaining twelve—for each by itself is a complete axiom system—for many pairs of formulas, purity of proof is often unobtainable, regardless of the effort, CPU time, and ingenuity involved. For a striking example of an impenetrable barrier, note that the first application of condensed detachment (the inference rule often used to study this area of logic) to the axiom known as UM with itself yields that known as XGF. Therefore, with UM as hypothesis, any sequence of deduced steps must contain
XGF as the first step, in turn implying that only one pure proof (with respect to the other shortest single axioms) is possible when condensed detachment is the inference rule in use, namely, that of XGF. Because of the use of demodulation (which captures instances of demodulators), and because of (where needed) application of the technique of Section 7 for replacing steps of backward reasoning with steps of forward reasoning, all twelve proofs are pure even with respect to instances of the undesired equations, and none of the proofs contains hidden use of an unwanted equation; see the syntactic example given in Section 3.1 regarding a backward proof. In other words, among other properties, none of the proofs relies upon as an intermediate step (resulting from demodulation) the use of an equation to be avoided, which is where the outside-in demodulation comes into play. The presence of instance purity for the proofs, although not an objective, adds nicely to the result. As a final note for the curious, in the context of quasigroups rather than loops, the four Moufang identities are provably equivalent even when the two axioms regarding the identity element 1 are absent. With small modifications, the methodology presented here found a circle of pure proofs for the four identities, with the ordering 1, 2, 4, 3. The proofs range in length from 32 to 391 deduced steps. Their difficulty, measured in CPU time, ranges from 6 CPU-seconds to 27,175 CPU-seconds; the latter time is for the theorem that asserts the deducibility of Moufang 1 from Moufang 3. Of note is the use of the resonance strategy to obtain the cited proof requiring 27,175 CPU-seconds. The resonators are those proof steps that do not mention the identity element 1 and that are from the proof that Moufang 3 implies Moufang 1 when the two axioms for the identity element 1 are present. Without the resonance strategy, even after more than 71,000 CPU-seconds, no proof was obtained. These two experiments taken together give additional evidence of the value of using for resonators proof steps from a related theorem. Quite a challenge for automated reasoning is offered by the theorem that asserts the deducibility of Moufang 3 from Moufang 2, where the two axioms for the identity 1 are absent, and where purity with respect to Moufang 1 and Moufang 4 is required; indeed, I failed to find such a proof. A similar situation is presented when Moufang 3 is the hypothesis and Moufang 2 is the conclusion.
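Returning to the blocking technique used throughout this section, the combination amounts to a pair of demodulators that rewrite any occurrence of an unwanted identity to the constant junk, together with a purge_gen template. The fragment below is a sketch only (here blocking Moufang 4), not a verbatim file from the study, and the exact demodulator forms used may differ.
% Sketch only: block any use of Moufang 4 by rewriting it to junk,
% then purge every clause whose generation weight reaches 1000.
list(demodulators).
(x * ((y * z) * x) = (x * y) * (z * x)) = junk.
((x * y) * (z * x) = x * ((y * z) * x)) = junk.
end_of_list.
weight_list(purge_gen).
weight(junk,1000).
end_of_list.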
9 Review, Conclusions, and Future Research
Featured in this article is the problem of finding, if such exist, three pure proofs that together prove the equivalence of the (first) three identities known as the Moufang identities. A proof that Moufang 1 implies Moufang 2 is pure with respect to Moufang 3 if and only if Moufang 3 is not present in the proof, explicitly or implicitly. Although Bruck supplied proofs of the equivalence of the (first) three identities [2], an open question remained, one that (in effect) is concerned with
the following flaw in the proofs. The equivalence was established by showing that Moufang 1 implies Moufang 2 and that 2 implies 3; and here is the flaw: rather than showing directly that 3 implies 1, two proofs were presented, so to speak, showing respectively that 3 implies 2 and that 2 implies 1. Aesthetically, in mathematics and in logic, when asked to establish the equivalence of a set of properties or definitions, one prefers a circle of pure proofs (as formally defined in Section 2). To answer the implied open question (focusing on the first three Moufang identities), one must order the three identities and then supply three proofs that, for the ordering, are each pure. For example, the ordering might be 1, 3, 2, which would then ask for a proof that 1 implies 3 that is pure with respect to 2, a proof that 3 implies 2 that is pure with respect to 1, and a proof that 2 implies 1 that is pure with respect to 3. This article shows that both orderings admit a circle of pure proofs, namely, 1, 2, 3, and 1, 3, 2. Perhaps equally important, the method for obtaining the required proofs with substantial assistance from McCune's program OTTER is detailed. Access to the details enables one to verify the results independently and to extend the studies on which this article is based. In addition, this information provides some insight for researchers who might wish to use OTTER for attacking questions totally unrelated to those featured here. To further that possibility, the Appendix offers a sample input file and four proofs (for the circle of pure proofs for the ordering 1, 2, 3, 4 of all four Moufang identities); see Section 8. In Section 8, the method is successfully applied to the study of a fourth Moufang identity [6] (brought to my attention by Ken Kunen) in the context of obtaining twelve pure proofs, the proofs needed to establish that all orderings admit a circle of pure proofs. The method, through the use of demodulation outside-in, produces a bonus: The proofs are free of the use of instances of the unwanted equations, even in the context of the intermediate steps resulting from demodulation. Nor do the proofs contain the hidden use of an unwanted equation (as discussed in Section 3.1 where the focus is on proofs featuring reasoning backward from the denial of the theorem under study). That aspect was verified by finding steps resulting from forward reasoning to replace any that resulted from reasoning backward. I thus continue the practice of acquainting researchers with the use of OTTER and its options, showing how and why to make various choices. The three strategies that played the key role are the hot list strategy (see Section 5), the set of support strategy (see Section 5 and Section 6.1), and McCune's ratio strategy (see Section 5). The first of these three strategies derives its power from rearranging the order in which conclusions are drawn, enabling the program to draw conclusions far sooner than it would otherwise. Indeed, use of the hot list strategy permits an automated reasoning program to "look ahead". The second derives its power from restricting the reasoning, thus avoiding numerous paths of inquiry. The third strategy derives its power from combining a search based on conclusion complexity with one based on breadth first. Throughout this article, data is sprinkled regarding CPU time, clause
retention, proof length, and the like. As one quickly discovers, anomalies occur among the data. Research that culminates in explaining the anomalies might be quite useful in the fuller context of using an automated reasoning program to solve problems in mathematics and in logic. Such research might also shed needed light on the topic of option selection. Indeed, research that succeeds in producing some effective metarules for option choosing would clearly be significant. Among the features of OTTER covered here, features that can be put to good use in a variety of contexts, are techniques for blocking the use of unwanted equations or formulas and diverse strategies for increasing the effectiveness of an automated reasoning program. Of the two techniques for preventing the participation of an unwanted equation or formula, weighting and demodulation, the latter can be used to guarantee one's objective, where the former (though useful) admits loopholes. The picture and subtleties are discussed in some detail in Section 3. In the context of future research, Section 6.2 focuses on the resonance strategy, a strategy that encourages the user of the program to supply symbol patterns (in the input) that can sharply influence the preference given to retained conclusions. The preference or priority assigned to retained conclusions directs the program in its choice of where next to focus its attention. Also of interest for future research is the dynamic hot list strategy (see Section 6.3), formulated by McCune as an extension of the hot list strategy. This strategy enables an automated reasoning program to adjoin new clauses to the hot list during the run. Although the question central to this article was answered by finding two circles of pure proofs (for the first three Moufang identities), many of the proofs lacked one aesthetic property. In particular, many of the proofs (obtained in the primary runs) contained a step resulting from reasoning backward from the negation of the conclusion to be proved, and, occasionally, two such steps were present. Therefore, to satisfy the aesthetic constraint of having access to proofs all of whose steps result from forward reasoning, additional runs were needed; see Section 7. An interesting and challenging research topic focuses on finding some means to obtain in a single run the desired forward proof, for each of the six theorems that were featured in this article. Also meriting research is the study of "short" proofs and the study of "elegant" proofs; the two properties are often tightly coupled, but not necessarily. The strategies discussed in this article have proved useful for research of the type just cited. The data presented here support various conclusions. First, quite likely, the question central to this article would still be open were it not for McCune's program OTTER. Second, of a more general character, the data given throughout this article provides substantial evidence of the power of the hot list strategy, of the power of the set of support strategy, and of the virtual requirement of using strategy when attacking deep questions and hard problems with such a program. Indeed, a glance at the CPU times with and without the strategies
is most encouraging regarding the value of strategy. No longer is memory the obstacle to effective research with the assistance of a reasoning program; CPU time is the dominant concern.
Appendix
To aid and stimulate research, I present here a sample input file (for use with McCune's program OTTER) and the four proofs of which the first circle consists, for the ordering 1, 2, 3, 4 of the four Moufang identities. The following sample input file provides a beginning. When a line contains a "%", the characters from the first "%" to the end of the line are treated by the program as a comment. In the proofs, two copies of an input clause denote its presence in two input lists, one of which is the hot list. Also, as an example, [para_into,24.1.1.1.2,19.1.1,demod,22] says that clause (24) is the into clause, clause (19) is the from clause, and clause (22) is used as a demodulator. The notation 24.1.1.1.2 says that the term of interest is the second subargument of the first subargument of the first argument of the first literal. When OTTER lists in a proof a pair of numbers, such as 28,27, to designate the clause number of a retained clause, the clause is used both as a demodulator and as a parent in the application of some inference rule; the higher number designates the use as a demodulator.
Sample Input File

op(400,xfx,*).                % make all association explicit
% set(ur_res).
% The following two commands prevent paramodulation involving a
% nonunit clause.
set(para_into_units_only).
set(para_from_units_only).
set(para_into).               % Paramodulate into the chosen clause.
set(para_from).               % Paramodulate from the chosen clause.
set(order_eq).                % Flip arguments if righthand heavier than
                              % lefthand.
set(dynamic_demod).           % Adjoin demodulators during the run.
% set(back_demod).            % Apply new demodulators to retained clauses.
set(lrpo).                    % Activate the LRPO ordering for orienting
                              % equalities and deciding dynamic demodulators.
set(demod_out_in).            % Demodulate from outside to inside.
set(process_input).           % Treat input clauses as if they were
                              % generated by applying an inference rule.
clear(print_kept).            % Do not enter in the output file clauses
                              % as they are retained.
lex([$T,a,b,c,d,e,1,_*_,rs(_,_),ls(_,_),=(_,_)]).  % Order the
                              % symbols for demodulation.
assign(max_weight,20).        % Limit the complexity of retained clauses.
assign(max_mem,20000).        % Limit the memory use to 20 megabytes.
assign(pick_given_ratio,3).   % Three clauses are chosen by
                              % weight and one by breadth first, repeatedly.
assign(max_proofs,2).         % Limit the number of completed proofs
                              % to 2.
assign(report,90).            % Every 90 CPU seconds, write a statistical
                              % summary into the output file.
assign(heat,1).               % Limits the recursion through the hot list.

% The following list can be used to purge unwanted equations
% of various types.
% weight_list(purge_gen).
% Blocking use of Moufang 1.
% weight(((x * y) * (z * x) = (x * (y * z)) * x),1000).
% Blocking use of Moufang 2.
% weight((((x * y) * z) * y = x * (y * (z * y))),1000).
% Blocking use of Moufang 3.
% weight((x * (y * (x * z)) = ((x * y) * x) * z),1000).
% end_of_list.
% Used to complete applications of inference rules.
list(usable).
x = x.
x * rs(x,y) = y.       % right solvable
rs(x, x * y) = y.      % right solution is unique
                       % (implies left cancellation)
ls(x,y) * y = x.       % left solvable
ls(x * y, y) = x.      % left solution is unique
                       % (implies right cancellation)
% identity:
1 * x = x.
x * 1 = x.
% left cancellation
% x*y != u | x*z != u | y = z.
% right cancellation
% y*x != u | z*x != u | y = z.
end_of_list.
% Used to initiate applications of inference rules.
list(sos).
% A consequence of left and right surjective:
% It's all that is needed here.
% x * R(x) = 1.
% L(x) * x = 1.
% The following negate left and right inverse.
% d * y != 1.
% y * e != 1.
% actually, L and R turn out to be the same in a Moufang loop
% Axiom, Moufang 1:
% (x * y) * (z * x) = (x * (y * z)) * x.
% Axiom, Moufang 2:
((x * y) * z) * y = x * (y * (z * y)).
% Axiom, Moufang 3:
% x * (y * (x * z)) = ((x * y) * x) * z.
% Axiom, Moufang 4:
% x * ((y * z) * x) = (x * y) * (z * x).
% Negation Axiom, Moufang 1:
% ((a * b) * (c * a) != (a * (b * c)) * a) | $ANS(m1).
% Negation Axiom, Moufang 2:
% ((a * b) * c) * b != a * (b * (c * b)) | $ANS(m2).
% Negation Axiom, Moufang 3:
a * (b * (a * c)) != ((a * b) * a) * c | $ANS(m3).
% Negation Axiom, Moufang 4:
% a * ((b * c) * a) != (a * b) * (c * a) | $ANS(m4).
end_of_list.

% Used mainly to detect proof completion and to monitor progress.
% list(passive).
% ((a * b) * (c * a) != (a * (b * c)) * a) | $ANS(m1).
% ((a * b) * c) * b != a * (b * (c * b)) | $ANS(m2).
% a * (b * (a * c)) != ((a * b) * a) * c | $ANS(m3).
% a * ((b * c) * a) != (a * b) * (c * a) | $ANS(m4).
% end_of_list.
% The following list can be used to purge unwanted equations
% of various types.
list(demodulators).
% Blocking use of Moufang 1.
EQ(((x * y) * (z * x) = (x * (y * z)) * x), $T).
EQ(((x * (y * z)) * x = ((x * y) * (z * x))), $T).
% Blocking use of Moufang 2.
% EQ((((x * y) * z) * y = x * (y * (z * y))), $T).
% EQ(((x * (y * (z * y))) = (((x * y) * z) * y)), $T).
% Blocking use of Moufang 3.
% EQ((x * (y * (x * z)) = ((x * y) * x) * z), $T).
% EQ((((x * y) * x) * z = x * (y * (x * z))), $T).
% Blocking use of Moufang 4.
EQ((x * ((y * z) * x) = (x * y) * (z * x)), $T).
EQ(((x * y) * (z * x) = x * ((y * z) * x)), $T).
end_of_list.

% Used for the hot list strategy.
list(hot).
% Axiom, Moufang 1:
% (x * y) * (z * x) = (x * (y * z)) * x.
% Axiom, Moufang 2:
((x * y) * z) * y = x * (y * (z * y)).
% Axiom, Moufang 3:
% x * (y * (x * z)) = ((x * y) * x) * z.
% Axiom, Moufang 4:
% x * ((y * z) * x) = (x * y) * (z * x).
x * rs(x,y) = y.       % right solvable
rs(x, x * y) = y.      % right solution is unique (implies left cancellation)
ls(x,y) * y = x.       % left solvable
ls(x * y, y) = x.      % left solution is unique (implies right cancellation)
% identity:
1 * x = x.
x * 1 = x.
end_of_list.
Four Proofs in Order for the First Circle for the Four Moufang Identities
Moufang 1 implies Moufang 2

Otter 3.0.4, August 1995
The job was started by wos on merlin.mcs.anl.gov, Thu Aug 31 21:36:46 1995
The command was "otter304".

> UNIT CONFLICT at 3.96 sec > 616 [binary,615.1,12.1] $ANS(m2).

Length of proof is 29. Level of proof is 14.

PROOF

6 [] x*rs(x,y) = y.
7 [] rs(x,x*y) = y.
8 [] ls(x,y)*y = x.
9 [] ls(x*y,y) = x.
10 [] 1*x = x.
11 [] x*1 = x.
12 [] x = x.
13 [] x*rs(x,y) = y.
16,15 [] rs(x,x*y) = y.
17 [] ls(x,y)*y = x.
19 [] ls(x*y,y) = x.
22,21 [] 1*x = x.
24,23 [] x*1 = x.
25 [] (x*y)* (z*x) = (x* (y*z))*x.
26 [copy,25,flip.1] (x* (y*z))*x = (x*y)* (z*x).
28 [] ((a*b)*c)*b != a* (b* (c*b)) | $ANS(m2).
30,29 [para_into,26.1.1.1.2,21.1.1,demod,24] (x*y)*x = x* (y*x).
31 [para_into,26.1.1.1.2,17.1.1,demod,30,flip.1] (x*ls(y,z))* (z*x) = x* (y*x).
33 [para_into,26.1.1.1.2,13.1.1,demod,30,flip.1] (x*y)* (rs(y,z)*x) = x* (z*x).
37 [para_into,26.1.2.1,13.1.1,demod,30] x* ((rs(x,y)*z)*x) = y* (z*x).
43 (heat=1) [para_into,29.1.1.1,6.1.1,flip.1] x* (rs(x,y)*x) = y*x.
47,46 (heat=1) [para_from,29.1.1,9.1.1.1] ls(x* (y*x),x) = x*y.
70,69 (heat=1) [para_from,33.1.1,7.1.1.2] rs(x*y,x* (z*x)) = rs(y,z)*x.
113 [para_from,26.1.1,15.1.1.2] rs(x* (y*z),(x*y)* (z*x)) = x.
126,125 (heat=1) [para_into,113.1.1.2.1,11.1.1,demod,70,22] rs(x,x)*y = y.
136,135 [para_into,125.1.1,23.1.1] rs(x,x) = 1.
176 [para_from,43.1.1,15.1.1.2] rs(x,y*x) = rs(x,y)*x.
204 (heat=1) [para_into,176.1.1.2,10.1.1,demod,136,flip.1] rs(x,1)*x = 1.
209 [para_from,204.1.1,26.1.2.2,demod,30,24] x* ((y*rs(x,1))*x) = x*y.
213 [para_from,204.1.1,26.1.2.1,demod,30,22] rs(x,1)* ((x*y)*rs(x,1)) = y*rs(x,1).
216,215 [para_from,204.1.1,19.1.1.1] ls(1,x) = rs(x,1).
223 (heat=1) [para_from,209.1.1,7.1.1.2,demod,16,flip.1] (x*rs(y,1))*y = x.
244 (heat=1) [para_from,213.1.2,9.1.1.1,demod,47] rs(x,1)* (x*y) = y.
246 [para_from,31.1.1,15.1.1.2,demod,70] rs(ls(x,y),x)*z = y*z.
248 (heat=1) [para_into,246.1.1,11.1.1,demod,24] rs(ls(x,y),x) = y.
258 [para_into,223.1.1.1.2,248.1.1,demod,216] (x*y)*rs(y,1) = x.
267,266 (heat=1) [para_into,258.1.1.1,8.1.1,flip.1] ls(x,y) = x*rs(y,1).
268 (heat=1) [para_into,258.1.1.1,6.1.1] x*rs(rs(y,x),1) = y.
312 [para_into,244.1.1.1,248.1.1,demod,267,22] x* (rs(x,1)*y) = y.
326 (heat=1) [para_into,312.1.1.2,6.1.1,flip.1] rs(rs(x,1),y) = x*y.
523 [para_from,268.1.1,37.1.1.2.1,flip.1] x* (rs(rs(y,rs(z,x)),1)*z) = z* (y*z).
528,527 [para_from,268.1.1,244.1.1.2,flip.1] rs(rs(x,y),1) = rs(y,1)*x.
530,529 (heat=1) [para_from,523.1.1,7.1.1.2,demod,528,528] rs(x,y* (z*y)) = ((rs(x,1)*y)*z)*y.
615 [para_from,326.1.2,28.1.2,demod,530,528,126] ((a*b)*c)*b != ((a*b)*c)*b | $ANS(m2).
616 [binary,615.1,12.1] $ANS(m2).

Moufang 2 implies Moufang 3

Otter 3.0.4, August 1995
The job was started by wos on merlin.mcs.anl.gov, Thu Aug 31 21:39:13 1995
The command was "otter304".

> UNIT CONFLICT at 19.30 sec > 1279 [binary,1277.1,140.1] $ANS(m3).

Length of proof is 51. Level of proof is 17.

PROOF

5 [] ((x*y)*z)*y = x* (y* (z*y)).
6 [] x*rs(x,y) = y.
7 [] rs(x,x*y) = y.
8 [] ls(x,y)*y = x.
9 [] ls(x*y,y) = x.
10 [] 1*x = x.
11 [] x*1 = x.
13 [] x*rs(x,y) = y.
15 [] rs(x,x*y) = y.
18,17 [] ls(x,y)*y = x.
19 [] ls(x*y,y) = x.
22,21 [] 1*x = x.
24,23 [] x*1 = x.
25 [] ((x*y)*z)*y = x* (y* (z*y)).
27 [] a* (b* (a*c)) != ((a*b)*a)*c | $ANS(m3).
28 [copy,27,flip.1] ((a*b)*a)*c != a* (b* (a*c)) | $ANS(m3).
30,29 [para_into,25.1.1.1.1,21.1.1,demod,22] (x*y)*x = x* (y*x).
31 [para_into,25.1.1.1.1,17.1.1] (x*y)*z = ls(x,z)* (z* (y*z)).
32 [para_into,25.1.1.1.1,13.1.1] (x*y)*rs(z,x) = z* (rs(z,x)* (y*rs(z,x))).
36,35 [para_into,25.1.1.1,23.1.1,demod,22] (x*y)*y = x* (y*y).
38,37 [para_into,25.1.1.1,13.1.1,flip.1] x* (y* (rs(x*y,z)*y)) = z*y.
41 (heat=1) [para_into,29.1.1.1,6.1.1,flip.1] x* (rs(x,y)*x) = y*x.
60,59 (heat=1) [para_into,31.1.1,11.1.1,demod,22,24,flip.1] ls(x,1)*y = x*y.
70 (heat=1) [para_from,31.1.2,7.1.1.2] rs(ls(x,y),(x*z)*y) = y* (z*y).
110 (heat=1) [para_from,35.1.1,7.1.1.2] rs(x*y,x* (y*y)) = y.
120 [para_from,25.1.1,19.1.1.1] ls(x* (y* (z*y)),y) = (x*y)*z.
122 [para_from,25.1.1,15.1.1.2] rs((x*y)*z,x* (y* (z*y))) = y.
129,128 (heat=1) [para_into,120.1.1.1,8.1.1,flip.1] (ls(x,y* (z*y))*y)*z = ls(x,y).
136 (heat=1) [para_into,122.1.1.2,8.1.1,demod,129] rs(ls(x,y),x) = y.
138 [para_into,59.1.1,23.1.1,demod,24] ls(x,1) = x.
140 [para_into,28.1.1.1.1,59.1.2,demod,60,30] (a* (b*a))*c != a* (b* (a*c)) | $ANS(m3).
142,141 [para_from,138.1.1,136.1.1.1] rs(x,x) = 1.
216 [para_from,41.1.1,31.1.2.2] (x*rs(y,z))*y = ls(x,y)* (z*y).
223 [para_from,41.1.1,15.1.1.2] rs(x,y*x) = rs(x,y)*x.
230 (heat=1) [para_into,216.1.2.2,10.1.1,demod,18] (x*rs(y,1))*y = x.
260 (heat=1) [para_into,223.1.1.2,10.1.1,demod,142,flip.1] rs(x,1)*x = 1.
276 [para_into,260.1.1.1,136.1.1] x*ls(1,x) = 1.
281,280 (heat=1) [para_from,276.1.1,7.1.1.2,flip.1] ls(1,x) = rs(x,1).
286 [para_from,260.1.1,25.1.1.1.1,demod,22,flip.1] rs(x,1)* (x* (y*x)) = y*x.
288 [para_from,260.1.1,15.1.1.2] rs(rs(x,1),1) = x.
299 (heat=1) [para_into,286.1.1.2.2,8.1.1,demod,18] rs(x,1)* (x*y) = y.
305 [para_from,276.1.1,31.1.2.2.2,demod,281,281,24,281,18] (x*y)*rs(y,1) = x.
308,307 (heat=1) [para_into,305.1.1.1,8.1.1,flip.1] ls(x,y) = x*rs(y,1).
309 (heat=1) [para_into,305.1.1.1,6.1.1] x*rs(rs(y,x),1) = y.
339 [para_into,37.1.2,31.1.2,demod,38,308] (x*rs(y,1))* (y* (z*y)) = (x*z)*y.
348,347 (heat=1) [para_into,339.1.1.2.2,8.1.1,demod,308,flip.1] (x* (y*rs(z,1)))*z = (x*rs(z,1))* (z*y).
361 [para_into,230.1.1.1,32.1.1,demod,30,348,22] x* ((rs(x,1)*rs(x,1))* (x*y)) = y.
366,365 [para_into,230.1.1.1,29.1.1,demod,348] (rs(x,1)*rs(x,1))* (x*y) = rs(x,1)*y.
370 (heat=1) [para_from,361.1.1,7.1.1.2,demod,366] rs(x,y) = rs(x,1)*y.
392 [para_into,299.1.1.1,288.1.1] x* (rs(x,1)*y) = y.
407 [para_into,299.1.1.2,13.1.1] rs(x,1)*y = rs(x,y).
409,408 (heat=1) [para_into,392.1.1.2,6.1.1,flip.1] rs(rs(x,1),y) = x*y.
437 (heat=1) [para_from,407.1.1,9.1.1.1,demod,308] rs(x,y)*rs(y,1) = rs(x,1).
446 [para_from,299.1.1,19.1.1.1,demod,308] x*rs(y*x,1) = rs(y,1).
463 (heat=1) [para_from,446.1.1,7.1.1.2,flip.1] rs(x*y,1) = rs(y,rs(x,1)).
536,535 [para_from,309.1.1,299.1.1.2,flip.1] rs(rs(x,y),1) = rs(y,1)*x.
613 [para_from,370.1.2,35.1.2.2,demod,36,flip.1] x*rs(y,rs(y,1)) = x* (rs(y,1)*rs(y,1)).
652,651 (heat=1) [para_into,613.1.1,8.1.1,demod,308,536,536,36,142,22,flip.1] (x* (y*y))* (rs(y,1)*rs(y,1)) = x.
963 [para_from,437.1.1,110.1.1.1,demod,409] x* (rs(x,y)* (rs(y,1)*rs(y,1))) = rs(y,1).
995,994 (heat=1) [para_from,963.1.1,7.1.1.2] rs(x,rs(y,1)) = rs(x,y)* (rs(y,1)*rs(y,1)).
1023,1022 [para_from,463.1.1,407.1.1.1,demod,995,flip.1] rs(x*y,z) = (rs(y,x)* (rs(x,1)*rs(x,1)))*z.
1063,1062 [para_from,535.1.1,407.1.1.1,flip.1] rs(rs(x,y),z) = (rs(y,1)*x)*z.
1074 [para_from,535.1.2,37.1.1.2.2.1.1,demod,1063,142,22] rs(x,1)* (y* ((rs(y,x)*z)*y)) = z*y.
1077,1076 (heat=1) [para_from,1074.1.1,7.1.1.2,demod,1063,142,22,flip.1] x* ((rs(x,y)*z)*x) = y* (z*x).
1270 [para_into,70.1.1.2.1,407.1.1,demod,308,1023,995,652,1063,142,22] (x*y)* (rs(y,z)*x) = x* (z*x).
1277 (heat=1) [para_from,1270.1.1,5.1.1.1,demod,1077] (x* (y*x))*z = x* (y* (x*z)).
1279 [binary,1277.1,140.1] $ANS(m3).

Moufang 3 implies Moufang 4

Otter 3.0.4, August 1995
The job was started by wos on merlin.mcs.anl.gov, Thu Aug 31 15:53:30 1995
The command was "otter304".

> UNIT CONFLICT at 10.56 sec > 1133 [binary,1132.1,28.1] $ANS(m4).

Length of proof is 30. Level of proof is 13.

PROOF

5 [] (x*y)* (z*x) = (x* (y*z))*x.
6 [] x*rs(x,y) = y.
7 [] rs(x,x*y) = y.
8 [] ls(x,y)*y = x.
9 [] ls(x*y,y) = x.
10 [] 1*x = x.
11 [] x*1 = x.
14,13 [] x*rs(x,y) = y.
16,15 [] rs(x,x*y) = y.
20,19 [] ls(x*y,y) = x.
22,21 [] 1*x = x.
24,23 [] x*1 = x.
25 [] x* (y* (x*z)) = ((x*y)*x)*z.
27,26 [copy,25,flip.1] ((x*y)*x)*z = x* (y* (x*z)).
28 [] a* ((b*c)*a) != (a*b)* (c*a) | $ANS(m4).
29 [para_into,26.1.1.1.1,23.1.1,demod,22] (x*x)*y = x* (x*y).
34,33 [para_into,26.1.1.1.1,13.1.1,flip.1] x* (rs(x,y)* (x*z)) = (y*x)*z.
36,35 [para_into,26.1.1,23.1.1,demod,24] (x*y)*x = x* (y*x).
37 [para_into,26.1.1,13.1.1,demod,36,flip.1] x* (y* (x*rs(x* (y*x),z))) = z.
39 (heat=1) [para_into,29.1.1,6.1.1,flip.1] x* (x*rs(x*x,y)) = y.
43 (heat=1) [para_from,29.1.1,7.1.1.2] rs(x*x,x* (x*y)) = y.
58 (heat=1) [para_into,33.1.1.2.2,11.1.1,demod,24] x* (rs(x,y)*x) = y*x.
65 (heat=1) [para_into,33.1.2.1,5.1.2,demod,34,27,flip.1] ((x*y)* (z*x))*u = x* ((y*z)* (x*u)).
70,69 (heat=1) [para_from,33.1.1,7.1.1.2] rs(x,(y*x)*z) = rs(x,y)* (x*z).
83,82 (heat=1) [para_from,35.1.2,5.1.2.1,demod,27] (x*y)* (x*x) = x* (y* (x*x)).
91,90 (heat=1) [para_into,37.1.1.2.2.2,7.1.1,flip.1] (x* (y*x))*z = x* (y* (x*z)).
103 [para_from,26.1.1,19.1.1.1,demod,36] ls(x* (y* (x*z)),z) = x* (y*x).
114,113 (heat=1) [para_into,103.1.1.1.2,8.1.1] ls(x*y,z) = x* (ls(y,x*z)*x).
115 (heat=1) [para_into,103.1.1.1,10.1.1,demod,114,114,91,22,22,22,22,24] x* (ls(y,x*y)*x) = x.
120,119 (heat=1) [para_from,103.1.2,5.1.2.1,demod,83,114,36,20,flip.1] x* ((y*x)*x) = x* (y* (x*x)).
147 [para_from,29.1.1,19.1.1.1,demod,114,114,36,120] x* (x* (ls(y,x* (x*y))* (x*x))) = x*x.
151 [para_from,29.1.2,15.1.1.2,demod,70] rs(x,x)* (x*y) = x*y.
154,153 (heat=1) [para_into,147.1.1.2.2.1.2.2,10.1.1,demod,22,22,22,24,24,24] ls(x,x) = 1.
176,175 (heat=1) [para_from,151.1.1,9.1.1.1,demod,154,flip.1] rs(x,x) = 1.
239 [para_from,39.1.1,15.1.1.2,flip.1] x*rs(x*x,y) = rs(x,y).
258,257 (heat=1) [para_from,239.1.1,7.1.1.2,flip.1] rs(x*x,y) = rs(x,rs(x,y)).
273 [para_from,58.1.1,43.1.1.2.2,demod,258,16] rs(x,y*x) = rs(x,y)*x.
292,291 (heat=1) [para_into,273.1.1.2,10.1.1,demod,176,flip.1] rs(x,1)*x = 1.
356 [para_from,291.1.1,33.1.1.2.2,demod,24,14,flip.1] (x*rs(y,1))*y = x.
360 [para_from,291.1.1,26.1.1.1.1,demod,22,flip.1] rs(x,1)* (x* (rs(x,1)*y)) = rs(x,1)*y.
367,366 (heat=1) [para_from,356.1.1,9.1.1.1] ls(x,y) = x*rs(y,1).
373 (heat=1) [para_into,360.1.1.2.2,6.1.1,demod,14] rs(x,1)* (x*y) = y.
553,552 [para_from,373.1.1,19.1.1.1,demod,367] x*rs(y*x,1) = rs(y,1).
1132 [para_from,65.1.1,115.1.1,demod,367,553,292,24] x* ((y*z)*x) = (x*y)* (z*x).
1133 [binary,1132.1,28.1] $ANS(m4).

Moufang 4 implies Moufang 1

Otter 3.0.4, August 1995
The job was started by wos on merlin.mcs.anl.gov, Wed Sep 6 10:14:31 1995
The command was "otter304".
> UNIT CONFLICT at 0.31 sec > 124 [binary,123.1,12.1] $ANS(m1).

Length of proof is 3. Level of proof is 2.

PROOF

12 [] x = x.
22,21 [] 1*x = x.
23 [] x*1 = x.
25 [] x* ((y*z)*x) = (x*y)* (z*x).
26 [] (a*b)* (c*a) != (a* (b*c))*a | $ANS(m1).
27 [copy,26,flip.1] (a* (b*c))*a != (a*b)* (c*a) | $ANS(m1).
29,28 [para_into,25.1.1.2.1,23.1.1,demod,22,flip.1] (x*y)*x = x* (y*x).
123 [para_into,27.1.2,25.1.2,demod,29] a* ((b*c)*a) != a* ((b*c)*a) | $ANS(m1).
124 [binary,123.1,12.1] $ANS(m1).
References
1. Boyer, R. S., and Moore, J S., A Computational Logic Handbook, Academic Press: New York, 1988 (also Web information http://www.cli.com/software/nqthm/obtaining.html).
2. Bruck, R. H., A Survey of Binary Systems, Springer-Verlag: Berlin, 1971.
3. Chein, O., Moufang Loops of Small Order, Memoirs of the American Mathematical Society 13, issue 1, no. 197, January 1978.
4. Chein, O., Pflugfelder, H. O., and Smith, J. D. H., Quasigroups and Loops: Theory and Applications, Heldermann Verlag: Berlin, 1990.
5. Fenyves, F., "Extra loops I," Publicationes Mathematicae Debrecen 15, 235-238 (1968).
6. Fenyves, F., "Extra loops II," Publicationes Mathematicae Debrecen 16, 187-192 (1969).
7. Kunen, K., Private communication (1994).
8. McCharen, J., Overbeek, R., and Wos, L., "Complexity and related enhancements for automated theorem-proving programs," Computers and Mathematics with Applications 2, 1-16 (1976).
9. McCune, W., and Wos, L., "Application of automated deduction to the search for single axioms for exponent groups," pp. 131-136 in Logic Programming and Automated Reasoning, Lecture Notes in Artificial Intelligence, Vol. 624, ed. A. Voronkov, Springer-Verlag: New York, 1992.
10. McCune, W., OTTER 3.0 Reference Manual and Guide, Technical Report ANL-94/6, Argonne National Laboratory, Argonne, Illinois, 1994.
11. Moufang, R., "Zur Struktur von Alternativkörpern," Math. Ann. 110, 416-430 (1935).
12. Padmanabhan, R., and McCune, W., "Automated reasoning about cubic curves," Computers and Mathematics with Applications (special issue on automated reasoning) 29, no. 2, 17-26 (January 1995).
13. Padmanabhan, R., and McCune, W., "Single identities for ternary Boolean algebras," Computers and Mathematics with Applications (special issue on automated reasoning) 29, no. 2, 13-16 (January 1995).
14. Pflugfelder, H. O., Quasigroups and Loops: Introduction, Heldermann Verlag: Berlin, 1990.
15. Wos, L., Automated Reasoning: 33 Basic Research Problems, Prentice-Hall: Englewood Cliffs, N.J., 1987.
16. Wos, L., "Meeting the challenge of fifty years of logic," J. Automated Reasoning 6, no. 2, 213-222 (1990).
17. Wos, L., Overbeek, R., Lusk, E., and Boyle, J., Automated Reasoning: Introduction and Applications, 2nd ed., McGraw-Hill: New York, 1992.
18. Wos, L., "Automated reasoning answers open questions," Notices of the AMS 5, no. 1, 15-26 (January 1993).
19. Wos, L., "The kernel strategy and its use for the study of combinatory logic," J. Automated Reasoning 10, no. 3, 287-343 (June 1993).
20. Wos, L., "The power of combining resonance with heat," J. Automated Reasoning (accepted for publication).
21. Wos, L., "The resonance strategy," Computers and Mathematics with Applications (special issue on automated reasoning) 29, no. 2, 133-178 (January 1995).
22. Wos, L., "Searching for circles of pure proofs," J. Automated Reasoning (accepted for publication).
23. Wos, L., The Automation of Reasoning: An Experimenter's Notebook with OTTER Tutorial, accepted for publication by Academic Press (1996).
Automating the Search for Elegant Proofs

LARRY WOS*
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439-4844, U.S.A. e-mail: [email protected]

(Received: 3 January 1997)

Abstract. The research reported in this article was spawned by a colleague's request to find an elegant proof (of a theorem from Boolean algebra) to replace his proof consisting of 816 deduced steps. The request was met by finding a proof consisting of 100 deduced steps. The methodology used to obtain the far shorter proof is presented in detail through a sequence of experiments. Although clearly not an algorithm, the methodology is sufficiently general to enable its use for seeking elegant proofs regardless of the domain of study. In addition to (usually) being more elegant, shorter proofs often provide the needed path to constructing a more efficient circuit, a more effective algorithm, and the like. The methodology relies heavily on the assistance of McCune's automated reasoning program OTTER. Of the aspects of proof elegance, the main focus here is on proof length, with brief attention paid to the type of term present, the number of variables required, and the complexity of deduced steps. The methodology is iterative, relying heavily on the use of three strategies: the resonance strategy, the hot list strategy, and McCune's ratio strategy. These strategies, as well as other features on which the methodology relies, do exhibit tendencies that can be exploited in the search for shorter proofs and for other objectives. To provide some insight regarding the value of the methodology, I discuss its successful application to other problems from Boolean algebra and to problems from lattice theory. Research suggestions and challenges are also offered.

Key words: automated reasoning, elegant proofs, Otter, Boolean algebra.
1 Problem Origin
*This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.
Reprinted from the Journal of Automated Reasoning, Vol. 21, 135-175 (1998) with kind permission from Kluwer Academic Publishers.

Two independent themes are central to this article. One is concerned with evidence of advances made in automated reasoning, and the other is concerned with a methodology for attacking the general problem of finding shorter proofs. Though independent, both themes are addressed by a single success with a
specific problem posed by W. McCune. Using his program OTTER [5, 6, 7], McCune had obtained an 816-step proof of a theorem in Boolean algebra; he calls the theorem DUAL-BA-3 in a recent monograph by him and his colleague R. Padmanabhan [8]. The problem he posed to me was to find a far more elegant proof than his 816-step proof, with the focus strictly on proof length. This article shows how the request was met by finding a proof of length 100, details the methodology that was used, and demonstrates that automated reasoning has met an important test measuring its advance.
1.1 MEASURING PROGRESS
The early years of a field are quite often occupied with developing the theory, formulating necessary components, and—where a computer is involved—designing and implementing programs. Eventually, however, the field must be evaluated. In the case of automated reasoning, crucial to an evaluation is the testing of its programs on problems supplied by others. By solving a problem supplied by another researcher (in this case, McCune), I provide yet more evidence of the value of automated reasoning. The successful gathering of the evidence is especially appealing—even piquant—because McCune also designed and implemented the very program, OTTER, I used to solve the problem he posed.
1.2 A METHODOLOGY OR AN ALGORITHM?
Although the problem that is solved concerns the finding of an elegant proof for a specific theorem, where elegance focuses solely on proof length, the research that produced the desired solution also resulted in the formulation of a general methodology for automating the search for elegant proofs, where elegance is not confined to proof length. Where proof length is the concern, the methodology is iterative, usually beginning with a "long" proof and, if the methodology succeeds, culminating with a "short" proof. As shown in Section 4, later stages of the methodology use results obtained in earlier stages; for example, the proof steps of a proof completed in Experiment j are used to guide the search for a shorter proof in Experiment k, where j is less than k.
Ideally, the world wishes access to an algorithm. Perhaps disappointing to some, the approach presented here is indeed not an algorithm. In fact, my experiments suggest that an algorithm for producing shorter proofs is in the main impractical. Nevertheless, certain program options and parameters are beneficial, and, to aid the researcher who wishes to use OTTER, in Section 3 I discuss tendencies concerning their use. For example, quite often the setting of a smaller (rather than a larger) maximum (max_weight) on the length of retained conclusions promotes the completion of shorter proofs. This phenomenon is by no means dependent on the use of OTTER; indeed, with his program EQP, McCune was able to answer the long-standing open question on Robbins algebra
by producing a 194-step proof when the max_weight was assigned the value 60, and he later obtained an 86-step proof when the max_weight was assigned the value 50 (with no other changes made). An intuitive explanation asserts that the presence in a proof of a complicated formula or equation forces the inclusion of numerous steps used to extract from the complicated item what is needed. Where I have some notion of the type just given, I supply it. Perhaps such hints will enable a researcher to explain which tendencies hold in which cases and why.
Although I do not share in the view, I have colleagues who suspect that a practical algorithm can be found. I offer this as a topic for research, if one is so disposed. Indeed, just as the experiments of many years provided the basis for McCune's formulation of an autonomous mode for OTTER to apply—a mode that automatically sets options and makes choices that the researcher would otherwise make—perhaps the experiments covered in this article will provide a basis for a far more automated approach for finding shorter proofs. If such were to occur, the practical consequences would be significant, for mathematics, circuit design, program synthesis, and the like.
1.3 PERSPECTIVE
To put in perspective the research reported here and to provide some insight into the difficulty of pursuing the topic of a more fully automated search for elegant proofs, two items merit mention. First, I have studied such an automated approach for some years now; see [17]. However, my in-depth experiments focused mainly on problems in which equality played little or no role. The presence of equality presents many subtle obstacles, some of which concern the use of demodulation. For example, even a simple experiment whose object is to cursorily check a given proof by supplying as weight templates the proof steps, each assigned a weight of say k, coupled with an assignment of k to the max_weight, can result in OTTER completing no proof at all. Second, none of my experiments had ever begun with a proof of such magnitude, in this case, a proof of length 816. Perhaps the longest proof that initiated an experiment in my earlier studies was one of length 63. Because of these two factors, the culmination of my research in producing a 100-step proof was most startling.
1.4 ELEGANCE—IMPORTANCE AND PROPERTIES
The search for elegant proofs has continuously played a key role in mathematics and in logic. Such a search can lead to the discovery of new, important relationships, and it can also lead to the formulation of significant concepts. In addition to the aesthetic aspects of elegance of concern to mathematics and logic, practical aspects also exist, easily illustrated when the focus is on proof length. Indeed, the methodology here may also serve well in the context of constructing more
efficient circuits, synthesizing more effective algorithms, and the like. Often a "short" proof (when it is, in effect, constructive) can be used to provide the needed path, for example, by using an answer literal to extract from the proof the sought-after object. The desired object might be a circuit relying on few AND gates or an algorithm in which divide rarely occurs.
Although what precisely makes a given proof elegant is subject to debate, five properties merit mention: proof length, term structure, variable abundance, deduced-step complexity, and compactness. Each of the cited properties has aesthetic appeal, but each is also relevant to practical considerations, for example, in circuit design. Here I focus mainly on the first property (proof length), giving brief attention to the second through the fourth properties. (Should one happen to wonder whether proof length is indeed relevant to elegance, one need only examine the history of mathematics, for example, that focusing on the various proofs of the Church-Rosser theorem.) As for the fifth property of elegance, compactness, I introduced and studied it in [17, 19]. (Because of its newness, the concept merits immediate focus. By example, a proof of a theorem of the form P implies the and of Q, R, and S is compact if and only if it is a proof of exactly one of Q, R, or S, implying that the other two proofs are subproofs.) The likelihood of finding a proof that is elegant with respect to one or more of the cited five properties can be sharply increased if an automated reasoning program is heavily relied upon, a program such as OTTER.
Throughout this article, the first of the five properties, proof length, has a rather precise meaning, namely, the number of deduced steps explicitly present in the proof. The qualifier "rather" is present because intermediate steps resulting from the use of demodulation (for canonicalization) are not counted in proof length. Compared with the typical use in mathematics, the treatment of proof length in logic is more likely to be precise. The explanation rests with the frequent practice in mathematics of omitting many obvious steps (for example, those arising from applying symmetry of equality). In fact, on many an occasion, I have heard a logician say that mathematicians only outline proofs. In contrast, in various areas of logic (such as equivalential calculus [2, 16]), all deduced steps are explicitly presented. Further, various areas of logic require the use of a specific inference rule, a rule such as condensed detachment [3, 16]. Seldom is a specific inference rule cited in a mathematics paper or book.
The second property of elegance, term structure, refers to the type of term present in a proof. For example, does the proof (say, from group theory) contain terms of the form inverse(inverse(t)) for some term t, or does the proof (say, from many-valued sentential calculus) contain terms of the form n(n(t)) where n denotes negation, or does the proof contain terms in which nested divide instructions occur? Just as a proof may offer added elegance because of being terse, so also may it offer elegance by avoiding some type of term.
The third property of elegance, variable abundance, refers to the number of distinct variables required by one or more deduced steps of the proof. For
example, the proof might require (for one of its deduced steps) the use of five distinct variables. In the vast majority of cases, requiring a smaller number of distinct variables adds to elegance. The fourth property of elegance, deduced-step complexity, is concerned with the length as measured in symbols of the formulas, equations, or clauses present in the proof. (I sometimes use the term "clause" in a less strict fashion to refer to equations and the like written in various notations acceptable to OTTER.) Exclusive of commas or parentheses, a proof may derive some of its elegance from avoiding the use of any deduced steps that are complex (with respect to length). Indeed, when a proof contains formulas or equations that appear to be "messy" in that a lengthy array of symbols is encountered, the scientist often evinces disappointment, and sometimes comments on the lack of "elegance". Of the four properties of elegance discussed in this article, the first (proof length) presents the most difficulty. The concern for proof length presents the type of challenge that is not directly addressable with an automated reasoning program and (apparently) is not effectively addressable with an algorithm. To dispel the thought that the use of a (frequently impractical) breadth-first (or level-saturation) approach suffices, consider the case in which such an approach yields at level 3 a proof of length 7 and at level 4 a proof of length 4; see Chapter 3 of [17] for a more detailed discussion. On the other hand, (as will be seen) the concern for the second through the fourth properties of elegance is directly addressable with a program such as OTTER and, at least in theory, can be attacked algorithmically. (The fifth property of elegance, compactness, also presents difficulty, not admitting an algorithmic attack that is practical.)
1.5 A NARROW WINDOW
To attack the tough problem of finding "short" proofs, OTTER offers various means—although far from producing an algorithm; see Section 2. (Regarding the other aspects of elegance discussed in this article, one learns that OTTER offers, in most cases, just what is needed.) What becomes clear (in Section 4, where an abbreviated sequence of experiments illustrating the basic methodology is presented) is the narrowness of the path that leads to success. (A far more detailed treatment of the crucial experiments is given in [20].) For example, a change from the value 9 to the value 10 in the assignment of but one input parameter results (sometimes) in the completion of no proofs, in contrast to the completion of a proof marking significant progress.
Nevertheless, when the goal is proof-length reduction, certain tendencies seem to hold regarding option choices and parameter assignments. In that regard, Section 7 presents a shortcut to moving toward the objective of finding a "short" proof, an approach that might be termed a pseudo-algorithm, and Section 3 discusses tendencies exhibited by various parameters and strategies.
2 The Target Theorem and an Arsenal of Weapons
Next in order are two items: a presentation of the specific theorem that serves as the target, and a discussion of the various weapons OTTER offers for attempting to reach the target. In particular, I address three pressing questions with regard to proof length, namely, which weapons played the key role, how they were used, and in what order. The discussion sets the stage for viewing (in Section 4) some of the crucial sequence of experiments that answer these questions; the experiments culminated in the completion of a 100-step proof to replace the 816-step proof serving as the starting point. In addition, in contrast to the indirect attack on the first property of elegance, the discussion shows how OTTER can be used to directly address the second, third, and fourth cited properties of elegance. To aid the researcher interested in using OTTER for both similar and totally dissimilar objectives, the experiments are accompanied by some commentary explaining why certain choices were made.
The following equations capture the theorem (DUAL-BA-3) to be proved, where x@ denotes the complement of x and where the two inequalities arise from, respectively, negating the theorem to be proved and negating its dual.
* y) + (x * z ) . + y) * (x + z ) .
x) + (y« * x)) z) + (xfi * z)) y) + (yfl * y)) x) * (y« + x)) z) * (xC + z)) y) * (yC + y)) I $Ans(A2). | $Ans(A4).
» -
x. z. x. x. z. x.
The theorem under study asserts that, excluding the two equations in which != occurs, the given set of equations is an axiomatization of Boolean algebra; both McCune and I placed both equations in the passive list for our respective studies. (Earlier work [8] showed that it is sufficient to derive either of the two absorption laws, whose respective negations are captured by the two equations in which != occurs; because of duality, a proof of one of the negated equations provides a proof of the other.) The context of the problem was the search for a Boolean algebra axiomatization consisting of an independent self-dual set of two equations. (A set is self-dual if and only if the dual of each equation is also in the set.) In fact, the ten equations (given earlier) lead directly to such an
axiomatization by applying a method due to Padmanabhan and Quackenbush [9]. Each equation of the pair has length 1103, measured in symbol count. This theorem DUAL-BA-3 was used to find the first known independent self-dual 2-basis for Boolean algebra. The theorem is significant in that, from preliminary results, deriving either absorption law showed that the set of ten equations is a basis for Boolean algebra. By using Pixley reduction, the set of ten equations can be reduced to an equivalent self-dual set of two equations, which is easily shown to be independent.
McCune had succeeded in obtaining a proof with OTTER, a proof of length 816, counting five steps resulting from "copy" followed either by "flip" (which interchanges the arguments of an equation) or by demodulation, and counting one step resulting from back demodulating an input equation with another input equation. (The proof length is increased by the use of "flip" if and only if the equation that is flipped is also present in the proof; both forms of an equation are retained when eq_units_both_ways is set; the use of Knuth-Bendix automatically sets this option.) He asked me to find a more elegant proof, adding (in jest) that the goal was a proof of length 100 but, realistically, a 200-step proof would indeed be impressive.
In addition to the incentive of finding a proof that McCune would consider elegant, the theorem offered a challenge rather different from my earlier studies seeking shorter proofs. Specifically, those earlier studies [15, 17, 19] focused mainly on areas of logic, areas in which the equality relation was totally absent; in contrast, the theorem of interest relies exclusively on the equality relation. (Most of the other theorems used here to test the power of the methodology also rely on the equality relation and on no other. For the actual proofs produced by the methodology, or proofs of a similar flavor, see [8] or either of the following two Web addresses, http://www.mcs.anl.gov/home/mccune/ar/monograph/ or http://www.mcs.anl.gov/people/wos/index.html.) Therefore, new obstacles no doubt would be encountered, for problems featuring equality behave (in mathematics and especially in automated reasoning) sharply differently from those in which equality is not present or present barely.
2.1 ATTACKING PROOF LENGTH
Anticipation of new obstacles to overcome naturally suggested I quickly review some of OTTER's arsenal of weapons that might be useful in seeking a shorter proof for the given theorem. I began with a study of the approach McCune had used to find any proof, the approach that culminated in his finding an 816-step proof. He used three familiar weapons: weighting, the ratio strategy, and a Knuth-Bendix approach. Specifically, McCune used a max_weight (maximum weight) of 28 to limit the complexity (measured in symbol count) of retained clauses, a limit that I maintained for the first fourteen experiments, some of which are reported here. To direct the program's reasoning, he used his ratio strategy [7, 17], assigning a value of 4 to the pick_given_ratio. That assignment
instructs OTTER to choose as the focus of attention four clauses by weight (complexity, whether determined by weight templates or by symbol count), one by breadth first (first come, first serve), then four, then one, and the like. It seemed obvious to me that, for seeking shorter proofs, both max_weight and the ratio strategy would prove most useful—perhaps even required—no doubt relying on a variety of assignments for their respective values. McCune also used a Knuth-Bendix approach for drawing conclusions and for canonicalization. I adopted this weapon also, although not specifically for proof shortening.
From my earlier studies focusing on finding shorter proofs, I added three weapons to the cited arsenal: the resonance strategy [15, 17], the hot list strategy [17, 18], and McCune's dynamic hot list strategy [16, 17, 18]. Regarding the resonance strategy, its objective is to enable the researcher to suggest equations or formulas, called resonators, each of whose symbol pattern (all of whose variables are considered indistinguishable) is conjectured to merit special attention for directing a program's reasoning. To each resonator one assigns a value reflecting its relative significance; the smaller the value, the greater the significance.
In contrast to directing a program's reasoning, the objective of the hot list strategy is to rearrange the order in which conclusions are drawn. The rearranging is caused by immediately visiting and, depending on the value of the (input) heat parameter, even immediately revisiting a set of input statements chosen by the researcher and placed in an input list, called list(hot). The chosen statements are used to complete applications of inference rules rather than to initiate them. The dynamic hot list strategy has the same objective as does the hot list strategy. However, whereas the (static) hot list strategy is restricted to using clauses that are present when the run begins, the dynamic hot list strategy has access to members adjoined to the list(hot) during the run. The input heat parameter governs the amount of recursion that is permitted for both the hot list strategy and the dynamic hot list strategy. The input dynamic_heat_weight parameter assignment places an upper bound on the pick-given weight of clauses that can be adjoined to the hot list during the run. (A clause technically has two weights: its pick-given weight, which is used in the context of choosing clauses as the focus of attention to drive the program's reasoning, and its purge_gen weight, which is used in the context of clause discarding; often the two weights are the same. If no member of a weight_list matches a clause, where variables are treated as indistinguishable, the clause is given a pick-given weight equal to its symbol count; the same is true for its purge_gen weight.)
Each of the cited three strategies, as well as other weapons offered by OTTER, plays a vital role in the context of the first property (proof length) of elegance.
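As a concrete illustration, the fragment below gathers the commands that correspond to the weapons just described. It is only a sketch: the option and parameter names follow the OTTER conventions used in the sample input file shown earlier in this volume, the values for max_weight and the ratio strategy are the ones McCune used, and the remaining values are merely illustrative.

assign(max_weight,28).          % McCune's limit on the complexity of retained clauses
assign(pick_given_ratio,4).     % four clauses chosen by weight, then one by breadth first
set(knuth_bendix).              % Knuth-Bendix approach for drawing and orienting conclusions
assign(heat,1).                 % recursion bound for the (static and dynamic) hot list strategy
assign(dynamic_heat_weight,8).  % illustrative bound on the pick-given weight of clauses
                                % adjoined to the hot list during the run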
2.2 ATTACKING OTHER PROPERTIES OF ELEGANCE
On the other hand, far fewer weapons are needed when the concern is the second, the third, or the fourth property. Further, whereas a focus on proof length requires the use of a methodology, each of properties 2 through 4 can be directly addressed with one or more features offered by OTTER. For pertinent material, see [15, 17, 19]. Indeed, when the second property, that of term structure, is the concern, one can rely either on OTTER's treatment of demodulation or on weighting. Regarding the use of demodulation, consider the case in which one wishes to avoid terms of the form n(n(t)) for any term t, and assume that the only predicate present is P. The following set of demodulators will suffice; OTTER discards clauses that are demodulated to $T for true. (One can think of the function n as negation and the function i as implication.)

list(demodulators).
(n(n(x)) = junk).
(i(junk,x) = junk).
(i(x,junk) = junk).
(n(junk) = junk).
(P(junk) = $T).
end_of_list.

A careful analysis of the demodulators in the given list shows that iteration will demodulate (rewrite) a deduced but unwanted conclusion (that takes the form of a unit clause) eventually to $T, standing for true.
Regarding the use of weighting, in addition to placing entire equations or formulas in a weight_list (as is the case for resonators), one can place subexpressions in a weight_list. If, for example, the type of elegance that is desired asks for the absence of all terms of the form inverse(inverse(t)) for terms t (say, in a study of group theory), one can include an appropriate weight template. In the case under discussion, the following weight template can be included in weight_list(purge_gen) or in weight_list(pick_and_purge), with the intention that the program purge deduced conclusions containing the undesired subexpression.

weight(inverse(inverse($(1))),1000).

Because of the presence of this template, the purge_gen weight of any deduced conclusion containing the cited type of term will be increased by at least 1000, if no other weight template interferes. (The use of $(1) causes the term in the corresponding position, whether complex or not, to be multiplied by 1.) If the max_weight is strictly less than 1000, such a conclusion will be immediately purged, unless its weight is sufficiently reduced because of the actions of yet another template. One thus has a taste of how OTTER offers a means for directly addressing the concern for term structure in the context of elegant proofs.
Because of the way weighting is implemented in OTTER, it can be used to serve another purpose, namely, to purge an entire class of unwanted equations
through the use of the resonance-restriction strategy. With this strategy, conclusions that match a resonator are purged; of the conclusions that are retained, the choice for the focus of attention to direct the reasoning is based on symbol count. As for the relevance to elegance, one can imagine having identified a class of equations or formulas that are to be avoided, if elegance is to be increased, a class all of whose members have the same functional shape, differing only in the presence of the particular variables. For example, the class to be avoided might contain the following equations.
((y+z)* ((z+y)* ((y+z)* ((z+y)*
(y+u)))* (y+u)))* (y+u)))* (y+u)))*
((x*y)+ ((x*y)+ ((y*x)+ ((y*x)+
(x* (x* (x* (x*
(z*u))C (z*u))C (z*u))« (z*u))«
) ) ) )
-
(x* (x* (x* (x*
(y+y))*l. (y+y))*l. (y+y))*l. (y+y))*l.
If the intent is to avoid these equations and others of their class differing only in the particular variables that occur, one need only select an equation of the class and use it as a resonator, assigning to it a value greater than the max-weight. Placing the chosen resonator in weight-list(purge^en) or in weight Jist(pick-and.purge) will, if nothing else interferes, cause the program to purge any deduced conclusion that is in the undesired class. Any member of the undesired class will suffice as the chosen resonator—one of the beautiful as pects of the resonance strategy—for (by definition) all variables in a resonator are treated as indistinguishable. (For more details concerning the resonancerestriction strategy, see Sections 4.2 and 5.2 in [17].) Regarding the third cited property of elegance, that focusing on variable abundance, OTTER offers precisely what is needed with a command of the following type. assign(max_distinct_vars,5). With the inclusion of this command, the program will purge any deduced clause that contains more than five distinct variables. The use of that feature alone can result in the completion of a proof that is far more attractive than the proof produced without using that feature. As for the fourth cited property of elegance, deduced-clause complexity, the feature that is relevant can be easily guessed, namely, the option to assign a chosen value to max-weight. For example, if no weight templates are included in the input, OTTER will assign as the purge.gen weight to each deduced clause its symbol count, of course ignoring commas and parentheses. Therefore, if one wishes the program to seek a proof in which no deduced step has a length exceeding, say, k, one merely assigns to max-weight the value k. No problem exists if, at the same time, one wishes to include weight templates to guide the program's search without interfering with the stated intention. In such an event, one places the guiding templates in weightJist(pick-given); no template in that list is consulted when computing the purge_gen weight, the weight used to decide whether to retain a new conclusion in the context of its complexity.
3 Making Choices and Identifying Tendencies
Consistent with my promise regarding tendencies, this section offers notions gathered from experimentation. Indeed, as one learns here, studies with OTTER suggest that some types of parameter assignment tend to be more effective than others. This observation holds whether the objective is that of finding shorter proofs (featured in this article) or the objective is that of finding any proof. The tendencies are discussed in lieu of a practical algorithm. (Of course, the word tendency is the correct term, for exceptions abound no matter which option, parameter, strategy, and the like is the focus.) That no such algorithm exists (which is my position) might be less surprising if one pauses to note that a mathematician who is asked for an algorithm for finding any proof would, at best, be amused. (The experiments that are cited in this section are discussed in Section 4 and, more fully, in [20].)
3.1 MAX_WEIGHT
One of the more innocent-appearing parameters that governs the actions of OTTER is that which places a maximum on the weight (complexity) of retained information, namely, max_weight.

Tendency 1. Assignment of a smaller max_weight rather than a larger tends to produce shorter proofs, tends to promote finding any proof, and tends to reduce the CPU time required to complete the designated task.

Intuitively, my guess is that long and messy formulas or equations force the program to include numerous steps to unpack the information that they contain. Perhaps the situation is like the gift that is in a box, which is in a box, which is in yet another box, and the like, with the actual object of delight buried deeply within the nested boxes. To immediately answer a natural question, I note that I typically choose to assign max_weight (in the beginning of an attack) a value between 20 and 30.
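In an input file, following the conventions of the sample input shown earlier in this volume, the limit is imposed with a single command; the particular value below is merely an illustration drawn from the 20-30 range just cited.

assign(max_weight,24).  % purge any deduced clause whose symbol count exceeds 24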
3.2 THE HOT LIST STRATEGY
Just as the values assigned to parameters can be crucial, so also can the choice of strategy or strategies.

Tendency 2. Use of the hot list strategy tends to promote the finding of shorter proofs, tends to promote the finding of any proof, and tends to reduce the CPU time that is required to complete the assigned task.

The presence or absence of the hot list strategy can produce sharply contrasting results. Whereas its use in Experiment 26 enabled OTTER to complete the
desired 100-step proof, the corresponding experiment differing only in the omission of the hot list strategy completed no proofs. Such contrasting results are explained by the rearranging of the order in which conclusions are drawn. Indeed, without the hot list strategy, the retention of a new clause does not cause the program to immediately choose that new clause to initiate an application of the inference rules in use, as is the case when the hot list strategy is in use. (Nevertheless, this tendency in no way implies that such immediate initiation is necessarily an advantage. In particular, the new demodulators that might be retained with the use of the hot list strategy might get in the way, might rewrite some clause into a form that blocks finding a shorter proof or, for that matter, finding any proof.)

Tendency 3. Use of clauses in the hot list that are representable in a few (perhaps fewer than 10) symbols—with the added property that the right-hand argument is a single symbol, variable or constant—tends to promote reaching the desired objective.

Tendency 4. Inclusion of a clause in the hot list that corresponds to the added hypothesis of a theorem—for example, when studying ring theory, the assumption that the cube of x is x for all x—tends to promote program effectiveness. The analogue of this tendency is found in many mathematics books and papers where a proof continually leans on the added hypothesis of the theorem in focus to derive one step after another.

Tendency 5. Use of clauses in the hot list corresponding to associativity or commutativity tends to sharply decrease the effectiveness of a reasoning program. If a clause, such as that for commutativity, easily matches terms within each newly retained conclusion, then the program can become preoccupied over and over, sharply reducing the number of clauses chosen from the set of support to initiate inference rule application.

As the cited tendencies suggest, the content of the (input) hot list can indeed be critical. For example, in Experiment 26, if one uses for the input hot list the one used throughout many of the experiments reported in Section 3 and in [20] (just two clauses), no proofs are returned; in contrast, with the following four clauses in the hot list, the desired 100-step proof is found. (As one clearly sees, the clauses do not adhere to Tendency 3; but the discussion focuses on tendencies, not on rigorous rules.)

x * (y + z) = (x * y) + (x * z).
x + (y * z) = (x + y) * (x + z).
x + x@ = 1.
x * x@ = 0.
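Presented in the form used elsewhere in this article for input lists, the four clauses would be placed in the hot list roughly as follows; the fragment is a sketch only, and it again assumes that the @ complement notation is acceptable input syntax.

list(hot).
x * (y + z) = (x * y) + (x * z).
x + (y * z) = (x + y) * (x + z).
x + x@ = 1.
x * x@ = 0.
end_of_list.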
Obviously, at least one of the two additional clauses (in the hot list) enables the immediate deduction and retention of one or more needed clauses. (As a reminder, the clauses in the hot list are used to complete application of inference rules.) However, as experimentation shows, sometimes a greater value assigned to the heat parameter harms rather than helps, and sometimes additional members in the hot list impede rather than aid progress. The answers to the questions of how and when to use the hot list strategy rest with the properties of the correspondingly perturbed space of retained conclusions; currently, the only means for making the appropriate decisions is through experimentation.

Tendency 6. Adjoining clauses frequently to the hot list during a run, in the context of the dynamic hot list strategy, tends to sharply decrease program effectiveness. The performance tends to increase if clause adjunction during a run consists of the type of clause classed as useful in the context of the static hot list strategy.

Tendency 7. Assignment of a value of 1 to the input heat parameter, for both the static and the dynamic hot list strategy, tends to promote the finding of shorter proofs and tends to promote the completion of any proof. Higher values tend to becalm the program, forcing it to remain in one place for a long time before choosing a new initiating clause from the set of support.

Other tendencies regarding the use of McCune's dynamic hot list strategy are more difficult to identify at this time, for its use has received relatively small experimental attention so far. The answers to other questions about the hot list strategy—the value assigned to the input heat parameter or the content of the (input) hot list, for example—rest with the properties of the correspondingly perturbed space of retained conclusions. Currently, the only means for making the appropriate decisions is through experimentation.
Indeed, as the following shows, the value assigned to the (input) heat parameter can save or lose the day. Whereas the value 3 in Experiment 26 worked well, the value 1 yielded no proof of either the target conclusion or of the dual. (I ignored the case for the value 2 because its use in Experiment 25 did not produce the desired 100-step proof.) In the cited case, the smaller value did not permit the program to deduce clauses of heat level greater than 1, and at least one such clause must have played a significant role.
3.3 THE RESONANCE STRATEGY
The resonance strategy is designed to cope with obstacles such as cul de sacs or dead ends. Indeed, often experiments lead to a sequence of shorter and shorter
proofs that reach a point conjectured to be far from what is possible—and yet no further progress occurs. To escape such a cul de sac, the resonance strategy is recommended, using proof steps from the last "good" proof or from some earlier or related proof.

Tendency 8. Resonators that correspond to proof steps of a closely related theorem aid the program, whether the objective is proof-length reduction or simply proof finding (of any proof).

In the limiting case for the resonance strategy, steps from a proof of a theorem obtained in an earlier experiment, if used as resonators, tend to be useful for finding an even shorter proof in a later experiment. An explanation for this phenomenon asserts that such steps (of course, depending on the other option and parameter choices) can tightly direct a program's search without restricting the program to finding the already-completed proof. Note that, in contrast to lemmas, which should be true when included in an attack, resonators have no truth value. Indeed, when a resonator is used that corresponds to a step of an earlier proof of the theorem under study or corresponds to a proof step of another theorem, its inclusion does not play the role of an included lemma. For a fuller discussion of the relation of resonators to lemmas, see [15]. The added latitude needed to enable a new proof to be completed is derived from such elements as a max_weight whose value is perhaps ten times the value assigned to each of the resonators, usually a small value such as 2. (However, if some of the resonators match many retained clauses, their effectiveness can be totally destroyed and, worse, no proof is found because the program becomes mired in focusing on a large number of similar clauses.)
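To give the flavor of how resonators are entered, the following fragment shows the general shape of a resonator list in OTTER syntax. The single template is purely illustrative (it is the conclusion of DUAL-BA-3, not one of the actual resonators), and the value 20 simply reflects the roughly ten-to-one relation just mentioned.

assign(max_weight, 20).            % roughly ten times the resonator value
weight_list(pick_and_purge).
weight((x * y) + y = y, 2).        % one template per proof step chosen as a resonator
end_of_list.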
3.4 THE RATIO STRATEGY
Only relatively recently did I and my colleagues at Argonne find useful a level-saturation, or breadth-first, search; see [14] for relevant data. Formerly, our preference was for a search based on the complexity of clauses that might initiate applications of inference rules. Currently, McCune and I typically rely on the ratio strategy, which combines both direction strategies.

Tendency 9. Assignment of small values, but not the smallest, to the pick_given_ratio tends to promote the finding of a shorter proof and tends to promote the completion of any proof.

When the smallest possible value, 1, is assigned to the pick_given_ratio parameter, the program is instructed to place equal weight (for choosing clauses to initiate applications of the inference rules in use) on first-come first-serve information and on information of least complexity. If misfortune smiles, first-come first-serve information (that saved as the program's attack begins) will include
statements whose complexity equals the maximum allowed in the experiment. As conjectured, the use of messy formulas or equations can interfere with the seeking of shorter proofs. On the other hand, if one assigns the pick_given_ratio a large value (such as 20), then the program will likely consider few complicated conclusions to initiate applications of an inference rule. More than rarely, such a conclusion proves to be the key to unlocking the puzzle, to finding a shorter proof or to finding any proof. In fact, the cited phenomenon is precisely why McCune formulated the ratio strategy. To immediately answer a natural question, I note that I typically favor an assignment of either the value 3 or the value 4 to the pick_given_ratio.
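A minimal fragment showing the assignment discussed above; the comment spells out, as I understand it, the behavior the ratio strategy induces for the value 4.

assign(pick_given_ratio, 4).   % choose 4 clauses by smallest pick_given weight,
                               % then 1 by first-come first-serve, and repeat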
3.5 ANCESTOR SUBSUMPTION
The only means offered by OTTER for directly attacking the problem of finding shorter proofs is the use of ancestor subsumption (which, for practical reasons, requires the use of back subsumption). This procedure compares the derivation path lengths when presented with two copies of the same deduced conclusion and retains the copy reached by the strictly shorter path.

Tendency 10. Use of ancestor subsumption tends to promote the finding of shorter proofs; however, use of ancestor subsumption tends to sharply increase the required CPU time if the goal is simply finding a proof.

For an example of the fact that shorter subproofs do not necessarily lead to a shorter total proof, see Chapter 3 of [17].
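In input-file terms, the direct attack on proof length just described amounts to setting two flags; a minimal sketch follows. (The name ancestor_subsume also appears in the super-loop commands given later in this article; back_sub is, as I recall, OTTER's back-subsumption flag.)

set(ancestor_subsume).   % keep the copy of a repeated conclusion reached by the shorter derivation
set(back_sub).           % back subsumption, required in practice for ancestor subsumption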
4 A Tortuous Path to Success
At this point, the focus is on various experiments whose objective was to find a proof of DUAL-BA-3, a proof far shorter than McCune's 816-step proof. Only barely was I seriously seeking a proof of length 100, McCune's original request (though said in jest). I conducted more than one hundred experiments, the result of which was indeed a 100-step proof and, perhaps more important, a general methodology for seeking "short" proofs. Immediately preceding Section 5, a table of twenty-six of the more interesting experiments is given, the details of which are covered in [20]. In this article, I focus on far fewer experiments, noting that setbacks did occur; my objective is mainly to illustrate the methodology, to show how it was developed and used, and to provide data supporting the tendencies regarding option choices and parameter assignments. Earlier studies seeking shorter proofs [15, 17, 19] virtually demanded the use of the resonance strategy, if the goal of finding a 100-step proof (or, for that matter, a 400-step proof) were to be reached. As for which equations to use as resonators, again virtually no thought was required. Indeed, experiments in
areas other than Boolean algebra had shown that deduced steps from the proof of a theorem (even if only distantly related to the target theorem) often serve well as resonators, whether the goal is a sharp reduction in the CPU time required to complete any proof or the goal is to find a shorter proof. In fact, steps from the proof of a target theorem often serve well as resonators for improving one's results regarding the target theorem. Note that, typically, Experiment k+1 uses as resonators proof steps of a proof obtained in Experiment k. Therefore, I began (in Experiment 1) by using as resonators the 811 deduced steps of McCune's proof, starting with a clause obtained by back demodulation, but not including the earlier clauses produced by "copy". Almost all the experiments reported in the remainder of this article were conducted on a SPARCstation-10. For brevity, the CPU times are cited without the reminder that they are only approximations. Also, note that all resonators placed in a weight list must be unit clauses, free of "|" (the logical or symbol). Therefore, when I discuss positive deduced steps of a proof to be used as resonators, implicit is the requirement that each be a unit clause.
4.1 A FINE START: EXPERIMENT 1
In Experiment 1, each resonator (each of the 811 deduced steps of McCune's proof) was assigned the value 2. All resonators were placed in the pick_and_purge weight list. This placement instructed OTTER to assign both a pick_given weight and a purge_gen weight of 2 to any deduced clause that matched any of the 811 resonators, where (in the resonator case) two clauses match if they are identical when all variables are treated as indistinguishable from each other. The steps of McCune's proof that are deduced and retained are thus used to influence OTTER's search for a proof; indeed, because of being assigned the small pick_given weight of 2, they are given preference (over most or all clauses) for being chosen to initiate an application of the inference rule or rules in use. Also (of slightly lesser significance), the deduction of a step of McCune's proof will almost certainly lead to its retention, at least in the context of max_weight, for such a step will be assigned a purge_gen weight of 2. In Experiment 1, the max_weight was assigned the same value McCune used, namely, 28. Since the pick_given_ratio remained unchanged from that used to obtain the 816-step proof, namely, that of 4, the presence of clauses matching a resonator would only guide OTTER's search, for every fifth clause used to initiate an application of an inference rule would be chosen by first come, first serve. Such guiding rather than totally controlling the search gives the program the opportunity of finding a proof different from McCune's and, more to the point, a chance of finding a shorter proof. (A sketch of the corresponding input-file fragment appears at the end of this subsection.) In 1613 CPU-seconds on a SPARCstation-10 with retention of clause (15257), OTTER completed a proof of length 680 and level 78; the level of McCune's proof is 89. (The two levels are given to provide yet one more measure of the difference between the two proofs; level will not be cited for the remaining
experiments, but it is given in Table 1.) Five of its steps result from applying copy to an input clause, followed either by flip to interchange arguments or by demodulation, and one step results from back demodulating an input clause with another input clause. (This added information is given in the event that one wishes to measure proof length beginning with the first use of paramodulation. For the remainder of this article, the figures for proof length will be taken from an OTTER run; the program counts steps of the cited types in its computation.) Because McCune was equally interested in a "short" proof of the dual, its negation, captured by the following clause, was also included.
(A + B) * B != B.
Even more impressive in the context of proof-length reduction, the dual was also proved, the proof having length 628 and level 67, completing in 1608 CPU-seconds with retention of clause (15031). Perhaps startling—and certainly charming—except for the clause corresponding to negating the dual, the proof of the dual is a subproof of the proof of the primary conclusion. To remove any doubt concerning technicalities, of course the proof of the dual, being one of contradiction, relies on the negation of the dual, whereas the proof of the primary conclusion relies on its negation; clearly, the negation of the dual is not present in the proof of the primary conclusion, and conversely.
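The sketch promised above follows; it is schematic only. The two weight templates stand in for the full set of 811 (the equations shown are invented placeholders, not actual steps of McCune's proof), and the clause lists of the real input file are omitted.

assign(max_weight, 28).          % the value McCune used
assign(pick_given_ratio, 4).     % unchanged from the 816-step run
weight_list(pick_and_purge).
weight(x + 0 = x, 2).            % placeholder resonator
weight(x * 1 = x, 2).            % placeholder resonator; Experiment 1 contained 811 such templates
end_of_list.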
4.2 THE HOT LIST STRATEGY: EXPERIMENT 2
Whether the focus is on the target conclusion or on its dual, this fine start on a path to finding a far shorter proof does not imply that all always went according to plan—the path was unpredictable, as shown in Table 1. However, with this satisfying reduction in proof length from 816 to 680 or 628 (whichever choice is made), the stage was nicely set for Experiment 2. The only change made from Experiment 1 was that of adding the use of the hot list strategy with the heat parameter assigned the value 1. The object was to gain some insight into the value of using this strategy (coupled with the resonance strategy) in the context of seeking a shorter proof for the target theorem. Of course, the addition of the hot list strategy naturally presented the question of which clauses to place in list(hot). Typically, a practice that has served well is to place in the hot list the clauses corresponding to the special hypothesis. (The special hypothesis of a theorem of the form if P then Q refers to that part of P, if such exists, that excludes the underlying axioms and lemmas. For example, in the study of the theory that asserts that rings in which, for every x, the
cube of x = x are commutative, the special hypothesis consists of the equation xxx = x.) A less precise rule that often is effective suggests for the members of the hot list those clauses in the initial set of support that are "simple", where simplicity usually means expressed in a small number of symbols. The second
rule is the one that was used. Therefore, the following two clauses were placed in list(hot).
x + x' = 1.
x * x' = 0.
Neither clause is directly relevant to the conclusion of the theorem, namely, (x * y) + y = y, nor to its dual, namely, (x + y) * y = y. Nevertheless, the presence of the cited two clauses in the (input) hot list, as the following data shows, produced enough of a reduction in proof length to clearly be of interest, and even piquant (to me). A 607-step proof of the conclusion obtained by McCune was completed in 3702 CPU-seconds with retention of clause (23437); a 606-step proof of the dual was also completed in the experiment.
4.3 RESONANCE AND THE DYNAMIC HOT LIST STRATEGY: EXPERIMENT 3
The next experiment, Experiment 3, was mainly influenced by a practice reflecting earlier successes (in other areas) in seeking shorter proofs, successes in which the resonance strategy played a key role [15, 17, 19]. The practice is iterative, using the deduced proof steps from one experiment as resonators in a later experiment. The first decision I had to make was to choose between the proof steps of the so-called primary conclusion and those of the dual of the conclusion. Although the two proofs completed in Experiment 2 are of almost equal length, 607 (for the primary) and 606 (for the dual), later experiments (as with Experiment 1) might not share this almost-equal-length property. Influencing my decision were two factors. First, aesthetics virtually demanded staying with the conclusion of McCune's 816-step proof; to do otherwise would in a sense be cheating, be loading the dice. Second, science virtually demanded using the primary conclusion, for many situations to which one might wish to apply the methodology might lack a dual. (As an aside, an interesting area for research concerns using as resonators proof steps of the dual in an iterative attack on seeking a shorter proof, either of the primary or of the dual.) Therefore, with few exceptions, the choice of proof steps of the primary conclusion rather than the dual was the rule for most of the sequence of experiments to be discussed, as was the omission of the type of clause obtained with copy. Only when I hoped to extract a few extra droplets of proof-length reduction did I violate this rule. Data regarding the dual will not come into play or be presented until Experiment 18, where its use became crucial; [20] covers that data.
Therefore, for Experiment 3, 602 resonators were used, each assigned a value of 2. In other words, five of the 607 possible resonators were not included, those in which copy was applied to an input clause. One other change was made for Experiment 3, namely, the introduction of the dynamic hot list strategy. However, rather than simply adding the use of this strategy, the idea was to imitate a combination strategy that had proved most effective, namely, that of combining the resonance strategy with the dynamic hot list strategy; see [19]. The combination strategy is, intuitively, an if-then type of strategy: If a clause is retained that matches a member of a chosen set of resonators, then adjoin that clause to the hot list during the run. The notion is that the adjunction to the hot list (during the run) of "short" clauses might produce an even shorter proof, for the use of short clauses in the input hot list had resulted in progress. Therefore, the decision was to choose from among the 602 resonators to be used those expressed in five or fewer symbols (counting spaces, commas, and parentheses, but not counting the period), assign to each the value 1, and assign to the (input) dynamic_heat_weight the value 1. The consequences of this decision are that (1) any clause that is deduced and that matches a resonator with weight 1 will also be assigned a pick_given weight of 1—because the resonators were placed in the pick_and_purge weight list—and (2) if such a clause is retained, it will be adjoined to the hot list during the run, because of its weight and because the dynamic_heat_weight has been assigned the value 1. (The value assigned to the dynamic_heat_weight places an upper bound on the pick_given weight of clauses that can be adjoined to the hot list during a run.) Of the 602 resonators to be used, 19 were assigned the value 1; they were placed in the pick_and_purge weight list before the 602 resonators. Therefore, ignoring the assigned value, 19 weight templates appeared twice. The earliest weight template that matches a clause is the one used to assign the weight to the clause; so the second copy of a template with assigned value of 2 did not interfere with the assignment of 1 as the weight of a clause matching one of the 19 used in conjunction with the dynamic hot list strategy. Summarizing, if a "short" clause (as just defined) was retained that matched a "short" clause of the proof completed in Experiment 2, then that clause was adjoined to the hot list during the run. (The shape of the resulting input fragment is sketched below.) Experiment 3 was markedly successful. In 321 CPU-seconds with retention of clause (9789), a proof (of the primary conclusion) of length 344 was completed. Thus the iterative approach was working well, having cut the required number of steps to prove the primary conclusion more than in half.
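A minimal sketch of the layering just described, in OTTER syntax; the templates are again invented placeholders, standing for the 19 "short" resonators (value 1) and for the remaining resonators (value 2).

assign(heat, 1).
assign(dynamic_heat_weight, 1).    % retained clauses with pick_given weight at most 1 join the hot list
weight_list(pick_and_purge).
weight(x + 0 = x, 1).                          % placeholder for one of the 19 short templates, listed first
weight(x + (y * z) = (x + y) * (x + z), 2).    % placeholder for one of the remaining templates
end_of_list.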
4.4 LAYERING RESONATORS: EXPERIMENTS 4 THROUGH 7
As one sees in the preceding section, the use of a set of resonators does not require that each member of the set be assigned the same value. Indeed, a subset of the resonators used in Experiment 3 was assigned the value 1 (with the remaining
assigned the value 2) with the intention that any retained clause matching a resonator whose assigned value is 1 be added to the hot list. Experiment 3 (as noted) employed the dynamic hot list strategy. Although I dropped the use of the dynamic hot list strategy for the next four experiments (as well as for all experiments other than 3, 14, and 23), for Experiments 4 through 7 I still used a layered set of resonators, meaning that not all of the resonators were assigned the same value. In particular, for Experiment 4, resonators represented in seven or fewer symbols were assigned the value 1, and the remaining were assigned the value 2. As expected, all of the resonators correspond to proof steps from the proof completed in Experiment 3. Experiment 4 produced more progress, completing a proof of the desired theorem in 111 CPU-seconds with retention of clause (5871), a proof of length 325. As one might expect because of the iterative nature of the methodology, for Experiment 5 new resonators were used, those corresponding to the proof steps of the proof obtained in Experiment 4. The same type of layering of resonators was employed. Experiment 5 produced another significant reduction in proof length, completing a 287-step proof with retention of clause (9400) in 164 CPU-seconds. Experiment 5 was followed by numerous failures in the context of finding shorter proofs. For a taste of how much ground could be lost with ineffective choices of the parameters and the strategies, one of the unsuccessful experiments completed a proof of length 514, virtually proving that an algorithm is not being discussed, merely a methodology. Also strongly and correctly suggested are the subtlety, intricacy, and complexity of seeking shorter proofs; indeed, the process is quite delicate in the sense that the slightest change in a parameter assignment can result either in finding no proofs or in finding a rather shorter proof. Regarding what was tried, the value of the heat parameter was assigned 1 sometimes and sometimes 2, and sometimes the hot list strategy was not used. The max_weight was assigned various values, from 12 to 36. Although the usual assignment for the pick_given_ratio was 4, sometimes 3 was chosen. Diverse moves were made in the context of the choice and layering of resonators. One learns that the methodology is experimental, often requiring playing with various combinations of parameter assignments and strategy choices whose effects are indeed unpredictable. In short, not much worked—until Experiment 6 was conducted. Experiment 6 differed from Experiment 5 in two ways. First, as expected, the resonators were the correspondents of the proof steps of the proof (of the primary conclusion) obtained in Experiment 5, in place of those from Experiment 4. Second, the resonators that correspond to a step obtained with back demodulation were given a weight of 4 rather than 2, to delay their consideration for initiating applications of an inference rule. (This added action resulted directly from a remark my colleague McCune made concerning the possible value of being able to apply all relevant demodulators at once in sequence, rather than by
means of a set of sequences, each producing another step in the proof.) The ideal case explains the assignment of higher weight to correspondents of proof steps obtained with back demodulation. Imagine that the proof under consideration (whose steps are to serve as resonators) contains a chain of clauses, A, B, C, D, E, obtained by applying back demodulation: back demodulation applied to A yields B, back demodulation applied to B yields C, similarly D from C, and E from D. In the ideal case, when the program (relying on the resonators from the just-mentioned proof) was seeking a shorter proof, the deduction of A would find already present, and in the needed order, the demodulators that had been used to respectively deduce B through E. In that event, A would be successively demodulated to yield B through E, but—and here is the point—A, B, C, and D would be intermediate steps, and they would therefore not be explicitly present in the proof to add to its length. Thus, by causing the program to delay considering for the focus of attention clauses that (in the proof whose steps are used as resonators) were obtained by back demodulation, perhaps such clauses would not be needed to complete a proof, preferably a shorter proof. Nevertheless, because the ideal case seldom occurs (regardless of the context), assigning such a high weight that the type of clause under discussion is in fact purged asks for serious trouble. After all, a clause that was obtained with back demodulation might have been a parent (when paramodulation was used) of a crucial clause; indeed, its absence might prevent the program from completing any proof, much less a shorter one. Of course, as one might predict from the preceding, Experiment 6 succeeded, completing a 260-step proof with retention of clause (5977), requiring 80 CPU-seconds. After additional unsuccessful attempts to find a proof of length strictly less than 260, the conditions for a successful experiment were determined. For this experiment, Experiment 7, two changes were made from Experiment 6. First, as expected with this iterative methodology that relies on the results of earlier experiments for use in later experiments, the resonators used in Experiment 7 were the proof steps of the proof completed in Experiment 6. The assignment of their values followed the pattern of Experiment 6. Second, because of some of the data gathered in the unsuccessful runs, the pick_given_ratio was assigned the value 7, rather than 4. The intention was to use complex clauses retained early in the run less often than was the case in Experiment 6. Another significant reduction was obtained, in that OTTER completed a 236-step proof, requiring 63 CPU-seconds and the retention of clause (5092). For but one data point that might be of interest, 30 steps of the 236-step proof are not among the steps of the 260-step proof obtained in the preceding experiment.
4.5 PERTURBING THE SEARCH SPACE: EXPERIMENT 15
Experiments 8 through 14 (as shown in Table 1) produced slow but steady progress. Experiment 15 (of the 26 detailed in [20]) reflected some of the difficulty
that was being encountered in attempting to find a proof of length strictly less than 207, the length of the proof completed in Experiment 14. For that experiment, use of the dynamic hot list strategy was dropped, and the max_weight was reduced from an assignment of 28 to one of 20; 28 had been the assigned value until this experiment. The reduction was designed to perturb the search space by avoiding the retention of numerous complex clauses; recall that the ratio strategy, which was still in use and with an assignment of the value 9 to the pick_given_ratio, permits a program to focus on complex clauses, especially those retained early in the run. Also, consistent with the discussion of tendencies given in Section 3, the conjecture was that a smaller max_weight might enable OTTER to find a shorter proof. As expected, the resonators were the correspondents of the proof steps of the proof completed in Experiment 14. However, as yet another example of returning to what worked earlier, those corresponding to a step obtained with back demodulation were assigned a weight of 4 rather than a weight of 2 (as was the usual case in this sequence of experiments). Finally, for the first time, a direct attack on proof length was initiated, by adding the use of ancestor subsumption and (as almost always then required) also adding the use of back subsumption. With the use of ancestor subsumption, when a conclusion is drawn more than once, the lengths of the corresponding derivation paths are compared; the program retains the copy reached by the shorter path. The experiment succeeded nobly, completing a 185-step proof in 28 CPU-seconds. Not only was the realistic goal reached—a goal that McCune had classed as impressive—but the result provided incentive to seek the goal he had suggested in jest, namely, a proof of length 100.
4.6 FOCUSING ON THE DUAL: EXPERIMENT 18
At this point, I encountered an impasse. Various moves produced no progress. Therefore, in desperation—or from the belief that far more was possible—I made an additional modification to the methodology. For Experiment 18, rather than choosing for resonators proof steps of the primary conclusion, those of the dual were used, from Experiment 17. A significant change was made regarding the values assigned to the resonators. Although those with ten or fewer symbols in them were still assigned the value 1, those used as demodulators and that showed back_demod in their history were assigned the value 2, with the remaining assigned the value 4. The notion was that progress might occur if clauses used as demodulators that in addition were obtained with back demodulation were given preference for being chosen as the focus of attention to drive the program's reasoning. Although still based far more on intuition than on analysis, the idea also was to have certain clauses used as demodulators available earlier than they might otherwise be, in turn perhaps reducing the need for the presence (in a completed proof) of clauses obtained from back demodulation. The pick_given_ratio was assigned the value 9, and the
max_weight was assigned the value 14. Experiment 18 completed a 153-step proof of the primary conclusion in 16 CPU-seconds. Although still not the main goal, the dual was proved, with a 145-step proof, in 14 CPU-seconds. Not only was the proof length decreasing, so also was the required CPU time. In case one is tempted to assert that such is expected, various experiments in diverse fields do not support such an assertion. Nevertheless, when less and less CPU time is required, the methodology under study at least gives the appearance of power. Before continuing the account of the results of the sequence of experiments, I give some commentary concerning the role of the resonance strategy. Of the 153 steps of the proof of the primary conclusion obtained in Experiment 18, 24 of them are not among the resonators that were used in the experiment. Recall that the resonators correspond to the proof steps of the dual proved in Experiment 17. Thus—precisely in the spirit of the resonance strategy—one has a nice illustration of the value of using as resonators the proof steps of a related theorem when attacking the main target theorem. Perhaps even more persuasive, 21 steps of the 153 do not match at the resonator level any of the resonators that were used, where (as a reminder) matching in this context means treating all variables as indistinguishable from each other. In the given example, the resonance strategy can be viewed as having supplied a detailed outline of a proof, with OTTER finding the remaining needed steps.
4.7 VICTORY: EXPERIMENTS 25 AND 26
Except for changes regarding the use of the hot list strategy, Experiment 25 was indeed similar to Experiment 24, of course using for resonators the proof steps of the completed proof (from the dual still) from Experiment 24. The (input) heat parameter was assigned the value 2 rather than 1 to permit recursion, and the following four clauses were placed in the input hot list.
x * (y + z) = (x * y) + (x * z).
x + (y * z) = (x + y) * (x + z).
x + x' = 1.
x * x' = 0.
When the heat parameter is assigned the value 1, conclusions that result from consulting the hot list (whether one is using the hot list strategy or the dynamic hot list strategy) have heat level 1. If the program decides to retain a conclusion of heat level 1 and if the (input) heat parameter is assigned the value 2, then before another conclusion is chosen as the focus of attention, the heat-level-1 conclusion is used to initiate the search for conclusions of heat level 2 (with the
hypotheses needed to complete the corresponding application of the inference rule chosen from the hot list). Assigning a value of 2 to the heat parameter certainly has great potential for perturbing the search space, and, at this stage of the search for a "short" proof,
such perturbation seemed imperative. Placing additional clauses in the hot list, whether dynamically or in the input list, also sharply increases the likelihood that the search will be perturbed. Indeed, the hot list strategy was formulated to enable a program to draw conclusions far sooner than it would otherwise; with the use of the strategy, the order in which conclusions are drawn is usually dramatically rearranged. The first proof completed in Experiment 25 was that of the dual, a proof of length 108, in 8 CPU-seconds. For the record, in the context of the dual, ground finally was lost. However, in 9 CPU-seconds, OTTER completed a 102-step proof of level 44 of the target conclusion. Almost as rewarding as finding such a short proof was the fact that the (primary) target had been proved in fewer steps than had the dual, which was not the case for so many of the experiments reported here. The result led to the conjecture that, with a few experiments based on fiddling with the parameters and option choices, a singular event would occur: OTTER would complete a 100-step proof of the target. (That numbers such as 100 would have appeal for a mathematician must amuse many.) For Experiment 26, I assigned heat the value 3 rather than 2 (or 1, as was the case for so many experiments). For resonators, I chose the proof steps of the proof completed in Experiment 25, that of the primary target and not the dual. The return to focusing on proof steps of the primary conclusion as the source for resonators was in part because of the length of its proof (102) in Experiment 25 and in part because it is the target of the study. The assignment of values to the resonators was that used in so many of the cited experiments, namely, 1, 2, and 4. I assigned max_weight the value 4 rather than 14. The assignment of the value 4 to max_weight merits a short explanation. The notion was that of preventing the program from retaining any clause that does not match (at the resonator level) one of the resonators, match in the sense that all variables are considered indistinguishable. With Experiment 26, my not-so-well-hidden and actual goal was reached: OTTER completed a 100-step proof of level 44 of the target of the study in 6 CPU-seconds. As for the dual, its proof has length 99 and level 44, also requiring 6 CPU-seconds. The results of each experiment are summarized in Table 1. Because the level of a proof gives yet one more measure of its difficulty, the table also gives that information. (By definition, the level of input clauses is 0; the level of a deduced clause is one greater than the maximum of the levels of its parents.)
Table 1: The Road from an 816-Step Proof to a 100-Step Proof

Experiment   Length   Level   Time (sec)
     1        680      78       1613
     2        606      78       3701
     3        344      53        321
     4        325      52        111
     5        287      44        164
     6        260      45         80
     7        236      63         43
     8        235      43         51
     9        233      43         51
    10        229      42         52
    11        225      42         51
    12        224      41         51
    13        219      39         47
    14        207      44         50
    15        185      41         28
    16        176      47         31
    17        211      67         69
    18        153      51         16
    19        136      51         12
    20        172      64         38
    21        136      43         13
    22        126      45         11
    23        131      49         13
    24        122      51          9
    25        102      44          9
    26        100      44          6
5 Obstacles to Finding Short Proofs
To provide additional insight into why the path that began with a proof of length 816 and ended with a proof of length 100 was of necessity tortuous and, at the same time, to show why the various parameters and options exhibit tendencies only, the following general but brief discussion of obstacles is in order. (Chapter 3 of [Wos96a] focuses in detail on the obstacles to finding shorter proofs and the means to overcome them.) My purpose is twofold: first, by discussing various obstacles, to give some feeling for the difficulty of finding such a short proof; and, second, by touching on the consequences of different choices, to provide some understanding of the complexity of option choosing in this context. To begin with, the presence of the equality relation introduces an element often not present, namely, the need for demodulation (the use of rewrite rules
for simplification and canonicalization). Indeed, without demodulation, in the vast majority of cases (when equality is relied upon heavily), the program will drown in redundant information—the same fact expressed in too many tightly coupled ways; see [13]. For but one example, the program might deduce that a + b = c, that 0 + (a + b) = c, that a + b = c + 0, and the like. For the theorem (DUAL-BA-3) featured in Sections 2 and 4, demodulation was indispensable, whether the goal was that of finding a shorter proof or, for that matter, finding any proof. When demodulators are present, the order in which they are applied can be crucial, sometimes producing the needed equation to complete the assignment, but sometimes preventing its discovery. To complicate matters further, without an arduous analysis, one must simply wait and see which demodulator is applied before which, for example, when one chooses to use a Knuth-Bendix approach (in which demodulators are adjoined during the run). The situation is rather like creating a grammar, a set of rules (the analogue of demodulators) that affect sentence structure; when the order of rule application changes, the sentence structure changes, not always in a manner that best fits the objective. The discovery and addition of demodulators are of course dependent on the choice of options and the assignment of values to the parameters. For a simple example, whereas a max_weight of 20 might enable a program to deduce and retain a demodulator (rewrite rule) expressed in 20 symbols, a max_weight of 16 might prevent its deduction, much less its retention. Regarding the tendency of a smaller max_weight to promote the completion of a shorter proof, one thus sees that an obstacle does indeed exist and that the word tendency is appropriate. One also has a glimpse of why the path culminating in the discovery of a 100-step proof was of necessity tortuous. For a subtler example, assume that the max_weight is well chosen in the sense that it itself presents no obstacle, and consider two cases: an assignment of the value 6 to the pick_given_ratio, and an assignment of the value 7. On the surface, one not familiar with the ratio strategy might hastily assert that this small difference (6 versus 7) could be of little import. Actually, as shown by some of the experiments not detailed here, quite the opposite is the case. Indeed, in some experiments a change of 1 in the value assigned to the pick_given_ratio resulted in the program completing no proofs, whereas before the change a shorter proof was found. Again one sees how a pick_given_ratio of 6 rather than 7 having the (possible) tendency of promoting the completion of a shorter proof might be in conflict with another obstacle. Also, the fact that the search is so touchy (as illustrated) suggests that the path from the 816-step proof to the 100-step proof was understandably tortuous. Compared with the preceding two examples, regarding the seeking of shorter proofs, a direct attack based on using ancestor subsumption may yield far more sinister and unintuitive results. As a reminder, ancestor subsumption (which, for practical reasons, requires the use of back subsumption) compares the derivation path lengths when presented with two copies of the same deduced conclusion
and retains the copy reached by the strictly shorter path. On the surface, one might surmise that ancestor subsumption is precisely what is needed to find shorter proofs and, perhaps further, that its use is guaranteed to find the shortest proof. Such a surmise is so far from the truth that the term "sinister" is in fact justified. Actually, the truth is captured by the aphorism Shorter subproofs do not necessarily a shorter total proof make. For a glimpse of what can go wrong when using ancestor subsumption (with finer detail given in Chapter 3 of [17]), begin by assuming A is a clause that is present in all proofs of the theorem under study. (The assumption, though not necessary, permits focusing on a simple example.) Second, assume that A occurs well before the end of any proof (also not a needed assumption). Next, assume that there exist two subproofs of A of respective lengths 5 and 3, where length is measured solely in terms of deduced clauses. Then assume that, other than A, the two subproofs share no steps in common. If two proofs of the theorem under consideration are completed respectively based on the two subproofs such that, ignoring A, all of the steps of the first subproof are present in the second total proof and such that none of the steps of the second subproof are present in the first total proof, then quite often the first total proof is shorter than the second (despite the presence of a longer subproof of A). Nevertheless, the use of ancestor subsumption does tend to result in the finding of a shorter proof, but, as the example shows, only tends to. Finally, although one is free to use or ignore ancestor subsumption, the situation regarding forward subsumption is radically different: The use of the latter is virtually required, if one is to avoid being buried in useless information. Nevertheless, trouble can occur with the use of forward subsumption, somewhat reminiscent of that discussed for ancestor subsumption; see [20] for some detail.
6 Applying the Iterative Methodology
At this point, one might naturally conjecture that the effectiveness of the methodology is derived (perhaps inadvertently) in some way from the specific theorem DUAL-BA-3. In the following, I present evidence that refutes this conjecture. The emphasis is still on theorems in which equality is the dominant relation. Each of the theorems discussed in this section was studied by McCune and Padmanabhan and proved by OTTER; each answers a question that was open before OTTER's attack.
6.1 TESTING IN THE SAME FIELD
The first bit of evidence still concerns Boolean algebra, specifically, a theorem in which equality is the only relation. In [8], the theorem is denoted by DUAL-BA-5; with its proof, an open question was answered. The following clauses capture the theorem to be proved.
x = x.
y + (x * (y * z)) = y.              % L1
((x * y) + (y * z)) + y = y.        % L3
(x + y) * (x + y') = x.             % B1
y * (x + (y + z)) = y.              % L2
((x + y) * (y + z)) * y = y.        % L4
(x * y) + (x * y') = x.             % B2
x + y = y + x.
x * y = y * x.
(x + y) + z = x + (y + z).
(x * y) * z = x * (y * z).
(A * B) + (A * C) != A * (B + C) | (A + B) * B != B | B + B' != A + A'.
The theorem asserts that the equations L1-L4, B1, and B2 are a basis for Boolean algebra. To prove the theorem, one is asked to show that three conclusions hold, respectively corresponding to a distributive law, an absorption law, and the fact that x + x' is a constant, which explains the presence of the last of the given clauses. In particular, the clause is obtained by negating the and of the three conclusions. As in the theorem DUAL-BA-3, a starting point was supplied by McCune; he had obtained with OTTER a proof of length 119 and, as with DUAL-BA-3, was interested in a more elegant proof. The goal was to find a proof of length 100 or less. My approach was iterative, in the sense detailed earlier. The following summarizes the highlights. For the first experiment of significance, the pick_given_ratio was assigned the value 4, max_weight the value 23, and heat the value 1. The following two clauses were placed in the (input) hot list.
x + y = y + x.
x * y = y * x.
The assigned values were those used by McCune; the choice of clauses for list(hot) was based on the usual rule of including simple clauses. The resonators were the correspondents of the deduced steps of McCune's 119-step proof. In 115 CPU-seconds with retention of clause (12600), OTTER completed a proof of length 103. As is so typical in view of my interest in the hot list strategy, the experiment was then repeated with but one change: The heat parameter was assigned the value 2 rather than 1. In 206 CPU-seconds with retention of clause (18556), a 91-step proof was completed. Then, following the iterative aspects of the approach, the next experiment mirrored the preceding, except that I used as resonators those corresponding to 87 positive deduced steps of the 91-step proof. In 192 CPU-seconds with retention of clause (22114), OTTER completed a 63-step proof, a reduction in proof length that was (for me) indeed unexpected. (As I recall, further resonator replacement, unfortunately, led nowhere.) Then,
as called for if one is curious about how well the cited tendencies hold, a slight variant of the experiment was conducted, in which the value 1 (instead of 2) was assigned to the heat parameter. In 48 CPU-seconds with retention of clause (9397), a 118-step proof was found. The last two cited results together provide a nice illustration of why a practical algorithm for finding shorter proofs would be difficult to produce—perhaps even out of reach—and they also provide some insight into why the methodology presented in this article was formulated. Specifically, although many experiments suggest that assigning the value 1 to the (input) heat parameter (when the hot list strategy is in use) is the most effective assignment, here one witnesses a far better result obtained with an assignment of the value 2. Again one sees that strategies and parameters exhibit tendencies only, rather than obey hard and fast rules. At the same time, the first of the two items (the 63-step proof) suggests that an even more elegant proof might be obtainable if one were to conduct a quite extensive set of experiments. Such a set of experiments would involve assigning a small time limit for each run and simply trying numerous combinations of option choices and parameter assignments, still emphasizing the methodology that proved so successful with the theorem DUAL-BA-3. One can easily feel impatience at the prospect of having to set up one experiment after another, even though the use of OTTER is often so rewarding and even though the modifications in each case take little actual time. McCune again came to the rescue: With shell scripts, he wrote a program, super-loop, that systematically and sequentially sets up and runs experiments that focus on combinations of a variety of user-chosen options and assignments. To use super-loop, one attaches to the end of the designated input file a set of commands of the following type.

% end of fixed part
meta_hot_set(2,3,4).
x + y = y + x.
x * y = y * x.
(x + y) + z = x + (y + z).
(x * y) * z = x * (y * z).
end_of_list.
assign(heat,2,1,0,3).
assign(pick_given_ratio,4,5,6,7,8,9,10,12).
assign(max_weight,5,10,15,20,25).
flag(ancestor_subsume,set,clear).

For example, using the remainder of the specified input file, in one experiment OTTER will assign max_weight the value 15 and the pick_given_ratio the value 7, will set ancestor_subsume, and will use three of the given four clauses in the context of the hot list strategy.
With access to super-loop and with a quick review of the methodology used to yield the 100-step proof of DUAL-BA-3, I decided to run the sequence of experiments determined by the just-cited set of commands, with one addition, namely, the use of the resonance strategy. I chose for the resonators the steps of McCune's proof of length 119, assigning to each the value 2. Except for the cited variations from experiment to experiment, the approach was reminiscent of that used to obtain the 103-step proof discussed earlier. I set the time limit to 120 CPU-seconds. In the 209th experiment (directed by super-loop), a 30-step proof (of DUAL-BA-5) was completed in 15 CPU-seconds. The crucial option choices and parameter assignments that were used were an assignment of the value 8 to the pick_given_ratio and the value 25 to the max_weight and the setting of ancestor subsumption; the hot list strategy was not used. The cited result provides yet one more data point supporting the fact that elements such as max_weight obey tendencies only.
6.2 FURTHER TESTING
The success with DUAL-BA-3 (when studied with the methodology central to this article) coupled with the just-cited success with DUAL-BA-5 motivated the study of DUAL-BA-2; see [McCune96]. The theorem asserts that the following seven clauses axiomatize Boolean algebra.
(x + y) * y = y.
x * (y + z) = (x * y) + (x * z).
x + x' = 1.
p(x,y,z) = (x * y') + ((x * z) + (y' * z)).
p(x,x,y) = y.
p(x,y,y) = x.
p(x,y,x) = x.
For the study, the following clause was also included because of the use of paramodulation.
x = x.
The theorem asks one to prove three properties of Boolean algebra, the same three properties proved in DUAL-BA-5, namely, a distributive law, an absorption law, and the fact that x * x' is a constant. If one denies the and of the three properties, one obtains the following clause.
A + (B * C) != (A + B) * (A + C) | (A * B) + B != B | B * B' != A * A'.
To initiate the study (with the goal of fulfilling McCune's request for a more elegant proof), I used his 135-step proof. The iterative approach led to
the completion of a 93-step proof. Then, by using super-loop focusing on 89 deduced steps of the 93-step proof as resonators, an 86-step proof was completed. The proof was found in the 87th experiment directed by super-loop. Regarding options and values, the pick_given_ratio was assigned the value 5, the max_weight was assigned the value 8, ancestor subsumption was used, and the hot list strategy was not used; 6 CPU-seconds sufficed to produce the 86-step proof. Super-loop was again used, focusing on 82 deduced steps from the 86-step proof as resonators. In the third experiment, in 3 CPU-seconds with the value 12 assigned to the pick_given_ratio, the value 8 assigned to max_weight, and the use of ancestor subsumption, an 83-step proof was completed. I then had an inspiration, most likely resulting from some earlier experiments with Robbins algebra. In the next use of super-loop, I merely cleared eq_units_both_ways. (If this flag is set, then unit equality clauses, positive and negative, can be stored in both orientations; see [7].) In the seventh experiment, in 4 CPU-seconds, with the pick_given_ratio assigned the value 12, the max_weight assigned the value 16, and the use of ancestor subsumption, OTTER found an 81-step proof. By following the type of iteration just illustrated, a 77-step proof was completed, in 3 CPU-seconds. The experiment required assigning the value 12 to the pick_given_ratio, assigning the value 8 to the max_weight, assigning the value 1 to the heat parameter, using ancestor subsumption, and setting the flag eq_units_both_ways; only 3 CPU-seconds were required. The hot list contained the following four clauses.
x + x' = 1.
p(x,x,y) = y.
p(x,y,y) = x.
p(x,y,x) = x.
The result was obtained in the 221st experiment directed by super-loop.
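A sketch of the combination of settings just described for the run that produced the 77-step proof; the clause notation follows the reconstruction used throughout this section, and the rest of the input file is omitted.

assign(pick_given_ratio, 12).
assign(max_weight, 8).
assign(heat, 1).
set(ancestor_subsume).
set(eq_units_both_ways).
list(hot).
x + x' = 1.
p(x,x,y) = y.
p(x,y,y) = x.
p(x,y,x) = x.
end_of_list.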
6.3 BROADENING THE TEST
To test the iterative methodology further, I shifted focus from Boolean algebra to quasilattices, focusing on the theorem denoted by QLT-3. McCune and his colleague Padmanabhan [8] had obtained with OTTER the first known equational proof that the following self-dual equation can be used to specify distributivity.
(((x ^ y) v z) ^ y) v (z ^ x) = (((x v y) ^ z) v y) ^ (z v x)
(All earlier proofs are model-theoretic.) The proof I was asked to replace with a more elegant proof has length 183.
As with DUAL-BA-3, this theorem taxed the iterative methodology greatly, in the sense that many experiments were required to reach the best result, a proof of length 108 and level 14, completed in 166 CPU-seconds. Among the points of interest are the fact that McCune's proof (used as the starting point)
has level 24 and the fact that, in contrast to many of the so-called final experiments in a study, this one required a few CPU-minutes. Also of interest, perhaps suggesting the difficulty of obtaining such a short proof and even suggesting the charm of the proof, the use of super-loop produced no further progress. For the researcher intrigued by the challenge of finding a shorter proof, the following clauses can be used to attack the problem.
x = x.
x ^ x = x.
x ^ y = y ^ x.
(x ^ y) ^ z = x ^ (y ^ z).
x v x = x.
x v y = y v x.
(x v y) v z = x v (y v z).
(x ^ (y v z)) v (x ^ y) = x ^ (y v z).
(x v (y ^ z)) ^ (x v y) = x v (y ^ z).
(((x ^ y) v z) ^ y) v (z ^ x) = (((x v y) ^ z) v y) ^ (z v x).
A ^ (B v C) != (A ^ B) v (A ^ C).
x v y » y v (x v y) v z x " (x v y) x v (x * y) X
*
X ■
x. » x v (y v z ) . = x. » x.
X.
x * y « y * x. (x * y) * z ■ x * (y * z ) . x * (x v y) - x. x v (x * y) » x. A " B ! - A * B.
Automating the Search for Elegant Proofs
1603
Before the use of super-loop was invoked, the basic methodology eventually led to a 95-step proof. Then, for some obscure reason, I initiated the attack with super-loop by using as resonators 101 deduced steps of a proof of length 106, obtained early in the use of the basic methodology. Perhaps the reason was that use of the 95-step proof would lead no further, that it would place the program in a cul de sac, whereas starting a bit further back might avoid the dead end. As in some of these experiments, the approach was to use super-loop to find (if successful) a shorter proof, then use the positive deduced steps of the shorter proof for the next use of super-loop. In Experiment 3745 of the last use of super-loop, with the pick-given .ratio assigned the value 9, the max-weight assigned the value 15, the heat parameter assigned the value 1, and the use of ancestor subsumption, 7 CPU-seconds suf ficed to yield a proof of length 30 and level 13. Two clauses, the following, were present in list(hot). X V X=X.
x v y«y v x. This result was singularly satisfying, and it demonstrated splendidly (because of occurring in Experiment 3745) the value of using super-loop. 6.4
A PROMISING SHORTCUT
With little doubt, one interested in finding a more elegant proof than that in hand might wish access to an approach that is far less complex, an approach that requires far fewer iterations and judgments to be made. The following two-step approach is easy to apply and often does yield impressive results. For the first step of the methodological shortcut, one merely takes the input file that produced the target proof and adds the use of the hot list strategy. For the (input) hot list, one chooses for its members so-called simple or short clauses (as was the case with DUAL-BA-3, the theorem that is the focus for much of this article). If successful, the program will find a shorter proof and will find it in far less CPU time. If the first step succeeds, for the second step, one then takes the input file used in the first step and adds the use of the resonance strategy. The resonators are the correspondents of the positive deduced steps (required to be unit clauses) of the proof obtained in the first step of the abridged approach. The results of applying the just-cited two-step approach to various theorems are next in order. Quite naturally, the first theorem that was attacked was DUAL-BA-3; see Section 4. Three experiments were conducted. The first was designed to find (if possible) a proof without the use of the hot list strategy or the resonance strategy. The second experiment, as dictated by the two-step approach, differed from the first only in its use of the hot list strategy. The members of the hot list were the following. x + x« - 1.
Collected Works of Larry Wos
1604 x * xfl - 0.
The third, again dictated by the approach, added the use of the resonance strategy, where the resonators were the correspondents of the positive deduced steps yielded by the second experiment. Of course, I was somewhat prepared for the possibility that any of the three experiments might fail to complete a proof. Fortune smiled: Each of the three experiments produced a proof. In order (on the equivalent of a SPARCstation-10), the CPU times in seconds are 6681, 585, and 242. The respective lengths are 803, 515, and 475. The ensemble of the three experiments suggests that the shortcut is promising, having reduced the proof length from 803 to 475. (Regarding the proof length of 803, in contrast to the expected 816, apparently the precise input file used by McCune is lost to antiquity.) Next, three corresponding experiments were conducted for the theorem DUALBA-5. As expected, the first experiment produced a proof of length 119; it was completed in 52 CPU-seconds. Two clauses, the following, were then placed in the hot list. x+y-y+x. x*y=y*x.
The second experiment required 100 CPU-seconds to complete a proof of length 160. These results were a surprise, for adding the use of the hot list strategy so often reduces the CPU time and yields a proof of smaller length. I was most fascinated. Also, I was most intrigued to determine what would happen if I used so many resonators, those corresponding to the positive deduced steps of the 160-step proof (only 101 were used in the first visit to this theorem, as discussed in Section 6). Therefore, as if nothing untoward had occurred, 159 resonators were adjoined, and the third experiment was conducted. In 36 CPU-seconds, a proof of length 47 was found. The result, which vindi cated the shortcut, is indeed satisfying, if one takes into account that the best proof obtained with the basic methodology followed by the use of super-loop has length 30. The next target was DUAL-BA-2. The first experiment behaved as expected in regard to proof length, completing a proof of length 135 in 55 CPU-seconds. The following clauses were then placed in the hot list for the second experiment. x + x® ■ p(x,x,y) p(x,y,y) p(x,y,x)
1. - y. - x. - x.
Reminiscent of the preceding study, the second experiment required 229 CPUseconds to complete a proof, one of (not enticing) length 150. When 144 res onators were then adjoined, corresponding to positive deduced steps of the 150step proof, 36 CPU-seconds were sufficient to find a proof of length 120.
Automating the Search for Elegant Proofs
1605
Regarding further evidence, two additional theorems were studied with the two-step shortcut. Each is found in [8], and by proving each, an open question was answered. The first, QLT-1, is from quasilattices. The following clauses were used to attack this theorem. X X
= X. X
=
X.
y = y (x - y) -
X
X
V X
»
"
X.
Z
*
X
- (y - z ) .
X.
V y - y V X. (x v y) v Z * X v (y v z ) . (x " (y v z ) ) v (x * y) - x • (y v z ) . (x v (y " z ) ) - (x v y) - x i' (y * z ) .
X
x " (y v (x " z)) = x " (y v z ) . A * (B v C) != (A * B) v (A " C). The first nine clauses axiomatize quasilattices. The theorem asserts that the tenth clause is a new way to express distributivity in quasilattices. The eleventh clause is the denial of distributivity. McCune had obtained a 47-step proof. In the three experiments with the shortcut methodology, I obtained proofs of length 47, 46, and 61, respectively. The time required was 3, 6, and 5 CPU-seconds. The conclusion is that the shortcut was far from effective for this theorem. However, the results naturally led to an experiment in which the hot list was trimmed from the following four members to just two, the first and the third. X
~ X »
X.
x - y = y - x. X V X "
X.
x v y » y v x. This move led to a proof of length 19, completed in 5 CPU-seconds. Then when resonators were adjoined that correspond to the positive deduced steps of the 19-step proof (to satisfy the natural curiousity), in 3 CPU-seconds, a proof of length 21 was found. Finally, I thought one additional experiment must be run. The reasoning was that if progress was made by trimming the size of the hot list, and if some of that progress was then lost, perhaps a reduction in max.weight was in order with all other items left unchanged. After all, as noted in Section 3.1 with the discussion of Tendency 1, the use of a smaller max.weight tends to promote the finding of a shorter proof. In 1 CPU-second, after changing the max-weight from 19 to 15, OTTER completed a proof of length 8. (For the researcher who wonders how wild is the space of proofs, one of the experiments focusing on this theorem yielded a proof of length 386, providing an impressive contrast to the proof of length 8.)
1606
Collected Works of Larry Wos
One perhaps unfortunate conclusion from this ensemble of experiments fo cusing on QLT-1 is that even the shortcut methodology benefits from tinkering. Yet, if one steps back from the study a bit, tinkering should be expected, for mathematics and logic so seldom admit an algorithmic approach when the prize is an elegant proof. Before turning to the last theorem on which the promising shortcut was tested at this time, I answer the following natural question that an experienced researcher might ask. What results are obtained if one conducts three experi ments using the effective max.weight (15) rather than that which was originally used (19) and using the smaller hot list? If, in addition to testing the shortcut, one has the goal of finding an elegant proof, the new set of three experiments begins rather explosively, completing in 5 CPU-seconds a proof of length 25. The second experiment brings the excitement and the attempt to a halt, pro ducing no proof when the smaller hot list is then added. Other paths are left as research topics and challenges, especially where the goal is to find a proof of length strictly less than 8. The final theorem that was studied in the context of the promising shortcut is denoted by McCune as LT-8. This theorem from lattice theory says that the meet (intersection) operation is unique in the sense that, given a set L of elements that have the same join (union) operation and two possibly different meet operations, the two meet operations are the same. The following clauses were used to study this theorem, again suggested by McCune. x • x. '/• (v,~) is a lattice. X " X » X.
x - y - y - x. (x * y) " z = x * (y " z). X V X » X.
x v y * y v x. (x v y) v z = x v (y v z). x * (x v y) = x. x v (x ~ y) " x . '/, equations to make (v,«0 a lattice. X * X - X.
x * y » y * x. (x * y) * z = x * (y * z). x * (x v y) = x. x v (x * y) ■ x. A * B !- A * B. McCune's proof, used as a target, has length 19. In the first experiment (in which neither the hot list strategy nor the resonance strategy was used), 50 CPU-seconds were required to produce a proof of length 19. For the second experiment, use of the hot list strategy was added with the following clauses in
Automating the Search for Elegant Proofs
1607
the hot list. X " X » X. X V X » X.
x - y = y - x. x v y = y v x. The required time increased to 119 CPU-seconds, completing a proof of length 18. For the third experiment, 18 resonators were adjoined, the correspondents of the deduced steps of the just-cited proof. Although but 1 CPU-second sufficed, the resulting proof still has length 18; in that it has a different level, the proof is different from the 18-step proof yielded by the second experiment. In the second 18-step proof, commutativity of join was not required. Of the deduced steps, five are not shared by the two 18-step proofs. Annoyed by the insignificant decrease in proof length, influenced by what occurred in the study of QLT-1, and motivated by the objective of presenting McCune with a more elegant proof and the objective of offering an interesting challenge, I conducted two tightly coupled additional experiments. First, I re moved from the hot list the two clauses that encode commutativity. Such an action is in the spirit of Tendency 5 of Section 3.2. In 55 CPU-seconds, OT TER produced a 19-step proof. The proof contains the same deduced clauses as the 19-step proof obtained when neither the hot list strategy nor the resonance strategy was used. Then, I used the 19 deduced steps as resonators, noting that three of them are not among the 18 that were used in the third experiment (that which yielded an 18-step proof). Success occurred: In 1 CPU-second, a proof was completed, one of length 14. Immediately, this advance motivated me to tinker just a bit more, seeking an extra drop (or perhaps more) of satisfaction. When the pick-given_ratio (which had been assigned the value 4) was commented out, causing OTTER to choose as the focus of attention clauses by weight only, 1 CPU-second sufficed to complete a 13-step proof.
7
Research, Review, and Remarks
The focus in this article has been on automating the search for elegant proofs. Al though a precise definition of elegance is debatable, four properties are studied: proof length, term structure, variable abundance, and deduced-step complexity. The first of these four is featured. 7.1
P R O O F LENGTH—HAZY AND COMPLICATED
Clearly, proof length is somewhat hazy. For example, frequently in a mathe matics book, one finds a proof in which symmetry and transitivity of equality are used implicitly only. For OTTER, their use would normally be explicit and, if used, add to the proof length. On the other hand, the use of demodulators
1608
Collected Works of Larry Wos
(rewrite rules, for example, in the context of canonicalization) does not ordi narily contribute to proof length, for mathematics or for OTTER. Proof length in this article and for OTTER measures the number of specific steps that are present but that are not part of the input. (Sometimes, the cited length of a proof is not what is expected, for proof length does not measure, for example, the number of equality substitutions; in particular, as noted, the substitutions resulting from the use of demodulators do not contribute to proof length. If one wishes to see explicitly all substitutions and their effect, as one might with a preference for a different definition of proof length, OTTER offers the needed feature, namely, build-proof_object, which gives all of the gory details.) There fore, here most or all ambiguity regarding this aspect of elegance is removed. Proof length-especially when the goal is to find a "short" proof—naturally brings one face to face with cul de sacs and, most piquant, the need to follow a tortuous path that is often indeed narrow. For example, some experiments yield no proof at all unless the pick-given_ratio is assigned the value 11; all other values yield nothing. For a second example—especially because of the appeal of proving lemmas early—intuitively, one might conjecture that it is always profitable to have the program focus on the clauses in the initial set of support before focusing on any deduced clause (to drive the reasoning); in fact, such a move sometimes leads to finding no proofs at all. Still, as discussed in Section 3, tendencies do exist regarding the values assigned to parameters and the choice and use of strategies. One might wonder why the task of seeking shorter proofs is so complicated. At least for problems in which equality is featured, the answer rests in part with demodulators (rewrite rules). Indeed, even a small change in the order in which demodulators are found and applied can dramatically change the program's performance, the paths pursued, and the results obtained. Nevertheless, such diverse behavior is typically what is required, if the objective is to find "short" proofs. Often, the objective is reached only by radically perturbing the search space. Fortunately, an iterative methodology (featured in this article) now exists for automating the search for shorter proofs. Although the methodology is clearly not an algorithm—a practical algorithm appears to be out of reach—the evi dence given here suggests that the methodology can be indeed effective. One of the more graphic illustrations concerns starting with a proof of length 816 (of the theorem DUAL-BA-3) and, by applying the approach, eventually obtaining a proof of length 100. The approach relies heavily on OTTER, McCune's au tomated reasoning program. In Section 4, the key experiments from among the 26 fully covered in [20] are detailed with commentary; they are summarized in Table 1 at the end of that section.
Automating the Search for Elegant Proofs
7.2
1609
RESEARCH PROBLEMS
To add to the interest in the study of automating the search for elegant proofs, I offer here several challenging research problems. Solutions to these problems will increase the power of automated reasoning programs. Research Problem 1. My main research, in the context of elegance, has fo cused on theorems in which equality was the only or the dominant relation and on theorems featuring the use of condensed detachment. The research problem to solve (and one that is related to the material in this article) asks for tech niques for finding elegant proofs when the equality relation is heavily mixed with relations not concerned with equality, as occurs, for example, in the study of set theory. Just as reported here, this area of research most likely will meet with cul de sacs and occasionally require reliance on a proof far from the best in hand. Such is the nature of searching for elegant proofs. Research Problem 2. One of the most difficult problems to solve concerns correctly identifying the key ingredient for finding elegant proofs, for proving a new theorem, and for completing a difficult assignment when one is relying on the assistance of an automated reasoning program. As one expects, my answer is access to a variety of strategies, some to restrict the reasoning and some to direct it. For evidence, the following experiment suffices. Again, the theorem in focus is DUAL-BA-5, but with three changes from the earlier discussion. The restriction on paramodulation that blocks paramodulation into nonunit clauses was removed, the use of what is called basic paramodu lation [1] was added, and the focus was changed to a proof of distributivity only. Using one of his automated reasoning programs, McCune produced an 18-step proof. He then used Veroff's hints strategy [11], using as hints the steps from the 18-step proof, and obtained a 17-step proof of DUAL-BA-5. Finally, I then used super-loop with McCune's approach (of relying on hints), and OTTER found a 16-step proof of DUAL-BA-5; OTTER also found a 13-step proof of distributivity. Research Problem 3. Regarding research of a more general nature, (in the context of automated reasoning) clearly merited is solving the problem of for mulating strategies aimed at dramatically reducing the CPU time required to complete proofs. Admittedly, such reductions do not impress all researchers. However, often overlooked is the fact that theorems and, even more significant, open questions that were out of reach will become within reach of such a pro gram as the required CPU time decreases sharply. Rather than with the reduction in time—which one might suggest is obtain able merely by using a more powerful computer—the significance rests with the cause of the reduction. Specifically, if the saving in time results from sharply reducing the number of paths to be traversed—which is not addressable with
Collected Works of Larry Wos
1610
a faster computer—progress is occurring. Indeed, as Bledsoe remarked more than once, the fastest computer is not the solution, for the space of possible conclusions to be deduced is indescribably large. Were it not for access to strategy, (it seems clear to me) an attack on deep questions or hard problems would almost always fail, for there exist far too many ways to get lost. Despite years of experience and the accrued intuition, even the talented mathematician or logician can get lost, often pursuing one false lead after another. The use of strategy, such as the set of support strategy, decreases the likelihood of the program getting lost and tends to focus its attention on more profitable paths of inquiry. Ideally, additional strategies offering power somewhat comparable to that offered by the set of support strategy would mark a most significant advance for automated reasoning. 7.3
T H R E E K E Y IDEAS
Three ideas are central to the research presented in this article. First, with out the use of a program such as OTTER, many elegant proofs would remain undiscovered. Second, I am certain that the greatest advances in automated reasoning when measured by the power to prove a theorem will result from the formulation of yet more strategies, some to restrict the reasoning and some to direct it. Third, OTTER functions as an invaluable research assistant for the study of mathematics or logic. I can say with conviction that if one learns how to use OTTER, one will in the long run save many more hours than one spends in the learning, especially if one's interest is an area of abstract algebra.
Acknowledgments I thank Ross Overbeek for his invaluable suggestions regarding this article, es pecially those concerning the inclusion of material on tendencies. I also thank Robert Boyer and Robert Veroff for their excellent guidance. Finally, I thank Gail Pieper for her significant insights.
References 1. Bachmair, L., Ganzinger, H., Lunch, C , Snyder, W., Basic paramodulation and superposition, in Proc. CADE-1J, Lecture Notes in Artificial Intelli gence, Vol. 607, D. Kapur (ed.), Springer-Verlag, Berlin, 1992, pp. 462-476. 2. Kalman, J., A shortest single axiom for the classical equivalential calculus, Notre Dame J. Formal Logic, 19 (1978), 141-144. 3. Kalman, J., Condensed detachment as a rule of inference, Studia Logica 42 (1983), 443-451.
Automating the Search for Elegant Proofs
1611
4. Lukasiewicz, J., The equivalential calculus, in Jan Lukasiewicz: Selected Works, L. Borkowski (ed.), North-Holland, Amsterdam, 1970, pp. 250-277. 5. McCune, W., OTTER 2.0 Users Guide, Technical Report ANL-90/9, Argonne National Laboratory, Argonne, Illinois, 1990. 6. McCune, W., What's New in OTTER 2.2, Technical Memorandum ANL/MCS-TM-153, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, 1991. 7. McCune, W., OTTER 3.0 Reference Manual and Guide, Technical Report ANL-94/6, Argonne National Laboratory, Argonne, Illinois, 1994. 8. McCune, W., and Padmanabhan, R., Automated Deduction in Equational Logic and Cubic Curves, Lecture Notes in Computer Science, Vol. 1095, Springer-Verlag, Heidelberg, 1996. See http://www.mcs.anl.gov/home/mccune/ar/monograph/ for additional information. 9. Padmanabhan, R., and Quackenbush, R. W., Equational theories of algebras with distributive congruence, Proc. of A MS, 41(2) (1973), 373-377. 10. Scott, D., Private communication, 1990. 11. Veroff, R., Using hints to increase the effectiveness of an automated reason ing program: Case studies, J. Automated Reasoning, 16(3) (1996), 223-239. 12. Wos, L., Automated reasoning and Bledsoe's dream for the field, in Auto mated Reasoning: Essays in Honor of Woody Bledsoe, R. S. Boyer (ed.), Kluwer Academic Publishers, Dordrecht, 1991, pp. 297-345. 13. Wos, L., Overbeek, R., and Lusk, E., Subsumption, a sometimes underval ued procedure, in Festschrift for J. A. Robinson, J.-L. Lassez and G. Plotkin (eds.), MIT Press, Cambridge, Mass., 1991, pp. 3-40. 14. Wos, L., and McCune, W., The application of automated reasoning to ques tions in mathematics and logic, Annals of Mathematics and Al, 5(1992), 321-370. 15. Wos, L., The resonance strategy, Computers and Mathematics with Appli cations (special issue on automated reasoning), 29 (2) (1995) 133-178. 16. Wos, L., Searching for circles of pure proofs, J. Automated Reasoning, 15(3) (1995), 279-315. 17. Wos, L., The Automation of Reasoning: An Experimenter's Note book with OTTER Tutorial, Academic Press, New York, 1996. See http://www.mcs.anl.gov/people/wos/index.html for input files and infor mation on shorter proofs.
1612
Collected Works of Larry Wos
18. Wos, L., OTTER and the Moufang identity problem, J. Automated Reason ing, 17(2) (1996), 259-289. 19. Wos, L., The power of combining resonance with heat, J. Automated Rea soning, 17(1) (1996) 23-81. 20. Wos, L., Experiments Concerning the Automated Search for Elegant Proofs, Tech. Memo ANL/MCS-TM-221, Mathematics and Computer Science Di vision, Argonne National Laboratory, Argonne, Illinois, 1997.
Otter The CADE-13 Competition Incarnations
WILLIAM M c C U N E and LARRY WOS* Mathematics and Computer Science Ditnsion, Argonne National Laboratory, Argonne, IL, USA. email: { mccune, wos} Qmcs. anl. gov Abstract. This article discusses the two incarnations of Otter entered in the CADE-13 Automated Theorem Proving Competition. Also presented are some historical background, a summary of applications that have led to new results in mathematics and logic, and a general discussion of Otter. Key words: Automated theorem proving, competition, Otter, automated rea soning, equational deduction, paramodulation, resolution.
1
Introduction
Otter [17, 19] is an automated deduction system for first-order logic with equal ity. Two versions of Otter were entered in the CADE-13 Automated Theorem Proving System Competition, and the main purpose of this article is to give a detailed presentation of our entries. The first version (called Otter-304z) is es sentially Otter 3.0.4 operating in its autonomous mode, and the second (called Otter-Wos) is a minor variation of Otter 3.0.4. Because this article also serves as a general reference and overview of Otter, we also present some background, a summary of applications of Otter, and features of Otter that are not directly related to the competition. 1.1
HISTORICAL BACKGROUND
Research in automated theorem proving at Argonne started in 1962. We have always placed great importance on implementing and testing our ideas, so many 'Supported by the Mathematical, Information, and Computational Sciences Division sub program of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38. Reprinted from the Journal of Automated Reasoning, Vol. 18, no. 2, 211-220 (April 1997) with kind permission from Kluwer Academic Publishers.
1614
Collected Works of Larry Wos
computer programs have been written. The first generation consisted mainly of two programs. PG1 (Program 1), designed by Dan Carson, George Robinson, and Wos in 1963, had the unit preference strategy [37]; experimentation with PG1 led to the set of support strategy [39] and demodulation [40]. The program RG1 [38], designed by George Robinson and Wos in 1967, had binary resolution, factoring, paramodulation, demodulation, and the set of support strategy. The second generation, started by Ross Overbeek in 1970, was based on the NIUTP (Northern Illinois University Theorem Prover) series [10, 11], which evolved into AURA (Automated Reasoning Assistant) and its variants [28], with contributions from Brian Smith, Rusty Lusk, Bob Veroff, Steve Winker, and Wos. The NIUTP/AURA generation (written mostly in IBM assembly lan guage, with some PL/1) included the first high-performance implementations of hyperresolution, demodulation, and paramodulation, and it introduced unitresulting resolution and weighting. The result was the first practical set of pro grams in that their use led to answers to several open questions in equivalential calculus [41], ternary Boolean algebra [30] and semigroups [32]. The third generation of Argonne theorem provers, started in 1980 by Over beek, Lusk, and McCune, consisted mostly of the LMA [9] (Logic Machine Architecture), a toolkit (written in Pascal) for building deduction systems, and ITP (Interactive Theorem Prover), constructed with LMA. The function ality of ITP was similar to that of AURA. The motivations for LMA and ITP were sound software engineering and portability. Several experimental theorem provers based on technology for compiling logic programs were also part of the third generation. Otter is a member of the fourth generation. We started writing code for a new theorem prover in June 1987, and by December it was useful for our research on inference rules and search strategies. The program was later named Otter (Organized theorem-proving techniques for effective research). Other members of the fourth generation are ROO [8] (Radical Otter Optimization), by John Slaney, Lusk, and McCune, which is a parallel version of Otter; FDB [1] (For mula DataBase), by Overbeek and Ralph Butler, which is a unification and in dexing toolkit; MACE [16] (Models And CounterExamples), by McCune, which searches for finite models; and EQP [18] (EQuational Prover), by McCune, a program for equational logic, which incorporates associative-commutative uni fication. Aside from effective inference rules and strategies for proving theorems, speed and portability were the main considerations in building Otter, so C was used. The functionality of AURA and ITP, especially the inference and search meth ods, was quite useful in practice, so Otter retained most of it. However, low-level algorithms such as indexing techniques [13] were improved, resulting in sharp speedups over ITP. Major releases of Otter occurred at CADE-9 in May 1988 (version 0.9), in January 1989 (version 1), in March 1990 (version 2), and in January 1994 (version 3). The current version is 3.0.4, released in August 1995.
Otter: The CADE-13 Competition Incarnations
1.2
1615
APPLICATIONS OF OTTER
Otter has been applied to several areas of mathematics and logic, and many new results have been obtained with its use. Examples of such results are the existence of fixed point combinators in fragments of combinatory logic [23, 33], logic calculi with condensed detachment [25], single axioms for group calculi [12, 15], single axioms for group theory and subvarieties [14, 24, 4, 2, 5], single axioms for ternary Boolean algebra [27], equational theorems about cubic curves [26], single axioms for lattice-like algebras [20], self-dual bases for groups and for lattices [21], implicational axioms for groups and Abelian groups [22], Robbins algebra [18, 31], Moufang loops [21, 6, 7], illiative combinatory logic [3], and proofs with particular properties [34, 35, 36].
2
Architecture
Otter reads an input file containing a set of formulas and some control infor mation. The set of formulas represents the theory and denial of the conclusion (all proofs are by contradiction), and the control information consists of vari ous switches and parameter settings for specifying the inference rules and search strategies. As Otter searches, it writes information (including the proof, if found) to an output file. We now summarize how Otter works. See the manual [17] for details on the material in this section. 2.1
LOGIC
Otter applies to statements in classical first-order (unsorted) logic with equality. It accepts as input either clauses or quantified formulas. Quantified formulas (which may contain V, 3, V, A, ->, —»,«-») are immediately transformed to clauses. All of Otter's inference rules and search algorithms operate on clauses. 2.2
INFERENCE RULES
Otter's main inference rules are based on resolution or paramodulation. The res olution inference rules known to Otter are binary resolution, hyperresolution, negative hyperresolution, unit-resulting resolution, and linked unit-resulting res olution. Equational deduction is done with binary paramodulation and various forms of demodulation. (We sometimes classify demodulation as a rewriting rule rather than as an inference rule.) Factoring is not built into resolution and paramodulation; it is considered to be a separate inference rule. Otter has one additional inference rule (called gL) for problems about cubic curves in algebraic geometry. It is a generalization rule that applies to equations and is derived from the property of cubic curves that for some kinds of statement
Collected Works of Larry Wos
1616
P, if P holds at some point on a cubic curve, then P holds at every point on the curve. See [21] for details and applications. 2.3
CONTROL
An Otter process can be viewed as a closure computation with redundancy con trol; that is, Otter attempts to compute the closure of a set of input statements under a set of inference rules, applying deletion rules (subsumption and demod ulation) that typically preserve logical completeness of the inference system. The computation is driven by a loop. 2.3.1
The Inference Loop
Otter maintains three lists of clauses. Usable. These clauses are available for application of inference rules. They actively participate in the search. Sos. These clauses are waiting to participate in the search through applica tion of inference rules. (Members of sos that are equations may be participating as demodulators.) Demodulators. These are equations that are used as rewrite rules. Mem bers of demodulators may occur in usable or sos as well. The inference loop operates mainly on clauses in lists sos and usable: While sos is not eapty and no refutation has bean found, 1. Select a clause, the given-clause, in sos; 2. Move given-clause froa sos to usable; 3. Infer and process new clauses using the inference rules in effect; each new clause oust have the given-clause as one parent and aeabers of usable as other parents; retained clauses are appended to sos; End of while loop.
The processing of inferred clauses (Step 3 above) involves many optional retention tests and other procedures; the most important ones are listed here. Given newly inferred clause C, 1. Deaodulate C (with aeabers of list Demodulators); 2. Orient equality literals in C; 3. Discard C if weight(C) > the aax-weight paraaeter; 4. Discard C if it la subsuaed by a aeaber of usable or sos; 5. Check if C conflicts with any clause in usable or sos; 6. If C has passed the retention tests, then a. (optional) if C is an oriented equation, append it to Demodulators and rewrite all aeabers of usable or sos; b. discard aeabers of usable or sos subsuaed by C; c. (optional) Factor C;
The inference loop can be seen as a simple implementation of the set of support strategy, because no inferences are drawn in which all of the participants are in the initial usable list. That is, the initial sos list is the initial set of support: all lines of deduction must start with a clause in the initial sos list.
Otter: The CADE-13 Competition Incarnations
1617
The loop can also be used to drive a Knuth-Bendix completion procedure. If the initial sos list consists of the initial set of equations, all equations (input and derived) can be oriented into terminating demodulators, a restricted form of paramodulation is used, and the inference loop terminates because sos is empty, then the resulting usable list is a complete set of reductions for the theory. Our strategies for equational theorem proving evolved separately from the KnuthBendix completion, but in some cases they are similar. Two term ordering methods are available for orienting equations and guaran teeing termination of demodulation. The lexicographic recursive path ordering (LRPO) is used in the autonomous modes, with default symbol precedence con stants X high-arity -<•••-< binary -< unary, and within arity, the lexicographic ASCII ordering. The ad hoc method (which does not always guarantee termina tion) orders terms by user-assigned weights. Selection of the given clause in Step 1 of the inference loop is the most important aspect of the search process; it is the next path to explore. The default selection is the smallest clause in sos, which we call best-first search. Instead, the user may specify a breadth-first search, in which the first clause in sos is selected (sos operates as a queue). We have found that a combination of best-first and breadth-first search is frequently quite valuable, and one of Otter's parameters, the pick-given-ratio, can be used to specify a ratio: a value of n means that through n iterations of the inference loop, the smallest clause is selected, then in the next iteration the first clause is selected, and so on. Discarding large clauses (Step 3 in the processing of newly derived clauses) interferes with completeness, but it is quite important in practice. If maxweight= oo, clauses are retained and appended to sos at a much higher rate than they are removed as given clauses. As a result, most retained clauses never enter the search, and memory is wasted. If many of those clauses become de modulators, much time is spent (and wasted) using them to try to demodulate previous clauses. Thus, a good value for the max-weight parameter is impor tant to achieving a well-behaved search [18]. We frequently make several initial searches, varying the max-weight parameter until a good value is found. When multiple searches are not done (in particular in Otter's autonomous mode) the parameter control-memory is used to dynamically adjust the max-weight pa rameter based on the amount of memory available. 2.3.2
The Autonomous Mode
Although Otter is not an interactive program, we typically use it in an interactive way. We run a search, examine the output, change the switches and parameters to adjust Otter's behavior, then start another search, and so on. Because this kind of multiple search is still an art form, we have not attempted to automate it. However, Otter has a fully automatic mode, called the autonomous mode, which is useful for inexperienced users, easy problems, situations in which Otter is called from another program, and comparison with other programs.
Collected Works of Larry Wos
1618
In the autonomous mode, Otter unconditionally sets several options and partitions clauses into the usable and sos lists. Otter then checks its input for the following properties: whether all clauses are prepositional, whether all clauses are Horn, whether the equality relation is present, and, if equality is present, whether equality axioms are present. The autonomous mode algorithm is the following. set Bax-aea to 12 Megabytes; sat control-aaaory flag; •at pick-givan-ratio parameter to 4; sat procass-lnput flag; placa positiva clauses In sos list; placa nonpositiva clausas in usable list; if (all clauses are propositional) than sat propositional flag; sat hyperresolutlon flag; else if (nonunits are present) then set hyperresolutlon flag; if (all clausas are Horn) then clear ordered-hyperresolution flag; else set factoring flag; sat unit-deletion flag; if (equality is present) then set knuth-bendix flag; if (equality axioms are present) then clear paramodulation flags;
The max-mem parameter limits the amount of memory available for stor age of clauses and related data structures, the process-input flag causes input clauses to be processed as if they were derived clauses, the propositioned flag causes several optimizations particular to propositional clauses to be in effect, the ordered-hyperresolution flag (set by default) prevents satellites from resolv ing on nonmaximal literals, the unit-deletion flag causes each unit clause, say P, to be used as a rewrite rule P = TRUE to simplify nonunit derived clauses, and the knuth-bendix flag causes several additional options to be set so that the search resembles Knuth-Bendix completion (see [17]). The two competition entries named Otter-304z were run in the (ordinary) autonomous mode. 2.3.3
The Auto- Wos Mode
A different version of the autonomous mode, called auto-wos, was created specif ically for the CADE-13 competition. The first reason for this was so that Otter could compete in the "monolithic" class, which does not allow different modules of code to be called based on properties of the input clauses. The second reason was so that we could use a different paramodulation strategy. The auto-wos algorithm is the following. 1. set flags process-input, control-aenory, hyperresolutlon, knuth-bendix, para-froa-units-only, unit-deletion, factor; 2. clear flag index-for-back-deBOd-flag;
Otter: The CADE-13 Competition Incarnations
1619
3. set pick-given-ratio paraoeter to 4; 4. if (every positive clause in sos is ground) then move all positive clauses to sos;
The property that makes auto-wos mode "monolithic" is that (nearly) all modules used for some type of input in ordinary autonomous mode are used for all types of problem in auto-wos mode. For example, clauses are indexed for paramodulation, and paramodulation is called even if no equality literals are present; and hyperresolution and factoring are called even if no nonunit clauses are present. The flag index-for-back-demod (default set if back demodulation is in effect) causes all terms in all clauses to be indexed so that they can be found if they can be rewritten by a newly derived demodulator. We clear this flag because indexing all terms is an expensive operation and very wasteful if no equality is present. The second, and more practical, feature of auto-wos mode is that paramod ulation from nonunit clauses is prohibited. This restriction is incomplete in general, but it is quite useful in practice. The competition entry named Otter-Wos was run in the auto-wos mode. 2.4
TUNING FOR THE COMPETITION
For both Otter-304z and Otter-Wos, the value of the max-mem parameter was increased from 12 megabytes to 20 megabytes. This change affects performance because it can change the set of retained clauses; in particular, the behavior of the control-memory feature, which automatically adjusts the max-weight pa rameter, depends on the value of max-mem. The auto-wos mode (thus Otter-Wos) was tuned with the 391 "Eligible Mixed" set of TPTP problems. First, we experimented with the initial set of support. The TPTP classifies each input clause as "axiom", "hypothesis", or "conjecture". (Otter-304z does not use this information.) To decide the initial sos list, we experimented with several rules of the form sos «— hypothesis U conjecture; if P(sos), then sos «— sos U /(axiom); for various properties P and functions / . In the end, we used the rule with P = "all positive clauses are ground" and / = "positive clauses". Second, we experimented with the hot list strategy [36], which causes Otter to give special emphasis to key clauses; results indicated that our current hot list strategies are best used in the iterative-search mode rather than in autonomous modes, so the hot list strategy was not used for the competition.
3
Implementation
Otter is written in the C programming language, which was chosen for exe cution speed and portability. It contains about 35,000 lines of code (including
1620
CoJJected Works of Larry Wos
comments). Clauses and terms are stored in shared data structures, which speed some of the indexing and inference operations and save memory. Specially de signed and tuned indexing algorithms [13] are used to access terms and clauses for subsumption operations, application in inference rules, and application of demodulators. Otter is designed to run in a UNIX-like environment, but versions (with several limitations) are available also for DOS computers and Macintoshes.
4
Performance in the Competition
Otter-304z in the unit equality competition. Otter placed first, proving 43 of 50 theorems in 2750 seconds. In second place was Waldmeister, with 37 proofs in 4730 seconds. Otter's performance is not surprising to us because application to real problems has driven its development and because we have focused on equational applications in the past few years. Otter-304z in the mixed/open competition. Otter placed second in this category, with 28 proofs of 50 theorems in 7314 seconds. The winner was SPASS, with 32 proofs in 6244 seconds. Otter's performance was not surprising to us because most of these theorems are non-Horn, and many contain a mixture of equality and nonequality relations; we have worked on very few applications with these properties, and no special tuning was done for this area. Otter-Wos in the mixed/monolithic competition. Otter placed sec ond, with 32 proofs of 50 theorems in 6037 seconds. The winner was E-SETHEO, with 36 proofs in 5655 seconds. We can compare these with the mixed/open competition, because the same set of theorems was used. The positive effect of the paramodulation restriction more than offset the wasted indexing operations for theorems without equality. Both E-SETHEO and Otter did better than we expected when compared with the mixed/open results. In the design of the competition, the open/monolithic distinction was made because it was thought that the open systems would have an unfair advantage. In this competition, at least, the monolithic systems did better, so perhaps the mixed/open and mixed/monolithic categories should be judged together. In that case, E-SETHEO wins, Otter-Wos places second, SPASS is very close behind in third, and Otter-304z is sixth. Otter's performance in the competition clearly points to its strengths in unit equality reasoning and its weaknesses in non-Horn deduction and on problems with a mixture of equality and nonequality relations. A strength not evident from the competition is deduction in Horn theories, and a weakness not evident is propositional unsatisfiability.
Otter: The CADE-13 Competition Incarnations
5
1621
Conclusion
One of the most important features of the results can be seen in the tables of runtimes for each system on each theorem [29]. Most of the theorems on which the top finishers failed were proved easily by at least one system. Each of the unit equality theorems was proved by at least one system, and 45 of 50 were proved in less than 20 seconds by at least one system. Of the mixed theorems, 46 were proved by at least one system, and 44 were proved in less than 17 seconds by at least one system. These results support our long-held position that the best method for automated deduction is a variety of methods.
References 1. Butler, R. and Overbeek, R.: Formula databases for high-performance resolution/paramodulation systems, Journal of Automated Reasoning 12(2) (1994), 139-156. 2. Hart, J. and Kunen, K.: Single axioms for odd exponent groups, Journal of Automated Reasoning 14(3) (1995), 383-412. 3. Jech, T.: Otter experiments in a system of combinatory logic, Journal of Automated Reasoning 14(3) (1995), 413-426. 4. Kunen, K.: Single axioms for groups, Journal of Automated Reasoning 9(3) (1992), 291-308. 5. Kunen, K.: The shortest single axioms for groups of exponent 4, Computers and Mathematics with Applications 29(1 (1995), 1-12. 6. Kunen, K.: Moufang quasigroups, Journal of Algebra 83 (1996), 231-234. 7. Kunen, K.: Quasigroups, loops, and associative laws, J. Algebra83 (1996). To appear. 8. Lusk, E. and McCune, W.: Experiments with ROO, a parallel automated deduction system, in: B. Fronhofer and G. Wrightson (eds.), Parallelization in Inference Systems, LNAI, 590, Springer-Verlag, Berlin, 1992, pp. 139-162. 9. Lusk, E., McCune, W. and Overbeek, R.: Logic machine architecture: Kernel functions, in: D. Loveland (ed.), Proc. of CADE-6, LNCS, 138, SpringerVerlag, Berlin, 1982, pp. 70-84. 10. McCharen, J., Overbeek, R. and Wos, L.: Complexity and related enhance ments for automated theorem-proving programs, Comp. Math, with Appli cations 2 (1976), 1-16.
1622
Collected Works of Larry Wos
11. McCharen, J., Overbeek, R. and Wos, L.: Problems and experiments for and with automated theorem-proving programs, IEEE Trans. Comp. C-25(8) (1976), 773-782. 12. McCune, W.: Automated discovery of new axiomatizations of the left group and right group calculi, Journal of Automated Reasoning 9(1) (1992), 1-24. 13. McCune, W.: Experiments with discrimination tree indexing and path in dexing for term retrieval, Journal of Automated Reasoning 9(2) (1992), 147-167. Invited paper. 14. McCune, W.: Single axioms for groups and Abelian groups with various operations, Journal of Automated Reasoning 10(1) (1993), 1-13. 15. McCune, W.: Single axioms for the left group and right group calculi, Notre Dame J. Formal Logic 34(1) (1993), 132-139. 16. McCune, W.: A Davis-Putnam Program and its Application to Finite First-Order Model Search: Quasigroup Existence Problems, Tech, Report ANL/MCS-TM-194, Argonne National Laboratory, Argonne, IL, May 1994. 17. McCune, W.: OTTER 3.0 Reference Manual and Guide. Tech. Report ANL94/6, Argonne National Laboratory, Argonne, IL, 1994. 18. McCune, W.: 33 basic test problems: A practical evaluation of some paramodulation strategies. Preprint ANL/MCS-P618-1096, Argonne Na tional Laboratory, 1996. 19. McCune, W.: Otter, 1996.
http://www.mcs.anl.gov/home/mccune/ar/otter/,
20. McCune, W. and Padmanabhan, R.: Single identities for lattice theory and weakly associative lattices. Preprint MCS-P493-0395, Argonne National Laboratory, 1995. 21. McCune, W. and Padmanabhan, R.: Automated Deduction in Equational Logic and Cubic Curves, LNAI, 1095, Springer-Verlag, Berlin, 1996. 22. McCune, W. and Sands, A. D.: Computer and human reasoning: Single implicative axioms for groups and for Abelian groups, American Math. Monthly, December 1996. To appear. 23. McCune, W. and Wos, L.: The absence and the presence of fixed point combinators, Theoretical Computer Science 87 (1991), 221-228. 24. McCune, W. and Wos, L.: Application of automated deduction to the search for single axioms for exponent groups, in: A. Voronkov (ed.), Logic Program ming and Automated Reasoning, LNAI, 624, Springer-Verlag, Berlin, 1992, pp. 131-136.
Otter: The CADE-13 Competition Incarnations
1623
25. McCune, W. and Wos, L.: Experiments in automated deduction with con densed detachment, in: D. Kapur (ed.), Proceedings of CADE-11, LNAI, 607, Springer-Verlag, Berlin, 1992, pp. 209-223. 26. Padmanabhan, R. and McCune, W.: Automated reasoning about cubic curves, Comp. Math, with Applications 29(2) (1995), 17-26. 27. Padmanabhan, R. and McCune, W.: Single identities for ternary Boolean algebras, Comp. Math, ivith Applications 29(2) (1995), 13-16. 28. Smith, B.: Reference Manual for the Environmental Theorem Proven An Incarnation of AURA. Tech. Report ANL-88-2, Argonne National Labora tory, 1988. 29. Sutcliffe, G. and Suttner, C : The Results of the CADE-13 ATP System Competition. Journal of Automated Reasoning 18(2) (1997). 30. Winker, S.: Generation and verification of finite models and counterexam ples using an automated theorem prover answering two open questions, J. ACM 29 (1992), 273-284. 31. Winker, S.: Robbins algebra: Conditions that make a near-Boolean algebra Boolean, Journal of Automated Reasoningfi{4) (1990), 465-489. 32. Winker, S., Wos, L. and Lusk, E.: Semigroups, antiautomorphisms, and involutions: A computer solution to an open problem, I, Math. Comp. 37 (1981), 533-545. 33. Wos, L.: The kernel strategy and its use for the study of combinatory logic, Journal of Automated Reasoning 10(3) (1993), 287-343. 34. Wos, L.: Searching for circles of pure proofs, Journal of Automated Rea soning 15(3) (1995), 279-315. 35. Wos, L.: Otter and the Moufang identity problem, Journal of Automated Reasoning 17(2) (1996), 215-257. 36. Wos, L.: The power of combining resonance with heat, Journal of Automated Reasoning 17(1) (1996), 23-81. 37. Wos, L., Carson, D. and Robinson, G.: The unit preference strategy in theorem proving, in AFIPS Proc. 26, Spartan Books, 1964, pp. 615-621. 38. Wos, L. and Robinson, G.: Paramodulation and set of support, in IRIA Symposium on Automatic Demonstration, Springer-Verlag, 1968, pp. 276310. 39. Wos, L., Robinson, G. and Carson, D.: Efficiency and completeness of the set of support strategy in theorem proving, J. J 4 C M 1 2 ( 4 ) (1965), 536-541.
1624
Collected Works of Larry Wos
40. Wos, L., Robinson, G., Carson, D. and Shalla, L.: The concept of demodu lation in theorem proving, J. A CM 14(4) (1967), 698-709. 41. Wos, L., Winker, S., Smith, B., Veroff, R. and Henschen, L.: A new use of an automated reasoning assistant: Open questions in equivalential calculus and the study of infinite domains, Artificial Intelligence 22 (1984), 303-356.
Larry Wos Programs That Offer
Fast, Flawless, Logical Reasoning Confused by too many reasonable conclusions? Sort them out through automated reasoning using special strategies that logically restrict and direct the search for the answer. The English detective Sherlock Holmes and Star Trek's Mr. Spock could reason logically and flawlessly, always. Some people you know have that ability, sometimes. Unfortunately, without perfect reasoning, diverse problems arise, like bugs in computer programs (if a sort program places Sun before Intel, more than disappointment is experienced); flaws in chip design (one type of Pentium chip became infamous several years ago because of a flaw); and errors in mathematical proofs (paper with a title of the form "On an Error by MacLane" is easily remembered, but not with pleasure—at least by MacLane). How can the likelihood of such disasters be reduced? One answer is automated reasoning. The focus of automated reasoning is the design and implementation of com puter programs that flawlessly apply logical reasoning to reach the objective, whether the area of interest is circuit design, program verification, theorem proving, puzzle solving, or something else. Featured in this article is the versa tile program OTTER—developed by researcher William McCune [3]—that you can use to attack problems in each of these areas, especially to sharply reduce the likelihood of disaster. (For lots more on automated reasoning at Argonne National Laboratory, including pointers to obtaining an automated reasoning program called OTTER, as well as new results, neat proofs, and puzzles, see www.mcs.anl.gov/home/mccune/ar/.) Fortunately, you can easily obtain your own personal copy of OTTER, which is included on a diskette in [9]. That book also shows the process by which Reprinted with permission from CACM, Vol. 41, no. 6, 87-102 (June 1998).
1626
Collected Works of Larry Wos
significant discoveries are made using an automated reasoning program. You can obtain a copy of the OTTER source code through the Argonne Web page. Finally, for those who want a taste of using OTTER immediately, try the Son of BirdBrain hot-link on the same Web page. Before you decide whether access to an automated reasoning program will sharply increase your likelihood of success, you might wonder how its reasoning differs from your own and from that of other ordinary humans. You might also be curious about how its strategic attack differs from an attack a person typically takes. I believe—although some of my colleagues do not entirely share my views—that precious little is known about how people consciously reason, especially when attacking a deep problem. The eureka experience is well documented. Jules Henri Poincare, the French mathematician, 1854-1912, after failing to solve a problem on which he had worked for months, suddenly knew its solution. I have had a similar experience: waking at 3 A.M., instantly aware of what was needed to solve a problem in abstract algebra. The subconscious mind is a marvel. A computer program (of the type under discussion) has no subconscious mind, nor intuition, nor, for that matter, experience on which to draw. Also in sharp contrast to the way people reason is an automated reasoning program's use of specific and well-defined inference rules for drawing conclusions. People, I have discovered, seldom use specific rules for drawing conclusions, even when the subject is mathematics. In addition, people often wish to reason from pairs of statements, whereas a reasoning program like OTTER offers various means of drawing conclusions from statements taken three or more at a time. Compared with reasoning, even less is known about how a person selects appropriate data from a huge amount of information—specifically, which tech niques (strategies) are used to restrict the reasoning and which are used to direct it. Indeed, how does the disciplined mind escape drowning in useless conclusions, and how does such a mind avoid getting lost? If you question researchers, you will find that an explicit strategy is seldom used, in contrast to the case in playing chess or playing poker. If a reasoning program is to serve well as an assistant, however, the use of explicit strategies (I use the plural deliberately) is an absolute necessity. Strategies for restricting and strategies for directing a program's reasoning have intrigued me for decades. Figure 1 lists some of the more powerful ones my colleagues and I have formulated. Through such strategies, a reasoning program can avoid drowning in new conclusions and avoid getting lost. Also, by using such strategies, OTTER can attack a wide variety of problems from diverse areas. (You might find it interesting to note that one of my most important contributions to the field was the introduction, in 1963, of the use of strategy by automated reasoning programs. I am almost equally proud of having intro duced the term "automated reasoning" in 1980; it captures far better than the traditional term "automated theorem proving" the remarkable diversity of these computer programs.) OTTER has been remarkably successful in such attacks,
Programs That Offer Fast, Flawless, Logical Reasoning
1627
not the least of which is its answering of various questions that had resisted mathematicians for decades. (For such a success with a related program, see Figure 2. For evidence of how well automated reasoning is doing, see Figure 3.) But to answer a pressing question, OTTER will never replace people. Rather, it is intended to complement the approach taken by a person, not to emulate it.
The Program OTTER Through the sponsorship of the U.S. Department of Energy and its predeces sors, researchers at Argonne have been designing and implementing automated reasoning programs for more than 30 years. William McCune worked for several years on earlier reasoning programs before producing—in approximately four months—the first version of OTTER in 1987. The 1998 version of OTTER con sists of more than 24,000 lines of C and can be used on workstations, PCs, and Macintoshes. OTTER (Organized Theorem-proving Techniques for Effective Research) de rives its name in part from its original intended use and in part from the delight of using it—in the same way one delights in watching the aquatic mammal of the same name. The program was designed and implemented for a number of reasons, including to give a person using it access to a tireless team member, offering (1) a wide variety of automated means for drawing conclusions that fol low inevitably from the assumptions that are supplied and (2) diverse powerful strategies for controlling the program's reasoning.
Problems Solved with Logical Reasoning The following five example problems, each involving logical reasoning, provide a taste of what a reasoning program (such as OTTER) can do for you. They also suggest charming choices you can make for the inference rules the program uses to draw conclusions; the strategies for restricting and directing its reasoning; and features, such as those for rewriting information into a canonical form. The examples also hint at unpleasant features of using automated reasoning programs: the input language; the need to be explicit about trivial information; and the dangers of implicit information and irrelevant information, both of which affect the program's chances of success. Perhaps as compensation for your labor, however, the program protects you in various ways. For example, a person might easily think of the brother of Bob's aunt as Bob's uncle—which is not necessarily the case. An automated reasoning program would not make such a possibly incorrect translation. The five problems may help you find new applications, professional and recreational, for automated reasoning. You will discover, among other things, that a program like OTTER does not imitate the approach a person ordinarily
Collected Works of Larry Wos
1628
takes, which is why the term Iartificial ntelligence—especially in view of AI's original intent in the 1960s—does not apply. Databases. This example is vaguely reminiscent of a simple database problem. You have some knowledge and, by drawing some conclusions, hope to determine whether an assertion is true. Here is the problem: You are told that Joy, who is married and not male, is the managing editor of the Journal of Automated Reasoning and that the only person who left the parking lot before 5 P.M. was not female. You are then told that Joy left the parking lot before 5 A.M. Finally, you are asked to prove that this last assertion is false. Elementary reasoning suffices. You first conclude that Joy is female; you then use that conclusion to deduce that Joy did not leave the parking lot before 5 P.M. A contradiction is obtained— you have completed a proof by contradiction and, in particular, solved the prob lem by showing that the last assertion is false. If you look closely, you see that you ignored two items of information—that Joy is a managing editor and that Joy is married—and at the same time used one item of information that was not given explicitly-—that people are female or male. What would happen if the same example were given to an automated rea soning program? So that you can briefly experience what can be less pleasant, I give some of the possibly pertinent information in clause form [12] (one of the language representations OTTER recognizes), where logical not is denoted by «_!) .
Joy i s not male:
-MALE(Joy).
Joy is managing editor of JAR:
EDITOR(Joy).
Without more information, the program would fail to draw any conclusions. Why? First, it does not "know" that each person is female or male; it must be told so with, for example, the following clause, where logical or is denoted by Kin.
FENALE(x)
I MALE(x).
(Your conclusion that x is a variable is indeed correct, implicitly ranging over all people.) For that matter, the program does not even "understand" the concept of female or of male. Indeed, without the information that every person is female or male (which you used implicitly), the program would be helpless. Even with that information, the program would require additional help. You would have to choose some specific inference rule-a rule of reasoning with which to draw conclusions. OTTER (and many other such programs) could use a rule, such as unit-resulting resolution, which works in the following way: The program takes the clause asserting that Joy is not male and unifies it with one of the literals in the clause asserting that each person is female or male. In particular, all occurrences of the variable x in the female-or-male clause are replaced with the term Joy. Then, the -MALE(Joy) literal cancels the (just-
Programs That Offer Fast, Flawless, Logical Reasoning
1629
obtained) MALE(Joy) literal (because the two literals are identical but opposite in sign), and the useful conclusion is reached and stored in the following clause: FEMALE(Joy). (For those who are curious about other ways OTTER reasons, and especially about how its reasoning sharply differs from that expected of a person, see the circuit design and the two-inverter problem later in the article.) Does OTTER have enougha data to succeed now? Perhaps, but unless you add strategy, the odds are high that the program would get lost, at least for problems of greater depth. One of the key strategies used by OTTER to restrict its reasoning is the set of support strategy [12]. You tell the program which state ments presenting the problem are (in effect) general information. This strategy restricts the program from applying an inference rule to a set of items all of which are among the general-information statements. The program is there fore prevented from exploring the underlying theory from which the problem is taken, thereby almost always sharply increasing its efficiency. Strategy—explicit strategy—is vital to OTTER and to any reasoning pro gram attacking deep questions; for more on the various strategies used by OT TER, see [12], and Figure 1. In contrast, a person seldom if ever uses explicit strategy. In no way does this fact mean that people are not brilliant at prob lem solving; indeed, we are. It means that OTTER is not designed to emulate a person's reasoning, perhaps explaining why OTTER makes such a valuable team member—complementing the approach a person might take. Circuit design. In language, the and of two statements is true if and only if each is true; otherwise, the and is false. In circuit design, in effect, the same holds: An AND gate takes two inputs and outputs 1 if and only if each input is 1; otherwise, the AND gate outputs 0. In this example, you are asked to design a number of different circuits, but you are constrained to avoid the use of AND gates because their cost has risen sharply. Rather than taking a complicated approach, you decide to use OR gates and NOT gates. You rely on one single simplification, or "canonicalization" rule: The and of A and B = not (not (.A) or not(S)). OTTER can apply such a rule, as well as other such rules, without error, at the rate of 5,000 rules per CPU second. (In contrast to its lack of under standing of most concepts, OTTER "understands" equality, as demonstrated in this and the next example.) Perhaps you are puzzled. Application of such a simplification rule is hardly difficult and not particularly prone to error. But now imagine being presented with 17,000 simplification rules (called demodulators [12] in automated reason ing) and an expression rewritten into its final form only after 1,766 demodulators have been applied. For an additional measure of the chore that simplification and canonicalization can present, I note that in one of the greatest successes with a reasoning program—the answering of an open question concerning Robbins algebra [4]—more than 500,000 CPU seconds were spent on demodulation, ap-
For an additional measure of the chore that simplification and canonicalization can present, I note that in one of the greatest successes with a reasoning program, the answering of an open question concerning Robbins algebra [4], more than 500,000 CPU seconds were spent on demodulation, approximately 75% of the total run time; also see [2] and Figure 2 for a statement of the Robbins problem, which had resisted mathematicians for 60 years. Now you see how what might seem innocent on the surface, namely, simplification, can be a nightmare if attacked by hand rather than by computer.

Arithmetic and mathematics. You are given the following two well-known properties of arithmetic (and of group theory):
x + -x = 0
y + (-y + z) = z

and are asked to deduce a third well-known, useful, and powerful property:
y = -(-y)

OTTER offers an inference rule, called paramodulation [12], that does almost all of the work needed to draw this conclusion immediately. (Paramodulation, explored later, is more general than equality substitution.) With paramodulation, the program deduces y + 0 = -(-y), which, assuming that (as is usually the case) the following simplification rule is applied, is rewritten to the desired result:

w + 0 = w

With paramodulation, an automated reasoning program finds conclusions of (in a sense) maximum generality. People, on the other hand, typically seek a conclusion that mirrors earlier experience and intuition, although that conclusion may lack the needed generality, in turn preventing them from finding the desired information by, say, completing the sought-after proof. Unification indeed saves the day, by finding the most general replacement for variables that, when applied to the two expressions of concern, yields two identical expressions. When the needed most general replacement for variables is overlooked, which can happen easily if done by hand, chaos can result. For but one example, a specific conclusion might be drawn, rather than a needed general conclusion. An automated reasoning program is not subject to such misfortune.

A difficult puzzle. Like Lewis Carroll, the 19th-century English mathematician and author of Alice in Wonderland, and Raymond Smullyan, the logician and professor of philosophy at Indiana University, many people are fascinated by puzzles. You might find the following billiard ball puzzle stimulating and useful, and want to offer it to colleagues, friends, loved ones, and strangers. There are 12 billiard balls, 11 of which are identical in weight. The remaining ball, the odd one, has a different weight. You are not told whether it is heavier or lighter.
You have a balance scale for weighing the balls. Can you find which ball is the odd ball in three weighings? And can you also find whether it is lighter or heavier than the others?

The heart of the problem concerns the constraint of the three weighings. (I'll give you a hint about solving the puzzle just after mentioning the number of solutions that were found; stop reading here if you wish to solve the puzzle independently.) Without such a constraint, you could choose two billiard balls and weigh one against the other with the scale. If they balanced, you would mark each as standard, as one of the 11 with identical weight. If they did not balance (even better), you would learn something crucial, putting them aside and marking them as undecided. You would then know that the remaining 10 were of identical weight, and it would be a trivial matter to ascertain which was the odd ball and whether it was lighter or heavier. But the billiard ball puzzle, like life itself, is not that easy, and you may be surprised to learn that OTTER found more than 40 nontrivially distinct solutions [12] for this puzzle, while some people were still working on one. (Hint: Begin by weighing four balls against four balls.)

Circuit design and the two-inverter problem. The following problem has been included on various exams given to Ph.D. candidates in engineering. The challenge to the circuit designer is twofold: meet all the restrictions and ensure that the circuit behaves as desired. A person might follow one unrewarding line of reasoning after another, getting lost in the myriad of paths leading nowhere. An automated reasoning program succeeds easily [12]. Can you solve the following problem? Using as many AND and OR gates as you like, but using only two NOT gates, design a circuit according to the following specification: There are three inputs (i1, i2, and i3) and three outputs (o1, o2, and o3). The outputs are related to the inputs in the following simple way: o1 = not(i1), o2 = not(i2), o3 = not(i3). Remember, you can use only two NOT gates. (Once you have tried your hand, you might want to glance at the pictorial result in Figure 4, translated from OTTER's success.)
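If, after trying your hand, you would like to check a candidate mechanically, the following Python sketch verifies a circuit against the specification by exhaustive truth table. The embedded circuit is one widely known textbook construction, included only so the check has something to confirm; it is not claimed to be the circuit OTTER produced for Figure 4, and the helper names are mine.

    # Exhaustive check of a circuit that inverts three inputs with exactly two NOT gates.
    # Illustrative only; not necessarily the circuit shown in Figure 4.

    OR  = lambda *xs: int(any(xs))
    AND = lambda *xs: int(all(xs))

    def circuit(a, b, c):
        # NOT gate #1: negation of the majority of the three inputs.
        maj = OR(AND(a, b), AND(b, c), AND(a, c))
        n1 = 1 - maj
        # NOT gate #2: negation of the parity (an odd number of 1s among the inputs).
        odd = OR(AND(n1, OR(a, b, c)), AND(a, b, c))
        n2 = 1 - odd
        # Each output uses only AND/OR over the inputs, n1, and n2.
        def inv(x, y, z):
            return OR(AND(n1, n2), AND(n1, OR(y, z)), AND(n2, y, z))
        return inv(a, b, c), inv(b, a, c), inv(c, a, b)

    assert all(circuit(a, b, c) == (1 - a, 1 - b, 1 - c)
               for a in (0, 1) for b in (0, 1) for c in (0, 1))
    print("specification met using exactly two NOT gates")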
A Grand Return on the Investment

These problems give some idea of how complex reasoning often is for a person, and for a computer program. You should also have some feeling for the value of using an automated reasoning program, especially for attacking difficult problems. How comforting it is to know that none of the reasoning contains an error, that
the program does not tire, and that it explores paths of reasoning that would be too exhausting for a person. Further, we would still be in the dark about some of the problems whose solutions are now known were it not for the role played by a program like OTTER. The five example problems are no problem for OTTER; otherwise, I would have used "better" examples. But OTTER, as well as other automated reasoning programs, has answered numerous and far more challenging questions, some of which defied experts for years [4, 5, 9].

Nevertheless, a miracle is not at hand. OTTER's principal successes have been in mathematics and logic, finding answers to open questions concerning combinatory logic, lattice theory, and algebraic geometry. The reason why depends largely on the fact that these areas offer concise descriptions of their underlying theories. Access to such a concise description (a set of axioms, for example) is often missing when one contemplates a new application of automated reasoning. Consider what would happen if you wanted to determine whether certain properties held in field theory (a branch of physics), relying on the assistance of an automated reasoning program. First, you would have to formulate the precise concepts that are needed. Then, you would have to find a way to map the salient concepts into the language accepted by whatever automated reasoning program you intended to use. In another example, if you were studying astronomy, the concepts of planet, star, galaxy, universe, and the like might be tough nuts to crack.

For a problem taken directly from industry (a problem on which a colleague and I worked), imagine that an assembly line is producing cars (some painted blue, some green; some with air conditioning; some with power windows; and the like) and you know that the line's efficiency is severely reduced if the cars are sequenced randomly. For example, flushing the painter to refill it with blue and green alternately is not ordinarily efficient; similarly, since installing air conditioning can be time consuming, a sequence of cars all requiring this feature might not be the best. An automated reasoning program might indeed aid you in finding a good car-sequencing algorithm, but you would need to convey the crucial information; for a discussion of the approach taken to this job-scheduling problem, see [6]. (The car-sequencing problem is representative of a class of optimization problems typically studied in the field of operations research.)

Both tasks, "axiomatization" and representation in the program's language, can be trying, even formidable. Nevertheless, once all is in place, the return on such an investment can be astounding. Automated reasoning programs (such as OTTER and the program of Boyer and Moore [1]) have, for example, been used in research [5, 9] and applications [11] to

• Produce a fully automated proof of the correctness of a division circuit that implements the floating-point IEEE standard.
• Produce correctness proofs of security systems.
• Verify commercial-size adder and multiplier circuits.
• Achieve new mathematics, such as new results in quasigroups and in non-Euclidean geometry.
• Settle open questions, such as finding minimal axiom sets.
• Confirm conjectures, such as that of Higmann, and the paradox of Gerard.

Reasoning programs have also been used by educators in undergraduate logic courses and in upper-level courses in interactive theorem proving. And as a recreational example, automated reasoning programs have turned out to be excellent assistants in puzzle solving. (For easy and hard puzzles, including variations on the well-known checkerboard-and-dominoes puzzle, see [12] and the OTTER Web site.)
Computer-oriented Reasoning vs. Person-oriented Reasoning

Does automated reasoning mirror the classic approach to artificial intelligence? Does OTTER reason somewhat the way a person does? The answer to both questions is a resounding no, if one adheres to the strictest definitions. Unification is at the heart of automated reasoning. You have probably conjectured, correctly, that unifying expressions is not typical of what a person does. Reasoning focusing on equality offers another example of what OTTER does well and what a person sometimes finds arduous. As evidence, focusing on the following three equations, return to the arithmetic and mathematics problem and the use of paramodulation. You obtain the third equation by way of a well-chosen replacement of terms for variables in each of the first two equations, followed by an equals-for-equals substitution from the lefthand argument of the modified first equation into a proper subterm of the lefthand argument of the modified second equation:

x + -x = 0
y + (-y + z) = z
y + 0 = -(-y)
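For readers who want the bookkeeping spelled out, here is a small Python sketch (my own encoding, not OTTER's) of exactly this step: a naive unifier finds the most general substitution, and the instance of the left side of the first equation is then replaced by 0 inside the instance of the second.

    # Minimal sketch of the paramodulation step described above. Terms are nested
    # tuples; single lowercase letters are variables. Illustrative only.

    def is_var(t):
        return isinstance(t, str) and len(t) == 1 and t.islower()

    def walk(t, s):
        while is_var(t) and t in s:
            t = s[t]
        return t

    def unify(a, b, s):
        """Most general unifier of two terms (no occurs-check in this sketch)."""
        a, b = walk(a, s), walk(b, s)
        if a == b:
            return s
        if is_var(a):
            return {**s, a: b}
        if is_var(b):
            return {**s, b: a}
        if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
            for x, y in zip(a[1:], b[1:]):
                s = unify(x, y, s)
                if s is None:
                    return None
            return s
        return None

    def substitute(t, s):
        if is_var(t):
            return substitute(s[t], s) if t in s else t
        if isinstance(t, tuple):
            return (t[0],) + tuple(substitute(x, s) for x in t[1:])
        return t

    # x + -x = 0        and        y + (-y + z) = z
    lhs1, rhs1 = ('+', 'x', ('-', 'x')), '0'
    lhs2, rhs2 = ('+', 'y', ('+', ('-', 'y'), 'z')), 'z'

    # Unify the left side of the first equation with the subterm (-y + z) of the second;
    # this binds x to -y and z to -x, i.e., z to -(-y) once x is resolved.
    s = unify(lhs1, lhs2[2], {})
    new_lhs = ('+', substitute(lhs2[1], s), rhs1)   # replace the unified subterm with 0
    new_rhs = substitute(rhs2, s)
    print(new_lhs, '=', new_rhs)                    # ('+', 'y', '0') = ('-', ('-', 'y'))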
Indeed, if you replace all occurrences of x in the first equation with -y and all occurrences of z in the second with -(-y), then you are ready to apply the final equality-substitution step to obtain the third equation. Paramodulation combines the variable-replacement step and the equality-substitution step; it finds the most general replacement yielding identical subexpressions (through unification); and it produces the conclusion, without an intermediate (so to speak)
subconclusion, in one step. I suspect that you do not need a more complicated illustration of paramodulation to be persuaded that a person might not always find the appropriate variable replacement and might therefore miss drawing a key conclusion.

On the other hand, if you want an example of what a person does well and what OTTER does not even attempt, you need to turn to what is called instantiation. Although perhaps without actual use of the formal name "instantiation," you may have seen this inference rule being used by some lecturer or author when a formula or equation is presented with, say, variables x, y, and z and a possibly mysterious conclusion is presented next by replacing (instantiating) z with uv + vv (with u and v being variables); y by a(b + c) for the constants a, b, and c; and x by the familiar and well-known number 2. The mysteriousness (if present) results from the lack of explanation for this particular instantiation (choice of instances) from among the infinite possibilities that could have been chosen. For that reason, OTTER (and similar programs) do not offer instantiation as an inference rule; I know of no effective strategy for wisely choosing from among the infinite set of instances usually available.

Aware that an automated reasoning program does not rely on instantiation to draw conclusions, you might wonder what inference rules it does use, in addition to UR-resolution (used in the databases problem example) and paramodulation (used in the arithmetic and mathematics problem example). You are correct if you conjectured that UR-resolution requires the conclusion to be nonempty and free of logical or. OTTER also offers an inference rule, called hyperresolution, that yields conclusions free of logical not. The program frequently applies both inference rules to a set of statements containing more than two items, which is not the way a person usually reasons. The program also offers you binary resolution, a rule that always focuses on pairs of statements without any constraint on the conclusion that is drawn, which is like a person's reasoning.

Even with such diverse rules for drawing conclusions, all would still be lost were it not for strategy. The program's reasoning must be restricted and must also be directed; see, for example, the arithmetic and mathematics problem and Figure 4. Further, to be most effective, some means is needed for the program to benefit from what you know and what you do well. Indeed, you might wish to combine a person's experience, intuition, and reasoning with the reasoning power of a program like OTTER. In a sense, OTTER lets you do just that, through the use of strategy, as shown in the following examples.

• You have some experience that suggests one subexpression merits preference over another, such as one involving sum over one involving product. You can use the weighting strategy [12] to guide the program accordingly (a toy sketch of this idea appears just after this list). In another example, you might want to design circuits with minimal, but perhaps some, use of OR gates. With weighting, you can instruct the program to follow your preference.
• Your intuition, or even a wild guess, tells you that the steps of a proof you have in hand could be used profitably to find a proof only distantly related. Either of two strategies, hints [7] or resonance [9], can be used to achieve your intent.

• You conjecture that certain statements in the problem description, as parents, merit immediate visiting and even immediate revisiting when conclusions are to be drawn. You can have your way by using the hot list strategy [10].

Using these and other strategies, and guided by your own reasoning, experience, and intuition, OTTER can assist you in solving a wide variety of problems.
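As promised in the first item above, the following Python sketch gives a toy picture of the weighting idea only; it is not OTTER's weighting language, and the bound and the weight function are assumptions of mine. Candidate conclusions are ordered by a user-chosen weight that penalizes products, and anything heavier than a user-chosen upper bound is discarded.

    # Toy illustration of weighting: keep the lightest candidates first, discard the heavy ones.

    import heapq

    MAX_WEIGHT = 12          # assumed user-chosen bound

    def weight(term):
        """Count symbols, but charge extra for products so that sums are preferred."""
        if isinstance(term, str):
            return 1
        op, left, right = term
        penalty = 2 if op == '*' else 0
        return 1 + penalty + weight(left) + weight(right)

    def pick_given(candidates):
        """Yield retained candidates in order of increasing weight."""
        queue = [(weight(t), i, t) for i, t in enumerate(candidates) if weight(t) <= MAX_WEIGHT]
        heapq.heapify(queue)
        while queue:
            w, _, t = heapq.heappop(queue)
            yield w, t

    terms = [('+', 'a', ('+', 'b', 'c')), ('*', ('*', 'a', 'b'), 'c'), ('+', 'a', 'b')]
    for w, t in pick_given(terms):
        print(w, t)      # lightest (sum-heavy) terms come first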
A Replacement for People?

OTTER is not a replacement for a person. As noted, it cannot draw on intuition or experience, nor can it formulate new concepts. Such a program does not learn, nor is it self-analytical in the following sense: When you choose an approach to attacking a problem, you often monitor (in some fashion) your progress and, based on that monitoring, modify that approach as you go along. Someday, but not now, an automated reasoning program will have that self-analytical capacity (see [8]). An automated reasoning program lacks common sense, cannot judge the accuracy of your statements, and cannot test for omissions in the problem description. It trusts you.

In many other ways, however, a program like OTTER provides excellent incentives for relying on it instead of relying exclusively on a person. First, OTTER is remarkably powerful, being able to draw thousands of conclusions per second. And it does not tire, continuing to draw conclusions at virtually the same rate even after millions of conclusions have been drawn. OTTER does not leave huge gaps in its reasoning, gaps that would force you to guess the not-always-obvious omitted steps. It explicitly presents its work in an output file, including the history of each of its conclusions. If your initial attack on a problem fails, you can examine what OTTER has done and modify your approach or correct the input. You might, for example, find that the problem description is inconsistent.

Moreover, OTTER can have multiple personalities (without the attendant psychological problems). In particular, if you have access to a number of computers, you can apply a multifaceted attack, having the program try different approaches on different machines. Such a multifaceted attack might enable you to explore radically different approaches that might ordinarily be rejected because of the required effort.
OTTER was not designed to replace people, or even to imitate people. Will it or will any other automated reasoning program replace scientists and engineers? Never. Unquestionably, automated reasoning has made great strides in the past few years. But at most, I expect automated reasoning programs to enable people to devote their energy and time to bigger pictures, having reasoning programs attack smaller problems quickly and flawlessly and, occasionally, answer deep questions without assistance.
REFERENCES
1. Boyer, R., and Moore, J. A Computational Logic Handbook. Academic Press, New York, 1988 (see also www.cli.com/software/nqthm/obtaining.html).
2. Kolata, G. With major math proof, brute computers show flash of reasoning power. New York Times, Dec. 10, 1996, C1.
3. McCune, W. OTTER 3.0 reference manual and guide. Technical Report ANL-94/6, Argonne National Laboratory, Argonne, Ill., 1994.
4. McCune, W. Solution of the Robbins problem. J. Autom. Reasoning 19, 3 (Dec. 1997), 277-318.
5. McCune, W., and Padmanabhan, R. Automated Deduction in Equational Logic and Cubic Curves. Lecture Notes in Computer Science 1095. Springer-Verlag, Heidelberg, Germany, 1996; see also www.mcs.anl.gov/home/mccune/ar/monograph/.
6. Parrello, B., Kabat, W., and Wos, L. Job-shop scheduling using automated reasoning: A case study of the car-sequencing problem. J. Autom. Reasoning 2, 1 (1986), 1-42.
7. Veroff, R. Using hints to increase the effectiveness of an automated reasoning program: Case studies. J. Autom. Reasoning 16, 3 (June 1996), 223-239.
8. Wos, L. Automated Reasoning: 33 Basic Research Problems. Prentice-Hall, Englewood Cliffs, N.J., 1987.
9. Wos, L. The Automation of Reasoning: An Experimenter's Notebook with OTTER Tutorial. Academic Press, New York, 1996.
10. Wos, L. OTTER and the Moufang identity problem. J. Autom. Reasoning 17, 2 (Oct. 1996), 259-289.
11. Wos, L., and Pieper, G., eds. Special issue on automated reasoning. Computers and Mathematics with Applications 29, 2 (Feb. 1995), 133-178.
12. Wos, L., Overbeek, R., Lusk, E., and Boyle, J. Automated Reasoning: Introduction and Applications, 2nd ed. McGraw-Hill, New York, 1992.

This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.

LARRY WOS ([email protected]) is a senior mathematician in the Mathematics and Computer Science Division of Argonne National Laboratory. He is president of the Association for Automated Reasoning, a member of the editorial board of the Journal of Automated Reasoning (which he founded in 1983), and the first recipient of the Herbrand Award in Automated Deduction, presented in 1992.

Unit Preference (Wos): Directs a program's reasoning by preferring nonempty statements free of logical or as parents for drawing conclusions.
Set of Support (Wos): Restricts a program's reasoning by preventing it from drawing conclusions whose parents are all among statements the user (in effect) designates as general information.
Weighting (Overbeek): Directs a program's reasoning by giving priority to terms the user selects and restricting a program's reasoning by discarding new conclusions whose complexity exceeds a user-chosen upper bound.
Ratio (McCune): Directs a program's reasoning by focusing on a combination of least-complex information and first-come first-served information, where the combination is determined by the user.
Resonance (Wos): Directs a program's reasoning based on the inclusion of formulas and equations that the user conjectures merit high priority.
Hints (Veroff): Directs a program's reasoning based on the inclusion of formulas and equations that the user conjectures merit proving.
Hot List (Wos): Rearranges conclusion-drawing by visiting and revisiting as parents statements that the user conjectures to be especially significant.
From Variable (Wos): Restricts a program's reasoning in the context of equality-oriented reasoning (paramodulation) by preventing it from substituting from a variable.
Into Variable (Wos): Restricts a program's reasoning in the context of equality-oriented reasoning (paramodulation) by preventing it from substituting into a variable.

Figure 1. Examples of strategy
Figure 3. Progress in automated reasoning
As background for the Robbins problem, note that a Boolean algebra is the mathematical abstraction of the study of logical and, or, and not. The Robbins problem asks if the following three axioms completely characterize Boolean algebra, where + can be thought of as logical or, and the function n can be thought of as logical not.

x + y = y + x.                     % Commutativity of +
(x + y) + z = x + (y + z).         % Associativity of +
n(n(n(x) + y) + n(x + y)) = y.     % Robbins axiom

From a set-theoretic perspective, + can be thought of as union and the function n as complement.

Figure 2. The Robbins problem
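As a quick sanity check, unrelated to the actual proof, the following Python sketch confirms that the three axioms do hold when + is read as logical or and n as logical not over {0, 1}; the open question, of course, was whether the axioms force every model to be a Boolean algebra.

    # Check the three Robbins axioms in the two-element Boolean algebra. Illustrative only.

    def plus(x, y):
        return x | y      # + read as logical or

    def n(x):
        return 1 - x      # n read as logical not

    for x in (0, 1):
        for y in (0, 1):
            assert plus(x, y) == plus(y, x)                          # commutativity of +
            assert n(plus(n(plus(n(x), y)), n(plus(x, y)))) == y     # Robbins axiom
            for z in (0, 1):
                assert plus(plus(x, y), z) == plus(x, plus(y, z))    # associativity of +
    print("all three axioms hold in the two-element Boolean algebra")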
Figure 4. Solution to the two-inverter puzzle, which asked for the design of a circuit in which there are three inputs and three outputs and only two NOT gates