04 Solucionario Inteligencia Artificial 2da Edicion Stuart Russell, Peter Norvig

•
Xochicalli

Eddie Wright 17/06/2024
Prévia do material em texto
LIBROS UNIVERISTARIOS Y SOLUCIONARIOS DE MUCHOS DE ESTOS LIBROS GRATIS EN DESCARGA DIRECTASIGUENOS EN:VISITANOS PARA DESARGALOS GRATIS.http://librosysolucionarios.nethttps://twitter.com/Libros_y_Soluhttps://www.facebook.com/pages/Solucionarios-de-Libros/345772498866324https://plus.google.com/b/113394888343830071226/113394888343830071226Instructor’sManual:ExerciseSolutionsforArtificial IntelligenceA ModernApproachSecondEditionStuartJ.RussellandPeterNorvigUpperSaddleRiver, New Jersey 07458http://librosysolucionarios.netLibrary of CongressCataloging-in-PublicationDataRussell,StuartJ. (StuartJonathan)Instructor’s solutionmanualfor artificial intelligence: a modernapproach(secondedition)/ StuartRussell,PeterNorvig.Includesbibliographicalreferencesandindex.1. Artificial intelligence I. Norvig, Peter. II. Title.Vice PresidentandEditorialDirector, ECS:Marcia J. HortonPublisher:Alan R.AptAssociateEditor: Toni DianneHolmEditorialAssistant:Patrick LindnerVice PresidentandDirectorof ProductionandManufacturing,ESM: DavidW. RiccardiExecutiveManagingEditor: VinceO’BrienManagingEditor: CamilleTrentacosteProductionEditor: Mary MasseyManufacturingManager:Trudy PisciottiManufacturingBuyer: Lisa McDowellMarketingManager:PamelaShafferc�2003PearsonEducation,Inc.PearsonPrenticeHallPearsonEducation,Inc.UpperSaddleRiver, NJ07458All rightsreserved.No partof this manualmaybereproducedin any form or by any means,withoutpermissionin writing from thepublisher.PearsonPrenticeHall R�is a trademarkof PearsonEducation,Inc.Printedin theUnitedStatesof America10 9 8 7 6 5 4 3 2 1ISBN: 0-13-090376-0PearsonEducationLtd., LondonPearsonEducationAustraliaPty. Ltd., SydneyPearsonEducationSingapore,Pte.Ltd.PearsonEducationNorthAsiaLtd., HongKongPearsonEducationCanada,Inc.,TorontoPearsonEducacíondeMexico, S.A.deC.V.PearsonEducation—Japan,TokyoPearsonEducationMalaysia,Pte.Ltd.PearsonEducation,Inc.,UpperSaddleRiver, New Jerseyhttp://librosysolucionarios.netPrefaceThis Instructor’s SolutionManualprovidessolutions(or at leastsolutionsketches)foralmostall of the 400 exercisesin Artificial Intelligence: A ModernApproach (SecondEdi-tion). Weonly giveactualcodefor a few of theprogrammingexercises;writing a lot of codewouldnotbethathelpful, if only becausewedon’t know whatlanguageyouprefer.In many cases,we give ideasfor discussionand follow-up questions,and we try toexplainwhywe designedeachexercise.Thereis moresupplementarymaterialthat we want to offer to the instructor, but wehave decidedto do it throughthemediumof theWorld Wide Webratherthanthrougha CDor printedInstructor’s Manual.Theideais thatthissolutionmanualcontainsthematerialthatmustbekeptsecretfrom students,but theWebsitecontainsmaterialthatcanbeupdatedandaddedto in amoretimely fashion.Theaddressfor thewebsiteis:http://aima.cs.b er ke le y. eduandtheaddressfor theonlineInstructor’s Guideis:http://aima.cs.b er ke le y. edu /i ns tr uc tor s. ht mlThereyouwill find:� Instructionson how to join the aima-instructors discussionlist. We stronglyrecom-mendthat you join so that you can receive updates,corrections,notification of newversionsof this SolutionsManual,additionalexercisesandexam questions,etc., in atimely manner.� Sourcecodefor programsfrom thetext. We offer codein Lisp, Python,andJava, andpoint to codedevelopedby othersin C++ andProlog.� Programmingresourcesandsupplementaltexts.� Figuresfrom thetext; for overheadtransparencies.� Terminologyfrom theindex of thebook.� Othercoursesusingthebookthathave homepageson theWeb. You canseeexamplesyllabi andassignmentshere. Pleasedo not put solutionsetsfor AIMA exercisesonpublicwebpages!� AI Educationinformationon teachingintroductoryAI courses.� OthersitesontheWebwith informationonAI. Organizedby chapterin thebook;checkthis for supplementalmaterial.We welcomesuggestionsfor new exercises,new environmentsandagents,etc. Thebookbelongsto you, theinstructor, asmuchasus. We hopethatyou enjoy teachingfrom it,thatthesesupplementalmaterialshelp,andthatyouwill shareyour supplementsandexperi-enceswith otherinstructors.iiihttp://librosysolucionarios.nethttp://librosysolucionarios.netSolutionsfor Chapter1Introduction1.1a. Dictionary definitionsof intelligence talk about “the capacityto acquireand applyknowledge” or “the faculty of thoughtandreason”or “the ability to comprehendandprofit from experience.” Theseareall reasonableanswers,but if we want somethingquantifiablewe would usesomethinglike “the ability to apply knowledgein ordertoperformbetterin anenvironment.”b. We defineartificial intelligenceasthestudyandconstructionof agentprogramsthatperformwell in agivenenvironment,for agivenagentarchitecture.c. We defineanagentasanentity that takesactionin responseto perceptsfrom anenvi-ronment.1.2 Seethesolutionfor exercise26.1for somediscussionof potentialobjections.Theprobabilityof fooling aninterrogatordependson just how unskilledtheinterroga-tor is. Oneentrantin the2002Loebnerprizecompetition(which is not quite a real TuringTest)did fool one judge,althoughif you look at the transcript,it is hard to imaginewhatthat judgewasthinking. Therecertainlyhave beenexamplesof a chatbotor otheronlineagentfooling humans. For example,seeSeeLenny Foner’s accountof the Julia chatbotat foner.www.media.mit.edu/people/foner/Julia/. We’d say the chancetoday is somethinglike 10%,with the variationdependingmoreon theskill of the interrogatorratherthantheprogram.In 50 years,we expectthattheentertainmentindustry(movies,videogames,com-mercials)will have madesufficient investmentsin artificial actorsto createvery credibleimpersonators.1.3 The2002Loebnerprize(www.loebner.net)went to Kevin Copple’s programELLA . Itconsistsof a prioritizedsetof pattern/actionrules: if it seesa text stringmatchinga certainpattern,it outputsthe correspondingresponse,which may includepiecesof the currentorpastinput. It alsohasa large databaseof text andhasthe Wordnetonline dictionary. It isthereforeusingratherrudimentarytools,andis not advancingthetheoryof AI. It is provid-ing evidenceon the numberandtype of rulesthat aresufficient for producingonetype ofconversation.1.4 No. It meansthatAI systemsshouldavoid trying to solveintractableproblems.Usually,thismeansthey canonly approximateoptimalbehavior. Noticethathumansdon’t solve NP-completeproblemseither. Sometimesthey aregoodatsolvingspecificinstanceswith a lot of1http://librosysolucionarios.net2 Chapter 1. Introductionstructure,perhapswith theaid of backgroundknowledge. AI systemsshouldattemptto dothesame.1.5 No. IQ testscorescorrelatewell with certainothermeasures,suchassuccessin college,but only if they’re measuringfairly normalhumans.TheIQ testdoesn’t measureeverything.A programthat is specializedonly for IQ tests(andspecializedfurtheronly for theanalogypart)would very likely performpoorly on othermeasuresof intelligence.SeeTheMismea-sure of Man by StephenJayGould, Norton, 1981or Multiple intelligences: the theory inpracticeby HowardGardner, BasicBooks,1993for moreon IQ tests,what they measure,andwhatotheraspectsthereareto “intelligence.”1.6 Justasyou areunawareof all the stepsthat go into makingyour heartbeat,you arealsounawareof mostof whathappensin your thoughts.You do have aconsciousawarenessof someof your thoughtprocesses,but themajority remainsopaqueto your consciousness.Thefield of psychoanalysisis basedon the ideathat oneneedstrainedprofessionalhelp toanalyzeone’s own thoughts.1.7a. (ping-pong)A reasonablelevel of proficiency wasachievedby Andersson’s robot(An-dersson,1988).b. (driving in Cairo)No. Althoughtherehasbeena lot of progressin automateddriving,all suchsystemscurrently rely on certainrelatively constantclues: that the roadhasshouldersanda centerline, thatthecaraheadwill travel a predictablecourse,thatcarswill keepto their sideof theroad,andsoon. To ourknowledge,noneareableto avoidobstaclesor othercarsor to changelanesasappropriate;theirskillsaremostlyconfinedto stayingin onelaneatconstantspeed.Driving in downtown Cairois toounpredictablefor any of theseto work.c. (shoppingat themarket)No. No robotcancurrentlyput togetherthetasksof moving inacrowdedenvironment,usingvision to identify awidevarietyof objects,andgraspingtheobjects(includingsquishablevegetables)without damagingthem. Thecomponentpiecesarenearlyableto handletheindividual tasks,but it would take a major integra-tion effort to put it all together.d. (shoppingon the web) Yes. Softwarerobotsarecapableof handlingsuchtasks,par-ticularly if thedesignof thewebgroceryshoppingsitedoesnot changeradicallyovertime.e. (bridge)Yes.ProgramssuchasGIB now playatasolid level.f. (theoremproving) Yes. For example,theproof of Robbinsalgebradescribedon page309.g. (funny story) No. While somecomputer-generatedproseand poetry is hystericallyfunny, this is invariably unintentional,except in the caseof programsthat echobackprosethatthey have memorized.h. (legal advice)Yes,in somecases.AI hasa long history of researchinto applicationsof automatedlegal reasoning.Two outstandingexamplesaretheProlog-basedexperthttp://librosysolucionarios.net3systemsusedin theUK to guidemembersof thepublic in dealingwith theintricaciesofthesocialsecurityandnationalitylaws. Thesocialsecuritysystemis saidto havesavedtheUK governmentapproximately$150million in its first yearof operation.However,extensioninto morecomplex areassuchascontractlaw awaitsa satisfactoryencodingof thevastwebof common-senseknowledgepertainingto commercialtransactionsandagreementandbusinesspractices.i. (translation)Yes. In a limited way, this is alreadybeingdone. SeeKay, Gawron andNorvig (1994)andWahlster(2000)for an overview of thefield of speechtranslation,andsomelimitationson thecurrentstateof theart.j . (surgery) Yes. Robotsareincreasinglybeingusedfor surgery, althoughalwaysunderthecommandof adoctor.1.8 Certainlyperceptionandmotorskills areimportant,andit is agoodthingthatthefieldsof vision androboticsexist (whetheror not you want to considerthempart of “core” AI).But given a percept,an agentstill hasthe taskof “deciding” (eitherby deliberationor byreaction)which action to take. This is just as true in the real world as in artificial micro-worldssuchaschess-playing.Socomputingtheappropriateactionwill remainacrucialpartof AI, regardlessof theperceptualandmotorsystemtowhichtheagentprogramis “attached.”On theotherhand,it is true thata concentrationon micro-worldshasled AI away from thereally interestingenvironments(seepage46).1.9 Evolution tendsto perpetuateorganisms(and combinationsand mutationsof organ-isms) that aresuccesfulenoughto reproduce.That is, evolution favors organismsthat canoptimizetheirperformancemeasureto at leastsurvive to theageof sexualmaturity, andthenbeableto win a mate.Rationalityjust meansoptimizingperformancemeasure,so this is inline with evolution.1.10 Yes, they are rational, becauseslower, deliberative actionswould tend to result inmoredamageto the hand. If “intelligent” means“applying knowledge” or “using thoughtandreasoning”thenit doesnot requireintelligenceto make a reflex action.1.11 Thisdependsonyourdefinitionof “intelligent” and“tell.” In onesensecomputersonlydo what theprogrammerscommandthemto do,but in anothersensewhat theprogrammersconsciouslytellsthecomputerto dooftenhasverylittle to dowith whatthecomputeractuallydoes. Anyonewho haswritten a programwith an ornerybug knows this, asdoesanyonewho haswritten a successfulmachinelearningprogram.So in onesenseSamuel“told” thecomputer“learn to play checkers betterthan I do, and thenplay that way,” but in anothersensehetold thecomputer“follow this learningalgorithm” andit learnedto play. Sowe’releft in thesituationwhereyoumayor maynotconsiderlearningto playcheckersto bessignof intelligence(or you maythink that learningto play in theright way requiresintelligence,but not in this way), andyou maythink the intelligenceresidesin theprogrammeror in thecomputer.1.12 Thepointof thisexerciseis to noticetheparallelwith thepreviousone.Whateveryoudecidedaboutwhethercomputerscouldbeintelligentin 1.9,youarecommittedto makingthehttp://librosysolucionarios.net4 Chapter 1. Introductionsameconclusionaboutanimals(includinghumans),unlessyourreasonsfor decidingwhethersomethingis intelligent take into accountthe mechanism(programmingvia genesversusprogrammingvia a humanprogrammer).Note thatSearlemakesthis appealto mechanismin his ChineseRoomargument(seeChapter26).1.13 Again, thechoiceyoumake in 1.11drivesyour answerto thisquestion.http://librosysolucionarios.netSolutionsfor Chapter2IntelligentAgents2.1 Thefollowing arejust someof themany possibledefinitionsthatcanbewritten:� Agent: an entity thatperceivesandacts;or, onethat canbe viewedasperceiving andacting.Essentiallyany objectqualifies;thekey point is theway theobjectimplementsan agentfunction. (Note: someauthorsrestrict the term to programsthat operateonbehalfof a human,or to programsthat cancausesomeor all of their codeto run onothermachineson anetwork, asin mobile agents.)������� ���������� Agentfunction: a functionthatspecifiestheagent’s actionin responseto everypossibleperceptsequence.� Agent program: that programwhich, combinedwith a machinearchitecture,imple-mentsan agentfunction. In our simpledesigns,the programtakesa new perceptoneachinvocationandreturnsanaction.� Rationality: a propertyof agentsthatchooseactionsthatmaximizetheir expectedutil-ity, giventheperceptsto date.� Autonomy: a propertyof agentswhosebehavior is determinedby their own experienceratherthansolelyby their initial programming.� Reflex agent: anagentwhoseactiondependsonly on thecurrentpercept.� Model-basedagent: an agentwhoseactionis derived directly from an internalmodelof thecurrentworld statethatis updatedover time.� Goal-basedagent: an agentthat selectsactionsthat it believeswill achieve explicitlyrepresentedgoals.� Utility-basedagent: an agentthat selectsactionsthat it believes will maximizetheexpectedutility of theoutcopmestate.� Learningagent: anagentwhosebehavior improvesover time basedon its experience.2.2 A performancemeasureis usedby an outsideobserver to evaluatehow successfulanagentis. It is a functionfrom historiesto arealnumber. A utility functionis usedby anagentit*elf to evaluatehow desirablestatesor historiesare. In our framework, theutility functionmaynotbethesameastheperformancemeasure;furthermore,anagentmayhavenoexplicitutility functionatall, whereasthereis alwaysaperformancemeasure.2.3 Although thesequestionsarevery simple,they hint at somevery fundamentalissues.Ouranswersarefor thesimpleagentdesignsfor staticenvironmentswherenothinghappens5http://librosysolucionarios.net6 Chapter 2. IntelligentAgentswhile the agentis deliberating;the issuesget even more interestingfor dynamicenviron-ments.a. Yes;take any agentprogramandinsertnull statementsthatdonotaffect theoutput.b. Yes; the agentfunction might specify that the agentprint ������� whenthe perceptis aTuring machineprogramthat halts,and��������� otherwise. (Note: in dynamicenviron-ments,for machinesof lessthaninfinite speed,therationalagentfunctionmaynot beimplementable;e.g.,theagentfunction thatalwaysplaysa winning move, if any, in agameof chess.)c. Yes;theagent’s behavior is fixedby thearchitectureandprogram.d. Thereare "! agentprograms,althoughmany of thesewill not run at all. (Note: Anygivenprogramcandevoteat most # bits to storage,so its internalstatecandistinguishamongonly "! pasthistories.Becausetheagentfunctionspecifiesactionsbasedonper-cepthistories,therewill bemany agentfunctionsthatcannotbeimplementedbecauseof lackof memoryin themachine.)2.4 Noticethatfor our simpleenvironmentalassumptionswe neednotworry aboutquanti-tative uncertainty.a. It sufficesto show thatfor all possibleactualenvironments(i.e.,all dirt distributionsandinitial locations),thisagentcleansthesquaresat leastasfastasany otheragent.This istriviallytruewhenthereis no dirt. Whenthereis dirt in theinitial locationandnoneintheotherlocation,theworld is cleanafteronestep;no agentcando better. Whenthereis nodirt in theinitial locationbut dirt in theother, theworld is cleanaftertwo steps;noagentcando better. Whenthereis dirt in both locations,theworld is cleanafter threesteps;noagentcandobetter. (Note: in general,theconditionstatedin thefirst sentenceof thisansweris muchstricterthannecessaryfor anagentto berational.)b. Theagentin (a) keepsmoving backwardsandforwardseven after theworld is clean.It is betterto do $&%('*) oncethe world is clean(the chaptersaysthis). Now, sincethe agent’s perceptdoesn’t saywhetherthe othersquareis clean,it would seemthatthe agentmusthave somememoryto saywhetherthe othersquarehasalreadybeencleaned. To make this argumentrigorousis more difficult—for example,could theagentarrangethingssothatit wouldonly bein acleanleft squarewhentheright squarewasalreadyclean? As a generalstrategy, an agentcan usethe environmentit*elf asa form of external memory—a commontechniquefor humanswho usethings like,+-�.�/0���01������2/-3appointmentcalendarsandknotsin handkerchiefs.In thisparticularcase,however, thatis not possible.Considerthereflex actionsfor 4 5&6879�����;:=< and 4 >?6879�����;:@< . If eitheroftheseis $&%('*) , thentheagentwill fail in thecasewherethat is the initial perceptbutthe othersquareis dirty; hence,neithercanbe $A%B'*) andthereforethe simplereflexagentis doomedto keepmoving. In general,theproblemwith reflex agentsis thattheyhave to do the samething in situationsthat look the same,even whenthe situationsareactuallyquite different. In the vacuumworld this is a big liability, becauseeveryinterior square(excepthome)looks either like a squarewith dirt or a squarewithoutdirt.http://librosysolucionarios.net7AgentType PerformanceMeasureEnvironment Actuators SensorsRobotsoccerplayerWinning game,goalsfor/againstField,ball, ownteam,otherteam,own bodyDevices(e.g.,legs)forlocomotionandkickingCamera,touchsensors,accelerometers,orientationsensors,wheel/jointencodersInternetbook-shoppingagentObtainre-quested/interestingbooks,minimizeexpenditureInternet Follow link,enter/submitdatain fields,displayto userWebpages,userrequestsAutonomousMarsroverTerrainexploredandreported,samplesgatheredandanalyzedLaunchvehicle,lander, MarsWheels/legs,samplecollectiondevice,analysisdevices,radiotransmitterCamera,touchsensors,accelerometers,orientationsensors,,wheel/jointencoders,radioreceiverMathematician’stheorem-provingassistantFigureS2.1 AgenttypesandtheirPEASdescriptions,for Ex. 2.5.c. If we considerasymptoticallylong lifetimes, then it is clear that learninga map (insomeform) confersan advantagebecauseit meansthat the agentcanavoid bumpinginto walls. It canalso learn wheredirt is most likely to accumulateandcan devisean optimal inspectionstrategy. The precisedetailsof the explorationmethodneededto constructa completemap appearin Chapter4; methodsfor deriving an optimalinspection/cleanupstrategy arein Chapter21.2.5 Somerepresentative, but notexhaustive,answersaregivenin FigureS2.1.2.6 Environmentpropertiesaregivenin FigureS2.2.Suitableagenttypes:a. A model-basedreflex agentwould suffice for mostaspects;for tacticalplay, a utility-basedagentwith lookaheadwouldbeuseful.b. A goal-basedagentwould be appropriatefor specificbook requests.For moreopen-endedtasks—e.g.,“Find mesomethinginterestingto read”—tradeoffs areinvolvedandtheagentmustcompareutilities for variouspossiblepurchases.http://librosysolucionarios.net8 Chapter 2. IntelligentAgentsTaskEnvironment Observable Deterministic Episodic Static Discrete AgentsRobotsoccer Partially Stochastic SequentialDynamic Continuous MultiInternetbook-shopping Partially DeterministicC Sequential StaticC Discrete SingleAutonomousMarsrover Partially Stochastic SequentialDynamic Continuous SingleMathematician’sassistant Fully Deterministic Sequential Semi Discrete MultiFigureS2.2 Environmentpropertiesfor Ex. 2.6.c. A model-basedreflex agentwould suffice for low-level navigationandobstacleavoid-ance;for routeplanning,explorationplanning,experimentation,etc.,somecombinationof goal-basedandutility-basedagentswouldbeneeded.d. For specificproof tasks,a goal-basedagentis needed.For “exploratory” tasks—e.g.,“Prove someusefullemmataconcerningoperationson strings”—autility-basedarchi-tecturemight beneeded.2.7 Thefile "agents/environm ents /v ac uum.l is p" in thecoderepositoryimple-mentsthe vacuum-cleanerenvironment. Studentscaneasilyextendit to generatedifferentshapedrooms,obstacles,andsoon.2.8 A reflex agentprogramimplementingtherationalagentfunctiondescribedin thechap-ter is asfollows:(defun reflex-rational -v ac uum-agent (percept)(destructuring- bi nd (location status) percept(cond ((eq status ’Dirty) ’Suck)((eq location ’A) ’Right)(t ’Left))))For states1, 3, 5, 7 in Figure3.20, the performancemeasuresare1996,1999,1998,2000respectively.2.9 Exercises2.4,2.9,and2.10maybemergedin futureprintings.a. No; seeanswerto 2.4(b).b. Seeanswerto 2.4(b).c. In this case,a simplereflex agentcanbe perfectlyrational. The agentcanconsistofa tablewith eightentries,indexed by percept,that specifiesanactionto take for eachpossiblestate.After theagentacts,theworld is updatedandthenext perceptwill telltheagentwhat to do next. For larger environments,constructinga tableis infeasible.Instead,theagentcould run oneof theoptimalsearchalgorithmsin Chapters3 and4andexecutethefirst stepof thesolutionsequence.Again,no internalstateis required,but it wouldhelpto beableto storethesolutionsequenceinsteadof recomputingit foreachnew percept.2.10http://librosysolucionarios.net9Figure S2.3 An environmentin which randommotion will take a long time to cover allthesquares.a. Becausetheagentdoesnot know thegeographyandperceivesonly locationandlocaldirt, andcanotrememberwhat just happened,it will get stuckforever againsta wallwhenit triesto move in adirectionthatis blocked—thatis, unlessit randomizes.b. Onepossibledesigncleansup dirt andotherwisemovesrandomly:(defun randomized-refle x- va cu um-agent (percept)(destructuring -b ind (location status) percept(cond ((eq status ’Dirty) ’Suck)(t (random-elemen t ’(Left Right Up Down))))))This is fairly closeto whattheRoombaDFE vacuumcleanerdoes(althoughtheRoombahasa bumpsensorandrandomizesonly whenit hits anobstacle).It worksreasonablywell in nice,compactenvironments.In maze-like environmentsor environmentswithsmallconnectingpassages,it cantake avery long time to cover all thesquares.c. An exampleis shown in FigureS2.3.Studentsmayalsowish to measureclean-uptimefor linearor squareenvironmentsof differentsizes,andcomparethoseto theefficientonlinesearchalgorithmsdescribedin Chapter4.d. A reflex agentwith statecanbuild a map(seeChapter4 for details).An onlinedepth-first exploration will reachevery statein time linear in the size of the environment;therefore,theagentcando muchbetterthanthesimplereflex agent.Thequestionof rationalbehavior in unknown environmentsisacomplex onebut it isworthencouragingstudentsto think aboutit. Weneedto have somenotionof thepriorhttp://librosysolucionarios.net10 Chapter 2. IntelligentAgentsprobaility distribution over the classof environments;call this the initial belief state.Any actionyields a new perceptthat canbe usedto updatethis distribution, movingtheagentto anew belief state.Oncetheenvironmentis completelyexplored,thebeliefstatecollapsesto a single possibleenvironment. Therefore,the problemof optimalexplorationcanbe viewedasa searchfor an optimalstrategy in thespaceof possiblebelief states.This is a well-defined,if horrendouslyintractable,problem. Chapter21discussessomecaseswhereoptimalexplorationis possible.Anotherconcreteexampleof explorationis theMinesweepercomputergame(seeExercise7.11). For very smallMinesweeperenvironments,optimal exploration is feasiblealthoughthebelief stateupdateis nontrivial to explain.2.11 Theproblemappearsat first to bevery similar; themaindifferenceis that insteadofusingthelocationperceptto build themap,theagenthasto “invent” its own locations(which,afterall, arejust nodesin a datastructurerepresentingthestatespacegraph).Whena bumpis detected,theagentassumesit remainsin thesamelocationandcanadda wall to its map.For grid environments,theagentcankeeptrackof its G�H96�IKJ locationandsocantell whenithasreturnedto anold state.In thegeneralcase,however, thereis no simpleway to tell if astateis new or old.2.12a. For areflex agent,thispresentsnoadditionalchallenge,becausetheagentwill continueto LF��M�N as long as the currentlocationremainsdirty. For an agentthat constructsasequentialplan, every LF��MON actionwould needto be replacedby “ L=�8MON until clean.”If the dirt sensorcanbe wrong on eachstep,thenthe agentmight want to wait for afew stepsto getamorereliablemeasurementbeforedecidingwhetherto LF��M�N or moveon to a new square. Obviously, thereis a trade-off becausewaiting too long meansthatdirt remainson thefloor (incurringa penalty),but actingimmediatelyriskseitherdirtying a cleansquareor ignoringa dirty square(if the sensoris wrong). A rationalagentmustalsocontinuetouring andcheckingthesquaresin caseit missedoneon aprevious tour (becauseof badsensorreadings).it is not immediatelyobvioushow thewaiting time at eachsquareshouldchangewith eachnew tour. Theseissuescanbeclarified by experimentation,which may suggesta generaltrend that canbe verifiedmathematically. This problemis a partially observableMarkov decisionprocess—seeChapter17. Suchproblemsarehard in general,but somespecialcasesmay yield tocarefulanalysis.b. In this case,theagentmustkeeptouring thesquaresindefinitely. Theprobability thata squareis dirty increasesmonotonicallywith thetime sinceit waslastcleaned,sotherationalstrategy is, roughlyspeaking,to repeatedlyexecutetheshortestpossibletourofall squares.(Wesay“roughly speaking”becausetherearecomplicationscausedby thefact that theshortesttour mayvisit somesquarestwice, dependingon thegeography.)Thisproblemis alsoapartially observableMarkov decisionprocess.http://librosysolucionarios.netSolutionsfor Chapter3SolvingProblemsby Searching3.1 A state is asituationthatanagentcanfind itself in. Wedistinguishtwo typesof states:world states(theactualconcretesituationsin therealworld) andrepresentationalstates(theabstractdescriptionsof therealworld thatareusedby theagentin deliberatingaboutwhattodo).A state spaceis a graphwhosenodesare the set of all states,and whoselinks areactionsthattransformonestateinto another.A search treeis a tree(a graphwith no undirectedloops)in which therootnodeis thestartstateandthesetof childrenfor eachnodeconsistsof thestatesreachableby takinganyaction.A search node is a nodein thesearchtree.A goal is astatethattheagentis trying to reach.An action is somethingthattheagentcanchooseto do.A successorfunction describedthe agent’s options: given a state,it returnsa setof(action,state)pairs,whereeachstateis thestatereachableby takingtheaction.Thebranching factor in a searchtreeis thenumberof actionsavailableto theagent.3.2 In goal formulation,we decidewhich aspectsof the world we are interestedin, andwhich canbe ignoredor abstractedaway. Thenin problemformulationwe decidehow tomanipulatetheimportantaspects(andignoretheothers).If wedid problemformulationfirstwewouldnot know whatto includeandwhatto leaveout. Thatsaid,it canhappenthatthereis a cycle of iterationsbetweengoal formulation,problemformulation,andproblemsolvinguntil onearrivesatasufficiently usefulandefficient solution.3.3 In Pythonwe have:#### successor_fn defined in terms of result and legal_actionsdef successor_fn(s) :return [(a, result(a, s)) for a in legal_actions(s )]#### legal_actions and result defined in terms of successor_fndef legal_actions(s ):return [a for (a, s) in successor_fn(s) ]def result(a, s):11http://librosysolucionarios.net12 Chapter 3. SolvingProblemsby Searchingfor (a1, s1) in successor_fn(s) :if a == a1:return s13.4 From http://www.cut-the-knot.com/pythagoras/fifteen.shtml, this proof appliesto thefifteenpuzzle,but thesameargumentworksfor theeightpuzzle:Definition: Thegoalstatehasthenumbersin acertainorder, whichwewill measureasstartingat theupperleft corner, thenproceedingleft to right, andwhenwe reachtheendof arow, goingdown to theleftmostsquarein therow below. For any otherconfigurationbesidesthegoal,whenever a tile with a greaternumberon it precedesa tile with a smallernumber,thetwo tilesaresaidto be inverted.Proposition: For agivenpuzzleconfiguration,let P denotethesumof thetotalnumberof inversionsandtherow numberof theemptysquare.Then G�PRQTSVU; WJ is invariantunderanylegal move. In otherwords,after a legal move an odd P remainsodd whereasan even Premainseven. Thereforethegoalstatein Figure3.4,with no inversionsandemptysquareinthefirst row, has PYX[Z , andcanonly bereachedfrom startingstateswith odd P , not fromstartingstateswith even P .Proof: First of all, sliding a tile horizontallychangesneitherthe total numberof in-versionsnor the row numberof the emptysquare. Thereforelet us considersliding a tilevertically.Let’s assume,for example,that the tile 5 is locateddirectly over the emptysquare.Sliding it down changestheparity of therow numberof theemptysquare.Now considerthetotal numberof inversions.Themoveonly affectsrelative positionsof tiles 5 , > , \ , and ] .If noneof the > , \ , ] causedaninversionrelative to 5 (i.e.,all threearelargerthan 5 ) thenafter sliding onegetsthree(an odd number)of additionalinversions. If oneof the threeissmallerthan 5 , thenbeforethemove > , \ , and ] contributeda singleinversion(relative to5 ) whereasafterthemove they’ ll becontributing two inversions- achangeof 1, alsoanoddnumber. Two additionalcasesobviously leadto thesameresult.Thusthechangein thesumP is alwayseven.This is preciselywhatwe have setout to show.Sobeforewe solve apuzzle,weshouldcomputethe P valueof thestartandgoalstateandmake surethey have thesameparity, otherwiseno solutionis possible.3.5 Theformulationputsonequeenpercolumn,with a new queenplacedonly in a squarethatis not attackedby any otherqueen.To simplify matters,we’ll first considerthe # –rooksproblem.Thefirst rook canbeplacedin any squarein column1, thesecondin any squareincolumn2 exceptthesamerow thatasthe rook in column1, andin generaltherewill be #_^elementsof thesearchspace.3.6 No, a finite statespacedoesnot always leadto a finite searchtree. Considera statespacewith two states,bothof whichhaveactionsthatleadto theother. Thisyieldsaninfinitesearchtree,becausewe cango backandforth any numberof times. However, if the statespaceis afinite tree,or in general,afinite DAG (directedacyclic graph),thentherecanbenoloops,andthesearchtreeis finite.3.7http://librosysolucionarios.net13a. Initial state:No regionscolored.Goaltest:All regionscolored,andno two adjacentregionshave thesamecolor.Successorfunction:Assignacolor to a region.Costfunction:Numberof assignments.b. Initial state:As describedin thetext.Goaltest:Monkey hasbananas.Successorfunction: Hop on crate;Hop off crate;Pushcratefrom onespotto another;Walk from onespotto another;grabbananas(if standingon crate).Costfunction:Numberof actions.c. Initial state:consideringall input records.Goaltest:consideringasinglerecord,andit gives“illegal input” message.Successorfunction: run againon thefirst half of therecords;run againon thesecondhalf of therecords.Costfunction:Numberof runs.Note: This is a contingencyproblem; you needto seewhethera run givesan errormessageor not to decidewhatto do next.d. Initial state:jugshave values 4 `W6"`W6"`�< .Successorfunction: givenvalues 4 H96�IK6"a< , generate4 ZW W6�IK6"a< , 4 Hb6"cW6"a< , 4 Hb6�Id6"e�< (by fill-ing); 4 `W6�Id6"a< , 4 H96"`W6"a< , 4 H96�IK6"`�< (by emptying);or for any two jugswith currentvaluesH and I, pour I into H ; this changesthe jug with H to theminimumof HgfhI andthecapacityof thejug, anddecrementsthejug with I by by theamountgainedby thefirstjug.Costfunction:Numberof actions.12 34 5 6 78 9 10 1211 13 14 15FigureS3.1 Thestatespacefor theproblemdefinedin Ex. 3.8.3.8a. SeeFigureS3.1.b. Breadth-first:1 2 3 4 5 6 7 8 9 10 11Depth-limited:1 2 4 8 9 5 10 11Iterative deepening:1; 1 2 3; 1 2 4 5 3 6 7; 1 2 4 8 9 5 1011http://librosysolucionarios.net14 Chapter 3. SolvingProblemsby Searchingc. Bidirectionalsearchis veryuseful,becausetheonly successorof # in thereversedirec-tion is i�G�#_jW WJ�k . Thishelpsfocusthesearch.d. 2 in theforwarddirection;1 in thereversedirection.e. Yes;startat thegoal,andapplythesinglereversesuccessoractionuntil you reach1.3.9a. Hereis onepossiblerepresentation:A stateis asix-tupleof integerslisting thenumberof missionaries,cannibals,andboatson the first side,andthenthe seondsideof theriver. Thegoal is a statewith 3 missionariesand3 cannibalson thesecondside. Thecostfunctionis oneperaction,andthesuccessorsof a stateareall thestatesthatmove1 or 2 peopleand1 boatfrom onesideto another.b. The searchspaceis small, so any optimal algorithmworks. For an example,seethefile "search/domains/ ca nnib al s.l is p" . It sufficesto eliminatemovesthatcircle backto thestatejust visited. From all but thefirst andlast states,thereis onlyoneotherchoice.c. It is not obvious thatalmostall movesareeitherillegal or revert to theprevious state.Thereis a feelingof a largebranchingfactor, andno clearway to proceed.3.10 For the8 puzzle,thereshouldn’t bemuchdifferencein performance.Indeed,thefile"search/domains /p uz zl e8. li sp " shows thatyoucanrepresentan8 puzzlestateasa single32-bit integer, so the questionof modifying or copying datais moot. But for the#mlg# puzzle,as # increases,the advantageof modifying ratherthancopying grows. Thedisadvantageof a modifying successorfunction is that it only workswith depth-firstsearch(or with avariantsuchasiterative deepening).3.11 a. The algorithmexpandsnodesin orderof increasingpathcost; thereforethe firstgoalit encounterswill bethegoalwith thecheapestcost.b. It will be the sameas iterative deepening,U iterations,in which noG�p2q�J nodesaregenerated.c. U;jWrd. Implementationnot shown.3.12 If thereare two pathsfrom the startnodeto a given node,discardingthe moreex-pensive one cannoteliminateany optimal solution. Uniform-costsearchand breadth-firstsearchwith constantstepcostsbothexpandpathsin orderof s -cost.Therefore,if thecurrentnodehasbeenexpandedpreviously, thecurrentpathto it mustbe moreexpensive thanthepreviously foundpathandit is correctto discardit.For IDS, it iseasytofindanexamplewith varyingstepcostswherethealgorithmreturnsasuboptimalsolution:simplyhave two pathsto thegoal,onewith onestepcosting3 andtheotherwith two stepscosting1 each.3.13 Consideradomainin whicheverystatehasasinglesuccessor, andthereis asinglegoalat depth# . Thendepth-firstsearchwill find thegoal in # steps,whereasiterative deepeningsearchwill take Ztfu vfwevf*ck"x"x;fy#TXznoG�#|{WJ steps.http://librosysolucionarios.net153.14 As an ordinaryperson(or agent)browsing the web, we canonly generartethe suc-cessorsof a pageby visiting it. We canthendo breadth-firstsearch,or perhapsbest-searchsearchwheretheheuristicis somefunctionof thenumberof wordsin commonbetweenthestartandgoalpages;thismayhelpkeepthelinks ontarget.Searchengineskeepthecompletegraphof theweb,andmayprovide theuseraccessto all (or at leastsome)of thepagesthatlink to apage;thiswould allow usto do bidirectionalsearch.3.15a. If weconsiderall G�H96�IKJ points,thenthereareaninfinite numberof states,andof paths.b. (For this problem,we considerthe startandgoal pointsto be vertices.) The shortestdistancebetweentwo points is a straightline, and if it is not possibleto travel in astraightline becausesomeobstacleis in the way, thenthe next shortestdistanceis asequenceof line segments,end-to-end,that deviate from the straightline by as littleas possible. So the first segmentof this sequencemust go from the start point to atangentpoint on an obstacle– any paththat gave theobstaclea wider girth would belonger. Becausetheobstaclesarepolygonal,the tangentpointsmustbe at verticesoftheobstacles,andhencetheentirepathmustgo from vertex to vertex. Sonow thestatespaceis thesetof vertices,of which thereare35 in Figure3.22.c. Codenot shown.d. Implementationsandanalysisnot shown.3.16 Codenot shown.3.17a. Any path,no matterhow badit appears,might leadto anarbitraily largereward(nega-tivecost).Therefore,onewouldneedto exhaustall possiblepathsto besureof findingthebestone.b. Supposethegreatestpossiblerewardis } . Thenif wealsoknow themaximumdepthofthestatespace(e.g.whenthestatespaceisatree),thenany pathwith U levelsremainingcanbeimprovedby atmost }U , soany pathsworsethan }U lessthanthebestpathcanbepruned.For statespaceswith loops,this guaranteedoesn’t help,becauseit is possibleto go arounda loopany numberof times,picking up } rewwardeachtime.c. The agentshouldplan to go aroundthis loop forever (unlessit canfind anotherloopwith evenbetterreward).d. The valueof a scenicloop is lessenedeachtime onerevisits it; a novel scenicsightis a greatreward,but seeingthesameonefor thetenthtime in anhour is tedious,notrewarding. To accomodatethis, we would have to expandthestatespaceto includeamemory—astateis now representednot just by the currentlocation,but by a currentlocationanda bagof already-visitedlocations.Therewardfor visiting a new locationis now a(diminishing)functionof thenumberof timesit hasbeenseenbefore.e. Realdomainswith loopingbehavior includeeatingjunk food andgoingto class.3.18 Thebeliefstatespaceis shown in FigureS3.2.No solutionis possiblebecausenopathleadsto abeliefstateall of whoseelementssatisfythegoal. If theproblemis fully observable,http://librosysolucionarios.net16 Chapter 3. SolvingProblemsby SearchingLRL RS SSFigureS3.2 Thebeliefstatespacefor thesensorlessvacuumworld underMurphy’s law.theagentreachesagoalstateby executingasequencesuchthat ~K�9}� is performedonly in adirty square.Thisensuresdeterministicbehavior andeverystateis obviouslysolvable.3.19 Codenot shown, but a good start is in the coderepository. Clearly, graphsearchmustbeused—thisis a classicgrid world with many alternatepathsto eachstate.Studentswill quickly find thatcomputingtheoptimalsolutionsequenceis prohibitively expensive formoderatelylarge worlds,becausethestatespacefor an #�lo# world has#|{Ax� "! states.Thecompletiontimeof therandomagentgrowslessthanexponentiallyin # , sofor any reasonableexchangeratebetweensearchcostadpathcosttherandomagentwill eventuallywin.http://librosysolucionarios.netSolutionsfor Chapter4InformedSearchandExploration4.1 Thesequenceof queuesis asfollows:L[0+244=244]M[70+241=311],T[111+329=440]L[140+244=384],D[145+242=387],T[111+329=440]D[145+242=387],T[111+329=440],M[210+241=451],T[251+329=580]C[265+160=425],T[111+329=440],M[210+241=451],M[220+241=461],T[251+329=580]T[111+329=440],M[210+241=451],M[220+241=461],P[403+100=503],T[251+329=580],R[411+193=604],D[385+242=627]M[210+241=451],M[220+241=461],L[222+244=466],P[403+100=503],T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627]M[220+241=461],L[222+244=466],P[403+100=503],L[280+244=524],D[285+242=527],T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627]L[222+244=466],P[403+100=503],L[280+244=524],D[285+242=527],L[290+244=534],D[295+242=537],T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627]P[403+100=503],L[280+244=524],D[285+242=527],M[292+241=533],L[290+244=534],D[295+242=537],T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627],T[333+329=662]B[504+0=504],L[280+244=524],D[285+242=527],M[292+241=533],L[290+244=534],D[295+242=537],T[251+329=580],A[229+366=595],R[411+193=604],D[385+242=627],T[333+329=662],R[500+193=693],C[541+160=701]4.2 ��X�` gives �KG�#_J�X��sBG�#_J . This behavesexactly like uniform-costsearch—thefactorof two makesnodifferencein theorderingof thenodes.��XzZ givesA� search.��Xz gives�KG�#_J�Xz W�BG�#_J , i.e.,greedybest-firstsearch.Wealsohave�KG�#_J�X�G� v�y�?J�4 s(G�#_J*f � v��� �=G�#_J�<which behavesexactly like A� searchwith a heuristic �{�� � �BG�#_J . For ����Z , this is alwayslessthan �=G�#_J andhenceadmissible,provided �BG�#_J is itself admissible.4.3a. Whenall stepcostsareequal, sBG�#_J������1)B���(G�#_J , so uniform-costsearchreproducesbreadth-firstsearch.b. Breadth-firstsearchis best-firstsearchwith �KG�#_J�X��;��)B���BG�#_J ; depth-firstsearchisbest-firstsearchwith �KG�#_J�X��?����)B���BG�#_J ; uniform-costsearchis best-firstsearchwith17http://librosysolucionarios.net18 Chapter 4. InformedSearchandExploration�KG�#_J9XwsBG�#_J .c. Uniform-costsearchis A� searchwith �=G�#_J_Xz` .S�A� GBh=7�h=5�h=1 h=0�2 14¡4¡Figure S4.1 A graphwith an inconsistentheuristicon which GRAPH-SEARCH fails toreturnthe optimal solution. The successorsof ¢ are £ with ¤¦¥¨§ and © with ¤v¥�ª . £ isexpandedfirst, so the pathvia © will be discardedbecause£ will alreadybe in the closedlist.4.4 SeeFigureS4.1.4.5 Going betweenRimnicu Vilcea and Lugoj is one example. The shortestpath is thesouthernone,throughMehadia,DobretaandCraiova. But agreedysearchusingthestraight-line heuristicstartingin RimnicuVilceawill startthewrongway, headingto Sibiu. Startingat Lugoj, theheuristicwill correctlyleadusto Mehadia,but thena greedysearchwill returnto Lugoj, andoscillateforever betweenthesetwo cities.4.6 Theheuristic��X��B«bfw� { (addingmisplacedtilesandManhattandistance)sometimesoverestimates.Now, suppose�BG�#_Jg���B�G�#_J¦f�} (as given) and let ¬ { be a goal that issuboptimalby morethan } , i.e., s(G�¬ { J¦[\ � f®} . Now considerany node# on a pathto anoptimalgoal.Wehave�KG�#_J�X�s(G�#_J|fu�=G�#_J��s(G�#_J|fu� � G�#_JKf®}�¯\ � fu}��s(G�¬ { Jso ¬ { will never beexpandedbeforeanoptimalgoalis expanded.4.7 A heuristicis consistentiff, for every node# andevery successor#|° of # generatedbyany action ± ,�BG�#_J²�z}G�#_6"±86�# ° J*fu�=G�# ° JOnesimpleproof is by inductionon thenumber� of nodeson theshortestpathto any goalfrom # . For �³X´Z , let # ° be the goal node; then �BG�#_J��´}G�#_6"±�6�# ° J . For the inductivecase,assume# ° is on theshortestpath � stepsfrom thegoalandthat �=G�# ° J is admissiblebyhypothesis;then�BG�#_J²�z}G�#_6"±86�# ° J*fu�=G�# ° J²�z}G�#_6"±86�# ° JKf®� � G�# ° J²Xz� � G�#_Jhttp://librosysolucionarios.net19so �=G�#_J at �ofuZ stepsfrom thegoalis alsoadmissible.4.8 Thisexercisereiteratesasmallportionof theclassicwork of Held andKarp (1970).a. TheTSPproblemis to find a minimal (total length)paththroughthecities that formsa closedloop. MST is a relaxed versionof that becauseit asksfor a minimal (totallength)graphthatneednot bea closedloop—it canbeany fully-connectedgraph.Asaheuristic,MST is admissible—itis alwaysshorterthanor equalto aclosedloop.b. The straight-linedistanceback to the start city is a ratherweak heuristic—it vastlyunderestimateswhentherearemany cities. In thelaterstageof asearchwhenthereareonly a few citiesleft it is not sobad.To saythatMST dominatesstraight-linedistanceis to saythatMST alwaysgivesa highervalue. This is obviously truebecausea MSTthatincludesthegoalnodeandthecurrentnodemusteitherbethestraightline betweenthem,or it mustincludetwo or morelines thataddup to more. (This all assumesthetriangleinequality.)c. See"search/domain s/t sp .l is p" for astartat this. Thefile includesaheuristicbasedon connectingeachunvisited city to its nearestneighbor, a closerelative to theMST approach.d. See(Cormenet al., 1990,p.505)for analgorithmthatrunsin n¨G�µ·¶ ¸W¹ºµoJ time,whereµ is the numberof edges. The coderepositorycurrently containsa somewhat lessefficientalgorithm.4.9 Themisplaced-tilesheuristicis exactfor theproblemwhereatile canmovefrom squareA to squareB. As this is a relaxationof theconditionthata tile canmove from squareA tosquareB if B is blank,Gaschnig’s heuristiccanotbe lessthanthemisplaced-tilesheuristic.As it is alsoadmissible(beingexact for a relaxationof the original problem),Gaschnig’sheuristicis thereforemoreaccurate.If wepermutetwo adjacenttiles in thegoalstate,wehaveastatewheremisplaced-tilesandManhattanbothreturn2, but Gaschnig’s heuristicreturns3.To computeGaschnig’s heuristic,repeatthe following until the goal stateis reached:let B be the currentlocationof the blank; if B is occupiedby tile X (not the blank) in thegoalstate,moveX to B; otherwise,moveany misplacedtile to B. Studentscouldbeaskedtoprove thatthis is theoptimalsolutionto therelaxedproblem.4.11a. Localbeamsearchwith �RX�Z is hill-climbing search.b. Local beamsearchwith �»X½¼ : strictly speaking,this doesn’t make sense.(Exercisemay be modified in future printings.) The idea is that if every successoris retained(because� is unbounded),thenthesearchresemblesbreadth-firstsearchin thatit addsonecompletelayerof nodesbeforeaddingthenext layer. Startingfrom onestate,thealgorithmwouldbeessentiallyidenticalto breadth-firstsearchexceptthateachlayerisgeneratedall at once.c. Simulatedannealingwith ¾�Xz` atall times:ignoringthefactthattheterminationstepwouldbetriggeredimmediately, thesearchwouldbeidenticalto first-choicehill climb-http://librosysolucionarios.net20 Chapter 4. InformedSearchandExplorationing becauseevery downwardsuccessorwouldberejectedwith probability1. (Exercisemaybemodifiedin futureprintings.)d. Geneticalgorithmwith populationsize P¿XÀZ : if the populationsize is 1, then thetwo selectedparentswill bethesameindividual; crossover yieldsanexactcopy of theindividual; then thereis a small chanceof mutation. Thus, the algorithmexecutesarandomwalk in thespaceof individuals.4.12 If we assumethecomparisonfunctionis transitive, thenwe canalwayssortthenodesusing it, and choosethe nodethat is at the top of the sort. Efficient priority queuedatastructuresrely only on comparisonoperations,sowe losenothingin efficiency—exceptforthefactthatthecomparisonoperationonstatesmaybemuchmoreexpensive thancomparingtwo numbers,eachof whichcanbecomputedjustonce.A� reliesonthedivisionof thetotalcostestimate�KG�#_J into thecost-so-far andthecost-to-go. If we have comparisonoperatorsfor eachof these,thenwe canprefer to expandanodethat is betterthanothernodeson bothcomparisons.Unfortunately, therewill usuallybeno suchnode.Thetradeoff betweensBG�#_J and �BG�#_J cannotberealizedwithout numericalvalues.4.13 The spacecomplexity of LRTA� is dominatedby the spacerequiredfor �Á�O�����Â�W4 ±86"Ã�< ,i.e., theproductof thenumberof statesvisited(# ) andthenumberof actionstried perstate(Q ). The time complexity is at least n¨G�#KQ�{VJ for a naive implementationbecausefor eachaction taken we computean Ä value, which requiresminimizing over actions. A simpleoptimizationcanreducethis to noG�#KQTJ . This expressionassumesthateachstate–actionpairis tried at mostonce,whereasin factsuchpairsmaybetried many times,astheexampleinFigure4.22shows.4.14 Thisquestionis slightly ambiguousasto whattheperceptis—eithertheperceptis justthe location,or it givesexactly the setof unblocked directions(i.e., blocked directionsareillegal actions). We will assumethe latter. (Exercisemay be modifiedin future printings.)Thereare12 possiblelocationsfor internalwalls, so thereare « {²XÆÅ=`WÇWÈ possibleenviron-mentconfigurations.A belief statedesignatesa subsetof theseaspossibleconfigurations;for example,beforeseeingany perceptsall 4096configurationsarepossible—thisis asinglebelief state.a. Wecanview thisasacontingency problemin belief statespace.Theinitial belief stateis thesetof all 4096configurations.The total belief statespacecontains É�Ê�Ë�Ì beliefstates(onefor eachpossiblesubsetof configurations,but mostof thesearenot reach-able. After eachactionandpercept,the agentlearnswhetheror not an internalwallexistsbetweenthecurrentsquareandeachneighboringsquare.Hence,eachreachablebelief statecan be represntedexactly by a list of statusvalues(present,absent,un-known) for eachwall separately. That is, thebelief stateis completelydecomposableand thereareexactly e « { reachablebelief states. The maximumnumberof possiblewall-perceptsin eachstateis 16 ( É ), soeachbelief statehasfour actions,eachwith upto 16 nondeterministicsuccessors.http://librosysolucionarios.net21b. Assumingtheexternalwalls areknown, therearetwo internalwalls andhence { XÍÅpossiblepercepts.c. Theinitial null actionleadsto four possiblebeliefstates,asshown in FigureS4.2.Fromeachbeliefstate,theagentchoosesasingleactionwhichcanleadto upto 8 beliefstates(on enteringthemiddlesquare).Given thepossibility of having to retraceits stepsata deadend, the agentcan explore the entire mazein no more than 18 steps,so thecompleteplan (expressedasa tree)hasno morethan c «�Î nodes. On the otherhand,therearejust e « { , so theplancouldbe expressedmoreconciselyasa tableof actionsindexedby belief state(apolicy in theterminologyof Chapter17).?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï ?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï ?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?Ï?ÏNoOpRightFigure S4.2 The ÐAÑÒÐ mazeexplorationproblem:the initial state,first percept,andoneselectedactionwith its perceptualoutcomes.4.15 Hereis onesimplehill-climbing algorithm:� Connectall thecitiesinto anarbitrarypath.� Pick two pointsalongthepathat random.� Split thepathat thosepoints,producingthreepieces.� Try all six possiblewaysto connectthethreepieces.� Keepthebestone,andreconnectthepathaccordingly.� Iteratethestepsabove until no improvementis observedfor a while.4.1 4.16Codenot shown.http://librosysolucionarios.net22 Chapter 4. InformedSearchandExploration4.17 Hillclimbing is surprisinglyeffectiveatfindingreasonableif notoptimalpathsfor verylittle computationalcost,andseldomfails in two dimensions.a. It is possible(seeFigureS4.3(a))but veryunlikely—theobstaclehasto haveanunusualshapeandbepositionedcorrectlywith respectto thegoal.b. With convex obstacles,getting stuck is much more likely to be a problem(seeFig-ureS4.3(b)).c. Noticethatthis is justdepth-limitedsearch,whereyouchooseastepalongthebestpathevenif it is notasolution.d. Set � to themaximumnumberof sidesof any polygonandyoucanalwaysescape.CurrentÓpositionGoalÔ(a) (b)CurrentÓpositionGoalÔFigureS4.3 (a)Gettingstuckwith aconvex obstacle.(b) Gettingstuckwith anonconvexobstacle.4.18 The studentshouldfind that on the 8-puzzle,RBFS expandsmore nodes(becauseit doesnot detectrepeatedstates)but haslower costper nodebecauseit doesnot needtomaintaina queue. The numberof RBFS nodere-expansionsis not too high becausethepresenceof many tiedvaluesmeansthatthebestpathchangesseldom.Whentheheuristicisslightly perturbed,thisadvantagedisappearsandRBFS’sperformanceis muchworse.For TSP, thestatespaceis a tree,sorepeatedstatesarenotanissue.On theotherhand,theheuristicis real-valuedandthereareessentiallyno tied values,soRBFSincursa heavypenaltyfor frequentre-expansions.http://librosysolucionarios.netSolutionsfor Chapter5ConstraintSatisfactionProblems5.1 A constraint satisfactionproblem is a problemin which thegoalis to choosea valuefor eachof a setof variables,in suchaway thatthevaluesall obey asetof constraints.A constraint is arestrictionon thepossiblevaluesof two or morevariables.For exam-ple,aconstraintmight saythat 5ÕX�± is notallowedin conjunctionwith >[Xzp .Backtracking search is a form depth-firstsearchin which thereis a singlerepresenta-tion of thestatethatgetsupdatedfor eachsuccessor, andthenmustberestoredwhena deadendis reached.A directedarc from variable 5 to variable > in a CSPis arc consistentif, for everyvaluein thecurrentdomainof 5 , thereis someconsistentvalueof > .Backjumping is awayof makingbacktrackingsearchmoreefficient,by jumpingbackmorethanonelevel whenadeadendissreached.Min-conflicts is a heuristicfor usewith local searchon CSPproblems.Theheuristicsaysthat, whengiven a variableto modify, choosethe valuethat conflictswith the fewestnumberof othervariables.5.2 Thereare18 solutionsfor coloring Australiawith threecolors. Startwith SA, whichcanhave any of threecolors. Thenmoving clockwise,WA canhave eitherof theothertwocolors,andeverythingelseis strictly determined;thatmakes6 possibilitiesfor themainland,times3 for Tasmaniayields18.5.3 The most constrainedvariablemakes sensebecauseit choosesa variablethat is (allother things beingequal) likely to causea failure, and it is more efficient to fail as earlyaspossible(therebypruning large partsof the searchspace).The leastconstrainingvalueheuristicmakes sensebecauseit allows the most chancesfor future assignmentsto avoidconflict.5.4 a. Crossword puzzleconstructioncan be solved many ways. One simple choiceisdepth-firstsearch.Eachsuccessorfills in a word in thepuzzlewith oneof thewordsin thedictionary. It is betterto go onewordata time, to minimizethenumberof steps.b. As a CSP, thereareevenmorechoices.You couldhave a variablefor eachbox inthecrosswordpuzzle;in thiscasethevalueof eachvariableis a letter, andtheconstraintsarethat the lettersmustmake words. This approachis feasiblewith a most-constrainingvalueheuristic.Alternately, we couldhave eachstringof consecutive horizontalor verticalboxesbea singlevariable,andthedomainof thevariablesbewordsin thedictionaryof the right23http://librosysolucionarios.net24 Chapter 5. ConstraintSatisfactionProblemslength.Theconstraintswouldsaythattwo intersectingwordsmusthavethesameletterin theintersectingbox. Solvingaproblemin this formulationrequiresfewersteps,but thedomainsarelarger(assuminga big dictionary)andtherearefewer constraints.Both formulationsarefeasible.5.5 a. For rectilinearfloor-planning,onepossibility is to have a variablefor eachof thesmall rectangles,with the valueof eachvariablebeinga 4-tupleconsistingof the H and Icoordinatesof the upperleft and lower right cornersof the placewherethe rectanglewillbe located.Thedomainof eachvariableis thesetof 4-tuplesthatarethe right sizefor thecorrespondingsmallrectangleandthatfit within thelargerectangle.Constraintssaythatnotwo rectanglescanoverlap;for exampleif thevalueof variableÖ×« is 4 `W6"`W6"ØW6"c�< , thennoothervariablecantake on avaluethatoverlapswith the `W6"` to ØW6"c rectangle.b. Forclassscheduling,onepossibilityis to havethreevariablesfor eachclass,onewithtimesfor values(e.g. MWF8:00,TuTh8:00,MWF9:00,...), onewith classroomsfor values(e.g. Wheeler110,Evans330,...) andonewith instructorsfor values(e.g. Abelson,Bibel,Canny, ...). Constraintssaythatonly oneclasscanbein thesameclassroomat thesametime,andan instructorcanonly teachoneclassat a time. Theremaybeotherconstraintsaswell(e.g.aninstructorshouldnothave two consecutive classes).5.6 The exact stepsdependon certainchoicesyou are free to make; hereare the onesImade:a. Choosethe ÙRÚ variable.Its domainis ÛW`W6"ZWÜ .b. Choosethevalue1 for Ù Ú . (We can’t choose0; it wouldn’t survive forwardchecking,becauseit would force Ý to be0, andtheleadingdigit of thesummustbenon-zero.)c. ChooseÝ , becauseit hasonly oneremainingvalue.d. Choosethevalue1 for Ý .e. Now Ù { andX « aretied for minimumremainingvaluesat2; let’s chooseÙ { .f. Eithervaluesurvivesforwardchecking,let’s choose0 for Ù { .g. Now Ùg« hastheminimumremainingvalues.h. Again,arbitrarily choose0 for thevalueof Ùg« .i. The variable n mustbe an even number(becauseit is the sumof ¾�fw¾ lessthan5(becausen®fun�XzÖwfuZW`¨ly` ). Thatmakesit mostconstrained.j . Arbitrarily choose4 asthevalueof n .k. Ö now hasonly 1 remainingvalue.l. Choosethevalue8 for Ö .m. ¾ now hasonly 1 remianingvalue.n. Choosethevalue7 for ¾ .o.Þ mustbeanevennumberlessthan9; chooseÞ .p. Theonly valuefor Þ thatsurvivesforwardcheckingis 6.q. Theonly variableleft is ß .r . Theonly valueleft for ß is 3.http://librosysolucionarios.net25s. This is asolution.This is a rathereasy(under-constrained) puzzle,so it is not surprisingthat we arrive at asolutionwith no backtracking(giventhatwe areallowedto useforwardchecking).5.7 Thereare implementationsof CSPalgorithmsin the Java, Lisp, andPythonsectionsof theonlinecoderepository;theseshouldhelpstudentsgetstarted.However, studentswillhave to addcodeto keepstatisticson theexperiments,andperhapswill want to have somemechanismfor making an experimentreturn failure if it exceedsa certain time limit (ornumber-of-stepslimit). Theamountof codethatneedsto bewritten is small; theexerciseismoreaboutrunningandanalyzinganexperiment.5.8 We’ll tracethrougheachiterationof thewhile loop in AC-3 (for onepossibleorderingof thearcs):a. Remove ~²5h�uß[5 , deleteÖ from ~²5 .b. Remove ~²5h�uà , delete> from ~²5 , leaving only ¬ .c. Remove Pá¾®�uß[5 , deleteÖ from PR¾ .d. Remove Pá¾®�u~²5 , delete¬ from PR¾ , leaving only > .e. Remove P�~²ßâ�u~²5 , delete¬ from P�~²ß .f. Remove P�~²ßâ�uà , delete> from Pg~²ß , leaving only Ö .g. Remove ãÕ�uPR¾ , delete> from ã .h. Remove ãÕ�u~²5 , delete¬ from ã .i. remove ã®�®Pg~²ß , deleteÖ from ã , leaving nodomainfor ã .5.9 On a tree-structuredgraph,no arc will be consideredmore than once,so the AC-3algorithmis noG�µ¨]�J , where µ is the numberof edgesand ] is the sizeof the largestdo-main.5.10 Thebasicideais to preprocesstheconstraintsso that, for eachvalueof Ùáä , we keeptrackof thosevariablesÙáå for whichanarcfrom Ùáå to Ù ä is satisfiedby thatparticularvalueof Ù ä . This datastructurecanbe computedin time proportionalto thesizeof theproblemrepresentation.Then,whena valueof Ù ä is deleted,we reduceby 1 thecountof allowablevaluesfor each G�Ùáå86�Ù ä J arc recordedunderthat value. This is very similar to the forwardchainingalgorithmin Chapter7. See? (?) for detailedproofs.5.11 Theproblemstatementsetsout thesolutionfairly completely. To expresstheternaryconstrainton 5 , > and \ that 5mfz>�Xæ\ , we first introducea new variable, 5A> . If thedomainof 5 and > is the setof numbersP , thenthe domainof 5A> is the setof pairsofnumbersfrom P , i.e. P�l·P . Now therearethreebinaryconstraints,onebetween5 and5A> sayingthatthevalueof 5 mustbeequalto thefirst elementof thepair-valueof 5A> ; onebetween> and 5&> sayingthat thevalueof > mustequalthesecondelementof thevalueof 5A> ; andfinally onethatsaysthat thesumof thepair of numbersthat is thevalueof 5&>mustequalthevalueof \ . All otherternaryconstraintscanbehandledsimilarly.Now that we canreducea ternaryconstraintinto binary constraints,we canreducea4-ary constrainton variables5A6">ç6"\v6"] by first reducing 5&6">ç6"\ to binary constraintsashttp://librosysolucionarios.net26 Chapter 5. ConstraintSatisfactionProblemsshown above, thenaddingback ] in a ternaryconstraintwith 5&> and \ , andthenreducingthis ternaryconstraintto binaryby introducing \o] .By induction,we canreduceany # -ary constraintto an G�#»�®ZWJ -ary constraint.We canstopat binary, becauseany unaryconstraintcanbedropped,simply by moving theeffectsoftheconstraintinto thedomainof thevariable.5.12 A simplealgorithmfor finding a cutsetof no morethan � nodesis to enumerateallsubsetsof nodesof size ZW6" W6"è"è"èé6"� , andfor eachsubsetcheckwhethertheremainingnodesform a tree.Thisalgorithmtakestimeêë�ì�í !! å , which is noG�# å J .Becker andGeiger(1994;http://citeseer.nj.nec.com/becker94approximation.html) giveanalgorithmcalledMGA (modifiedgreedyalgorithm)thatfindsacutsetthatis nomorethantwice thesizeof theminimalcutset,usingtime n¨G�µ·f·à·¶ ¸W¹KG�à?J�J , whereµ is thenumberofedgesand à is thenumberof variables.Whetherthis makes the cycle cutsetapproachpracticaldependsmore on the graphinvolved thanon theagorithmfor finding a cutset.That is because,for a cutsetof size } , westill have anexponentialG�U�î�J factorbeforewe cansolve theCSP. Soany graphwith a largecutsetwill beintractibleto solve,evenif we couldfind thecutsetwith no effort atall.5.13 The “Zebra Puzzle”canbe representedasa CSPby introducinga variablefor eachcolor, pet,drink,countryandcigaretbrand(atotalof 25variables).Thevalueof eachvariableis anumberfrom 1 to 5 indicatingthehousenumber. This is agoodrepresentationbecauseiteasyto representall theconstraintsgivenin theproblemdefinitionthis way. (We have doneso in the Pythonimplementationof the code,andat somepoint we may reimplementthisin theotherlanguages.)Besideseaseof expressinga problem,theotherreasonto choosearepresentationis theefficiency of finding a solution. herewe have mixed results—onsomeruns,min-conflictslocal searchfindsa solutionfor this problemin seconds,while on otherrunsit fails to find asolutionafterminutes.Anotherrepresentationis to have fivevariablesfor eachhouse,onewith thedomainofcolrs,onewith pets,andsoon.http://librosysolucionarios.netSolutionsfor Chapter6AdversarialSearch6.1 FigureS6.1showsthegametree,with theevaluationfunctionvaluesbelow theterminalnodesandthebacked-upvaluesto theright of thenon-terminalnodes.Thevaluesimply thatthebeststartingmove for X is to take thecenter. Theterminalnodeswith a bold outlinearetheonesthatdo notneedto beevaluated,assumingtheoptimalordering.xxxx o xox o xoxoxoxox o xoxoxoxo1−1 1 −21 −1 0 0 1 −1 −2 0 −1 01 2FigureS6.1 Partof thegametreefor tic-tac-toe,for Exercise6.1.6.2 Considera MIN nodewhosechildrenare terminalnodes. If MIN playssuboptimally,thenthevalueof thenodeis greaterthanor equalto thevalueit would have if MIN playedoptimally. Hence,the value of the MAX nodethat is the MIN node’s parentcan only beincreased.This argumentcanbe extendedby a simpleinductionall theway to the root. Ifthesuboptimalplay by MIN is predictable, thenonecando betterthana minimax strategy.For example,if MIN always falls for a certainkind of trap and loses,thensettingthe trapguaranteesa win even if thereis actuallya devastatingresponsefor MIN. This is shown inFigureS6.2.6.3a. (5) Thegametree,completewith annotationsof all minimaxvalues,is shown in Fig-ureS6.3.b. (5) The“?” valuesarehandledby assumingthatanagentwith a choicebetweenwin-ning thegameandenteringa “?” statewill alwayschoosethewin. That is, min(–1,?)is –1 andmax(+1,?)is +1. If all successorsare“?”, thebacked-upvalueis “?”.27http://librosysolucionarios.net28 Chapter 6. AdversarialSearchMAXMINa1AïB D−101000 10002ðbñ 3òbñ1bñ2ðdó1dó3òdó−5−5−5a2ôFigureS6.2 A simplegametreeshowing thatsettingatrapfor MIN by playing õ"ö is awinif MIN falls for it, but mayalsobedisastrous.Theminimaxmove is of courseõ÷ , with valueø § .(1,4)ù(2,4)ù(2,3)ù(1,3)ù(1,2)ù(3,2)ù(3,4)ù(4,3)ù(3,1)ù(2,4)ù(1,4)ù+1−1???−1−1−1+1+1+1Figure S6.3 Thegametreefor thefour-squaregamein Exercise6.3. Terminalstatesarein singleboxes,loop statesin doubleboxes.Eachstateis annotatedwith its minimaxvaluein a circle.c. (5) Standardminimax is depth-firstandwould go into an infinite loop. It canbefixedby comparingthecurrentstateagainstthestack;andif thestateis repeated,thenreturna “?” value. Propagationof “?” valuesis handledasabove. Althoughit works in thiscase,it doesnot alwayswork becauseit is not clearhow to compare“?” with a drawnposition;nor is it clearhow to handlethecomparisonwhentherearewins of differentdegrees(asin backgammon).Finally, in gameswith chancenodes,it is unclearhow tohttp://librosysolucionarios.net29computetheaverageof a numberanda “?”. Notethatit is not correctto treatrepeatedstatesautomaticallyasdrawn positions;in this example,both(1,4) and(2,4) repeatinthetreebut they arewonpositions.Whatis really happeningis thateachstatehasa well-definedbut initially unknownvalue.Theseunknown valuesarerelatedby theminimaxequationatthebottomofpage163. If thegametreeis acyclic, thentheminimaxalgorithmsolvestheseequationsbypropagatingfrom theleaves.If thegametreehascycles,thena dynamicprogrammingmethodmustbeused,asexplainedin Chapter17. (Exercise17.8studiesthisprobleminparticular.) Thesealgorithmscandeterminewhethereachnodehasa well-determinedvalue(asin this example)or is really aninfinite loop in thatbothplayerspreferto stayin theloop(or havenochoice).In suchacase,therulesof thegamewill needto definethevalue(otherwisethegamewill neverend).In chess,for eaxmple,astatethatoccurs3 times(andhenceis assumedto bedesirablefor bothplayers)is adraw.d. This questionis a little tricky. Oneapproachis a proof by inductionon thesizeof thegame.Clearly, thebasecase#áX�e is a lossfor A andthebasecase#áXÍÅ is a win forA. For any #TwÅ , theinitial movesarethesame:A andB bothmove onesteptowardseachother. Now, we canseethat they areengagedin a subgameof size #y�ú on thesquares4 W6"è"è"è�6�#��³Z�< , exceptthat thereis anextra choiceof moveson squares and#��úZ . Ignoringthis for a moment,it is clearthat if the“ #y�m ” is won for A, thenAgetsto the square#û�zZ beforeB getsto square (by the definition of winning) andthereforegetsto # beforeB getsto Z , hencethe “ # ” gameis won for A. By thesameline of reasoning,if “ #��Õ ” is won for B then“ # ” is won for B. Now, thepresenceoftheextramovescomplicatestheissue,but not too much.First, theplayerwho is slatedto win thesubgame4 W6"è"è"è�6�#y�ÕZ�< never movesbackto his homesquare.If theplayerslatedto losethesubgamedoesso,thenit is easyto show thathe is boundto losethegameitself—theotherplayersimply movesforwardanda subgameof size #��m W� isplayedonestepcloserto theloser’s homesquare.6.4 See"search/algorit hms/ games .l is p" for definitionsof games,game-playingagents,andgame-playingenvironments."search/algorit hms/ mi nim ax .l is p" con-tainsthe minimax andalpha-betaalgorithms. Notice that the game-playingenvironmentisessentiallya genericenvironmentwith theupdatefunctiondefinedby therulesof thegame.Turn-takingis achievedby having agentsdo nothinguntil it is their turn to move.See"search/domains /co gnac .l is p" for thebasicdefinitionsof asimplegame(slightly morechallengingthanTic-Tac-Toe). Thecodefor this containsonly a trivial eval-uation function. Studentscan useminimax and alpha-betato solve small versionsof thegameto termination(probablyup to Å�lwe ); they shouldnoticethat alpha-betais far fasterthanminimax,but still cannotscaleupwithoutanevaluationfunctionandtruncatedhorizon.Providing an evaluationfunction is an interestingexercise. From the point of view of datastructuredesign,it is alsointerestingto look at how to speedup thelegal move generatorbyprecomputingthedescriptionsof rows,columns,anddiagonals.Very few studentswill have heardof kalah, so it is a fair assignment,but the gameis boring—depth6 lookaheadanda purely material-basedevaluationfunction areenoughhttp://librosysolucionarios.net30 Chapter 6. AdversarialSearchto beatmosthumans.Othello is interestingandaboutthe right level of difficulty for moststudents.Chessandcheckers aresometimesunfair becauseusually a small subsetof theclasswill beexpertswhile therestarebeginners.6.5 This questionis not ashardasit looks. Thederivationbelow leadsdirectly to a defini-tion of ü andý values.Thenotation# ä refersto (thevalueof) thenodeatdepthþ on thepathfrom theroot to theleafnode#*ÿ . Nodes# ä «(è"è"è�# ä�� ë arethesiblingsof nodeþ .a. Wecanwrite # { X������dG�# Ú 6�# Ú�« 6"è"è"èO6�# Ú�� J , giving#�«�X�� � G������dG�#|ÚW6�#|Ú�«6"è"è"è�6�# Ú� J�6�# { «6"è"è"èO6�# { �� JThen#|Ú canbesimilarly replaced,until we have anexpressioncontaining#*ÿ itself.b. In termsof the � and� values,we have# « X�� � G�� { 6������|G�� Ú 6�# Ú 6�� Ú J�6�� { JAgain, #|Ú can be expandedout down to #*ÿ . The most deeplynestedterm will be� � G�� ÿ 6�# ÿ 6�� ÿ J .c. If # ÿ is a maxnode,thenthe lower boundon its valueonly increasesasits successorsareevaluated.Clearly, if it exceeds�Âÿ it will haveno furthereffecton #�« . By extension,if it exceeds� � G�� { 6�� É 6"è"è"è 6�� ÿOJ it will have no effect. Thus,by keepingtrack of thisvaluewe candecidewhento prune#*ÿ . This is exactlywhat ü -ý does.d. Thecorrespondingboundfor min nodes#*å is �����KG�� Ú�6�����6"è"è"è 6�� å8J .6.7 Thegeneralstrategy is to reducea generalgametreeto a one-plytreeby inductiononthedepthof thetree. Theinductive stepmustbedonefor min, max,andchancenodes,andsimply involvesshowing thatthetransformationis carriedthoughthenode.Supposethatthevaluesof thedescendantsof anodeareH « è"è"è2H ! , andthatthetransformationis ± H×f»p , where± is positive. Wehave� � G�± H « fup6"± H { fup6"è"è"è 6"±�H ! fupJ�X¯±�� � G�H « 6�H { 6"è"è"è�6�H ! JKfup�����KG�± H�«bfup6"± H { fup6"è"è"è 6"±�H ! fupJ�X¯±�� � G�Hb«6�H { 6"è"è"è�6�H ! JKfup� « G�±�H « f®p�J*f � { G�± H { fupJKf*ck"x"x;f � ! G�±�H ! fupJ�X¯±�G � « H « f � { H { f*ck"x"x � ! H ! J|fupHencetheproblemreducestoaone-plytreewheretheleaveshavethevaluesfrom theoriginaltreemultiplied by thelinear transformation.SinceH·úI��¯± H�fmpº�± IáfÕp if ±��` , thebestchoiceat theroot will bethesameasthebestchoicein theoriginal tree.6.8 This procedurewill give incorrectresults. Mathematically, the procedureamountstoassumingthat averagingcommuteswith min andmax, which it doesnot. Intuitively, thechoicesmadeby eachplayerin thedeterministictreesarebasedon full knowledgeof futuredice rolls, andbearno necessaryrelationshipto the movesmadewithout suchknowledge.(Noticetheconnectionto thediscussionof cardgameson page179andto thegeneralprob-lem of fully andpartially observableMarkov decisionproblemsin Chapter17.) In practice,themethodworksreasonablywell, andit might beagoodexerciseto have studentscompareit to thealternative of usingexpectiminimaxwith sampling(ratherthansummingover) dicerolls.http://librosysolucionarios.net316.9 Codenot shown.6.10 Thebasicphysicalstateof thesegamesis fairly easyto describe.Oneimportantthingto rememberfor Scrabbleandbridgeis that thephysicalstateis not accessibleto all playersandsocannotbeprovideddirectly to eachplayerby theenvironmentsimulator. Particularlyin bridge,eachplayerneedsto maintainsomebestguess(or multiple hypotheses)asto theactualstateof theworld. We expectto beputtingsomeof thegameimplementationsonlineasthey becomeavailable.6.11 Onecan think of chanceeventsduring a game,suchasdice rolls, in the samewayashiddenbut preordainedinformation(suchasthe orderof the cardsin a deck). The keydistinctionsarewhethertheplayerscaninfluencewhat informationis revealedandwhetherthereis any asymmetryin theinformationavailableto eachplayer.a. Expectiminimaxis appropriateonly for backgammonand Monopoly. In bridge andScrabble,eachplayerknows thecards/tilesheor shepossessesbut not theopponents’.In Scrabble,thebenefitsof a fully rational,randomizedstrategy thatincludesreasoningabouttheopponents’stateof knowledgeareprobablysmall,but in bridgethequestionsof knowledgeandinformationdisclosurearecentralto goodplay.b. None,for thereasonsdescribedearlier.c. Key issuesincludereasoningabouttheopponent’s beliefs,theeffect of variousactionson thosebeliefs,andmethodsfor representingthem. Sincebelief statesfor rationalagentsareprobabilitydistributionsoverall possiblestates(includingthebeliefstatesofothers),this is nontrivial.function MAX-VALUE( �������� ) returns �"!#�%$'&($'��)+*,�-&(!# if TERMINAL-TEST( �.������ ) then return UTIL ITY( �.������ )*0/ ø21for �435� in SUCCESSORS( �.�%�-�% ) doif WINNER( � ) = MAXthen *0/ MAX(v, MAX-VALUE( � ))else *0/ MAX(v, M IN-VALUE( � ))return *FigureS6.4 Partof themodifiedminimaxalgorithmfor gamesin which thewinnerof theprevioustrick playsfirst on thenext trick.6.12 (In thefirst printing,thisexericserefersto WINNER( ���,6�M�N ); subsequentprintingsreferto WINNER( � ), denotingthewinnerof thetrick justcompleted(if any), ornull.) Thisquestionis interpretedasapplyingonly to theobservablecase.a. Themodificationto MAX-VALUE is shown in FigureS6.4.If MAX hasjustwonatrick,MAX getsto playagain,otherwiseplayalternates.Thus,thesuccessorsof aMAX nodehttp://librosysolucionarios.net32 Chapter 6. AdversarialSearch MAX MINS 2H 6 4D 6 C 9,8 10,5 MAX MINS 2H 4D 6 C 9,8 10,5 MAX MINS 2H 6 4D C 9,8 10,5 MAX MINS 2H 6 4D 6 C 9 10,5 MAX MINS 2H D 6 C 9,8 10,5 MAX MINS 2H D 6 C 9 10,5 MAX MINS 2H D C 9,8 10,5 MAX MINS H D C 9,8 10,5 MAX MINS 2H D C 9,8 10 MAX MINS 2H D C 9,8 5 MAX MINS H D C 9 10,5 MAX MINS 2H D C 9 10 MAX MINS 2H D C 9 5 MAX MINS H D C 9 5 MAX MINS H D C 9 10 MAX MINS H D C 9 MAX MINS H D C 10 MAX MINS 2H D C 9 MAX MINS 2H D C 9 MAX MINS H D C 9 MAX MINS 2H D C MAX MINS 2H D 6 C 9 10 MAX MINS 2H D 6 C 9 5 MAX MINS 2H D C 9 10 MAX MINS 2H D 6 C 10 MAX MINS H D C 9 10 MAX MINS 2H D C 9 MAX MINS H D C 10 MAX MINS 2H D C MAX MINS H D 6 C MAX MINS 2 H D 6 C MAX MINS H D 6 C 9 5 MAX MINS 2H D 6 C 9 MAX MINS H D C 9 5 MAX MINS H D 6 C 5 MAX MINS H D C 9 MAX MINS H D 6 C MAX MINS 2H D 6 C MAX MINS 2H D C +2 +2 07+4 +2 +4 0707−2 +2+2 +2 07+4 +2 +4 0707−2 +2+2 +2 07+4 +2 +4 0707−2 +2+2+207+407+407+2 0707+207+2070707FigureS6.5 Ex. 6.12:Part of thegametreefor thecardgameshown on p.179.canbe a mixture of MAX andM IN nodes,dependingon the variouscardsMAX canplay. A similar modificationis neededfor M IN-VALUE.b. Thegametreeis shown in FigureS6.5.6.13 The naive approachwould be to generateeachsuchposition,solve it, andstoretheoutcome.Thiswouldbeenormouslyexpensive—roughlyontheorderof 444billion seconds,or 10,000years,assumingit takes a secondon averageto solve eachposition (which isprobablyvery optimistic). Of course,we can take advantageof already-solved positionswhen solving new positions,provided thosesolved positionsare descendantsof the newpositions.To ensurethatthis alwayshappens,we generatethefinal positionsfirst, thentheirpredecessors, andsoon. In this way, theexactvaluesof all successorsareknown wheneachstateis generated.Thismethodis calledretrogradeanalysis./0 �./����2/��980�0���� 3;:�� :http://librosysolucionarios.net336.14 Themostobviouschangeis thatthespaceof actionsis now continuous.For example,in pool, thecueingdirection,angleof elevation,speed,andpoint of contactwith thecueballareall continuousquantities.Thesimplestsolutionis just to discretizetheactionspaceandthenapplystandardmeth-ods.This might work for tennis(modelledcrudelyasalternatingshotswith speedanddirec-tion), but for gamessuchas pool and croquetit is likely to fail miserablybecausesmallchangesin directionhave large effects on actionoutcome. Instead,onemust analyzethegameto identify a discretesetof meaningfullocal goals,suchas“potting the4-ball” in poolor “laying up for thenext hoop” in croquet.Then,in thecurrentcontext, a localoptimizationroutinecanwork out thebestwayto achieveeachlocalgoal,resultingin adiscretesetof pos-siblechoices.Typically, thesegamesarestochastic,sothebackgammonmodelis appropriateprovidedthatweusesampledoutcomesinsteadof summingover all outcomes.Whereaspool andcroquetaremodelledcorrectlyasturn-takinggames,tennisis not.While oneplayeris moving to theball, theotherplayeris moving to anticipatetheopponent’sreturn.This makestennismorelike thesimultaneous-actiongamesstudiedin Chapter17. Inparticular, it maybereasonableto derive randomizedstrategiesso that theopponentcannotanticipatewheretheball will go.6.15 The minimax algorithm for non-zero-sumgamesworks exactly as for multiplayergames,describedon p.165–6;that is, the evaluationfunction is a vectorof values,oneforeachplayer, andthebackupstepselectswhichever vectorhasthehighestvaluefor theplayerwhoseturn it is to move. Theexampleat theendof Section6.2(p.167)shows thatalpha-betapruning is not possiblein generalnon-zero-sumgames,becausean unexaminedleaf nodemight beoptimalfor bothplayers.6.16 With 32 pieces,eachneeding6 bits to specifyits positionon oneof 64 squares,weneed24bytes(6 32-bitwords)tostoreaposition,sowecanstoreroughly20million positionsin thetable(ignoringpointersfor hashtablebucket lists). This is aboutone-ninthof the180million positionsgeneratedduringa three-minutesearch.Generatingthe hashkey directly from an array-basedrepresentationof the positionmight be quite expensive. Modernprograms(see,e.g.,Heinz, 2000)carry alongthe hashkey andmodify it aseachnew positionis generated.Supposethis takeson theorderof 20operations;thenon a 2GHz machinewherean evaluationtakes2000operationswe candoroughly100lookupsperevaluation.Usinga roughfigureof onemillisecondfor adisk seek,we coulddo 1000evaluationsper lookup. Clearly, usinga disk-residenttableis of dubiousvalue,evenif wecangetsomelocality of referenceto reducethenumberof disk reads.http://librosysolucionarios.netSolutionsfor Chapter7AgentsthatReasonLogically7.1 The wumpusworld is partially observable,deterministic,sequential(you needto re-memberthestateof onelocationwhenyou returnto it on a later turn), static,discrete,andsingleagent(thewumpus’s soletrick—devouring anerrantexplorer—is not enoughto treatit asanagent).Thus,it is a fairly simpleenvironment.Themaincomplicationis thepartialobservability.7.2 To save space,we’ll show the list of modelsasa tableratherthana collectionof dia-grams.Thereareeightpossiblecombinationsof pits in thethreesquares,andfour possibili-tiesfor thewumpuslocation(includingnowhere).We can seethat <y>>= X ü { becauseevery line where <y> is true also has ü { true.Similarly for üBÚ .7.3a. Thereis a pl true in the Pythoncode,anda versionof ask in the Lisp codethatservesthesamepurpose.TheJava codedid nothave this functionasof May 2003,butit shouldbeaddedsoon.)b. Thesentences¾?�V�A@ , BDCFEGB , and BDHIEJB canall bedeterminedto betrueor falseinapartialmodelthatdoesnot specifythetruth valuefor B .c. It is possibleto createtwo sentences,eachwith � variablesthatarenot instantiatedinthepartialmodel,suchthatoneof themis truefor all å possiblevaluesof thevariables,while theothersentenceis falsefor oneof the å values.Thisshowsthatin generalonemustconsiderall å possibilities.Enumeratingthemtakesexponentialtime.d. The python implementationof pl true returnstrue if any disjunctof a disjunctionis true,andfalseif any conjunctof a conjunctionis false. It will do this even if otherdisjuncts/conjunctscontainsuninstantiatedvariables.Thus,in thepartialmodelwhereB is true, BKCçã returnstrue,and EJB�Hçã returnsfalse.But thetruthvaluesof ãLCME_ã ,ãDC?¾?�V�A@ , and ãDHNE²ã arenotdetected.e. Ourversionof tt entails alreadyusesthismodifiedpl true . It wouldbeslowerif it did not.7.4 Remember, üO= X ý if f in very modelin which ü is true, ý is alsotrue.Therefore,a. A valid sentenceis onethatis truein all models.Thesentence¾?�V�A@ is alsovalid in allmodels.Soif
EDU621_2_1
Condutas em Situações de Trabalho
Bullying nas Escolas
Etarismo: Preconceito por Idade
A democratização da cultura
Conectivos e Cláusulas MUN
Representação gráfica de rotinas administrativas
Métodos de avaliação para inversões de capital
INSTRUCAO_REDACAO_7ENEM
Instruções para Redação ENEM
MaitA Maria
Polarização e Intolerância na Sociedade
Orientações para a Correção - PDF
Na maioria dos materiais, enfatizando os metálicos, nos processos e sistemas metalúrgicos e de materiais de interesse, o equilíbrio de fases envolv...
As moléculas dos polímeros como cadeias lineares, desprezando-se o arranjo em zigue-zague dos átomos da cadeia principal. As ligações simples na ca...
Na difusão volumétrica ocorre quando os átomos se movem, através do cristal de uma posição cristalina ou intersticial para outra. E com isso, devid...
O corpo de prova é preso por suas extremidades, nas garras de fixação do dispositivo de testes. A máquina de ensaio de tração alonga o corpo de pro...
O carbono, é elemento que pode existir na forma de grafita e na forma de diamante. Como pode a mesma composição química ter característica tão dist...
MECANISMO CORRETO DE PRODUÇÃO DE TEXTOS DENOMINADOS COERENCIA TEXTUAL
Qual será o plano melhor a ser aceitoI - A parte fixa no valor de 1.200,00 e mais 1% de comissãoII - Uma parte fixa de 1.000,00 e 5% de comissão...
Pergunta 815%.11%.2,5%.3%.7%.Inserir a estratégia no plano tático, que envolve, por exemplo, a definição do orçamento. “Deve-se verificar se ...
Pergunta 4Para pagamentos eventuais, no caso de processos trabalhistas.Com o isolamento social que teve inicio com o período de pandemia, as pess...
Pergunta 3Apenas papel de ativo e proativo, porém sem se tornar produtorPapel de produtor de conteúdoApenas proativo, uma vez que se tornou prod...
Pergunta 2Segundo Costa (2017, p. 58), o advento da era da informação criou um tipo de consumidor com diferentes perspectivas, desafios e oportuni...
Pergunta 1Podcasts, TikTok e TwitterFacebook, Twitter e ReelsPodcasts, Reels e Facebook.Podcasts, TikTok e FacebookInstagram, Facebook e Podca...
A respeito do código de ética, julgue os itens que segue:I. É o código formal que serve de guia para a tomada de decisão em situações que envolvam...
edital-02_2024-cex_car
jogo-com-essas-silabas-eu-faco-palavras
04 Solucionario Inteligencia Artificial 2da Edicion Stuart Russell, Peter Norvig - Redação (2024)

Xochicalli