Chapter 6: Adversarial Search

Outline
- Games
- Optimal decisions in games
- Alpha-beta pruning
- Imperfect, real-time decisions
- Games that include an element of chance
- State-of-the-art game programs

Games

Search with no adversary:
- The solution is a (heuristic) method for finding a goal.
- Heuristics and CSP techniques can find optimal solutions.
- Evaluation function: an estimate of the cost from start to goal through a given node.
- Examples: path planning, activity scheduling.

Games, with an adversary:
- The solution is a strategy (a strategy specifies a move for every possible opponent reply).
- Time limits force an approximate solution.
- Evaluation function: evaluates the "goodness" of a game position.
- Examples: chess, checkers, Othello, backgammon.

In AI, "games" are usually of a rather specialized kind: what game theorists call deterministic, turn-taking, two-player, zero-sum games of perfect information. In our terminology, this means a deterministic, fully observable environment in which there are two agents whose actions must alternate, and in which the utility values at the end of the game are always equal and opposite. For example, if one player wins a game of chess (+1), the other player necessarily loses it (-1). It is this opposition between the agents' utility functions that makes the situation adversarial.

Game history: game playing was one of the first tasks undertaken in AI. By 1950, almost as soon as computers became programmable, chess had been tackled by Konrad Zuse, Claude Shannon, and Norbert Wiener. Since then, there has been steady progress in the standard of play.
Example programs:
- Bao 1.0
- Gipfted (Gipf)
- Anky (Amazons)
- MIA (LOA)
- Magog (Go)

Optimal Decisions in Games

A game can be formally defined as a kind of search problem with the following components:
- The initial state
- A successor function
- A terminal test
- A utility function

The initial state and the legal moves for each side define the game tree for the game. Figure 6.1 shows part of the game tree for tic-tac-toe. In that tree, MAX moves so as to maximize its own utility, while MIN moves so as to minimize MAX's utility.
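As a concrete, hypothetical illustration of these four components (not taken from the slides), the sketch below encodes tic-tac-toe in Python with an initial state, a successor function, a terminal test, and a utility function; all class and method names are illustrative only.

```python
# Minimal sketch of a game defined as a search problem:
# initial state, successor function, terminal test, utility function.
from dataclasses import dataclass
from typing import List, Optional, Tuple

Board = Tuple[Optional[str], ...]   # 9 cells, each 'X', 'O', or None

@dataclass(frozen=True)
class State:
    board: Board
    to_move: str                    # 'X' (MAX) or 'O' (MIN)

class TicTacToe:
    def initial_state(self) -> State:
        return State(board=(None,) * 9, to_move='X')

    def successors(self, s: State) -> List[Tuple[int, State]]:
        """Successor function: legal (move, resulting state) pairs."""
        nxt = 'O' if s.to_move == 'X' else 'X'
        result = []
        for i, cell in enumerate(s.board):
            if cell is None:
                board = s.board[:i] + (s.to_move,) + s.board[i + 1:]
                result.append((i, State(board, nxt)))
        return result

    def terminal_test(self, s: State) -> bool:
        return self._winner(s.board) is not None or all(c is not None for c in s.board)

    def utility(self, s: State) -> int:
        """Utility from MAX's ('X') point of view: +1 win, -1 loss, 0 draw."""
        return {'X': 1, 'O': -1, None: 0}[self._winner(s.board)]

    @staticmethod
    def _winner(b: Board) -> Optional[str]:
        lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
        for i, j, k in lines:
            if b[i] is not None and b[i] == b[j] == b[k]:
                return b[i]
        return None
```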
Optimal strategies: in a normal search problem, the optimal solution would be a sequence of moves leading to a goal state, that is, a terminal state that is a win. In a game, on the other hand, MIN has something to say about it; MAX therefore must find a contingent strategy.

Even a simple game like tic-tac-toe is too complex for us to draw the entire game tree, so we switch to the trivial game in Figure 6.2. The possible moves for MAX at the root node are labeled a1, a2, and a3; the possible replies to a1 for MIN are b1, b2, b3, and so on. This particular game ends after one move each by MAX and MIN.

Obviously, the minimax value of a terminal state is just its utility. Furthermore, given a choice, MAX will prefer to move to a state of maximum value, whereas MIN prefers a state of minimum value. So we have the following:

MINIMAX-VALUE(n) =
  UTILITY(n)                                    if n is a terminal state
  max_{s ∈ Successors(n)} MINIMAX-VALUE(s)      if n is a MAX node
  min_{s ∈ Successors(n)} MINIMAX-VALUE(s)      if n is a MIN node

The minimax algorithm: the minimax algorithm (Figure 6.3) computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state, directly implementing the defining equations. The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up through the tree as the recursion unwinds.
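A minimal sketch of this computation, assuming the game tree is given explicitly as nested lists of (move, subtree) pairs with numeric leaf utilities for MAX; the tree and its values below are chosen only for illustration and are not Figure 6.2 itself.

```python
# Minimax over an explicit tree: internal node = list of (move, subtree),
# leaf = utility for MAX.
def minimax_value(node, maximizing):
    if isinstance(node, (int, float)):      # terminal: utility for MAX
        return node
    values = [minimax_value(child, not maximizing) for _, child in node]
    return max(values) if maximizing else min(values)

def minimax_decision(root):
    """Return the root move (a MAX node) with the best minimax value."""
    return max(root, key=lambda mc: minimax_value(mc[1], maximizing=False))[0]

# Two-ply illustration: MAX moves a1..a3, MIN replies lead to the leaf values.
tree = [
    ("a1", [("b1", 3),  ("b2", 12), ("b3", 8)]),
    ("a2", [("c1", 2),  ("c2", 4),  ("c3", 6)]),
    ("a3", [("d1", 14), ("d2", 5),  ("d3", 2)]),
]
print(minimax_decision(tree))   # a1: its worst-case (minimax) value is 3
```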
Optimal decisions in multiplayer games: many popular games allow more than two players. Let us examine how to extend the minimax idea to multiplayer games:
- First, we need to replace the single value for each node with a vector of values, one per player (sketched below).
- Next we have to consider nonterminal states. Consider the node X in the game tree in Figure 6.4: the player to move at X chooses the successor whose value vector is best in that player's own component.
- Multiplayer games usually involve alliances.
- If the game is not zero-sum, then collaboration can also occur with just two players.
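The vector-of-values idea can be sketched as follows; the three-player tree and its utility vectors are made up for illustration and are not Figure 6.4.

```python
# Multiplayer backup: each node carries a vector of values, one per player,
# and the player to move picks the successor whose vector is best in that
# player's own component.
def multi_value(node, to_move, n_players):
    """node is a terminal utility vector (tuple) or a list of children."""
    if isinstance(node, tuple):
        return node
    nxt = (to_move + 1) % n_players
    child_values = [multi_value(c, nxt, n_players) for c in node]
    # The moving player maximizes its own component of the value vector.
    return max(child_values, key=lambda v: v[to_move])

# Player 0 (A) moves at the root, player 1 (B) at the next level;
# each leaf is a (vA, vB, vC) utility vector (player C never moves here).
tree = [
    [(1, 2, 6), (4, 1, 3)],     # after A's first move, B chooses (1, 2, 6)
    [(6, 1, 2), (7, 4, 1)],     # after A's second move, B chooses (7, 4, 1)
]
print(multi_value(tree, to_move=0, n_players=3))   # (7, 4, 1): A prefers vA=7
```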
Alpha-Beta Pruning

The number of game states is exponential in the number of moves. Although we cannot eliminate the exponent, we can effectively cut it in half: the trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree. We can use alpha-beta pruning.

Consider again the two-ply game tree from Figure 6.2. The steps are explained in Figure 6.5; the outcome is that we can identify the minimax decision without ever evaluating two of the leaf nodes.

Another way to look at this is as a simplification of the formula for MINIMAX-VALUE. The value of the root node is given by:

MINIMAX-VALUE(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
                    = max(3, min(2, x, y), 2)
                    = max(3, z, 2)        where z = min(2, x, y) ≤ 2
                    = 3

Here x and y are the values of the unexplored nodes. We do not need to know their values because they do not affect the result: the decision is independent of the pruned leaves x and y.

Alpha-beta pruning's general principle: consider a node n somewhere in the tree (see Figure 6.6) such that Player has a choice of moving to that node. If Player has a better choice m either at the parent node of n or at any choice point further up, then n will never be reached in actual play. So once we have found out enough about n to reach this conclusion, we can prune it.

Minimax search is depth-first. Alpha-beta pruning gets its name from the following two parameters, which describe bounds on the backed-up values that appear anywhere along the path:
- α = the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX.
- β = the value of the best (i.e., lowest-value) choice found so far at any choice point along the path for MIN.

Alpha-beta search updates the values of α and β as it goes along, and prunes the remaining branches at a node (i.e., terminates the recursive call) as soon as the value of the current node is known to be worse than the current α or β value for MAX or MIN, respectively. The complete algorithm is given in Figure 6.7; we encourage the reader to trace its behavior when applied to the tree in Figure 6.5.
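A hedged Python sketch in the spirit of Figure 6.7, using the same explicit-tree representation as the minimax sketch above; it is illustrative, not the book's pseudocode.

```python
# Alpha-beta search: same backed-up values as minimax, but branches are cut
# off as soon as the current node is known to be worse than alpha (for MAX)
# or beta (for MIN).
import math

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = -math.inf
        for _, child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            if value >= beta:        # MIN above already has something <= beta
                return value         # prune the remaining children
            alpha = max(alpha, value)
        return value
    else:
        value = math.inf
        for _, child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            if value <= alpha:       # MAX above already has something >= alpha
                return value         # prune the remaining children
            beta = min(beta, value)
        return value

def alphabeta_decision(root):
    return max(root, key=lambda mc: alphabeta(mc[1], -math.inf, math.inf, False))[0]
```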
Transposition: repeated states occur in games just as in other search problems, because different permutations of the move sequence can end up in the same position. It is worth storing the evaluation of such a position in a hash table the first time it is encountered, to save the time and resources of recomputing it.

The hash table of previously seen positions is traditionally called a transposition table. A transposition table can have a dramatic effect, sometimes as much as doubling the reachable search depth in chess. On the other hand, if we are evaluating a million nodes per second, it is not practical to keep all of them in the transposition table.
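A small illustrative sketch of the transposition-table idea; the naive (state, depth) key used here is an assumption for readability, whereas real chess programs typically use an incremental Zobrist hash and evict old entries rather than keeping everything.

```python
# Memoize positions already evaluated so that different move orders reaching
# the same position are not re-searched.
transposition_table = {}            # position key -> backed-up value

def cached_value(state, depth, compute):
    """Look the position up before searching; store the result afterwards."""
    key = (state, depth)            # real programs: Zobrist hash of the position
    if key in transposition_table:
        return transposition_table[key]
    value = compute(state, depth)   # fall back to the normal search
    transposition_table[key] = value
    return value

# usage sketch: cached_value(position, 4, lambda s, d: search(s, d))
```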
Imperfect, Real-Time Decisions

Alpha-beta still has to search all the way to terminal states for at least a portion of the search space. This depth is usually not practical, so we alter minimax or alpha-beta in two ways:
- The utility function is replaced by a heuristic evaluation function EVAL.
- The terminal test is replaced by a cutoff test that decides when to apply EVAL.

Evaluation function: an evaluation function returns an estimate of the expected utility of the game from a given position, just as the heuristic functions of Chapter 4 return an estimate of the distance to the goal. How exactly do we design good evaluation functions?
- First, the evaluation function should order the terminal states in the same way as the true utility; otherwise, an agent using it might select suboptimal moves even if it can see ahead all the way to the end of the game.
- Second, the computation must not take too long!
- Third, for nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.

Most evaluation functions work by calculating various features of the state. The features, taken together, define various categories or equivalence classes of states: the states in each category have the same values for all the features. Any given category, generally speaking, will contain some states that lead to wins, some that lead to draws, and some that lead to losses. The evaluation function cannot know which states are which, but it can return a single value that reflects the proportion of states with each outcome.

In principle, the expected value could be determined for each category; in fact the evaluation function need not return actual expected values, as long as the ordering of the states is the same. In practice, this kind of analysis requires too many categories and hence too much experience to estimate all the probabilities of winning. Instead, most evaluation functions compute separate numerical contributions from each feature and then combine them to find the total value.

In chess, for example, books suggest an approximate material value for each piece on the board:
- A pawn is worth 1.
- A knight is worth 3.
- The queen is worth 9.
Other features, such as good pawn structure, might be worth half a pawn. The final evaluation sums up all these values.

Mathematically, this kind of evaluation function is called a weighted linear function, because it can be expressed as

Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s) = Σ_{i=1}^{n} w_i f_i(s)

where each w_i is a weight and each f_i is a feature value of the position.
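A hedged sketch of such a weighted linear evaluation, using the material values quoted above plus conventional values for bishop and rook; the board representation and the feature functions are assumptions made purely for illustration.

```python
# Eval(s) = w1*f1(s) + ... + wn*fn(s) with a material feature and a
# placeholder pawn-structure feature weighted at half a pawn.
MATERIAL = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}   # conventional values

def material_balance(board):
    """f1: material difference from White's point of view.
    `board` is assumed to be a dict square -> piece, uppercase = White."""
    score = 0
    for piece in board.values():
        value = MATERIAL.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

def eval_weighted(board, features, weights):
    """Eval(s) = sum_i w_i * f_i(s)."""
    return sum(w * f(board) for w, f in zip(weights, features))

good_pawn_structure = lambda board: 0          # placeholder feature
value = eval_weighted({'e2': 'P', 'e7': 'p', 'd8': 'q'},
                      [material_balance, good_pawn_structure],
                      [1.0, 0.5])
print(value)                                   # -9: Black is a queen up
```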
The features and weights are not part of the rules of chess; they come from centuries of human chess-playing experience. The weights of the evaluation function can also be estimated by machine learning techniques.

Cutting off search: the next step is to modify ALPHA-BETA-SEARCH. In terms of implementation, we replace the two lines in Figure 6.7 that mention TERMINAL-TEST with the following line:

if CUTOFF-TEST(state, depth) then return EVAL(state)

We also must arrange for some bookkeeping so that the current depth is incremented on each recursive call. The most straightforward way to control the amount of search is to set a fixed depth limit; a more robust approach is to apply iterative deepening.
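A sketch of this modification applied to the explicit-tree alpha-beta sketch from earlier: the depth is incremented on each recursive call, and below a fixed depth limit a heuristic eval_fn is applied instead of searching on. The limit and eval_fn are assumptions, not part of the slides.

```python
import math

def h_alphabeta(node, depth, alpha, beta, maximizing, limit, eval_fn):
    if isinstance(node, (int, float)):
        return node                    # true terminal utility
    if depth >= limit:                 # CUTOFF-TEST(state, depth)
        return eval_fn(node)           # return EVAL(state) instead of searching on
    best = -math.inf if maximizing else math.inf
    for _, child in node:
        v = h_alphabeta(child, depth + 1, alpha, beta, not maximizing, limit, eval_fn)
        if maximizing:
            best = max(best, v)
            alpha = max(alpha, best)
        else:
            best = min(best, v)
            beta = min(beta, best)
        if alpha >= beta:
            return best                # prune as before
    return best
```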
A more sophisticated cutoff test is needed: the evaluation should be applied only to positions that are quiescent, that is, unlikely to exhibit wild swings in value in the near future. Nonquiescent positions can be expanded further until quiescent positions are reached. This extra search is called a quiescence search.
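A minimal sketch of that idea, assuming helper functions is_quiescent, noisy_successors (for example, capture moves only), and evaluate exist; it simply keeps expanding non-quiescent positions before applying EVAL.

```python
def quiescent_value(state, maximizing, evaluate, is_quiescent, noisy_successors):
    """Expand "noisy" positions (e.g. pending captures) until a quiet one is reached."""
    if is_quiescent(state):
        return evaluate(state)               # safe to apply EVAL here
    values = [quiescent_value(s, not maximizing, evaluate, is_quiescent, noisy_successors)
              for s in noisy_successors(state)]
    if not values:                           # no noisy moves left: treat as quiet
        return evaluate(state)
    return max(values) if maximizing else min(values)
```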
Horizon effect: this is more difficult to eliminate. It arises when the program is facing a move by the opponent that causes serious damage and is ultimately unavoidable. Consider the chess position in Figure 6.9: it looks as if Black will win soon, but once White's pawn on the seventh rank advances to the eighth it becomes a queen, and White wins the game. Pushing that move beyond the search horizon reverses Black's apparently winning evaluation.

Singular extension: a move that is "clearly better" than all other moves in a given position. Although the horizon effect does not occur often, searching singular extensions is quite effective at avoiding it without adding too much search cost: in the example of Figure 6.9 it will find the eventual queening move of the White pawn. The search can go beyond the normal depth limit without incurring much cost, because its branching factor is 1. Quiescence search is a special case of singular extension.

Forward pruning: some moves at a given node are pruned immediately, without further consideration; most humans playing chess consider only a few moves from each position. Unfortunately, this approach is rather dangerous because there is no guarantee that the best move will not be pruned away. It is safest when there are two similar nodes and we may choose either one to move to, or when we are already deep in the search tree.
A program combining all these techniques can play creditable chess (or other games). Assume we have implemented an evaluation function for chess, a reasonable cutoff test with a quiescence search, and a large transposition table. Let us also assume that, after months of tedious bit-bashing, we can generate and evaluate around a million nodes per second on the latest PC, allowing us to search roughly 200 million nodes per move under standard time control (three minutes per move).

The branching factor for chess is about 35 on average, and 35^5 is about 50 million:
- Minimax search could look ahead only about five plies, and would be easily fooled by an average human player, who can plan around eight plies ahead.
- Alpha-beta search gets to about 10 plies, similar to an expert-level human player.
- Section 6.7 describes additional pruning techniques that can extend the effective search depth to roughly 14 plies.
Games That Include an Element of Chance

In real life, many unpredictable events put us into unforeseen situations. Many games mirror this unpredictability by including a random element. In this way they take us a step nearer reality, and it is worthwhile to see how this affects the decision-making process.

In backgammon, White cannot construct a standard game tree of the sort we saw in chess and tic-tac-toe: a game tree for backgammon must include chance nodes in addition to MAX and MIN nodes.

How do we make correct decisions? The problem is that the resulting positions do not have definite minimax values. Instead, we can only calculate the expected value, where the expectation is taken over all the possible dice rolls that could occur.
Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly the same way as before. Chance nodes are evaluated by taking the weighted average of the values resulting from all possible dice rolls, that is:

EXPECTIMINIMAX-VALUE(n) = Σ_{s ∈ Successors(n)} P(s) · EXPECTIMINIMAX-VALUE(s)     if n is a chance node

where the successor function for a chance node n simply augments the state of n with each possible dice roll to produce each successor s, and P(s) is the probability that that dice roll occurs.
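A hedged sketch of this backup rule: MAX and MIN nodes behave exactly as in minimax, while a chance node backs up the probability-weighted average of its children; the node encoding and the tiny example tree are purely illustrative.

```python
# Node encoding: ('max'|'min', [children]), ('chance', [(probability, child), ...]),
# or a number for a terminal utility.
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: sum over dice rolls of P(s) * value(s)
    return sum(p * expectiminimax(c) for p, c in children)

# Tiny illustration: MAX chooses between a sure 2 and a 50/50 gamble on 0 or 5.
root = ('max', [2, ('chance', [(0.5, 0), (0.5, 5)])])
print(expectiminimax(root))   # 2.5: the gamble has the higher expected value
```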
Position evaluation in games with chance nodes: one might think that evaluation functions for games such as backgammon should be just like evaluation functions for chess. In fact, the presence of chance nodes means that one has to be more careful about what the evaluation values mean; Figure 6.12 shows what happens. It turns out that, to avoid this sensitivity, the evaluation function must be a positive linear transformation of the probability of winning from a position (or, more generally, of the expected utility of the position).
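A small numeric illustration of that sensitivity, with made-up values rather than those of Figure 6.12: an order-preserving but non-linear rescaling of the evaluation can change which move a chance node prefers.

```python
def expected(values, probs=(0.5, 0.5)):
    return sum(p * v for p, v in zip(probs, values))

# Two moves, each leading to a 50/50 chance node over two evaluated positions.
move_a = (2, 3)          # modest outcomes
move_b = (1, 4)          # riskier outcomes
print(expected(move_a), expected(move_b))           # 2.5 vs 2.5: a tie

# Apply an order-preserving but non-linear rescaling of the same evaluations.
rescale = {1: 1, 2: 20, 3: 30, 4: 400}
print(expected(tuple(rescale[v] for v in move_a)),  # 25.0
      expected(tuple(rescale[v] for v in move_b)))  # 200.5: B now looks far better
```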
Complexity of expectiminimax: if the program knew in advance all the dice rolls that would occur for the rest of the game, solving a game with dice would be just like solving a game without dice. Even if the search depth is limited to some small depth d, the extra cost compared with that of minimax makes it unrealistic to consider looking ahead very far in most games of chance.

Another way to think about the problem is this: the advantage of alpha-beta is that it ignores future developments that just are not going to happen, given best play; it concentrates on likely occurrences. In games with dice, there are no likely sequences of moves. This is a general problem whenever uncertainty enters the picture: the possibilities are multiplied enormously, and forming detailed plans of action becomes pointless, because the world probably will not play along.
No doubt it will have occurred to the reader that perhaps something like alpha-beta pruning could be applied to game trees with chance nodes. The analysis for MIN and MAX nodes is unchanged, and with a bit of ingenuity we can also prune chance nodes. Is it possible to find an upper bound on the value of C in Figure 6.11 before we have looked at all its children? At first sight it might seem impossible, because the value of C is the average of its children's values, and until we have looked at all the dice rolls this average could be anything: the unexamined children might have any value at all. But if we put bounds on the possible values of the utility function, then we can arrive at bounds for the average.
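A worked illustration with assumed numbers: if every utility is known to lie in [-2, +2], then after examining only one child of a chance node we can already bound its weighted average, which is sometimes enough to allow pruning.

```python
lo, hi = -2.0, 2.0                       # assumed bounds on the utility function
rolls = [(1/3, 0.5), (1/3, None), (1/3, None)]   # only the first child examined

seen = sum(p * v for p, v in rolls if v is not None)
unseen = sum(p for p, v in rolls if v is None)
upper = seen + unseen * hi               # every unexamined child at most `hi`
lower = seen + unseen * lo               # every unexamined child at least `lo`
print(lower, upper)                      # about -1.17 and 1.5: C's value lies in between
```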
Card games: card games are interesting for many reasons besides their connection with gambling. Cards are dealt randomly at the beginning of the game, with each player receiving a hand of cards that is not visible to the other players. Such games include bridge, whist, hearts, and some forms of poker.

Card games look just like dice games: the cards are dealt randomly and determine the moves available to each player, but all the "dice" are rolled at the beginning! We will pursue this observation further. It will turn out to be quite useful in practice; it is also quite wrong, for interesting reasons.

Imagine two players, MAX and MIN, playing some practice hands of four-card, two-handed bridge with all the cards showing:

MAX: 6 9 8 6    MIN: 2 4 10 5
MAX: 6 8 6      MIN: 2 4 10 5
MAX: 6 8 6      MIN: 2 4 5
MAX has none of the suit led, so: 6 8 6    MIN: 2 4 5
MAX will win both remaining tricks, and the game will be tied at two tricks each.

This is easy when we know what cards our opponent holds. What about when we do not?

MAX: 6 9 8 6    MIN: ?? 4 10 5

Usually, card games do not let opponents see all of our own cards.

If you think this approach is reasonable (or if you have no idea because you do not understand bridge), consider the following story:

Day 1: Road