Approximating the Permanent by Sampling from Adaptive Partitions

Jonathan Kuck¹, Tri Dao¹, Hamid Rezatofighi¹, Ashish Sabharwal², and Stefano Ermon¹
¹Stanford University  ²Allen Institute for Artificial Intelligence

Abstract

Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics. However, this problem is also #P-complete, which leaves little hope for finding an exact solution that can be computed efficiently. While the problem admits a fully polynomial randomized approximation scheme, this method has seen little use because it is both inefficient in practice and difficult to implement. We present ADAPART, a simple and efficient method for drawing exact samples from an unnormalized distribution. Using ADAPART, we show how to construct tight bounds on the permanent which hold with high probability, with guaranteed polynomial runtime for dense matrices. We find that ADAPART can provide empirical speedups exceeding 30x over prior sampling methods on matrices that are challenging for variational based approaches. Finally, in the context of multi-target tracking, exact sampling from the distribution defined by the matrix permanent allows us to use the optimal proposal distribution during particle filtering. Using ADAPART, we show that this leads to improved tracking performance using an order of magnitude fewer samples.

1 Introduction

The permanent of a square, non-negative matrix $A$ is a quantity with natural graph theoretic interpretations. If $A$ is interpreted as the adjacency matrix of a directed graph, the permanent corresponds to the sum of weights of its cycle covers. If the graph is bipartite, it corresponds to the sum of weights of its perfect matchings. The permanent has many applications in computer science and beyond. In target tracking applications [47, 37, 38, 40], it is used to calculate the marginal probability of measurement-target associations. In general computer science, it is widely used in graph theory and network science. The permanent also arises in statistical thermodynamics [7]. Unfortunately, computing the permanent of a matrix is believed to be intractable in the worst case, as the problem has been formally shown to be #P-complete [48]. Surprisingly, a fully polynomial randomized approximation scheme (FPRAS) exists, meaning that it is theoretically possible to accurately approximate the permanent in polynomial time. However, this algorithm is not practical: it is difficult to implement and it scales as $O(n^7 \log^4 n)$. Ignoring coefficients, this is no better than exact calculation until matrices of size 40x40, which takes days to compute on a modern laptop.

The problems of sampling from an unnormalized distribution and calculating the distribution's normalization constant (or partition function) are closely related and interreducible. An efficient solution to one problem leads to an efficient solution to the other [30, 28]. Computing the permanent of a matrix is a special instance of computing the partition function of an unnormalized distribution [51]. In this case the distribution is over $n!$ permutations, the matrix defines a weight for each permutation, and the permanent is the sum of these weights.
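As a concrete illustration, for a $2 \times 2$ matrix the permanent sums the weights of the $2! = 2$ permutations,
$$\mathrm{per}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad + bc,$$
in contrast to the determinant $ad - bc$; each of the $n!$ terms is the weight the matrix assigns to one permutation.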

1.1 Contributions

First, we present ADAPART, a novel method for drawing exact samples from an unnormalized distribution using any algorithm that upper bounds its partition function. We use these samples to estimate and bound the partition function with high probability. This is a generalization of prior work [25, 32], which showed that a specific bound on the matrix permanent nests, or satisfies a Matryoshka-doll-like property where the bound recursively fits within itself, for a fixed partitioning of the state space. Our novelty lies in adaptively choosing a partitioning of the state space, which (a) is suited to the particular distribution under consideration, and (b) allows us to use any upper bound or combination of bounds on the partition function, rather than one that can be proven a priori to nest according to a fixed partitioning.

Second, we provide a complete instantiation of ADAPART for sampling permutations with weights defined by a matrix, and correspondingly computing the permanent of that matrix. To this end, we identify and use an upper bound on the permanent with several desirable properties, including being computable in polynomial time and being tighter than the best known bound that provably nests.

Third, we empirically demonstrate that ADAPART is both computationally efficient and practical for approximating the permanent of a variety of matrices, both randomly generated and from real world applications. We find that ADAPART can be over 30x faster compared to prior work on sampling from and approximating the permanent. In the context of multi-target tracking, ADAPART facilitates sampling from the optimal proposal distribution during particle filtering, which improves multi-target tracking performance while reducing the number of samples by an order of magnitude.

2 Background

The permanent of an $n \times n$ non-negative matrix $A$ is defined as $\mathrm{per}(A) = \sum_{\sigma \in S_n} \prod_{j=1}^{n} A(j, \sigma(j))$, where the sum is over all permutations $\sigma$ of $\{1, 2, \ldots, n\}$ and $S_n$ denotes the corresponding symmetric group. Let us define the weight function, or unnormalized probability, of a permutation as $w(\sigma) = \prod_{j=1}^{n} A(j, \sigma(j))$. The permanent can then be written as $\mathrm{per}(A) = \sum_{\sigma \in S_n} w(\sigma)$, which is the partition function (normalization constant) of $w$, also denoted $Z_w$. We are interested in sampling from the corresponding probability distribution over permutations $p(\sigma) = \frac{w(\sigma)}{\sum_{\sigma' \in S_n} w(\sigma')}$, or more generally from any unnormalized distribution where the exact partition function is unknown. Instead, we will assume access to a function that upper bounds the partition function, for instance an upper bound on the permanent. By verifying (at runtime) that this upper bound satisfies a natural nesting property w.r.t. a partition of the permutations, we will be able to guarantee exact samples from the underlying distribution. Note that verification is critical since the nesting property does not hold for upper bounds in general.
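To make these definitions concrete, the following minimal Python sketch (an illustration added here, not code from the paper) enumerates all $n!$ permutations of a small matrix to compute $\mathrm{per}(A)$, the weights $w(\sigma)$, and the target distribution $p(\sigma)$; ADAPART exists precisely to avoid this exhaustive enumeration for larger $n$.

```python
from itertools import permutations

import numpy as np


def permanent_brute_force(A):
    """per(A) = sum over permutations sigma of prod_j A[j, sigma(j)].

    Exact but O(n!): feasible only for small n, shown purely to illustrate
    the weight function and partition function definitions above.
    """
    n = A.shape[0]
    return sum(
        np.prod([A[j, sigma[j]] for j in range(n)])
        for sigma in permutations(range(n))
    )


A = np.array([[0.5, 1.0, 2.0],
              [1.5, 0.2, 0.3],
              [0.7, 0.9, 0.1]])
Z_w = permanent_brute_force(A)                       # partition function Z_w = per(A)
w = {s: np.prod([A[j, s[j]] for j in range(3)]) for s in permutations(range(3))}
p = {s: w_s / Z_w for s, w_s in w.items()}           # target distribution p(sigma)
```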

In the next few sections, we will consider the general case of any non-negative weight function $w$ over $N$ states (i.e., $w : S \to \mathbb{R}_{\ge 0}$, $|S| = N$) and its partition function $Z_w$, rather than specifically discussing weighted permutations of a matrix and its permanent. This is to simplify the discussion and present it in a general form. We will return to the specific case of the permanent later on.

2.1 Nesting Bounds

Huber [25] and Law [32] have noted that upper bounds on the partition function that nest can be used to draw exact samples from a distribution defined by an arbitrary, non-negative weight function. For their method to work, the upper bound must nest according to some fixed partitioning $T$ of the weight function's state space, as formalized in Definition 1. In Definition 2, we state the properties that must hold for an upper bound to nest according to the partitioning $T$.

Definition 1 (Partition Tree). Let $S$ denote a finite state space. A partition tree $T$ for $S$ is a tree where each node is associated with a non-empty subset of $S$ such that:
1. The root of $T$ is associated with $S$.
2. If $S = \{a\}$, the tree formed by a single node is a partition tree for $S$.
3. Let $v_1, \ldots, v_k$ be the children of the root node of $T$, and $S_1, \ldots, S_k$ be their associated subsets of $S$. $T$ is a partition tree if the $S_i, S_j$ are pairwise disjoint, $\cup_i S_i = S$, and for each $i$ the subtree rooted at $v_i$ is a partition tree for $S_i$.

Definition 2 (Nesting Bounds). Let $w : S \to \mathbb{R}_{\ge 0}$ be a non-negative weight function with partition function $Z_w$. Let $T$ be a partition tree for $S$ and let $S_T$ be the set containing the subsets of $S$ associated with each node in $T$. The function $Z^{UB}_w(\bar{S}) : S_T \to \mathbb{R}_{\ge 0}$ is a nesting upper bound for $Z_w$ with respect to $T$ if:
1. The bound is tight for all single element sets: $Z^{UB}_w(\{i\}) = w(i)$ for all $i \in S$.¹
2. The bound nests at every internal node $v$ in $T$. Let $\bar{S}$ be the subset of $S$ associated with $v$. Let $\bar{S}_1, \ldots, \bar{S}_k$ be the subsets associated with the children of $v$ in $T$. Then the bound nests at $v$ if:
$$\sum_{l=1}^{k} Z^{UB}_w(\bar{S}_l) \le Z^{UB}_w(\bar{S}). \qquad (1)$$
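For illustration, the runtime check of the nesting condition in Eq. (1) at a single node amounts to one comparison; the sketch below (added here, not from the paper) assumes subsets are represented as tuples of states and `z_ub` is any candidate upper bounding function $Z^{UB}_w$.

```python
def nests_at_node(z_ub, parent, children):
    """Eq. (1): the bound nests at a node if the children's upper bounds,
    summed over a disjoint partition of `parent`, do not exceed the bound
    on `parent` itself."""
    return sum(z_ub(child) for child in children) <= z_ub(parent)


def is_tight_on_singletons(z_ub, w, states):
    """Condition 1 of Definition 2: Z_UB({i}) must equal w(i) for every state."""
    return all(z_ub((i,)) == w(i) for i in states)
```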

2.2 Rejection Sampling with a Fixed Partition

Setting aside the practical difficulty of finding such a bound and partition, suppose we are given a fixed partition tree $T$ and a guarantee that $Z^{UB}_w$ nests according to $T$. Under these conditions, Law [32] proposed a rejection sampling method to perfectly sample an element, $i \sim \frac{w(i)}{\sum_{j \in S} w(j)}$, from the normalized weight function (see Algorithm A.1 in the Appendix). Algorithm A.1 takes the form of a rejection sampler whose proposal distribution matches the true distribution precisely, except for the addition of slack elements with joint probability mass equal to $Z^{UB}_w(S) - Z_w$. The algorithm recursively samples a partition of the state space until the sampled partition contains a single element or slack is sampled. Samples of slack are rejected and the procedure is repeated until a valid single element is returned. According to Proposition A.1 (see Appendix), Algorithm A.1 yields exact samples from the desired target distribution. Since it performs rejection sampling using $Z^{UB}_w(S)$ to construct a proposal, its efficiency depends on how close the proposal distribution is to the target distribution. In our case, this is governed by two factors: (a) the tightness of the (nesting) upper bound, $Z^{UB}_w(S)$, and (b) the tree $T$ used to partition the state space (in particular, it is desirable for every node in the tree to have a small number of children). In what follows, we show how to substantially improve upon Algorithm A.1 by utilizing tighter bounds (even if they don't nest a priori) and iteratively checking for the nesting condition at runtime until it holds.

3 Adaptive Partitioning

A key limitation of using the approach in Algorithm A.1 is that it is painstaking to prove a priori that an upper bound nests for a yet unknown weight function with respect to a complete, fixed partition tree. Indeed, a key contribution of prior work [25, 32] has been to provide a proof that a particular upper bound nests for any weight function $w : \{1, \ldots, N\} \to \mathbb{R}_{\ge 0}$ according to a fixed partition tree whose nodes all have a small number of children. In contrast, we observe that it is nearly trivial to empirically verify a posteriori whether an upper bound respects the nesting property for a particular weight function for a particular partition of a state space; that is, whether the condition in Eq. (1) holds for a particular choice of $\bar{S}$, $\bar{S}_1, \ldots, \bar{S}_k$ and $Z^{UB}_w$. This corresponds to checking whether the nesting property holds at an individual node of a partition tree. If it doesn't, we can refine the partition and repeat the empirical check. We are guaranteed to succeed if we repeat until the partition contains only single elements, but empirically find that the check succeeds after a single call to Refine for the upper bound we use. The use of this adaptive partitioning strategy provides two notable advantages: (a) it frees us to choose any upper bounding method, rather than one that can be proven to nest according to a fixed partition tree; and (b) we can customize, and indeed optimize, our partitioning strategy on a per weight function basis. Together, this leads to significant efficiency gains relative to Algorithm A.1.

¹This requirement can be relaxed by defining a new upper bounding function that returns $w(i)$ for single element sets and the upper bound which violated this condition for multi-element sets.

Algorithm 1 ADAPART: Sample from a Weight Function using Adaptive Partitioning

Inputs:
1. Non-empty state space, $S$
2. Unnormalized weight function, $w : S \to \mathbb{R}_{\ge 0}$
3. Family of upper bounds for $w$, $Z^{UB}_w(\bar{S}) : D \subseteq 2^S \to \mathbb{R}_{\ge 0}$
4. Refinement function, $\textsc{Refine} : \mathcal{P} \to 2^{\mathcal{P}}$, where $\mathcal{P}$ is the set of all partitions of $S$

Output: A sample $i \in S$ distributed as $i \sim \frac{w(i)}{\sum_{i \in S} w(i)}$.

if $S = \{a\}$ then return $a$
$P \leftarrow \{S\}$; $ub \leftarrow Z^{UB}_w(S)$
repeat
    Choose a subset $\bar{S} \in P$ to refine: $\{\bar{S}_{i1}, \ldots, \bar{S}_{iK_i}\}_{i=1}^{K} \leftarrow \textsc{Refine}(\bar{S})$
    for all $i \in \{1, \ldots, K\}$ do $ub_i \leftarrow \sum_{j=1}^{K_i} Z^{UB}_w(\bar{S}_{ij})$
    $j \leftarrow \arg\min_i ub_i$; $P \leftarrow (P \setminus \{\bar{S}\}) \cup \{\bar{S}_{j1}, \ldots, \bar{S}_{jK_j}\}$; $ub \leftarrow ub - Z^{UB}_w(\bar{S}) + ub_j$
until $ub \le Z^{UB}_w(S)$
Sample a subset $S_i \in P$ with prob. $\frac{Z^{UB}_w(S_i)}{Z^{UB}_w(S)}$, or sample slack with prob. $1 - \frac{ub}{Z^{UB}_w(S)}$
if $S_m \in P$ is sampled then recursively call ADAPART($S_m$, $w$, $Z^{UB}_w$, $\textsc{Refine}$)
else reject slack and restart with the call to ADAPART($S$, $w$, $Z^{UB}_w$, $\textsc{Refine}$)

Algorithm 1 describes our proposed method, ADAPART. It formalizes the adaptive, iterative partitioning strategy and also specifies how the partition tree can be created on-the-fly during sampling without instantiating unnecessary pieces. In contrast to Algorithm A.1, ADAPART does not take a fixed partition tree $T$ as input. Further, it operates with any (not necessarily nesting) upper bounding method for (subsets of) the state space of interest.

Figure 1 illustrates the difference between our adaptive partitioning strategy and a fixed partitioning strategy. We represent the entire state space as a 2 dimensional square. The left square in Figure 1 illustrates a fixed partition strategy, as used by [32]. Regardless of the specific weight function defined over the square, the square is always partitioned with alternating horizontal and vertical splits. To use this fixed partitioning, an upper bound must be proven to nest for any weight function. In contrast, our adaptive partitioning strategy is illustrated by the right square in Figure 1, where we choose horizontal or vertical splits based on the particular weight function. Note that slack is not shown and that the figure illustrates the complete partition trees.

Figure 1 (Fixed vs. Adaptive Partitioning): Binary partitioning of a square in the order black, blue, orange, green. Left: each subspace is split in half according to a predefined partitioning strategy, alternating vertical and horizontal splits. Right: each subspace is split in half, but the method of splitting (vertical or horizontal) is chosen adaptively with no predefined order. This figure represents tight upper bounds without slack.

ADAPART uses a function Refine, which takes as input a subset $\bar{S}$ of the state space $S$, and outputs a collection of $K \ge 1$ different ways of partitioning $\bar{S}$. We then use a heuristic to decide which one of these $K$ partitions to keep. In Figure 1, Refine takes a rectangle as input and outputs 2 partitionings, the first splitting the rectangle in half horizontally and the second splitting it in half vertically.
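A hypothetical Refine for the rectangle example of Figure 1 could look as follows (a sketch added for illustration, with rectangles represented as ((x0, x1), (y0, y1)) pairs); for the permanent, Refine instead partitions sets of permutations, as described in Section 4.

```python
def refine_rectangle(rect):
    """Return K = 2 candidate partitions of an axis-aligned rectangle:
    a horizontal halving and a vertical halving, mirroring Figure 1."""
    (x0, x1), (y0, y1) = rect
    xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    horizontal = [((x0, x1), (y0, ym)), ((x0, x1), (ym, y1))]
    vertical = [((x0, xm), (y0, y1)), ((xm, x1), (y0, y1))]
    return [horizontal, vertical]
```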

ADAPART works as follows. Given a non-negative weight function $w$ for a state space $S$, we start with the trivial partition $P$ containing only one subset, all of $S$. We then call Refine on $S$, which gives $K \ge 1$ possible partitions of $S$. For each of the $K$ possible partitions, we sum the upper bounds on each subset in the partition, denoting this sum as $ub_i$ for the $i$-th partition. At this point, we perform a local optimization step and choose the partition $j$ with the tightest (i.e., smallest) upper bound, $ub_j$. The rest of the $K - 1$ options for partitioning $S$ are discarded at this point. The partition $P$ is refined by replacing $S$ with the disjoint subsets forming the $j$-th partition of $S$. This process is repeated recursively, by calling Refine on another subset $\bar{S} \in P$, until the sum of upper bounds on all subsets in $P$ is less than the upper bound on $S$. We now have a valid nesting partition $P$ of $S$ and can perform rejection sampling. Similar to Algorithm A.1, we draw a random sample from $P \cup \{\text{slack}\}$, where each $S_i \in P$ is chosen with probability $\frac{Z^{UB}_w(S_i)}{Z^{UB}_w(S)}$, and slack is chosen with the remaining probability. If subset $S_m \in P$ is sampled, we recursively call ADAPART on $S_m$. If slack is selected, we discard the computation and restart the entire process. The process stops when $S_m$ is a singleton set $\{a\}$, in which case $a$ is output as the sample.
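To make the procedure concrete, here is a minimal Python sketch of this sampler (an illustrative re-implementation, not the authors' code). It assumes states are hashable, subsets are represented as tuples, `z_ub(subset)` upper bounds the subset's total weight and is tight on singletons, and `refine(subset)` returns one or more candidate partitions; the choice of which subset to refine (here, the one with the largest upper bound) is one possible heuristic.

```python
import random


def adapart(S0, z_ub, refine):
    """Draw one exact sample i with probability w(i) / sum_j w(j) over S0.

    The weights enter only through z_ub being tight on singletons,
    i.e. z_ub((i,)) == w(i)."""
    while True:                                    # one (possibly rejected) attempt per pass
        S = tuple(S0)
        while len(S) > 1:
            ub_S = z_ub(S)
            partition, ub = [S], ub_S
            while True:                            # refine until the partition nests under ub_S
                target = max((p for p in partition if len(p) > 1), key=z_ub)
                candidates = refine(target)        # K >= 1 candidate partitions of `target`
                sums = [sum(z_ub(piece) for piece in cand) for cand in candidates]
                j = min(range(len(candidates)), key=sums.__getitem__)
                ub += sums[j] - z_ub(target)
                partition = [p for p in partition if p is not target] + list(candidates[j])
                if ub <= ub_S:
                    break
            r = random.random() * ub_S             # sample a subset or slack
            acc, chosen = 0.0, None
            for piece in partition:
                acc += z_ub(piece)
                if r <= acc:
                    chosen = piece
                    break
            if chosen is None:                     # landed in the slack mass: reject
                break                              # ...and restart from the full space S0
            S = chosen                             # otherwise descend into the sampled subset
        else:
            return S[0]                            # reached a singleton: accept it
```

Correctness of this sketch relies only on `z_ub` being a valid upper bound that is tight on singletons and on `refine` eventually producing singletons; efficiency depends on how quickly the refined partition's summed bound approaches the true total weight.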

ADAPART can be seen as using a greedy approach for optimizing over possible partition trees of $S$ w.r.t. $Z^{UB}_w$. At every node, we partition in a way that minimizes the immediate or “l(fā)ocal” slack (among the $K$ possible partitioning options). This approach may be sub-optimal due to its greedy nature, but we found it to be efficient and empirically effective. The efficiency of ADAPART can be improved further by tightening upper bounds whenever slack is encountered, resulting in an adaptive² rejection sampler [19] (please refer to Section A.2 in the Appendix for further details).

3.1 Estimating the Partition Function

Armed with a method, ADAPART, for drawing exact samples from a distribution defined by a non-negative weight function $w$ whose partition function $Z_w$ is unknown, we now outline a simple method for using these samples to estimate the partition function $Z_w$. The acceptance probability of the rejection sampler embedded in ADAPART can be estimated as
$$\hat{p} = \frac{\text{accepted samples}}{\text{total samples}} \approx \frac{Z_w}{Z^{UB}} = p, \qquad (2)$$
which yields $\hat{p}\,Z^{UB}$ as an unbiased estimator of $Z_w$. The number of accepted samples out of $T$ total samples is distributed as a Binomial random variable with parameter $p = Z_w / Z^{UB}$. The Clopper-Pearson method [16] gives tight, high probability bounds on the true acceptance probability, which in turn gives us high probability bounds on $Z_w$. Please refer to Section A.3 in the Appendix for the unbiased estimator of $Z_w$ when performing bound tightening as in an adaptive rejection sampler.
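For concreteness, a small sketch of this estimator and its Clopper-Pearson interval (illustrative only; it assumes SciPy is available, `accepted` and `total` are counts from the sampler, and `z_upper` is the upper bound $Z^{UB}$ used for rejection):

```python
from scipy.stats import beta


def estimate_partition_function(accepted, total, z_upper, confidence=0.95):
    """Return (point estimate, lower bound, upper bound) for Z_w.

    accepted/total estimates p = Z_w / Z_UB, so p_hat * Z_UB estimates Z_w;
    a Clopper-Pearson interval on p scales into a high-probability interval
    on Z_w (Section 3.1)."""
    alpha = 1.0 - confidence
    p_hat = accepted / total
    p_lo = 0.0 if accepted == 0 else beta.ppf(alpha / 2, accepted, total - accepted + 1)
    p_hi = 1.0 if accepted == total else beta.ppf(1 - alpha / 2, accepted + 1, total - accepted)
    return p_hat * z_upper, p_lo * z_upper, p_hi * z_upper
```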

4 Adaptive Partitioning for the Permanent

In order to use ADAPART for approximating the permanent of a non-negative matrix $A$, we need to specify two pieces: (a) the Refine method for partitioning any given subset $\bar{S}$ of the permutations defined by $A$, and (b) a function that upper bounds the permanent of $A$, as well as any subset of the state space (of permutations) generated by Refine.

4.1 Refine for Permutation Partitioning

We implement the Refine method for
