An Explanation of SIFT (slide deck: SIFT的講解.ppt)

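Contents
- Overall picture
- Region detectors
- Scale invariant detection
- Localization
- Orientation assignment
- Region description
  - SIFT approach
  - Local Jet
- Storing and matching
- State of the art - Video Google

Our goal
- Detecting repeatable image regions
- Obtaining reliable and distinctive descriptors
- Searching an image database for an object efficiently

Invariance vital
- Scale
- Rotation
- Orientation
- Illumination
- Noise
- Affine

Region detectors
- Harris points
  - Invariant to rotation
  - Two significant eigenvalues indicate an interest point
- Harris-Laplace
  - Invariant to rotation and scale
  - Uses the Laplacian of Gaussian operator
- SIFT
  - Scale-space extrema using Difference of Gaussian

Scale Invariant Detection
- Consider regions (e.g. circles) of different sizes around a point
- Regions of corresponding sizes will look the same in both images

Scale Invariant Detection
- The problem: how do we choose corresponding circles independently in each image?

Scale Invariant Detection
- A "good" function for scale detection has one stable, sharp peak

Scale-space
- Definition:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
  where
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2) / (2\sigma^2)}$$
- Keypoints are detected using scale-space extrema in the difference-of-Gaussian function D:
$$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$
- Efficient to compute
- Close approximation to the scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$

Image space to scale space
[Figure: the image smoothed at successive scales $\sigma$, $k\sigma$, $k^2\sigma$, $k^3\sigma$, $k^4\sigma$]

Relationship of D to $\sigma^2 \nabla^2 G$
- Diffusion equation:
$$\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$$
- Approximate $\partial G / \partial \sigma$:
$$\frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$$
  giving
$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 G$$
- Therefore, when D has scales differing by a constant factor, it already incorporates the $\sigma^2$ scale normalization required for scale invariance

Local extrema detection
- Find maxima and minima in scale space

Frequency of sampling in scale
- Lowe prefers to use 3 scale samples per octave

Localization
- A 3D quadratic function is fit to the local sample points
- Start with the Taylor expansion, with the sample point as the origin:
$$D(X) = D + \frac{\partial D}{\partial X}^{T} X + \frac{1}{2} X^{T} \frac{\partial^2 D}{\partial X^2} X$$
  where $X = (x, y, \sigma)^{T}$
- Take the derivative with respect to X and set it to 0, giving
$$\hat{X} = -\left(\frac{\partial^2 D}{\partial X^2}\right)^{-1} \frac{\partial D}{\partial X}$$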
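As a rough illustration of this localization step, the sketch below solves the 3x3 system $\frac{\partial^2 D}{\partial X^2} \hat{X} = -\frac{\partial D}{\partial X}$ at one sample of a difference-of-Gaussian stack. The array layout `D[scale, row, col]`, the function name `refine_extremum`, and the use of central finite differences on a unit grid are assumptions made for the example; they are not taken from the slides.

```python
import numpy as np


def refine_extremum(D, s, y, x):
    """Fit a 3D quadratic to a DoG stack D[scale, row, col] around sample (s, y, x).

    Returns (offset, value), where offset is the estimated extremum position
    relative to the sample, ordered (x, y, sigma), and value is the
    interpolated D at that position.
    """
    c = D[s, y, x]

    # Gradient of D by central differences, ordered (x, y, sigma).
    g = 0.5 * np.array([
        D[s, y, x + 1] - D[s, y, x - 1],
        D[s, y + 1, x] - D[s, y - 1, x],
        D[s + 1, y, x] - D[s - 1, y, x],
    ])

    # Second derivatives (3x3 Hessian) by finite differences.
    dxx = D[s, y, x + 1] - 2.0 * c + D[s, y, x - 1]
    dyy = D[s, y + 1, x] - 2.0 * c + D[s, y - 1, x]
    dss = D[s + 1, y, x] - 2.0 * c + D[s - 1, y, x]
    dxy = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                  - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    dxs = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1]
                  - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
    dys = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x]
                  - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])

    # Solve H * offset = -g instead of forming the inverse explicitly.
    offset = -np.linalg.solve(H, g)

    # Interpolated value at the extremum: D + 0.5 * g^T * offset.
    value = c + 0.5 * g.dot(offset)
    return offset, value
```

A caller would typically compare each component of the returned offset against 0.5 and the returned value against a contrast threshold; both checks are left to the caller here.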

$\hat{X}$ is the location of the keypoint, expressed as an offset from the sample point. This is a 3x3 linear system.

Localization
- Derivatives are approximated by finite differences, for example a central difference:
$$\frac{\partial D}{\partial x} \approx \frac{D(x + 1, y, \sigma) - D(x - 1, y, \sigma)}{2}$$
- If $\hat{X}$ is > 0.5 in any dimension, the process is repeated

Being picky!
- Contrast (use the previous equation):
$$D(\hat{X}) = D + \frac{1}{2} \frac{\partial D}{\partial X}^{T} \hat{X}$$
  If $|D(\hat{X})| < 0.03$, throw it out
- Edge-iness: use the ratio of principal curvatures to throw out poorly defined peaks
- Curvatures come from the Hessian:
$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$$
- No need to explicitly calculate the eigenvalues. We only need their ratio r, which can be checked with the trace and determinant:
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r + 1)^2}{r}$$

Orientation assignment
- A descriptor computed relative to the keypoint's orientation achieves rotation invariance
- Gradient orientation is precomputed along with magnitude for all levels (useful in descriptor computation)
- Multiple orientations are assigned to keypoints from an orientation histogram
  - Significantly improves stability of matching

Choosing the right image descriptors
- Distribution-based
  - Luminance-based approaches
    - Histograms of pixel intensities and location
  - SIFT
    - Based on the gradient distribution in the region
  - Geometric-based approaches
    - Shape context
- Spatial-frequency techniques
  - Fourier transform based
    - No spatial information
  - Gabor filters and wavelets
    - Large number of filters

Choosing the right image descriptors
- Differential descriptors
  - Local Jets
    - Set of image derivatives
  - Steerable filters
    - Steering derivatives in the direction of the gradient
- Miscellaneous
  - Using generalized moment invariants
    - Characterize shape and intensity distribution

Who wants to be a Millionaire?
Which is the most popularly used image descriptor?
a. Local intensity histogram
b. SIFT
c. Local Jets
d. Fourier transform based

Local image descriptor - SIFT
- Weight the magnitude of each sample point by a Gaussian weighting function, $\sigma$ = 0.5 * width
- Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)
- Allows for significant shift in gradient positions

Illumination invariance for SIFT
- Affine changes
  - Normalizing the vector to unit length cancels an overall change in contrast (a constant multiplier); a constant brightness offset does not affect gradient values at all
- Non-linear changes
  - Occur due to camera saturation / viewpoint changes
  - Threshold the values in the unit feature vector to 0.2
  - Re-normalize
  - Less importance to large gradients
  - More importance to the distribution of orientations

Width of SIFT descriptor
- 2 parameters to be obtained
  - Number of orientations in the histogram
  - Size of the histogram array
- Optimal size obtained experimentally

Stability as a function of affine distortion
- The approach is not truly affine invariant
- Initial features are located in a non-affine manner

Local Image Descriptor - Local Jet
- The image in a neighborhood of a point can be described by the set of its derivatives
- The local jet of order N at a point x = (x1, x2) is defined using convolution of the image I with Gaussian derivatives
- A complete set of invariants is computed that locally characterizes the signal
  - By stacking the invariants in a vector

Local Image Descriptor - Multiscale approach
- The vector of invariants is calculated at different scales
- Half-octave quantization is used
- Difference between consecutive sizes: 20%
- $\sigma$ varies between 0.48 and 2.07

Local Image Descriptor - Semilocal Constraints
- Longer vectors decrease the probability of repeatability
- Global features are sensitive to extraneous features or partial visibility
- Solution
  - For each interest point, select the p nearest features
  - For matching, a constraint on the angle between the lines joining neighboring points is added
  - Assumption that 50% of the points will match using these semi-local constraints

Semilocal Constraints
[Figure]

Comparison of SIFT and Local Grayvalue Invariants
[Table not recovered]

Storing
- A set of keypoints is obtained from each reference image
- Each such keypoint has a descriptor, which is a 128-component vector (4*4*8)
- All such (keypoint, vector) pairs corresponding to a set of reference images are stored in a set

Matching
- The test image gives a new set of (keypoint, vector) pairs
- For each such pair, we find the nearest (top 2) descriptors in our database set

Acceptance of a match
- Match accepted IF
  - the ratio of the distance to the first nearest descriptor to that of the second < threshold

Complexity
- Initial complexity: number of features in the query image * total number of features in the database
- Reason: each keypoint (feature) has to be matched against all the features in the database to give the best two matches
- Solution: k-d trees!

Storage using k-d trees
- The set is stored using a k-d tree (in both the Schmid-Mohr and Lowe techniques)

K-d Trees
- The elements are stored in the leaves. The other nodes are divisions of the space in some dimension.
- Fixed-size one-dimensional buckets are used
- Each dimension is accessed sequentially
- The depth of the tree is at most the number of dimensions of the stored vectors

New complexity!
- Number of features of the query image

Which is the most popularly used image descriptor?
b. SIFT
c. Local Jets

Update and demo

STATE OF THE ART
- Video Google (NOT!)
- "A text retrieval approach to object matching in videos", Josef Sivic and Andrew Zisserman

Text retrieval overview
- Documents are parsed into words
- Common words are ignored (the, an, etc.). This is called a stop list
- Words are represented by their stems: walk, walking, walks → walk
- Each word is assigned a unique identifier
- The vocabulary contains K words

- Each document is represented by a K-component vector of word frequencies

Parse and clean
- Original: "Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object categories."
- Parsed: Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object categories
- Cleaned (stop words removed, words stemmed): Represent detect learn main issue need tackle design visual system recognize object category

Creating the database
- Creating a document vector ID
- Inverted file - Index

Querying
- Parse the query to create the query vector
  - Query: "Representation learning"
  - Query Doc ID = (1,0,1,0,0,…)
- Retrieve all document IDs containing one of the query word IDs (using the inverted file index)
- Calculate the distance between the query and document vectors (angle between vectors)
- Rank the results

Using the text search as an analogy
- Basic idea: build a visual vocabulary based on a large set of images. Given a query image, search through the database in a manner similar to the text search.

Again... Detection and Description
- Detection: finding invariant regions
- Description: using the SIFT descriptor

Building the "Visual Stems"
- Cluster descriptors into K groups using the K-means clustering algorithm
- Each cluster represents a "visual word" in the "visual vocabulary"
- Result: between 10000 and 20000 clusters used

Example clusters
[Figure]

Visual "Stop List"
- The most frequent visual words, which occur in almost all images, are suppressed
[Figure: before stop list / after stop list]

Ranking Frames
- Distance between vectors (as with words and documents)
- Spatial consistency (= word order in the text)

The Visual Analogy
  Text      | Visual
  Word      | Descriptor
  Document  | Frame
  Query     |

Example searches
- Object query
  - http://www.robots.ox.ac.uk/vgg/research/vgoogle/how/results/bolle/bolle.html
  - http://www.robots.ox.ac.uk/vgg/research/vgoogle/how/results/poster/poster.html
- Scene query
  - http://www.robots.ox.ac.uk/vgg/research/vgoogle/how/examples/example_scene.html

Open issues
- Automatic ways of building the vocabulary are needed
- A method for ranking retrieval results, as Google does
- Extension to non-rigid objects, like faces
- Using this method for higher-level analysis of movies

References
- David G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal
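Looking back at the "Illumination invariance for SIFT" and "Acceptance of a match" slides, the two steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names are made up for the example, the 0.8 ratio threshold is not given in the slides (they only say "threshold"), and the search is the brute-force one from the "Complexity" slide rather than a k-d tree.

```python
import numpy as np


def normalize_descriptor(vec, clamp=0.2):
    """Unit-normalize, clamp large entries at `clamp`, then re-normalize."""
    v = np.asarray(vec, dtype=float)
    v = v / (np.linalg.norm(v) + 1e-12)     # cancels an overall contrast change
    v = np.minimum(v, clamp)                # reduce the influence of large gradients
    return v / (np.linalg.norm(v) + 1e-12)  # back to (approximately) unit length


def ratio_test_matches(query, database, max_ratio=0.8):
    """Brute-force matching with the nearest / second-nearest distance ratio.

    query:    (n, d) array of descriptors from the test image
    database: (m, d) array of stored descriptors, m >= 2
    max_ratio is an assumed threshold, not a value from the slides.
    Returns a list of (query_index, database_index) pairs.
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    matches = []
    for qi, q in enumerate(query):
        # Euclidean distance from this query descriptor to every stored one.
        dists = np.linalg.norm(database - q, axis=1)
        first, second = np.argsort(dists)[:2]
        # Accept only if the best match is clearly better than the runner-up.
        if dists[first] < max_ratio * dists[second]:
            matches.append((qi, int(first)))
    return matches
```

The loop does work proportional to the number of query features times the number of database features, which is exactly the cost that motivates the k-d tree on the later slides.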

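The text-retrieval pipeline from the "Text retrieval overview" and "Querying" slides can also be sketched in plain Python. The stop list, the helper names, and the sample documents in the usage note are all hypothetical, and stemming is left out to keep the sketch short. In the visual analogy, each "word" would be the cluster ID of a descriptor and each "document" a frame.

```python
import math
from collections import Counter, defaultdict

# A tiny, hypothetical stop list; a real system would use a much larger one.
STOP_WORDS = {"the", "an", "a", "and", "are", "that", "to", "be", "in",
              "for", "is", "of"}


def tokenize(text):
    """Split into lower-case words, strip punctuation, drop stop words."""
    words = [w.strip('.,"!?').lower() for w in text.split()]
    return [w for w in words if w and w not in STOP_WORDS]


def build_index(docs):
    """docs: {doc_id: text}. Returns (word-frequency vectors, inverted index)."""
    vectors = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    inverted = defaultdict(set)
    for doc_id, vec in vectors.items():
        for word in vec:
            inverted[word].add(doc_id)
    return vectors, inverted


def cosine(a, b):
    """Cosine of the angle between two sparse word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, vectors, inverted):
    """Return [(doc_id, score)] for documents sharing a word with the query."""
    q = Counter(tokenize(query))
    # Use the inverted index to collect candidate documents.
    candidates = set()
    for word in q:
        candidates |= inverted.get(word, set())
    # Rank candidates by the cosine between query and document vectors.
    ranked = sorted(((cosine(q, vectors[d]), d) for d in candidates),
                    reverse=True)
    return [(d, score) for score, d in ranked]
```

For example, after `vectors, inverted = build_index({"d1": "Representation and learning of object categories", "d2": "Detection of visual objects"})`, calling `search("representation learning", vectors, inverted)` returns only `d1`, because `d2` shares no word with the query.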