Data Mining: Wikilens Data Set (WikiLens Dataset)

Summary: WikiLens was a generalized collaborative recommender system that allowed its community to define item types (e.g. beer) and categories (e.g. microbrews, pale ales, stouts), and then rate and get recommendations for items. It was taken offline in 2009 due to lack of system maintenance and support. This data set was extracted in February 2008.

Keywords: WikiLens, recommender system, item types, categories
Data format: TEXT
Data use: Information Processing, Classification

Detailed description:

Wikilens Data Sets

This directory contains a dump.txt.gz for this WikiLens instance. This file is the gzip-compressed output of a mysqldump command run with the 'latin1' charset, after suitable erasing of private data. The intent is for this dump to contain all the data you could get by spidering the site.

The easiest way to see the data is to install MySQL (see …), create a database, and load this dump file.
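Because the file is gzip-compressed mysqldump output in the 'latin1' charset, it can be opened directly from Python to check which tables it defines. The sketch below is not a dump parser. It assumes the usual mysqldump layout, where each table definition starts on a line of the form CREATE TABLE `name` (, with the backquotes optional. The helper name list_dump_tables is hypothetical.

```python
import gzip
import re

# Opening line of a mysqldump table definition, e.g. "CREATE TABLE `rating` (".
# Backquotes are optional because older mysqldump versions may omit them.
CREATE_RE = re.compile(r"^CREATE TABLE (?:IF NOT EXISTS )?`?(\w+)`?")


def list_dump_tables(path):
    """Return the table names defined in a gzip-compressed mysqldump file, in file order.

    MySQL's 'latin1' is effectively Windows cp1252. Python's 'latin-1' codec maps
    every byte to some character and never raises, which is enough for reading
    table names, although bytes 0x80-0x9f will not decode to their cp1252 glyphs.
    """
    tables = []
    with gzip.open(path, mode="rt", encoding="latin-1") as dump:
        for line in dump:  # stream line by line; the whole dump is never held in memory
            match = CREATE_RE.match(line)
            if match:
                tables.append(match.group(1))
    return tables
```

If the dump follows that layout, list_dump_tables("dump.txt.gz") should return the table names given in the overview of this README.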

A typical load command, with placeholder user name, password and database name, is:

    zcat dump.txt.gz | mysql -uuser -ppassword -Dmy_database

Otherwise, the text dump is human-readable, so it is possible to write tools to parse it. I wouldn't recommend it.

The dump has the following tables:

  category  - Map items to categories.
  chefmoz   - Cache of Chefmoz import data for the Restaurant category. This is EMPTY because otherwise it is very large.
  link      - Cache of wiki page links.
  logging   - Log of actions taken on the wiki. This is EMPTY for privacy.
  member    - EMPTY.
  nonempty  - Cache of page ids of pages that have some content.
  page      - Page data.
  page_urn  - Mapping of pages to URNs for ratings.
  pref      - User preferences. This is EMPTY for privacy.
  rating    - Ratings of URNs (often pages, mapped through page_urn). NOTE: rateepage is a URN id, not a page id.
  recent    - Cache of page ids of pages recently changed.
  session   - Cache of user sessions for the wiki. This is EMPTY for privacy.
  urn       - URN (Uniform Resource Name) ids.
  user      - EMPTY.
  version   - Page data for every version of a page.

These tables are mostly the same as in PhpWiki 1.3.9 (see …). The new tables are category, page_urn, rating, and logging.

* WARNING *
The easiest mistake to make while looking at the data is to join the rateepage field of the rating table to the id field of the page table. rateepage is a page id, right? NOT SO. The rating.rateepage field is actually the id of a URN, NOT a page. That field name has not been changed to something reflecting URN simply due to lack of time to do it correctly (including database migration upgrades). Look carefully at the example queries below to see how to use the various fields.
* WARNING *

Example queries

The words "item" and "ratee" (the object of a rating action) are used synonymously below. The same applies to "user" and "rater".

Here are some example queries to get data:

1. Select all ratings. The columns are:
   - Ratee (item) page id
   - Ratee (item) page name (truncated to 25 characters)
   - Rater page id
   - Rater page name
   - URN id
   - Rating value
   - Rating timestamp

    select p.id, left(p.pagename, 25), r.raterpage, rp.pagename as rater,
           r.rateepage, r.ratingvalue as rat, r.tstamp
    from page p, page rp, rating r, page_urn pu, urn u
    where pu.pagename = p.pagename and pu.urn = u.urn
      and r.raterpage = rp.id and r.rateepage = u.id
    order by p.pagename

2. Select all ratings of an item called "Book_Foo":

    select p.id, left(p.pagename, 25), r.raterpage, rp.pagename as rater,
           r.rateepage, r.ratingvalue as rat
    from page p, page rp, rating r, page_urn pu, urn u
    where pu.pagename = p.pagename and pu.urn = u.urn
      and r.raterpage = rp.id and r.rateepage = u.id
      and p.pagename like 'Book_Foo'
    order by p.pagename

3. Select all ratings by a user called "User_Bar":

    select p.id, left(p.pagename, 25), r.raterpage, rp.pagename as rater,
           r.rateepage, r.ratingvalue as rat
    from page p, page rp, rating r, page_urn pu, urn u
    where pu.pagename = p.pagename and pu.urn = u.urn
      and r.raterpage = rp.id and r.rateepage = u.id
      and rp.pagename like 'User_Bar'
    order by rp.pagename

4. Select the number of things in the "Book" category:

    select count(*) cnt
    from category c, page p
    where c.category = p.id and p.pagename = 'Book'

5. The number of items in any category:

    select count(*) from category

6. The number of users:

    select count(*) cnt
    from category c, page p
    where c.category = p.id and p.pagename = 'User'

7. The number of ratings:

    select count(*) from rating

8. The number of ratings per month:

    select left(tstamp, 6) as yearmonth, count(*)
    from rating
    group by yearmonth
    order by yearmonth asc

9. Pages per category, for every category:

    select pagename, count(*) cnt
    from category c, page p
    where c.category = p.id
    group by category
    order by cnt desc

10. Ratings per category, for every category:

    select left(cp.pagename, 30) cat, cou
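The WARNING above is easy to underestimate because the wrong join usually still returns rows. It simply attaches them to the wrong pages. The sketch below runs both joins against a toy in-memory SQLite database using Python's sqlite3 module. The column names come from the example queries, and the correct query follows the join path of query 1. Several things are assumptions made for illustration: the minimal schema and its types, the made-up rows (including the page Some_Other_Page and the URN string), and the use of SQLite in place of MySQL.

```python
import sqlite3

# Hypothetical miniature of the WikiLens schema: only the columns the example
# queries use, with SQLite types chosen for illustration.
SCHEMA = """
CREATE TABLE page (id INTEGER PRIMARY KEY, pagename TEXT);
CREATE TABLE urn (id INTEGER PRIMARY KEY, urn TEXT);
CREATE TABLE page_urn (pagename TEXT, urn TEXT);
CREATE TABLE rating (raterpage INTEGER, rateepage INTEGER, ratingvalue REAL);
"""

# Made-up rows. The URN id (3) deliberately equals the id of an unrelated page,
# which is exactly the situation in which the wrong join goes unnoticed.
ROWS = {
    "page": [(1, "Book_Foo"), (2, "User_Bar"), (3, "Some_Other_Page")],
    "urn": [(3, "urn:example:book-foo")],
    "page_urn": [("Book_Foo", "urn:example:book-foo")],
    "rating": [(2, 3, 4.0)],  # User_Bar (page 2) rated URN 3, which maps to Book_Foo
}

# Correct path: rating.rateepage -> urn.id, urn.urn -> page_urn.urn,
# page_urn.pagename -> page.pagename.
CORRECT = """
SELECT p.pagename, rp.pagename AS rater, r.ratingvalue
FROM page p, page rp, rating r, page_urn pu, urn u
WHERE pu.pagename = p.pagename AND pu.urn = u.urn
  AND r.raterpage = rp.id AND r.rateepage = u.id
ORDER BY p.pagename
"""

# Tempting but wrong: treats rating.rateepage as a page id.
WRONG = """
SELECT p.pagename, rp.pagename AS rater, r.ratingvalue
FROM page p, page rp, rating r
WHERE r.rateepage = p.id AND r.raterpage = rp.id
"""


def build_toy_db():
    """Create an in-memory database filled with the toy rows above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for table, rows in ROWS.items():
        placeholders = ", ".join("?" * len(rows[0]))  # e.g. "?, ?" for two columns
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn


if __name__ == "__main__":
    conn = build_toy_db()
    print("correct:", conn.execute(CORRECT).fetchall())
    print("wrong:  ", conn.execute(WRONG).fetchall())
```

Run as a script, it prints:

    correct: [('Book_Foo', 'User_Bar', 4.0)]
    wrong:   [('Some_Other_Page', 'User_Bar', 4.0)]

The wrong join credits User_Bar's rating to whichever page happens to share the URN's numeric id.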
