Foreign-language source material:

Information Management System
Wiliam K. Thomson, U.S.A.

Abstract: An information storage, searching and retrieval system for large (gigabyte-scale) domains of archived textual data. The system includes multiple query generation processes, a search process, and a presentation of search results that is sorted by category or type and that may be customized based on the professional discipline (or an analogous personal characteristic) of the user, thereby reducing the time and cost required to retrieve relevant results.

Keywords: information management; retrieval system; object-oriented

1. Introduction
This invention relates to an information storage, searching and retrieval system that incorporates a novel organization for presenting search results from large (gigabyte-scale) domains of archived textual data.

2. Background of the Invention
Online information retrieval systems are used to search for and retrieve many kinds of information. Most systems in use today work in essentially the same manner: users log on (through a computer terminal or personal microcomputer, typically from a remote location), select a source of information (i.e., a particular database) that is usually something less than the complete domain, formulate a query, launch the search, and then review the search results displayed on the terminal or microcomputer, typically as documents (or document summaries) in reverse chronological order. This process must be repeated each time another source (database) or group of sources is selected, which is frequently necessary to ensure that all relevant documents have been found.

Additionally, this process places on the user the burden of organizing and assimilating the multiple result sets generated by launching the same query in each of the sources (databases) that the user needs or wants to search. Present systems that allow searching of large domains require people seeking information to modify their queries until the results shrink to a size they can assimilate by browsing, which can eliminate relevant results.

In many cases end users have been forced to use an intermediary (i.e., a professional searcher), because current collections of sources are both complex and extensive, and effective search strategies often vary significantly from one source to another. Even with such guidance, potentially relevant answers are missed because not every potentially relevant database or information source is searched on every query. Much effort has been expended on refining and improving source selection by grouping sources or database files together.
Significant effort has also been expended on query formulation through knowledge bases and natural language processing. However, as groupings of sources grow larger and responses to more comprehensive queries become more complete, the person seeking information often faces the daunting task of sifting through large, unorganized answer sets to find the most relevant documents or information.

3. Summary of the Invention
The invention provides an information storage, searching and retrieval system for a large domain of archived data of various types, in which search results are organized into discrete document types and groups of document types, so that users can identify relevant information more efficiently and more conveniently than with systems currently in use. The system includes means for storing a large domain of data contained in multiple source records, at least some of which comprise individual documents of multiple document types; means for searching substantially all of the domain with a single search query to identify documents responsive to the query; and means for categorizing the responsive documents by document type, including means for generating a summary of how many responsive documents fall within each of various predetermined categories of document types.

The query generation process may use a knowledge base that includes a thesaurus with predetermined, embedded complex search queries; natural language processing; fuzzy logic; tree structures; hierarchical relationships; or a set of commands that lets people seeking information formulate their queries.

The search process can use any indexing and search engine technique, including Boolean, vector, and probabilistic retrieval, as long as a substantial portion of the entire domain of archived textual data is searched for each query and all documents found are returned to the organizing process.

The sorting/categorization process prepares the search results for presentation by assembling the various document types retrieved by the search engine and then arranging these basic document types into (sometimes broader) categories that the user readily understands and finds relevant.

The search results are then presented to the user arranged by category, along with an indication of the number of relevant documents found in each category. The user may then examine the results in multiple formats, viewing as much of each document as the user deems necessary.

4. Brief Description of the Drawings
Fig. 1 is a block diagram illustrating an information retrieval system of the invention;
Fig. 2 is a diagram illustrating a query formulation and search process utilized in the invention;
[Figure labels: user selects concept to search; executed by search processing; results presented by category in list format; user selects additional formats]
Fig. 3 is a diagram illustrating a sorting process for organizing and presenting search results.

5. Best Mode for Carrying Out the Invention
As illustrated in the block diagram of Fig. 1, the information retrieval system of the invention includes an input/output process, a query generation process, a search process that covers a large domain of textual data (typically in the multiple-gigabyte range), an organizing process, presentation of the information to the user, and a process that identifies and characterizes the types of documents contained in the large domain of data.

Turning now to Fig. 2, the query generation process preferably includes a knowledge base containing a thesaurus and a note pad, and preferably uses embedded, predefined complex Boolean strategies. Such a system lets users describe the information they need with simple words or phrases in "natural" language and rely on the system to help generate the full search query, which would include, e.g., synonyms and alternate phraseology. The user can then request, with a command such as "vi co 1", to view the complete document selected from the list, which in this case gives complete information about the identity and credentials of the expert.

Fig. 3 illustrates how five typical sources of information (i.e., source records) can be sorted into many document types and then into categories. For example, a typical trade magazine may contain several types of information, such as editorials, regular columns, feature articles, news, product announcements, and a calendar of events. Thus the trade magazine (i.e., the source record) may be sorted into these document types, and these document types in turn may be grouped into categories contained in one or more sets of categories. Each document type will typically be sorted into one category within a set, but the categories themselves vary from one set to another. For example, one set of categories may be established for a first characteristic type of user, and a different set for a second characteristic type of user. When a user of type #1 executes a search, the system automatically uses the categories of set #1, corresponding to that type of user, to organize the results for the user's review. When a user of type #2 executes a search, however, the system automatically uses the categories of set #2 to present the results.

The information storage, searching and retrieval system of the invention resolves the common difficulties of typical online information retrieval systems that operate on large (e.g., 2 gigabytes or more) domains of textual data: query generation, source selection, and organizing search results. The information base, with its thesaurus and embedded search strategies, lets users generate expert search queries in their own "natural" language. Source (i.e., database) selection is not an issue, because the search engines can search substantially the entire domain on every query. Moreover, presenting search results by category set substantially reduces the time and cost of performing repetitive searches in multiple databases, and therefore of retrieving relevant results efficiently.

While a preferred embodiment of the present invention has been described, it should be understood that various changes, adaptations and modifications may be made therein without departing from the spirit of the invention and the scope of the appended claims.
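The query generation and search steps described above (a thesaurus expands the user's plain words into a complex Boolean query, which is then run against substantially the entire domain) can be illustrated with a minimal Python sketch. It assumes a toy in-memory inverted index and single-word thesaurus entries; `THESAURUS`, `tokenize`, `expand_query`, `build_index`, `search`, and the sample documents are hypothetical, not part of the described system, which may equally use vector or probabilistic retrieval.

```python
import re
from collections import defaultdict

# Hypothetical thesaurus: each concept word maps to its synonyms and
# alternate phrasings (single words only in this sketch).
THESAURUS = {
    "expert": ["expert", "specialist", "consultant"],
    "witness": ["witness", "testimony"],
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(words, thesaurus):
    """Build a Boolean query: one OR-clause of synonyms per user word, ANDed together."""
    return [set(thesaurus.get(w.lower(), [w.lower()])) for w in words]


def build_index(documents):
    """Inverted index over the whole domain: word -> ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, boolean_query):
    """Return ids of documents that satisfy every OR-clause of the query."""
    result = None
    for clause in boolean_query:
        # A clause matches a document if any of its synonyms occurs in it.
        hits = set().union(*(index.get(term, set()) for term in clause))
        result = hits if result is None else result & hits
    return result if result is not None else set()


docs = {
    1: "Profile of a forensic specialist who gave testimony",
    2: "Editorial on consultant fees",
    3: "Expert witness directory",
}
index = build_index(docs)
query = expand_query(["expert", "witness"], THESAURUS)
print(sorted(search(index, query)))  # [1, 3]
```

Because synonyms are ORed within a clause and clauses are ANDed, the user types only "expert witness", yet document 1 is still found through "specialist" and "testimony".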
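The sorting step of Fig. 3, where document types are mapped into a category set chosen by the user's characteristic type and a count is reported for each category, can be sketched the same way. The category sets, the user types `engineer` and `marketing`, and the `Other` fallback for unmapped document types are illustrative assumptions; the sketch also assumes each search hit already carries the document type assigned by the document-typing process.

```python
from collections import defaultdict

# Hypothetical category sets, one per characteristic user type. Within a set,
# each document type maps to exactly one category; the categories differ by set.
CATEGORY_SETS = {
    "engineer": {
        "feature": "Technical Articles",
        "column": "Technical Articles",
        "news": "Industry News",
        "product_announcement": "Products",
        "editorial": "Opinion",
        "calendar": "Events",
    },
    "marketing": {
        "feature": "Market Coverage",
        "news": "Market Coverage",
        "product_announcement": "Competitor Activity",
        "editorial": "Opinion",
        "column": "Opinion",
        "calendar": "Events",
    },
}


def categorize(hits, user_type):
    """Group (doc_id, doc_type) hits using the category set for this user type."""
    categories = CATEGORY_SETS[user_type]
    grouped = defaultdict(list)
    for doc_id, doc_type in hits:
        # Assumption: document types missing from the set fall into "Other".
        grouped[categories.get(doc_type, "Other")].append(doc_id)
    return dict(grouped)


def summary(grouped):
    """(category, count) pairs, most documents first, ties in alphabetical order."""
    counts = [(category, len(ids)) for category, ids in grouped.items()]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


hits = [(1, "feature"), (2, "news"), (3, "product_announcement"), (4, "column"), (5, "letter")]
for user_type in ("engineer", "marketing"):
    print(user_type, summary(categorize(hits, user_type)))
```

The same five hits yield a different summary for each user type, which is the behavior the text describes for category sets #1 and #2. Keeping the sets as data means a new user type needs only a new mapping, while the search itself is unchanged.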