翻譯以.原文和在同一文件中前一致性或延遲性一種基于狀態(tài)機(jī)

上傳人：我*** IP屬地：北京上傳時(shí)間：2022-11-30 格式：DOCX 頁(yè)數(shù)：47 大?。?70.53KB 積分：9.6 舉報(bào) 版權(quán)申訴

翻譯以.原文和在同一文件中前一致性或延遲性一種基于狀態(tài)機(jī)_第2頁(yè)

翻譯以.原文和在同一文件中前一致性或延遲性一種基于狀態(tài)機(jī)_第3頁(yè)

翻譯以.原文和在同一文件中前一致性或延遲性一種基于狀態(tài)機(jī)_第4頁(yè)

翻譯以.原文和在同一文件中前一致性或延遲性一種基于狀態(tài)機(jī)_第5頁(yè)

免費(fèi)預(yù)覽已結(jié)束，剩余42頁(yè)可下載查看

 下載本文檔

版權(quán)說(shuō)明：本文檔由用戶(hù)提供并上傳，收益歸屬內(nèi)容提供方，若內(nèi)容存在侵權(quán)，請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

方法作者：XuWang,Hailong摘CAPPACELC宣稱(chēng)在分布式副本系統(tǒng)中存在一些性能評(píng)量的折RSMRSM-d，我們建立了概率模型去量化一致性和延遲性。一致性和延遲性都受d的影響，而d表示協(xié)關(guān)鍵詞：一致性,寫(xiě)沖突,延遲性, 引[4]進(jìn)一步解釋了CAP，并提出了一致性與可用性，一致性與延遲之間的兩種權(quán)衡。系統(tǒng)總是可寫(xiě)的并保持最終一致性，如Dynamo[5]，Cassandra[6]。然而其他系統(tǒng)如Chubby[7]，Zookeeper[8]必須有段很短的時(shí)間來(lái)從錯(cuò)誤中進(jìn)行恢復(fù)來(lái)保證各副本管弱一致性已經(jīng)在一些商業(yè)系統(tǒng)中運(yùn)用并且被許多最求高可用性和低延遲的開(kāi)發(fā)者bing搜索業(yè)務(wù)，21.8%4.3%；由于延時(shí)從400ms增加到900ms,谷歌在搜索領(lǐng)域的業(yè)務(wù)量下降了25%；100ms的延遲增加使亞馬遜的銷(xiāo)售額下降了一個(gè)百分點(diǎn)[13]。因此在系統(tǒng)設(shè)計(jì)中延遲是一項(xiàng)很重要的考慮RSMs[14]Quorum[15]系統(tǒng)是分布式系統(tǒng)和數(shù)據(jù)庫(kù)領(lǐng)域中比較流行的兩種容WR兩種副本集合。W集合用于寫(xiě)操作，R用于讀操作。在隨機(jī)時(shí)間通過(guò)RSMsquorum系統(tǒng)提供RSMsquorumRSMs的寫(xiě)RSMs一致性和延遲之間，基于不一致性的定量結(jié)果做出一個(gè)合理理性的權(quán)衡。束[19,20]和限制副本的寫(xiě)分離[21,22]RSMs的寫(xiě)不在這篇論文中，提出了一個(gè)定量的概率分析方法來(lái)計(jì)算RSMs的不一致性和延一點(diǎn)，調(diào)查和分析了三種代表性的RSM實(shí)現(xiàn)，分別為非一致全序廣播[23,24]，分調(diào)查和分析[27-29]。然后，我們提出了一種通用系統(tǒng)模型RSM-d，d的取值范圍為[1-n]（n為副本數(shù)。D表示提交一個(gè)寫(xiě)操作之前協(xié)調(diào)者必須收到的ACKs數(shù)。模型表明（1）在d∈[1,[n/2]+1]時(shí)一致性隨著d的增加而增強(qiáng)（2）當(dāng)d大于[n/2]+1d的增加而單調(diào)遞增。第四，將寫(xiě)不一致性的量化結(jié)RSM-d的延遲聯(lián)系起來(lái)，RSM-d的基礎(chǔ)上，我們?yōu)閷?xiě)沖突構(gòu)建了一個(gè)概率模型，寫(xiě)沖突是造成不一致性都受到一個(gè)共同因素的影響，這個(gè)共同因素就是d。背RSMs的背景，RSMs致力于容錯(cuò)在分布式系統(tǒng)中得到ChandraToueg[30]已經(jīng)證明全序廣播和分布一致性問(wèn)題是可解決的，在這次研究中，我們也遵循這些假設(shè)。這就意味著我們討論的RSMs有2f+1個(gè)副本，能f個(gè)崩潰錯(cuò)誤，并且最終能夠偵測(cè)到崩潰的節(jié)點(diǎn)。然而，實(shí)際的錯(cuò)誤探測(cè)器在一定時(shí)間內(nèi)必須輸出是否崩潰的結(jié)果，而不是長(zhǎng)時(shí)間等待一個(gè)最終結(jié)論。[31]將在[23-29]調(diào)查中，RSMs通過(guò)非一致全序廣播，分布式一致和一致全序廣播實(shí)現(xiàn)。如圖2.1所示，這三種具有代表性的RSM實(shí)現(xiàn)時(shí)這樣實(shí)現(xiàn)的：一個(gè)獨(dú)一的序列號(hào)并且將其提交給所有的副本。當(dāng)寫(xiě)操作消息抵達(dá)副本，副本會(huì)記錄下這個(gè)寫(xiě)操作，并且根據(jù)序列號(hào)進(jìn)行提交，然后回復(fù)協(xié)調(diào)者。只要協(xié)調(diào)者受到第一個(gè)回復(fù)，它就會(huì)向客戶(hù)端發(fā)送響應(yīng)信息。任意時(shí)刻如果協(xié)調(diào)者被認(rèn)RSMs的分布式一致，一旦協(xié)調(diào)者收到一個(gè)寫(xiě)操作命令，首先會(huì)1[n/2]+1n RSM-d系統(tǒng)模型，這個(gè)模型RSM模型的抽象表示。因RSMs，通過(guò)調(diào)查研究它們來(lái)獲取盡管已經(jīng)在第二部分介紹過(guò)了全序廣播算法，也僅是全序廣播綜合調(diào)查結(jié)果5外兩種是基于全序邏輯時(shí)鐘[16]的歷史交流和目的地協(xié)議實(shí)現(xiàn)。RSM和分布式一致的解決方案都是目的地協(xié)議。Paxos用于對(duì)單個(gè)寫(xiě)操作排序的協(xié)商,multi-paxos針3.1RSM-dPaxos算法進(jìn)行全序廣播。原因是：1）固定的定序器算法在所有的全序廣播里的延遲是最少的。2）Paxos在3）在動(dòng)態(tài)定序器中的標(biāo)示符，基于特權(quán)的算法，在通信Paxos算法相結(jié)合建立一個(gè)統(tǒng)一的系統(tǒng)RSM-d。這是切實(shí)可行的，因?yàn)槿驈V播和分布式一致是基本相等的。如圖2dd屬于[1;n]d副本也包括了協(xié)調(diào)器，也就是一果d=1,RSM-d降低為不均一全序廣播，協(xié)調(diào)器可以直接進(jìn)入提交階段。如果d=[n=2]+1,RSM-dPaxosd=n，RSM-d變成為一致的全序廣播。我們不只是涉及到了這三個(gè)代表性的RSMs,而且將深入探討其他的在過(guò)去被忽視的RSM-dRSMRSMs的假設(shè)。RSMs假設(shè)同時(shí)崩潰的效檢測(cè)器將開(kāi)始選擇新的協(xié)調(diào)器，并且只在歷史記錄的基礎(chǔ)上更改錯(cuò)誤。在RSM-d有多個(gè)協(xié)調(diào)器，如Multi-RingPaxos，他與Paxos相協(xié)調(diào)。寫(xiě)沖突概RSM-d，如ACK數(shù)??≥[??/2]+1，所有的寫(xiě)計(jì)算被提交到他們普通序和障修復(fù)后，f??[??/2]有故障出現(xiàn)，副本仍將保持一致性。然后如果??[??/21，ACK消息的圖4.1顯示了當(dāng)至少d節(jié)點(diǎn)接收到了“PROPOSE產(chǎn)生故障時(shí)，寫(xiě)日志丟失的方案。假設(shè)????是一個(gè)寫(xiě)計(jì)算所有日志丟失的可能。如d（b）之下，????=????。4.1因?yàn)闅v史記錄只寶庫(kù)了日志歷史，所以????也是在歷史記錄中的一個(gè)寫(xiě)計(jì)算丟????= 如今我們不考慮條件（c，但（a）和（b）仍保留。這意味著我們?cè)试S提交的)????????=∑??????1????????[??????=????????????= ∑ ??????1???????? 現(xiàn)在我們進(jìn)一步的不考慮（b），并只考慮（a。因此我們應(yīng)該考慮到不同步的度作用是隨機(jī)數(shù)據(jù)下的????????。為了便捷，我們用以下公式表示：????=??????∑??????1?????d<x<n。明顯的可以看出方程式（2）可以被寫(xiě)成??????????。當(dāng)??????時(shí)，寫(xiě)沖突的條件幾率也是如此。因此我們可以通過(guò)對(duì)每一個(gè)可能的??=??條件幾率之??????=∑??????????

=∑??????????????∑??????1????? 如上述的，在不同條件下協(xié)調(diào)器選擇和故障恢復(fù)，等式（1（2）和（3）都代??????= =∑??????????4.3圖4.3顯示了寫(xiě)重復(fù)的場(chǎng)景。當(dāng)一個(gè)不準(zhǔn)確的故障檢測(cè)器對(duì)協(xié)調(diào)者存在錯(cuò)誤懷疑，至少存在一個(gè)調(diào)解節(jié)點(diǎn)。然而，如果dn2Pno表示沒(méi)有相同節(jié)點(diǎn)的回復(fù)確認(rèn)消息的兩個(gè)節(jié)點(diǎn)集合的概率。除去兩個(gè)協(xié)調(diào)者本身，其他兩個(gè)隨機(jī)(d1)節(jié)點(diǎn)集可能不重疊，且都是不同的節(jié)點(diǎn)。因此，Pno是兩個(gè)隨機(jī)(d1)節(jié)點(diǎn)集數(shù)，沒(méi)有重疊和排除兩個(gè)協(xié)調(diào)者除以所有兩個(gè)隨機(jī)節(jié)點(diǎn)集(??? ??????

=??? ??? ???

?????????圖4.4所示，主要工作如下：在時(shí)距Te4.4分布函數(shù)分別為??和?????2???1是它們的兩個(gè)樣本，分別代表兩個(gè)連續(xù)心跳信息的延遲信息?！案北驹诙〞r(shí)器超時(shí)前沒(méi)有接收到新的心跳信息”意味著???2+????1+????2+????1+??，副本仍然認(rèn)為它崩潰了。兩個(gè)節(jié)點(diǎn)之間的這種錯(cuò)誤懷疑的概率由Pfs表示：????8=???????2+??>???1+=???????2????1>???=????8

??Var(T)T的方差。如果我們不知道T的分布，上述不等式可用????8的上限Pgfs中，對(duì)協(xié)調(diào)者錯(cuò)誤懷疑的概率可以通過(guò)加和n-f-1活躍副本的概率，乘以至少有一個(gè)活躍副本錯(cuò)誤懷疑協(xié)調(diào)者的n-f-1活躍副本來(lái)計(jì)算：

???

=∑(

) 1? 1?1?Pgfs的流傳式故障檢測(cè)器（由另一個(gè)符號(hào)Pgs-gfs表示）有一點(diǎn)不同： ??? ???1???8?????8=∑

) 1? Pwd表示寫(xiě)重復(fù)概率。寫(xiě)重復(fù)發(fā)生如果（1）協(xié)調(diào)者正常工作但（2）其他活認(rèn)信息不重疊，即，Pwd等于這三個(gè)事件的概率的乘積：????=1????? 但難以準(zhǔn)確計(jì)算Pno，因?yàn)樗Q于特定協(xié)調(diào)者選定的持續(xù)時(shí)間，故障恢復(fù)算法，[]??????=????+????=∑??????+????+1????? 雖然??????的最終的結(jié)果給出了，我們應(yīng)該在不同的情況下采用它。例如，時(shí)正如圖5.1所示，我們已經(jīng)在即將使用的表5.2??????=????+??????+??????=??=????=????=0。若不考慮這種序。例如，????1≤????2≤?≤???????1。????k小的????????????????“COMMIT”消息從協(xié)調(diào)者到副本i所需時(shí)??????????????=?????1+?????? 當(dāng)d2????1=??????????????????????+?????? ??????????????????????+min(????)+????????+??5.1接下來(lái)我們關(guān)注信息的推遲而忽略登陸時(shí)間??????（或者可以認(rèn)為是時(shí)間非常短。因????和??????都是隨機(jī)變量T的樣本（T的定義在第IV部分，遵循概率密度函遵循概率密度函 ??????=????+??

???? ???????（f(t)IV部分）和累計(jì)分布函數(shù)G(t)。??????的pdf和cdf分別由??? 和????t1≤k≤n?1表示。直觀上來(lái)說(shuō)，對(duì)于足夠小的正實(shí)數(shù)??，??????∈[t,t+ε]，只要存在一個(gè)??????∈[tt+ε]，使得??????取值（k-1）t，對(duì)于其他??????取值??1????????(??????∈[,+???= ) ?????[t, ???

???1???????? +ε ???2=???1??(????∈[t,t+ε]) ) ???? ???? +ε ??? ???=???1 )?????? ???1?????? +ε??? ???=???1

) ???1?????? +ε??? ???=???1 ) ???11? + ??? =??????????(????????[t,t+ε])=???1 ??? ???11? ???通過(guò)方程（7），我們可以猜想??????1????????≥1:????????+1????? = +min??????)???(?????1+min??????= ??????1以上兩個(gè)minLCi被抵消，因?yàn)樘峤浑A段沒(méi)有變化，并且????0=0f(t)t0時(shí)，f(t)=0t0時(shí)，g(t)=0，G(t)=0,G+∞=1。如果d=1，那么??????2????? =?????? = ???1 1? ???2= ???1? 0= 1? ???1???

1? ???1= 1? ?

????????00若假設(shè) 存在，那么?? =∫+∞ ??是一個(gè)常量。在這種情況下很容0以至于：??????2????? 如果??≥2

???????????1=0+∞1? ????????+1????? = )???????=∫ ???1 (???2) )?1(1? ???

??? ???

??? ?1 ( ) (1? ??= ) ? ? ???

???

=???

) 0

(1?

???

????類(lèi)似的，如果??

???? =??

??+1?

=(???1)∫???

?11? ??? ??≥讓???????=(???1

? (1?

?? ????????+1????? =????? ??≥ 顯然???????>0。所以方程（8）f(x)n的數(shù) 一致性和延遲之間的長(zhǎng)期運(yùn)行的RSMs，系統(tǒng)的一致性象征著有序提交的寫(xiě)操作比率，這能通過(guò)1-PwcE(Lw(d))進(jìn)行估算。在這部分，B是系統(tǒng)總效益（比如，收入或用戶(hù)體驗(yàn)，BcBl是一致性和延遲對(duì)系統(tǒng)效益的影響值。舉例來(lái)說(shuō)，B能看做網(wǎng)上商店的總收入，強(qiáng)一致性的事務(wù)性購(gòu)買(mǎi)會(huì)使收入增加，而高延遲將導(dǎo)致用戶(hù)減少和收入減少。ξ表示其他因素對(duì)總效益的????=????+??+??=??(1?????????)+??+??????????? +=??+??+??+?????????????+????????=?????????????+????????=??????????? ???????????+???????? ???????=??????????? ???其中??=??+??+??+??并且????=??????????+????????????????? 盡管??，??，??和??????1d的取值，對(duì)于一V(d)是上述等式的可變部分。V(d)取最小值時(shí)B(d)取最大值。????=??????????? ?????≤??????????? ?????′=????′=的d的值。B(1)=???????????1+???????? ??([2]+1)=???????????[2]+????=???????????通過(guò)上述方程組，我們能得到??，??，??d取值是一定的，而實(shí)的失敗率，RNSanti-entropygossip-style故障檢測(cè)會(huì)使系統(tǒng)更復(fù)雜和更不確定。在本節(jié)中，我們關(guān)注的是驗(yàn)證我們的結(jié)果，從而減少RSM-d實(shí)驗(yàn)的假設(shè)變量數(shù)量的最低值，擴(kuò)大應(yīng)用范圍。因此不考慮anti-entropy和gossip-style的故障檢測(cè)。為????和????。因此我們估計(jì)??????=????????/10000000。與此同時(shí)，我們計(jì)算延遲的平均值來(lái)估計(jì)????d的期望。s0.05（20ms，0.1（10ms）0.2（5ms被分別設(shè)置到（1000ms，2000ms）和（500ms，1000ms。在以上輸入?yún)?shù)下RMSE=0.0009%，stddev.=0.0052%.dn21,我們觀察到????和????為了證明方程(8均布fx)循=0:01;0:2;0:05;:1;:2，該分布已傳入相應(yīng)信息。我們執(zhí)行一個(gè)空寫(xiě)(沒(méi)有操作)保證日志記錄和寫(xiě)入提交的時(shí)間可以忽略。我們對(duì)每個(gè)??∈,9和??∈,??進(jìn)行測(cè)試，得到延遲時(shí)間的平均值并預(yù)測(cè)方程(8)給出的延遲期望值。在所有情況下將?????? ?????1 的觀測(cè)均0.013ms和t.d0.019ms,符合們對(duì)延遲的預(yù)期。dλ=0:01,Pc=2%,Te=1000ms，To=2000ms。7.1所示，xn(2<n<9)和相應(yīng)的應(yīng)答d(1<d<[n=2]),y軸表示的是寫(xiě)入沖突的概率。為了強(qiáng)調(diào)不同的差異，我們用log??????來(lái)代替??????。我們每步下降至少一個(gè)數(shù)量級(jí)。注意到當(dāng)d>[n/2]時(shí)，寫(xiě)入沖突的概率??????為0，所以并d方程8揭示了寫(xiě)延遲與d的關(guān)系。在實(shí)驗(yàn)中，我們想要更直觀的展示出寫(xiě)延遲df(x)被設(shè)置為λ=0.01。同時(shí)，還執(zhí)行一個(gè)空寫(xiě)入擾最終的結(jié)果，因?yàn)????1包括的所有值將被刪除?！蔥和相應(yīng)數(shù)量的ACKs數(shù)??∈[1,??/2]，y軸表示代替????1的延遲????????????d的增大而增大。在這里我們關(guān)注??∈[2,??/2]，實(shí)際上，在??∈[1,??]之間的任意值??????dd值的降低將導(dǎo)致n值的增大。原因是協(xié)調(diào)器會(huì)存在更多的ACKs去選擇最快的d。7.2d一致v.s.延正如第六節(jié)所定義的一致性等于1???????，現(xiàn)在我們結(jié)合以上兩個(gè)實(shí)驗(yàn)的結(jié)果，7.3所示，x軸表示代替????1的延遲??????，y軸表示1??????d3n3，5，7和其相應(yīng)∈[??/2]n和??1??/2]，它表明，更強(qiáng)的一致性必然導(dǎo)致糟糕的7.3V.S.RSMs系統(tǒng)作為其銷(xiāo)售系統(tǒng)。雖然難的。但是，如果我們不想知道系統(tǒng)的精確值的好處????，我們只需要通過(guò)統(tǒng)計(jì)方法知道一致性效益比??和延遲效益比??，最后確定??′的最大收益值????′。價(jià)格是V。一方面，如果每個(gè)正常的(一致)購(gòu)買(mǎi)可以帶來(lái)20%的價(jià)格,和每一個(gè)異常(不一致)購(gòu)買(mǎi)帶來(lái)雙重?fù)p失的價(jià)格(如賠償訂單沖突)。產(chǎn)生效益的為20%????1????????2????1?1???????，因此收益率的一致性為??=2.2????。另一方面，用戶(hù)相1001%。也就是說(shuō)，收益的比例延遲??為0.01%????。因此，我們有??=2.2????和??=0.01%????。7.4第六節(jié)表明，整體利益????的最大值為其變量????=????2.2??????d0.01%????????????? 有最小值。在??∈[1,??]+1]和??∈[2,9]2經(jīng)計(jì)算出????11????部分描述了變量整體利益????。例如，n=3，d=2，????值為0.125????45d=2時(shí)????有最小值，????Paxos實(shí)現(xiàn)。因相關(guān)較具有代表性的實(shí)現(xiàn)是非一致全序廣播，PaxosRSMs已經(jīng)RSMs一致性已經(jīng)被大量研究，但依舊在最近吸引著人們的目光。根據(jù)CAP[3]和RSMs中，我們的工作通過(guò)考慮崩潰錯(cuò)誤時(shí)寫(xiě)沖突發(fā)生的概率來(lái)量化不一致討Quorum系Quorum系統(tǒng)定義了兩類(lèi)副本集合，一類(lèi)是W，另一類(lèi)是讀R。它們能被配ONE(W=1),QUORUM(W=[n/2]+1ALL(W=n)或者是任意值。這和序，而RSMs為全序排列。所以一些在RSMs系統(tǒng)中引起寫(xiě)沖突的情況不一定適用于Quorum系統(tǒng)。在現(xiàn)今的實(shí)際quorum系統(tǒng)中，一般把第二種寫(xiě)看做一個(gè)新的寫(xiě)操RSMs在對(duì)同步提交寫(xiě)操作和副本歷史記錄狀態(tài)時(shí)懷疑協(xié)調(diào)者后會(huì)進(jìn)行協(xié)調(diào)者10現(xiàn)的變體，我們首先提出了一種叫做RSM-d的通用模型來(lái)統(tǒng)一描述大多數(shù)運(yùn)用參考.SummaryoftheAmazonEC2andAmazonRDSServiceDisruptionintheUSEastRegion.April2011J.Dean.Designs,lessons,andadvicefrombuildinglargedistributedsystems.KeynotefromLADIS2009.E.A.Brewer.TowardsRobustDistributedSystems.InPODC,pages7C7,D.J.Abadi.Consistencytradeoffsinmoderndistributeddatabasesystemdesign:CAPisonlypartofthestory.IEEEComputer,45(2):37C42,2012.G.DeCandia,D.Hastorun,M.Jampani,G.Kakulapati,A.Lakshman,A.Pilchin,S.Sivasubramanian,P.Vosshall,andW.Vogels.Dynamo:Amazonshighlyavailablekey-valuestore.InSOSP2007.TheApacheCassandraProject./[7]M.Burrows,”TheChubbylockserviceforloosely-coupleddistributedsystems,”inOSDI’06:Proceedingsofthe7thsymposiumonOperatingsystemsdesignandimplementation,2006,pp.P.Hunt,M.Konar,F.P.Junqueira,andB.Reed,ZooKeeper:Wait-freecoordinationforInternet-scalesystems,inUSENIXATC10:Proceedingsofthe2010USENIXAnnualTechnicalConference.USENIXAssociation,2010.W.Vogels.Eventuallyconsistent.Commun.ACM,52:40C44,JanuaryK.Birman.ReliableDistributedSystemsTechnologies,WebServicesandApplications.Springer,2005.E.SchurmanandJ.Brutlag.Performancerelatedchangesandtheiruserimpact.PresentedatVelocityWebPerformanceandOperationsConference,June2009.VelocityandtheBottomLine. /site/glinden/Home/StanfordDataMining.2006-11-29.ppt.29F.Schneider.Implementingfault-tolerantservicesusingthestatemachineapproach:atutorial.ACMComputingSurveys,22(4),1990.Merideth,MichaelG.,andMichaelK.Reiter.Selectedresultsfromthelatestdecadeofquorumsystemsresearch.Replication,SpringerBerlinHeidelberg,2010,185-206.Lamport,L.Time,clocks,andtheorderingofeventsinadistributedsystem.ACMCommunications,21(7),pp.558-565,1978.D.Malkhi,M.Reiter,A.Wool,andR.Wright.Probabilisticquorum.systems.InformationandCommunication,(170):184C206,2001.PeterBailis,ShivaramVenkataraman,JosephM.Hellerstein,MichaelFranklin,IonStoica.ProbabilisticallyBoundedStalenessforPracticalPartialQuorums.InVLDB2012S.S.Chawathe,H.Garcia-Molina,andJ.Widom.FlexibleConstraintManagementforAutonomousDistributedDatabases.IEEEDataEng.Bull.,17(2):23C27,1994.A.GuptaandS.Tiwari.Distributedconstraintmanagementforcollaborativeengineeringdatabases.InCIKM,pages655C664,1993.H.YuandA.Vahdat.DesignandEvaluationofaContinuousConsistencyModelforReplicatedServices.InOSDI,pages305C318,2000.S.Shah,K.Ramamritham,andP.J.Shenoy.ResilientandCoherencePreservingDisseminationofDynamicDataUsingCooperatingPeers.IEEETrans.Knowl.DataEng.,16(7):799C812,2004XavierDefago,AndreSchiper,andPeterUrban.Totalorderbroadcastandmulticastalgorithms:Taxonomyandsurvey.ACMComput.Surv.36,4(December2004)XavierDefago.Comparativeperformanceanalysisoforderingstrategiesinatomicbroadcastalgorithms.IEICETrans.onInformationandSystems.December2003L.Lamport.PaxosMadeSimple.ACMSIGACTNews,32(4):18C25,DecemberP.Marandi,M.Primi,N.Schiper,andF.Pedone.RingPaxos:Ahigh-throughputatomicbroadcastprotocol.InternationalConferenceonDependableSystemsandNetworks(DSN),2010,pp.527-536.L.Lamport.GeneralizedConsensusandPaxos.TechnicalReportMSRTR-2005-33,MicrosoftResearch,2005,B.KemmeandG.Alonso.Databasereplication:ataleofresearchcommunities.PVLDB,ParisaJaliliMarandiMarcoPrimiFernandoPedone.Multi-RingPaxos.InDSNChandra,T.,Toueg,S.:UnreliableFailureDetectorsforReliableDistributedSystems.JournaloftheACM43(2),225C267(1996).W.Chen.OntheQualityofServiceofFailureDetectors.IEEETransactionsonComputer.May2002.DiogoBecker,FlavioJunqueira,andMarcoSerafini.LeaderElectionforReplicatedServicesUsingApplicationScores.InMiddleware2011.B.Lampson.TheABCDsofPaxos.InProceedingsofthe20thACMSymposiumonPrinciplesofDistributedComputing(PODC01),page13.ACMPress,2001.IditKeidarandSergioRajsbaum.Onthecostoffault-tolerantconsensuswhentherearenofaults-atutorial.TechnicalReportMIT-LCS-TR-821,LaboratoryforComputerScience,MassachusettsInstituteTechnology,Cambridge,MA,02139,May2001.alsopublishedinSIGACTNews32(2)(June2001).A.Demers,D.Greene,C.Hauser,W.Irish,J.Larson,S.Shenker,H.Sturgis,D.Swinehart,andD.Terry.Epidemicalgorithmsforreplicateddatabasemaintenance.InPODC1987.MarinBertier,OlivierMarin,PierreSens,ImplementationandPerformanceEvaluationofanAdaptableFailureDetector,Proceedingsofthe2002InternationalConferenceonDependableSystemsandNetworks,p.354-363,June23-26,2002.M.RaynalandF.Tronel.GroupMembershipFailureDetection:ASimpleProtocolandItsProbabilisticAnalysis.DistributedSystemsEng.J.,vol.6,no.3,pp.95-102,R.vanRenesse,Y.Minsky,andM.Hayden.Agossip-stylefailuredetectionservice.InProceedingsofInternationalConferenceandDistributedSystemsPlatformsandOpenDistributedProcessing(IFIP),1998.J.Gray,P.Helland,P.ONeil,andD.Shasha.Thedangersofreplicationandasolution.InSIGMOD1996.B.F.Cooper,R.Ramakrishnan,U.Srivastava,A.Silberstein,P.Bohannon,H.-A.Jacobsen,N.Puz,D.Weaver,andR.Yerneni.PNUTS:Yahoo!sHostedDataServingPlatform.PVLDB,1:1277C1288,August2008.W.Lloyd,M.J.Freedmand,M.Kaminsky,andD.G.Andersen.Dontsettleforeventual:Scalablecausalconsistencyforwide-areastoragewithCOPS.InSOSP2011.ConsistencyorLatency?AQuantitativeAnalysisofReplicationSystemsBasedonReplicatedStateXuWang,HailongSun,TingDeng,JinpengSchoolofComputerScienceandEngineeringBeihangUniversityBeijing,ChinaEmail:{wangxu,sunhl,dengting,huaijp}—ExistingtheorieslikeCAPandPACELChaveclaimedthattherearetradeoffsbetweensomepairsofper-formancemeasuresindistributedreplicationsystems,suchasconsistencyandlatency.However,currentsystemstakeaveryvagueviewonhowtobalancethosetradeoffs,e.g.eventualconsistency.Inthiswork,weareconcernedwithprovidingaquantitativeanalysisonconsistencyandlatencyforwidely-usedreplicatedstatemachines(RSMs).BasedonourpresentedgenericRSMmodelcalledRSM-d,probabilisticmodelsarebuilttoquantifyconsistencyandlatency.Weshowthatbothareaffectedbyd,whichisthenumberofACKsreceivedbythecoordinatorbeforecommittingawriterequest.Andwefurtherdefineapayoffmodelthroughcombiningtheconsistencyandlatencymodels.Finally,withMonteCarlobasedsimulation,wevalidateourpresentedmodelsandshowtheeffectivenessofoursolutionsintermsofhowtoobtainanoptimaltradeoffbetweenconsistencyandlatency.—consistency;writeconflict;latency;replicatedPracticaldistributedsystemsinCloudandBigDatasolu-tionsoftenfacemanychallengessuchashighscalabilityandavailability,whicharebuiltonmassivecommoditycomput-ers,disks,networkdevicesandundercomplexmanagementtasks[1,2].Toservemoreusersandprovidehighenoughavailability,servicesanddataaretypicallyreplicatedacrossmultiplevirtualmachines(VMs),physicalmachinesandevengeographically-distributedclusters.TheCAPtheorem[3]showstheimpossibilityofobtainingallthreeofconsistency,availabilityandpartitiontolerancesimultaneously.AndPACELC[4]furtherinterpretsCAP,whichclaimstwotradeoffsincludingconsistency&availabilityandconsistency&latency.Ontheonehand,peoplemakeatradeoffbetweenconsistencyandavailability.Inordertoprovideextremelyhighavailability,manysystemsarealwayswritablewitheventualconsistency,suchasDynamo[5]andCassandra[6];whileothers,likeChubby[7]andZookeeper[8],mustexperienceashortperiodofunavailabilitytorecoverfromfailuresforguaranteeingstrongconsistencyofdifferentreplicas.Ontheotherhand,peoplefocusonthetradeoffbetweenconsistencyandlatency.Forexample,eventuallyconsistentreads,nomatterhowfasttheyare,cannotboundthestalenessasthenewestversionswillbeeventually[9];andstrongconsistencyleadstohigherlatencyfor

duetothewriteuniformity[10].Althoughweakconsistencyhasbeenusedinsomecommercialsystemsandisacceptablebymanypractitionersforhigheravailabilityandlowerlatency,itisnecessarytoboundtheinconsistencyandtoknowhowinconsistencytheweakconsistencyis.Sincelatencyimpactstheend-userexperienceofanIn-ternetapplication,ithas eanimportantsystemmetricformodernserviceproviders[11].Somestatisticalresultspre-sentedin[12]showthatend-usersareverysensitivetosystemlatency.Forinstance,atMicrosoftBing,a2-secondslowdownreducesqueries/userby1.8%andrevenue/userby4.3%;alatencyincreasefrom400msto900mswithGooglesearchresultsina25%dropoffinpagesearches;andhasa1%dropinsaleswitha100mslatencyincrease[13].Thereforelatencyisanimportantfactorinsystemdesign.Inthiswork,wefocusonthetradeoffbetweenconsistencyandlatencyindistributedreplicationprotocols.Twopopularfault-toleranceapproachesindistributedsys-temsanddatabasesarereplicatedstatemachines(RSMs)[14]andquorumsystems[15].RSMsdescribedesirablereplica-tionsemanticstomakeoperationscommittedintotalorder.QuorumsystemsdefinesetsofreplicasWandR,whereWisusedforwriteandRforread.Andtheycommitwritesbyvectorclocks[16]withsemanticreconciliation[5]incausalorder.Forquorum-likesystems,severalworks[17,18]boundreadstalenessfromtheaspectsofdataversions,staletimeandstalenessprobabilitytodescribetheinconsistency,andthendiscussthetradeoffbetweenconsistencyandlatency.However,littlehasbeendoneonquantifyinginconsistencyandlatencyforRSMs,whiletheyaredesignedtoproviderelativelystrongerconsistencythanquorumsystems.ForreadoperationsofRSMs,wecanborrowanalysistechniquesfromthequantitativereadstalenessinquorumsystems.ButforwriteoperationsofRSMs,weneedanewanalyticalandquantitativemethodtomeasurethedegreeofwriteinconsistency.ThenwecanrationallymakeatradeoffbetweenconsistencyandlatencyforRSMsbasedonthequantitativeresultofwriteAlthoughtherearesomeexistingalternativemodelstodescribewriteinconsistencyintheabsenceofcrashfailures,e.g.,writeconsistencyconstraint[19,20]andlimitedwritedivergencyofreplicas[21,22],wewanttoknowthedegreeofwriteinconsistencyforRSMswhenencountering978-1-4799-0181-4/13/$31.00?2013failures,ratherthanhowtosatisfyconstraintsorboundthemaximumdeviationofadataitem.ThereforewequantifywriteinconsistencyforRSMsbytheprobabilityofwriteconflictwhichrepresentstheprobabilityofawritecommittedinanabnormalorder.Inthispaper,wepresentaquantitativeprobabilisticanaly-sisapproachtomeasuringtheconsistencyandlatencyofRSM-s,andfurtherprovideasolutionformakingtradeoffbetweenconsistencyandlatencytoachievethemaximalsystembenefit.First,wehavesurveyedandanalyzedthreerepresentativeRSMimplementationsincludingnon-uniformtotalorderbroadcast[23,24],distributedconsensus(Paxos)[25,26]anduniformtotalorderbroadcast[23,24],aswellastheirvariants[27-29].ThenweproposeagenericsystemmodelRSM-d,whered∈[1,(nisthenumberofreplicas)representsthenumberofthatmustbereceivedbeforecommittingawrite.RSM-dcandescribethethreerepresentativeoneswhered=1,[n/2]+1,nrespectively.Second,wemeasurethewriteinconsistencyofRSM-dbasedonaprobabilisticmodel.Thewriteincon-sistencyisquantifiedbytheprobabilityofwriteconflictwhichrepresentstheprobabilityofanywritecommittedinanabnormalorder.Ourprobabilisticmodelshowsthat(1)consistencyincreaseswhendrisesifd∈[1,[n/2]+1]andstrongconsistencyisalwaysguaranteedifd∈[[n/2]+1,n].Third,weevaluatethelatencyofRSM-dthroughitsexpectationwhichshowsthatthelatencystrictlymono-tonicallyincreaseswithrespecttod.Fourth,combiningthequantitativeresultsofwriteinconsistencyandlatencyofRSM-d,weprovideasolutionfortradeoffbetweenconsistencyandlatencytoachievethemaximalsystembenefit.Finally,throughMonteCarlobasedeventdrivensimulations,wevalidateoursolutionfortradeoffbetweenconsistencyandlatencyfromtheaspectofoverallsystembenefit.WemakethecontributionsasWepresentagenericsystemmodelRSM-dforRSMstogiveanunifieddescriptionofmajorreplicationprotocolsusingRSMs;OnthebasisofRSM-d,webuildaprobabilisticmodelforwriteconflictthatisoneofthekeyfactorstocauseinconsistency,andaprobabilisticmodelforcharacterizinglatencyaswell.Wefindoutthatbothconsistencyandlatencyareaffectedbythecommonfactord.Wefurtherdefineapayoffmodeltomaketradeoffbetweenconsistencyandlatencyforachievingthemaximumsystembenefitthroughcombiningthewriteconflictmodelandlatencymodelfromtheperspectiveofaserviceprovider;WithMonteCarlobasedeventdrivensimulations,wevalidateourquantitativeresultsandshowtheeffectivenessofourpresentedsolutionsintermsofhowtoobtainanoptimaltradeoffbetweenconsistencyandlatency.Theremainderofthispaperisstructuredasfollows.SectionIIintroducesthebackground.SectionIIIpresentsoursystemmodel.SectionIVdescribeshowtoquantifytheprobabilityofwriteconflict.Thecalculationoflatency

presentedinSectionV.SectionVIshowsthetradeoffofconsistencyandlatency.SectionVIIprovidesourexperimentalresults.RelatedworkanddiscussionarepresentedinSectionInthissection,wepresentthebackgroundofRSMsarewidelyusedindistributedsystemswiththetargettotoleratefailures.Failuremodesfallintonon-Byzantine(fail-stop,crashandmessageloss)andByzantine(signedmessagesornot).Withthecomplexitythatatleast3f+1nodesfortoleratingfByzantinefailuresandthelowoccurrenceprobabilityofByzantinefailures,commonRSMsonlyconsidernon-Byzantineones.Andmessagelossfailurescanbeeasilyeliminatedbynetworkprotocols,sopractitionersusuallysaythatRSMscantoleratecrashfailures.ChandraandToueg[30]haveprovedthatthetotalorderbroadcastandthedistributedconsensusproblemsaresolvableandequivalenttoeachotherunderWfailuredetectorsandatmost[n/2]simultaneouscrashfailures.Inthiswork,wealsofollowtheseassumptions.ItmeansthatwediscussRSMswhichtolerateuptofcrashfailureswith2f+1replicasandcaneventuallydetectthecrashofnodes.However,practicalfailuredetectorsmustoutputaresult(crashornot)inaboundedtimeotherthanalong-waitingforeventualconclusions.[31]classifiesthequalityofserviceofpracticalfailuredetectorsintotwotypes:speed,i.e.,howfastafailuredetectordetectsacrash;andaccuracy,i.e.,howwellitavoidsfalsesuspicion.ThereforepracticalfailuredetectorsofRSMshaveaprobabilitytofalselysuspectonanormalAssurveyedin[23-29],RSMsaretypicallyimplementedbynon-uniformtotalorderbroadcast,distributedconsensusanduniformtotalorderbroadcast.AsshowninFigure1,thethreerepresentativeRSMimplementationsworkasfollows:Innon-uniformtotalorderbroadcast,onceacoordinatorreceivesawrite,itassignsthewriteauniquesequencenumberandthendirectlycommitsittoallreplicas.Whenthewritearrivesatareplica,thereplicawillrecordthewrite,andcommititaccordingtoitssequencenumberandthenrespondtothecoordinator.Oncethecoordinatorhasreceivedthefirstresponse,itwillreplytotheclient.Notethatanytimeacoordinatorissuspectedascrashedbyfailuredetectors,thenewcoordinatorwillbeelected[32]andrecoversfailuresbasedonthehistoricalcommitrecords.FordistributedconsensusbasedRSMs,onceacoordi-natorreceivesawrite,atfirstitpropagatesthewritewiththegivensequencenumbertoallreplicas.Afterloggingthewrite,everyreplicawillsendanacknowledgement(ACK)messagetothecoordinator.Untilatleast[n/2]+1ACKs(includingthecoordinatoritself)arereceivedbythecoordinator,itwillcommitthewrite.Whenthecoordinatorissuspectedascrashedbyfailuredetectors,thenewcoordinatorwillbeelectedandrecoverfailuresbasedonthehistoricallogrecords(notthehistoricalcommitrecords).Uniformtotalorderbroadcastisverysimilartodistribut-edconsensus,butthecoordinatormustwaitforallnACKstobereturned.Fig.1.ThreeRepresentativeRSMAssumingwithoutlossofgeneralitythecoordinatorsendsitselfanACKmessage,wecandiscoverthatanobviousdif-ferenceamongthethreerepresentativeRSMimplementationsisthenumberofACKs(denotedbyd)thatthecoordinatorwaitsforbeforecommittingawrite.Andthevaluesofdfornon-uniformtotalorderbroadcast,distributedconsensusanduniformtotalorderbroadcastare1,[n/2]+1andn,respectively.Thuswecaneasilyconcludethatthelatenciesofthemfromlowtohigharenon-uniformtotalorderbroadcast,distributedconsensusanduniformtotalorderbroadcast.Intermsofconsistency,distributedconsensus(Paxos)hasshoweditssafety(strongconsistency)in[33].Thatis,RSMsarealwaysconsistentif[n/2]+1≤d≤n.However,fornon-uniformtotalorderbroadcast,ifacommittingwriteislostinhistoricalcommitrecordsduetocrashfailures(i.e.,allcommittednodesincludingthecoordinatorcrash)orfailuredetectiononthecoordinatorhappens,thesequencenumberheldbythewriteorusedbythefalselysuspectedcoordinatorwillbereusedtokeepsystemliveness.Thesewillleadtowriteconflictandthenresultinwriteinconsistency.RSM-d:AGENERICRSMInthissection,wewillpresentoursystemmodelRSM-d,whichisthe ionofRSMimplementations.SinceourdesiredsystemmodelshouldcoverthethreerepresentativeRSMs,wehavesurveyedthemandtrytoderivethesystemAlthoughwepresent(non-uniformanduniform)totalorderbroadcastalgorithmsinsectionII,theyareonlyoneoffiveclasses.[23]providesacomprehensivesurveyfortotalorderbroadcast,consideringallfiveclassesoforderingmechanismsandbothnon-uniformanduniformalgorithms.Thefirstthreeorderingmechanisms:fixedsequencer(showninsectionII),movingsequencerandprivilege-basedarebuiltbasedonafixedsequenceroramovingtoken;Theothertwo:communicationhistoryanddestinationagreementareimplementedbasedonatotalorderlogicalclock[16].AndthesolutionofdistributedconsensusforRSMisthesamewiththedestinationagreement,wherePaxosisusedtomakeagreementontheorderforawrite,andmulti-paxos[25]forallwrites.Essentiallyweadda’sequencer’whichcancomparetheordersofanytwowritestocausalorder,andthengaintotalorder.Basedontheanalysisabove,weareconcernedwith

Fig.2.RSM-dtotalorderbroadcastwiththefixedsequencerandthePaxosalgorithm.Thereasonsare:(1)thefixedsequenceralgorithmhasthelowestlatencyamongalltotalorderbroadcastones(onpoint-to-pointnetworks)[24];(2)Paxosisprovedtobeoptimalingeneralconsensusalgorithms[34];(3)boththetokeninthemovingsequencerandprivilege-basedalgorithms,andthelogicalclockinthecommunicationhistoryanddesti-nationagreementalgorithms,canbeseenas”anotherfixedThenwecombinetotalorderbroadcastwiththefixedsequencerandthePaxosalgorithmtoanunifiedsystemmodelRSM-d,whichisfeasiblesincetotalorderbroadcastanddistributedconsensusareessentiallyequivalenttoeachother[30].AsdepictedinFigure2,RSM-dincludestheagreementphaseandthecommitphase.Intheagreementphase,anywritemustbeb

人人文庫(kù)> 全部分類(lèi)> 教育資料 > 課件下載

溫馨提示

1. 本站所有資源如無(wú)特殊說(shuō)明，都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
2. 本站的文檔不包含任何第三方提供的附件圖紙等，如果需要附件，請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶(hù)所有。
3. 本站RAR壓縮包中若帶圖紙，網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽，若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間，僅對(duì)用戶(hù)上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理，對(duì)用戶(hù)上傳分享的文檔內(nèi)容本身不做任何修改或編輯，并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容，請(qǐng)與我們聯(lián)系，我們立即糾正。
7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶(hù)因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

翻譯以.原文和在同一文件中前一致性或延遲性一種基于狀態(tài)機(jī)

文檔簡(jiǎn)介

溫馨提示

最新文檔

評(píng)論

翻譯以.原文和在同一文件中前一致性或延遲性一種基于狀態(tài)機(jī)

文檔簡(jiǎn)介

溫馨提示

最新文檔

評(píng)論

相關(guān)文檔