




Pandas Official Tutorials Handbook

Contents

    Pandas Official Tutorials
    10 Minutes to pandas
    Pandas Cookbook (Chapters 1 to 9)
    Learn Pandas (Lessons 01 to 11)

Pandas Official Tutorials

These are translations of the tutorials listed on the Tutorials page of the official pandas documentation.

    Name                    Original                Translator
    10 Minutes to pandas    10 Minutes to pandas    ChaoSimple
    Pandas Cookbook         Pandas cookbook         飞龙 (Feilong)
    Learn Pandas            Learn Pandas            派兰数据 (Pailan Data)


10 Minutes to pandas

Original: 10 Minutes to pandas
Translator: ChaoSimple
Source: "十分钟搞定pandas" (original post by the translator)

This is a simple translation of "10 Minutes to pandas" from the official website. It is a brief introduction to pandas; the original links to a more detailed treatment. The required packages are imported in the following form:

    In [1]: import pandas as pd
    In [2]: import numpy as np
    In [3]: import matplotlib.pyplot as plt

I. Object Creation

See the Intro to Data Structures section for details on this topic.

1. Create a Series by passing a list; pandas creates a default integer index:

    In [4]: s = pd.Series([1, 3, 5, np.nan, 6, 8])
    In [5]: s
    Out[5]:
    0    1.0
    1    3.0
    2    5.0
    3    NaN
    4    6.0
    5    8.0
    dtype: float64

2. Create a DataFrame by passing a numpy array, with a datetime index and column labels:

    In [6]: dates = pd.date_range('20130101', periods=6)
    In [7]: dates
    Out[7]:
    DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
                   '2013-01-05', '2013-01-06'],
                  dtype='datetime64[ns]', freq='D')
    In [8]: df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
    In [9]: df
    Out[9]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988

3. Create a DataFrame by passing a dict of objects that can each be converted to a series-like structure:

    In [10]: df2 = pd.DataFrame({'A': 1.,
       ....:                     'B': pd.Timestamp('20130102'),
       ....:                     'C': pd.Series(1, index=list(range(4)), dtype='float32'),
       ....:                     'D': np.array([3] * 4, dtype='int32'),
       ....:                     'E': pd.Categorical(["test", "train", "test", "train"]),
       ....:                     'F': 'foo'})
    In [11]: df2
    Out[11]:
         A          B    C  D      E    F
    0  1.0 2013-01-02  1.0  3   test  foo
    1  1.0 2013-01-02  1.0  3  train  foo
    2  1.0 2013-01-02  1.0  3   test  foo
    3  1.0 2013-01-02  1.0  3  train  foo

4. Inspect the dtype of each column:

    In [12]: df2.dtypes
    Out[12]:
    A           float64
    B    datetime64[ns]
    C           float32
    D             int32
    E          category
    F            object
    dtype: object

5. If you are using IPython, tab completion recognizes all attributes as well as the column names. The following is a subset of the attributes that are completed:

    In [13]: df2.<TAB>
    df2.A                  df2.boxplot
    df2.abs                df2.C
    df2.add                df2.clip
    df2.align              df2.columns
    df2.all                df2.combine
    df2.any                df2.combine_first
    df2.append             df2.combineMult
    df2.apply              df2.compound
    df2.applymap           df2.consolidate
    df2.as_blocks          df2.convert_objects
    df2.asfreq             df2.copy
    df2.as_matrix          df2.corr
    df2.astype             df2.corrwith
    df2.at                 df2.count
    df2.at_time            df2.cov
    df2.axes               df2.cummax
    df2.B                  df2.cummin
    df2.between_time       df2.cumprod
    df2.bfill              df2.cumsum
    df2.blocks             df2.D
    df2.bool

II. Viewing Data

See the Basics section for details.

1. View the top and bottom rows of a DataFrame:

    In [14]: df.head()
    Out[14]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401
    In [15]: df.tail(3)
    Out[15]:
                       A         B         C         D
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988

2. Display the index, the columns, and the underlying numpy data:

    In [16]: df.index
    Out[16]:
    DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
                   '2013-01-05', '2013-01-06'],
                  dtype='datetime64[ns]', freq='D')
    In [17]: df.columns
    Out[17]: Index([u'A', u'B', u'C', u'D'], dtype='object')
    In [18]: df.values
    Out[18]:
    array([[ 0.4691, -0.2829, -1.5091, -1.1356],
           [ 1.2121, -0.1732,  0.1192, -1.0442],
           [-0.8618, -2.1046, -0.4949,  1.0718],
           [ 0.7216, -0.7068, -1.0396,  0.2719],
           [-0.425 ,  0.567 ,  0.2762, -1.0874],
           [-0.6737,  0.1136, -1.4784,  0.525 ]])

3. describe() gives a quick statistical summary of the data:

    In [19]: df.describe()
    Out[19]:
                  A         B         C         D
    count  6.000000  6.000000  6.000000  6.000000
    mean   0.073711 -0.431125 -0.687758 -0.233103
    std    0.843157  0.922818  0.779887  0.973118
    min   -0.861849 -2.104569 -1.509059 -1.135632
    25%   -0.611510 -0.600794 -1.368714 -1.076610
    50%    0.022070 -0.228039 -0.767252 -0.386188
    75%    0.658444  0.041933 -0.034326  0.461706
    max    1.212112  0.567020  0.276232  1.071804

4. Transpose the data:

    In [20]: df.T
    Out[20]:
       2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
    A    0.469112    1.212112   -0.861849    0.721555   -0.424972   -0.673690
    B   -0.282863   -0.173215   -2.104569   -0.706771    0.567020    0.113648
    C   -1.509059    0.119209   -0.494929   -1.039575    0.276232   -1.478427
    D   -1.135632   -1.044236    1.071804    0.271860   -1.087401    0.524988

5. Sort by an axis:

    In [21]: df.sort_index(axis=1, ascending=False)
    Out[21]:
                       D         C         B         A
    2013-01-01 -1.135632 -1.509059 -0.282863  0.469112
    2013-01-02 -1.044236  0.119209 -0.173215  1.212112
    2013-01-03  1.071804 -0.494929 -2.104569 -0.861849
    2013-01-04  0.271860 -1.039575 -0.706771  0.721555
    2013-01-05 -1.087401  0.276232  0.567020 -0.424972
    2013-01-06  0.524988 -1.478427  0.113648 -0.673690

6. Sort by values:

    In [22]: df.sort_values(by='B')
    Out[22]:
                       A         B         C         D
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401

III. Selection

Standard Python/Numpy expressions for selecting and setting work directly, but for production code the optimized pandas data access methods are recommended: .at, .iat, .loc, .iloc and .ix (.ix was removed in pandas 1.0; use .loc or .iloc in current versions). See the sections on indexing and on MultiIndex / advanced indexing for details.

Getting

1. Select a single column, which returns a Series; this is equivalent to df.A:

    In [23]: df['A']
    Out[23]:
    2013-01-01    0.469112
    2013-01-02    1.212112
    2013-01-03   -0.861849
    2013-01-04    0.721555
    2013-01-05   -0.424972
    2013-01-06   -0.673690
    Freq: D, Name: A, dtype: float64

2. Select with [], which slices the rows:

    In [24]: df[0:3]
    Out[24]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    In [25]: df['20130102':'20130104']
    Out[25]:
                       A         B         C         D
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860

Selection by label

1. Get a cross section using a label:

    In [26]: df.loc[dates[0]]
    Out[26]:
    A    0.469112
    B   -0.282863
    C   -1.509059
    D   -1.135632
    Name: 2013-01-01 00:00:00, dtype: float64

2. Select on multiple axes by label:

    In [27]: df.loc[:, ['A', 'B']]
    Out[27]:
                       A         B
    2013-01-01  0.469112 -0.282863
    2013-01-02  1.212112 -0.173215
    2013-01-03 -0.861849 -2.104569
    2013-01-04  0.721555 -0.706771
    2013-01-05 -0.424972  0.567020
    2013-01-06 -0.673690  0.113648

3. Slice by label (both endpoints are included):

    In [28]: df.loc['20130102':'20130104', ['A', 'B']]
    Out[28]:
                       A         B
    2013-01-02  1.212112 -0.173215
    2013-01-03 -0.861849 -2.104569
    2013-01-04  0.721555 -0.706771

4. Reduce the dimensions of the returned object:

    In [29]: df.loc['20130102', ['A', 'B']]
    Out[29]:
    A    1.212112
    B   -0.173215
    Name: 2013-01-02 00:00:00, dtype: float64

5. Get a scalar value:

    In [30]: df.loc[dates[0], 'A']
    Out[30]: 0.46911229990718628

6. Fast access to a scalar (equivalent to the previous method):

    In [31]: df.at[dates[0], 'A']
    Out[31]: 0.46911229990718628

Selection by position

1. Select by passing an integer position (this selects a row):

    In [32]: df.iloc[3]
    Out[32]:
    A    0.721555
    B   -0.706771
    C   -1.039575
    D    0.271860
    Name: 2013-01-04 00:00:00, dtype: float64

2. Slice by integer positions, as in numpy/python:

    In [33]: df.iloc[3:5, 0:2]
    Out[33]:
                       A         B
    2013-01-04  0.721555 -0.706771
    2013-01-05 -0.424972  0.567020

3. Select with lists of integer positions, as in numpy/python:

    In [34]: df.iloc[[1, 2, 4], [0, 2]]
    Out[34]:
                       A         C
    2013-01-02  1.212112  0.119209
    2013-01-03 -0.861849 -0.494929
    2013-01-05 -0.424972  0.276232

4. Slice rows:

    In [35]: df.iloc[1:3, :]
    Out[35]:
                       A         B         C         D
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804

5. Slice columns:

    In [36]: df.iloc[:, 1:3]
    Out[36]:
                       B         C
    2013-01-01 -0.282863 -1.509059
    2013-01-02 -0.173215  0.119209
    2013-01-03 -2.104569 -0.494929
    2013-01-04 -0.706771 -1.039575
    2013-01-05  0.567020  0.276232
    2013-01-06  0.113648 -1.478427

6. Get a specific value:

    In [37]: df.iloc[1, 1]
    Out[37]: -0.17321464905330858

Fast access to a scalar (equivalent to the previous method):

    In [38]: df.iat[1, 1]
    Out[38]: -0.17321464905330858

Boolean indexing

1. Use the values of a single column to select data:

    In [39]: df[df.A > 0]
    Out[39]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860

2. Select with a where operation:

    In [40]: df[df > 0]
    Out[40]:
                       A         B         C         D
    2013-01-01  0.469112       NaN       NaN       NaN
    2013-01-02  1.212112       NaN  0.119209       NaN
    2013-01-03       NaN       NaN       NaN  1.071804
    2013-01-04  0.721555       NaN       NaN  0.271860
    2013-01-05       NaN  0.567020  0.276232       NaN
    2013-01-06       NaN  0.113648       NaN  0.524988

3. Filter with the isin() method:

    In [41]: df2 = df.copy()
    In [42]: df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
    In [43]: df2
    Out[43]:
                       A         B         C         D      E
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632    one
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236    one
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804    two
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860  three
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401   four
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988  three
    In [44]: df2[df2['E'].isin(['two', 'four'])]
    Out[44]:
                       A         B         C         D     E
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804   two
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401  four

Setting

1. Set a new column; the data is aligned on the index automatically:

    In [45]: s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
    In [46]: s1
    Out[46]:
    2013-01-02    1
    2013-01-03    2
    2013-01-04    3
    2013-01-05    4
    2013-01-06    5
    2013-01-07    6
    Freq: D, dtype: int64
    In [47]: df['F'] = s1

2. Set a value by label:

    In [48]: df.at[dates[0], 'A'] = 0

3. Set a value by position:

    In [49]: df.iat[0, 1] = 0

4. Set a group of values with a numpy array:

    In [50]: df.loc[:, 'D'] = np.array([5] * len(df))

The result of the operations above:

    In [51]: df
    Out[51]:
                       A         B         C  D    F
    2013-01-01  0.000000  0.000000 -1.509059  5  NaN
    2013-01-02  1.212112 -0.173215  0.119209  5  1.0
    2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0
    2013-01-04  0.721555 -0.706771 -1.039575  5  3.0
    2013-01-05 -0.424972  0.567020  0.276232  5  4.0
    2013-01-06 -0.673690  0.113648 -1.478427  5  5.0

5. Set values with a where operation:

    In [52]: df2 = df.copy()
    In [53]: df2[df2 > 0] = -df2
    In [54]: df2
    Out[54]:
                       A         B         C  D    F
    2013-01-01  0.000000  0.000000 -1.509059 -5  NaN
    2013-01-02 -1.212112 -0.173215 -0.119209 -5 -1.0
    2013-01-03 -0.861849 -2.104569 -0.494929 -5 -2.0
    2013-01-04 -0.721555 -0.706771 -1.039575 -5 -3.0
    2013-01-05 -0.424972 -0.567020 -0.276232 -5 -4.0
    2013-01-06 -0.673690 -0.113648 -1.478427 -5 -5.0

IV. Missing Data

pandas uses np.nan to represent missing values. By default these values are excluded from computations. See the Missing Data section for details.

1. reindex() changes, adds, or deletes the index on a specified axis and returns a copy of the original data:

    In [55]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
    In [56]: df1.loc[dates[0]:dates[1], 'E'] = 1
    In [57]: df1
    Out[57]:
                       A         B         C  D    F    E
    2013-01-01  0.000000  0.000000 -1.509059  5  NaN  1.0
    2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
    2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  NaN
    2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  NaN

2. Drop every row that contains a missing value:

    In [58]: df1.dropna(how='any')
    Out[58]:
                       A         B         C  D    F    E
    2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0

3. Fill missing values:

    In [59]: df1.fillna(value=5)
    Out[59]:
                       A         B         C  D    F    E
    2013-01-01  0.000000  0.000000 -1.509059  5  5.0  1.0
    2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
    2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  5.0
    2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  5.0

4. Get a boolean mask that marks where values are missing:

    In [60]: pd.isnull(df1)
    Out[60]:
                    A      B      C      D      F      E
    2013-01-01  False  False  False  False   True  False
    2013-01-02  False  False  False  False  False  False
    2013-01-03  False  False  False  False  False   True
    2013-01-04  False  False  False  False  False   True

V. Operations

See the Basics section on binary operations for details.

Stats (these operations generally exclude missing values)

1. Compute a descriptive statistic:

    In [61]: df.mean()
    Out[61]:
    A   -0.004474
    B   -0.383981
    C   -0.687758
    D    5.000000
    F    3.000000
    dtype: float64

2. The same operation on the other axis:

    In [62]: df.mean(1)
    Out[62]:
    2013-01-01    0.872735
    2013-01-02    1.431621
    2013-01-03    0.707731
    2013-01-04    1.395042
    2013-01-05    1.883656
    2013-01-06    1.592306
    Freq: D, dtype: float64

3. Operate on objects that have different dimensions and need alignment; pandas broadcasts automatically along the specified dimension:

    In [63]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
    In [64]: s
    Out[64]:
    2013-01-01    NaN
    2013-01-02    NaN
    2013-01-03    1.0
    2013-01-04    3.0
    2013-01-05    5.0
    2013-01-06    NaN
    Freq: D, dtype: float64
    In [65]: df.sub(s, axis='index')
    Out[65]:
                       A         B         C    D    F
    2013-01-01       NaN       NaN       NaN  NaN  NaN
    2013-01-02       NaN       NaN       NaN  NaN  NaN
    2013-01-03 -1.861849 -3.104569 -1.494929  4.0  1.0
    2013-01-04 -2.278445 -3.706771 -4.039575  2.0  0.0
    2013-01-05 -5.424972 -4.432980 -4.723768  0.0 -1.0
    2013-01-06       NaN       NaN       NaN  NaN  NaN

Apply

Apply functions to the data:

    In [66]: df.apply(np.cumsum)
    Out[66]:
                       A         B         C   D     F
    2013-01-01  0.000000  0.000000 -1.509059   5   NaN
    2013-01-02  1.212112 -0.173215 -1.389850  10   1.0
    2013-01-03  0.350263 -2.277784 -1.884779  15   3.0
    2013-01-04  1.071818 -2.984555 -2.924354  20   6.0
    2013-01-05  0.646846 -2.417535 -2.648122  25  10.0
    2013-01-06 -0.026844 -2.303886 -4.126549  30  15.0
    In [67]: df.apply(lambda x: x.max() - x.min())
    Out[67]:
    A    2.073961
    B    2.671590
    C    1.785291
    D    0.000000
    F    4.000000
    dtype: float64

Histogramming

See Histogramming and Discretization for details.

    In [68]: s = pd.Series(np.random.randint(0, 7, size=10))
    In [69]: s
    Out[69]:
    0    4
    1    2
    2    1
    3    2
    4    6
    5    4
    6    4
    7    6
    8    4
    9    4
    dtype: int64
    In [70]: s.value_counts()
    Out[70]:
    4    5
    6    2
    2    2
    1    1
    dtype: int64

String methods

A Series carries a set of string processing methods in its str attribute. They apply to each element of the array, as in the snippet below. See Vectorized String Methods for more.

    In [71]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
    In [72]: s.str.lower()
    Out[72]:
    0       a
    1       b
    2       c
    3    aaba
    4    baca
    5     NaN
    6    caba
    7     dog
    8     cat
    dtype: object

VI. Merge

pandas provides many ways to combine Series, DataFrame, and Panel objects under various kinds of set logic. See the Merging section for details.

Concat

    In [73]: df = pd.DataFrame(np.random.randn(10, 4))
    In [74]: df
    Out[74]:
              0         1         2         3
    0 -0.548702  1.467327 -1.015962 -0.483075
    1  1.637550 -1.217659 -0.291519 -1.745505
    2 -0.263952  0.991460 -0.919069  0.266046
    3 -0.709661  1.669052  1.037882 -1.705775
    4 -0.919854 -0.042379  1.247642 -0.009920
    5  0.290213  0.495767  0.362949  1.548106
    6 -1.131345 -0.089329  0.337863 -0.945867
    7 -0.932132  1.956030  0.017587 -0.016692
    8 -0.575247  0.254161 -1.143704  0.215897
    9  1.193555 -0.077118 -0.408530 -0.862495
    # break it into pieces
    In [75]: pieces = [df[:3], df[3:7], df[7:]]
    In [76]: pd.concat(pieces)
    Out[76]:
              0         1         2         3
    0 -0.548702  1.467327 -1.015962 -0.483075
    1  1.637550 -1.217659 -0.291519 -1.745505
    2 -0.263952  0.991460 -0.919069  0.266046
    3 -0.709661  1.669052  1.037882 -1.705775
    4 -0.919854 -0.042379  1.247642 -0.009920
    5  0.290213  0.495767  0.362949  1.548106
    6 -1.131345 -0.089329  0.337863 -0.945867
    7 -0.932132  1.956030  0.017587 -0.016692
    8 -0.575247  0.254161 -1.143704  0.215897
    9  1.193555 -0.077118 -0.408530 -0.862495

Join

SQL-style merges. See Database-style joining for details.

    In [77]: left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
    In [78]: right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
    In [79]: left
    Out[79]:
       key  lval
    0  foo     1
    1  foo     2
    In [80]: right
    Out[80]:
       key  rval
    0  foo     4
    1  foo     5
    In [81]: pd.merge(left, right, on='key')
    Out[81]:
       key  lval  rval
    0  foo     1     4
    1  foo     1     5
    2  foo     2     4
    3  foo     2     5

Another example:

    In [82]: left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
    In [83]: right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
    In [84]: left
    Out[84]:
       key  lval
    0  foo     1
    1  bar     2
    In [85]: right
    Out[85]:
       key  rval
    0  foo     4
    1  bar     5
    In [86]: pd.merge(left, right, on='key')
    Out[86]:
       key  lval  rval
    0  foo     1     4
    1  bar     2     5

Append

Append a row to a DataFrame. See the Appending section for details. (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement in current versions.)

    In [87]: df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
    In [88]: df
    Out[88]:
              A         B         C         D
    0  1.346061  1.511763  1.627081 -0.990582
    1 -0.441652  1.211526  0.268520  0.024580
    2 -1.577585  0.396823 -0.105381 -0.532532
    3  1.453749  1.208843 -0.080952 -0.264610
    4 -0.727965 -0.589346  0.339969 -0.693205
    5 -0.339355  0.593616  0.884345  1.591431
    6  0.141809  0.220390  0.435589  0.192451
    7 -0.096701  0.803351  1.715071 -0.708758
    In [89]: s = df.iloc[3]
    In [90]: df.append(s, ignore_index=True)
    Out[90]:
              A         B         C         D
    0  1.346061  1.511763  1.627081 -0.990582
    1 -0.441652  1.211526  0.268520  0.024580
    2 -1.577585  0.396823 -0.105381 -0.532532
    3  1.453749  1.208843 -0.080952 -0.264610
    4 -0.727965 -0.589346  0.339969 -0.693205
    5 -0.339355  0.593616  0.884345  1.591431
    6  0.141809  0.220390  0.435589  0.192451
    7 -0.096701  0.803351  1.715071 -0.708758
    8  1.453749  1.208843 -0.080952 -0.264610

VII. Grouping

By "group by" we mean a process involving one or more of the following steps:

    Splitting the data into groups based on some criteria;
    Applying a function to each group independently;
    Combining the results into a data structure.

See the Grouping section for details.

    In [91]: df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
       ....:                          'foo', 'bar', 'foo', 'foo'],
       ....:                    'B': ['one', 'one', 'two', 'three',
       ....:                          'two', 'two', 'one', 'three'],
       ....:                    'C': np.random.randn(8),
       ....:                    'D': np.random.randn(8)})
    In [92]: df
    Out[92]:
         A      B         C         D
    0  foo    one -1.202872 -0.055224
    1  bar    one -1.814470  2.395985
    2  foo    two  1.018601  1.552825
    3  bar  three -0.595447  0.166599
    4  foo    two  1.395433  0.047609
    5  bar    two -0.392670 -0.136473
    6  foo    one  0.007207 -0.561757
    7  foo  three  1.928123 -1.623033

1. Group, then apply sum to each group:

    In [93]: df.groupby('A').sum()
    Out[93]:
                C        D
    A
    bar -2.802588  2.42611
    foo  3.146492 -0.63958

2. Group by multiple columns, which forms a hierarchical index, then apply the function:

    In [94]: df.groupby(['A', 'B']).sum()
    Out[94]:
                      C         D
    A   B
    bar one   -1.814470  2.395985
        three -0.595447  0.166599
        two   -0.392670 -0.136473
    foo one   -1.195665 -0.616981
        three  1.928123 -1.623033
        two    2.414034  1.600434

VIII. Reshaping

See the sections on Hierarchical Indexing and Reshaping for details.

Stack

    In [95]: tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
       ....:                      'foo', 'foo', 'qux', 'qux'],
       ....:                     ['one', 'two', 'one', 'two',
       ....:                      'one', 'two', 'one', 'two']]))
    In [96]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    In [97]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
    In [98]: df2 = df[:4]
    In [99]: df2
    Out[99]:
                         A         B
    first second
    bar   one     0.029399 -0.542108
          two     0.282696 -0.087302
    baz   one    -1.575170  1.771208
          two     0.816482  1.100230

stack() moves the column labels into the innermost level of the row index:

    In [100]: stacked = df2.stack()
    In [101]: stacked
    Out[101]:
    first  second
    bar    one     A    0.029399
                   B   -0.542108
           two     A    0.282696
                   B   -0.087302
    baz    one     A   -1.575170
                   B    1.771208
           two     A    0.816482
                   B    1.100230
    dtype: float64

For a stacked DataFrame or Series with a hierarchical index, unstack() is the inverse operation; by default it unstacks the last level:

    In [102]: stacked.unstack()
    Out[102]:
                         A         B
    first second
    bar   one     0.029399 -0.542108
          two     0.282696 -0.087302
    baz   one    -1.575170  1.771208
          two     0.816482  1.100230
    In [103]: stacked.unstack(1)
    Out[103]:
    second        one       two
    first
    bar   A  0.029399  0.282696
          B -0.542108 -0.087302
    baz   A -1.575170  0.816482
          B  1.771208  1.100230
    In [104]: stacked.unstack(0)
    Out[104]:
    first          bar       baz
    second
    one    A  0.029399 -1.575170
           B -0.542108  1.771208
    two    A  0.282696  0.816482
           B -0.087302  1.100230

Pivot tables

See the Pivot Tables section for details.

    In [105]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
       .....:                    'B': ['A', 'B', 'C'] * 4,
       .....:                    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
       .....:                    'D': np.random.randn(12),
       .....:                    'E': np.random.randn(12)})
    In [106]: df
    Out[106]:
            A  B    C         D         E
    0     one  A  foo  1.418757 -0.179666
    1     one  B  foo -1.879024  1.291836
    2     two  C  foo  0.536826 -0.009614
    3   three  A  bar  1.006160  0.392149
    4     one  B  bar -0.029716  0.264599
    5     one  C  bar -1.146178 -0.057409
    6     two  A  foo  0.100900 -1.425638
    7   three  B  foo -1.035018  1.024098
    8     one  C  foo  0.314665 -0.106062
    9     two  A  bar -0.773723  1.824375
    10  three  B  bar -1.170653  0.595974
    11    one  C  bar  0.648740  1.167115

A pivot table can be produced from this data very easily:

    In [107]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
    Out[107]:
    C             bar       foo
    A     B
    one   A -0.773723  1.418757
          B -0.029716 -1.879024
          C -1.146178  0.314665
    three A  1.006160       NaN
          B       NaN -1.035018
          C  0.648740       NaN
    two   A       NaN  0.100900
          B -1.170653       NaN
          C       NaN  0.536826

IX. Time Series

pandas has simple, powerful, and efficient functionality for resampling during frequency conversion (for example, converting data sampled every second into data sampled every 5 minutes). This is very common in financial applications, among others. See the Time Series section for details.

    In [108]: rng = pd.date_range('1/1/2012', periods=100, freq='S')
    In [109]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
    In [110]: ts.resample('5Min').sum()
    Out[110]:
    2012-01-01    25083
    Freq: 5T, dtype: int64

1. Time zone representation:

    In [111]: rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
    In [112]: ts = pd.Series(np.random.randn(len(rng)), rng)
    In [113]: ts
    Out[113]:
    2012-03-06    0.464000
    2012-03-07    0.227371
    2012-03-08   -0.496922
    2012-03-09    0.306389
    2012-03-10   -2.290613
    Freq: D, dtype: float64
    In [114]: ts_utc = ts.tz_localize('UTC')
    In [115]: ts_utc
    Out[115]:
    2012-03-06 00:00:00+00:00    0.464000
    2012-03-07 00:00:00+00:00    0.227371
    2012-03-08 00:00:00+00:00   -0.496922
    2012-03-09 00:00:00+00:00    0.306389
    2012-03-10 00:00:00+00:00   -2.290613
    Freq: D, dtype: float64

2. Convert to another time zone:

    In [116]: ts_utc.tz_convert('US/Eastern')
    Out[116]:
    2012-03-05 19:00:00-05:00    0.464000
    2012-03-06 19:00:00-05:00    0.227371
    2012-03-07 19:00:00-05:00   -0.496922
    2012-03-08 19:00:00-05:00    0.306389
    2012-03-09 19:00:00-05:00   -2.290613
    Freq: D, dtype: float64

3. Convert between time span representations:

    In [117]: rng = pd.date_range('1/1/2012', periods=5, freq='M')
    In [118]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
    In [119]: ts
    Out[119]:
    2012-01-31   -1.134623
    2012-02-29   -1.561819
    2012-03-31   -0.260838
    2012-04-30    0.281957
    2012-05-31    1.523962
    Freq: M, dtype: float64
    In [120]: ps = ts.to_period()
    In [121]: ps
    Out[121]:
    2012-01   -1.134623
    2012-02   -1.561819
    2012-03   -0.260838
    2012-04    0.281957
    2012-05    1.523962
    Freq: M, dtype: float64
    In [122]: ps.to_timestamp()
    Out[122]:
    2012-01-01   -1.134623
    2012-02-01   -1.561819
    2012-03-01   -0.260838
    2012-04-01    0.281957
    2012-05-01    1.523962
    Freq: MS, dtype: float64

4. Converting between periods and timestamps makes some convenient arithmetic functions available. The example below takes a quarterly frequency whose year ends in November and converts it to 9:00 on the first day of the month that follows each quarter end:

    In [123]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
    In [124]: ts = pd.Series(np.random.randn(len(prng)), prng)
    In [125]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
    In [126]: ts.head()
    Out[126]:
    1990-03-01 09:00   -0.902937
    1990-06-01 09:00    0.068159
    1990-09-01 09:00   -0.057873
    1990-12-01 09:00   -0.368204
    1991-03-01 09:00   -1.144073
    Freq: H, dtype: float64

X. Categoricals

Since version 0.15, pandas supports Categorical data in a DataFrame. For a full introduction, see the Categorical introduction and the API documentation.

    In [127]: df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
       .....:                    "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})

1. Convert the raw grades to a Categorical data type:

    In [128]: df["grade"] = df["raw_grade"].astype("category")
    In [129]: df["grade"]
    Out[129]:
    0    a
    1    b
    2    b
    3    a
    4    a
    5    e
    Name: grade, dtype: category
    Categories (3, object): [a, b, e]

2. Rename the categories to more meaningful names:

    In [130]: df["grade"].cat.categories = ["very good", "good", "very bad"]

3. Reorder the categories and add the missing ones at the same time:

    In [131]: df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
    In [132]: df["grade"]
    Out[132]:
    0    very good
    1         good
    2         good
    3    very good
    4    very good
    5     very bad
    Name: grade, dtype: category
    Categories (5, object): [very bad, bad, medium, good, very good]

4. Sorting follows the order of the categories, not lexical order:

    In [133]: df.sort_values(by="grade")
    Out[133]:
       id raw_grade      grade
    5   6         e   very bad
    1   2         b       good
    2   3         b       good
    0   1         a  very good
    3   4         a  very good
    4   5         a  very good

5. Grouping by a Categorical column also shows the empty categories:

    In [134]: df.groupby("grade").size()
    Out[134]:
    grade
    very bad     1
    bad          0
    medium       0
    good         2
    very good    3
    dtype: int64

XI. Plotting

See the Plotting documentation for details.

    In [135]: ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
    In [136]: ts = ts.cumsum()
    In [137]: ts.plot()
    Out[137]: <matplotlib.axes._subplots.AxesSubplot at 0x7ff2ab2af550>

On a DataFrame, plot() is a convenient way to draw all of the columns with their labels:

    In [138]: df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
       .....:                   columns=['A', 'B', 'C', 'D'])
    In [139]: df = df.cumsum()
    In [140]: plt.figure(); df.plot(); plt.legend(loc='best')
    Out[140]: <matplotlib.legend.Legend at 0x7ff29c8163d0>

XII. Getting Data In and Out

CSV

See Writing to a CSV file.

1. Write to a csv file:

    In [141]: df.to_csv('foo.csv')

2. Read from a csv file:

    In [142]: pd.read_csv('foo.csv')
    Out[142]:
         Unnamed: 0          A          B         C          D
    0    2000-01-01   0.266457  -0.399641 -0.219582   1.186860
    1    2000-01-02  -1.170732  -0.345873  1.653061  -0.282953
    2    2000-01-03  -1.734933   0.530468  2.060811  -0.515536
    3    2000-01-04  -1.555121   1.452620  0.239859  -1.156896
    4    2000-01-05   0.578117   0.511371  0.103552  -2.428202
    5    2000-01-06   0.478344   0.449933 -0.741620  -1.962409
    6    2000-01-07   1.235339  -0.091757 -1.543861  -1.084753
    ..          ...        ...        ...       ...        ...
    993  2002-09-20 -10.628548  -9.153563 -7.883146  28.313940
    994  2002-09-21 -10.390377  -8.727491 -6.399645  30.914107
    995  2002-09-22  -8.985362  -8.485624 -4.669462  31.367740
    996  2002-09-23  -9.558560  -8.781216 -4.499815  30.518439
    997  2002-09-24  -9.902058  -9.340490 -4.386639  30.105593
    998  2002-09-25 -10.216020  -9.480682 -3.933802  29.758560
    999  2002-09-26 -11.856774 -10.671012 -3.216025  29.369368
    [1000 rows x 5 columns]

HDF5

See HDF5 Stores.

1. Write to an HDF5 store:

    In [143]: df.to_hdf('foo.h5', 'df')

2. Read from an HDF5 store:

    In [144]: pd.read_hdf('foo.h5', 'df')
    Out[144]:
                        A          B         C          D
    2000-01-01   0.266457  -0.399641 -0.219582   1.186860
    2000-01-02  -1.170732  -0.345873  1.653061  -0.282953
    2000-01-03  -1.734933   0.530468  2.060811  -0.515536
    2000-01-04  -1.555121   1.452620  0.239859  -1.156896
    2000-01-05   0.578117   0.511371  0.103552  -2.428202
    2000-01-06   0.478344   0.449933 -0.741620  -1.962409
    2000-01-07   1.235339  -0.091757 -1.543861  -1.084753
    ...               ...        ...       ...        ...
    2002-09-20 -10.628548  -9.153563 -7.883146  28.313940
    2002-09-21 -10.390377  -8.727491 -6.399645  30.914107
    2002-09-22  -8.985362  -8.485624 -4.669462  31.367740
    2002-09-23  -9.558560  -8.781216 -4.499815  30.518439
    2002-09-24  -9.902058  -9.340490 -4.386639  30.105593
    2002-09-25 -10.216020  -9.480682 -3.933802  29.758560
    2002-09-26 -11.856774 -10.671012 -3.216025  29.369368
    [1000 rows x 4 columns]

Excel

See MS Excel.

1. Write to an excel file:

    In [145]: df.to_excel('foo.xlsx', sheet_name='Sheet1')

2. Read from an excel file:

    In [146]: pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])
    Out[146]:
                        A          B         C          D
    2000-01-01   0.266457  -0.399641 -0.219582   1.186860
    2000-01-02  -1.170732  -0.345873  1.653061  -0.282953
    2000-01-03  -1.734933   0.530468  2.060811  -0.515536
    2000-01-04  -1.555121   1.452620  0.239859  -1.156896
    2000-01-05   0.578117   0.511371  0.103552  -2.428202
    2000-01-06   0.478344   0.449933 -0.741620  -1.962409
    2000-01-07   1.235339  -0.091757 -1.543861  -1.084753
    ...               ...        ...       ...        ...
    2002-09-20 -10.628548  -9.153563 -7.883146  28.313940
    2002-09-21 -10.390377  -8.727491 -6.399645  30.914107
    2002-09-22  -8.985362  -8.485624 -4.669462  31.367740
    2002-09-23  -9.558560  -8.781216 -4.499815  30.518439
    2002-09-24  -9.902058  -9.340490 -4.386639  30.105593
    2002-09-25 -10.216020  -9.480682 -3.933802  29.758560
    2002-09-26 -11.856774 -10.671012 -3.216025  29.369368
    [1000 rows x 4 columns]

Gotchas

If you attempt an operation and see an exception like this:

    >>> if pd.Series([False, True, False]):
    ...     print("I was true")
    Traceback
        ...
    ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

see the Comparisons section for an explanation and what to do about it. See also Gotchas.


Pandas Cookbook

Original: Pandas cookbook
Translator: 飞龙 (Feilong)

Chapter 1

Original: Chapter 1
Translator: 飞龙 (Feilong)
License: CC BY-NC-SA 4.0

    import pandas as pd
    pd.set_option('display.mpl_style', 'default')  # make the plots prettier
    figsize(15, 5)

The display.mpl_style option and the bare figsize call come from the older pandas and IPython pylab setup that the original chapter was written against.

Reading data from a CSV file

You can read data from a CSV file with the read_csv function. By default it assumes that fields are separated by commas.

We will look at cyclist data from Montréal (the original page is in French), using the data for 2012. The dataset lists how many people were on each of 7 different bike paths in Montréal on each day.

    broken_df = pd.read_csv('../data/bikes.csv')

    In [3]: # look at the first three rows
            broken_df[:3]

      Date;Berri 1;Br?beuf (donn?es non disponibles);C?te-Sainte-Catherine;Maisonneuve 1;Maisonneuve 2;du Parc;Pierre-Dupuy;Rachel1;St-Urbain (donn?es non disponibles)
    0 01/01/2012;35;;0;38;51;26;10;16;
    1 02/01/2012;83;;1;68;153;53;6;43;
    2 03/01/2012;135;;2;104;248;89;3;58;

You can see that this is completely broken. read_csv has a number of options that let us fix it. Here we:

    change the column separator to ;
    set the encoding to latin1 (the default is utf-8)
    parse the dates in the Date column
    tell it that our dates put the day first, not the month
    set the index to the Date column

    fixed_df = pd.read_csv('../data/bikes.csv', sep=';', encoding='latin1',
                           parse_dates=['Date'], dayfirst=True, index_col='Date')
    fixed_df[:3]

                Berri 1  Brébeuf (données non disponibles)  Côte-Sainte-Catherine  Maisonneuve 1  Maisonneuve 2
    Date
    2012-01-01       35                                NaN                      0             38             51
    2012-01-02       83                                NaN                      1             68            153
    2012-01-03      135                                NaN                      2            104            248

Selecting a column

When you read a CSV, you get an object called a DataFrame, which is made up of rows and columns. You get a column out of a DataFrame the same way you get an element out of a dictionary. Here is an example:

    fixed_df['Berri 1']

    Date
    2012-01-01      35
    2012-01-02      83
    2012-01-03     135
    2012-01-04     144
    2012-01-05     197
    2012-01-06     146
    2012-01-07      98
    2012-01-08      95
    2012-01-09     244
    2012-01-10     397
    2012-01-11     273
    2012-01-12     157
    2012-01-13      75
    2012-01-14      32
    2012-01-15      54
    ...
    2012-10-22    3650
    2012-10-23    4177
    2012-10-24    3744
    2012-10-25    3735
    2012-10-26    4290
    2012-10-27    1857
    2012-10-28    1310
    2012-10-29    2919
    2012-10-30    2887
    2012-10-31    2634
    2012-11-01    2405
    2012-11-02    1582
    2012-11-03     844
    2012-11-04     966
    2012-11-05    2247
    Name: Berri 1, Length: 310, dtype: int64

Plotting a column

Just add .plot() to the end; it could not be easier. Unsurprisingly, we can see that few people cycle in January, February, and March.

    fixed_df['Berri 1'].plot()

    <matplotlib.axes.AxesSubplot at 0x3ea1490>

All of the columns can be plotted at once in the same way. If it is a bad day for cyclists, it is a bad day everywhere.

    fixed_df.plot(figsize=(15, 10))

    <matplotlib.axes.AxesSubplot at 0x3fc2110>

Putting it all together

Here is all of the code we wrote to draw the chart:

    df = pd.read_csv('../data/bikes.csv', sep=';', encoding='latin1',
                     parse_dates=['Date'], dayfirst=True, index_col='Date')
    df['Berri 1'].plot()

    <matplotlib.axes.AxesSubplot at 0x4751750>

Chapter 2

Original: Chapter 2
Translator: 飞龙 (Feilong)
License: CC BY-NC-SA 4.0

    # the usual preamble
    import pandas as pd
    # make the plots bigger and prettier
    pd.set_option('display.mpl_style', 'default')
    pd.set_option('display.line_width', 5000)
    pd.set_option('display.max_columns', 60)
    figsize(15, 5)

We will use a new dataset here to show how to work with larger data. It is a subset of the 311 service requests published by NYC Open Data.

    complaints = pd.read_csv('../data/311-service-requests.csv')

What's even in it? (the summary)

Displaying a large DataFrame prints a summary instead of the rows. The summary includes every column and how many non-null values each column contains.

    complaints

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 111069 entries, 0 to 111068
    Data columns (total 52 columns):
    Unique Key               111069  non-null values
    Created Date             111069  non-null values
    Closed Date              60270   non-null values
    Agency                   111069  non-null values
    Agency Name              111069  non-null values
    Complaint Type           111069  non-null values
    Descriptor               111068  non-null values
    Location Type            79048   non-null values
    Incident Zip             98813   non-null values
    Incident Address         84441   non-null values
    Street Name              84438   non-null values
    Cross Street 1           84728   non-null values
    Cross Street 2           84005   non-null values
    Intersection Street 1    19364   non-null values
    IntersectionSt
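
The per-column non-null counts in a summary like the one above can also be computed directly. The following is a minimal sketch, not part of the original chapter: the three-column frame `sample` and all of its values are invented stand-ins for the complaints table, and only `DataFrame.count()` and `DataFrame.isna()` are relied on.

```python
import pandas as pd

# Hypothetical miniature stand-in for the complaints table; the values are invented.
sample = pd.DataFrame({
    "Unique Key": [1, 2, 3, 4],
    "Closed Date": ["2013-10-31", None, "2013-11-01", None],
    "Descriptor": ["Loud Music/Party", "Banging/Pounding", None, "Loud Talking"],
})

# count() returns the number of non-null entries in each column,
# the same figure the summary reports as "non-null values".
non_null_counts = sample.count()
print(non_null_counts)

# isna().sum() gives the complementary view: missing entries per column.
missing_counts = sample.isna().sum()
print(missing_counts)
```

Comparing the two series shows at a glance which columns are sparsely populated, which is usually the first thing to check before analysing a column of a large dataset.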