Meta Text-to-Video Generation Technical Manual (Movie Gen)

Predicting and generating media is a core scientific challenge with broad applications. Much as large language models (Brown et al., 2020; Touvron et al., 2023; Dubey et al., 2024) are trained to predict the next token, Movie Gen Video is a foundation model trained to generate video directly from text, producing high-definition clips of up to 16 seconds at 16 FPS (Figure 1).
Non-cherry-picked generations for the evaluation prompts are available at https://go.fb.me/MovieGen (Figure 1). In human evaluations against Luma Labs Dream Machine (LumaLabs, 2024) and OpenAI Sora (OpenAI, 2024), Movie Gen Video is preferred on overall video quality. Beyond text-to-video, the suite includes Movie Gen Edit for instruction-guided video editing and Movie Gen Audio for soundtrack generation (compared against, among others, ElevenLabs). Section 3 presents the model architecture and training details, including supervised fine-tuning (SFT); Section 4 describes the post-training strategy for personalization; Section 5 covers video-to-video editing; and Section 6 covers audio generation.

Videos are modeled in a learned latent space. A temporal autoencoder (TAE), built on the variational autoencoder (Kingma, 2013), compresses an input pixel-space video V of shape T0 x 3 x H0 x W0 into a latent of shape T' x C x H' x W', with T' = ceil(T0/8) and 8x downsampling along height and width, which greatly shortens the sequence the generative model must process. To suppress spot artifacts caused by extreme latent values, TAE training adds an outlier penalty loss (OPL), L_OPL(X), which penalizes latent values that stray far from the latent mean.
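A minimal sketch of an outlier penalty loss of this kind, assuming a hinge penalty on latent values lying more than r standard deviations from the latent mean (the factor r and the mean reduction are illustrative assumptions, not the report's exact formula):

```python
import torch

def outlier_penalty_loss(latents: torch.Tensor, r: float = 3.0) -> torch.Tensor:
    """Hedged sketch of an outlier penalty loss (OPL) for TAE latents.

    Penalizes latent values that deviate more than `r` standard deviations
    from the latent mean, which is the behaviour the text attributes to OPL.
    The value of `r` and the mean reduction are assumptions.
    """
    mean = latents.mean()
    std = latents.std()
    # Hinge penalty: zero inside the +/- r*std band, linear outside it.
    excess = (latents - mean).abs() - r * std
    return torch.clamp(excess, min=0.0).mean()
```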
Because encoding or decoding a long, high-resolution video in one pass is memory-prohibitive, the TAE runs tiled inference: the video is split into overlapping tiles, each tile is processed independently, and frames in the overlap between adjacent tiles j and j+1 are blended with linearly interpolated weights, x = w * x_j + (1 - w) * x_{j+1}, so that no seams appear at tile boundaries.
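A minimal sketch of this stitching step, assuming a list of decoded temporal tiles whose trailing frames coincide with the next tile's leading frames (the tile layout and the linear ramp are illustrative assumptions):

```python
import torch

def blend_overlapping_tiles(tiles, overlap: int) -> torch.Tensor:
    """Hedged sketch of stitching decoded temporal tiles with linear blending.

    `tiles` is a list of tensors shaped (frames, C, H, W) whose last `overlap`
    frames overlap the first `overlap` frames of the next tile.
    """
    out = tiles[0]
    # Linear ramp from 0 to 1 across the overlapping frames.
    w = torch.linspace(0.0, 1.0, overlap).view(-1, 1, 1, 1)
    for nxt in tiles[1:]:
        blended = (1.0 - w) * out[-overlap:] + w * nxt[:overlap]
        out = torch.cat([out[:-overlap], blended, nxt[overlap:]], dim=0)
    return out
```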
The generative model is trained with flow matching (Lipman et al., 2023). For a training latent X_1, a timestep t in [0, 1] is sampled together with Gaussian noise X_0 ~ N(0, I), and the model is given the interpolated sample

X_t = t * X_1 + (1 - t) * X_0.

The ground-truth velocity along this path is

V_t = dX_t/dt = X_1 - X_0.
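A minimal sketch of one training step under this objective, assuming a model that maps (X_t, prompt embedding, t) to a velocity prediction of the same shape as the latent (the uniform timestep sampling and the function names are illustrative assumptions):

```python
import torch

def flow_matching_loss(model, x1: torch.Tensor, prompt_emb: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of one flow-matching training step.

    `model(x_t, prompt_emb, t)` is assumed to return a velocity prediction
    with the same shape as the latent `x1`.
    """
    x0 = torch.randn_like(x1)                      # noise sample X_0 ~ N(0, I)
    t = torch.rand(x1.shape[0], device=x1.device)  # t ~ U[0, 1], one per sample
    t_b = t.view(-1, *([1] * (x1.dim() - 1)))      # broadcast t over latent dims
    x_t = t_b * x1 + (1.0 - t_b) * x0              # X_t = t*X_1 + (1-t)*X_0
    v_target = x1 - x0                             # ground-truth velocity V_t
    v_pred = model(x_t, prompt_emb, t)             # u(X_t, P, t; theta)
    return ((v_pred - v_target) ** 2).mean()       # E ||u - V_t||^2
```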
Training minimizes the expected squared error between the predicted and ground-truth velocity,

L = E_{t, X0, X1, P} || u(X_t, P, t; theta) - V_t ||^2,

where P is the text prompt embedding. At inference time, samples are drawn by integrating the learned ODE from noise to data, with the step schedule and signal-to-noise considerations tailored to video generation as described in Section 3.4.2.

Backbone. The denoiser is a Transformer (Vaswani et al., 2017) that closely follows the LLaMa3 design space (Dubey et al., 2024), using RMSNorm (Zhang and Sennrich, 2019) and SwiGLU (Shazeer, 2020), with cross-attention added so that the blocks can attend to the prompt embedding P. Table 1 lists the architecture hyperparameters of the 30B-parameter Movie Gen Video base model; the 30B count refers to the Transformer itself and excludes the TAE and the text encoders. Prompts are encoded with UL2, ByT5, and Long-prompt MetaCLIP, and the resulting features form P. The TAE latent is patchified into a token sequence before entering the Transformer; for 768 px, 16-second videos this yields a sequence length of roughly 73K tokens (activations shaped like (73728, ...)), and a spatial upsampler raises the final output to full HD (1080p).

Parallelism. Training a 30B model on 73K-token sequences requires combining several forms of parallelism, color-coded over the Transformer backbone in Figure 8: fully sharded data parallelism (Rajbhandari et al., 2020; Ren et al., 2021; Zhao et al., 2023), tensor parallelism (Shoeybi et al., 2019; Narayanan et al., 2021), sequence parallelism (Li et al., 2021; Korthikanti et al., 2023), and context parallelism, each applied to different parts of the backbone. Tensor parallelism (TP) shards linear-layer weights along columns or rows so that each GPU executes fewer FLOPs; sequence parallelism (SP) builds on TP by additionally sharding, along the sequence dimension, operations that are otherwise replicated across tokens (for example LayerNorm); and context parallelism (CP) partially shards the sequence dimension of attention itself, which interacts with grouped-query attention as used in LLaMa3 (for example, LLaMa3 70B uses 8 KV heads). Training runs on servers with 8 GPUs connected by NVSwitches, linked across servers by 400 Gbps RoCE RDMA NICs on Meta's training infrastructure.

Data and training recipe. Pre-training data is curated from a large pool: near-duplicate clusters are merged, clips are sampled from each merged cluster with inverse-square-root frequency (Mahajan et al., 2018), and additional filters are applied on resolution, duration, FPS, and caption quality, yielding a set on the order of O(100)M video clips plus image data (Appendix B.2). Training is progressive: text-to-image training at low resolution (256 px), joint text-to-image/video training at 256 px, and then joint training at 768 px, with positional embeddings adapted across resolutions. Finally, supervised fine-tuning (SFT) on a small, manually curated set of high-quality captioned videos uses a cosine learning-rate schedule (Loshchilov and Hutter, 2017) with settings otherwise similar to pre-training; the fine-tuned model generates 16-second videos at 16 FPS or 10.6-second videos at 24 FPS.
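To make the 73K-token sequence length mentioned above concrete, the sketch below works through the arithmetic under assumed settings (a square 768x768 crop, 16 s at 16 FPS, 8x TAE compression per axis, 2x2 spatial patchification; the square crop and patch size are illustrative assumptions):

```python
def video_token_count(frames=256, height=768, width=768, tae_stride=8, patch=2) -> int:
    """Illustrative token-count arithmetic for a 768 px, 16 s, 16 FPS clip.

    Assumes 8x TAE compression along time, height and width, followed by
    2x2 spatial patchification; the square 768x768 crop is an assumption.
    """
    t_lat = -(-frames // tae_stride)     # ceil(256 / 8) = 32 latent frames
    h_lat = height // tae_stride         # 768 / 8       = 96
    w_lat = width // tae_stride          # 768 / 8       = 96
    return t_lat * (h_lat // patch) * (w_lat // patch)   # 32 * 48 * 48 = 73,728

print(video_token_count())  # -> 73728
```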
Inference-time step schedule. Because the largest changes to the sample occur at early timesteps, inference uses a linear-quadratic schedule: with 50 solver steps, the first 25 follow the first 25 steps of a 1000-step linear schedule, and the remaining 25 are quadratically spaced to cover the rest of the trajectory, approximating a 1000-step solve at a fraction of the cost.

Evaluation. Movie Gen Video Bench is a prompt set spanning the concept categories shown in Figure 11: (1) human activity (including limb and mouth motion), (2) animals, (3) nature and scenery, (4) physics, and (5) unusual subjects and activities. Human raters compare videos pairwise along text alignment, visual quality, and realness and aesthetics. Automatic metrics such as FVD (Unterthiner et al., 2019) and IS (Salimans et al., 2016; Barratt and Sharma, 2018; Chong and Forsyth, 2020) correlate poorly with human judgments of video quality (Singer et al., 2023; Ho et al., 2022a; Girdhar et al., 2024; Ge et al., 2024; Huang et al., 2024), so human evaluation is the primary measure, and the non-cherry-picked videos generated for the Movie Gen Video Bench prompt set are published.

Comparisons to prior work. Movie Gen Video is compared against Runway Gen3 (RunwayML, 2024), Luma Labs Dream Machine (LumaLabs, 2024), Kling 1.5 (KlingAI, 2024), and the closed text-to-video system OpenAI Sora (OpenAI, 2024); since Sora is not publicly accessible, the comparison uses its published videos and prompts. Table 6 reports net win rates with 95% confidence intervals on Movie Gen Video Bench (Appendix C.1): Movie Gen Video is preferred overall against all four systems, with significant margins on overall quality against Runway Gen3, Luma Labs, and Sora, and a smaller but positive margin against Kling 1.5. Qualitative comparisons appear in Figures 12-14, and the generated videos are available at https://go.fb.me/MovieGen (Figure 12), where Movie Gen Video generations look more natural than Luma Labs and OpenAI Sora outputs on the same prompts.

Effect of fine-tuning. The supervised fine-tuning described in Section 3.3 further improves video quality; the fine-tuned model produces 16-second videos at 16 FPS and 10.6-second videos at 24 FPS.

Ablations. Ablations use a smaller model and a subset of the benchmark (Movie Gen Video Bench-Mini). The LLaMa3-style Transformer block is preferred over a DiT-style block (Peebles and Xie, 2023; OpenAI, 2024; Ma et al., 2024a) by a clear net win rate, and flow matching is preferred over a diffusion objective.

TAE evaluation. Table 10 compares the TAE with the frame-wise autoencoders that are standard in prior work (Blattmann et al., 2023a; Girdhar et al., 2024): on video data the TAE matches frame-wise reconstruction quality while additionally compressing 8x in time, and increasing the latent channel count (8 vs. 16) improves reconstruction, consistent with Dai et al. (2023). A 2.5D factorization (2D spatial plus 1D temporal layers) is cheaper than full 3D convolutions at comparable reconstruction quality, and the outlier penalty loss removes the spot artifacts otherwise seen in decoded videos (Section 3.6.4).

Text-to-image results. The same recipe also yields a strong text-to-image model at 1024 px; in human evaluation it is competitive with Flux.1 (Black Forest Labs, 2024), OpenAI DALL-E 3 (OpenAI, 2024), Midjourney V6.1 (Midjourney, 2024), and Ideogram V2 (Ideogram, 2024), and qualitative results are shown.

Personalization. Section 4 post-trains the model for personalized text-to-video (PT2V): the model is conditioned on a reference image of a person, using vision-encoder and ArcFace-style identity features, so that the generated video preserves the person's identity while following the text prompt. PT2V is trained in stages (identity pre-training followed by high-quality fine-tuning, Section 4.2.1) and outperforms the prior state of the art on identity preservation and overall quality in human evaluation.
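Throughout these evaluations, pairwise preferences are summarized as a net win rate in [-100%, 100%]. A minimal sketch of such a score and a simple confidence interval (the normal-approximation CI is an illustrative assumption, not the report's exact procedure):

```python
import math

def net_win_rate(wins: int, ties: int, losses: int):
    """Hedged sketch of the net win rate used in A/B human evaluations.

    Net win rate = %(A wins) - %(A losses), ranging from -100% to +100%.
    """
    n = wins + ties + losses
    p_win, p_loss = wins / n, losses / n
    net = 100.0 * (p_win - p_loss)
    # Per-vote score d is +1 (win), 0 (tie), -1 (loss); Var(d) = E[d^2] - E[d]^2.
    var = (p_win + p_loss) - (p_win - p_loss) ** 2
    ci = 100.0 * 1.96 * math.sqrt(var / n)
    return net, ci

print(net_win_rate(wins=450, ties=300, losses=250))  # e.g. (20.0, ~5.0)
```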
Movie Gen Edit. Since large-scale supervised video editing data does not exist, Movie Gen Edit is trained with a multi-stage procedure (Figure 24) that never requires ground-truth edited videos. Compared with the previous state of the art (Singer et al., 2024) on TGVE+ (Wu et al., 2023c; Singer et al., 2024), Movie Gen Edit is preferred with probability above 74%. The editing model reuses the video generation backbone: the input video is encoded with the TAE at the same aspect ratio and FPS, its latent is concatenated with the noisy output latent along the channel dimension, and the editing instruction is injected through the text conditioning, following Emu Edit (Sheynin et al., 2024). Training starts from the text-to-video dataset of (c_txt, x_vid) pairs and converts it into editing supervision (c_instruct, x_vid), where c_instruct is an editing instruction and x_vid is the video output, optimizing the same velocity-prediction loss ||u(...) - V_t||^2 used for generation.
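A minimal sketch of this conditioning mechanism, assuming latents laid out as (batch, frames, channels, height, width) (the layout and function name are illustrative assumptions):

```python
import torch

def edit_model_input(noisy_latent: torch.Tensor, input_video_latent: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of conditioning the editing model on the input video.

    The input-video latent is concatenated with the noisy output latent along
    the channel dimension, so the backbone's patch embedding simply sees 2*C
    input channels.
    """
    assert noisy_latent.shape == input_video_latent.shape
    return torch.cat([noisy_latent, input_video_latent], dim=2)  # channel dim
```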
The editing supervision is synthesized in stages (Figure 25). Stage I builds single-frame editing data: for a captioned video (c_txt, x_vid), a random frame x_frame is selected, LLaMa3 is prompted to produce an editing instruction c_instruct and a corresponding edited caption, and an image editing model produces the edited frame x_frame' = p_theta(x_frame, c_instruct). The generated data points (c_txt, c_instruct, c_txt', x_frame, x_frame') are then filtered with automatic image-editing metrics so that only faithful edits are kept.

Stage II turns this into animated, multi-frame editing data: the selected frame x_f and its edited counterpart are animated consistently, producing an input clip, an editing instruction c_instruct, and an edited clip, i.e. triplets (c_vid, c_instruct, x_vid') in which the target video is synthetic but temporally coherent with the input.
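Both stages rely on automatic filtering of the synthesized edits. A minimal sketch of such a filter, assuming hypothetical CLIP-style embedding callables that return L2-normalized vectors (the directional criterion and thresholds are illustrative assumptions, not the report's exact filter):

```python
import torch.nn.functional as F

def keep_edit_example(x_frame, x_edit, cap_src, cap_edit,
                      clip_image_embed, clip_text_embed,
                      min_dir_sim=0.2, min_out_sim=0.25) -> bool:
    """Hedged sketch of filtering a synthesized image edit with automatic metrics.

    `clip_image_embed` / `clip_text_embed` are hypothetical callables returning
    L2-normalized embeddings; thresholds are illustrative.
    """
    # Does the change in the image agree with the change described in the captions?
    img_delta = F.normalize(clip_image_embed(x_edit) - clip_image_embed(x_frame), dim=-1)
    txt_delta = F.normalize(clip_text_embed(cap_edit) - clip_text_embed(cap_src), dim=-1)
    dir_sim = (img_delta * txt_delta).sum(-1)
    # Does the edited image match the edited caption?
    out_sim = (clip_image_embed(x_edit) * clip_text_embed(cap_edit)).sum(-1)
    return bool(dir_sim > min_dir_sim) and bool(out_sim > min_out_sim)
```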
Stage III uses back-translation on real videos (Figure 26), extending Emu Edit-style task augmentation, including segmentation-grounded tasks (Sheynin et al., 2024), from images to videos. The model from the previous stage generates an edited video x_vid' = p_theta(x_vid, c_instruct) from a clean input video x_vid and an instruction c_instruct; the pair is then reversed, so the model is trained to map the synthetic video x_vid' and a backward instruction c_instruct_bwd to the real video x_vid. In this way the denoising target is always a real, clean video, and the captions c_txt and c_txt' are reused for conditioning.

Evaluation of video editing. Movie Gen Edit Bench consists of videos and editing instructions for evaluation; the benchmark videos are lower resolution (480 px) and lower frame rate than the base generation setting, whereas current text-to-video systems (OpenAI, 2024; RunwayML, 2023, 2024) generate at higher resolutions (for example 768 px or 1080p), 16 FPS or more, and multiple aspect ratios. Automatic metrics follow prior work, ViCLIP directional similarity (ViCLIP_dir) and ViCLIP output similarity (ViCLIP_out), together with human evaluation of instruction faithfulness, preservation of the input, and visual quality (Wu et al., 2023a; Yatim et al., 2023). Against EVE (Singer et al., 2024), Runway Gen3 V2V, and Runway Gen3 V2V Style (RunwayML, 2024) on Movie Gen Edit Bench, and against SDEdit (Meng et al., 2022), STDF (Yatim et al., 2023), and InsV2V (Cheng et al., 2024) on automatic metrics, Movie Gen Edit obtains the strongest results; SDEdit-based baselines in particular obtain low ViCLIP_out scores. Ablations over the training stages (Sections 5.1.2-5.1.3) show that each stage, including the animated-data stage and back-translation, contributes to the final quality.

Movie Gen Audio. Movie Gen Audio generates soundtracks, including sound effects and music, for video. Audio is modeled with the same flow-matching framework in a latent space, in the spirit of latent diffusion (Rombach et al., 2022): 48 kHz waveforms are encoded by a DAC-VAE into a compact one-dimensional latent sequence of shape T x C at a much lower frame rate (25 Hz) with C = 128 channels, a design compared against Encodec- and DAC-style codecs. The conditioning is c = {c_vid, c_txt}: per-frame visual features c_vid in R^{N_aud x 1024} extracted with Long-prompt MetaCLIP (Xu et al., 2023) and aligned to the audio latent rate, and a text feature sequence of N_txt tokens (details in Section 6.2.4). The model u(X_t, c, t; theta) in R^{N_aud x 128} is trained to predict the velocity and supports (1) video-to-audio generation (V2A), (2) joint text-and-video-to-audio generation (TV2A), and audio extension. To generate audio longer than the training window, the latent sequence of length N is divided into ceil(N / n_hop) overlapping segments, indexed by j.
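A minimal sketch of aligning per-frame visual features to the 25 Hz audio latent rate so they can be fused with the audio tokens (the interpolation mode and tensor shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def align_video_features_to_audio(frame_feats: torch.Tensor, n_aud: int) -> torch.Tensor:
    """Hedged sketch: resample per-frame visual features to the audio latent rate.

    `frame_feats` has shape (n_frames, 1024), e.g. MetaCLIP features at the video
    FPS; the output has shape (n_aud, 1024), matching the 25 Hz audio latents.
    Linear interpolation is an illustrative choice, not the report's exact operator.
    """
    x = frame_feats.t().unsqueeze(0)                                   # (1, 1024, n_frames)
    y = F.interpolate(x, size=n_aud, mode="linear", align_corners=False)
    return y.squeeze(0).t()                                            # (n_aud, 1024)
```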
Multi-diffusion for long audio. Generation beyond the training window follows multi-diffusion, which was originally shown to let a diffusion model trained on 512 x 512 images produce panoramas nine times wider (512 x 4608). The latent sequence is split into overlapping segments of window length n_win with hop size n_hop, so segment j spans frames n_start^(j) to n_start^(j) + n_win. At every ODE step from t_i to t_{i+1}, each segment X^(j)_{t_i}, with its own conditioning slice c^(j) = {c_vid^(j), c_txt}, is denoised independently, and the overlapping predictions are fused into a single full-length latent:

X_{t_{i+1}} = sum_j zero_pad(m^(j) * X^(j)_{t_{i+1}}, j) / sum_j zero_pad(m^(j), j),

where zero_pad(., j) places a length-n_win segment at segment j's position in the length-N sequence and fills the remainder with zeros, and the weights m^(j) form a triangular window function of length n_win (i.e. a Bartlett window), so frames near a segment's center count more than frames near its borders. For audio extension, the first n_ctx frames of segment j+1 are overwritten with the corresponding frames that have already been generated or are given as context,

X^(j+1)_{t_{i+1}, 0:n_ctx} = X_{t_{i+1}, n_start^(j+1) : n_start^(j+1) + n_ctx},

which keeps consecutive segments coherent with each other and with any provided audio context.
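A minimal sketch of one such update over overlapping segments (conditioning and the exact ODE step are omitted; the window parameters are illustrative assumptions):

```python
import torch

def multidiffusion_step(x_t: torch.Tensor, denoise_fn, n_win: int, n_hop: int) -> torch.Tensor:
    """Hedged sketch of one multi-diffusion update over overlapping segments.

    `x_t` is the full latent of shape (N, C); `denoise_fn(segment)` returns the
    next-step latent for one (n_win, C) window. Assumes (N - n_win) is a
    multiple of n_hop so the windows tile the whole sequence.
    """
    n, _ = x_t.shape
    acc = torch.zeros_like(x_t)
    weight = torch.zeros(n, 1)
    # Triangular (Bartlett) weights; clamped so the global end frames keep a
    # small nonzero weight even though the window tapers to zero.
    m = torch.bartlett_window(n_win, periodic=False).clamp(min=1e-3).unsqueeze(1)
    for start in range(0, n - n_win + 1, n_hop):
        seg_out = denoise_fn(x_t[start:start + n_win])
        acc[start:start + n_win] += m * seg_out     # weighted, zero-padded placement
        weight[start:start + n_win] += m
    return acc / weight                             # average overlapping predictions
```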
Training details. Movie Gen Audio is trained for roughly 500K steps on 384 GPUs with AdamW (weight decay 0.1) in bf16, using a constant learning rate of 1e-4 reached through a 5K-step linear warm-up. Audio is sampled in segments of up to 30 seconds (750 latent frames at 25 Hz), with shorter windows of around 15 seconds mixed in, and classifier-free guidance with a scale of about 7.0 is used at inference.
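A minimal sketch of this optimization recipe (AdamW betas and other defaults are assumptions):

```python
import torch

def build_optimizer_and_scheduler(model, lr=1e-4, weight_decay=0.1, warmup_steps=5000):
    """Hedged sketch of the described recipe: AdamW with weight decay 0.1 and a
    constant learning rate of 1e-4 reached via a 5K-step linear warm-up.
    """
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)

    def warmup_then_constant(step: int) -> float:
        # Linear ramp from 0 to 1 over `warmup_steps`, then hold at 1.0.
        return min(1.0, (step + 1) / warmup_steps)

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=warmup_then_constant)
    return opt, sched
```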
Audio data. Sound effects in film are either recorded on location or created in post-production, i.e. Foley sounds (Tan et al., 2017). The training data pipeline starts from a large pool of videos with audio: an audio event detection (AED) model tags every sample with events from the 527-class AudioSet ontology (Gemmeke et al., 2017); only clips between 4 and 120 seconds long are kept; and further quality filters, including aesthetic-style scoring in the spirit of Schuhmann et al. (2022) and AED-confidence thresholds, remove low-quality audio.

Evaluation. Audio generation is judged with pairwise human evaluation reported as a net win rate in [-100%, +100%] (A vs. B), alongside objective metrics such as the ImageBind score for audio-video alignment (Girdhar et al., 2023). Evaluation sets include VGGSound (Chen et al., 2020), videos generated by OpenAI Sora (OpenAI, 2024) and Runway Gen3 (RunwayML, 2024), and the proposed Movie Gen Audio Bench, whose prompts are paired with Movie Gen Video generations (538 videos in total). For joint sound-effect-and-music soundtrack generation, PikaLabs and ElevenLabs are essentially the only available baselines; for sound-effect generation, comparisons additionally include Seeing & Hearing (S&H; Xing et al., 2024), Diff-Foley (Luo et al., 2024), FoleyCrafter (Zhang et al., 2024), and VTA-LDM (Xu et al., 2024a), in both V2A and TV2A settings where the baselines allow it.

Results. Movie Gen Audio is preferred over every baseline by large net win rates on sound-effect generation across VGGSound, the Sora and Gen3 videos, and Movie Gen Audio Bench, and on joint SFX-and-music soundtrack generation it is preferred over the ElevenLabs- and Pika-based pipelines roughly 70-82% of the time; the detailed tables report net win rates for quality and for audio-video alignment with 95% confidence intervals. Figure 30 shows video-to-sound-effect samples on Movie Gen Audio Bench, available at https://go.fb.me/MovieGen (Figure 30). For long, multi-shot videos (for example OpenAI Sora's 57-second "Air Head" clip), generating audio for each segment independently and stitching the segments ("Movie Gen Audio stitching") is compared against the audio-extension mode, which is preferred in human evaluation; samples are available at https://go.fb.me/MovieGen (Figure 31).
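The ImageBind-style alignment metric used above can be sketched as follows (the embedding callables are hypothetical stand-ins for a joint audio-visual encoder such as ImageBind; batching and temporal pooling are omitted):

```python
import torch.nn.functional as F

def imagebind_style_alignment(video_embed_fn, audio_embed_fn, video, audio) -> float:
    """Hedged sketch of an ImageBind-style audio-video alignment score.

    The score is the cosine similarity between the video and audio embeddings
    from a shared embedding space; the callables are hypothetical.
    """
    v = F.normalize(video_embed_fn(video), dim=-1)
    a = F.normalize(audio_embed_fn(audio), dim=-1)
    return float((v * a).sum(-1))
```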
Scaling and ablations. The relationship between the ImageBind score and human judgments of prompt-audio agreement and audio quality is analyzed across models. Scaling the audio model from 300M to 13B parameters improves both quality and alignment, and Figure 35 shows qualitative samples of the extension method with the 13B model. For long-form generation, multi-diffusion extension is comparable to the one-shot generation topline within the training window and outperforms autoregressive extension variants, with or without trajectory or context conditioning and beam-style selection; the effect of audio-quality fine-tuning relative to the pre-trained model is also measured.

Related work. Text-to-video generation has been studied extensively (Singer et al., 2023; Ho et al., 2022a; Blattmann et al., 2023a; Girdhar et al., 2024), including systems such as I2VGen-XL (Zhang et al., 2023b), DynamiCrafter (Xing et al., 2023), VideoGen (Li et al., 2023a), and VideoCrafter1 (Chen et al., 2023); several papers study the role of noise scheduling for temporal coherence (Ge et al., 2023; Qiu et al., 2023; Luo et al., 2023). Many models use U-Net-based architectures, while Snap Video (Menapace et al., 2024), OpenAI Sora (OpenAI, 2024), and Latte (Ma et al., 2024a) use DiT-style Transformers instead of a U-Net backbone; Movie Gen also builds on a Transformer but follows the LLaMa3 design. On the tokenizer side, variational autoencoders (Kingma, 2013) and VQGAN-style discrete tokenizers have both been used, with Efficient-VQGAN (Cao et al., 2023), ViT-VQGAN (Yu et al., 2021), and TiTok (Yu et al., 2024) showing promising results with vision Transformers; video latent spaces range from per-frame 2D encoders used with video diffusion (Blattmann et al., 2023a), Latent-Shift (An et al., 2023), VideoLDM (Blattmann et al., 2023b), and Emu Video (Girdhar et al., 2024) to 3D and 2.5D designs such as MAGVIT (Yu et al., 2023a) and TATS. For personalization, LoRA (Hu et al., 2021) adapts lightweight low-rank adapters to speed up subject-driven training, and HyperDreamBooth (Ruiz et al., 2023b) predicts such adapters directly. For video editing, many training-free methods can be applied on top of any text-to-video model (Ceylan et al., 2023; Kara et al., 2023; Yang et al., 2023; Li et al., 2023b), while methods that tune model parameters to process the whole video input have been reported to perform worse (Singer et al., 2024; Qin et al., 2023), and the EVE approach requires an order of magnitude more memory; Movie Gen Edit instead builds on Movie Gen Video (see Section 3). For audio, most prior video-to-audio models are trained on comparatively small datasets such as VGGSound (about 550 hours; Chen et al., 2020) or AudioSet (about 5K hours; Gemmeke et al., 2017), with a few exceptions (Kondratyuk et al., 2023). On the product side, PikaLabs and ElevenLabs offer video-to-audio features, but neither reliably produces action-synchronized sound effects or full soundtracks containing both music and effects: PikaLabs accepts a video plus an optional prompt, and the ElevenLabs API operates on short clips of a few seconds. Neural audio representations used in this space include EnCodec (Défossez et al., 2022), SoundStream (Zeghidour et al., 2022), and w2v-BERT (Chung et al., 2021).

Contributors. The report credits a large team, including, among many others: Ishan Misra, Mary Williamson, Matt Le, Mitesh Kumar Singh, Peizhao Zhang, Peter Vajda, Quentin Duval, Rohit Girdhar, Roshan Sumbaly, Sai Saketh Rambhatla, Sam Tsai, Samaneh Azadi, Samyak Datta, Sanyuan Chen, Sean Bell, Sharadh Ramaswamy, Shelly Sheynin, Yaniv Taigman, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu, Arun Mallya, Ahmad Al-Dahle, Joelle Pineau, Manohar Paluri, Yaron Lipman, and Zhaoheng Ni.

References
Jie An et al. Latent-Shift: Latent diffusion with temporal shift for efficient text-to-video generation. arXiv preprint arXiv:2304.08477, 2023.
Kendall Atkinson. An Introduction to Numerical Analysis. John Wiley & Sons, 1991.
Yogesh Balaji et al. eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint arXiv:2211.01324, 2022.
Omer Bar-Tal et al. Lumiere: A space-time diffusion model for video generation. arXiv preprint arXiv:2401.12945, 2024.
Shane Barratt and Rishi Sharma. A note on the Inception Score. arXiv preprint arXiv:1801.01973, 2018.
Black Forest Labs. FLUX, 2024.
Andreas Blattmann et al. Stable Video Diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint, 2023.
Ralph Allan Bradley and Milton E. Terry. Rank analysis of incomplete block designs: The method of paired comparisons. Biometrika, 1952.
Tim Brooks et al. InstructPix2Pix: Learning to follow image editing instructions. In CVPR, 2023.
Tim Brooks, Bill Peebles, et al. Video generation models as world simulators. OpenAI technical report, 2024.
Tom B. Brown et al. Language models are few-shot learners. In NeurIPS, 2020.
Duygu Ceylan et al. Pix2Video: Video editing using image diffusion. In ICCV, 2023.
Caroline Chan, Shiry Ginosar, Tinghui Zhou, and Alexei A. Efros. Everybody Dance Now. In ICCV, 2019.
Honglie Chen et al. VGGSound: A large-scale audio-visual dataset. In ICASSP, 2020.
Xinyuan Chen et al. SEINE: Short-to-long video diffusion model for generative transition and prediction. In ICLR, 2024.
Min Jin Chong and David Forsyth. Effectively unbiased FID and Inception Score and where to find them. In CVPR, 2020.
Yu-An Chung et al. w2v-BERT: Combining contrastive learning and masked language modeling for self-supervised speech pre-training. In IEEE ASRU, 2021.
Xiaoliang Dai et al. Emu: Enhancing image generation models using photogenic needles in a haystack. arXiv preprint arXiv:2309.15807, 2023.
Aram Davtyan, Sepehr Sameni, and Paolo Favaro. Efficient video prediction via sparsely conditioned flow matching. In ICCV, 2023.
Jiankang Deng et al. ArcFace: Additive angular margin loss for deep face recognition. In CVPR, 2019.
Alexey Dosovitskiy et al. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021.
Abhimanyu Dubey et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi. High fidelity neural audio compression. arXiv preprint, 2022.
ElevenLabs. Sound effects. https://elevenlabs.io/app/sound-effects
Patrick Esser et al. Scaling rectified flow transformers for high-resolution image synthesis. In ICML, 2024.
Songwei Ge et al. Preserve your own correlation: A noise prior for video diffusion models. In ICCV, 2023.
Songwei Ge et al. On the content bias in Fréchet Video Distance. In CVPR, 2024.
Jort F. Gemmeke et al. Audio Set: An ontology and human-labeled dataset for audio events. In ICASSP, 2017.
Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, and Soujanya Poria. Text-to-audio generation using instruction-tuned LLM and latent diffusion model. 2023.
Rohit Girdhar et al. Emu Video: Factorizing text-to-video generation by explicit image conditioning. In ECCV, 2024.
Ian Goodfellow et al. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014.
Agrim Gupta et al. Photorealistic video generation with diffusion models. arXiv preprint, 2023.
Zecheng He et al. Imagine yourself: Tuning-free personalized image generation. arXiv preprint arXiv:2409.13346, 2024.
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.
Jonathan Ho et al. Video diffusion models. In NeurIPS, 2022.
Susung Hong et al. Direct2V: Large language models are frame-level directors for zero-shot text-to-video generation. arXiv preprint arXiv:2305.14330, 2023.
Edward J. Hu et al. LoRA: Low-rank adaptation of large language models. In ICLR, 2022.
Qingqing Huang et al. Noise2Music: Text-conditioned music generation with diffusion models. arXiv preprint arXiv:2302.03917, 2023.
Ziqi Huang et al. VBench: Comprehensive benchmark suite for video generative models. In CVPR, 2024.
Ozgur Kara et al. RAVE: Randomized noise shuffling for fast and consistent video editing with diffusion models. arXiv preprint arXiv:2312.04524, 2023.
Levon Khachatryan et al. Text2Video-Zero: Text-to-image diffusion models are zero-shot video generators. In ICCV, 2023.
Kevin Kilgour et al. Fréchet Audio Distance: A metric for evaluating music enhancement algorithms. In Interspeech, 2019.
Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
Kling AI. https://klingai.com/, 2024.
Jonas Kohler et al. Imagine Flash: Accelerating Emu diffusion models with backward distillation. arXiv preprint, 2024.
Dan Kondratyuk et al. VideoPoet: A large language model for zero-shot video generation. In ICML, 2024.
Vijay Anand Korthikanti et al. Reducing activation recomputation in large transformer models. In MLSys, 2023.
Felix Kreuk et al. AudioGen: Textually guided audio generation. arXiv preprint arXiv:2209.15352, 2022.
Rithesh Kumar et al. High-fidelity audio compression with improved RVQGAN. In NeurIPS, 2023.
Gael Le Lan et al. High fidelity text-guided music generation and editing via single-stage flow matching. arXiv preprint arXiv:2407.03648, 2024.
Matthew Le et al. Voicebox: Text-guided multilingual universal speech generation at scale. In NeurIPS, 2023.
Sang-gil Lee et al. BigVGAN: A universal neural vocoder with large-scale training. In ICLR, 2023.
Xirui Li et al. VidToMe: Video token merging for zero-shot video editing. arXiv preprint arXiv:2312.10656, 2023.
Zhen Li et al. PhotoMaker: Customizing realistic human photos via stacked ID embedding. In CVPR, 2024.
Feng Liang et al. FlowVid: Taming imperfect optical flows for consistent video-to-video synthesis. arXiv preprint arXiv:2312.17681, 2023.
Hao Liu et al. Ring attention with blockwise transformers for near-infinite context. arXiv preprint arXiv:2310.01889, 2023.
Haohe Liu et al. AudioLDM 2: Learning holistic audio generation with self-supervised pretraining. arXiv preprint, 2023.
Luma Labs. Dream Machine. https://lumalabs.ai/dream-machine, 2024.
Huaishao Luo et al. CLIP4Clip: An empirical study of CLIP for end-to-end video clip retrieval. Neurocomputing, 2022.
Simian Luo, Chuanhao Yan, Chenxu Hu, and Hang Zhao. Diff-Foley: Synchronized video-to-audio synthesis with latent diffusion models. In NeurIPS, 2023.
Ilaria Manco et al. MusCaps: Generating captions for music audio. In IJCNN, 2021.
Shivam Mehta et al. Matcha-TTS: A fast TTS architecture with conditional flow matching. In ICASSP, 2024.
Xinhao Mei et al. FoleyGen: Visually-guided audio generation. arXiv preprint arXiv:2309.10537, 2023.
Chenlin Meng et al. SDEdit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022.
Midjourney. https://www.midjourney.com/, 2024.
Deepak Narayanan et al. Efficient large-scale language model training on GPU clusters using Megatron-LM. In SC, 2021.
NVIDIA. Transformer Engine documentation: context parallelism (context_parallel.html), 2024.
Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint, 2018.
MoA: Mixture-of-Attention for subject-context disentanglement in personalized image generation. arXiv preprint arXiv:2404.11565, 2024.
William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, 2023.
Pika Labs. https://www.pika.art/
Ed Pizzi et al. A self-supervised descriptor for image copy detection. In CVPR, 2022.
Dustin Podell et al. SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint, 2023.
K. R. Prajwal et al. MusicFlow: Cascaded flow matching for text guided music generation. In ICML, 2024.
PySceneDetect developers. PySceneDetect. https://www.scenedetect.com/
Haonan Qiu et al. FreeNoise: Tuning-free longer video diffusion via noise rescheduling. In ICLR, 2024.
Alec Radford et al. Learning transferable visual models from natural language supervision. In ICML, 2021.
Colin Raffel et al. Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR, 2020.
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. ZeRO: Memory optimizations toward training trillion parameter models. In SC, 2020.
Sai Saketh Rambhatla and Ishan Misra. SelfEval: Leveraging the discriminative nature of generative models for evaluation. arXiv preprint arXiv:2311.10708, 2023.
Nikhila Ravi et al. SAM 2: Segment anything in images and videos. arXiv preprint, 2024.
Jie Ren et al. ZeRO-Offload: Democratizing billion-scale model training. arXiv preprint arXiv:2101.06840, 2021.
Nataniel Ruiz et al. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In CVPR, 2023.
Nataniel Ruiz et al. HyperDreamBooth: Hypernetworks for fast personalization of text-to-image models. arXiv preprint, 2023.
RunwayML. Gen-2, 2023.
RunwayML. Introducing Gen-3 Alpha, 2024. https://runwayml.com/research/introducing-gen-3-alpha
Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.
Tim Salimans et al. Improved techniques for training GANs. In NeurIPS, 2016.
Christoph Schuhmann et al. LAION-5B: An open large-scale dataset for training next generation image-text models. In NeurIPS, 2022.
Noam Shazeer. GLU variants improve Transformer. arXiv preprint arXiv:2002.05202, 2020.
Kai Shen et al. NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers. arXiv preprint, 2023.
Mohammad Shoeybi et al. Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053, 2019.
Uriel Singer et al. Video editing via factorized diffusion distillation. In ECCV, 2024.
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint, 2020.
Kun Su et al. V2Meow: Meowing to the visual beat via video-to-music generation. In AAAI, 2024.
Siu-Lan Tan, Matthew P. Spackman, and Elizabeth M. Wakefield. Effects of diegetic and nondiegetic presentation of film music on viewers' interpretation of film narrative. Music Perception, 34(5):605-623, 2017.
Yi Tay et al. UL2: Unifying language learning paradigms. arXiv preprint arXiv:2205.05131, 2022.
Hugo Touvron et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint, 2023.
Thomas Unterthiner et al. Towards accurate generative models of video: A new metric and challenges. arXiv preprint, 2019.
Ashish Vaswani et al. Attention is all you need. In NeurIPS, 2017.
Apoorv Vyas et al. Audiobox: Unified audio generation with natural language prompts. Technical report, 2023.
Qixun Wang et al. InstantID: Zero-shot identity-preserving generation in seconds. arXiv preprint arXiv:2401.07519, 2024.
Wen Wang et al. Zero-shot video editing using off-the-shelf image diffusion models. arXiv preprint, 2023.