Multi-style Generative Network for Real-time Transfer

Hang Zhang    Kristin Dana

Department of Electrical and Computer Engineering, Rutgers University

zhang.hang@, kdana@

Abstract

Recent work in style transfer learns a feed-forward generative network to approximate the prior optimization-based approaches, resulting in real-time performance. However, these methods require training separate networks for different target styles, which greatly limits their scalability. We introduce a Multi-style Generative Network (MSG-Net) with a novel Inspiration Layer, which retains the functionality of optimization-based approaches and has the fast speed of feed-forward networks. The proposed Inspiration Layer explicitly matches the feature statistics with the target styles at runtime, which dramatically improves the versatility of existing generative networks, so that multiple styles can be realized within one network. The proposed MSG-Net matches image styles at multiple scales and puts the computational burden into the training. The learned generator is a compact feed-forward network that runs in real-time after training. Compared to previous work, the proposed network can achieve fast style transfer with at least comparable quality using a single network. The experimental results cover (but are not limited to) the simultaneous training of twenty different styles in a single network. The complete software system and pre-trained models will be publicly available upon publication¹.

1. Introduction

Style transfer can be approached as reconstructing or synthesizing texture based on the semantic content of the target image [24]. Many pioneering works have achieved success in classic texture synthesis, starting with methods that resample pixels [8, 9, 23, 36] or match multi-scale feature statistics [7, 16, 29]. These methods employ traditional image pyramids obtained by handcrafted multi-scale linear filter banks [1, 32] and perform texture synthesis by matching the feature statistics to the target style at multiple scales.

¹ /zhanghang1989/MSG-Net

Figure 1: The problem of multi-style transfer in real-time using a single network is solved in this paper. Examples of transferred images and the corresponding styles.

In recent years, the concepts of texture synthesis and style transfer have been revisited within the context of deep learning. Feature histograms at pyramid levels in traditional methods are replaced with the Gram Matrix representation from convolutional neural networks (CNNs). Gatys et al. [11, 12] first adopt a pre-trained CNN as a descriptive representation of image statistics and provide an explicit representation that separates image style and content information. This framework has achieved great success in both texture synthesis and style transfer. The method is optimization-based, because the new texture image is generated by applying gradient descent that manipulates a white noise image to match the Gram Matrix representation of the target image. Optimization-based approaches have no scalability problem but require expensive computation to generate images using gradient descent.

Recent work [21, 34] trains a feed-forward generative network to approximate the optimization process, transforming the image into a target style in real-time by moving the computational burden into the training process.

Figure 2: An overview of MSG-Net, the Multi-style Generative Network. The transformation network, as part of the generator, explicitly matches the feature statistics of the style targets captured by a descriptive network, using the proposed Inspiration Layer, denoted Ins (introduced in Section 3). The detailed architecture of the transformation network is shown in Table 1. A pre-trained loss network provides the supervision for MSG-Net learning by minimizing the content and style differences with the targets, as discussed in Section 4.2.

However, these approaches require training a separate network for each different style, which severely limits their scalability. Chen et al. [3] adopt a hybrid solution by separating mid-level convolutional filters individually for each style, like a hard switch, while sharing the down-sampling and up-sampling parts across different styles. Nevertheless, the size of the network is still proportional to the number of styles, which becomes problematic for hundreds of styles.

What limits the diversity of styles in existing generative networks? Existing work learns a generative network that takes an input image x_c and provides the transferred output G(x_c), in which the feature statistics of the style image are implicitly learned from the loss function without informing the network about the style target x_s [21, 25, 34]. For existing approaches, there is a fundamental difficulty in deriving a representation that simultaneously preserves the semantic content of the input image x_c and matches the style of the target image x_s (see the extended discussion in Section 4.3). In order to build a multi-style generative network G(x_c, x_s), where x_s can be chosen from a diverse set of styles, the generator network should explicitly match the feature statistics of the style target images at runtime.

As the first contribution of the paper, we introduce an Inspiration Layer which matches the feature statistics (Gram Matrix) of the target and preserves the input semantic content at runtime. The Inspiration Layer is end-to-end learnable with existing generative network architectures and puts the computational burden into the training process to achieve real-time style matching. The proposed Inspiration Layer enables multi-style generation from a single network, which dramatically improves the versatility of existing generative network architectures.

The second contribution of this paper is learning a novel feed-forward generative network for real-time multi-style matching, which we refer to as the Multi-style Generative Network (MSG-Net). The Inspiration Layer is a component of MSG-Net. The proposed network explicitly matches the feature statistics at multiple scales in order to retain the performance of optimization-based methods while achieving the speed of feed-forward networks. The network design benefits from a recent advance in CNN architecture, the residual block [14], which reduces computational complexity without losing style versatility by preserving a larger number of channels. We further design an up-sampling residual block that allows passing the identity all the way through the generative network, enabling the network to extend deeper and converge faster. The experimental results show that MSG-Net can achieve real-time style transfer with image fidelity comparable to previous work.

The paper is organized as follows. We briefly describe prior work on content and style representation using a CNN framework in Section 2. We introduce our proposed Inspiration Layer in Section 3 and our novel generative architecture, the Multi-style Generative Network, in Section 4. The comparison to other approaches is discussed in Section 4.3. Finally, the experimental results and comparisons are presented in Section 5.

2. Content and Style Representation

CNNs pre-trained on a very large dataset such as ImageNet can be regarded as descriptive representations of image statistics containing both semantic content and style information. Gatys et al. [12] provide explicit representations that independently model the image content and style from CNNs, which we briefly describe in this section for completeness.

(a) targets  (b) scale 1  (c) scale 2  (d) scale 3  (e) all scales

Figure 3: Visualizing the effects of multi-scale feature statistics for style transfer (top) and texture synthesis (bottom). (First column) Input targets; (center columns) inverted representations at each individual scale; (last column) inverted representation combining the multiple scales. We use a pre-trained 16-layer VGG as the descriptive network, consider the size of 256×256 as the original scale, and reduce the size by a factor of 2^{i-1} for the i-th scale. The feature maps after the ReLU at each scale are used.

The semantic content of the image can be represented as the activations of the descriptive network at the i-th scale, F_i(x) ∈ R^{C_i×H_i×W_i} for a given input image x, where C_i, H_i and W_i are the number of feature map channels, the feature map height, and the feature map width. The texture or style of the image can be represented as the distribution of the features using the Gram Matrix G(F_i(x)) ∈ R^{C_i×C_i}, given by

$$G(F_i(x)) = \sum_{h=1}^{H_i} \sum_{w=1}^{W_i} F_i(x)_{h,w} \, F_i(x)_{h,w}^{T}. \qquad (1)$$

The Gram Matrix is orderless and describes the feature distributions. For zero-centered data, the Gram Matrix is the same as the covariance matrix scaled by the number of elements C_i × H_i × W_i. It can be calculated efficiently by first reshaping the feature map into Φ(F_i(x)) ∈ R^{C_i×(H_i W_i)}, where Φ(·) is a reshaping operation. The Gram Matrix can then be written as G(F_i(x)) = Φ(F_i(x)) Φ(F_i(x))^T.
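As a concrete illustration of Equation 1, the reshaping-based computation takes only a few lines. The sketch below uses PyTorch-style Python for exposition (the paper's implementation is in Torch), and the function name is ours:

```python
import torch

def gram_matrix(feat):
    """Gram Matrix of Equation 1 via reshaping.

    feat: feature map F_i(x) of shape (C, H, W).
    Returns the C x C matrix G = Phi Phi^T, an orderless
    summary of the feature distribution at this scale.
    """
    C, H, W = feat.shape
    phi = feat.reshape(C, H * W)   # Phi(F_i(x)): C x (H*W)
    return phi @ phi.t()           # C x C
```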

3. Inspiration Layer

In this section, we introduce the Inspiration Layer, which explicitly matches multi-scale feature statistics based on the given styles. For a given content target x_c and a style target x_s, the content and style representations at the i-th scale using the descriptive network can be written as F_i(x_c) and G(F_i(x_s)), respectively. A direct solution Ŷ_i is desirable, which preserves the semantic content of the input image and matches the target style feature statistics:

$$\hat{Y}_i = \arg\min_{Y_i} \left\{ \| Y_i - F_i(x_c) \|_F^2 + \alpha \| G(Y_i) - G(F_i(x_s)) \|_F^2 \right\}, \qquad (2)$$

where α is a trade-off parameter balancing the contributions of the content and style targets.

Minimizing the above problem is solvable using an iterative approach, but it is infeasible to achieve in real-time or to make the model differentiable. However, we can still approximate the solution and put the computational burden into the training stage. We introduce an approximation which tunes the feature map based on the target style:

$$\hat{Y}_i = \Phi^{-1} \left[ \Phi(F_i(x_c))^{T} \, W \, G(F_i(x_s)) \right]^{T}, \qquad (3)$$

where W ∈ R^{C_i×C_i} is a learnable weight matrix and Φ(·) is a reshaping operation to match the dimensions, so that Φ(F_i(x_c)) ∈ R^{C_i×(H_i W_i)}. For intuition on the functionality of W, suppose W = G(F_i(x_s))^{-1}; then the first term in Equation 2 is minimized. Now let W = Φ(F_i(x_c))^{-T} L(F_i(x_s))^{-1}, where L(F_i(x_s)) is obtained by the Cholesky decomposition of G(F_i(x_s)) = L(F_i(x_s)) L(F_i(x_s))^T; then the second term of Equation 2 is minimized. We let W be learned directly from the loss function to dynamically balance the trade-off.

End-to-end Learning. The Inspiration Layer is differentiable with respect to both the layer input and the layer weights. Therefore, the Inspiration Layer can be learned by a standard Stochastic Gradient Descent (SGD) solver. We provide explicit expressions for the derived backpropagation equations in the supplementary material.
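For concreteness, a minimal sketch of the Inspiration Layer as a learnable module follows, assuming PyTorch-style tensors; the class and method names are illustrative and not taken from the released code:

```python
import torch
import torch.nn as nn

class Inspiration(nn.Module):
    """Sketch of the Inspiration Layer implementing Equation 3.

    Holds the target Gram Matrix G(F_i(x_s)) as a buffer and a
    learnable C x C weight W; the forward pass computes
    Y = Phi^{-1}[Phi(F_i(x_c))^T W G(F_i(x_s))]^T.
    """
    def __init__(self, C):
        super().__init__()
        self.W = nn.Parameter(torch.eye(C))      # learnable weight matrix
        self.register_buffer('G', torch.eye(C))  # target Gram, set at runtime

    def set_target(self, gram):
        self.G = gram                            # G(F_i(x_s)), shape C x C

    def forward(self, x):                        # x: B x C x H x W
        B, C, H, W = x.shape
        P = (self.W @ self.G).expand(B, C, C)    # W G(F_i(x_s)) per sample
        phi = x.reshape(B, C, H * W)             # Phi(F_i(x_c)): B x C x (H*W)
        y = torch.bmm(P.transpose(1, 2), phi)    # [Phi^T W G]^T = (W G)^T Phi
        return y.reshape(B, C, H, W)             # Phi^{-1}: back to B x C x H x W
```

Since both W and the matrix products above are plain differentiable operations, gradients flow to the layer input and to W, which is what makes the layer trainable end-to-end as stated.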


Figure 4: We extend the original down-sampling residual architecture (left) to an up-sampling version (right). We use a 1×1 fractionally-strided convolution as the shortcut and adopt reflectance padding.

4. Multi-style Generative Network

4.1. Network Architecture

Existing feed-forward style transfer work learns a generator network that takes only the content image as input and outputs the transferred image, i.e., the generator network can be expressed as G(x_c), which implicitly learns the feature statistics from the loss function. We introduce a Multi-style Generative Network which takes both the content and style targets as inputs, i.e., G(x_c, x_s). The proposed network explicitly matches the feature statistics of the style targets at multiple scales.

As part of the generator network, we adopt a 16-layer pre-trained VGG network [33] as a descriptive network F, which captures the feature statistics of the style image x_s at different scales and outputs the Gram Matrices {G(F_i(x_s))}, i = 1,...,K, where K is the total number of scales. A transformation network then takes the content image x_c and matches the feature statistics of the style image at multiple scales with Inspiration Layers. We adopt the VGG network pre-trained on ImageNet [31] as the descriptive network, because network features learned from a diverse set of images are likely to be generic and informative.

Multi-scale Processing. Figure 3 illustrates the impact of the multi-scale representation by comparing the reconstruction results from the feature statistics at individual scales with the result of combining multiple scales. We use a pre-trained 16-layer VGG as the descriptive network and use Gatys' approach to invert the representation [11]. This experiment suggests that feature statistics at individual scales are not informative enough to reconstruct textures or styles; multi-scale feature statistics provide a comprehensive representation of the textures and styles. Therefore, we introduce a multi-scale Inspiration architecture to match the feature statistics at four different scales, as shown in Figure 2.
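A sketch of how such multi-scale Gram Matrices can be collected from a pre-trained VGG-16 follows, assuming torchvision (0.13 or later) and the ReLU1_2/ReLU2_2/ReLU3_3/ReLU4_3 scales detailed in Section 5.1; the layer indices below are an assumption about torchvision's layer ordering, not part of the paper:

```python
import torch
from torchvision import models

# Indices of ReLU1_2, ReLU2_2, ReLU3_3 and ReLU4_3 in torchvision's
# VGG-16 feature stack (assumed layer ordering).
STYLE_LAYERS = (3, 8, 15, 22)

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

@torch.no_grad()
def style_grams(xs):
    """Gram Matrices {G(F_i(x_s))} of a style image at the four scales."""
    grams, x = [], xs                   # xs: B x 3 x H x W, normalized
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            B, C, H, W = x.shape
            phi = x.reshape(B, C, H * W)
            grams.append(torch.bmm(phi, phi.transpose(1, 2)))
    return grams
```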

Layer name      Output size   MSG-Net
Conv1           256×256       7×7, 64
Inspiration1    256×256       C=64
Res1            128×128       [1×1, 32; 3×3, 32; 1×1, 128] × k
Inspiration2    128×128       C=128
Res2            64×64         [1×1, 64; 3×3, 64; 1×1, 256] × k
Inspiration3    64×64         C=256
Res3            32×32         [1×1, 128; 3×3, 128; 1×1, 512] × k
Inspiration4    32×32         C=512
Up-Res1         64×64         [1×1, 64; 3×3, 64; 1×1, 256] × k
Up-Res2         128×128       [1×1, 32; 3×3, 32; 1×1, 128] × k
Up-Res3         256×256       [1×1, 16; 3×3, 16; 1×1, 64] × k
Conv2           256×256       7×7, 3

Table 1: The architecture of the transformation network with (18k + 6) layers, which is the core part of the Multi-style Generative Network (MSG-Net).

Up-sample Residual Block. Deep residual learning has achieved great success in visual recognition [14, 15]. The residual block architecture plays an important role by reducing the computational complexity without losing diversity, since it preserves a large number of feature map channels. We extend the original architecture with an up-sampling version, as shown in Figure 4 (right), which has a fractionally-strided convolution [27] as the shortcut and adopts reflectance padding to avoid artifacts in the generative process. This up-sampling residual architecture allows us to pass the identity all the way through the network (as shown in Table 1), so that the network converges faster and extends deeper.
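A minimal PyTorch-style sketch of such an up-sampling residual block follows; ConvTranspose2d stands in for the fractionally-strided convolution, and the paper's reflectance padding is omitted for brevity, so this is an approximation rather than the exact architecture:

```python
import torch.nn as nn

class UpBottleneck(nn.Module):
    """Sketch of the up-sampling residual block (Figure 4, right).

    Bottleneck residual branch with a stride-2 fractionally-strided 3x3
    convolution; the shortcut is a 1x1 fractionally-strided convolution,
    so the identity passes through at the doubled resolution.
    """
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.shortcut = nn.ConvTranspose2d(in_ch, out_ch, 1, stride=2,
                                           output_padding=1)
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1),
            nn.InstanceNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(mid_ch, mid_ch, 3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1),
            nn.InstanceNorm2d(out_ch),
        )

    def forward(self, x):
        return self.shortcut(x) + self.branch(x)  # identity + residual
```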

Other Details. We only use in-network down-sampling (convolution) and up-sampling (fractionally-strided convolution) in the transformation network, as in previous work [21, 30]. We use reflectance padding to avoid artifacts at the border. Instance normalization [35] and ReLU are used after the weight layers (convolution, fractionally-strided convolution and the Inspiration Layer), which improves the generated image quality and is robust to image contrast changes.

4.2. Network Learning

Style transfer is an open problem, since there is no gold-standard ground truth to follow. We follow previous work in minimizing a weighted combination of the style and content differences between the generator network outputs and the targets for a given pre-trained loss network F [21, 34]. Let the generator network be denoted by G(x_c, x_s), parameterized by weights W_G. Learning proceeds by sampling content images x_c ~ X_c and style images x_s ~ X_s and then adjusting the parameters W_G of the generator G(x_c, x_s) in order to minimize the loss:

$$\hat{W}_G = \arg\min_{W_G} E_{x_c, x_s} \big\{ \lambda_c \| F_c(G(x_c, x_s)) - F_c(x_c) \|_F^2 + \lambda_s \sum_{i=1}^{K} \| G(F_i(G(x_c, x_s))) - G(F_i(x_s)) \|_F^2 + \lambda_{TV} \, \ell_{TV}(G(x_c, x_s)) \big\}, \qquad (4)$$

where λ_c and λ_s are the balancing weights for the content and style losses. We consider the image content at scale c and the image style at scales i ∈ {1,...,K}. ℓ_TV(·) is the total variation regularization used in prior work to encourage smoothness of the generated images [21, 28, 38].
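A sketch of this objective follows, assuming per-sample feature maps of shape C×H×W, the gram_matrix helper from the Section 2 sketch, and a hypothetical loss_net callable returning the content-scale feature and the K style-scale features:

```python
import torch

def total_variation(y):
    """ell_TV: encourages spatial smoothness of the generated image."""
    return ((y[..., 1:, :] - y[..., :-1, :]).abs().mean()
            + (y[..., :, 1:] - y[..., :, :-1]).abs().mean())

def msg_loss(y, xc, xs, loss_net, lam_c=1.0, lam_s=5.0, lam_tv=1e-6):
    """Equation 4 for one (content, style) pair.

    loss_net(img) is assumed to return (content_feat, style_feats),
    where content_feat has shape C x H x W and style_feats is a list
    of K feature maps, one per style scale.
    """
    fc_y, fs_y = loss_net(y)
    fc_c, _ = loss_net(xc)
    _, fs_s = loss_net(xs)
    content = ((fc_y - fc_c) ** 2).sum()                  # squared Frobenius norm
    style = sum(((gram_matrix(a) - gram_matrix(b)) ** 2).sum()
                for a, b in zip(fs_y, fs_s))              # over the K scales
    return lam_c * content + lam_s * style + lam_tv * total_variation(y)
```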

4.3. Relation to Other Methods

Relation to Pyramid Matching. Early methods for texture synthesis were developed using multi-scale image pyramids [7, 16, 29, 36]. The discovery in these earlier methods was that realistic texture images could be synthesized by manipulating a white noise image so that its feature statistics matched the target at each pyramid level. Our approach is inspired by these classic methods, matching feature statistics at multiple image scales, but it leverages the advantages of deep learning networks while placing the computational costs into the training process.

Relation to Fusion Layers. Our proposed Inspiration Layer is a kind of fusion layer that takes two inputs (content and style representations). Current work on fusion layers with CNNs includes feature map concatenation and element-wise sum [10, 19, 37]. However, these approaches are not directly applicable, since there is no separation of style from content. For style transfer, the generated images should carry neither the semantic information of the style target nor the style of the content image. In addition, the input representation sizes must match in prior fusion methods; but for style transfer, the content representation has the dimensions C_i × H_i × W_i while the orderless style representation (Gram Matrix) has the dimensions C_i × C_i.

                              Storage   Training Time   Test Time
Optimization-based [12, 24]   O(1)      N/A             slow
Feed-forward [3, 21, 34]      O(N)      O(N)            real-time
MSG-Net (ours)                O(1)      O(N)            real-time

Table 2: Compared to existing methods, MSG-Net has the benefit of the real-time style transfer of feed-forward approaches as well as the scalability of classic approaches.

Relation to Generative Networks and Adversarial Training. The Generative Adversarial Network (GAN) [13], which jointly trains an adversarial generator and discriminator simultaneously, has catalyzed a surge of interest in the study of image generation [2, 18, 19, 30, 37]. Recent work on image-to-image GANs [19] adopts a conditional GAN to provide a general solution for image-to-image generation problems for which it was previously hard to define a loss function. However, the style transfer problem cannot be tackled using the conditional GAN framework, due to the missing ground-truth image pairs. Instead, we follow prior work [21, 34] in adopting a discriminator/loss network that minimizes the perceptual difference of the synthesized images with the content and style targets and provides the supervision for the generative network learning. The initial idea of employing the Gram Matrix to trigger style synthesis is inspired by recent work [2] that suggests using an encoder instead of a random vector in the GAN framework.

Concurrent with our work. Concurrent work [4, 17] explores arbitrary style transfer. A style-swap layer is proposed in [4], but it yields lower quality and slower speed compared to existing feed-forward approaches. An adaptive instance normalization is introduced in [17] to match the mean and variance of the feature maps with the style target. Our Inspiration Layer instead matches the second-order statistics of the Gram Matrices of the feature maps at multiple scales. We also explore applying our method to new styles (not seen during training) in Figure 8.

5. Experimental Results

In this section, we make a qualitative comparison of the proposed MSG-Net with existing approaches for the style transfer task. We consider the gold-standard optimization-based work of Gatys et al. [12] and the state-of-the-art feed-forward approach of Johnson et al. [21] with instance normalization [35]. Additionally, we show that MSG-Net can be applied to the texture synthesis task.


(a) input  (b) Gatys [12]  (c) Johnson [21]  (d) MSG-Net (ours)

Figure 5: Comparison to existing approaches. Our proposed MSG-Net dramatically improves the scalability of the generative network and achieves results comparable to existing work for each individual style.

5.1. Style Transfer

Baselines. We adopt a publicly available implementation [20] of Gatys et al. [12] as the gold-standard baseline among optimization-based approaches. Given the content image x_c, the style image x_s, and a pre-trained 16-layer VGG [33] as a descriptive network F, and considering the content reconstruction at the c-th scale and the style reconstruction at scales i ∈ {1,...,K}, an image ŷ is initialized with white noise and updated by iteratively minimizing the objective function:

$$\hat{y} = \arg\min_{y} \big\{ \lambda_c \| F_c(y) - F_c(x_c) \|_F^2 + \lambda_s \sum_{i=1}^{K} \| G(F_i(y)) - G(F_i(x_s)) \|_F^2 + \lambda_{TV} \, \ell_{TV}(y) \big\}. \qquad (5)$$

The optimization is performed using an L-BFGS solver for 500 iterations. The method is slow because the image y is passed forward and backward through the network at each iteration.
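For reference, this baseline's optimization loop can be sketched as follows, where style_objective is a hypothetical helper wrapping the objective of Equation 5 (not a function from [20]):

```python
import torch

# y is the image being optimized, initialized with white noise.
y = torch.randn(1, 3, 256, 256, requires_grad=True)
optimizer = torch.optim.LBFGS([y], max_iter=500)

def closure():
    optimizer.zero_grad()
    loss = style_objective(y)  # Equation 5: content + style + TV terms
    loss.backward()            # forward/backward through VGG each evaluation
    return loss

optimizer.step(closure)        # L-BFGS re-evaluates the closure internally
```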

We also compare our approach with an improved version of the recent feed-forward work [21] using instance normalization [35] as the state-of-the-art baseline, where we train each feed-forward network individually for a different style image using the same loss function as in Equation 4.

Method Details. A 16-layer VGG network [33] is used in the descriptive network as part of MSG-Net and in the loss network of Equation 4. For both networks, we consider the style representation at 4 different scales, using the layers ReLU1_2, ReLU2_2, ReLU3_3 and ReLU4_3. For the loss network, we consider the content representation at the layer ReLU2_2. The Microsoft COCO dataset [26] is used as the content image set X_c, which has around 80,000 natural images. We collect 20 style images, choosing those typically used in previous work. A 42-layer MSG-Net is used in this experiment (k = 2, as shown in Table 1). We follow prior work [21, 34] and adopt Adam [22] to train the network with a learning rate of 1×10^{-3}. For learning the network, we use the loss function described in Equation 4 with the balancing weights λ_c = 1, λ_s = 5, and λ_TV = 1×10^{-6} for the content, style, and total variation regularization. We resize the content images x_c ~ X_c to 256×256 and learn the network with a batch size of 4 for 4,000 × N_style iterations. We iteratively update the style image x_s every 20 iterations² with a size of 512×512³. After training, MSG-Net can accept an arbitrary input image size; during testing in this experiment we resize the input images to 512 along the long edge before feeding them into the network. Our implementation is based on Torch [6]; training the 20-style MSG-Net model takes roughly 8 hours on a single Titan X Pascal GPU, which is 10 times faster than Johnson's approach for the same number of styles, because the joint optimization across different styles lets them benefit from each other.
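The training setup above can be summarized in the following sketch; MSGNet, content_loader, style_loader, loss_net and msg_loss are hypothetical stand-ins for the model, data pipelines, and the Equation 4 loss:

```python
import itertools
import torch

model = MSGNet(k=2)                            # 42-layer generator (Table 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
styles = itertools.cycle(style_loader)         # 20 styles at 512 x 512
xs = next(styles)

for it, xc in enumerate(content_loader):       # MS-COCO at 256 x 256, batch 4
    if it % 20 == 0:                           # update style target every 20 iters
        xs = next(styles)
    model.set_style(xs)                        # sets {G(F_i(x_s))} in Inspiration layers
    y = model(xc)
    loss = msg_loss(y, xc, xs, loss_net)       # Equation 4, lam_c=1, lam_s=5
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```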

Qualitative Comparison. We keep the same hyperparameters, such as the balancing weights, for MSG-Net and the baselines. For the optimization-based approach of Gatys et al. [12], we stop the optimization after 500 iterations, which is typically more than enough. The feed-forward baseline model using Johnson's approach [21] is learned for 40,000 iterations for each style model, the same as in the original paper. We train the proposed 20-style MSG-Net for 80,000 iterations (4,000 × N_style).

² The number of 20 is not arbitrarily chosen; we did a grid search varying from 4 to 20 and chose the one with the best quality.

³ We use a large style image, which provides more texture details and improves the quality compared to the size of 256×256.


Figure 6: Diversity of images generated using a single MSG-Net. The first row shows the input content images, and the other rows are generated images with different style targets (first column).


Figure 7: Texture synthesis examples generated using a single MSG-Net, with the corresponding texture targets.

Figure 8: Testing a model with style targets that have NOT been covered during training.

A standard histogram matching is added as a post-processing step for all the approaches, which slightly improves the perceived color. Figure 5 shows the comparison of the three approaches using popular pictures of Lena and of Atlanta with two popular styles. We can see that the optimization-based approach is more colorful and has sharper textures than the feed-forward approaches (Johnson's and MSG-Net), for example on the buildings and the roads in the pictures of Atlanta (bottom row). Feed-forward approaches have the advantage of preserving semantic consistency, such as the human face and hair in the picture of Lena (top row), because the models are trained on the MS-COCO dataset, which contains a lot of context information from real-world images. More examples of transferred images using MSG-Net are shown in Figure 6. In general, our proposed MSG-Net dramatically improves the scalability of the network for style transfer and has at least comparable quality compared to existing work.

5.2. Texture Synthesis

Texture synthesis can be regarded as a special case of style transfer, in which the content image is not involved and the goal is to reconstruct the textures. In this section, we explore how our approach can be applied to the texture synthesis task. 10 textures are selected from the Describable Texture Dataset (DTD) [5] as the targets. We adopt the same network and training strategy as in the style transfer task. We follow previous work [25, 34] and feed MSG-Net with random noise to trigger the texture synthesis. Li et al. [25] suggest that using Brown noise, which contains a spectrum of frequencies, produces textures with better quality than using white noise. We further discover that triggering the network using a structured noise containing both different frequencies and various intensities results in textures of even better quality. Examples of texture synthesis are shown in Figure 7 and the right column of Figure 6.
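Since the paper does not give an exact recipe for its structured noise, the sketch below only illustrates the Brown-noise trigger attributed to Li et al. [25], shaping white noise to a 1/f² power spectrum; the implementation details are our assumptions:

```python
import torch

def brown_noise(C=3, H=256, W=256):
    """White noise re-shaped in the frequency domain to 1/f^2 power.

    This follows the common construction of Brown noise; it is not
    the paper's exact trigger, which is left unspecified.
    """
    white = torch.randn(C, H, W)
    fy = torch.fft.fftfreq(H).reshape(H, 1)
    fx = torch.fft.fftfreq(W).reshape(1, W)
    f = (fy ** 2 + fx ** 2).sqrt().clamp(min=1.0 / max(H, W))
    spec = torch.fft.fft2(white) / f             # 1/f amplitude => 1/f^2 power
    noise = torch.fft.ifft2(spec).real
    return (noise - noise.mean()) / noise.std()  # normalize intensities
```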

6. Conclusion and Discussion

The problem of multi-style transfer in real-time using a single network has been addressed in this paper. We tackle the technical difficulties of existing approaches by introducing a novel Inspiration Layer, which explicitly matches the target styles at runtime. We have demonstrated that the Inspiration Layer embedded in our proposed Multi-style Generative Network enables the transfer of 20 styles without losing quality. The proposed MSG-Net puts the computational burden of matching feature statistics into the training process, which enables real-time transfer; it runs at 17.8 frames/sec for an input image of size 256×256 on a single Titan X Pascal GPU.

However, dealing with unknown styles is still an unsolved problem for feed-forward approaches. The strategy of putting the burden into training limits the performance on unknown style images, as shown in Figure 8. This could potentially be solved with a larger variety of training styles and a better style representation model, so that the interpolation between different styles can be learned by the generator network.

Acknowledgment

This work was supported by National Science Foundation award IIS-1421134. A GPU used for this research was donated by the NVIDIA Corporation.


References

[1] P. Burt and E. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31(4):532–540, 1983.

[2] T. Che, Y. Li, A. P. Jacob, Y. Be
