
news 2025/11/14 19:35:04

Implementing the DDPM Diffusion Model Step by Step in PyTorch

The previous post derived the DDPM diffusion model step by step. This post implements it with PyTorch in a Google Colab notebook.

DDPM code implementation

Note that there are several perspectives on diffusion models. Here we adopt the discrete-time latent variable model perspective, but be sure to check out the other perspectives as well.

Neural network

The neural network takes a noisy image at a particular time step and returns the predicted noise. The predicted noise is a tensor with the same size/resolution as the input image, so technically the network takes in and outputs tensors of the same shape. What type of neural network can be used for this?

What is typically used here is very similar to an autoencoder, which you may remember from typical "intro to deep learning" tutorials. An autoencoder has a so-called "bottleneck" layer between the encoder and the decoder. The encoder first encodes the image into a smaller hidden representation called the "bottleneck", and the decoder then decodes that hidden representation back into an actual image. This forces the network to keep only the most important information in the bottleneck layer.

In terms of architecture, the DDPM authors used a U-Net, introduced by Ronneberger et al. (2015), which at the time achieved state-of-the-art results in medical image segmentation. Like any autoencoder, this network has a bottleneck in the middle that ensures the network learns only the most important information. Importantly, it introduces residual connections between the encoder and the decoder, which greatly improves gradient flow (inspired by ResNet, He et al., 2015).

A U-Net first downsamples the input (makes it smaller in spatial resolution) and then upsamples it again. Next, we implement this network step by step.

    !pip install -q -U einops datasets matplotlib tqdm

Import the required libraries:

    import math
    from inspect import isfunction
    from functools import partial

    %matplotlib inline
    import matplotlib.pyplot as plt
    from tqdm.auto import tqdm
    from einops import rearrange, reduce
    from einops.layers.torch import Rearrange

    import torch
    from torch import nn, einsum
    import torch.nn.functional as F

Defining helper functions

First we define some helper functions and classes that are used when implementing the neural network. Importantly, we define a Residual module, which simply adds the input to the output of a given function (in other words, it adds a residual connection to that function).

    def exists(x):
        return x is not None

    def default(val, d):
        if exists(val):
            return val
        return d() if isfunction(d) else d

    def num_to_groups(num, divisor):
        groups = num // divisor
        remainder = num % divisor
        arr = [divisor] * groups
        if remainder > 0:
            arr.append(remainder)
        return arr

    class Residual(nn.Module):
        def __init__(self, fn):
            super().__init__()
            self.fn = fn

        def forward(self, x, *args, **kwargs):
            return self.fn(x, *args, **kwargs) + x

We also define aliases for the upsampling and downsampling operations.

    def Upsample(dim, dim_out=None):
        return nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(dim, default(dim_out, dim), 3, padding=1),
        )

    def Downsample(dim, dim_out=None):
        # No more strided convolutions or pooling
        return nn.Sequential(
            Rearrange("b c (h p1) (w p2) -> b (c p1 p2) h w", p1=2, p2=2),
            nn.Conv2d(dim * 4, default(dim_out, dim), 1),
        )

Position embeddings

Because the parameters of the neural network are shared across time (noise levels), the authors use sinusoidal position embeddings, inspired by the Transformer (Vaswani et al., 2017), to encode $t$. This lets the neural network "know" at which particular time step (noise level) it is operating for every image in a batch.

The SinusoidalPositionEmbeddings module takes a tensor of shape (batch_size, 1) as input (the noise levels of several noisy images in a batch) and turns it into a tensor of shape (batch_size, dim), where dim is the dimensionality of the position embeddings. This is then added to each residual block, as we will see later.

    class SinusoidalPositionEmbeddings(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.dim = dim

        def forward(self, time):
            device = time.device
            half_dim = self.dim // 2
            embeddings = math.log(10000) / (half_dim - 1)
            embeddings = torch.exp(torch.arange(half_dim, device=device) * -embeddings)
            embeddings = time[:, None] * embeddings[None, :]
            embeddings = torch.cat((embeddings.sin(), embeddings.cos()), dim=-1)
            return embeddings

In short, $t$ is encoded as an embedding and fed into the network together with the original input, so that the network "knows" which step the current input belongs to.

ResNet block

Next we define the core building block of the U-Net. The DDPM authors used a Wide ResNet block (Zagoruyko et al., 2016), but Phil Wang replaced the standard convolutional layer with a "weight standardized" version, which works better in combination with group normalization (see Kolesnikov et al., 2019 for details).

    class WeightStandardizedConv2d(nn.Conv2d):
        """
        https://arxiv.org/abs/1903.10520
        weight standardization purportedly works synergistically with group normalization
        """

        def forward(self, x):
            eps = 1e-5 if x.dtype == torch.float32 else 1e-3

            weight = self.weight
            mean = reduce(weight, "o ... -> o 1 1 1", "mean")
            var = reduce(weight, "o ... -> o 1 1 1", partial(torch.var, unbiased=False))
            normalized_weight = (weight - mean) * (var + eps).rsqrt()

            return F.conv2d(
                x,
                normalized_weight,
                self.bias,
                self.stride,  # nn.Conv2d stores this attribute as `stride`
                self.padding,
                self.dilation,
                self.groups,
            )

    class Block(nn.Module):
        def __init__(self, dim, dim_out, groups=8):
            super().__init__()
            self.proj = WeightStandardizedConv2d(dim, dim_out, 3, padding=1)
            self.norm = nn.GroupNorm(groups, dim_out)
            self.act = nn.SiLU()

        def forward(self, x, scale_shift=None):
            x = self.proj(x)
            x = self.norm(x)

            if exists(scale_shift):
                scale, shift = scale_shift
                x = x * (scale + 1) + shift

            x = self.act(x)
            return x

    class ResnetBlock(nn.Module):
        """https://arxiv.org/abs/1512.03385"""

        def __init__(self, dim, dim_out, *, time_emb_dim=None, groups=8):
            super().__init__()
            self.mlp = (
                nn.Sequential(nn.SiLU(), nn.Linear(time_emb_dim, dim_out * 2))
                if exists(time_emb_dim)
                else None
            )

            self.block1 = Block(dim, dim_out, groups=groups)
            self.block2 = Block(dim_out, dim_out, groups=groups)
            self.res_conv = nn.Conv2d(dim, dim_out, 1) if dim != dim_out else nn.Identity()

        def forward(self, x, time_emb=None):
            scale_shift = None
            if exists(self.mlp) and exists(time_emb):
                time_emb = self.mlp(time_emb)
                time_emb = rearrange(time_emb, "b c -> b c 1 1")
                scale_shift = time_emb.chunk(2, dim=1)

            h = self.block1(x, scale_shift=scale_shift)
            h = self.block2(h)
            return h + self.res_conv(x)

Attention module

Now we define the attention module, which the DDPM authors added between the convolutional blocks. Attention is the building block of the well-known Transformer architecture (Vaswani et al., 2017), which has been very successful across many areas of AI, from natural language processing and vision to protein folding. Phil Wang uses two attention variants: regular multi-head self-attention, as used in the Transformer, and a linear attention variant (Shen et al., 2018), whose time and memory requirements scale linearly with sequence length rather than quadratically as in regular attention. For a detailed explanation of the attention mechanism, see Jay Alammar's excellent blog post.

    class Attention(nn.Module):
        def __init__(self, dim, heads=4, dim_head=32):
            super().__init__()
            self.scale = dim_head ** -0.5
            self.heads = heads
            hidden_dim = dim_head * heads
            self.to_qkv = nn.Conv2d(dim, hidden_dim * 3, 1, bias=False)
            self.to_out = nn.Conv2d(hidden_dim, dim, 1)

        def forward(self, x):
            b, c, h, w = x.shape
            qkv = self.to_qkv(x).chunk(3, dim=1)
            q, k, v = map(
                lambda t: rearrange(t, "b (h c) x y -> b h c (x y)", h=self.heads), qkv
            )
            q = q * self.scale

            sim = einsum("b h d i, b h d j -> b h i j", q, k)
            sim = sim - sim.amax(dim=-1, keepdim=True).detach()
            attn = sim.softmax(dim=-1)

            out = einsum("b h i j, b h d j -> b h i d", attn, v)
            out = rearrange(out, "b h (x y) d -> b (h d) x y", x=h, y=w)
            return self.to_out(out)

    class LinearAttention(nn.Module):
        def __init__(self, dim, heads=4, dim_head=32):
            super().__init__()
            self.scale = dim_head ** -0.5
            self.heads = heads
            hidden_dim = dim_head * heads
            self.to_qkv = nn.Conv2d(dim, hidden_dim * 3, 1, bias=False)
            self.to_out = nn.Sequential(nn.Conv2d(hidden_dim, dim, 1), nn.GroupNorm(1, dim))

        def forward(self, x):
            b, c, h, w = x.shape
            qkv = self.to_qkv(x).chunk(3, dim=1)
            q, k, v = map(
                lambda t: rearrange(t, "b (h c) x y -> b h c (x y)", h=self.heads), qkv
            )

            q = q.softmax(dim=-2)
            k = k.softmax(dim=-1)

            q = q * self.scale
            context = torch.einsum("b h d n, b h e n -> b h d e", k, v)

            out = torch.einsum("b h d e, b h d n -> b h e n", context, q)
            out = rearrange(out, "b h c (x y) -> b (h c) x y", h=self.heads, x=h, y=w)
            return self.to_out(out)

Group normalization

The DDPM authors interleave the convolutional/attention layers of the U-Net with group normalization (Wu et al., 2018). Below we define a PreNorm class, which applies group normalization before the attention layer, as we will see later. Note that there has been a debate about whether normalization should be applied before or after attention in Transformers.

    class PreNorm(nn.Module):
        def __init__(self, dim, fn):
            super().__init__()
            self.fn = fn
            self.norm = nn.GroupNorm(1, dim)

        def forward(self, x):
            x = self.norm(x)
            return self.fn(x)

Conditional U-Net

Now that all building blocks are defined (position embeddings, ResNet blocks, attention and group normalization), it is time to define the entire neural network. Recall that the job of the network $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ is to take a batch of noisy images and their respective noise levels, and output the noise that was added to the input. More formally, the network takes a batch of noisy images of shape (batch_size, num_channels, height, width) and a batch of noise levels of shape (batch_size, 1) as input, and returns a tensor of shape (batch_size, num_channels, height, width).

The network is built as follows:

first, a convolutional layer is applied to the batch of noisy images, and position embeddings are computed for the noise levels;
next, a sequence of downsampling stages is applied. Each downsampling stage consists of 2 ResNet blocks + group norm + attention + residual connection + a downsample operation;
in the middle of the network, ResNet blocks are applied again, interleaved with attention;
next, a sequence of upsampling stages is applied. Each upsampling stage consists of 2 ResNet blocks + group norm + attention + residual connection + an upsample operation;
finally, a ResNet block followed by a convolutional layer is applied.

In the end, neural networks stack up layers like Lego blocks, but it is important to understand how they work.

    class Unet(nn.Module):
        def __init__(
            self,
            dim,
            init_dim=None,
            out_dim=None,
            dim_mults=(1, 2, 4, 8),
            channels=3,
            self_condition=False,
            resnet_block_groups=4,
        ):
            super().__init__()

            # determine dimensions
            self.channels = channels
            self.self_condition = self_condition
            input_channels = channels * (2 if self_condition else 1)

            init_dim = default(init_dim, dim)
            self.init_conv = nn.Conv2d(input_channels, init_dim, 1, padding=0)  # changed to 1 and 0 from 7,3

            dims = [init_dim, *map(lambda m: dim * m, dim_mults)]
            in_out = list(zip(dims[:-1], dims[1:]))

            block_klass = partial(ResnetBlock, groups=resnet_block_groups)

            # time embeddings
            time_dim = dim * 4

            self.time_mlp = nn.Sequential(
                SinusoidalPositionEmbeddings(dim),
                nn.Linear(dim, time_dim),
                nn.GELU(),
                nn.Linear(time_dim, time_dim),
            )

            # layers
            self.downs = nn.ModuleList([])
            self.ups = nn.ModuleList([])
            num_resolutions = len(in_out)

            for ind, (dim_in, dim_out) in enumerate(in_out):
                is_last = ind >= (num_resolutions - 1)

                self.downs.append(
                    nn.ModuleList(
                        [
                            block_klass(dim_in, dim_in, time_emb_dim=time_dim),
                            block_klass(dim_in, dim_in, time_emb_dim=time_dim),
                            Residual(PreNorm(dim_in, LinearAttention(dim_in))),
                            Downsample(dim_in, dim_out)
                            if not is_last
                            else nn.Conv2d(dim_in, dim_out, 3, padding=1),
                        ]
                    )
                )

            mid_dim = dims[-1]
            self.mid_block1 = block_klass(mid_dim, mid_dim, time_emb_dim=time_dim)
            self.mid_attn = Residual(PreNorm(mid_dim, Attention(mid_dim)))
            self.mid_block2 = block_klass(mid_dim, mid_dim, time_emb_dim=time_dim)

            # use dim_in here (not dim) so the base width `dim` is not overwritten
            for ind, (dim_in, dim_out) in enumerate(reversed(in_out)):
                is_last = ind == (len(in_out) - 1)

                self.ups.append(
                    nn.ModuleList(
                        [
                            block_klass(dim_out + dim_in, dim_out, time_emb_dim=time_dim),
                            block_klass(dim_out + dim_in, dim_out, time_emb_dim=time_dim),
                            Residual(PreNorm(dim_out, LinearAttention(dim_out))),
                            Upsample(dim_out, dim_in)
                            if not is_last
                            else nn.Conv2d(dim_out, dim_in, 3, padding=1),
                        ]
                    )
                )

            self.out_dim = default(out_dim, channels)

            self.final_res_block = block_klass(dim * 2, dim, time_emb_dim=time_dim)
            self.final_conv = nn.Conv2d(dim, self.out_dim, 1)

        def forward(self, x, time, x_self_cond=None):
            if self.self_condition:
                x_self_cond = default(x_self_cond, lambda: torch.zeros_like(x))
                x = torch.cat((x_self_cond, x), dim=1)

            x = self.init_conv(x)
            r = x.clone()

            t = self.time_mlp(time)

            h = []

            for block1, block2, attn, downsample in self.downs:
                x = block1(x, t)
                h.append(x)

                x = block2(x, t)
                x = attn(x)
                h.append(x)

                x = downsample(x)

            x = self.mid_block1(x, t)
            x = self.mid_attn(x)
            x = self.mid_block2(x, t)

            for block1, block2, attn, upsample in self.ups:
                x = torch.cat((x, h.pop()), dim=1)
                x = block1(x, t)

                x = torch.cat((x, h.pop()), dim=1)
                x = block2(x, t)
                x = attn(x)

                x = upsample(x)

            x = torch.cat((x, r), dim=1)

            x = self.final_res_block(x, t)
            return self.final_conv(x)

Defining the forward diffusion process

The forward diffusion process gradually adds noise to an image from the real distribution over $T$ time steps, according to a variance schedule. The original DDPM authors used a linear schedule:

We set the forward process variances to constants increasing linearly from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$.

However, Nichol et al. (2021) showed that better results can be achieved with a cosine schedule. Below we define several schedules for $T$ time steps; we will pick one later.

    def cosine_beta_schedule(timesteps, s=0.008):
        """
        cosine schedule as proposed in https://arxiv.org/abs/2102.09672
        """
        steps = timesteps + 1
        x = torch.linspace(0, timesteps, steps)
        alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
        alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
        betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
        return torch.clip(betas, 0.0001, 0.9999)

    def linear_beta_schedule(timesteps):
        beta_start = 0.0001
        beta_end = 0.02
        return torch.linspace(beta_start, beta_end, timesteps)

    def quadratic_beta_schedule(timesteps):
        beta_start = 0.0001
        beta_end = 0.02
        return torch.linspace(beta_start**0.5, beta_end**0.5, timesteps) ** 2

    def sigmoid_beta_schedule(timesteps):
        beta_start = 0.0001
        beta_end = 0.02
        betas = torch.linspace(-6, 6, timesteps)
        return torch.sigmoid(betas) * (beta_end - beta_start) + beta_start

To start with, we use the linear schedule with $T = 300$ time steps and define the variables we need from $\beta_t$, such as the cumulative product $\bar{\alpha}_t$. Each of the variables below is just a 1-dimensional tensor storing values from $t$ to $T$. Note that we also define an extract function, which lets us pick out the entry for the appropriate $t$ for each example in a batch.

    timesteps = 300

    # define beta schedule
    betas = linear_beta_schedule(timesteps=timesteps)

    # define alphas
    alphas = 1. - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    alphas_cumprod_prev = F.pad(alphas_cumprod[:-1], (1, 0), value=1.0)
    sqrt_recip_alphas = torch.sqrt(1.0 / alphas)

    # calculations for diffusion q(x_t | x_{t-1}) and others
    sqrt_alphas_cumprod = torch.sqrt(alphas_cumprod)
    sqrt_one_minus_alphas_cumprod = torch.sqrt(1. - alphas_cumprod)

    # calculations for posterior q(x_{t-1} | x_t, x_0)
    posterior_variance = betas * (1. - alphas_cumprod_prev) / (1. - alphas_cumprod)

    def extract(a, t, x_shape):
        batch_size = t.shape[0]
        out = a.gather(-1, t.cpu())
        return out.reshape(batch_size, *((1,) * (len(x_shape) - 1))).to(t.device)

We use an image of cats to illustrate how noise is added at each time step of the diffusion process:

    from PIL import Image
    import requests

    url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
    image = Image.open(requests.get(url, stream=True).raw)  # PIL image of shape HWC
    image

Noise is added to PyTorch tensors rather than to Pillow images. We therefore first define image transformations that convert a PIL image into a PyTorch tensor (to which noise can be added), and vice versa. These transformations are fairly simple: we first divide by 255 so the values lie in [0, 1], and then rescale them to [-1, 1]. From the DDPM paper:

We assume that image data consists of integers in {0, 1, ..., 255} scaled linearly to [−1, 1]. This ensures that the neural network reverse process operates on consistently scaled inputs starting from the standard normal prior $p(x_T)$.

    from torchvision.transforms import Compose, ToTensor, Lambda, ToPILImage, CenterCrop, Resize

    image_size = 128
    transform = Compose([
        Resize(image_size),
        CenterCrop(image_size),
        ToTensor(),  # turn into torch Tensor of shape CHW, divide by 255
        Lambda(lambda t: (t * 2) - 1),
    ])

    x_start = transform(image).unsqueeze(0)
    x_start.shape

Output:

    torch.Size([1, 3, 128, 128])

We also define the reverse transform, which takes a PyTorch tensor with values in [-1, 1] and turns it back into a PIL image:

    import numpy as np

    reverse_transform = Compose([
        Lambda(lambda t: (t + 1) / 2),
        Lambda(lambda t: t.permute(1, 2, 0)),  # CHW to HWC
        Lambda(lambda t: t * 255.),
        Lambda(lambda t: t.numpy().astype(np.uint8)),
        ToPILImage(),
    ])

    reverse_transform(x_start.squeeze())

We can now define the forward diffusion process as in the paper:

    # forward diffusion (using the nice property)
    def q_sample(x_start, t, noise=None):
        if noise is None:
            noise = torch.randn_like(x_start)

        sqrt_alphas_cumprod_t = extract(sqrt_alphas_cumprod, t, x_start.shape)
        sqrt_one_minus_alphas_cumprod_t = extract(
            sqrt_one_minus_alphas_cumprod, t, x_start.shape
        )

        return sqrt_alphas_cumprod_t * x_start + sqrt_one_minus_alphas_cumprod_t * noise

Let's test it at a particular time step:

    def get_noisy_image(x_start, t):
        # add noise
        x_noisy = q_sample(x_start, t=t)

        # turn back into PIL image
        noisy_image = reverse_transform(x_noisy.squeeze())

        return noisy_image

    # take time step
    t = torch.tensor([40])

    get_noisy_image(x_start, t)

Visualize the result at several time steps:

    import matplotlib.pyplot as plt

    # use seed for reproducibility
    torch.manual_seed(0)

    # source: https://pytorch.org/vision/stable/auto_examples/plot_transforms.html#sphx-glr-auto-examples-plot-transforms-py
    def plot(imgs, with_orig=False, row_title=None, **imshow_kwargs):
        if not isinstance(imgs[0], list):
            # Make a 2d grid even if there's just 1 row
            imgs = [imgs]

        num_rows = len(imgs)
        num_cols = len(imgs[0]) + with_orig
        fig, axs = plt.subplots(figsize=(200, 200), nrows=num_rows, ncols=num_cols, squeeze=False)
        for row_idx, row in enumerate(imgs):
            row = [image] + row if with_orig else row
            for col_idx, img in enumerate(row):
                ax = axs[row_idx, col_idx]
                ax.imshow(np.asarray(img), **imshow_kwargs)
                ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])

        if with_orig:
            axs[0, 0].set(title='Original image')
            axs[0, 0].title.set_size(8)
        if row_title is not None:
            for row_idx in range(num_rows):
                axs[row_idx, 0].set(ylabel=row_title[row_idx])

        plt.tight_layout()

    plot([get_noisy_image(x_start, torch.tensor([t])) for t in [0, 50, 100, 150, 199]])

Given the model, we define the loss function:

    def p_losses(denoise_model, x_start, t, noise=None, loss_type="l1"):
        if noise is None:
            noise = torch.randn_like(x_start)

        x_noisy = q_sample(x_start=x_start, t=t, noise=noise)
        predicted_noise = denoise_model(x_noisy, t)

        if loss_type == 'l1':
            loss = F.l1_loss(noise, predicted_noise)
        elif loss_type == 'l2':
            loss = F.mse_loss(noise, predicted_noise)
        elif loss_type == "huber":
            loss = F.smooth_l1_loss(noise, predicted_noise)
        else:
            raise NotImplementedError()

        return loss

Here denoise_model is the U-Net defined above. During training we use the Huber loss between the true noise and the predicted noise.

Defining a PyTorch Dataset and DataLoader

Here we define a regular PyTorch Dataset. The dataset simply consists of images from a real dataset, such as Fashion-MNIST, CIFAR-10 or ImageNet, scaled linearly to $[-1, 1]$. Each image is resized to the same size, and images are also randomly flipped horizontally. From the paper:

We used random horizontal flips during training for CIFAR10; we tried training both with and without flips, and found flips to improve sample quality slightly.

Here we use the Datasets library to easily load the Fashion MNIST dataset from the hub. This dataset consists of images that already have the same resolution, namely 28x28.

    from datasets import load_dataset

    # load dataset from the hub
    dataset = load_dataset("fashion_mnist")
    image_size = 28
    channels = 1
    batch_size = 128

Next we define a function that is applied on the fly to the entire dataset, using the with_transform functionality. The function applies some basic image preprocessing: random horizontal flips, rescaling, and finally mapping the values to the [-1, 1] range.

    from torchvision import transforms
    from torch.utils.data import DataLoader

    # define image transformations (e.g. using torchvision)
    transform = Compose([
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Lambda(lambda t: (t * 2) - 1)
    ])

    # define function (named apply_transforms so it does not shadow torchvision.transforms)
    def apply_transforms(examples):
        examples["pixel_values"] = [transform(image.convert("L")) for image in examples["image"]]
        del examples["image"]

        return examples

    transformed_dataset = dataset.with_transform(apply_transforms).remove_columns("label")

    # create dataloader
    dataloader = DataLoader(transformed_dataset["train"], batch_size=batch_size, shuffle=True)

    batch = next(iter(dataloader))
    print(batch.keys())
    # dict_keys(['pixel_values'])

Sampling

Since we sample from the model during training to track progress, we define the sampling code below. The sampling procedure can be summarized as follows: generating a new image from a diffusion model means reversing the diffusion process. We start at step $T$ by sampling pure noise from a Gaussian distribution, and then use the neural network to gradually denoise it (using the conditional probability it has learned) until we end up at time step $t = 0$. A slightly less noisy image $x_{t-1}$ is obtained by plugging our noise predictor into the reparametrized mean. Note that the variance is known ahead of time. Ideally, we end up with an image that looks like it came from the real data distribution. The code below implements this.

    @torch.no_grad()
    def p_sample(model, x, t, t_index):
        betas_t = extract(betas, t, x.shape)
        sqrt_one_minus_alphas_cumprod_t = extract(
            sqrt_one_minus_alphas_cumprod, t, x.shape
        )
        sqrt_recip_alphas_t = extract(sqrt_recip_alphas, t, x.shape)

        # Equation 11 in the paper
        # Use our model (noise predictor) to predict the mean
        model_mean = sqrt_recip_alphas_t * (
            x - betas_t * model(x, t) / sqrt_one_minus_alphas_cumprod_t
        )

        if t_index == 0:
            return model_mean
        else:
            posterior_variance_t = extract(posterior_variance, t, x.shape)
            noise = torch.randn_like(x)
            # Algorithm 2 line 4:
            return model_mean + torch.sqrt(posterior_variance_t) * noise

    # Algorithm 2 (including returning all images)
    @torch.no_grad()
    def p_sample_loop(model, shape):
        device = next(model.parameters()).device

        b = shape[0]
        # start from pure noise (for each example in the batch)
        img = torch.randn(shape, device=device)
        imgs = []

        for i in tqdm(reversed(range(0, timesteps)), desc='sampling loop time step', total=timesteps):
            img = p_sample(model, img, torch.full((b,), i, device=device, dtype=torch.long), i)
            imgs.append(img.cpu().numpy())
        return imgs

    @torch.no_grad()
    def sample(model, image_size, batch_size=16, channels=3):
        return p_sample_loop(model, shape=(batch_size, channels, image_size, image_size))

Training the model

Next we train the model in regular PyTorch fashion. We also define some logic to periodically save generated images, using the sample method defined above.

    from pathlib import Path

    def num_to_groups(num, divisor):
        groups = num // divisor
        remainder = num % divisor
        arr = [divisor] * groups
        if remainder > 0:
            arr.append(remainder)
        return arr

    results_folder = Path("./results")
    results_folder.mkdir(exist_ok=True)
    save_and_sample_every = 1000

Below we define the model and move it to the GPU. We also define a standard optimizer (Adam).

    from torch.optim import Adam

    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = Unet(
        dim=image_size,
        channels=channels,
        dim_mults=(1, 2, 4,)
    )
    model.to(device)

    optimizer = Adam(model.parameters(), lr=1e-3)

Start training:

    from torchvision.utils import save_image

    epochs = 6

    for epoch in range(epochs):
        for step, batch in enumerate(dataloader):
            optimizer.zero_grad()

            batch_size = batch["pixel_values"].shape[0]
            batch = batch["pixel_values"].to(device)

            # Algorithm 1 line 3: sample t uniformly for every example in the batch
            t = torch.randint(0, timesteps, (batch_size,), device=device).long()

            loss = p_losses(model, batch, t, loss_type="huber")

            if step % 100 == 0:
                print("Loss:", loss.item())

            loss.backward()
            optimizer.step()

            # save generated images
            if step != 0 and step % save_and_sample_every == 0:
                milestone = step // save_and_sample_every
                batches = num_to_groups(4, batch_size)
                # sample() returns one numpy array per time step; keep the final one
                all_images_list = [
                    torch.from_numpy(sample(model, image_size, batch_size=n, channels=channels)[-1])
                    for n in batches
                ]
                all_images = torch.cat(all_images_list, dim=0)
                all_images = (all_images + 1) * 0.5
                save_image(all_images, str(results_folder / f'sample-{milestone}.png'), nrow=6)

Training output:

    Loss: 0.5570111274719238
    Loss: 0.06583500653505325
    Loss: 0.06006840616464615
    Loss: 0.051015421748161316
    Loss: 0.0394190177321434
    Loss: 0.04075610265135765
    Loss: 0.039987701922655106
    Loss: 0.03415030241012573
    Loss: 0.030019590631127357
    Loss: 0.036297883838415146
    Loss: 0.037256866693496704
    Loss: 0.03864285722374916
    Loss: 0.03298967331647873
    Loss: 0.03331328555941582
    Loss: 0.027535393834114075
    Loss: 0.03803558647632599
    Loss: 0.03721949830651283
    Loss: 0.03478413075208664
    Loss: 0.03918925300240517
    Loss: 0.03608154132962227
    Loss: 0.027622627094388008
    Loss: 0.02948344498872757
    Loss: 0.029868196696043015
    Loss: 0.03154699504375458
    Loss: 0.029723389074206352
    Loss: 0.039195798337459564
    Loss: 0.032130151987075806
    Loss: 0.031276602298021317
    Loss: 0.03440115600824356
    Loss: 0.030476151034235954

Sampling (inference)

To sample from the model, we can use the sample function defined above:

    # sample 64 images
    samples = sample(model, image_size=image_size, batch_size=64, channels=channels)

    # show a random one
    random_index = 5
    plt.imshow(samples[-1][random_index].reshape(image_size, image_size, channels), cmap="gray")

It looks like the model is able to generate a nice T-shirt! Keep in mind that the dataset used for training has a very low resolution (28x28). We can also create a gif of the denoising process:

    import matplotlib.animation as animation

    random_index = 53

    fig = plt.figure()
    ims = []
    for i in range(timesteps):
        im = plt.imshow(samples[i][random_index].reshape(image_size, image_size, channels), cmap="gray", animated=True)
        ims.append([im])

    animate = animation.ArtistAnimation(fig, ims, interval=50, blit=True, repeat_delay=1000)
    animate.save('diffusion.gif')
    plt.show()

Further reading

Note that the DDPM paper showed that diffusion models are a promising direction for unconditional image generation. They have been improved substantially since DDPM was proposed, most notably for text-conditional image generation. Below are some important (but far from exhaustive) follow-up works as of June 7, 2022:

Improved Denoising Diffusion Probabilistic Models (Nichol et al., 2021): finds that learning the variance of the conditional distribution (in addition to the mean) helps improve performance.
Cascaded Diffusion Models for High Fidelity Image Generation (Ho et al., 2021): introduces cascaded diffusion, a pipeline of multiple diffusion models that generate images of increasing resolution for high-fidelity image synthesis.
Diffusion Models Beat GANs on Image Synthesis (Dhariwal et al., 2021): shows that diffusion models can outperform state-of-the-art generative models by improving the U-Net architecture and introducing classifier guidance.
Classifier-Free Diffusion Guidance (Ho et al., 2021): shows that no classifier is needed to guide a diffusion model; a single neural network is trained jointly as a conditional and an unconditional diffusion model.
Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2) (Ramesh et al., 2022): uses a prior to turn a text caption into a CLIP image embedding, after which a diffusion model decodes it into an image.
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen) (Saharia et al., 2022): shows that combining a large pre-trained language model (e.g. T5) with cascaded diffusion works well for text-to-image synthesis.
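
As a small companion to the forward diffusion section of this post, the following is a minimal sketch in plain Python (no PyTorch required) of the coefficients that the closed-form forward step multiplies onto the clean image and onto the noise under the linear schedule. The helper names linear_betas and forward_coefficients are hypothetical and not part of the original notebook; the schedule end points (1e-4 and 0.02) and T = 300 are taken from the post. It is only meant to make visible how quickly the signal coefficient shrinks and the noise coefficient grows.

```python
import math


def linear_betas(timesteps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced betas, analogous to torch.linspace(beta_start, beta_end, timesteps)."""
    if timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def forward_coefficients(betas):
    """Return (sqrt(alpha_bar_t), sqrt(1 - alpha_bar_t)) for every time step t."""
    coeffs = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta  # running product of alpha_t = 1 - beta_t
        coeffs.append((math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)))
    return coeffs


betas = linear_betas(300)
coeffs = forward_coefficients(betas)

# x_t = signal * x_0 + noise * eps, with eps drawn from a standard normal
for t in (0, 50, 100, 150, 199, 299):
    signal, noise = coeffs[t]
    print(f"t={t:3d}  signal={signal:.4f}  noise={noise:.4f}")
```

The squared coefficients sum to 1 at every step, so the variance of the noisy image stays bounded while the share of the original image decays. Printing the last row is a quick way to judge how close a given number of time steps brings the final image to pure noise.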
