当前位置：首页 > news >正文

网站视频你懂我意思吧app黑河做网站公司

news 2025/11/19 19:46:40

网站视频你懂我意思吧app,黑河做网站公司,深圳网络优化推广公司,郑州网站建设zzmshl3.6、训练与测试本文来自开源组织 DataWhale #x1f433; CV小组创作的目标检测入门教程。对应开源项目《动手学CV-Pytorch》的第3章的内容#xff0c;教程中涉及的代码也可以在项目中找到#xff0c;后续会持续更新更多的优质内容#xff0c;欢迎⭐️。如果使用我…3.6、训练与测试本文来自开源组织 DataWhale CV小组创作的目标检测入门教程。对应开源项目《动手学CV-Pytorch》的第3章的内容教程中涉及的代码也可以在项目中找到后续会持续更新更多的优质内容欢迎⭐️。如果使用我们教程的内容或图片请在文章醒目位置注明我们的github主页链接https://github.com/datawhalechina/dive-into-cv-pytorch 3.6.1 模型训练前面的章节我们已经对目标检测训练的各个重要的知识点进行了讲解下面我们需要将整个流程串起来对模型进行训练。目标检测网络的训练大致是如下的流程设置各种超参数定义数据加载模块 dataloader定义网络 model定义损失函数 loss定义优化器 optimizer遍历训练数据预测-计算loss-反向传播首先我们导入必要的库然后设定各种超参数 import time import torch.backends.cudnn as cudnn import torch.optim import torch.utils.data from model import tiny_detector, MultiBoxLoss from datasets import PascalVOCDataset from utils import *device torch.device(cuda if torch.cuda.is_available() else cpu) cudnn.benchmark True# Data parameters data_folder ../../../dataset/VOCdevkit # data files root path keep_difficult True # use objects considered difficult to detect? n_classes len(label_map) # number of different types of objects# Learning parameters total_epochs 230 # number of epochs to train batch_size 32 # batch size workers 4 # number of workers for loading data in the DataLoader print_freq 100 # print training status every __ batches lr 1e-3 # learning rate decay_lr_at [150, 190] # decay learning rate after these many epochs decay_lr_to 0.1 # decay learning rate to this fraction of the existing learning rate momentum 0.9 # momentum weight_decay 5e-4 # weight decay按照上面梳理的流程编写训练代码如下 def main():Training.# Initialize model and optimizermodel tiny_detector(n_classesn_classes)criterion MultiBoxLoss(priors_cxcymodel.priors_cxcy)optimizer torch.optim.SGD(paramsmodel.parameters(),lrlr, momentummomentum,weight_decayweight_decay)# Move to default devicemodel model.to(device)criterion criterion.to(device)# Custom dataloaderstrain_dataset PascalVOCDataset(data_folder,splittrain,keep_difficultkeep_difficult)train_loader torch.utils.data.DataLoader(train_dataset, batch_sizebatch_size,shuffleTrue,collate_fntrain_dataset.collate_fn, num_workersworkers,pin_memoryTrue) # Epochsfor epoch in range(total_epochs):# Decay learning rate at particular epochsif epoch in decay_lr_at:adjust_learning_rate(optimizer, decay_lr_to)# One epochs training train(train_loadertrain_loader,modelmodel,criterioncriterion,optimizeroptimizer,epochepoch)# Save checkpointsave_checkpoint(epoch, model, optimizer)其中我们对单个epoch的训练逻辑进行了封装其具体实现如下 def train(train_loader, model, criterion, optimizer, epoch):One epochs training.:param train_loader: DataLoader for training data:param model: model:param criterion: MultiBox loss:param optimizer: optimizer:param epoch: epoch numbermodel.train() # training mode enables dropoutbatch_time AverageMeter() # forward prop. back prop. timedata_time AverageMeter() # data loading timelosses AverageMeter() # lossstart time.time()# Batchesfor i, (images, boxes, labels, _) in enumerate(train_loader):data_time.update(time.time() - start)# Move to default deviceimages images.to(device) # (batch_size (N), 3, 224, 224)boxes [b.to(device) for b in boxes]labels [l.to(device) for l in labels]# Forward prop.predicted_locs, predicted_scores model(images) # (N, 441, 4), (N, 441, n_classes)# Lossloss criterion(predicted_locs, predicted_scores, boxes, labels) # scalar# Backward prop.optimizer.zero_grad()loss.backward()# Update modeloptimizer.step()losses.update(loss.item(), images.size(0))batch_time.update(time.time() - start)start time.time()# Print statusif i % print_freq 0:print(Epoch: [{0}][{1}/{2}]\tBatch Time {batch_time.val:.3f} ({batch_time.avg:.3f})\tData Time {data_time.val:.3f} ({data_time.avg:.3f})\tLoss {loss.val:.4f} ({loss.avg:.4f})\t.format(epoch,i, len(train_loader),batch_timebatch_time,data_timedata_time, losslosses))del predicted_locs, predicted_scores, images, boxes, labels # free some memory since their histories may be stored完成了代码的编写后我们就可以开始训练模型了训练过程类似下图所示 $ python train.py Loaded base model.Epoch: [0][0/518] Batch Time 6.556 (6.556) Data Time 3.879 (3.879) Loss 27.7129 (27.7129) Epoch: [0][100/518] Batch Time 0.185 (0.516) Data Time 0.000 (0.306) Loss 6.1569 (8.4569) Epoch: [0][200/518] Batch Time 1.251 (0.487) Data Time 1.065 (0.289) Loss 6.3175 (7.3364) Epoch: [0][300/518] Batch Time 1.207 (0.476) Data Time 1.019 (0.282) Loss 5.6598 (6.9211) Epoch: [0][400/518] Batch Time 1.174 (0.470) Data Time 0.988 (0.278) Loss 6.2519 (6.6751) Epoch: [0][500/518] Batch Time 1.303 (0.468) Data Time 1.117 (0.276) Loss 5.4864 (6.4894) Epoch: [1][0/518] Batch Time 1.061 (1.061) Data Time 0.871 (0.871) Loss 5.7480 (5.7480) Epoch: [1][100/518] Batch Time 0.189 (0.227) Data Time 0.000 (0.037) Loss 5.8557 (5.6431) Epoch: [1][200/518] Batch Time 0.188 (0.225) Data Time 0.000 (0.036) Loss 5.2024 (5.5586) Epoch: [1][300/518] Batch Time 0.190 (0.225) Data Time 0.000 (0.036) Loss 5.5348 (5.4957) Epoch: [1][400/518] Batch Time 0.188 (0.226) Data Time 0.000 (0.036) Loss 5.2623 (5.4442) Epoch: [1][500/518] Batch Time 0.190 (0.225) Data Time 0.000 (0.035) Loss 5.3105 (5.3835) Epoch: [2][0/518] Batch Time 1.156 (1.156) Data Time 0.967 (0.967) Loss 5.3755 (5.3755) Epoch: [2][100/518] Batch Time 0.206 (0.232) Data Time 0.016 (0.042) Loss 5.6532 (5.1418) Epoch: [2][200/518] Batch Time 0.197 (0.226) Data Time 0.007 (0.036) Loss 4.6704 (5.0717)剩下的就是等待了 3.6.2 后处理 3.6.2.1 目标框信息解码之前我们的提到过模型不是直接预测的目标框信息而是预测的基于anchor的偏移且经过了编码。因此后处理的第一步就是对模型的回归头的输出进行解码拿到真正意义上的目标框的预测结果。后处理还需要做什么呢由于我们预设了大量的先验框因此预测时在目标周围会形成大量高度重合的检测框而我们目标检测的结果只希望保留一个足够准确的预测框所以就需要使用某些算法对检测框去重。这个去重算法叫做NMS下面我们详细来讲一讲。 3.6.2.2 NMS非极大值抑制 NMS的大致算法步骤如下按照类别分组依次遍历每个类别。当前类别按分类置信度排序并且设置一个最低置信度阈值如0.05低于这个阈值的目标框直接舍弃。当前概率最高的框作为候选框其它所有与候选框的IOU高于一个阈值自己设定如0.5的框认为需要被抑制从剩余框数组中删除。然后在剩余的框里寻找概率第二大的框其它所有与第二大的框的IOU高于设定阈值的框被抑制。依次类推重复这个过程直至遍历完所有剩余框所有没被抑制的框即为最终检测框。图2-29 NMS过程3.6.2.3 代码实现: 整个后处理过程的代码实现位于model.py中tiny_detector类的detect_objects函数中 def detect_objects(self, predicted_locs, predicted_scores, min_score, max_overlap, top_k): Decipher the 441 locations and class scores (output of the tiny_detector) to detect objects.For each class, perform Non-Maximum Suppression (NMS) on boxes that are above a minimum threshold.:param predicted_locs: predicted locations/boxes w.r.t the 441 prior boxes, a tensor of dimensions (N, 441, 4):param predicted_scores: class scores for each of the encoded locations/boxes, a tensor of dimensions (N, 441, n_classes):param min_score: minimum threshold for a box to be considered a match for a certain class:param max_overlap: maximum overlap two boxes can have so that the one with the lower score is not suppressed via NMS:param top_k: if there are a lot of resulting detection across all classes, keep only the top k:return: detections (boxes, labels, and scores), lists of length batch_sizebatch_size predicted_locs.size(0)n_priors self.priors_cxcy.size(0)predicted_scores F.softmax(predicted_scores, dim2) # (N, 441, n_classes)# Lists to store final predicted boxes, labels, and scores for all images in batchall_images_boxes list()all_images_labels list()all_images_scores list()assert n_priors predicted_locs.size(1) predicted_scores.size(1)for i in range(batch_size):# Decode object coordinates from the form we regressed predicted boxes todecoded_locs cxcy_to_xy( gcxgcy_to_cxcy(predicted_locs[i], self.priors_cxcy)) # (441, 4), these are fractional pt. coordinates# Lists to store boxes and scores for this imageimage_boxes list()image_labels list()image_scores list()max_scores, best_label predicted_scores[i].max(dim1) # (441)# Check for each classfor c in range(1, self.n_classes):# Keep only predicted boxes and scores where scores for this class are above the minimum scoreclass_scores predicted_scores[i][:, c] # (441)score_above_min_score class_scores min_score # torch.uint8 (byte) tensor, for indexingn_above_min_score score_above_min_score.sum().item()if n_above_min_score 0:continueclass_scores class_scores[score_above_min_score] # (n_qualified), n_min_score 441class_decoded_locs decoded_locs[score_above_min_score] # (n_qualified, 4)# Sort predicted boxes and scores by scoresclass_scores, sort_ind class_scores.sort(dim0, descendingTrue) # (n_qualified), (n_min_score)class_decoded_locs class_decoded_locs[sort_ind] # (n_min_score, 4)# Find the overlap between predicted boxesoverlap find_jaccard_overlap(class_decoded_locs, class_decoded_locs) # (n_qualified, n_min_score)# Non-Maximum Suppression (NMS)# A torch.uint8 (byte) tensor to keep track of which predicted boxes to suppress# 1 implies suppress, 0 implies dont suppresssuppress torch.zeros((n_above_min_score), dtypetorch.uint8).to(device) # (n_qualified)# Consider each box in order of decreasing scoresfor box in range(class_decoded_locs.size(0)):# If this box is already marked for suppressionif suppress[box] 1:continue# Suppress boxes whose overlaps (with current box) are greater than maximum overlap# Find such boxes and update suppress indicessuppress torch.max(suppress, (overlap[box] max_overlap).to(torch.uint8))# The max operation retains previously suppressed boxes, like an OR operation# Dont suppress this box, even though it has an overlap of 1 with itselfsuppress[box] 0# Store only unsuppressed boxes for this classimage_boxes.append(class_decoded_locs[1 - suppress])image_labels.append(torch.LongTensor((1 - suppress).sum().item() * [c]).to(device))image_scores.append(class_scores[1 - suppress])# If no object in any class is found, store a placeholder for backgroundif len(image_boxes) 0:image_boxes.append(torch.FloatTensor([[0., 0., 1., 1.]]).to(device))image_labels.append(torch.LongTensor([0]).to(device))image_scores.append(torch.FloatTensor([0.]).to(device))# Concatenate into single tensorsimage_boxes torch.cat(image_boxes, dim0) # (n_objects, 4)image_labels torch.cat(image_labels, dim0) # (n_objects)image_scores torch.cat(image_scores, dim0) # (n_objects)n_objects image_scores.size(0)# Keep only the top k objectsif n_objects top_k:image_scores, sort_ind image_scores.sort(dim0, descendingTrue)image_scores image_scores[:top_k] # (top_k)image_boxes image_boxes[sort_ind][:top_k] # (top_k, 4)image_labels image_labels[sort_ind][:top_k] # (top_k)# Append to lists that store predicted boxes and scores for all imagesall_images_boxes.append(image_boxes)all_images_labels.append(image_labels)all_images_scores.append(image_scores)return all_images_boxes, all_images_labels, all_images_scores # lists of length batch_size我们的后处理代码中NMS的部分着实有些绕大家可以参考下Fast R-CNN中的NMS实现更简洁清晰一些 # -------------------------------------------------------- # Fast R-CNN # Copyright (c) 2015 Microsoft # Licensed under The MIT License [see LICENSE for details] # Written by Ross Girshick # -------------------------------------------------------- import numpy as np # dets: 检测的 boxes 及对应的 scores # thresh: 设定的阈值def nms(dets,thresh):# boxes 位置x1 dets[:,0] y1 dets[:,1] x2 dets[:,2]y2 dets[:,3]# boxes scoresscores dets[:,4]areas (x2-x11)*(y2-y11) # 各box的面积order scores.argsort()[::-1] # 分类置信度排序keep [] # 记录保留下的 boxeswhile order.size 0:i order[0] # score最大的box对应的 indexkeep.append(i) # 将本轮score最大的box的index保留\# 计算剩余 boxes 与当前 box 的重叠程度 IoUxx1 np.maximum(x1[i],x1[order[1:]])yy1 np.maximum(y1[i],y1[order[1:]])xx2 np.minimum(x2[i],x2[order[1:]])yy2 np.minimum(y2[i],y2[order[1:]])w np.maximum(0.0,xx2-xx11) # IoUh np.maximum(0.0,yy2-yy11)inter w*hovr inter/(areas[i]areas[order[1:]]-inter)\# 保留 IoU 小于设定阈值的 boxesinds np.where(ovrthresh)[0]order order[inds1]return keep3.6.3 单图预测推理当模型已经训练完成后下面我们来看下如何对单张图片进行推理得到目标检测结果。首先我们需要导入必要的python包然后加载训练好的模型权重。随后我们需要定义预处理函数。为了达到最好的预测效果测试环节的预处理方案需要和训练时保持一致仅去除掉数据增强相关的变换即可。因此这里我们需要进行的预处理为将图片缩放为 224 * 224 的大小转换为 Tensor 并除 255进行减均值除方差的归一化 # Set detect transforms (Its important to be consistent with training) resize transforms.Resize((224, 224)) to_tensor transforms.ToTensor() normalize transforms.Normalize(mean[0.485, 0.456, 0.406],std[0.229, 0.224, 0.225])接着我们就来进行推理过程很简单核心流程可以概括为读取一张图片预处理模型预测对模型预测进行后处理核心代码如下 # Transform the image image normalize(to_tensor(resize(original_image)))# Move to default device image image.to(device)# Forward prop. predicted_locs, predicted_scores model(image.unsqueeze(0))# Post process, get the final detect objects from our tiny detector output det_boxes, det_labels, det_scores model.detect_objects(predicted_locs, predicted_scores, min_scoremin_score, max_overlapmax_overlap, top_ktop_k)这里的detect_objects 函数完成模型预测结果的后处理主要工作有两个首先对模型的输出进行解码得到代表具体位置信息的预测框随后对所有预测框按类别进行NMS来过滤掉一些多余的检测框也就是我们上一小节介绍的内容。最后我们将最终得到的检测框结果进行绘制得到类似如下图的检测结果完整代码见 detect.py 脚本下面是更多的一些VOC测试集中图片的预测结果展示可以看到我们的 tiny_detector 模型对于一些简单的测试图片检测效果还是不错的。一些更难的图片的预测效果如下可以看到当面对一些稍微有挑战性的图片的时候我们的检测器就开始暴露出各种个样的问题包括但不限于漏框右图有很多瓶子没有检测出来误检右图误检了一个瓶子重复检测左图的汽车和右图最前面的人定位不准尤其是对小物体不妨运行下 detect.py赶快看看你训练的模型效果如何吧你观察到了哪些问题有没有什么优化思路呢 3.6.4 VOC测试集评测 3.6.4.1 介绍map指标以分类模型中最简单的二分类为例对于这种问题我们的模型最终需要判断样本的结果是0还是1或者说是positive还是negative。我们通过样本的采集能够直接知道真实情况下哪些数据结果是positive哪些结果是negative。同时我们通过用样本数据跑出分类模型的结果也可以知道模型认为这些数据哪些是positive哪些是negative。因此我们就能得到这样四个基础指标称他们是一级指标最底层的 1真实值是positive模型认为是positive的数量True PositiveTP 2真实值是positive模型认为是negative的数量False Negative FN这就是统计学上的第二类错误Type II Error 3真实值是negative模型认为是positive的数量False Positive FP这就是统计学上的第一类错误Type I Error 4真实值是negative模型认为是negative的数量True Negative TN 在机器学习领域混淆矩阵confusion matrix又称为可能性表格或错误矩阵。它是一种特定的矩阵用来呈现算法性能的可视化效果通常用于监督学习非监督学习通常用匹配矩阵matching matrix。其每一列代表预测值每一行代表的是实际的类别。这个名字来源于它可以非常容易的表明多个类别是否有混淆也就是一个class被预测成另一个class。 Example 假设有一个用来对猫cats、狗dogs、兔子rabbits进行分类的系统混淆矩阵就是为了进一步分析性能而对该算法测试结果做出的总结。假设总共有27只动物8只猫、6条狗、13只兔子。结果的混淆矩阵如下表表3-30二级指标混淆矩阵里面统计的是个数有时候面对大量的数据光凭算个数很难衡量模型的优劣。因此混淆矩阵在基本的统计结果上又延伸了如下4个指标我称他们是二级指标通过最底层指标加减乘除得到的 1准确率Accuracy-----针对整个模型 2精确率Precision 3灵敏度Sensitivity就是召回率Recall 4特异度Specificity 用表格的方式将这四种指标的定义、计算、理解进行汇总表3-31通过上面的四个二级指标可以将混淆矩阵中数量的结果转化为0-1之间的比率。便于进行标准化的衡量。三级指标这个指标叫做F1 Score。他的计算公式是 F1 Score 2PR / PR 其中P代表PrecisionR代表Recall召回率。F1-Score指标综合了Precision与Recall的产出的结果。F1-Score的取值范围从0到1,1代表模型的输出最好0代表模型的输出结果最差。 AP指标即Average Precision 即平均精确度。 mAP即Mean Average Precision即平均AP值是对多个验证集个体求平均AP值作为object detection中衡量检测精度的指标。在目标检测场景如何计算AP呢这里需要引出P-R曲线即以precision和recall作为纵、横轴坐标的二维曲线。通过选取不同阈值时对应的精度和召回率画出如下图所示图3-32 PR曲线P-R曲线的总体趋势是精度越高召回越低当召回到达1时对应概率分数最低的正样本这个时候正样本数量除以所有大于等于该阈值的样本数量就是最低的精度值。另外P-R曲线围起来的面积就是AP值通常来说一个越好的分类器AP值越高。总结在目标检测中每一类都可以根据recall和precision绘制P-R曲线AP就是该曲线下的面积mAP就是所有类的AP的平均值。(这里说的是VOC数据集的mAP指标的计算方法COCO数据集的计算方法略有差异 3.6.4.2 Tiny-Detection VOC测试集评测运行 eval.py 脚本评估模型在VOC2007测试集上的效果结果如下 python eval.py $ python eval.py ... ... Evaluating: 100%|███████████████████████████████| 78/78 [00:5700:00, 1.35it/s] {aeroplane: 0.6086561679840088,bicycle: 0.7144593596458435,bird: 0.5847545862197876,boat: 0.44902321696281433,bottle: 0.2160634696483612,bus: 0.7212041616439819,car: 0.629608154296875,cat: 0.8124480843544006,chair: 0.3599272668361664,cow: 0.5980824828147888,diningtable: 0.6459739804267883,dog: 0.7577021718025208,horse: 0.7861635088920593,motorbike: 0.702280580997467,person: 0.5821948051452637,pottedplant: 0.2793791592121124,sheep: 0.5655995607376099,sofa: 0.708049476146698,train: 0.7575671672821045,tvmonitor: 0.5641061663627625}Mean Average Precision (mAP): 0.602可以看到模型的mAP得分为60.2比经典的YOLO网络的63.4的得分稍低得分还是说的过去的同时我们也可以观察到某几个类别例如bottle和pottedplant的检测效果是很差的说明我们的模型对于小物体较为密集的物体的检测是存在明显问题的。

查看全文

http://www.zqtcl.cn/news/79231/