
news 2025/11/22 15:26:48

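SiamRPN

1. Overview

SiamRPN is a visual object tracking algorithm that combines a Siamese network with a Region Proposal Network (RPN). Its goal is to track a single target accurately through a video sequence. Its key characteristics are:

- Siamese network: SiamRPN uses a Siamese network to extract features from video frames. The network consists of two identical sub-networks that share weights, which makes it possible to compare the features of two different images.
- Region Proposal Network (RPN): a network from object detection that generates candidate target regions in an image. SiamRPN integrates the RPN into the Siamese network so that the target can be localized in consecutive frames.
- Tracking and localization: the algorithm identifies the target in the first frame and then tracks it in the following frames by comparing the target features from the first frame with the features of each later frame.
- Robustness and accuracy: SiamRPN can keep relatively high tracking accuracy under changes in target shape, occlusion, or illumination.

Compared with SiamFC, SiamRPN is trained on richer data (in practice, more training data helps performance), and the added RPN structure makes the predicted position more accurate, so the box fits the target more tightly.

2. Network structure

The network structure in the paper is as follows:

[Figure: network structure from the paper]

The structure of the actual exported network (converted to ONNX and opened with Netron) is shown below:

[Figure: exported network structure viewed in Netron]

Features are first extracted by two backbone networks. Unlike SiamFC, each backbone outputs two branches, which are combined pairwise in the head through convolution. The head outputs two results: the target confidence and the position regression. See section 3 for the details.

3. Code

3.1 Training

This section explains part of the training code. Data processing is the same as in SiamFC. The SiamRPN network also takes two inputs: z (the initialization image) and x (the current frame image). Training uses the GOT-10k dataset; part of the data is listed below.

```
COT-10k
└── COT-10k/train
    ├── COT-10k/train/GOT-10k_Train_000001
    ├── COT-10k/train/GOT-10k_Train_000002
    ├── COT-10k/train/GOT-10k_Train_000003
    ├── COT-10k/train/GOT-10k_Train_000004
    ├── COT-10k/train/GOT-10k_Train_000005
    ├── COT-10k/train/GOT-10k_Train_000006
    ├── COT-10k/train/GOT-10k_Train_000007
    ├── COT-10k/train/GOT-10k_Train_000008
    ├── COT-10k/train/GOT-10k_Train_000009
    ├── COT-10k/train/GOT-10k_Train_000010
    └── COT-10k/train/list.txt
```

The main iteration step during training is as follows:

```python
def __getitem__(self, index):
    # The index passed in is ignored; a random sequence is sampled instead.
    index = random.choice(range(len(self.sub_class_dir)))
    if self.name == 'GOT-10k':
        # Step past a few specific sequence indices.
        if index == 4418 or index == 8627 or index == 8629 or index == 9057 or index == 9058 or index == 7787 or index == 5911:
            index += 3
    self._pick_img_pairs(index)
    self.open()
    self._tranform()
    regression_target, conf_target = self._target()
    self.count += 1
    return self.ret['train_z_transforms'], self.ret['train_x_transforms'], regression_target, conf_target.astype(np.int64)
```

In SiamRPN, `__getitem__` loads and preprocesses one image pair (template and detection image) and generates the corresponding regression and confidence targets:

- Pick a random index: `index = random.choice(range(len(self.sub_class_dir)))` selects a random entry of `self.sub_class_dir`, so the `index` argument is overwritten.
- GOT-10k handling: specific index values are checked and adjusted.
- Select an image pair: `_pick_img_pairs` selects a template image and a detection image.
- Open the images and apply transforms: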
`self.open()` loads and preprocesses the template and detection images; `self._tranform()` (spelled this way in the code) applies the predefined transforms such as scaling, cropping, and normalization.
- Generate targets: `regression_target, conf_target = self._target()` produces the regression target (for example, bounding-box offsets) and the confidence target (whether the target is present).
- Return value: the transformed template image, the transformed detection image, the regression target, and the confidence target. These are the inputs the network needs for each training iteration.

_pick_img_pairs function

```python
def _pick_img_pairs(self, index_of_subclass):
    assert index_of_subclass < len(self.sub_class_dir), 'index_of_subclass should less than total classes'
    video_name = self.sub_class_dir[index_of_subclass][0]
    video_num = len(video_name)
    video_gt = self.sub_class_dir[index_of_subclass][1]
    status = True
    while status:
        if self.max_inter >= video_num - 1:
            self.max_inter = video_num // 2
        template_index = np.clip(random.choice(range(0, max(1, video_num - self.max_inter))), 0, video_num - 1)
        detection_index = np.clip(random.choice(range(1, max(2, self.max_inter))) + template_index, 0,
                                  video_num - 1)
        template_img_path, detection_img_path = video_name[template_index], video_name[detection_index]
        template_gt = video_gt[template_index]
        detection_gt = video_gt[detection_index]
        if template_gt[2] * template_gt[3] * detection_gt[2] * detection_gt[3] != 0:
            status = False
        else:
            # print('Warning : Encounter object missing, reinitializing ...')
            print('index_of_subclass:', index_of_subclass, '\n',
                  'template_index:', template_index, '\n',
                  'template_gt:', template_gt, '\n',
                  'detection_index:', detection_index, '\n',
                  'detection_gt:', detection_gt, '\n')
    # load infomation of template and detection
    self.ret['template_img_path'] = template_img_path
    self.ret['detection_img_path'] = detection_img_path
    self.ret['template_target_x1y1wh'] = template_gt
    self.ret['detection_target_x1y1wh'] = detection_gt
    t1, t2 = self.ret['template_target_x1y1wh'].copy(), self.ret['detection_target_x1y1wh'].copy()
    self.ret['template_target_xywh'] = np.array([t1[0] + t1[2] // 2, t1[1] + t1[3] // 2, t1[2], t1[3]], np.float32)
    self.ret['detection_target_xywh'] = np.array([t2[0] + t2[2] // 2, t2[1] + t2[3] // 2, t2[2], t2[3]], np.float32)
    self.ret['anchors'] = self.anchors
```

`_pick_img_pairs` selects two frames from a given video sequence, one as the template image and one as the detection image, and makes sure both contain the target:

- Check the sub-class index: the `assert` guarantees that `index_of_subclass` is smaller than the length of `self.sub_class_dir`, which prevents out-of-range indexing.
- Get the frame paths and ground truth: `video_name` holds the path of every frame in the sequence, `video_num` is the number of frames, and `video_gt` holds the ground-truth bounding box of every frame.
- Choose template and detection indices: the loop repeats until a valid pair is found. `self.max_inter` is the maximum allowed frame gap between template and detection image. `template_index` and `detection_index` are sampled randomly and passed through `np.clip` to keep them in range.
- Check that the target is present: the product of the widths and heights in `template_gt` and `detection_gt` must be non-zero, so neither box is empty.
- Extract paths and targets: the paths of both frames and their ground-truth boxes are read from the sequence.
- Convert the box format: the boxes are converted from `[x1, y1, width, height]` to `[center_x, center_y, width, height]`, which is more convenient for the later processing and for training.
- Update the result dictionary: `self.ret` receives the two image paths, the target boxes, and the anchors.

This step supplies the training pair from which the network learns to follow a target from one frame (the template) to another (the detection image).

open function

```python
def open(self):
    # template
    # template_img = cv2.imread(self.ret['template_img_path']) if you use cv2.imread you can not open .JPEG format
    template_img = Image.open(self.ret['template_img_path'])
    template_img = np.array(template_img)
    detection_img = Image.open(self.ret['detection_img_path'])
    detection_img = np.array(detection_img)
    if np.random.rand(1) < config.gray_ratio:
        template_img = cv2.cvtColor(template_img, cv2.COLOR_RGB2GRAY)
        template_img = cv2.cvtColor(template_img, cv2.COLOR_GRAY2RGB)
        detection_img = cv2.cvtColor(detection_img, cv2.COLOR_RGB2GRAY)
        detection_img = cv2.cvtColor(detection_img, cv2.COLOR_GRAY2RGB)
    img_mean = np.mean(template_img, axis=(0, 1))
    # img_mean = tuple(map(int, template_img.mean(axis=(0, 1))))
    exemplar_img, scale_z, s_z, w_x, h_x = self.get_exemplar_image(template_img,
                                                                   self.ret['template_target_xywh'],
                                                                   config.exemplar_size,
                                                                   config.context_amount, img_mean)
    # size_x = config.exemplar_size
    # x1, y1 = int((size_x + 1) / 2 - w_x / 2), int((size_x + 1) / 2 - h_x / 2)
    # x2, y2 = int((size_x + 1) / 2 + w_x / 2), int((size_x + 1) / 2 + h_x / 2)
    # frame = cv2.rectangle(exemplar_img, (x1, y1), (x2, y2), (0, 255, 0), 1)
    # cv2.imwrite('exemplar_img.png', frame)
    # cv2.waitKey(0)
    self.ret['exemplar_img'] = exemplar_img

    # detection
    # detection_img = cv2.imread(self.ret['detection_img_path'])
    d = self.ret['detection_target_xywh']
    cx, cy, w, h = d  # float type
    wc_z = w + 0.5 * (w + h)
    hc_z = h + 0.5 * (w + h)
    s_z = np.sqrt(wc_z * hc_z)
    s_x = s_z / (config.instance_size // 2)
    img_mean_d = tuple(map(int, detection_img.mean(axis=(0, 1))))
    a_x_ = np.random.choice(range(-12, 12))
    a_x = a_x_ * s_x
    b_y_ = np.random.choice(range(-12, 12))
    b_y = b_y_ * s_x
    instance_img, a_x, b_y, w_x, h_x, scale_x = self.get_instance_image(detection_img, d,
                                                                        config.exemplar_size,  # 127
                                                                        config.instance_size,  # 255
                                                                        config.context_amount,  # 0.5
                                                                        a_x, b_y,
                                                                        img_mean_d)
    # size_x = config.instance_size
    # x1, y1 = int((size_x + 1) / 2 - w_x / 2), int((size_x + 1) / 2 - h_x / 2)
    # x2, y2 = int((size_x + 1) / 2 + w_x / 2), int((size_x + 1) / 2 + h_x / 2)
    # frame_d = cv2.rectangle(instance_img, (int(x1 + (a_x * scale_x)), int(y1 + (b_y * scale_x))), (int(x2 + (a_x * scale_x)), int(y2 + (b_y * scale_x))), (0, 255, 0), 1)
    # cv2.imwrite('detection_img_ori.png', frame_d)
    # w = x2 - x1
    # h = y2 - y1
    # cx = x1 + w / 2
    # cy = y1 + h / 2
    # print('[a_x_, b_y_, w, h]', [int(a_x_), int(b_y_), w, h])
    self.ret['instance_img'] = instance_img
    # self.ret['cx, cy, w, h'] = [int(a_x_ * 0.16), int(b_y_ * 0.16), w, h]
    self.ret['cx, cy, w, h'] = [int(a_x_), int(b_y_), w, h]
```

`open` loads and preprocesses the template and detection images so they can be fed to the network:

- Load both images: `PIL.Image.open` reads each image, which is then converted to a NumPy array in RGB order. According to the code comment, `cv2.imread` could not open the dataset's `.JPEG` files, which is why PIL is used.
- Optional grayscale conversion: with probability `config.gray_ratio`, both images are converted to grayscale and back to three channels, which helps the model handle grayscale input.
- Compute the image mean: the per-channel mean of the template image is used later as the padding color.
- Get the template image: `get_exemplar_image` crops and resizes the template so that it matches the network input size.
- Process the detection image: the detection image is handled in a similar way, with a random shift (`a_x_`, `b_y_` drawn from `range(-12, 12)` and scaled by `s_x`) and a size adjustment.
- Store the results: the exemplar image (`exemplar_img`), the instance image (`instance_img`), and the target information are written to `self.ret`.

The random grayscale conversion and random shift act as data augmentation, which is meant to improve robustness to the variation seen in real video.

get_exemplar_image

```python
def get_exemplar_image(self, img, bbox, size_z, context_amount, img_mean=None):
    cx, cy, w, h = bbox
    wc_z = w + context_amount * (w + h)
    hc_z = h + context_amount * (w + h)
    s_z = np.sqrt(wc_z * hc_z)
    scale_z = size_z / s_z
    exemplar_img, scale_x = self.crop_and_pad_old(img, cx, cy, size_z, s_z, img_mean)
    w_x = w * scale_x
    h_x = h * scale_x
    return exemplar_img, scale_z, s_z, w_x, h_x
```

- Unpack the bounding box: `cx, cy, w, h = bbox` gives the center coordinates `(cx, cy)` and the width and height `(w, h)` of the target.
- Compute the context size and scale: `wc_z = w + context_amount * (w + h)` and `hc_z = h + context_amount * (w + h)` add background context around the target; `context_amount` is the ratio that controls how much extra context is included. `s_z = np.sqrt(wc_z * hc_z)` is the side length of the square crop that covers the target plus its context. `scale_z = size_z / s_z` is the factor that maps this crop to the required size `size_z`.
- Crop and pad: `exemplar_img, scale_x = self.crop_and_pad_old(img, cx, cy, size_z, s_z, img_mean)` crops the region around the target and resizes it to the required size.
- Adjust the target size: `w_x = w * scale_x` and `h_x = h * scale_x` give the target width and height inside the resized crop.

The result is a crop centered on the target that keeps a fixed amount of context and has the size the network expects.

crop_and_pad_old

```python
def crop_and_pad_old(self, img, cx, cy, model_sz, original_sz, img_mean=None):
    im_h, im_w, _ = img.shape
    xmin = cx - (original_sz - 1) / 2
    xmax = xmin + original_sz - 1
    ymin = cy - (original_sz - 1) / 2
    ymax = ymin + original_sz - 1
    left = int(self.round_up(max(0., -xmin)))
    top = int(self.round_up(max(0., -ymin)))
    right = int(self.round_up(max(0., xmax - im_w + 1)))
    bottom = int(self.round_up(max(0., ymax - im_h + 1)))
    xmin = int(self.round_up(xmin + left))
    xmax = int(self.round_up(xmax + left))
    ymin = int(self.round_up(ymin + top))
    ymax = int(self.round_up(ymax + top))
    r, c, k = img.shape
    if any([top, bottom, left, right]):
        te_im = np.zeros((r + top + bottom, c + left + right, k), np.uint8)  # 0 is better than 1 initialization
        te_im[top:top + r, left:left + c, :] = img
        if top:
            te_im[0:top, left:left + c, :] = img_mean
        if bottom:
            te_im[r + top:, left:left + c, :] = img_mean
        if left:
            te_im[:, 0:left, :] = img_mean
        if right:
            te_im[:, c + left:, :] = img_mean
        im_patch_original = te_im[int(ymin):int(ymax + 1), int(xmin):int(xmax + 1), :]
    else:
        im_patch_original = img[int(ymin):int(ymax + 1), int(xmin):int(xmax + 1), :]
    if not np.array_equal(model_sz, original_sz):
        im_patch = cv2.resize(im_patch_original, (model_sz, model_sz))  # zzp: use cv to get a better speed
    else:
        im_patch = im_patch_original
    scale = model_sz / im_patch_original.shape[0]
    return im_patch, scale
```

- Compute the crop region: a square of side `original_sz` centered on `(cx, cy)`.
- Boundary check and padding: if the crop extends beyond the image, the image must be padded. `left`, `top`, `right`, and `bottom` are the numbers of pixels to pad on each side. `img_mean` supplies the fill color; although it defaults to `None`, the code assigns it directly into the padded regions, so callers are expected to pass it.
- Crop and pad: a zero array `te_im` large enough for the padded image is created, the original image is copied into it, and the borders are filled where needed.
- Resize: if the model input size `model_sz` differs from `original_sz`, the crop is resized with `cv2.resize`.
- Compute the scale: `scale` is the ratio between the model size and the height of the crop before resizing.
- Return: the processed patch `im_patch` and `scale`.

With these steps, `crop_and_pad_old` returns a patch of the right size regardless of where the target sits in the image, including near the borders.

crop_and_pad

```python
def crop_and_pad(self, img, cx, cy, gt_w, gt_h, a_x, b_y, model_sz, original_sz, img_mean=None):
    # random = np.random.uniform(-0.15, 0.15)
    scale_h = 1.0 + np.random.uniform(-0.15, 0.15)
    scale_w = 1.0 + np.random.uniform(-0.15, 0.15)
    im_h, im_w, _ = img.shape
    xmin = (cx - a_x) - ((original_sz - 1) / 2) * scale_w
    xmax = (cx - a_x) + ((original_sz - 1) / 2) * scale_w
    ymin = (cy - b_y) - ((original_sz - 1) / 2) * scale_h
    ymax = (cy - b_y) + ((original_sz - 1) / 2) * scale_h
    # print('xmin, xmax, ymin, ymax', xmin, xmax, ymin, ymax)
    left = int(self.round_up(max(0., -xmin)))
    top = int(self.round_up(max(0., -ymin)))
    right = int(self.round_up(max(0., xmax - im_w + 1)))
    bottom = int(self.round_up(max(0., ymax - im_h + 1)))
    xmin = int(self.round_up(xmin + left))
    xmax = int(self.round_up(xmax + left))
    ymin = int(self.round_up(ymin + top))
    ymax = int(self.round_up(ymax + top))
    r, c, k = img.shape
    if any([top, bottom, left, right]):
        te_im_ = np.zeros((int((r + top + bottom)), int((c + left + right)), k),
                          np.uint8)  # 0 is better than 1 initialization
        te_im = np.zeros((int((r + top + bottom)), int((c + left + right)), k),
                         np.uint8)  # 0 is better than 1 initialization
        # cv2.imwrite('te_im1.jpg', te_im)
        te_im[:, :, :] = img_mean
        # cv2.imwrite('te_im2_1.jpg', te_im)
        te_im[top:top + r, left:left + c, :] = img
        # cv2.imwrite('te_im2.jpg', te_im)
        if top:
            te_im[0:top, left:left + c, :] = img_mean
        if bottom:
            te_im[r + top:, left:left + c, :] = img_mean
        if left:
            te_im[:, 0:left, :] = img_mean
        if right:
            te_im[:, c + left:, :] = img_mean
        im_patch_original = te_im[int(ymin):int(ymax + 1), int(xmin):int(xmax + 1), :]
        # cv2.imwrite('te_im3.jpg', im_patch_original)
    else:
        im_patch_original = img[int(ymin):int((ymax) + 1), int(xmin):int((xmax) + 1), :]
        # cv2.imwrite('te_im4.jpg', im_patch_original)
    if not np.array_equal(model_sz, original_sz):
        h, w, _ = im_patch_original.shape
        if h < w:
            scale_h_ = 1
            scale_w_ = h / w
            scale = config.instance_size / h
        elif h > w:
            scale_h_ = w / h
            scale_w_ = 1
            scale = config.instance_size / w
        elif h == w:
            scale_h_ = 1
            scale_w_ = 1
            scale = config.instance_size / w
        gt_w = gt_w * scale_w_
        gt_h = gt_h * scale_h_
        gt_w = gt_w * scale
        gt_h = gt_h * scale
        # im_patch = cv2.resize(im_patch_original_, (shape))  # zzp: use cv to get a better speed
        # cv2.imwrite('te_im8.jpg', im_patch)
        im_patch = cv2.resize(im_patch_original, (model_sz, model_sz))  # zzp: use cv to get a better speed
        # cv2.imwrite('te_im9.jpg', im_patch)
    else:
        im_patch = im_patch_original
    # scale = model_sz / im_patch_original.shape[0]
    return im_patch, gt_w, gt_h, scale, scale_h_, scale_w_
```

`crop_and_pad` crops a region from the original image and resizes it to the size the model needs. Note that `scale`, `scale_h_`, and `scale_w_` are only assigned in the branch where a resize happens.

- Random scaling: `scale_h` and `scale_w` are drawn at random (within ±15%) and stretch the height and width of the crop region, which helps the model adapt to targets of different sizes.
- Compute the crop region: the target center `(cx, cy)` and the given offsets `(a_x, b_y)` are used to compute the crop coordinates `(xmin, ymin, xmax, ymax)`. `original_sz`
is the desired size of the crop region.
- Boundary check and padding: the number of pixels to pad on each side is computed so that the crop region lies entirely inside the padded image. `img_mean` fills the area outside the original image.
- Crop and pad: the target region is cut from the padded image according to these values.
- Resize: if the model input size (`model_sz`) differs from the crop size (`original_sz`), the crop is resized with `cv2.resize`.
- Adjust the target size: the target width (`gt_w`) and height (`gt_h`) are rescaled to account for the crop and resize.
- Return: the processed patch (`im_patch`), the adjusted target width and height (`gt_w`, `gt_h`), and the scale factors (`scale`, `scale_h_`, `scale_w_`).

By adding random scaling and shifts, this method makes the model more tolerant of different target sizes and shapes, and the padding ensures a usable patch even when the target is close to the image border.

compute_target function

```python
def compute_target(self, anchors, box):
    # box = [-(box[0]), -(box[1]), box[2], box[3]]
    regression_target = self.box_transform(anchors, box)
    iou = self.compute_iou(anchors, box).flatten()
    # print(np.max(iou))
    pos_index = np.where(iou > config.pos_threshold)[0]
    neg_index = np.where(iou < config.neg_threshold)[0]
    label = np.ones_like(iou) * -1
    label[pos_index] = 1
    label[neg_index] = 0
    print(len(neg_index))
    # As written, this loop is redundant: every negative is already labeled 0 above.
    for i, neg_ind in enumerate(neg_index):
        if i % 40 == 0:
            label[neg_ind] = 0
    # max_index = np.argsort(iou.flatten())[-20:]
    return regression_target, label
```

`compute_target` computes the regression targets and classification labels for SiamRPN. Given the predefined anchor boxes (`anchors`) and one ground-truth box (`box`), it produces a regression target for every anchor and a label that says whether the anchor contains the target:

- Compute the regression target: `regression_target = self.box_transform(anchors, box)` computes, for every anchor, the target that maps the anchor to the ground-truth box. This usually consists of the center offsets and the size ratios between the two boxes.
- Compute the IoU: `iou = self.compute_iou(anchors, box).flatten()` computes the intersection over union between every anchor and the ground-truth box. IoU measures how much two boxes overlap.
- Find positive and negative indices: `pos_index = np.where(iou > config.pos_threshold)[0]` selects the anchors whose IoU exceeds the positive threshold; they are treated as positive samples. `neg_index = np.where(iou < config.neg_threshold)[0]` selects the anchors whose IoU is below the negative threshold; they are treated as negative samples.
- Initialize the labels: `label = np.ones_like(iou) * -1` creates a label array with the same size as the IoU array, filled with -1, meaning the anchor is neither positive nor negative.
- Mark the samples: `label[pos_index] = 1` sets positives to 1 and `label[neg_index] = 0` sets negatives to 0.
- Return: the regression targets, used to adjust the position and size of the anchors toward the ground-truth box, and the labels, used for classification.

This is how the training samples are generated: the network learns to pick, from a large number of candidate anchors, the ones that contain the target, and to adjust them so that they align with it.

3.2 Demo

Running the demo mainly involves the tracking part, which consists of `init` and `update`. `init` is called on the first frame with the target to track.

init

```python
def init(self, frame, bbox):
    self.bbox = np.array([bbox[0] - 1 + (bbox[2] - 1) / 2, bbox[1] - 1 + (bbox[3] - 1) / 2, bbox[2], bbox[3]])  # cx,cy,w,h
    self.pos = np.array([bbox[0] - 1 + (bbox[2] - 1) / 2, bbox[1] - 1 + (bbox[3] - 1) / 2])  # center x, center y, zero based
    self.target_sz = np.array([bbox[2], bbox[3]])  # width, height
    self.origin_target_sz = np.array([bbox[2], bbox[3]])  # w,h
    self.img_mean = np.mean(frame, axis=(0, 1))
    exemplar_img, scale_z, _ = get_exemplar_image(frame, self.bbox, config.exemplar_size, config.context_amount,
                                                  self.img_mean)
    exemplar_img = self.transforms(exemplar_img)[None, :, :, :]  # at test time, converting to a tensor is enough
    self.model.track_init(exemplar_img.cuda())
```

`init` sets the initial state of the tracker: it determines the initial target box, computes the mean color of the image, and prepares the template (exemplar) image from the first frame.

- Initial bounding box (`self.bbox`): the input box `[x, y, width, height]` is converted to the center-based form `[center_x, center_y, width, height]`, which simplifies the later computations.
- Target position (`self.pos`): the same center is stored again, with only the x and y coordinates.
- Target size (`self.target_sz` and `self.origin_target_sz`): the original width and height of the input box.
- Image mean (`self.img_mean`): the per-channel mean of the input frame, used as the padding color in later preprocessing.
- Template image: `get_exemplar_image` extracts the template from the input frame based on the initial box and resizes it.
- Image transform: `exemplar_img = self.transforms(exemplar_img)[None, :, :, :]` applies the predefined transform and adds a batch dimension.
- Initialize the model: `self.model.track_init(exemplar_img.cuda())` initializes the tracking model with the prepared template.

Everything the tracker does in later frames builds on this initial state and template.

update

```python
def update(self, frame):
    instance_img_np, _, _, scale_x = get_instance_image(frame, self.bbox, config.exemplar_size,
                                                        config.instance_size,
                                                        config.context_amount, self.img_mean)
    instance_img = self.transforms(instance_img_np)[None, :, :, :]
    pred_score, pred_regression = self.model.track(instance_img.cuda())
    pred_conf = pred_score.reshape(-1, 2, config.anchor_num * config.score_size * config.score_size).permute(0, 2, 1)
    pred_offset = pred_regression.reshape(-1, 4, config.anchor_num * config.score_size * config.score_size).permute(0, 2, 1)
    delta = pred_offset[0].cpu().detach().numpy()
    box_pred = box_transform_inv(self.anchors, delta)  # predict boxes from anchors and offsets
    score_pred = F.softmax(pred_conf, dim=2)[0, :, 1].cpu().detach().numpy()  # predicted classification score

    def change(r):
        return np.maximum(r, 1. / r)  # element-wise maximum of r and 1 / r

    def sz(w, h):
        pad = (w + h) * 0.5
        sz2 = (w + pad) * (h + pad)
        return np.sqrt(sz2)

    def sz_wh(wh):
        pad = (wh[0] + wh[1]) * 0.5
        sz2 = (wh[0] + pad) * (wh[1] + pad)
        return np.sqrt(sz2)

    s_c = change(sz(box_pred[:, 2], box_pred[:, 3]) / (sz_wh(self.target_sz * scale_x)))  # scale penalty
    r_c = change((self.target_sz[0] / self.target_sz[1]) / (box_pred[:, 2] / box_pred[:, 3]))  # ratio penalty
    penalty = np.exp(-(r_c * s_c - 1.) * config.penalty_k)  # combined scale and ratio penalty
    pscore = penalty * score_pred  # per-anchor classification score times penalty
    pscore = pscore * (1 - config.window_influence) + self.window * config.window_influence  # blend with the cosine window
    best_pscore_id = np.argmax(pscore)  # index of the highest score
    target = box_pred[best_pscore_id, :] / scale_x  # target x, y, w, h relative to the previous pos as (0, 0)
    lr = penalty[best_pscore_id] * score_pred[best_pscore_id] * config.lr_box  # learning rate for the box update
    res_x = np.clip(target[0] + self.pos[0], 0, frame.shape[1])  # w = frame.shape[1]
    res_y = np.clip(target[1] + self.pos[1], 0, frame.shape[0])  # h = frame.shape[0]
    res_w = np.clip(self.target_sz[0] * (1 - lr) + target[2] * lr, config.min_scale * self.origin_target_sz[0],
                    config.max_scale * self.origin_target_sz[0])
    res_h = np.clip(self.target_sz[1] * (1 - lr) + target[3] * lr, config.min_scale * self.origin_target_sz[1],
                    config.max_scale * self.origin_target_sz[1])
    self.pos = np.array([res_x, res_y])  # updated position
    self.target_sz = np.array([res_w, res_h])
    bbox = np.array([res_x, res_y, res_w, res_h])
    self.bbox = (  # cx, cy, w, h
        np.clip(bbox[0], 0, frame.shape[1]).astype(np.float64),
        np.clip(bbox[1], 0, frame.shape[0]).astype(np.float64),
        np.clip(bbox[2], 10, frame.shape[1]).astype(np.float64),
        np.clip(bbox[3], 10, frame.shape[0]).astype(np.float64))
    bbox = np.array([  # tr-x,tr-y w,h
        self.pos[0] + 1 - (self.target_sz[0] - 1) / 2,
        self.pos[1] + 1 - (self.target_sz[1] - 1) / 2,
        self.target_sz[0], self.target_sz[1]])
    # return self.bbox, score_pred[best_pscore_id]
    return bbox
```

`update` is the core of the SiamRPN tracker: it updates the target position and size in every frame, going from processing the current frame, through the model prediction, to adjusting the tracking box.

- Process the current frame: `get_instance_image` extracts an instance image from the current frame, cropped and resized around the target position predicted in the previous frame.
- Apply image transforms:
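`self.transforms` is applied to the instance image.
- Model prediction: `self.model.track` runs on the processed instance image and outputs the predicted scores (`pred_score`) and regression (`pred_regression`).
- Process the prediction: the outputs are reshaped, and the inverse transform (`box_transform_inv`) applies the predicted offsets to the anchors to obtain the predicted boxes (`box_pred`).
- Scale and ratio penalties: `s_c` and `r_c` are the scale penalty and the aspect-ratio penalty.
- Apply the penalty and cosine window: the score of each anchor is multiplied by the penalty and then blended with the cosine window as a weighted sum controlled by `config.window_influence`.
- Select the best prediction: the anchor with the highest resulting score is chosen.
- Update position and size: the target position (`self.pos`) and size (`self.target_sz`) are updated from the best prediction, with the size smoothed by the learning rate `lr`.
- Return the new box: the method returns the updated bounding box.

In each frame, `update` thus combines the model prediction with the state from the previous frame to update the target position and size, which lets the tracker follow the motion and changes of the target through the video.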
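
The penalty and cosine-window step of `update` can be tried in isolation. The following is a minimal NumPy sketch under stated assumptions, not the post's implementation: the helper names `padded_size` and `rerank` and all numeric values are hypothetical, and `target_sz` is assumed to be already scaled to crop coordinates (the role of `self.target_sz * scale_x` in the code above).

```python
import numpy as np

def change(r):
    """Symmetric ratio: returns a value >= 1 whether r is above or below 1."""
    return np.maximum(r, 1.0 / r)

def padded_size(w, h):
    """Side length of the context-padded square, same formula as sz() and sz_wh()."""
    pad = (w + h) * 0.5
    return np.sqrt((w + pad) * (h + pad))

def rerank(box_pred, score_pred, target_sz, window, penalty_k, window_influence):
    """Penalize proposals whose scale or aspect ratio differs from the previous target.

    box_pred:  (N, 4) proposals as [dx, dy, w, h] in crop coordinates.
    target_sz: (2,) previous target size [w, h] in crop coordinates (assumed pre-scaled).
    window:    (N,) cosine-window weight for each proposal.
    """
    s_c = change(padded_size(box_pred[:, 2], box_pred[:, 3])
                 / padded_size(target_sz[0], target_sz[1]))      # scale change
    r_c = change((target_sz[0] / target_sz[1])
                 / (box_pred[:, 2] / box_pred[:, 3]))            # aspect-ratio change
    penalty = np.exp(-(r_c * s_c - 1.0) * penalty_k)
    pscore = penalty * score_pred
    # Weighted sum with the window, as in update().
    pscore = pscore * (1 - window_influence) + window * window_influence
    return int(np.argmax(pscore)), pscore

# Hypothetical values: proposal 1 has the higher raw score but a very different shape.
box_pred = np.array([[0.0, 0.0, 50.0, 50.0],
                     [2.0, 1.0, 100.0, 25.0]])
score_pred = np.array([0.80, 0.85])
target_sz = np.array([50.0, 50.0])
window = np.array([1.0, 0.5])
best, pscore = rerank(box_pred, score_pred, target_sz, window,
                      penalty_k=0.2, window_influence=0.4)
print(best, pscore)
```

In this example the proposal with the slightly higher raw score loses after reranking, because its size and aspect ratio differ strongly from the previous target and it sits farther from the window center.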

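The label assignment described for `compute_target` in section 3.1 can be sketched in the same way. The post does not show `compute_iou` or `box_transform`, so the versions below are assumptions: `iou_cxcywh` is a plain IoU for center-format boxes, `encode` uses the conventional RPN offset parameterization, and the thresholds 0.6 and 0.3 are hypothetical stand-ins for `config.pos_threshold` and `config.neg_threshold`.

```python
import numpy as np

def iou_cxcywh(anchors, box):
    """IoU between each anchor (N, 4) and one box (4,), all as [cx, cy, w, h]."""
    ax1 = anchors[:, 0] - anchors[:, 2] / 2
    ay1 = anchors[:, 1] - anchors[:, 3] / 2
    ax2 = anchors[:, 0] + anchors[:, 2] / 2
    ay2 = anchors[:, 1] + anchors[:, 3] / 2
    bx1, by1 = box[0] - box[2] / 2, box[1] - box[3] / 2
    bx2, by2 = box[0] + box[2] / 2, box[1] + box[3] / 2
    iw = np.clip(np.minimum(ax2, bx2) - np.maximum(ax1, bx1), 0, None)
    ih = np.clip(np.minimum(ay2, by2) - np.maximum(ay1, by1), 0, None)
    inter = iw * ih
    union = anchors[:, 2] * anchors[:, 3] + box[2] * box[3] - inter
    return inter / union

def encode(anchors, box):
    """Conventional RPN offsets; assumed here, box_transform itself is not shown in the post."""
    dx = (box[0] - anchors[:, 0]) / anchors[:, 2]
    dy = (box[1] - anchors[:, 1]) / anchors[:, 3]
    dw = np.log(box[2] / anchors[:, 2])
    dh = np.log(box[3] / anchors[:, 3])
    return np.stack([dx, dy, dw, dh], axis=1)

def assign_labels(iou, pos_threshold, neg_threshold):
    """1 = positive, 0 = negative, -1 = ignored (IoU between the two thresholds)."""
    label = np.full(iou.shape, -1, dtype=np.int64)
    label[iou > pos_threshold] = 1
    label[iou < neg_threshold] = 0
    return label

anchors = np.array([[0.0, 0.0, 10.0, 10.0],     # identical to the box
                    [4.0, 0.0, 10.0, 10.0],     # partial overlap
                    [50.0, 50.0, 10.0, 10.0]])  # no overlap
box = np.array([0.0, 0.0, 10.0, 10.0])

iou = iou_cxcywh(anchors, box)
labels = assign_labels(iou, pos_threshold=0.6, neg_threshold=0.3)
targets = encode(anchors, box)
print(iou, labels, targets[1])
```

The middle anchor overlaps the box with an IoU between the two thresholds, so it keeps the label -1 and would be ignored by the classification loss.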