roidb is a fairly complex data structure that stores the RoI information of a dataset. The original roidb comes from the dataset. get_training_roidb(imdb) in train.py enlarges it by adding horizontally flipped copies, and prepare_roidb(imdb) (defined in roidb.py) then adds a few descriptive attributes to it.

These are working notes on the structure of roidb. They may be corrected as I read further.

roidb is a list of dictionaries. roidb[img_index] holds the RoI information of the image with that index. Taking roidb[img_index] as the example, its keys and values are:

boxes: box coordinates, an np array of shape box_num*4
gt_overlaps: overlap of every box with the ground truth of each class, a box_num*class_num matrix
gt_classes: ground-truth class of every box, length box_num
flipped: whether the image is flipped
image: path of the image, a string
width: image width
height: image height
max_overlaps: for each box, the maximum of its overlaps over all classes, length box_num
max_classes: for each box, the class with the highest overlap, length box_num
bbox_targets: for each box, its class plus the 4 offsets to the closest gt box, 5 columns in total

The two functions that fill in bbox_targets are listed below (Python 2, as in the original source):

```python
# Imports used by these two functions
import numpy as np
from fast_rcnn.config import cfg
from fast_rcnn.bbox_transform import bbox_transform
from utils.cython_bbox import bbox_overlaps


def add_bbox_regression_targets(roidb):
    """Add information needed to train bounding-box regressors."""
    assert len(roidb) > 0
    assert 'max_classes' in roidb[0], 'Did you call prepare_roidb first?'

    num_images = len(roidb)
    # Infer number of classes from the number of columns in gt_overlaps
    # roidb[0] holds the RoIs of image 0; shape[1] has one column per class
    num_classes = roidb[0]['gt_overlaps'].shape[1]
    for im_i in xrange(num_images):
        rois = roidb[im_i]['boxes']
        max_overlaps = roidb[im_i]['max_overlaps']
        max_classes = roidb[im_i]['max_classes']
        # bbox_targets: class of each box plus its 4 offsets to the closest gt box
        roidb[im_i]['bbox_targets'] = \
            _compute_targets(rois, max_overlaps, max_classes)

    # This flag is False in the configuration used here
    if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
        # Use fixed / precomputed "means" and "stds" instead of empirical values
        means = np.tile(
            np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (num_classes, 1))
        stds = np.tile(
            np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (num_classes, 1))
    else:
        # Compute values needed for means and stds
        # var(x) = E(x^2) - E(x)^2
        # Number of boxes seen per class; cfg.EPS guards against division by zero
        class_counts = np.zeros((num_classes, 1)) + cfg.EPS
        # 21 classes x 4 offsets: a box's 4 targets are added to the row of its class
        sums = np.zeros((num_classes, 4))
        # Same layout, but accumulating the squares of the 4 targets
        squared_sums = np.zeros((num_classes, 4))
        for im_i in xrange(num_images):
            targets = roidb[im_i]['bbox_targets']
            for cls in xrange(1, num_classes):
                cls_inds = np.where(targets[:, 0] == cls)[0]
                # Count only the boxes whose class matches cls
                if cls_inds.size > 0:
                    class_counts[cls] += cls_inds.size
                    sums[cls, :] += targets[cls_inds, 1:].sum(axis=0)
                    squared_sums[cls, :] += \
                        (targets[cls_inds, 1:] ** 2).sum(axis=0)

        means = sums / class_counts  # mean
        stds = np.sqrt(squared_sums / class_counts - means ** 2)  # standard deviation

    print 'bbox target means:'
    print means
    print means[1:, :].mean(axis=0)  # ignore bg class
    print 'bbox target stdevs:'
    print stds
    print stds[1:, :].mean(axis=0)  # ignore bg class

    # Normalize targets (applied to every box)
    if cfg.TRAIN.BBOX_NORMALIZE_TARGETS:
        print "Normalizing targets"
        for im_i in xrange(num_images):
            targets = roidb[im_i]['bbox_targets']
            for cls in xrange(1, num_classes):
                cls_inds = np.where(targets[:, 0] == cls)[0]
                roidb[im_i]['bbox_targets'][cls_inds, 1:] -= means[cls, :]
                roidb[im_i]['bbox_targets'][cls_inds, 1:] /= stds[cls, :]
    else:
        print "NOT normalizing targets"

    # These values will be needed for making predictions
    # (the predicts will need to be unnormalized and uncentered)
    return means.ravel(), stds.ravel()  # ravel() flattens to 1-D, in order


def _compute_targets(rois, overlaps, labels):
    """Compute bounding-box regression targets for an image."""
    # rois only contains the boxes of the current image
    # Indices of ground-truth ROIs
    gt_inds = np.where(overlaps == 1)[0]
    if len(gt_inds) == 0:
        # Bail if the image has no ground-truth ROIs (return an all-zero array)
        return np.zeros((rois.shape[0], 5), dtype=np.float32)
    # Indices of examples for which we try to make predictions
    # Only RoIs whose overlap with a gt box reaches the threshold are used
    # as training examples for bounding-box regression
    ex_inds = np.where(overlaps >= cfg.TRAIN.BBOX_THRESH)[0]

    # Get IoU overlap between each ex ROI and gt ROI (inputs are cast to float)
    ex_gt_overlaps = bbox_overlaps(
        np.ascontiguousarray(rois[ex_inds, :], dtype=np.float),
        np.ascontiguousarray(rois[gt_inds, :], dtype=np.float))

    # Find which gt ROI each ex ROI has max overlap with:
    # this will be the ex ROI's gt target
    # Each row is an ex_roi, each column a gt_roi, each value their IoU
    gt_assignment = ex_gt_overlaps.argmax(axis=1)  # index of the row-wise maximum
    gt_rois = rois[gt_inds[gt_assignment], :]  # gt box of each ex_roi, same count as ex_rois
    ex_rois = rois[ex_inds, :]

    targets = np.zeros((rois.shape[0], 5), dtype=np.float32)
    targets[ex_inds, 0] = labels[ex_inds]  # column 0 is the label
    targets[ex_inds, 1:] = bbox_transform(ex_rois, gt_rois)  # columns 1-4 are the offsets
    return targets
```

The two functions are explained below.
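Before going through the functions, it may help to see one roidb entry in concrete form. The following is a minimal sketch with made-up values: plain Python lists stand in for the np arrays, the path is a placeholder, and add_max_fields is a hypothetical helper that only mimics how max_overlaps and max_classes relate to gt_overlaps as described in the key list above.

```python
# Hypothetical roidb entry for one image with 3 boxes and 3 classes
# (class 0 = background). Lists stand in for np arrays in this sketch.
entry = {
    "boxes": [[10, 20, 110, 220], [15, 25, 100, 200], [300, 40, 380, 120]],
    # one row per box, one column per class
    "gt_overlaps": [[0.0, 1.0, 0.0], [0.0, 0.72, 0.0], [0.0, 0.0, 0.1]],
    "gt_classes": [1, 0, 0],       # only the first box is a ground-truth box
    "flipped": False,
    "image": "path/to/image.jpg",  # placeholder path
    "width": 500,
    "height": 375,
}


def add_max_fields(entry):
    """Derive max_overlaps / max_classes row by row from gt_overlaps."""
    rows = entry["gt_overlaps"]
    entry["max_overlaps"] = [max(row) for row in rows]
    entry["max_classes"] = [row.index(max(row)) for row in rows]
    return entry


add_max_fields(entry)
print(entry["max_overlaps"])
print(entry["max_classes"])
```

A box whose max_overlaps value is 1 is a ground-truth box, which is the property _compute_targets relies on.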
1. _compute_targets(rois, overlaps, labels)
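This function computes the regression offsets of the RoIs. It first checks whether the image contains any ground-truth RoIs, which are identified as the boxes whose overlap equals 1 (overlaps == 1).
It then selects the boxes whose overlap reaches the threshold cfg.TRAIN.BBOX_THRESH. For those boxes it calls bbox_overlaps to recompute the overlap with every ground-truth box, and assigns each box the ground-truth box it overlaps most.
bbox_transform from fast_rcnn.bbox_transform then computes the 4 offsets: the x and y offsets of the center, and the width and height (w, h) offsets.
The output is a 2-D array. Each row is a box and has 5 columns: the first is the class, the second to fifth are the offsets.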
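The steps above can be sketched without NumPy for a handful of boxes. This is a simplified illustration, not the library code: iou, box_deltas and compute_targets are hypothetical names, the default threshold of 0.5 is an assumed stand-in for cfg.TRAIN.BBOX_THRESH, and box_deltas follows the usual center/log-size parameterization, which I assume matches bbox_transform.

```python
import math


def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2), using the pixel-inclusive +1 convention."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return iw * ih / float(area_a + area_b - iw * ih)


def box_deltas(ex, gt):
    """Offsets (dx, dy, dw, dh) that map box ex onto box gt."""
    ex_w, ex_h = ex[2] - ex[0] + 1.0, ex[3] - ex[1] + 1.0
    gt_w, gt_h = gt[2] - gt[0] + 1.0, gt[3] - gt[1] + 1.0
    ex_cx, ex_cy = ex[0] + 0.5 * ex_w, ex[1] + 0.5 * ex_h
    gt_cx, gt_cy = gt[0] + 0.5 * gt_w, gt[1] + 0.5 * gt_h
    return [(gt_cx - ex_cx) / ex_w, (gt_cy - ex_cy) / ex_h,
            math.log(gt_w / ex_w), math.log(gt_h / ex_h)]


def compute_targets(rois, overlaps, labels, thresh=0.5):
    """Return one [class, dx, dy, dw, dh] row per box."""
    targets = [[0.0] * 5 for _ in rois]
    gt_inds = [i for i, o in enumerate(overlaps) if o == 1]
    if not gt_inds:
        return targets  # no ground truth in this image: all zeros
    for i, o in enumerate(overlaps):
        if o < thresh:
            continue  # not a regression example, row stays zero
        # ground-truth box with the highest IoU (first one wins on ties)
        best_gt = max(gt_inds, key=lambda g: iou(rois[i], rois[g]))
        targets[i] = [float(labels[i])] + box_deltas(rois[i], rois[best_gt])
    return targets


# Made-up data: box 0 is ground truth, box 1 overlaps it, box 2 is far away.
rois = [[0, 0, 9, 9], [0, 0, 9, 7], [50, 50, 59, 59]]
overlaps = [1.0, 0.8, 0.0]
labels = [3, 3, 0]
targets = compute_targets(rois, overlaps, labels)
for row in targets:
    print(row)
```

In this example the ground-truth box maps onto itself with zero offsets, the second box gets a positive dy and dh because it is shorter than its ground-truth box, and the third box keeps an all-zero row.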
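The code of bbox_overlaps is as follows:

```cython
# DTYPE and DTYPE_t are the module-level float type definitions of the .pyx file
def bbox_overlaps(
        np.ndarray[DTYPE_t, ndim=2] boxes,
        np.ndarray[DTYPE_t, ndim=2] query_boxes):
    """
    Parameters
    ----------
    boxes: (N, 4) ndarray of float
    query_boxes: (K, 4) ndarray of float
    Returns
    -------
    overlaps: (N, K) ndarray of overlap between boxes and query_boxes
    """
    cdef unsigned int N = boxes.shape[0]
    cdef unsigned int K = query_boxes.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
    cdef DTYPE_t iw, ih, box_area
    cdef DTYPE_t ua
    cdef unsigned int k, n
    for k in range(K):
        # area of the k-th query box
        box_area = (
            (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
            (query_boxes[k, 3] - query_boxes[k, 1] + 1)
        )
        for n in range(N):
            # width of the intersection
            iw = (
                min(boxes[n, 2], query_boxes[k, 2]) -
                max(boxes[n, 0], query_boxes[k, 0]) + 1
            )
            if iw > 0:
                # height of the intersection
                ih = (
                    min(boxes[n, 3], query_boxes[k, 3]) -
                    max(boxes[n, 1], query_boxes[k, 1]) + 1
                )
                if ih > 0:
                    # union area = area(box) + area(query box) - intersection
                    ua = float(
                        (boxes[n, 2] - boxes[n, 0] + 1) *
                        (boxes[n, 3] - boxes[n, 1] + 1) +
                        box_area - iw * ih
                    )
                    overlaps[n, k] = iw * ih / ua
    return overlaps
```

2. add_bbox_regression_targets
This function does two main things:
1. It determines the regression offsets bbox_targets for the boxes of every image in roidb.
2. It computes the mean and standard deviation of the offsets for every class. The resulting matrices are 2-D: the rows are the classes (21 here) and the columns are the offsets (4 here).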
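The per-class statistics rely on the identity var(x) = E(x^2) - E(x)^2, so one pass that accumulates counts, sums and squared sums is enough. Below is a small pure-Python sketch of that accumulation. class_stats is a hypothetical name, EPS = 1e-14 is an assumed stand-in for cfg.EPS, and the clamp to zero before the square root is my own guard against rounding, not something taken from the original code.

```python
import math

EPS = 1e-14  # assumed small constant playing the role of cfg.EPS


def class_stats(all_targets, num_classes):
    """Per-class mean and standard deviation of the 4 offsets.

    all_targets: one list per image, each holding [class, dx, dy, dw, dh] rows.
    """
    counts = [EPS] * num_classes  # EPS avoids division by zero for unseen classes
    sums = [[0.0] * 4 for _ in range(num_classes)]
    sq_sums = [[0.0] * 4 for _ in range(num_classes)]
    for targets in all_targets:
        for row in targets:
            cls = int(row[0])
            if cls == 0:
                continue  # background rows carry no regression target
            counts[cls] += 1
            for j in range(4):
                sums[cls][j] += row[1 + j]
                sq_sums[cls][j] += row[1 + j] ** 2
    means = [[s / counts[c] for s in sums[c]] for c in range(num_classes)]
    # var(x) = E(x^2) - E(x)^2, clamped at 0 to absorb rounding error
    stds = [[math.sqrt(max(sq_sums[c][j] / counts[c] - means[c][j] ** 2, 0.0))
             for j in range(4)] for c in range(num_classes)]
    return means, stds


# Made-up targets from two images, 3 classes in total.
image_a = [[1, 0.1, 0.2, 0.0, -0.1], [0, 0.0, 0.0, 0.0, 0.0]]
image_b = [[1, 0.3, 0.0, 0.2, 0.1]]
means, stds = class_stats([image_a, image_b], num_classes=3)
print(means[1])
print(stds[1])
```

Both results have one row per class and 4 columns, which is the num_classes x 4 layout described above.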
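In addition, when the targets are to be normalized, i.e. cfg.TRAIN.BBOX_NORMALIZE_TARGETS is True, the mean is subtracted from every offset and the result is divided by the standard deviation. The function returns the means and standard deviations, each flattened into a 1-D vector.
These values are needed at prediction time for the inverse operation: un-normalizing and then un-centering the predictions.

References: py-faster-rcnn代码阅读3-roidb.py; Faster RCNN roidb.py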
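A short sketch of the normalization and its inverse, assuming the flattened vectors keep the row-major layout of a num_classes x 4 matrix, so that class c occupies positions 4c to 4c+3. The function names and all numbers are illustrative only.

```python
def normalize(deltas, mean, std):
    """Center and scale the 4 offsets: (d - mean) / std, element-wise."""
    return [(d - m) / s for d, m, s in zip(deltas, mean, std)]


def unnormalize(deltas, mean, std):
    """Inverse operation: scale back by std, then add the mean."""
    return [d * s + m for d, m, s in zip(deltas, mean, std)]


def class_slice(flat, cls):
    """The 4 values of class cls inside a flattened (num_classes * 4) vector."""
    return flat[4 * cls: 4 * cls + 4]


# Illustrative flattened statistics for 3 classes (class 0 = background).
flat_means = [0.0] * 4 + [0.0, 0.0, 0.0, 0.0] + [0.01, -0.02, 0.0, 0.03]
flat_stds = [0.0] * 4 + [0.1, 0.1, 0.2, 0.2] + [0.1, 0.1, 0.2, 0.2]

raw = [0.05, -0.1, 0.2, 0.0]  # offsets of one class-1 box
mean, std = class_slice(flat_means, 1), class_slice(flat_stds, 1)
normalized = normalize(raw, mean, std)          # what training would see
restored = unnormalize(normalized, mean, std)   # what prediction undoes
print(normalized)
print(restored)
```

The round trip recovers the raw offsets up to floating-point error, which is why the returned means and standard deviations have to be kept alongside the trained model.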