Fully Convolutional Networks
Jon Long and Evan Shelhamer
CVPR15 Caffe Tutorial
pixels in, pixels out
monocular depth estimation (Liu et al. 2015)
semantic segmentation
boundary prediction (Xie & Tu 2015)
< 1/5 second
???
end-to-end learning
a classification network
“tabby cat”
becoming fully convolutional
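The conversion can be sketched in NumPy (toy dimensions are assumptions, not real network shapes): reshaping a fully connected layer's weight matrix into a filter bank reproduces the fc output on the original input size, and sliding the same filters over a larger input yields a dense grid of scores.

```python
import numpy as np

# Toy dimensions (assumptions, not real network shapes).
C, H, W, K = 2, 3, 3, 4                     # channels, fc receptive field, classes
rng = np.random.default_rng(0)
w_fc = rng.standard_normal((K, C * H * W))  # fully connected weights
w_conv = w_fc.reshape(K, C, H, W)           # the same weights as conv filters

def conv_valid(x, w):
    """Naive 'valid' cross-correlation: x is (C, H, W), w is (K, C, kh, kw)."""
    kh, kw = w.shape[2], w.shape[3]
    out = np.empty((w.shape[0], x.shape[1] - kh + 1, x.shape[2] - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = (w * x[:, i:i+kh, j:j+kw]).sum(axis=(1, 2, 3))
    return out

# On the original input size, the conv layer reproduces the fc layer exactly.
x = rng.standard_normal((C, H, W))
assert np.allclose(conv_valid(x, w_conv)[:, 0, 0], w_fc @ x.ravel())

# On a larger input, the same weights yield a dense grid of class scores.
x_big = rng.standard_normal((C, 5, 6))
print(conv_valid(x_big, w_conv).shape)      # (4, 3, 4)
```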
upsampling output
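Upsampling in-network is interpolation computed by a deconvolution ("backward convolution") layer. A minimal single-channel sketch of the bilinear kernel and a naive transposed convolution, assuming square filters:

```python
import numpy as np

def upsample_filt(size):
    """Bilinear interpolation kernel of the given side length."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

def upsample(x, factor):
    """Naive stride-`factor` transposed convolution with a bilinear kernel."""
    k = upsample_filt(2 * factor - factor % 2)
    H, W = x.shape
    out = np.zeros((factor * (H - 1) + k.shape[0],
                    factor * (W - 1) + k.shape[1]))
    for i in range(H):
        for j in range(W):
            # Each input pixel scatters a scaled copy of the kernel.
            out[i*factor:i*factor+k.shape[0],
                j*factor:j*factor+k.shape[1]] += x[i, j] * k
    return out

out = upsample(np.ones((3, 3)), 2)
print(out[3, 3])   # 1.0: bilinear weights partition unity in the interior
```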
end-to-end, pixels-to-pixels network
conv, pool, nonlinearity
upsampling
pixelwise output + loss
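The pixelwise loss is an ordinary softmax cross-entropy applied independently at every spatial position. A minimal NumPy sketch (toy shapes assumed):

```python
import numpy as np

def pixelwise_softmax_loss(scores, labels):
    """scores: (K, H, W) class scores; labels: (H, W) integer targets."""
    s = scores - scores.max(axis=0, keepdims=True)          # stabilize
    logp = s - np.log(np.exp(s).sum(axis=0, keepdims=True)) # log-softmax per pixel
    ys, xs = np.indices(labels.shape)
    return -logp[labels, ys, xs].mean()                     # mean over pixels

# Uniform scores over 3 classes give loss log(3) at every pixel.
scores = np.zeros((3, 2, 2))
labels = np.zeros((2, 2), dtype=int)
print(pixelwise_softmax_loss(scores, labels))  # log(3) ≈ 1.0986
```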
spectrum of deep features
combine where (local, shallow) with what (global, deep)
fuse features into deep jet
(cf. Hariharan et al. CVPR15 “hypercolumn”)
skip layers
skip to fuse layers! interp + sum
end-to-end, joint learning of semantics and location
dense output
skip layer refinement
[figure: input image and ground truth vs. predictions at stride 32 (no skips), stride 16 (1 skip), and stride 8 (2 skips)]
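Skip fusion itself is just interp + sum on score maps. A shape-level sketch (nearest-neighbor upsampling stands in for the bilinear deconvolution, and the Crop alignment step is omitted):

```python
import numpy as np

K = 21                                    # classes (PASCAL VOC)
coarse = np.random.rand(K, 4, 4)          # stride-32 score map (deep: "what")
fine = np.random.rand(K, 8, 8)            # stride-16 score map (shallow: "where")

up = np.kron(coarse, np.ones((1, 2, 2)))  # 2x upsampling (nearest, for brevity)
fused = up + fine                         # elementwise sum fuses the two
print(fused.shape)                        # (21, 8, 8)
```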
training + testing
- train full image at a time without patch sampling
- reshape network to take input of any size
- forward time is ~150ms for 500 x 500 x 21 output
[figure: input images with FCN, SDS*, and ground-truth segmentations]
Relative to prior state-of-the-art SDS:
- 20% improvement in mean IoU
- 286× faster
*Simultaneous Detection and Segmentation
Hariharan et al. ECCV14
models + code
fully convolutional networks are fast, end-to-end models for pixelwise problems
- code in Caffe branch (merged soon)
- models for PASCAL VOC, NYUDv2, SIFT Flow, PASCAL-Context in the Model Zoo
caffe.berkeleyvision.org
github.com/BVLC/caffe
fcn.berkeleyvision.org
models
- PASCAL VOC: standard benchmark for object segmentation
- NYUDv2: multi-modal RGB + depth scene segmentation
- SIFT Flow: multi-task semantic + geometric segmentation
- PASCAL-Context: object + scene segmentation
inference
inference script (gist)
solving
solving script (gist)
Reshape
- Decide shape on-the-fly in C++ / Python / MATLAB
- DataLayer automatically reshapes for batch size == 1
- Essentially free (only reallocates when necessary)
Helpful Layers
- Losses can take spatial predictions + truths
- Deconvolution / “backward convolution” can compute interpolation
- Crop: maps coordinates between layers
FCN for Pose Estimation
Georgia Gkioxari
UC Berkeley
Input data: image, keypoints, labels
Define an area around the keypoint as its positive neighborhood with radius r.
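Label construction can be sketched as follows. The disk test is an assumption about how the neighborhood is rasterized; the radius follows the 0.1 * im.shape[0] rule given in the details slide:

```python
import numpy as np

def keypoint_label(h, w, kp, r):
    """Binary (h, w) map: 1 inside the disk of radius r around kp = (y, x)."""
    ys, xs = np.ogrid[:h, :w]
    return ((ys - kp[0]) ** 2 + (xs - kp[1]) ** 2 <= r ** 2).astype(np.uint8)

h, w = 100, 80
r = 0.1 * h                               # radius = 0.1 * im.shape[0]
label = keypoint_label(h, w, (40, 30), r)
print(label[40, 30], label.sum())         # center is positive; area ≈ pi * r^2
```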
Heat Map Predictions from FCN
[figure: test image with heat maps for right ankle, knee, hip, wrist, elbow, and shoulder]
Two modes because there are two right shoulders in the image!
Heat Maps to Keypoints
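The slides don't spell out the decoder; a common minimal choice is the per-map argmax, sketched here:

```python
import numpy as np

def decode(heatmaps):
    """heatmaps: (K, H, W) -> list of (y, x) argmax coordinates per keypoint."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1).argmax(axis=1)  # flat index of each peak
    return [divmod(int(i), W) for i in flat]       # back to (row, col)

hm = np.zeros((2, 5, 6))
hm[0, 1, 2] = 1.0
hm[1, 4, 0] = 1.0
print(decode(hm))  # [(1, 2), (4, 0)]
```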
PCK @ 0.2 on the LSP test set (FCN baseline ≈ 69%; state of the art ≈ 72%):

Keypoint   PCK
Ankle      56.5
Knee       60.0
Hip        56.6
Wrist      62.9
Elbow      71.8
Shoulder   78.8
Head       93.6
Details
Architecture:
● FCN-32s (stride 32); no data augmentation
● radius = 0.1 * im.shape[0] (no cross-validation)
Runtime on a K40:
● 0.7 sec/iteration for training (15hrs for 80K iterations)
● 0.25 sec/image for inference for all keypoints
conclusion
fully convolutional networks are fast, end-to-end models for pixelwise problems
- code in Caffe branch (merged soon)
- models for PASCAL VOC, NYUDv2, SIFT Flow, PASCAL-Context in the Model Zoo
caffe.berkeleyvision.org
github.com/BVLC/caffe
fcn.berkeleyvision.org