deep learning - Caffe reshape / upsample fully connected layer


Assuming I have a layer like this:

layer {
  name: "fully-connected"
  type: "InnerProduct"
  bottom: "bottom"
  top: "top"
  inner_product_param {
    num_output: 1
  }
}

The output is batch_size x 1. In several papers (for example link1, page 3, picture at the top, or link2, page 4, at the top) I have seen such a layer used at the end of a network to arrive at a 2D image for pixel-wise prediction. How is it possible to transform this into a 2D image? I am thinking of reshape or deconvolution, but I cannot figure out how they would work. A simple example would be helpful.
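To make it concrete, something like this is what I have in mind, though I am not sure it is correct:

layer {
  name: "fully-connected"
  type: "InnerProduct"
  bottom: "bottom"
  top: "fc"
  inner_product_param {
    num_output: 6       # e.g. 6 = 2*3 values, one per pixel of a tiny 2x3 "image"
  }
}
layer {
  name: "reshape"
  type: "Reshape"
  bottom: "fc"
  top: "fc_2d"
  reshape_param {
    shape { dim: 0 dim: 1 dim: 2 dim: 3 }   # dim: 0 keeps the batch size -> batch x 1 x 2 x 3
  }
}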

UPDATE: The input images are 304x228 and the ground_truth (depth images) are 74x55.

################# Main net ##################

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" }
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" }
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" }
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer { name: "relu5" type: "ReLU" bottom: "conv5" top: "conv5" }
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4096
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer { name: "relufc6" type: "ReLU" bottom: "fc6" top: "fc6" }
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4070
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer {
  name: "reshape"
  type: "Reshape"
  bottom: "fc7"
  top: "fc7_reshaped"
  reshape_param {
    shape { dim: 1 dim: 1 dim: 55 dim: 74 }
  }
}
layer {
  name: "deconv1"
  type: "Deconvolution"
  bottom: "fc7_reshaped"
  top: "deconv1"
  convolution_param {
    num_output: 64
    kernel_size: 5
    pad: 2
    stride: 1
    # group: 256
    weight_filler { type: "bilinear" }
    bias_term: false
  }
}

#########################

layer {
  name: "conv6"
  type: "Convolution"
  bottom: "data"
  top: "conv6"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 63
    kernel_size: 9
    stride: 2
    pad: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer { name: "relu6" type: "ReLU" bottom: "conv6" top: "conv6" }
layer {
  name: "pool6"
  type: "Pooling"
  bottom: "conv6"
  top: "pool6"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}

########################

layer {
  name: "concat"
  type: "Concat"
  bottom: "deconv1"
  bottom: "pool6"
  top: "concat"
  concat_param { concat_dim: 1 }
}
layer {
  name: "conv7"
  type: "Convolution"
  bottom: "concat"
  top: "conv7"
  convolution_param {
    num_output: 64
    kernel_size: 5
    pad: 2
    stride: 1
    weight_filler { type: "gaussian" std: 0.011 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "conv7"
  top: "conv7"
  relu_param {
    negative_slope: 0.01
    engine: CUDNN
  }
}
layer {
  name: "conv8"
  type: "Convolution"
  bottom: "conv7"
  top: "conv8"
  convolution_param {
    num_output: 64
    kernel_size: 5
    pad: 2
    stride: 1
    weight_filler { type: "gaussian" std: 0.011 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "relu8"
  type: "ReLU"
  bottom: "conv8"
  top: "conv8"
  relu_param {
    negative_slope: 0.01
    engine: CUDNN
  }
}
layer {
  name: "conv9"
  type: "Convolution"
  bottom: "conv8"
  top: "conv9"
  convolution_param {
    num_output: 1
    kernel_size: 5
    pad: 2
    stride: 1
    weight_filler { type: "gaussian" std: 0.011 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "relu9"
  type: "ReLU"
  bottom: "conv9"
  top: "result"
  relu_param {
    negative_slope: 0.01
    engine: CUDNN
  }
}

Log:

I1108 19:34:57.239722  4277 data_layer.cpp:41] output data size: 1,1,228,304
I1108 19:34:57.243340  4277 data_layer.cpp:41] output data size: 1,1,55,74
I1108 19:34:57.247392  4277 net.cpp:150] Setting up conv1
I1108 19:34:57.247407  4277 net.cpp:157] Top shape: 1 96 55 74 (390720)
I1108 19:34:57.248191  4277 net.cpp:150] Setting up pool1
I1108 19:34:57.248196  4277 net.cpp:157] Top shape: 1 96 27 37 (95904)
I1108 19:34:57.253263  4277 net.cpp:150] Setting up conv2
I1108 19:34:57.253276  4277 net.cpp:157] Top shape: 1 256 27 37 (255744)
I1108 19:34:57.254202  4277 net.cpp:150] Setting up pool2
I1108 19:34:57.254220  4277 net.cpp:157] Top shape: 1 256 13 18 (59904)
I1108 19:34:57.269943  4277 net.cpp:150] Setting up conv3
I1108 19:34:57.269961  4277 net.cpp:157] Top shape: 1 384 13 18 (89856)
I1108 19:34:57.285303  4277 net.cpp:150] Setting up conv4
I1108 19:34:57.285338  4277 net.cpp:157] Top shape: 1 384 13 18 (89856)
I1108 19:34:57.294801  4277 net.cpp:150] Setting up conv5
I1108 19:34:57.294841  4277 net.cpp:157] Top shape: 1 256 13 18 (59904)
I1108 19:34:57.295207  4277 net.cpp:150] Setting up pool5
I1108 19:34:57.295210  4277 net.cpp:157] Top shape: 1 256 6 9 (13824)
I1108 19:34:57.743222  4277 net.cpp:150] Setting up fc6
I1108 19:34:57.743259  4277 net.cpp:157] Top shape: 1 4096 (4096)
I1108 19:34:57.881680  4277 net.cpp:150] Setting up fc7
I1108 19:34:57.881718  4277 net.cpp:157] Top shape: 1 4070 (4070)
I1108 19:34:57.881826  4277 net.cpp:150] Setting up reshape
I1108 19:34:57.881846  4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
I1108 19:34:57.884768  4277 net.cpp:150] Setting up conv6
I1108 19:34:57.885309  4277 net.cpp:150] Setting up pool6
I1108 19:34:57.885327  4277 net.cpp:157] Top shape: 1 63 55 74 (256410)
I1108 19:34:57.885395  4277 net.cpp:150] Setting up concat
I1108 19:34:57.885412  4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
I1108 19:34:57.886759  4277 net.cpp:150] Setting up conv7
I1108 19:34:57.886786  4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
I1108 19:34:57.897269  4277 net.cpp:150] Setting up conv8
I1108 19:34:57.897303  4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
I1108 19:34:57.899129  4277 net.cpp:150] Setting up conv9
I1108 19:34:57.899138  4277 net.cpp:157] Top shape: 1 1 55 74 (4070)

The value of num_output of the last fully connected layer will not be 1 for pixel-wise prediction. It will be equal to w*h of the predicted image (here 55*74 = 4070).

What made you feel the value will be 1?

EDIT 1:

Below are the dimensions of each layer mentioned in the link1, page 3 figure:

Layer      Output dim [c*h*w]
coarse1    96*h1*w1     conv layer
coarse2    256*h2*w2    conv layer
coarse3    384*h3*w3    conv layer
coarse4    384*h4*w4    conv layer
coarse5    256*h5*w5    conv layer
coarse6    4096*1*1     fc layer
coarse7    x*1*1        fc layer, where 'x' is interpreted as w*h

To understand this further, let's assume we have a network to predict the pixels of an image. The images are of size 10*10. Thus, the final output of the fc layer will have a dimension of 100*1*1 (like in coarse7). This can be interpreted as 10*10, as shown in the sketch below.
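As a minimal sketch (the layer and blob names here are placeholders, not taken from the net above):

layer {
  name: "fc_pred"
  type: "InnerProduct"
  bottom: "features"
  top: "fc_pred"        # shape: batch x 100, read as a flattened 10*10 image
  inner_product_param {
    num_output: 100     # 100 = 10*10, one output value per pixel
  }
}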

Now the question would be: how can a 1D array predict a 2D image correctly? For this, note that the loss is calculated on this output, using the labels corresponding to the pixel data. Thus, during training, the weights will learn to predict the pixel data.
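One way to wire this up is to flatten the labels and compare them directly with the 1D prediction. A sketch, assuming the 10*10 labels arrive in a blob named "label" and using EuclideanLoss as one possible regression loss:

layer {
  name: "flat_label"
  type: "Flatten"
  bottom: "label"
  top: "flat_label"     # batch x 100, matching the shape of fc_pred
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc_pred"
  bottom: "flat_label"
  top: "loss"
}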

EDIT 2:

Trying to draw the net using draw_net.py in Caffe gives this: [image of the drawn net]
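(For reference, the script is typically invoked like this from the Caffe root directory; the file names are placeholders:)

python python/draw_net.py net.prototxt net.png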

The ReLU layers connected to conv6 and fc6 have the same name, leading to complicated connectivity in the drawn image. I am not sure whether this will cause issues during training, but I suggest renaming one of the ReLU layers to a unique name to avoid unforeseen issues.
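For example, giving each ReLU a distinct name would look like this (the names are only suggestions):

layer { name: "relu_fc6"   type: "ReLU" bottom: "fc6"   top: "fc6" }
layer { name: "relu_conv6" type: "ReLU" bottom: "conv6" top: "conv6" }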

Coming back to your question, there doesn't seem to be any upsampling happening after the fully connected layers, as seen in the log:

I1108 19:34:57.881680  4277 net.cpp:150] Setting up fc7
I1108 19:34:57.881718  4277 net.cpp:157] Top shape: 1 4070 (4070)
I1108 19:34:57.881826  4277 net.cpp:150] Setting up reshape
I1108 19:34:57.881846  4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
I1108 19:34:57.884768  4277 net.cpp:150] Setting up conv6
I1108 19:34:57.885309  4277 net.cpp:150] Setting up pool6
I1108 19:34:57.885327  4277 net.cpp:157] Top shape: 1 63 55 74 (256410)

fc7 has an output dimension of 4070*1*1. It is being reshaped to 1*55*74 and passed as input to the deconv1 layer. Note that deconv1 uses kernel_size 5, pad 2, stride 1, which keeps the 55*74 size unchanged, so no upsampling takes place there either.
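If actual upsampling were wanted at this point, the Deconvolution layer would need a stride greater than 1. As a sketch, a fixed 2x bilinear upsampling layer could look like this (the layer name, top name, and parameters are my assumptions, not taken from your net):

layer {
  name: "upsample"
  type: "Deconvolution"
  bottom: "fc7_reshaped"
  top: "upsampled"        # 1 x 1 x 110 x 148 from a 1 x 1 x 55 x 74 bottom
  param { lr_mult: 0 }    # freeze the kernel: a fixed bilinear resize, not learned
  convolution_param {
    num_output: 1
    kernel_size: 4        # kernel 4, stride 2, pad 1 gives exactly 2x upsampling
    stride: 2
    pad: 1
    bias_term: false
    weight_filler { type: "bilinear" }
  }
}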

The output of the whole network is produced in conv9, which has an output dimension of 1*55*74, exactly the dimension of the labels (depth data).

Please pinpoint where you feel the upsampling is happening, if the answer is still not clear.

