Mobilenet

Mobilenet v1

Depthwise Separable Convolution

Standard convolutions have a computational cost of:

$D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$

where the computational cost depends multiplicatively on the number of input channels $M$, the number of output channels $N$, the kernel size $D_K \cdot D_K$, and the feature map size $D_F \cdot D_F$.

Depthwise convolution is extremely efficient relative to standard convolution. However, it only filters the input channels; it does not combine them to create new features. So an additional layer that computes a linear combination of the output of the depthwise convolution via a $1 \times 1$ convolution is needed in order to generate these new features.

The combination of depthwise convolution and $1 \times 1$ (pointwise) convolution is called depthwise separable convolution.

Depthwise separable convolutions cost:

$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$
  • $D_F$ is the spatial width and height of a square input feature map

  • $M$ is the number of input channels (input depth)

  • $D_G$ is the spatial width and height of a square output feature map

  • $N$ is the number of output channels (output depth)
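
As a quick sanity check, here is a small Python sketch with hypothetical layer sizes (not taken from the paper) that plugs numbers into both cost formulas; their ratio works out to $1/N + 1/D_K^2$, the reduction factor reported in the MobileNet paper:

    # Hypothetical layer: 3x3 kernel, 112x112 feature map, 64 -> 128 channels
    D_K, D_F, M, N = 3, 112, 64, 128

    standard  = D_K * D_K * M * N * D_F * D_F                    # 924,844,032 mult-adds
    separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F    # 109,985,792 mult-adds

    print(separable / standard)      # ~0.1189
    print(1 / N + 1 / D_K ** 2)      # ~0.1189, the same reduction factor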

Depth Multiplier: Thinner Models

For a given layer and depth multiplier $\alpha$, the number of input channels $M$ becomes $\alpha M$ and the number of output channels $N$ becomes $\alpha N$.
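
With the multiplier applied, the cost of a depthwise separable layer becomes

$D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$

so computation and parameter count drop roughly quadratically, by about $\alpha^2$ (the MobileNet paper calls $\alpha$ the width multiplier).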

Mobilenet v2

Inverted residuals

  • Use shortcuts directly between the bottlenecks.

  • The ratio between the size of the input bottleneck and the inner size is referred to as the expansion ratio.

    When stride = 1, the block includes a residual shortcut:

    from tensorflow.keras.layers import (Add, BatchNormalization, Conv2D,
                                         DepthwiseConv2D, ReLU)

    def bottleneck_block(x, expand=64, squeeze=16):
        # 1x1 pointwise conv expands the channels (expansion ratio = expand / squeeze)
        m = Conv2D(expand, (1, 1), padding='same')(x)
        m = BatchNormalization()(m)
        m = ReLU(max_value=6.0)(m)                      # ReLU6
        # 3x3 depthwise conv filters each channel independently
        m = DepthwiseConv2D((3, 3), padding='same')(m)
        m = BatchNormalization()(m)
        m = ReLU(max_value=6.0)(m)
        # 1x1 linear bottleneck projects back down (no non-linearity here)
        m = Conv2D(squeeze, (1, 1), padding='same')(m)
        m = BatchNormalization()(m)
        # residual shortcut between bottlenecks; x must already have `squeeze` channels
        return Add()([m, x])

    When stride = 2, there is no shortcut: the depthwise convolution halves the spatial resolution, so the input can no longer be added to the output. A minimal sketch follows.
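
    A minimal sketch of the stride-2 variant (for illustration only; the layer widths here are placeholders), reusing the Keras layers imported above:

    def bottleneck_block_s2(x, expand=64, squeeze=16):
        # same expand -> depthwise -> linear-projection pattern, but the
        # depthwise conv uses stride 2: spatial size halves, so the input
        # cannot be added back and no shortcut is used
        m = Conv2D(expand, (1, 1), padding='same')(x)
        m = BatchNormalization()(m)
        m = ReLU(max_value=6.0)(m)
        m = DepthwiseConv2D((3, 3), strides=(2, 2), padding='same')(m)
        m = BatchNormalization()(m)
        m = ReLU(max_value=6.0)(m)
        m = Conv2D(squeeze, (1, 1), padding='same')(m)
        return BatchNormalization()(m)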

  • Why use an expansion ratio of 6, apply ReLU in the expanded dimension, and then use shortcuts directly between the bottlenecks?

    • From the paper, the authors summarized that:

      1. If the manifold of interest remains non-zero volume after ReLU transformation, it corresponds to a linear transformation.

      2. ReLU is capable of preserving complete information about the input manifold, but only if the input manifold lies in a low-dimensional subspace of the input space.

    • If we have lots of channels and there is structure in the activation manifold, information destroyed by ReLU in some channels might still be preserved in the other channels (see the toy sketch after this list).

    • Inspired by the intuition that the bottlenecks actually contain all the necessary information, while an expansion layer acts merely as an implementation detail that accompanies a non-linear transformation of the tensor, the authors use shortcuts directly between the bottlenecks.

  • Comparison of Mobilenet v1 and Mobilenet v2
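
To make the two summarized points a bit more concrete, here is a toy sketch (made-up data and dimensions, not from the paper): 2-D points are expanded into $n$ dimensions with a random matrix and passed through ReLU. With few channels, a noticeable fraction of points is clipped to the zero vector and their information is lost entirely; with many channels this almost never happens, which is the crudest version of the argument for applying ReLU only in the expanded, high-dimensional space.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 10000))          # 2-D "manifold of interest"

    for n in (2, 3, 5, 15, 30):                  # number of channels after expansion
        T = rng.standard_normal((n, 2))          # random expansion to n dimensions
        y = np.maximum(T @ x, 0)                 # ReLU applied in the expanded space
        dead = np.mean(np.all(y == 0, axis=0))   # points mapped to the zero vector
        print(f"n={n:2d}  fraction of points fully zeroed: {dead:.3f}")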

The bottleneck blocks appear similar to a residual block, where each block contains an input followed by several bottlenecks and then an expansion (detailed code here).

[Figure: inverted residuals in MobileNet v2]
[Figure: MobileNet v2 structure]
[Figure: comparison of MobileNet v1 and MobileNet v2]