An overview of VGG16 and NiN models

This post briefly introduces two classic convolutional neural networks, VGG16 and NiN (a.k.a. Network in Network). We are going to go through their architectures as well as their implementations in Keras. You can refer to my previous blogs for related topics: convolutional neural networks, and the LeNet and AlexNet models.

The original VGG paper [1] describes several configurations of increasing depth, from VGG11 to VGG19. In this post, we focus only on VGG16: its architecture and its implementation in Keras. The other configurations are constructed similarly.

The structure of VGG16 is described by the following figure:

Figure 1: The architecture of VGG16. Source: Researchgate.net

VGG16 is composed of 13 convolutional layers, 5 max-pooling layers, and 3 fully connected layers. The number of layers with tunable parameters is therefore 16 (13 convolutional and 3 fully connected), which is where the name VGG16 comes from. The number of filters in the first block is 64, and this number is doubled in the later blocks until it reaches 512. The model ends with two fully connected hidden layers and one output layer. The two fully connected hidden layers each have 4096 neurons. The output layer consists of 1000 neurons, corresponding to the number of categories in the ImageNet dataset. In the next section, we are going to implement this architecture in Keras.

Firstly, we need to import some necessary libraries:
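A minimal set of imports for the sketches in this post might look like the following (assuming the `tensorflow.keras` API; other Keras distributions expose the same layer names):

```python
# Core building blocks used for both the VGG16 and NiN sketches below
from tensorflow.keras import Model
from tensorflow.keras.layers import (
    Input, Conv2D, MaxPooling2D, Flatten, Dense,
    Dropout, Activation, GlobalAveragePooling2D
)
```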

Once all necessary libraries are ready, the model can be implemented by the following function:
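One possible Keras implementation is sketched below; the function name `vgg16`, the default `input_shape`, and the loop over the block configuration are my own choices, but the layer sequence follows the architecture described above:

```python
def vgg16(input_shape=(224, 224, 3), n_classes=1000):
    """VGG16: five convolutional blocks followed by three fully connected layers."""
    inputs = Input(shape=input_shape)
    x = inputs

    # (number of conv layers, number of filters) for each of the five blocks
    block_config = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
    for n_convs, filters in block_config:
        for _ in range(n_convs):
            x = Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
        x = MaxPooling2D((2, 2), strides=2)(x)

    # Classifier: two fully connected hidden layers of 4096 neurons and the output layer
    x = Flatten()(x)
    x = Dense(4096, activation='relu')(x)
    x = Dense(4096, activation='relu')(x)
    outputs = Dense(n_classes, activation='softmax')(x)

    return Model(inputs, outputs, name='vgg16')
```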

Now, let’s see the detailed information in each layer of the model:
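For example, building the model and calling `summary()` prints the output shape and parameter count of every layer:

```python
model = vgg16()
model.summary()  # per-layer output shapes and parameter counts
```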

As the number of filters increases with the model depth, the number of parameters grows significantly in the later layers. In particular, the two fully connected hidden layers are very large, with 102,764,544 and 16,781,312 parameters, respectively, which accounts for 86.4% of the parameters of the whole model.
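These counts can be checked by hand. With a 224 × 224 × 3 input, the last pooling layer outputs a 7 × 7 × 512 tensor, which is flattened before the first fully connected layer:

```python
# Parameter counts of the two fully connected hidden layers (weights + biases)
fc1 = 7 * 7 * 512 * 4096 + 4096   # 102,764,544
fc2 = 4096 * 4096 + 4096          # 16,781,312
```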

Such a large number of parameters may hurt the model's performance, and it sometimes leads to overfitting. A natural question arises: is it possible to replace the fully connected layers with something lighter in order to reduce the model complexity? The NiN model, which we are going to discuss in the next section, is an appropriate answer to this question.

Network in Network (NiN) is a deep convolutional neural network introduced by Min Lin, Qiang Chen, and Shuicheng Yan [2]. Its structure differs from classic CNN models in two respects: the linear convolutional filters are replaced by small "micro networks" built from 1 × 1 convolutions, and the fully connected layers at the end are replaced by a global average pooling layer.

The original NiN network is composed of four NiN blocks. Each block includes three convolutional layers: one standard convolutional layer (with a window size such as 11 × 11, 5 × 5, or 3 × 3) followed by two 1 × 1 convolutional layers, which act as per-pixel fully connected layers.

Each NiN block is followed by a max-pooling layer with a pooling size of 3 × 3 and a stride of 2, except the last block, which is followed by a global average pooling layer.
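Such a block can be written as a small helper function; the name `nin_block` is my own:

```python
def nin_block(x, filters, kernel_size, strides, padding):
    """One NiN block: a standard convolution followed by two 1x1 convolutions."""
    x = Conv2D(filters, kernel_size, strides=strides, padding=padding, activation='relu')(x)
    x = Conv2D(filters, (1, 1), activation='relu')(x)
    x = Conv2D(filters, (1, 1), activation='relu')(x)
    return x
```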

This model can easily be implemented with the following function:
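A possible sketch is given below. The filter counts and window sizes (96 / 11 × 11, 256 / 5 × 5, 384 / 3 × 3) follow the common AlexNet-style configuration of NiN, and the dropout layer before the last block is an optional addition; treat these exact numbers as assumptions rather than the only valid choice:

```python
def nin(input_shape=(224, 224, 3), n_classes=1000):
    """NiN: four NiN blocks, max pooling in between, global average pooling at the end."""
    inputs = Input(shape=input_shape)

    x = nin_block(inputs, 96, (11, 11), strides=4, padding='valid')
    x = MaxPooling2D((3, 3), strides=2)(x)
    x = nin_block(x, 256, (5, 5), strides=1, padding='same')
    x = MaxPooling2D((3, 3), strides=2)(x)
    x = nin_block(x, 384, (3, 3), strides=1, padding='same')
    x = MaxPooling2D((3, 3), strides=2)(x)
    x = Dropout(0.5)(x)

    # Last block: one feature map per class, so global average pooling
    # maps each feature map directly to a category score
    x = nin_block(x, n_classes, (3, 3), strides=1, padding='same')
    x = GlobalAveragePooling2D()(x)
    outputs = Activation('softmax')(x)

    return Model(inputs, outputs, name='nin')
```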

Let’s consider the detailed information (i.e., the output size and the number of parameters) for each layer of the model:
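As before, `summary()` shows this information; note in particular that the global average pooling layer contributes no trainable parameters:

```python
nin_model = nin()
nin_model.summary()
```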

Note that the global average pooling layer has no parameters, so using it instead of fully connected layers significantly reduces the model complexity. The parameter count of this model is much smaller than that of the VGG model.

Conclusion: We have explored the architectures of the VGG and NiN models. The construction of the VGG model is similar to the previous models, LeNet and AlexNet: they are all composed of convolutional layers and pooling layers, terminated by fully connected layers. The reasonable depth extension of VGG makes it outperform its predecessors. However, the number of parameters in the fully connected layers is very large, especially in the case of large-scale image processing. NiN overcomes this drawback by replacing these layers with a global average pooling layer. This layer has several advantages: it is more native to the convolutional structure, enforcing correspondences between feature maps and categories, and it has no parameters to optimize. Hence, using this layer helps to avoid overfitting while training the model. The NiN model also inspired the construction of later modern CNN models that we are going to discuss in the next posts.

I hope that this post was helpful to you! Don’t hesitate to follow my Medium blog for related topics.

Thanks for reading!

My blog page: https://lekhuyen.medium.com/

References:

[1] Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).

[2] Lin, Min, Qiang Chen, and Shuicheng Yan. “Network in network.” arXiv preprint arXiv:1312.4400 (2013).
