What do convolutions do?

In deep learning, a convolutional neural network (CNN or ConvNet) is a class of deep neural network typically used to recognise patterns in images, though CNNs are also used for spatial data analysis, computer vision, natural language processing, signal processing, and various other purposes. The architecture of a convolutional network resembles the connectivity pattern of neurons in the human brain and was inspired by the organisation of the visual cortex.

This specific type of artificial neural network gets its name from one of the most important operations in the network: convolution. What is a convolution? Convolutions have been used for a long time, typically in image processing, to blur and sharpen images, but also to perform other operations.

CNNs make use of filters, also known as kernels, to detect which features, such as edges, are present throughout an image. Convolutional layers apply a convolution operation to the input and pass the result to the next layer.

A convolution converts all the pixels in its receptive field into a single value. For example, if you apply a convolution to an image, you will decrease the image size as well as bring all the information in the field together into a single pixel.
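As a minimal sketch of that idea, here is a plain NumPy implementation of a 2-D convolution (strictly speaking, the cross-correlation that deep-learning libraries compute under the name "convolution"). The image, kernel values, and sizes are illustrative assumptions, not taken from the article:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; each output pixel condenses one receptive field."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the receptive field by the kernel and sum to a single value.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)          # a made-up 8x8 grayscale "image"
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])     # a common edge-detection kernel
print(conv2d(image, kernel).shape)    # (6, 6): the output is smaller than the input
```

Note how the 8x8 input shrinks to 6x6: each 3x3 neighbourhood collapses into one number, exactly the size reduction described above.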

The final output of the convolutional layer is a vector. The deeply complex hierarchical structure of neurons and connections in the brain plays a major role in this process of remembering and labelling objects. Think about how we learned what, for example, an umbrella is.

Or a duck, lamp, candle, or book. In the beginning, our parents or family told us the names of the objects in our direct environment. We learned by examples that were given to us. Slowly but surely, we started to recognise certain things more and more often in our environment. They became so common that the next time we saw them, we would instantly know the object's name.

They became part of our model of the world. Similar to how a child learns to recognise objects, we need to show an algorithm millions of pictures before it is able to generalise the input and make predictions for images it has never seen before. Computers, however, perceive images differently than we do: their world consists only of numbers. Every image can be represented as a 2-dimensional array of numbers, known as pixels.
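To make that concrete, here is a tiny sketch that loads an image as an array of pixel values. The file name is hypothetical, and Pillow is just one convenient library for this:

```python
import numpy as np
from PIL import Image

# Load an image and convert it to grayscale: the result is a 2-D array
# of pixel intensities between 0 and 255. ("umbrella.png" is a hypothetical file.)
img = np.array(Image.open("umbrella.png").convert("L"))
print(img.shape)    # (height, width)
print(img[:3, :3])  # the top-left corner is just numbers
```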

We just have to think of what an image is in a different way. Convolutional Neural Networks are inspired by the brain. Research in the 1950s and 1960s by D. H. Hubel and T. N. Wiesel on the brains of mammals suggested a new model for how mammals perceive the world visually. They showed that cat and monkey visual cortices include neurons that respond exclusively to stimuli in small regions of the visual field.

In their paper, they described two basic types of visual neuron cells in the brain, each of which acts in a different way: simple cells (S cells) and complex cells (C cells).

The simple cells activate, for example, when they identify basic shapes, such as lines, in a fixed area and at a specific angle. The complex cells have larger receptive fields, and their output is not sensitive to the specific position in the field. A complex cell continues to respond to a certain stimulus even though its absolute position on the retina changes. In this case, complex means more flexible.

In vision, the receptive field of a single sensory neuron is the specific region of the retina in which a stimulus will affect the firing of that neuron, that is, will activate the neuron. Every sensory neuron cell has a similar receptive field, and these fields overlap.

Further, the concept of hierarchy plays a significant role in the brain. Information is stored in sequences of patterns, in sequential order. The neocortex, which is the outermost layer of the brain, stores information hierarchically. It is stored in cortical columns, or uniformly organised groupings of neurons in the neocortex. In 1980, a researcher called Fukushima proposed a hierarchical neural network model, which he called the neocognitron. This model was inspired by the concepts of the simple and complex cells.

The neocognitron was able to recognise patterns by learning about the shapes of objects. Later, Yann LeCun and his colleagues built on these ideas; their first Convolutional Neural Network was called LeNet-5 and was able to classify digits from hand-written numbers.

For the entire history of Convolutional Neural Nets, you can go here. In the remainder of this article, I will take you through the architecture of a CNN and show you a Python implementation as well. Convolutional Neural Networks have a different architecture than regular neural networks. Regular neural networks transform an input by putting it through a series of hidden layers.

Every layer is made up of a set of neurons, where each layer is fully connected to all the neurons in the layer before. Finally, there is a last fully-connected layer, the output layer, that represents the predictions. Convolutional Neural Networks are a bit different. First of all, the layers are organised in 3 dimensions: width, height and depth.

Further, the neurons in one layer do not connect to all the neurons in the next layer, but only to a small region of it. Lastly, the final output is reduced to a single vector of probability scores, organised along the depth dimension. In the feature-extraction part, the network performs a series of convolution and pooling operations during which the features are detected, as the sketch below illustrates.
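Here is a minimal sketch of such an architecture in Keras. The framework choice, input shape, and layer sizes are my own illustrative assumptions (e.g. 28x28 grayscale digits and 10 classes), not the article's implementation:

```python
from tensorflow import keras
from tensorflow.keras import layers

# A minimal CNN: 3-D layer volumes, local connectivity, and a probability-vector output.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                       # width, height, depth
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # each neuron sees only a 3x3 region
    layers.MaxPooling2D(pool_size=2),                     # pooling shrinks width and height
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),               # final vector of 10 probability scores
])
model.summary()
```

Each Conv2D neuron connects only to a small region of the previous layer, and the softmax output is exactly the single vector of probability scores described above.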

To understand what the convolution operation itself computes, it helps to work through the classic example of convolving $\sin(t)$ with $\cos(t)$. For functions defined on $[0, \infty)$, the convolution is

$$(f * g)(t) = \int_0^t f(t - \tau)\, g(\tau)\, d\tau,$$

so here we need

$$(\sin * \cos)(t) = \int_0^t \sin(t - \tau)\cos(\tau)\, d\tau.$$

To evaluate this, we break out some trigonometric identities, the kind you will find on the inside cover of any calculus book. Expanding $\sin(t - \tau) = \sin t \cos\tau - \cos t \sin\tau$ and pulling the factors that do not depend on $\tau$ out of the integral gives

$$\sin t \int_0^t \cos^2\tau\, d\tau \;-\; \cos t \int_0^t \sin\tau \cos\tau\, d\tau,$$

where, to be very clear, both integrals run from $\tau = 0$ to $\tau = t$.

For the first integral, the identity $\cos^2\tau = \tfrac{1}{2}(1 + \cos 2\tau)$ makes the antiderivative straightforward:

$$\int_0^t \cos^2\tau\, d\tau = \frac{1}{2}\left[\tau + \frac{\sin 2\tau}{2}\right]_0^t = \frac{t}{2} + \frac{\sin 2t}{4},$$

since $\sin 0 = 0$ kills the lower limit. For the second integral, the substitution $u = \sin\tau$, $du = \cos\tau\, d\tau$ turns it into $\int u\, du$, which is trivially easy:

$$\int_0^t \sin\tau \cos\tau\, d\tau = \int_0^{\sin t} u\, du = \frac{\sin^2 t}{2}.$$

So it looks like we are in the home stretch. Multiplying everything out:

$$(\sin * \cos)(t) = \frac{t \sin t}{2} + \frac{\sin t \sin 2t}{4} - \frac{\sin^2 t \cos t}{2}.$$

This is a valid answer, but it simplifies further with one more trig identity: $\sin 2t = 2 \sin t \cos t$. Substituting it into the middle term turns that term into $\tfrac{1}{2}\sin^2 t \cos t$, which exactly cancels the last term, leaving

$$(\sin * \cos)(t) = \frac{t \sin t}{2}.$$

No one ever said this was going to be easy, but hopefully it is instructive on some level: a convolution slides one function across the other and accumulates their overlap.
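As a quick numerical sanity check (a sketch of my own, not from the original text), we can approximate the convolution integral on a grid with NumPy and compare it against the closed form:

```python
import numpy as np

# Approximate (sin * cos)(t) by a Riemann sum and compare with t*sin(t)/2.
t = np.linspace(0, 10, 2001)
dt = t[1] - t[0]
numeric = np.convolve(np.sin(t), np.cos(t))[:len(t)] * dt  # discrete convolution
exact = t * np.sin(t) / 2
print(np.max(np.abs(numeric - exact)))  # small, and shrinks as dt -> 0
```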


