# What's the Deal with Activation Functions in Neural Networks?

**Even though activation functions are not the most exciting topic in machine learning, they are an important component of neural networks. This blog post covers what activation functions are and how to choose one. There is also a bonus section on common mistakes with activation functions!**

**Activation functions: a neural network's "on" switch**

When you build a neural network, each neuron takes a set of input values, multiplies each by its corresponding weight, sums those products (plus a bias), and then passes the sum through an activation function that determines the neuron's output.

For a ReLU-style neuron, for example, the activation function turns the sum of the weighted inputs into either 0 or a positive number: if the summed value is greater than 0, the neuron outputs that value; if it is 0 or less, the output is set to 0. Other activation functions, such as sigmoid or tanh, squash the sum into a bounded range instead.

The basic reason for having an activation function in a neural network is to introduce non-linearity. Without one, stacking layers achieves nothing: any number of purely linear layers composes into a single linear transformation, so the network could only ever learn straight-line relationships. A non-linear activation is what lets the model fit complex patterns in your data.
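To see why, here is a minimal sketch in plain Python (with made-up example weights): two 1-D "layers" applied one after the other, with no activation in between, collapse into a single linear layer.

```python
# Two 1-D linear "layers" without activations: y = w2 * (w1 * x + b1) + b2
w1, b1 = 2.0, 1.0   # weights/biases chosen arbitrarily for illustration
w2, b2 = 3.0, -1.0

def two_linear_layers(x):
    return w2 * (w1 * x + b1) + b2

# The composition is itself just one linear map: y = (w2*w1)*x + (w2*b1 + b2)
def single_linear_layer(x):
    return (w2 * w1) * x + (w2 * b1 + b2)
```

For every input, the two functions agree exactly, which is why depth alone buys you nothing without a non-linearity in between.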

There are many commonly used activation functions in neural networks: sigmoid, hyperbolic tangent (tanh), ReLU, leaky ReLU, softmax, and others.

There are also neural network libraries, such as PyBrain, that implement a variety of different activation functions out of the box.

**Sigmoid activation function**

Sigmoid is one of the simplest: a smooth, S-shaped curve:

f(x) = 1 / (1 + exp(-x)).

It takes your input value and transforms it into a number between 0 and 1, which can be read as how strongly the neuron is activated.
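As a quick sketch, the formula above translates directly into plain Python:

```python
import math

def sigmoid(x):
    """Squash x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```

At x = 0 it returns exactly 0.5, and it saturates toward 1 for large positive inputs and toward 0 for large negative ones.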

**Hyperbolic tangent (tanh) activation function**

Tanh is a little more complicated than the sigmoid function, but not by much. It is built from exponentials of the input:

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)).

It squashes the input into the range (-1, 1), and because its output is zero-centered it often trains better than sigmoid in hidden layers.
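Written out directly from that definition (Python's standard library also provides `math.tanh`, which computes the same thing):

```python
import math

def tanh(x):
    """(e^x - e^-x) / (e^x + e^-x), squashing x into (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
```

It passes through the origin, so small inputs produce small outputs centered on zero.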

**ReLU (rectified linear unit) activation function**

The ReLU activation function is one of the most popular functions in use today:

f(x)=max(0, x), for all x.

It's a very simple function: it sets any negative input to 0 and passes positive inputs through unchanged. That simplicity makes it cheap to compute and helps avoid the vanishing gradients that sigmoid and tanh suffer from in deep networks.
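In code it is a one-liner:

```python
def relu(x):
    """Rectified linear unit: 0 for negative inputs, identity otherwise."""
    return max(0.0, x)
```

Negative values are clipped to zero; positive values come back unchanged.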

**Leaky ReLU activation function**

Leaky ReLU is similar to the regular ReLU activation function, but with a small "leakiness" factor:

f(x) = max(0, x) + alpha * min(0, x).

This lets negative inputs produce small negative outputs (scaled by alpha, typically around 0.01) instead of being zeroed out, so the neuron keeps a small gradient and doesn't "die" when its input falls below 0.
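A minimal sketch, using the common default of alpha = 0.01:

```python
def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs are scaled by alpha instead of zeroed."""
    return x if x > 0 else alpha * x
```

For x = -10 this returns -0.1 rather than 0, which is exactly the small "leak" the name refers to.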

**Softmax activation function**

The softmax activation function is used for multi-class classification problems:

f(x_i) = exp(x_i) / sum_j exp(x_j), for each component x_i of the input vector.

This ensures that the outputs always sum to one, so they can be interpreted as a probability assigned to each class.
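Here is a small sketch in plain Python; subtracting the maximum before exponentiating is a standard trick to avoid overflow and does not change the result:

```python
import math

def softmax(xs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(xs)                              # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Larger scores receive larger probabilities, and the whole output always sums to exactly one.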

There are many different activation functions in use today, but the most popular ones are ReLU, sigmoid, and tanh. It's important to choose an activation function that will work best for your specific problem.

**Common mistakes with activation functions**

- Forgetting to add an activation function (the network collapses into a purely linear model)
- Using the wrong activation function for your problem (sigmoid saturates easily and is rarely a good choice for deep hidden layers, and plain ReLU can be unstable when training recurrent neural networks)
- Not understanding how the activation function works and what it does to your data.

Be sure to do your research before choosing an activation function, so you can be sure you're using the right one for your problem!

I hope you found this blog post interesting and informative. If you did, please share it with your friends so more people can learn about neural networks! Thanks for reading.

20