Which activation function is commonly used in hidden layers of deep neural networks due to its ability to mitigate vanishing gradient problems?
Explanation:
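ReLU (Rectified Linear Unit), defined as f(x) = max(0, x), is popular for hidden layers because its derivative is exactly 1 for any positive input. Gradients flowing back through active units are therefore not scaled down at every layer, unlike with sigmoid, whose derivative never exceeds 0.25. This mitigates, rather than fully eliminates, the vanishing gradient problem and typically speeds up convergence.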
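A minimal sketch of why this matters, using only Python's standard library. It multiplies the local activation derivative across several layers, under the simplifying assumptions (not from the original) that every weight is 1 and every layer sees the same pre-activation value. The helper `chained_grad` is hypothetical and exists only for illustration. Even at sigmoid's best case (x = 0, derivative 0.25), the product shrinks to 0.25^10, roughly 1e-6, after ten layers, while ReLU's product stays at 1 for a positive input.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    # Derivative of sigmoid: s(x) * (1 - s(x)), peaking at 0.25 when x == 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # ReLU derivative: 1 for positive inputs, 0 otherwise (0 chosen at x == 0).
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn, x: float, depth: int) -> float:
    """Hypothetical helper: product of local activation derivatives over
    `depth` layers, assuming unit weights and the same input x per layer."""
    total = 1.0
    for _ in range(depth):
        total *= grad_fn(x)
    return total


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        # Sigmoid evaluated at its best case (x = 0); ReLU at a positive input.
        print(depth, chained_grad(sigmoid_grad, 0.0, depth), chained_grad(relu_grad, 0.5, depth))
```

Real networks also scale gradients by their weights, so this only isolates the activation's contribution. Note too that a unit whose input stays negative passes back a gradient of 0, which is why ReLU mitigates rather than removes gradient problems.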