3. Gradient Descent (Basic)
Created: August 4, 2021 10:05 AM
Reviewed: No
- Differentiation
$$f'(x) = \lim\limits_{h\to0} {f(x+h)-f(x)\over h}$$
SymPy: sympy.diff
import sympy as sym
from sympy.abc import x
sym.diff(sym.poly(x**2 + 2*x + 3), x)  # Differentiate by x
Poly(2*x + 2, x, domain='ZZ')
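A quick numerical check of the limit definition against the symbolic result, using the same polynomial (a minimal sketch; the point x0 = 1 and the small step h = 1e-6 are arbitrary choices):
import sympy as sym
from sympy.abc import x
f = x**2 + 2*x + 3
f_prime = sym.diff(f, x)  # symbolic derivative: 2*x + 2
h, x0 = 1e-6, 1.0
numeric = (f.subs(x, x0 + h) - f.subs(x, x0)) / h  # (f(x+h) - f(x)) / h with small h
print(float(numeric), float(f_prime.subs(x, x0)))  # ~4.000001 vs 4.0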
- Gradient ascent
$$x' = x+f'(x)$$
1) f'(x) < 0
x' = x + f'(x) < x
2) f'(x) > 0
x' = x + f'(x) > x
x' moves in the direction in which f(x) increases.
- This does not guarantee that f(x') > f(x), since the step can overshoot.
→ Gradient ascent is used to find a local maximum (a minimal sketch follows below).
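A minimal gradient-ascent sketch on a toy concave function $f(x) = -(x-2)^2$ of my own choosing; a small step size is used here (formally introduced as the learning rate below):
import sympy as sym
from sympy.abc import x
f = -(x - 2)**2               # concave, maximum at x = 2
grad = sym.diff(f, x)
val, lam = 0.0, 0.1           # starting point and step size (arbitrary choices)
for _ in range(100):
    val = val + lam * float(grad.subs(x, val))  # x' = x + lambda * f'(x)
print(val)                    # approaches 2.0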
- Gradient descent
$$x' = x-f'(x)$$
1) f'(x) < 0
x' = x - f'(x) > x
2) f'(x) > 0
x' = x - f'(x) < x
x' moves in the direction in which f(x) decreases.
- This does not guarantee that f(x') < f(x), since the step can overshoot.
→ Gradient descent is used to find a local minimum (see the sketch after the learning-rate item below).
- Learning rate
$$x' = x \pm \lambda f'(x)$$
$\lambda$ is the learning rate, which scales the size of each update step.
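Putting the descent update and the learning rate together on the polynomial from the differentiation example (a minimal sketch; the starting point and lambda = 0.1 are arbitrary choices):
import sympy as sym
from sympy.abc import x
f = x**2 + 2*x + 3            # convex, minimum at x = -1
grad = sym.diff(f, x)         # 2*x + 2
val, lam = 5.0, 0.1
for _ in range(100):
    val = val - lam * float(grad.subs(x, val))  # x' = x - lambda * f'(x)
print(val)                    # approaches -1.0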
- Partial derivative
$${\partial \over \partial x_i}f(\bold x) = \lim\limits_{h\to0} {f(\bold x + h \bold e_i)-f(\bold x)\over h}$$
$$\bold e_1 = [1, 0, 0, \dots, 0]$$
$$\bold e_i = [0, \dots, 0, 1, 0, \dots, 0] \quad (1 \text{ in the } i\text{-th position})$$
$$\bold e_d = [0, 0, \dots, 0, 1]$$
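sympy.diff also returns partial derivatives when given the variable to differentiate by; a minimal sketch on a two-variable function of my own choosing:
import sympy as sym
from sympy.abc import x, y
f = x**2 + 2*x*y + 3
sym.diff(f, x)  # partial derivative with respect to x: 2*x + 2*y
sym.diff(f, y)  # partial derivative with respect to y: 2*x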
- Gradient vector
$$\nabla f = \left({\partial \over \partial x_1}f, {\partial \over \partial x_2}f, \dots, {\partial \over \partial x_i}f, \dots, {\partial \over \partial x_d}f\right)$$
- The symbol $\nabla$ is called nabla.
Use $\nabla f$ instead of $f'(x)$ to update all coordinates of $\bold x = (x_1, \dots, x_d)$ at the same time.
$-\nabla f$ is the direction of steepest descent, i.e. the direction in which $f$ decreases fastest (and $\nabla f$ the direction of steepest ascent).
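A minimal multivariate sketch, assuming the toy function $f(x, y) = x^2 + 2y^2$ (my own choice): gradient descent with the gradient vector updates both coordinates at once.
import numpy as np
def grad_f(v):                       # gradient of f(x, y) = x**2 + 2*y**2
    x1, x2 = v
    return np.array([2*x1, 4*x2])
v, lam = np.array([3.0, -2.0]), 0.1  # starting point and learning rate (arbitrary)
for _ in range(200):
    v = v - lam * grad_f(v)          # x' = x - lambda * grad f(x)
print(v)                             # approaches [0, 0]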