3. Gradient Descent (Basic)
Created: August 4, 2021 10:05 AM
Reviewed: No
- Differentiation
$$f'(x) = \lim\limits_{h\to0} {f(x+h)-f(x)\over h}$$
SymPy: sympy.diff
import sympy as sym
from sympy.abc import x
sym.diff(sym.poly(x**2 + 2*x + 3), x)  # Differentiate by x
Poly(2*x + 2, x, domain='ZZ')
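A quick numerical check of the limit definition against the symbolic result, using the same polynomial (a minimal sketch; the point x0 = 1 and the small step h = 1e-6 are arbitrary choices):
import sympy as sym
from sympy.abc import x
f = x**2 + 2*x + 3
f_prime = sym.diff(f, x)  # symbolic derivative: 2*x + 2
h, x0 = 1e-6, 1.0
numeric = (f.subs(x, x0 + h) - f.subs(x, x0)) / h  # (f(x+h) - f(x)) / h with small h
print(float(numeric), float(f_prime.subs(x, x0)))  # ~4.000001 vs 4.0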
- Gradient ascent
$$x' = x+f'(x)$$
1) f'(x) < 0
x' = x + f'(x) < x
2) f'(x) > 0
x' = x + f'(x) > x
x' moves in the direction in which f(x) increases.
- This does not guarantee that f(x') > f(x), since the step can overshoot.
→ Gradient ascent is used to find a local maximum (a minimal sketch follows below).
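A minimal gradient-ascent sketch on a toy concave function $f(x) = -(x-2)^2$ of my own choosing; a small step size is used here (formally introduced as the learning rate below):
import sympy as sym
from sympy.abc import x
f = -(x - 2)**2               # concave, maximum at x = 2
grad = sym.diff(f, x)
val, lam = 0.0, 0.1           # starting point and step size (arbitrary choices)
for _ in range(100):
    val = val + lam * float(grad.subs(x, val))  # x' = x + lambda * f'(x)
print(val)                    # approaches 2.0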
- Gradient descent
$$x' = x-f'(x)$$
1) f'(x) < 0
x' = x - f'(x) > x
2) f'(x) > 0
x' = x - f'(x) < x
x' moves in the direction in which f(x) decreases.
- This does not guarantee that f(x') < f(x), since the step can overshoot.
→ Gradient descent is used to find a local minimum (see the sketch after the learning-rate item below).
- Learning rate
$$x' = x \pm \lambda f'(x)$$
$\lambda$ is the learning rate, which scales the size of each update step.
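Putting the descent update and the learning rate together on the polynomial from the differentiation example (a minimal sketch; the starting point and lambda = 0.1 are arbitrary choices):
import sympy as sym
from sympy.abc import x
f = x**2 + 2*x + 3            # convex, minimum at x = -1
grad = sym.diff(f, x)         # 2*x + 2
val, lam = 5.0, 0.1
for _ in range(100):
    val = val - lam * float(grad.subs(x, val))  # x' = x - lambda * f'(x)
print(val)                    # approaches -1.0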
- Partial derivative
$${\partial \over \partial x_i}f(\bold x) = \lim\limits_{h\to0} {f(\bold x + h \bold e_i)-f(\bold x)\over h}$$
$$\bold e_1 = [1, 0, 0, \dots, 0]$$
$$\bold e_i = [0, \dots, 0, 1, 0, \dots, 0] \quad (1 \text{ in the } i\text{-th position})$$
$$\bold e_d = [0, 0, \dots, 0, 1]$$
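sympy.diff also returns partial derivatives when given the variable to differentiate by; a minimal sketch on a two-variable function of my own choosing:
import sympy as sym
from sympy.abc import x, y
f = x**2 + 2*x*y + 3
sym.diff(f, x)  # partial derivative with respect to x: 2*x + 2*y
sym.diff(f, y)  # partial derivative with respect to y: 2*x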
- Gradient vector
$$\nabla f = \left({\partial \over \partial x_1}f, {\partial \over \partial x_2}f, \dots, {\partial \over \partial x_i}f, \dots, {\partial \over \partial x_d}f\right)$$
- The symbol $\nabla$ is called nabla.
Use $\nabla f$ instead of $f'(x)$ to update all coordinates of $\bold x = (x_1, \dots, x_d)$ at the same time.
$-\nabla f$ is the direction of steepest descent, i.e. the direction in which $f$ decreases fastest (and $\nabla f$ the direction of steepest ascent).
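A minimal multivariate sketch, assuming the toy function $f(x, y) = x^2 + 2y^2$ (my own choice): gradient descent with the gradient vector updates both coordinates at once.
import numpy as np
def grad_f(v):                       # gradient of f(x, y) = x**2 + 2*y**2
    x1, x2 = v
    return np.array([2*x1, 4*x2])
v, lam = np.array([3.0, -2.0]), 0.1  # starting point and learning rate (arbitrary)
for _ in range(200):
    v = v - lam * grad_f(v)          # x' = x - lambda * grad f(x)
print(v)                             # approaches [0, 0]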