
3. Gradient Descent (Basic)

Pinocchio 2021. 8. 9. 17:22



  1. Differentiation

$$f'(x) = \lim\limits_{h\to0} {f(x+h)-f(x)\over h}$$

SymPy: sympy.diff

import sympy as sym
from sympy.abc import x

sym.diff(sym.poly(x**2 + 2*x + 3), x)  # Differentiate by x
Poly(2*x + 2, x, domain='ZZ')
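
As a quick sanity check (a sketch, not in the original notes), the limit definition above can be approximated with a small finite $h$ and compared against the symbolic derivative; the point $x_0 = 1$ and step $h = 10^{-6}$ are arbitrary choices:

import sympy as sym
from sympy.abc import x

f = x**2 + 2*x + 3
f_prime = sym.diff(f, x)                            # 2*x + 2

h, x0 = 1e-6, 1.0
numeric = (f.subs(x, x0 + h) - f.subs(x, x0)) / h   # (f(x+h) - f(x)) / h
print(float(numeric), float(f_prime.subs(x, x0)))   # ~4.000001 vs 4.0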
  2. Gradient ascent

$$x' = x+f'(x)$$

1) f'(x) < 0

x' = x + f'(x) < x

2) f'(x) > 0

x' = x + f'(x) > x

In both cases, x' moves in the direction in which f(x) increases.

  • This does not guarantee that f(x') > f(x): the step may overshoot the maximum.

→ Gradient ascent is used to find a local maximum.
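
A small numeric sketch of both claims (the bump functions below are assumed examples, not from the notes): on a gentle bump one raw step $x' = x + f'(x)$ increases the value, while on a steep bump the same step jumps past the peak.

import sympy as sym
from sympy.abc import x

g = -sym.Rational(1, 4) * (x - 2)**2         # gentle bump, maximum at x = 2
x0 = 0.0
x1 = x0 + float(sym.diff(g, x).subs(x, x0))  # x' = x + g'(x)
print(x1, g.subs(x, x1) > g.subs(x, x0))     # 1.0 True: g increased

g_steep = -2 * (x - 2)**2                    # steeper bump, same maximum
x1 = x0 + float(sym.diff(g_steep, x).subs(x, x0))     # step of size 8
print(x1, g_steep.subs(x, x1) > g_steep.subs(x, x0))  # 8.0 False: overshot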

  3. Gradient descent

$$x' = x-f'(x)$$

1) f'(x) < 0

x' = x - f'(x) > x

2) f'(x) > 0

x' = x - f'(x) < x

In both cases, x' moves in the direction in which f(x) decreases.

  • This does not guarantee that f(x') < f(x): the step may overshoot the minimum.

→ Gradient descent is used to find a local minimum.
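
Applying the raw update $x' = x - f'(x)$ to the polynomial from section 1 makes the caveat concrete (a sketch; the starting point is arbitrary): every step jumps over the minimum at $x = -1$, so the iterate oscillates and $f$ never decreases. This motivates the learning rate introduced next.

import sympy as sym
from sympy.abc import x

f = x**2 + 2*x + 3                         # minimum at x = -1
f_prime = sym.diff(f, x)                   # 2*x + 2

xi = 1.0
for _ in range(4):
    xi = xi - float(f_prime.subs(x, xi))   # raw update, no learning rate
    print(xi, float(f.subs(x, xi)))        # -3.0, 1.0, -3.0, 1.0 ... f stays 6.0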

  4. Learning rate

$$x' = x \pm \lambda f'(x)$$

$\lambda$ is the learning rate: it scales the step size ($+$ for ascent, $-$ for descent).
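
With a small $\lambda$ the same update converges; a minimal sketch, assuming $\lambda = 0.1$ and 50 iterations (both arbitrary choices):

import sympy as sym
from sympy.abc import x

f = x**2 + 2*x + 3
f_prime = sym.diff(f, x)

lr = 0.1                                        # lambda, the learning rate
xi = 1.0
for _ in range(50):
    xi = xi - lr * float(f_prime.subs(x, xi))   # x' = x - lambda * f'(x)
print(xi)                                       # ~ -1.0, the minimizer of f

Here each step shrinks the distance to $x = -1$ by a factor of $1 - 2\lambda = 0.8$, so a $\lambda$ that is too large (here $\lambda \ge 1$) would oscillate or diverge again.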

  5. Partial derivative

$${\partial \over \partial x_i}f(\bold x) = \lim\limits_{h\to0} {f(\bold x + h \bold e_i)-f(\bold x)\over h}$$

$$\bold e_1 = [1, 0, 0, \dots, 0]$$

$$\bold e_i = [0, \dots, 0, 1, 0, \dots, 0] \quad (\text{1 in the } i\text{-th position})$$

$$\bold e_d = [0, 0, 0, \dots, 1]$$
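
sympy.diff computes partial derivatives the same way, by naming the variable to differentiate by; the two-variable polynomial below is an assumed example:

import sympy as sym
from sympy.abc import x, y

f = x**2 + 2*x*y + 3*y
print(sym.diff(f, x))   # 2*x + 2*y : partial derivative with respect to x
print(sym.diff(f, y))   # 2*x + 3   : partial derivative with respect to y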

  6. Gradient vector

$$\nabla f = \left({\partial \over \partial x_1}f, {\partial \over \partial x_2}f, \dots, {\partial \over \partial x_i}f, \dots, {\partial \over \partial x_d}f\right)$$

  • $\nabla$ is called nabla.

Use $\nabla f$ in place of $f'(x)$ to update all components of $\bold x = (x_1, \dots, x_d)$ at the same time.

$-\nabla f$ is the direction of steepest descent, i.e. the direction in which $f$ decreases fastest.
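
Putting the pieces together, a minimal multivariate sketch (the function $f(x_1, x_2) = x_1^2 + 2x_2^2$, the starting point, $\lambda = 0.1$, and the iteration count are all assumed choices): stepping along $-\nabla f$ updates every component of $\bold x$ at once.

import numpy as np

def f(v):
    return v[0]**2 + 2 * v[1]**2             # minimum at the origin

def grad_f(v):
    return np.array([2 * v[0], 4 * v[1]])    # gradient vector, derived by hand

v = np.array([3.0, 2.0])
lr = 0.1
for _ in range(100):
    v = v - lr * grad_f(v)                   # update all components at once
print(v, f(v))                               # v is close to [0, 0]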