[mathjax]
Course notes for Machine Learning (Stanford University)
Introduction
- Supervised Learning
- each training example comes with the correct answer (label)
- Unsupervised Learning
- clustering, segmentation
- the cocktail party problem
- singular value decomposition
Linear Regression with One Variable
Model and Cost Function
- Regression Problem
- the prediction is a real value
- e.g. predicting house prices
- e.g. predicting age from a photo
- Classification Problem
- y can take only a small number of discrete values
- e.g. categorizing real estate
- e.g. whether a tumor is benign or malignant
- Representation
- Training Set
- Learning Algorithm
- h: hypothesis
- perhaps not the most apt name, but kept by convention
- predict
- how to represent h?
- \(h_\theta(x)=\theta_0+\theta_1x\)
- Shorthand: \(h(x)\)
- This is Linear regression
- cost function (squared error function / mean squared error)
- choose \(\theta_0,\theta_1\) so that \(h_\theta(x)\) is close to \(y\) for the training examples \((x,y)\)
- goal: minimize: \[
J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2
\]
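A minimal sketch of this cost function in numpy; the function name `compute_cost` and the sample numbers are my own, not from the course:
```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Halved mean squared error J(theta0, theta1) over m examples."""
    m = len(y)
    predictions = theta0 + theta1 * x        # h_theta(x^(i)) for all i at once
    return np.sum((predictions - y) ** 2) / (2 * m)

# made-up training set: house sizes (x) and prices (y)
x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
print(compute_cost(x, y, -40.0, 0.25))       # ~1029.66
```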
Parameter Learning
- gradient descent
- Have some function: \(J(\theta_0,\theta_1)\)
- Want min \(J(\theta_0,\theta_1)\)
- Outline:
- Start with some \(\theta_0,\theta_1\)
- Keep changing \(\theta_0,\theta_1\) to reduce \(J(\theta_0,\theta_1)\) until we end up at a minimum.
- gradient descent algorithm
- repeat until convergence
- \(\theta_j:=\theta_j-\alpha \frac{\partial}{\partial \theta_j}J(\theta_0,\theta_1)\)
- learning rate: \(\alpha\)
- simultaneous update: for j=0 and j=1
- if alpha is small, gradient descent can be slow.
- if alpha is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
- as we approach a local minimum, the derivative term automatically gets smaller at each step, so there is no need to decrease alpha over time
- “Batch” Gradient Descent
- called “Batch” because every step uses the entire training set
- applying gradient descent to the linear regression model:
- repeat until convergence: \[
\begin{align}
\theta_0&:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\\
\theta_1&:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}
\end{align}
\]
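A sketch of batch gradient descent for one-variable linear regression, implementing the update rules above (function and variable names are my own):
```python
import numpy as np

def gradient_descent(x, y, alpha, iterations):
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y     # h_theta(x^(i)) - y^(i)
        grad0 = error.sum() / m               # dJ/d(theta0)
        grad1 = (error * x).sum() / m         # dJ/d(theta1)
        # simultaneous update: both gradients computed before either change
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
# with unscaled features this large, alpha must be tiny or the updates diverge
print(gradient_descent(x, y, alpha=1e-7, iterations=100000))
```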
Linear Algebra Review
Matrices and Vectors
Matrix Elements
\[
\begin{align}
A&=\begin{bmatrix}
1402 & 191\\
1371 & 821\\
949 & 1437\\
147 & 1448
\end{bmatrix}\\
A_{ij}&= i,j \text{ entry in the } i^{th}\text{ row, } j^{th}\text{ column.}
\end{align}
\]
Vector: An n x 1 matrix.\[
\begin{align}
y&=\begin{bmatrix}
460\\
232\\
315\\
178
\end{bmatrix}\dots \text{4-dimensional vector } \dots \mathbb{R}^4 \\
y_i&=i^{th}\text{ element }\\
y&=\begin{bmatrix}y_1\\ y_2\\ y_3\\ y_4\end{bmatrix}\dots \text{1-indexed} \dots y^{[1]}
\end{align}
\]
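The same matrix in numpy, as a sketch; note that numpy is 0-indexed while the course notation \(A_{ij}\) is 1-indexed:
```python
import numpy as np

A = np.array([[1402,  191],
              [1371,  821],
              [ 949, 1437],
              [ 147, 1448]])
# the course's A_{32} (3rd row, 2nd column) is A[2, 1] in numpy
print(A[2, 1])    # -> 1437
print(A.shape)    # -> (4, 2): a 4 x 2 matrix
```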
Addition and Scalar Multiplication
Matrix Addition
Scalar Multiplication
Combination of Operators
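A worked example with made-up numbers: addition is element-wise between same-size matrices, a scalar multiplies every entry, and the two operators combine as expected:
\[
\begin{align}
\begin{bmatrix}1&0\\2&5\end{bmatrix}+\begin{bmatrix}4&0.5\\2&5\end{bmatrix}&=\begin{bmatrix}5&0.5\\4&10\end{bmatrix}\\
3\times\begin{bmatrix}1&0\\2&5\end{bmatrix}-\begin{bmatrix}1&1\\1&1\end{bmatrix}&=\begin{bmatrix}2&-1\\5&14\end{bmatrix}
\end{align}
\]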
Matrix Vector Multiplication
\[
\begin{bmatrix}
1&3\\4&0\\2&1
\end{bmatrix}
\begin{bmatrix}
1\\5
\end{bmatrix}=
\begin{bmatrix}
16\\4\\7
\end{bmatrix}
\]
- House size:
- 2104
- 1416
- 1534
- 852
- \(h_\theta(x) = -40+0.25x \)
- \[
\begin{bmatrix}
1&2104\\
1&1416\\
1&1534\\
1&852
\end{bmatrix}\times
\begin{bmatrix}
-40\\
0.25
\end{bmatrix}=
\begin{bmatrix}
h_\theta(2104)\\
h_\theta(1416)\\
h_\theta(1534)\\
h_\theta(852)
\end{bmatrix}
\]
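The same trick in numpy, as a sketch (the names `X` and `theta` are mine): prepending a column of ones to the house sizes lets a single matrix-vector product evaluate the hypothesis on every example:
```python
import numpy as np

# design matrix: the leading column of ones pairs with theta_0
X = np.array([[1, 2104],
              [1, 1416],
              [1, 1534],
              [1,  852]], dtype=float)
theta = np.array([-40.0, 0.25])
print(X @ theta)   # [486.  314.  343.5 173. ] = h_theta of each house size
```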
Matrix Matrix Multiplication
- A x B = C
- (m,n) x (n,o) = (m,o)
- House size:
- 2104
- 1416
- 1534
- 852
- \(h_\theta^{[1]}(x) = -40+0.25x \)
- \(h_\theta^{[2]}(x) = 200+0.1x \)
- \(h_\theta^{[3]}(x) = -150+0.4x \)
- \[
\begin{bmatrix}
1&2104\\
1&1416\\
1&1534\\
1&852
\end{bmatrix}\times
\begin{bmatrix}
-40 & 200 & -150\\
0.25 & 0.1 & 0.4
\end{bmatrix}=
\begin{bmatrix}
h_\theta^{[1]}(2104)&h_\theta^{[2]}(2104)&h_\theta^{[3]}(2104)\\
h_\theta^{[1]}(1416)&h_\theta^{[2]}(1416)&h_\theta^{[3]}(1416)\\
h_\theta^{[1]}(1534)&h_\theta^{[2]}(1534)&h_\theta^{[3]}(1534)\\
h_\theta^{[1]}(852)&h_\theta^{[2]}(852)&h_\theta^{[3]}(852)
\end{bmatrix}
\]
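The matrix-matrix version in numpy, as a sketch: storing one hypothesis per column of `Theta` evaluates all three hypotheses on all four houses with a single multiplication:
```python
import numpy as np

X = np.array([[1, 2104],
              [1, 1416],
              [1, 1534],
              [1,  852]], dtype=float)
# one column of parameters per hypothesis h^[1], h^[2], h^[3]
Theta = np.array([[-40.0, 200.0, -150.0],
                  [ 0.25,   0.1,    0.4]])
print(X @ Theta)   # (4,2) @ (2,3) -> (4,3): row i holds all three predictions
```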
Matrix Multiplication Properties
- A x B ≠ B x A in general (not commutative)
- A x (B x C) = (A x B) x C (associative)
- Identity Matrix
- Denoted \(I\) or \(I_{n\times n}\)
- For any matrix A
- \( A \cdot I = I \cdot A = A\)
Inverse and Transpose
\[
\begin{align}
1=\text{“identity”}\\
3(3^{-1})=3 \times \frac13 =1\\
0^{-1}: \text{undefined}
\end{align}
\]
Matrix inverse
If A is an m x m square matrix and it has an inverse, then
\[
\begin{align}
A A^{-1}=A^{-1}A=I
\end{align}
\]
Matrix Transpose
\[
\displaylines{
A_{m \times n} \text{ and }B=A^T\\
\text{then}\\
B_{n \times m} \text{ and }
B_{ij}=A_{ji}
}
\]
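A short numpy sketch of identity, inverse, and transpose (the matrix `A` is my own example):
```python
import numpy as np

A = np.array([[3.0,  4.0],
              [2.0, 16.0]])

I = np.eye(2)
print(np.allclose(A @ I, A), np.allclose(I @ A, A))   # True True: A I = I A = A

A_inv = np.linalg.inv(A)            # raises LinAlgError if A has no inverse
print(np.allclose(A @ A_inv, I))    # True: A A^{-1} = I

print(A.T)                          # transpose: (A.T)[i, j] == A[j, i]
```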