[mathjax]
Course notes for Machine Learning (Stanford University)
Introduction
- Supervised Learning
- each training example comes with the correct answer (label)
- Unsupervised Learning
- clustering, segmentation
- the cocktail party problem
- singular value decomposition
Linear Regression with One Variable
Model and Cost Function
- Regression Problem
- the prediction is a real value
- e.g. predicting house prices
- e.g. predicting age from a photo
- Classification Problem
- y can take only a small number of discrete values
- e.g. categorizing real estate
- e.g. whether a tumor is benign or malignant
- Representation
- Training Set
- Learning Algorithm
- h: hypothesis
- perhaps not the most apt name, but kept by convention
- predict
- how to represent h?
- \(h_\theta(x)=\theta_0+\theta_1x\)
- Shorthand: \(h(x)\)
- This is Linear regression
- cost function (squared error function / mean squared error)
- choose \(\theta_0,\theta_1\) so that \(h_\theta(x)\) is close to \(y\) for the training examples \((x,y)\)
- goal: minimize: \[
J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2
\]
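A minimal sketch of this cost function in numpy; the function name `compute_cost` and the sample numbers are my own, not from the course:
```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Halved mean squared error J(theta0, theta1) over m examples."""
    m = len(y)
    predictions = theta0 + theta1 * x        # h_theta(x^(i)) for all i at once
    return np.sum((predictions - y) ** 2) / (2 * m)

# made-up training set: house sizes (x) and prices (y)
x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
print(compute_cost(x, y, -40.0, 0.25))       # ~1029.66
```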
Parameter Learning
- gradient descent
- Have some function: \(J(\theta_0,\theta_1)\)
- Want min \(J(\theta_0,\theta_1)\)
- Outline:
- Start with some \(\theta_0,\theta_1\)
- Keep changing \(\theta_0,\theta_1\) to reduce \(J(\theta_0,\theta_1)\) until we end up at a minimum.
- gradient descent algorithm
- repeat until convergence
- \(\theta_j:=\theta_j-\alpha \frac{\partial}{\partial \theta_j}J(\theta_0,\theta_1)\)
- learning rate: \(\alpha\)
- simultaneous update: for j=0 and j=1
- if alpha is small, gradient descent can be slow.
- if alpha is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
- as we approach a local minimum, the derivative term automatically gets smaller at each step, so there is no need to decrease alpha over time
- “Batch” Gradient Descent
- called “Batch” because every step uses the entire training set
- applying gradient descent to the linear regression model:
- repeat until convergence: \[
\begin{align}
\theta_0&:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\\
\theta_1&:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}
\end{align}
\]
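A sketch of batch gradient descent for one-variable linear regression, implementing the update rules above (function and variable names are my own):
```python
import numpy as np

def gradient_descent(x, y, alpha, iterations):
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y     # h_theta(x^(i)) - y^(i)
        grad0 = error.sum() / m               # dJ/d(theta0)
        grad1 = (error * x).sum() / m         # dJ/d(theta1)
        # simultaneous update: both gradients computed before either change
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
# with unscaled features this large, alpha must be tiny or the updates diverge
print(gradient_descent(x, y, alpha=1e-7, iterations=100000))
```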
Linear Algebra Review
Matrices and Vectors
Matrix Elements
\[
\begin{align}
A&=\begin{bmatrix}
1402 & 191\\
1371 & 821\\
949 & 1437\\
147 & 1448
\end{bmatrix}\\
A_{ij}&= i,j \text{ entry in the } i^{th}\text{ row, } j^{th}\text{ column.}
\end{align}
\]
Vector: An n x 1 matrix.\[
\begin{align}
y&=\begin{bmatrix}
460\\
232\\
315\\
178
\end{bmatrix}\dots \text{4-dimensional vector } \dots \mathbb{R}^4 \\
y_i&=i^{th}\text{ element }\\
y&=\begin{bmatrix}y_1\\ y_2\\ y_3\\ y_4\end{bmatrix}\dots \text{1-indexed} \dots y^{[1]}
\end{align}
\]
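The same matrix in numpy, as a sketch; note that numpy is 0-indexed while the course notation \(A_{ij}\) is 1-indexed:
```python
import numpy as np

A = np.array([[1402,  191],
              [1371,  821],
              [ 949, 1437],
              [ 147, 1448]])
# the course's A_{32} (3rd row, 2nd column) is A[2, 1] in numpy
print(A[2, 1])    # -> 1437
print(A.shape)    # -> (4, 2): a 4 x 2 matrix
```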
Addition and Scalar Multiplication
Matrix Addition
Scalar Multiplication
Combination of Operators
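A worked example with made-up numbers: addition is element-wise between same-size matrices, a scalar multiplies every entry, and the two operators combine as expected:
\[
\begin{align}
\begin{bmatrix}1&0\\2&5\end{bmatrix}+\begin{bmatrix}4&0.5\\2&5\end{bmatrix}&=\begin{bmatrix}5&0.5\\4&10\end{bmatrix}\\
3\times\begin{bmatrix}1&0\\2&5\end{bmatrix}-\begin{bmatrix}1&1\\1&1\end{bmatrix}&=\begin{bmatrix}2&-1\\5&14\end{bmatrix}
\end{align}
\]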
Matrix Vector Multiplication
\[
\begin{bmatrix}
1&3\\4&0\\2&1
\end{bmatrix}
\begin{bmatrix}
1\\5
\end{bmatrix}=
\begin{bmatrix}
16\\4\\7
\end{bmatrix}
\]
- House size:
- 2104
- 1416
- 1534
- 852
- \(h_\theta(x) = -40+0.25x \)
- \[
\begin{bmatrix}
1&2104\\
1&1416\\
1&1534\\
1&852
\end{bmatrix}\times
\begin{bmatrix}
-40\\
0.25
\end{bmatrix}=
\begin{bmatrix}
h_\theta(2104)\\
h_\theta(1416)\\
h_\theta(1534)\\
h_\theta(852)
\end{bmatrix}
\]
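The same trick in numpy, as a sketch (the names `X` and `theta` are mine): prepending a column of ones to the house sizes lets a single matrix-vector product evaluate the hypothesis on every example:
```python
import numpy as np

# design matrix: the leading column of ones pairs with theta_0
X = np.array([[1, 2104],
              [1, 1416],
              [1, 1534],
              [1,  852]], dtype=float)
theta = np.array([-40.0, 0.25])
print(X @ theta)   # [486.  314.  343.5 173. ] = h_theta of each house size
```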
Matrix Matrix Multiplication
- A x B = C
- (m,n) x (n,o) = (m,o)
- House size:
- 2104
- 1416
- 1534
- 852
- \(h_\theta^{[1]}(x) = -40+0.25x \)
- \(h_\theta^{[2]}(x) = 200+0.1x \)
- \(h_\theta^{[3]}(x) = -150+0.4x \)
- \[
\begin{bmatrix}
1&2104\\
1&1416\\
1&1534\\
1&852
\end{bmatrix}\times
\begin{bmatrix}
-40 & 200 & -150\\
0.25 & 0.1 & 0.4
\end{bmatrix}=
\begin{bmatrix}
h_\theta^{[1]}(2104)&h_\theta^{[2]}(2104)&h_\theta^{[3]}(2104)\\
h_\theta^{[1]}(1416)&h_\theta^{[2]}(1416)&h_\theta^{[3]}(1416)\\
h_\theta^{[1]}(1534)&h_\theta^{[2]}(1534)&h_\theta^{[3]}(1534)\\
h_\theta^{[1]}(852)&h_\theta^{[2]}(852)&h_\theta^{[3]}(852)
\end{bmatrix}
\]
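The matrix-matrix version in numpy, as a sketch: storing one hypothesis per column of `Theta` evaluates all three hypotheses on all four houses with a single multiplication:
```python
import numpy as np

X = np.array([[1, 2104],
              [1, 1416],
              [1, 1534],
              [1,  852]], dtype=float)
# one column of parameters per hypothesis h^[1], h^[2], h^[3]
Theta = np.array([[-40.0, 200.0, -150.0],
                  [ 0.25,   0.1,    0.4]])
print(X @ Theta)   # (4,2) @ (2,3) -> (4,3): row i holds all three predictions
```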
Matrix Multiplication Properties
- A x B ≠ B x A in general (not commutative)
- A x (B x C) = (A x B) x C (associative)
- Identity Matrix
- Denoted \(I\) or \(I_{n\times n}\)
- For any matrix A
- \( A \cdot I = I \cdot A = A\)
Inverse and Transpose
\[
\begin{align}
1=\text{“identity”}\\
3(3^{-1})=3 \times \frac13 =1\\
0^{-1}: \text{undefined}
\end{align}
\]
Matrix inverse
If A is an m x m square matrix and it has an inverse, then
\[
\begin{align}
A A^{-1}=A^{-1}A=I
\end{align}
\]
Matrix Transpose
\[
\displaylines{
A_{m \times n} \text{ and }B=A^T\\
\text{then}\\
B_{n \times m} \text{ and }
B_{ij}=A_{ji}
}
\]
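A short numpy sketch of identity, inverse, and transpose (the matrix `A` is my own example):
```python
import numpy as np

A = np.array([[3.0,  4.0],
              [2.0, 16.0]])

I = np.eye(2)
print(np.allclose(A @ I, A), np.allclose(I @ A, A))   # True True: A I = I A = A

A_inv = np.linalg.inv(A)            # raises LinAlgError if A has no inverse
print(np.allclose(A @ A_inv, I))    # True: A A^{-1} = I

print(A.T)                          # transpose: (A.T)[i, j] == A[j, i]
```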