
Machine Learning [Week 2/11] Linear Regression with Multiple Variables

Notes from taking Machine Learning (Stanford University).

Linear Regression with Multiple Variables

Multivariate Linear Regression

  • Hypothesis: \[
    h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n\\
    \text{Letting }x_0=1\text{:}\\
    x=\begin{bmatrix}
    x_0\\
    x_1\\
    \vdots\\
    x_n\\
    \end{bmatrix}\in\Bbb R^{n+1},
    \theta=\begin{bmatrix}
    \theta_0\\
    \theta_1\\
    \vdots\\
    \theta_n\\
    \end{bmatrix}\in\Bbb R^{n+1}\\
    h_\theta(x)=\theta_0x_0+\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n\\
    =\theta^{\mathrm{T}}x
    \]
  • Cost function:\[
    J(\theta_0,\theta_1,\dots,\theta_n)=J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2
    \]
  • Gradient descent:
    • repeat until convergence
      • \[
        \theta_j:=\theta_j-\alpha \frac{\partial}{\partial \theta_j}J(\theta)\\
        = \theta_j-\alpha \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
        \]
      • simultaneously update \(\theta_j\) for \(j=0,\dots,n\)
    • Feature Scaling
      • Idea: Make sure features are on a similar scale.
      • Get every feature into approximately the \(-1 \le x_1 \le 1\) range.
        • \(-3\) to \(3\) is safe
        • \(-1/3\) to \(1/3\) is safe
    • Mean normalization
      • Replace \(x_i\) with \(x_i-\mu_i\) to make features have approximately zero mean (do not apply to \(x_0 = 1\)).
      • \(x_1\gets\frac{x_1-\mu_1}{s_1}\)
        • \(\mu_1\): average value of \(x_1\) in the training set.
        • \(s_1\): range (max - min) or standard deviation
    • Debugging gradient descent: Making sure gradient descent is working correctly.
      • If \(\alpha\) is too small: slow convergence.
      • If \(\alpha\) is too large: \(J(\theta)\) may not decrease on every iteration; may not converge.
    • Automatic convergence test: Declare convergence if \(J(\theta)\) decreases by less than E in one iteration, where E is some small value such as \(10^{-3}\).
      • Choosing \(\alpha\) well is fairly tricky, so plotting \(J(\theta)\) against the number of iterations tends to give more insight than relying on the automatic convergence test (see the Octave sketch after this list).
  • Features and Polynomial Regression
    • example: predicting a house's price
      • \(h_\theta(x)=\theta_0+\theta_1(size)+\theta_2(size)^2+\theta_3(size)^3\)
        • the feature ranges become 1 to \(1000\), 1 to \(1000^2\), and 1 to \(1000^3\)
      • \(h_\theta(x)=\theta_0+\theta_1(size)+\theta_2\sqrt{(size)}\)
    • changing the behavior or shape of the curve
      • quadratic, cubic, or square-root functions, or any other form.
      • We can combine multiple features into one.
      • When using such features, feature scaling becomes very important.
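
A minimal Octave/MATLAB sketch of how the pieces above fit together, using a tiny made-up dataset (all numbers are hypothetical): build the features \(size\) and \(\sqrt{size}\), apply mean normalization, then run batch gradient descent with the update rule above while recording \(J(\theta)\) so it can be plotted against the iteration number.

    % Toy data (hypothetical values, for illustration only)
    sz    = [1000; 1500; 2000; 3000];     % house size (feet^2)
    price = [200; 330; 369; 540];         % price (1000s of dollars)
    m     = length(price);

    % Features: size and sqrt(size), as in the polynomial-regression example
    X = [sz, sqrt(sz)];

    % Mean normalization / feature scaling: x_j <- (x_j - mu_j) / s_j
    mu = mean(X);                         % column means
    s  = std(X);                          % or max(X) - min(X)
    X  = (X - mu) ./ s;                   % row vectors broadcast over the rows

    % Add the x_0 = 1 column after scaling (x_0 is never scaled)
    X = [ones(m, 1), X];

    % Cost function J(theta)
    costJ = @(theta) (1 / (2 * m)) * sum((X * theta - price) .^ 2);

    % Batch gradient descent, recording J(theta) at every iteration
    alpha     = 0.1;
    num_iters = 400;
    theta     = zeros(3, 1);
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
      grad = zeros(size(theta));
      for j = 1:length(theta)
        grad(j) = (1 / m) * sum((X * theta - price) .* X(:, j));
      end
      theta = theta - alpha * grad;       % simultaneous update of all theta_j
      J_history(iter) = costJ(theta);
    end

    % Debugging: J(theta) should decrease on every iteration
    plot(1:num_iters, J_history);
    xlabel('iteration'); ylabel('J(theta)');

If the plotted \(J(\theta)\) blows up or oscillates, \(\alpha\) is too large; if it barely moves, \(\alpha\) is too small.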

Computing Parameters Analytically

  • versus
    • Gradient Descent
      • Need to choose \(\alpha\)
      • Needs many iterations
      • \(O(kn^2)\)
      • Works well even when n is large
      • e.g. \(n=10^6\) or more
    • Normal Equation
      • No need to choose \(\alpha\)
      • No need to iterate
      • Need to compute \((X^TX)^{-1}\)
        • \(O(n^3)\)
      • Slow if n is very large
        • \(n=100\) or \(n=1000\): OK
        • \(n=10000\): borderline
  • Non-invertibility of the Normal Equation (extra)
    • \(\theta=(X^\mathrm{T}X)^{-1}X^\mathrm{T}y\)
    • What if \(X^TX\) is non-invertible? (singular/degenerate)
    • Octave: pinv(X'*X)*X'*y (see the sketch after this list)
      • pinv
        • pseudo-inverse
        • computes \(\theta\) even when \(X^TX\) is non-invertible
      • inv
        • inverse
    • Why might \(X^TX\) be non-invertible?
      • Redundant features (linearly dependent)
        • some features are linear functions of one another
      • Too many features (e.g. \(m \le n\))
        • delete some features, or use regularization.
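
A minimal Octave/MATLAB sketch of the normal equation in practice (the data values are hypothetical):

    % Toy design matrix with the x_0 = 1 column already added (hypothetical values)
    X = [1 1000; 1 1500; 1 2000; 1 3000];
    y = [200; 330; 369; 540];

    % theta = (X'X)^(-1) X'y
    theta = pinv(X' * X) * X' * y;

inv(X' * X) * X' * y gives the same result when \(X^TX\) is invertible; pinv still returns an answer when \(X^TX\) is singular (redundant features, or \(m \le n\)).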

Octave/Matlab Tutorial

The lecture videos use Octave, but submissions apparently run into problems in some environments, so the materials are now provided for MATLAB.

A MATLAB license is provided for students of this course, so MATLAB Online, which runs in the browser, can be used.

Vectorization

  • Gradient descent: repeat for all j\[
    \theta_j:=\theta_j-\alpha \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
    \]
  • Simultaneous updates:\[
    \theta_0:=\theta_0-\alpha \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}\\
    \theta_1:=\theta_1-\alpha \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_1^{(i)}\\
    \theta_2:=\theta_2-\alpha \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_2^{(i)}\\
    \]
  • Vectorized implementation (see the Octave sketch after this list):\[
    \begin{align*}
    &\theta:=\theta-\alpha\delta\\
    &\delta= \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x^{(i)}\\
    &\theta\in\Bbb R^{n+1}\\
    &\delta=\begin{bmatrix}
    \delta_0\\
    \delta_1\\
    \delta_2\end{bmatrix}\in\Bbb R^{n+1}\\
    &\alpha\in\Bbb R\\
    &(h_\theta(x^{(i)})-y^{(i)})\in\Bbb R\\
    &x^{(i)}=\begin{bmatrix}
    x_0^{(i)}\\
    x_1^{(i)}\\
    x_2^{(i)}
    \end{bmatrix}\in\Bbb R^{n+1}\\
    \end{align*}
    \]
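
In Octave/MATLAB the vectorized update is two lines. A minimal sketch, assuming X is the \(m\times(n+1)\) design matrix (first column all ones), y the \(m\times1\) targets, theta the \((n+1)\times1\) parameter vector, and alpha the learning rate:

    % One vectorized gradient-descent step
    delta = (1 / m) * X' * (X * theta - y);   % all n+1 partial derivatives at once
    theta = theta - alpha * delta;            % simultaneous update of every theta_j

X * theta - y is the \(m\times1\) vector of errors \(h_\theta(x^{(i)})-y^{(i)}\), and multiplying by X' sums each error times its feature vector \(x^{(i)}\), which is exactly the \(\delta\) defined above.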
