
# Machine Learning [Week 4/11] Neural Networks: Representation

• Contents
• Neural network forward propagation
• Multiclass classification using logistic regression and the one-vs-all technique
• Notes
• In this course, the bias b is represented as $$x_0$$, and the dimensions of $$\Theta$$ are derived from that.
• For comparison, in explicit-bias notation: $$z=wx+b$$
• $$w.shape:(n^{[l]},n^{[l-1]})$$
• Here: $$\Theta.shape:(s_{j+1},s_j+1)$$, where the +1 accounts for the bias

## Motivations

### Non-linear Hypotheses

Picture from the lecture: when the number of features $$n$$ is large, non-linear hypotheses built by adding polynomial terms to logistic regression become infeasible, because the number of terms blows up with $$n$$.
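As a rough count (the worked example from the lecture uses 50×50-pixel grayscale images as input):

$$n = 2500 \quad\Rightarrow\quad \frac{n(n+1)}{2} \approx 3 \times 10^6 \ \text{quadratic terms}$$

so even second-order polynomial features are already impractical, which motivates neural networks.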

### Neurons and the Brain

Origins: algorithms that try to mimic the brain. Widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: State-of-the-art technique for many applications.

• The “one learning algorithm” hypothesis
• Auditory cortex learns to see.
• Somatosensory cortex learns to see.
• Sensor representations in the brain
• Human echolocation (sonar).
• Haptic belt: Direction sense.
• Implanting a 3rd eye.
• Plug a sensor into almost any part of the brain, and the brain learns how to handle its input
• This course treats neural networks as a machine-learning technique, not as an attempt at artificial intelligence

## Neural Networks

• Neurons in the brain
• Neuron model: Logistic unit
• $$x_0=1$$: bias unit
• included because it is convenient
• $$g(z)$$: Sigmoid (logistic) activation function
• $$\theta$$: weight
• Notation
• \begin{align*}& a_i^{(j)} = \text{“activation” of unit i in layer j} \newline& \Theta^{(j)} = \text{matrix of weights controlling function mapping from layer j to layer j+1}\end{align*}
• one hidden layer
• $\begin{bmatrix}x_0 \newline x_1 \newline x_2 \newline x_3\end{bmatrix}\rightarrow\begin{bmatrix}a_1^{(2)} \newline a_2^{(2)} \newline a_3^{(2)} \newline \end{bmatrix}\rightarrow h_\theta(x)$
• The values for each of the “activation” nodes:
• \begin{align*} a_1^{(2)} = g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3) \newline a_2^{(2)} = g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3) \newline a_3^{(2)} = g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3) \newline h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)}) \newline \end{align*}
• The dimensions of these matrices of weights
• If layer $$j$$ has $$s_j$$ units and layer $$j+1$$ has $$s_{j+1}$$ units, then $\Theta^{(j)}\in\Bbb R^{s_{j+1} \times (s_j + 1)}$ (the +1 comes from the bias unit)
• Forward propagation: vectorized implementation (see the NumPy sketch after this list)
• \begin{align*}a_1^{(2)} = g(z_1^{(2)}) \newline a_2^{(2)} = g(z_2^{(2)}) \newline a_3^{(2)} = g(z_3^{(2)}) \newline \end{align*}
• For layer $$j=2$$ and node $$k$$, the variable $$z$$ is: $z_k^{(2)} = \Theta_{k,0}^{(1)}x_0 + \Theta_{k,1}^{(1)}x_1 + \cdots + \Theta_{k,n}^{(1)}x_n$
• Vector representation of $$x$$ and $$z^{(j)}$$: \begin{align*}x = \begin{bmatrix}x_0 \newline x_1 \newline \vdots \newline x_n\end{bmatrix} &\quad z^{(j)} = \begin{bmatrix}z_1^{(j)} \newline z_2^{(j)} \newline \vdots \newline z_n^{(j)}\end{bmatrix}\end{align*}
• Setting $$a^{(1)} = x$$: $z^{(j)} = \Theta^{(j-1)}a^{(j-1)}$
• $a^{(j)} = g(z^{(j)})$
• $z^{(j+1)} = \Theta^{(j)}a^{(j)}$
• $h_\Theta(x) = a^{(j+1)} = g(z^{(j+1)})$, where layer $$j+1$$ is the output layer
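A minimal NumPy sketch of this vectorized forward pass, assuming one hidden layer; the weights and the names `Theta1`/`Theta2` are illustrative, not from the course materials:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) activation g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# Shapes follow Theta^{(j)} in R^{s_{j+1} x (s_j + 1)}; the +1 column is the bias.
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))   # layer 1 (3 inputs + bias) -> layer 2 (3 units)
Theta2 = rng.standard_normal((1, 4))   # layer 2 (3 units + bias)  -> output layer

x = np.array([0.5, -1.2, 3.0])         # one training example

a1 = np.concatenate(([1.0], x))        # a^{(1)} with bias unit x_0 = 1
z2 = Theta1 @ a1                       # z^{(2)} = Theta^{(1)} a^{(1)}
a2 = np.concatenate(([1.0], sigmoid(z2)))  # a^{(2)} with bias unit a_0^{(2)} = 1
z3 = Theta2 @ a2                       # z^{(3)} = Theta^{(2)} a^{(2)}
h = sigmoid(z3)                        # h_Theta(x) = a^{(3)}, a value in (0, 1)
print(h)
```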

## Applications

### Examples and Intuitions I

• Target function: $$x_1$$ AND $$x_2$$, i.e. output 1 only when both inputs are 1
• The graph of our functions will look like:\begin{align*}\begin{bmatrix}x_0 \newline x_1 \newline x_2\end{bmatrix} \rightarrow\begin{bmatrix}g(z^{(2)})\end{bmatrix} \rightarrow h_\Theta(x)\end{align*}
• First weight matrix: $\Theta^{(1)} =\begin{bmatrix}-30 & 20 & 20\end{bmatrix}$
• \begin{align*}& h_\Theta(x) = g(-30 + 20x_1 + 20x_2) \newline \newline & x_1 = 0 \ \ and \ \ x_2 = 0 \ \ then \ \ g(-30) \approx 0 \newline & x_1 = 0 \ \ and \ \ x_2 = 1 \ \ then \ \ g(-10) \approx 0 \newline & x_1 = 1 \ \ and \ \ x_2 = 0 \ \ then \ \ g(-10) \approx 0 \newline & x_1 = 1 \ \ and \ \ x_2 = 1 \ \ then \ \ g(10) \approx 1\end{align*}
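A quick sketch checking this truth table numerically (sigmoid and NumPy as in the earlier sketch; nothing here is from the course code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Theta1 = np.array([-30.0, 20.0, 20.0])  # [bias, weight for x1, weight for x2]

for x1 in (0, 1):
    for x2 in (0, 1):
        h = sigmoid(Theta1 @ np.array([1.0, x1, x2]))
        print(x1, x2, round(h, 4))  # ~0 for every row except x1 = x2 = 1
```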

### Examples and Intuitions II

• Build intuition by instantiating the units as AND, NOR, and OR, then combining them to compute XNOR (see the sketch below)
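A sketch of that combination, using the AND/NOR/OR weight vectors from the lecture: the hidden layer computes $$a_1^{(2)} = x_1 \text{ AND } x_2$$ and $$a_2^{(2)} = (\text{NOT } x_1) \text{ AND } (\text{NOT } x_2)$$, and the output unit ORs them, which yields XNOR:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1 -> 2: first row is AND, second row is NOR
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Layer 2 -> 3: OR of the two hidden units
Theta2 = np.array([-10.0, 20.0, 20.0])

for x1 in (0, 1):
    for x2 in (0, 1):
        a1 = np.array([1.0, x1, x2])                        # input with bias
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))  # hidden layer with bias
        h = sigmoid(Theta2 @ a2)
        print(x1, x2, round(h))  # 1 exactly when x1 == x2 (XNOR)
```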

### Multiclass Classification

• We can define our set of resulting classes as y:$y^{(i)}=\begin{bmatrix}1\\0\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\0\\1\end{bmatrix}$
• Our final hypothesis function setup: $\begin{bmatrix}x_0 \newline x_1 \newline x_2 \newline x_3\end{bmatrix}\rightarrow \begin{bmatrix}a_0^{(2)} \newline a_1^{(2)} \newline a_2^{(2)} \newline a_3^{(2)}\end{bmatrix} \rightarrow \begin{bmatrix}a_0^{(3)} \newline a_1^{(3)} \newline a_2^{(3)} \newline a_3^{(3)}\end{bmatrix} \rightarrow\dots\rightarrow\begin{bmatrix}h_\Theta (x)_1 \newline h_\Theta (x)_2 \newline h_\Theta (x)_3 \newline h_\Theta (x)_4 \end{bmatrix}$
• Our resulting hypothesis for one set of inputs may look like: $h_\Theta(x) =\begin{bmatrix}0 \newline 0 \newline 1 \newline 0 \newline\end{bmatrix}$, indicating that this example belongs to the third class (see the sketch below)
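A small sketch of how such an output is consumed, assuming the final layer yields a 4-vector of sigmoid activations; the values here are made up for illustration:

```python
import numpy as np

# Hypothetical final-layer activations for one example
h = np.array([0.02, 0.10, 0.91, 0.05])

predicted = int(np.argmax(h))   # index of the largest activation -> class 2 (third class)
y_hat = np.zeros_like(h)
y_hat[predicted] = 1.0          # one-hot vector, matching the y^{(i)} encoding above
print(predicted, y_hat)         # 2 [0. 0. 1. 0.]
```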