The course was taught in Spring 2017. The course material is here.
If an input has $N$ possible discrete values, we should encode the $i$-th value as
\[(0, \dots, \stackrel{i}{1}, \dots, 0)\]In essence, we introduce an $N$-dimensional binary-valued representation for the input.
It is not possible to represent it as a single variable in $\mathbb{Z}_N$, because that variable would be embedded in $\mathbb{R}$ and would introduce a spurious magnitude between the values.
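As a concrete illustration, here is a minimal NumPy sketch of this encoding (the helper name `one_hot` is just an assumption for this example, not something from the course material):

```python
import numpy as np

def one_hot(i, N):
    """Encode the i-th of N discrete values as an N-dimensional binary vector."""
    v = np.zeros(N)
    v[i] = 1.0
    return v

# Example: the value with index 2 out of N = 5 possible values
print(one_hot(2, 5))  # [0. 0. 1. 0. 0.]
```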
Ridge regression shares the idea of weight decay in machine learning, but their starting points differ.
When we introduce the regularization term, it leads to a new solution:
\[w = (X^T X + \lambda I)^{-1} X^T y\]I originally thought that a small enough $\lambda$ should both ensure the stability of the solution and approximate the original solution. But in fact the $\lambda$ here can be plugged back into the original loss function, which then becomes
\[L(w) = ||y-Xw||_2^2 + \lambda||w||_2^2\]The latter term is simply the weight decay penalty in machine learning.
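To make the equivalence concrete, here is a minimal NumPy sketch (the data and variable names are made up for illustration, not taken from the course) that computes the closed-form ridge solution and checks that the gradient of the regularized loss vanishes there:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                      # design matrix
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
lam = 0.1                                         # regularization strength lambda

# Closed-form ridge solution: w = (X^T X + lambda I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# The same w minimizes L(w) = ||y - Xw||_2^2 + lambda ||w||_2^2,
# so the gradient 2 X^T (Xw - y) + 2 lambda w should vanish at w_ridge.
grad = 2 * X.T @ (X @ w_ridge - y) + 2 * lam * w_ridge
print(np.allclose(grad, 0))  # True
```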
Written on April 17th, 2017 by Hanezu