## Machine Learning in Action, Ch05: Logistic Regression


### 1. The Problem Logistic Regression Solves

Given training samples with two features and a binary label, the task is to predict the label $y$ from $x_1$ and $x_2$:

| $x_1$ | $x_2$ | $y$ |
|-------|-------|-----|
| 1.2   | 2.3   | 1   |
| 2.5   | 2.2   | 0   |
| 1.4   | 2.1   | 1   |

### 2. The Sigmoid Function

$$g(z) = \frac{1}{1 + e^{-z}} \tag{1}$$

$$h_\theta(x)= g(\mathbf{\vec{\theta}}^\mathrm{T} \cdot \vec{x}) = \frac{1}{1 + e^{-\mathbf{\vec{\theta}}^{\mathrm{T}} \cdot \vec{x}}} \tag{2}$$

$$\mathbf{\vec{\theta}}^\mathrm{T} \cdot \vec{x} = \begin{bmatrix} \theta_1 & \theta_2 \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \theta_1 \cdot x_1 + \theta_2 \cdot x_2 \tag{3}$$

Prepend a constant feature $x_0 = 1$ to every sample so the intercept $\theta_0$ folds into the dot product:

| $x_0$ | $x_1$ | $x_2$ | $y$ |
|-------|-------|-------|-----|
| 1.0   | 1.2   | 2.3   | 1   |
| 1.0   | 2.5   | 2.2   | 0   |
| 1.0   | 1.4   | 2.1   | 1   |

$$z_{\theta} = \mathbf{\vec{\theta}}^\mathrm{T} \cdot \vec{x} = \begin{bmatrix} \theta_0 & \theta_1 & \theta_2 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} = \theta_0 \cdot x_0 + \theta_1 \cdot x_1 + \theta_2 \cdot x_2 \tag{4}$$

$$h_\theta(x) = \frac{1}{1 + e^{- \left (\begin{bmatrix} \theta_0 & \theta_1 & \theta_2 \end{bmatrix} \cdot \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} \right )}} \tag{5}$$

$$h_\theta(x) = \frac{1}{1 + e^{-(\theta_0 \cdot x_0 + \theta_1 \cdot x_1 + \theta_2 \cdot x_2)}} \tag{6}$$
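As a quick numeric check of (1) and (6), here is a minimal sketch; the `theta` values are arbitrary illustration, not fitted parameters:

```python
import numpy as np

def sigmoid(z):
    """Equation (1): maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothesis (6) on the first training row, with x0 = 1 prepended.
theta = np.array([1.0, 1.0, 1.0])    # arbitrary illustrative theta
x = np.array([1.0, 1.2, 2.3])        # x0, x1, x2
h = sigmoid(theta @ x)               # = g(theta^T x)

print(sigmoid(0.0))   # 0.5: z = 0 sits exactly on the decision boundary
print(h)              # strictly between 0 and 1
```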

### 3. Maximum Likelihood Estimation

Interpret the model output $h_\theta(\vec{x})$ as the probability that $y = 1$ given $\vec{x}$:

$$P(1|\vec{x};\theta) = h_\theta(x) \tag{7}$$

$$P(0|\vec{x};\theta) = 1 - h_\theta(x) \tag{8}$$

$$P(y|\vec{x};\theta) = [h_\theta(\vec{x})]^{y} \cdot [1 - h_\theta(\vec{x})]^{1-y} \tag{9}$$

$$L(\vec{\theta}) = p = \prod_{i=1}^3 [h_\theta(\vec{x}^{(i)})]^{y^{(i)}} \cdot [1 - h_\theta(\vec{x}^{(i)})]^{1-y^{(i)}} \tag{10}$$

$$l(\vec{\theta}) = \ln{L(\vec{\theta})} = \sum_{i=1}^3 y^{(i)}\ln{h_\theta(\vec{x}^{(i)})} + (1-y^{(i)})\ln{[1-h_\theta(\vec{x}^{(i)})]} \tag{11}$$
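To make (10) and (11) concrete, a small sketch evaluates both on the three-row toy table at an arbitrary $\vec{\theta}$ (all zeros here, so every $h_\theta = 0.5$) and checks that $l(\vec{\theta}) = \ln L(\vec{\theta})$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy training table from section 1, with x0 = 1 prepended.
X = np.array([[1.0, 1.2, 2.3],
              [1.0, 2.5, 2.2],
              [1.0, 1.4, 2.1]])
y = np.array([1, 0, 1])

theta = np.zeros(3)          # arbitrary starting point: every h = 0.5
h = sigmoid(X @ theta)

likelihood = np.prod(h**y * (1 - h)**(1 - y))               # equation (10)
log_lik = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))   # equation (11)

print(likelihood)   # 0.125 (= 0.5^3)
print(log_lik)      # 3 * ln(0.5)
```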

### 4. Gradient Ascent for the Optimal Coefficients $\vec{\theta}$

1. Start from an arbitrary initial $\theta_0,\theta_1,\theta_2$, say $(1,1,1)$; the function must be differentiable at this starting point.
2. Compute the gradient at this point by taking partial derivatives with respect to $\theta_0,\theta_1,\theta_2$; here the gradient is $(1, 3a\theta_1^2, 2b\theta_2)$ (which corresponds to a toy function of the form $\theta_0 + a\theta_1^3 + b\theta_2^2$, not yet the log-likelihood). The gradient is a direction: moving $\theta_0,\theta_1,\theta_2$ along it and substituting the new values into the function increases its value. Moving a small step $\alpha$ along the gradient in each iteration yields the new values
$$\begin{aligned} \theta_0 &:= \theta_0 + 1 \cdot \alpha \\ \theta_1 &:= \theta_1 + 3a\theta_1^2 \cdot \alpha \\ \theta_2 &:= \theta_2 + 2b\theta_2 \cdot \alpha \end{aligned} \tag{12}$$
3. Repeat until the preset number of iterations is reached or the required precision is met.
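The three steps above can be sketched on a one-dimensional toy objective, $f(\theta) = -(\theta - 3)^2$ (a stand-in chosen here for illustration, not the log-likelihood):

```python
def grad(theta):
    # derivative of f(theta) = -(theta - 3)**2
    return -2.0 * (theta - 3.0)

theta = 1.0      # step 1: arbitrary starting point
alpha = 0.1      # small step size
for _ in range(200):                      # step 3: repeat a fixed number of times
    theta = theta + alpha * grad(theta)   # step 2: move along the gradient

print(theta)     # converges to 3.0, the maximizer of f
```

Each update moves $\theta$ in the direction that increases $f$, and the step shrinks automatically as the gradient approaches zero near the maximum.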

The tool for differentiating (11) is the chain rule: for $y = f(u)$, $u = g(x)$,

$$\frac{\mathrm{d}y}{\mathrm{d}x} = f^\prime(u) \cdot g^\prime(x) \;\text{or}\; \frac{\mathrm{d}y}{\mathrm{d}x} = \frac{\mathrm{d}y}{\mathrm{d}u} \cdot \frac{\mathrm{d}u}{\mathrm{d}x}$$

(Tongji 6th edition, *Advanced Mathematics*, Vol. 2, p. 64.)

Applying the chain rule twice to (11), the partial derivative with respect to $\theta_j$ splits into three factors:

$$\frac{\partial l({\vec{\theta}})}{\partial \theta_{j}} = \frac{\partial l({\vec{\theta}})}{\partial g(\vec{\theta}^\mathrm{T} \cdot \vec{x})} \quad {\cdot} \quad \frac{\partial g(\vec{\theta}^\mathrm{T} \cdot \vec{x})}{\partial \vec{\theta}^\mathrm{T} \cdot \vec{x}} \quad \cdot \quad \frac{\partial \left ( \vec{\theta}^\mathrm{T} \cdot \vec{x}\right )}{\partial \theta_j} \tag{13}$$

• Part 1:

$$\frac{\partial l({\vec{\theta}})}{\partial g(\vec{\theta}^\mathrm{T} \cdot \vec{x})} = y \cdot \frac{1}{g(\mathbf{\vec{\theta}}^\mathrm{T} \cdot \vec{x})} - (1- y) \cdot \frac{1}{1 - g(\mathbf{\vec{\theta}}^\mathrm{T} \cdot \vec{x})} \tag{14.1}$$

• Part 2: treat $(\vec{\theta}^\mathrm{T} \cdot \vec{x})$ as a single variable $z$, which reduces this factor to differentiating (1), $g(z) = \frac{1}{1 + e^{-z}}$. It is easy to show that $g^\prime(z) = g(z) \cdot (1 - g(z))$: the derivative of $g$ can be written in terms of $g$ itself, which simplifies the later computation. Hence:

$$\frac{\partial g(\vec{\theta}^\mathrm{T} \cdot \vec{x})}{\partial \vec{\theta}^\mathrm{T} \cdot \vec{x}} = g(\vec{\theta}^\mathrm{T} \cdot \vec{x}) \cdot \left (1 - g(\vec{\theta}^\mathrm{T} \cdot \vec{x}) \right ) \tag{14.2}$$
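The identity $g^\prime(z) = g(z)(1 - g(z))$ is easy to sanity-check numerically with a central difference (a quick sketch, not part of the book's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eps = 1e-6
for z in (-2.0, 0.0, 0.7, 3.0):
    # central-difference estimate of g'(z)
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    # closed form g(z) * (1 - g(z))
    closed_form = sigmoid(z) * (1 - sigmoid(z))
    assert abs(numeric - closed_form) < 1e-9
```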

• Part 3:

$$\frac{\partial \left ( \vec{\theta}^\mathrm{T} \cdot \vec{x}\right )}{\partial \theta_j} = \frac{\partial (\theta_0 \cdot x_0 + \theta_1 \cdot x_1 + \theta_2 \cdot x_2)}{\partial \theta_j} = x_j \tag{14.3}$$

Multiplying (14.1), (14.2) and (14.3) together, the factors cancel: $\left(\frac{y}{g} - \frac{1-y}{1-g}\right) \cdot g(1-g) \cdot x_j = (y - g) \cdot x_j$. Summing over the samples:

$$\frac{\partial l({\vec{\theta}})}{\partial \theta_{j}} =\sum_{i=1}^3 \left (y^{(i)} - h_\theta (\vec{x}^{(i)}) \right ) \cdot x_j^{(i)} \tag{15}$$

Moving each $\theta_j$ along this gradient with step size $\alpha$ gives the update rule:

$$\theta_j := \theta_j + \alpha \cdot \sum_{i=1}^3 \left (y^{(i)} - h_\theta (\vec{x}^{(i)}) \right ) \cdot x_j^{(i)} \tag{16}$$

NOTE: every quantity with superscript $(i)$ is taken from row $i$ of the training data, while $h_\theta (\vec{x}^{(i)})$ is computed from the $\theta_0,\theta_1,\theta_2$ obtained in the previous iteration.

In vectorized form, update (16) for all components of $\vec{\theta}$ at once is a single line:

```python
weights = weights + alpha * dataMatrix.transpose() * error
```
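One batch step of (16) on the toy table, checking that the log-likelihood (11) actually increases (a sketch using plain NumPy arrays rather than the `np.mat` matrices used in the full code below):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy training table from section 1, with x0 = 1 prepended.
X = np.array([[1.0, 1.2, 2.3],
              [1.0, 2.5, 2.2],
              [1.0, 1.4, 2.1]])
y = np.array([1.0, 0.0, 1.0])

theta = np.ones(3)   # arbitrary starting point
alpha = 0.01         # small step size

before = log_likelihood(theta, X, y)
error = y - sigmoid(X @ theta)          # y^(i) - h_theta(x^(i))
theta = theta + alpha * X.T @ error     # update (16) for every j at once
after = log_likelihood(theta, X, y)

print(before, after)    # the log-likelihood goes up
```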

### 5. The Code

```python
#!/usr/bin/env python3

import numpy as np
import matplotlib.pyplot as plt


def main():
    print(__file__)
    dataArr, labelMat = loadDataSet()
    # print(np.array(dataArr))
    # print(np.array(labelMat))
    # inX = np.mat([1, 2, 3])
    # print(sigmoid(inX))
    weights = gradAscent(dataArr, labelMat)
    print(weights)
    plotBestFit(weights.getA())
    weights2 = stocGradAscent1(np.array(dataArr), labelMat)
    plotBestFit(weights2)


def loadDataSet():
    dataMat = []
    labelMat = []
    fr = open('testSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat


def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))


def gradAscent(dataMatIn, classLabels):
    dataMatrix = np.mat(dataMatIn)
    labelMat = np.mat(classLabels).transpose()
    m, n = np.shape(dataMatrix)
    alpha = 0.001
    maxCycles = 500
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)
        error = (labelMat - h)
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights


def stocGradAscent0(dataMatrix, classLabels):
    m, n = np.shape(dataMatrix)
    alpha = 0.01
    weights = np.ones(n)  # initialize to all ones
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights


def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    m, n = np.shape(dataMatrix)
    weights = np.ones(n)  # initialize to all ones
    for j in range(numIter):  # iterate the preset number of times
        dataIndex = list(range(m))  # row indices of the training set
        for i in range(m):
            # alpha decreases with iteration but never reaches 0,
            # thanks to the constant term
            alpha = 4 / (1.0 + j + i) + 0.0001
            randIndex = int(np.random.uniform(0, len(dataIndex)))
            h = sigmoid(sum(dataMatrix[randIndex] * weights))
            error = classLabels[randIndex] - h
            weights = weights + alpha * error * np.array(dataMatrix[randIndex])
            # Differs from the book: wrapped in np.array(...), otherwise this
            # raises "can't multiply sequence by non-int of type 'numpy.float64'"
            # (likely a Python/NumPy version difference from the book's setup)
            del dataIndex[randIndex]
    return weights


def plotBestFit(weights):
    dataMat, labelMat = loadDataSet()
    dataArr = np.array(dataMat)
    n = np.shape(dataArr)[0]
    xcord1 = []
    ycord1 = []
    xcord2 = []
    ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1])
            ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1])
            ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    # decision boundary: theta0 + theta1*x1 + theta2*x2 = 0
    x = np.arange(-3.0, 3.0, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()


if __name__ == '__main__':
    main()
```
