Predict Boston housing prices using first-order linear equations
The dataset ships with sklearn and contains the features and prices of 506 houses in the Boston area, collected before 1993. load_boston() loads it. (Note that load_boston() has been removed from recent versions of scikit-learn, so this example needs an older release.)
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import time
from sklearn.linear_model import LinearRegression
boston = load_boston()
X = boston.data
y = boston.target
print("X.shape:{}. y.shape:{}".format(X.shape, y.shape))
print('boston.feature_names:{}'.format(boston.feature_names))

# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

model = LinearRegression()
start = time.perf_counter()  # time.clock() was removed in Python 3.8
model.fit(X_train, y_train)
train_score = model.score(X_train, y_train)  # R^2 score on the training set
cv_score = model.score(X_test, y_test)       # R^2 score on the test set
print('time used:{0:.6f}; train_score:{1:.6f}, cv_score:{2:.6f}'.format(
    time.perf_counter() - start, train_score, cv_score))
The output is:
X.shape:(506,13). y.shape:(506,)
boston.feature_names:['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']
time used:0.012403; train_score:0.723941, cv_score:0.794958
The scores are not high (about 0.72 on the training set and 0.79 on the test set), which suggests the model is under-fitting.
Use polynomials for linear regression
The example above under-fits, which means the model is too simple for the data. One way to increase model complexity is to add polynomial features.
For example, if the original features are [a, b], then with degree 2 the polynomial features become [1, a, b, a^2, ab, b^2]; other degrees follow by analogy.
Polynomial features increase the complexity of both the data and the model, which allows a better fit.
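As a quick illustration of this expansion (a sketch not in the original article, using arbitrary values a=2, b=3), PolynomialFeatures can be applied to a single two-feature sample:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

sample = np.array([[2, 3]])            # one sample with features a=2, b=3
poly = PolynomialFeatures(degree=2)    # include_bias=True keeps the constant term 1
print(poly.fit_transform(sample))      # [[1. 2. 3. 4. 6. 9.]] i.e. [1, a, b, a^2, ab, b^2]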
The following code uses a Pipeline to chain the polynomial feature transform with linear regression, and compares the scores for degree 1, 2, and 3.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import time
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
def polynomial_model(degree=1):
    # Chain the polynomial feature transform with a linear regression model
    polynomial_features = PolynomialFeatures(degree=degree, include_bias=False)
    linear_regression = LinearRegression(normalize=True)  # note: the normalize parameter was removed in scikit-learn 1.2
    pipeline = Pipeline([('polynomial_features', polynomial_features),
                         ('linear_regression', linear_regression)])
    return pipeline

boston = load_boston()
X = boston.data
y = boston.target
print("X.shape:{}. y.shape:{}".format(X.shape, y.shape))
print('boston.feature_names:{}'.format(boston.feature_names))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

for i in range(1, 4):
    print('degree:{}'.format(i))
    model = polynomial_model(degree=i)
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    model.fit(X_train, y_train)
    train_score = model.score(X_train, y_train)
    cv_score = model.score(X_test, y_test)
    print('time used:{0:.6f}; train_score:{1:.6f}, cv_score:{2:.6f}'.format(
        time.perf_counter() - start, train_score, cv_score))
The output is:
X.shape:(506,13). y.shape:(506,)
boston.feature_names:['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']
degree:1
time used:0.003576; train_score:0.723941, cv_score:0.794958
degree:2
time used:0.030123; train_score:0.930547, cv_score:0.860465
degree:3
time used:0.137346; train_score:1.000000, cv_score:-104.429619
As expected, degree 1 gives the same scores as the plain linear regression above. With degree 3 the training score is 1.0 while the test score is negative, which is clearly over-fitting. The degree-2 model is therefore the one to choose.
The second-order polynomial does much better than the first-order one, but there is still a sizable gap between the training and test scores. This may be because the data set is too small; more samples would be needed to improve the model further.
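One way to check whether more data would actually help is to look at a learning curve. The following sketch is not part of the original article; it reuses polynomial_model, X, and y from the code above and relies on sklearn's learning_curve helper. If the cross-validation score is still rising as samples are added, more data is likely to help:

import numpy as np
from sklearn.model_selection import learning_curve

# Average train/validation R^2 at increasing training-set sizes
train_sizes, train_scores, test_scores = learning_curve(
    polynomial_model(degree=2), X, y,
    cv=5, train_sizes=np.linspace(0.1, 1.0, 5))
print('train scores:', train_scores.mean(axis=1))
print('cv scores:   ', test_scores.mean(axis=1))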
Comparison of normal equation solution and gradient descent
Besides using gradient descent to approach the optimal solution iteratively, the solution can also be computed directly with the normal equation.
**According to Andrew Ng's course, the closed-form solution of linear regression is:**
theta = (X^T X)^-1 X^T y
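As a rough check (a sketch, not from the original article; it reuses X_train, y_train, and the LinearRegression import from the first example), the normal equation can be evaluated with NumPy and compared against sklearn's fit:

import numpy as np

# Prepend a column of ones so theta includes the intercept term
X_b = np.hstack([np.ones((X_train.shape[0], 1)), X_train])

# Normal equation: theta = (X^T X)^-1 X^T y
# np.linalg.solve is used instead of an explicit inverse for numerical stability
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y_train)

lr = LinearRegression().fit(X_train, y_train)
print(np.allclose(theta[0], lr.intercept_))  # intercept should match closely
print(np.allclose(theta[1:], lr.coef_))      # coefficients should match closely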
In fact, the two methods have their own advantages and disadvantages:
Gradient descent:
Disadvantages: a learning rate has to be chosen, and many iterations are needed.
Advantages: it still runs at a reasonable speed when there are many features (more than about 10,000).
Normal equation:
Advantages: no learning rate to choose and no iterations.
Disadvantages: it requires computing the transpose of X and the inverse of X^T X, with complexity around O(n^3), so it becomes very slow when there are many features (more than about 10,000).
For nonlinear problems such as classification, the normal equation does not apply, so gradient descent has a wider range of applications.
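sklearn's own gradient-descent-based linear regressor is SGDRegressor. As a minimal sketch (not from the original article; it reuses X_train, X_test, y_train, and y_test from the first example), it can be used in place of LinearRegression, with feature scaling first because SGD is sensitive to feature scale:

from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

sgd = Pipeline([('scaler', StandardScaler()),
                ('sgd', SGDRegressor(eta0=0.01, max_iter=1000))])  # eta0 is the initial learning rate
sgd.fit(X_train, y_train)
print('test score:', sgd.score(X_test, y_test))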
That is the whole sklearn + Python linear regression example; I hope it serves as a useful reference.