Home >  > 利用SVM(支持向量机)预测股价

利用SVM(支持向量机)预测股价

0

简要介绍:
使用SKlearn和LinearRegression两种方法进行股价预测,最后对两种方法的结果进行对比。

一、安装sklearn
在网上查了一下,要安装sklearn,需要先安装scipy,
要安装scipy,需要先安装numpy+mkl,不过我在自己的电脑上测试一下,发现可以直接import sklearn,所以一切OK。

二、测试
1.代码出错
然后测试代码,打印几次之后,报错:

raise klass(message, resp.status_code, resp.text, resp.headers, code)
quandl.errors.quandl_error.LimitExceededError: (Status 429) (Quandl Error QELx01
) You have exceeded the anonymous user limit of 50 calls per day. To make more c
alls today, please register for a free Quandl account and then include your API
key with your requests.

好吧,我去注册账号吧,结果发现注册到step 3 of 3的时候,发现无法显示验证码,重新换一个浏览器也不行。

好吧,改用 米筐的数据吧,这一次发现自己找的代理可以使用https://www.2980.com来注册邮箱。这样就解决了以后注册米筐的问题。

好吧,这台电脑上好像又没装rqdata,继续改用joinquant,成功下载到了数据。

三、小知识
1.SVM(SVR)算法
是英文support vector machine的缩写,翻译成中文为支持向量机。它是一种分类算法,但是也可以做回归,根据输入的数据不同可做不同的模型(若输入标签为连续值则做回归,若输入标签为分类值则用SVC()做分类)

SVR(kernel='rbf', C=1e3, gamma=0.1)
(1)kernel:核函数的类型,一般常用的有’rbf’,’linear’,’poly’,等如图4-1-2-1所示,发现使用rbf参数时函数模型的拟合效果最好。

(2)C:惩罚因子。C表征你有多么重视离群点,C越大越重视,越不想丢掉它们。C值大时对误差分类的惩罚增大,C值小时对误差分类的惩罚减小。当C越大,趋近无穷的时候,表示不允许分类误差的存在,margin越小,容易过拟合;当C趋于0时,表示我们不再关注分类是否正确,只要求margin越大,容易欠拟合。如图4-1-2-2所示发现当使用1e3时最为适宜。

(3)gamma: 是’rbf’,’poly’和’sigmoid’的核系数且gamma的值必须大于0。随着gamma的增大,存在对于测试集分类效果差而对训练分类效果好的情况,并且容易泛化误差出现过拟合。如图4-1-2-3所示发现gamma=0.01时准确度最高。

可参考:https://blog.csdn.net/qq_16633405/article/details/70243030

四、源码:

import quandl
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

#Get the stock data
# df = quandl.get("WIKI/AMZN")
# Take a look at the data
# print(df.head())

from jqdatasdk import *
auth('138xxxxxxx','a...4')
# quotes = pd.read_csv('002.csv')  # 读取训练数据

stock_code = '601318.XSHG'
start_date = '2016-02-05'
end_date = '2017-02-07'



df = get_price(stock_code, start_date, end_date, frequency='daily',skip_paused=False,fq='pre',fields=['open','high','low','close','money'])
print(df.head())

df = df[['close']]
print(df.head())


# A variable for predicting 'n' days out into the future
forecast_out = 30 #'n=30' days
#Create another column (the target or dependent variable) shifted 'n' units up
df['Prediction'] = df[['close']].shift(-forecast_out)
#print the new data set
print(df.tail())


### Create the independent data set (X)  #######
# Convert the dataframe to a numpy array
X = np.array(df.drop(['Prediction'],1))

#Remove the last 'n' rows
X = X[:-forecast_out]
# print(X)


### Create the dependent data set (y)  #####
# Convert the dataframe to a numpy array (All of the values including the NaN's)
y = np.array(df['Prediction'])
# Get all of the y values except the last 'n' rows
y = y[:-forecast_out]
# print(y)


# Split the data into 80% training and 20% testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and train the Support Vector Machine (Regressor)
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
svr_rbf.fit(x_train, y_train)

# Testing Model: Score returns the coefficient of determination R^2 of the prediction. 
# The best possible score is 1.0
svm_confidence = svr_rbf.score(x_test, y_test)
print("svm confidence: ", svm_confidence)


# Create and train the Linear Regression  Model
lr = LinearRegression()
# Train the model
lr.fit(x_train, y_train)

# Testing Model: Score returns the coefficient of determination R^2 of the prediction. 
# The best possible score is 1.0
lr_confidence = lr.score(x_test, y_test)
print("lr confidence: ", lr_confidence)


# Set x_forecast equal to the last 30 rows of the original data set from Adj. Close column
x_forecast = np.array(df.drop(['Prediction'],1))[-forecast_out:]
# print(x_forecast)


# Print linear regression model predictions for the next 'n' days
lr_prediction = lr.predict(x_forecast)
print("linear regression model predictions for the next n days:\n",lr_prediction)
#将numpy数据存为csv
np.savetxt("lr_prediction.csv", lr_prediction, delimiter=',')

# Print support vector regressor model predictions for the next 'n' days
svm_prediction = svr_rbf.predict(x_forecast)
print("support vector regressor model predictions for the next n days:\n",svm_prediction)
#将numpy数据存为csv
np.savetxt("svm_prediction.csv", svm_prediction, delimiter=',')

#Original stock price
z = np.array(df['close'])
print("Original stock price for the next n days:\n",z[-forecast_out:])
np.savetxt("z.csv", z[-forecast_out:], delimiter=',')

五、执行结果

https://www.youtube.com/watch?v=EYnC4ACIt2g&t=9s

六、分析

很明显,在执行“df['Prediction'] = df[['close']].shift(-forecast_out)”这句的时候,其实就是将未来30天的价格移到了现在,又去预测未来价格,使用了未来函数。

参考资料:
https://www.joinquant.com/view/community/detail/3f5f73ce173b157abac9aef8c0a71a96

https://www.youtube.com/watch?v=JgC9BEVS9Tk
https://www.youtube.com/watch?v=wxKLtaUryCw

https://www.youtube.com/watch?v=zwqwlR48ztQ

暧昧帖

本文暂无标签

发表评论

*

*