Fitting the Pima Indians Diabetes Data (based on a BackProp neural network) - Blog of Mathias
Blog of Mathias Web Security & Deep Learning
Posted: | Category: Technical articles | Comments: 0 | Views: 422
  • The data comes from the internet; it appears to be from a medical institution abroad.

There are 8 features in total:

  1. Number of times pregnant
  2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  3. Diastolic blood pressure (mm Hg)
  4. Triceps skin fold thickness (mm)
  5. 2-Hour serum insulin (mu U/ml)
  6. Body mass index (weight in kg/(height in m)^2)
  7. Diabetes pedigree function
  8. Age (years)
The data looks roughly like this:
1.jpg

First, the data needs some preprocessing so that it can be used as a sample matrix in MATLAB.
The Python source is as follows:

# Split the comma-separated rows of data.txt into one value per line.
with open('data.txt') as f:
    value = f.read()
new_value = value.replace(',', '\n')
with open('new_data.txt', 'w') as output:
    output.write(new_value)

# Every 9th value is the class label; the other 8 are features.
with open('new_data.txt') as f:
    value = f.read().split()
data = []
lei = []  # class labels
for num, i in enumerate(value, start=1):
    if num % 9 == 0:
        lei.append(i)
    else:
        data.append(i)

# Write the features as MATLAB matrix rows: 8 values, then ';'.
with open('features.txt', 'w') as output1:
    for num, i in enumerate(data, start=1):
        output1.write(i)
        output1.write(';\n' if num % 8 == 0 else ' ')

# Write the labels as one space-separated row.
with open('class.txt', 'w') as output2:
    for i in lei:
        output2.write(i + ' ')
print(lei)
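
The same split can also be done in memory, without the intermediate new_data.txt. This is only a minimal sketch: the function names `split_features_and_labels` and `to_matlab_rows` are made up for illustration, and the two sample rows are placeholders in the same 9-column layout, not real records.

```python
import csv
import io


def split_features_and_labels(text):
    """Split CSV text into feature rows and class labels.

    Assumes each non-empty row holds 8 feature values followed by 1 label.
    """
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        if len(row) != 9:
            raise ValueError("expected 9 columns, got %d" % len(row))
        features.append(row[:8])
        labels.append(row[8])
    return features, labels


def to_matlab_rows(features):
    """Format feature rows as MATLAB matrix rows: 8 values, then ';'."""
    return "".join(" ".join(row) + ";\n" for row in features)


# Two made-up rows in the 9-column layout (placeholders, not real data).
sample = "1,2,3,4,5,6.5,0.7,8,1\n9,10,11,12,13,14.5,0.15,16,0\n"
features, labels = split_features_and_labels(sample)
print(to_matlab_rows(features), end="")
print(" ".join(labels))
```

Checking the column count per row catches a malformed line right away, whereas counting every 9th value would silently shift all later labels.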

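This gives the data as a standard matrix. Note, however, that in MATLAB each column is one sample,
so the matrix has to be transposed.
The MATLAB source is as follows: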

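% C: feature matrix from features.txt, one sample per row (values omitted here)
C=[xxxxxxxx (omitted)]
% Transpose so that each column is one sample
P=transpose(C)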
T=[1 0 1 0 1 0 1 0 1 1 0 1 0 1 1 1 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 1 0 0 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 1 0 1 0 1 0 0 0 0 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 1 1 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 1 1 1 0 0 1 0 1 0 1 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 1 1 0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 1 1 0 0 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0 1 0 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 1 1 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 1 1 1 0 0 1 1 1 0 1 0 1 0 1 0 0 0 0 1 0]
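% Normalize the inputs to [-1, 1] (only the first output of premnmx is kept)
Z=premnmx(P,T)
% Layers of 12, 6 and 1 neurons, trained by gradient descent with momentum and adaptive learning rate
net=newff(minmax(Z),[12,6,1],{'logsig','tansig','logsig'},'traingdx')
net.trainParam.show=100       % show progress every 100 epochs
net.trainParam.epochs=100000  % maximum number of epochs
net.trainParam.goal=0.001     % target error
net=train(net,Z,T)
y=sim(net,Z)                  % network output on the training data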
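
To make the 12-6-1 layer structure above more concrete, here is a rough Python sketch of the forward pass for a single sample. It is not what the toolbox does internally: the weights are random and untrained, and `layer`, `forward` and `params` are names I made up for illustration. `logsig` is the logistic sigmoid, and `tansig` is mathematically the same as `tanh`.

```python
import math
import random


def logsig(x):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases, activation):
    """Apply one fully connected layer to a single sample (list of floats)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(sample, params):
    """Pass one 8-feature sample through the 12-6-1 layers."""
    out = sample
    activations = (logsig, math.tanh, logsig)  # logsig, tansig, logsig
    for (weights, biases), activation in zip(params, activations):
        out = layer(out, weights, biases, activation)
    return out[0]  # the single output neuron


random.seed(0)
sizes = [8, 12, 6, 1]
params = []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    # Random, untrained weights: for illustrating the shapes only.
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [random.uniform(-1, 1) for _ in range(n_out)]
    params.append((weights, biases))

sample = [random.uniform(-1, 1) for _ in range(8)]  # placeholder normalized input
y = forward(sample, params)
print(y)
```

Because the last layer is `logsig`, the output always lies strictly between 0 and 1, which matches the 0/1 targets in T.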

The data was normalized.
Since this is the old MATLAB 7, I am not sure whether this step affected the data.
The resulting error is fairly large:
1.jpg
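
For reference, the normalization step can be sketched in plain Python. As far as I know, `premnmx` scales each row (each feature) to [-1, 1] with 2*(p - min)/(max - min) - 1. The function name `minmax_scale_rows` is made up, the small matrix is a placeholder, and mapping a constant row to 0 is my own assumption rather than toolbox behavior.

```python
def minmax_scale_rows(matrix):
    """Scale each row of a features-by-samples matrix to [-1, 1]."""
    scaled = []
    for row in matrix:
        lo, hi = min(row), max(row)
        if hi == lo:
            # Constant row: no range to scale by, so map it to 0 (assumption).
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([2.0 * (v - lo) / (hi - lo) - 1.0 for v in row])
    return scaled


# Placeholder matrix: 2 features (rows) by 3 samples (columns).
P = [[0.0, 5.0, 10.0],
     [20.0, 30.0, 40.0]]
Z = minmax_scale_rows(P)
print(Z)  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```

Each feature is scaled separately, so features with very different ranges, such as age and the pedigree function, end up on the same scale.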

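Because convergence was very slow, I trained for an excessive number of epochs.
In the end I picked a few samples at random for testing (there is no separate test set).
The fit is mediocre at best.
These are some of the better-fitted samples: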
1.jpg
2.jpg
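
Since no test set was kept, one way to get a more honest estimate would be to hold out part of the samples before training. The following is only a sketch of that idea, not something I ran on this data: `train_test_split`, `accuracy`, the 20% fraction and the 0.5 threshold are all assumptions for illustration.

```python
import random


def train_test_split(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and hold out a fraction for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [samples[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def accuracy(outputs, labels, threshold=0.5):
    """Fraction of outputs that match the 0/1 labels after thresholding."""
    hits = sum((o >= threshold) == bool(t) for o, t in zip(outputs, labels))
    return hits / len(labels)


# Placeholder data: 10 dummy samples with alternating labels.
samples = list(range(10))
labels = [i % 2 for i in samples]
(train_x, train_t), (test_x, test_t) = train_test_split(samples, labels)
print(len(train_x), len(test_x))

# Placeholder network outputs compared against placeholder labels.
print(accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))
```

The network would then be trained on the training part only, and the accuracy reported on the held-out part.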

This typical case can be seen as a classic application of a classifier, here mainly for medical diagnosis.
