import csv
from sklearn.model_selection import train_test_split
# Read in the CSV file; put the features into a list of dicts and the class labels into a list
dataSet = open(r'/home/ly/Desktop/CHY/SCIENCE_DATA/Data_Set_01labelDel0Col.csv', newline='')
reader = csv.reader(dataSet)  # iterates over the file one row at a time
headers = next(reader)  # read the header row (first line) so it does not end up in featureList
# print(headers)
# Create empty lists
featureList = []
labelList = []
for row in reader:
    labelList.append(row[-1])  # the last column of each row is the class label; add it to labelList
    rowDict = {}
    for i in range(len(row) - 1):  # inner loop: runs over all feature columns before the outer loop moves to the next row
        rowDict[i] = row[i]  # row[i] is the i-th value in the row
        featureList.append(rowDict)
FeatureList = []
for s in featureList:
    changeStrToFloat1 = {}
    for t in s:
        changeStrToFloat1[t] = float(s[t])  # convert each CSV string value to float
    FeatureList.append(changeStrToFloat1)
print(FeatureList)  # FeatureList is a list of dicts
dummyY = [[0, 1]] * 25 + [[1, 0]] * 22  # 47 hard-coded class labels, one per CSV row
X_train, X_test, y_train, y_test = train_test_split(FeatureList, dummyY, test_size=0.25, random_state=None)
Error:
X_train, X_test, y_train, y_test = train_test_split(FeatureList, dummyY, test_size=0.25, random_state=None)
ValueError: Found input variables with inconsistent numbers of samples: [3384, 47]
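The check behind this error can be sketched as follows. This is a simplified stand-in, assuming only that train_test_split compares the number of samples (here, len()) of every input it receives and raises when they differ, roughly what scikit-learn's own `check_consistent_length` does. `check_consistent_length_sketch` is a hypothetical name, not part of scikit-learn.

```python
# Simplified stand-in for the sample-count check done before splitting.
# check_consistent_length_sketch is a hypothetical helper, not scikit-learn's API.
def check_consistent_length_sketch(*arrays):
    """Raise ValueError if the inputs do not all have the same number of samples."""
    lengths = [len(a) for a in arrays]
    if len(set(lengths)) > 1:
        raise ValueError(
            "Found input variables with inconsistent numbers of samples: %r" % lengths
        )
    return lengths[0] if lengths else 0


features = [{0: 1.0}] * 3384  # 3384 entries, like FeatureList in the error
labels = [0] * 47             # 47 entries, like dummyY
try:
    check_consistent_length_sketch(features, labels)
except ValueError as err:
    print(err)  # Found input variables with inconsistent numbers of samples: [3384, 47]
```

Only the lengths matter here; the contents of the features and labels are never inspected.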
I think the problem is in FeatureList: there are 47 dictionaries in this list, each with 72 elements, and 47 * 72 = 3384.
However, I don't know how to fix it...
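The 47 * 72 = 3384 arithmetic can be reproduced with synthetic data. This sketch assumes the shapes from the post (47 rows, 72 feature columns plus one label column; the values are made up) and compares appending the row dict inside versus after the inner loop. `build_feature_list` and its `append_inside_inner_loop` flag are hypothetical names used only for this comparison.

```python
# Synthetic rows with the post's shape: 72 feature columns + 1 label column.
N_ROWS, N_FEATURES = 47, 72
rows = [[str(float(j)) for j in range(N_FEATURES)] + ["0"] for _ in range(N_ROWS)]


def build_feature_list(rows, append_inside_inner_loop):
    """Collect row dicts; optionally place the append inside the inner loop."""
    feature_list = []
    for row in rows:
        row_dict = {}
        for i in range(len(row) - 1):
            row_dict[i] = row[i]
            if append_inside_inner_loop:
                feature_list.append(row_dict)  # runs once per column: 72 times per row
        if not append_inside_inner_loop:
            feature_list.append(row_dict)  # runs once per row
    return feature_list


print(len(build_feature_list(rows, append_inside_inner_loop=True)))   # 3384
print(len(build_feature_list(rows, append_inside_inner_loop=False)))  # 47
```

With the append inside the inner loop, the list holds 72 references to each row's dict, so it reports 3384 samples instead of 47.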
Answer:
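FeatureList and dummyY have inconsistent dimensions: train_test_split requires both to contain the same number of samples, but FeatureList has 3384 entries and dummyY has 47.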
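A minimal sketch of corrected loading code, assuming the file layout described in the question (a header row, then feature columns with the class label in the last column). `SAMPLE_CSV` is a made-up three-row stand-in for the real file and `load_features_and_labels` is a hypothetical helper. The key change is that `featureList.append(rowDict)` runs once per row, after the inner loop; the float conversion is folded into the same loop so the second pass is no longer needed.

```python
import csv
import io

# Made-up stand-in for the real CSV: header row, feature columns, label last.
SAMPLE_CSV = """f0,f1,f2,label
0.5,1.0,2.0,yes
1.5,0.0,3.0,no
2.5,1.0,1.0,yes
"""


def load_features_and_labels(fileobj):
    """Return (featureList, labelList) with exactly one entry per data row."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    featureList = []
    labelList = []
    for row in reader:
        labelList.append(row[-1])  # last column is the class label
        rowDict = {}
        for i in range(len(row) - 1):
            rowDict[i] = float(row[i])  # convert while reading
        featureList.append(rowDict)  # outside the inner loop: once per row
    return featureList, labelList


featureList, labelList = load_features_and_labels(io.StringIO(SAMPLE_CSV))
print(len(featureList), len(labelList))  # 3 3
print(featureList[0])  # {0: 0.5, 1: 1.0, 2: 2.0}
```

For the real data, pass `open(path, newline='')` instead of the `StringIO` object. featureList should then have 47 entries, matching the 47 labels in dummyY, so train_test_split(featureList, dummyY, test_size=0.25) should no longer raise this error.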