I have an original .csv file that pandas reads as expected, and a pruned version of the same file with duplicate rows removed and everything else unchanged. However, pandas reads the numerical values in the second file as strings, so I cannot perform arithmetic on the dataframe.
import pandas as pd
df1 = pd.read_csv("file1.csv")
print(df1)
df1 = (df1 - df1.min())/(df1.max() - df1.min())
attr1 attr2 attr3 ... attr7 attr attr9
0 0.384 0.0893 -30.439 ... 75.499 140417 0
... ... ... ... ... ... ... ...
2109 0.745 0.5430 -8.137 ... 139.964 185267 1
[2110 rows x 11 columns]
Process finished with exit code 0
df2 = pd.read_csv("file2.csv")
print(df2)
df2 = (df2 - df2.min())/(df2.max() - df2.min())
attr1 attr2 attr3 ... attr7 attr8 attr9
0 0.866 0.7300 -8.201 ... 118.523 379266 2
.. ... ... ... ... ... ... ...
1853 0.377 0.0156 -28.435 ... 140.179 186331 0
[1853 rows x 11 columns]
TypeError: unsupported operand type(s) for -: 'str' and 'str'
CodePudding user response:
The function pd.read_csv takes a dtype argument, so you can specify the type of each column.
For example:
pd.read_csv('file1.csv', dtype={'variable1': 'float'})
will read the column variable1 in as a floating-point type.
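As a minimal sketch of the dtype argument, the snippet below reads a small in-memory CSV instead of the real file. The data and the column names variable1 and variable2 are made up for illustration. Note that if a column contains a value that cannot be parsed as a number, read_csv will most likely raise an error rather than convert it silently.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for file1.csv
csv_text = "variable1,variable2\n0.384,140417\n0.745,185267\n"

# Force variable1 to float; variable2 is left to pandas' type inference
df = pd.read_csv(io.StringIO(csv_text), dtype={"variable1": "float"})

# Inspect the resulting column types
print(df.dtypes)
```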
Alternatively, you can convert the dtype after reading in the file, as such:
df1['variable1'] = df1['variable1'].astype(float)
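If astype(float) fails, the column probably holds at least one entry that is not a number. The sketch below is one way to find such entries using pd.to_numeric with errors="coerce", which turns unparseable values into NaN. The sample data is hypothetical, and the repeated header row in the middle of it is only an assumption about what might have gone wrong in the pruned file.

```python
import io
import pandas as pd

# Hypothetical data: a stray repeated header row makes every column non-numeric
csv_text = "attr1,attr2\n0.866,379266\nattr1,attr2\n0.377,186331\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # columns are read as strings, not numbers

# Convert every column; values that cannot be parsed become NaN
converted = df.apply(pd.to_numeric, errors="coerce")

# Rows of the original frame that contain at least one unparseable value
bad_rows = df[converted.isna().any(axis=1)]
print(bad_rows)

# Drop the offending rows, then apply the min-max scaling from the question
clean = converted.dropna()
normalized = (clean - clean.min()) / (clean.max() - clean.min())
print(normalized)
```

Printing bad_rows shows which lines of the file need fixing, which is usually more informative than forcing the conversion.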