I'm trying to tidy up a long and messy csv file a bit, but my method doesn't seem to work until I tried splitting the raw data into several files. Just wondering if anyone can see what goes wrong here?
The original file looks like this, except there are 600 rows:
Code Item Size Location Available
DD2 Cap Blue S NY 3
DD2 Cap Blue S NY 6
DD2 Cap Blue S CA 18
DD2 Cap Blue S PA 20
DD3 Cap Blue L CA 5
DA5 Tee Red S NY 1
DA7 Tee White S PA 203
DA7 Tee White S PA 204
I would like to turn it into:
Code Item Size Location Available
DD2 Cap Blue S NY 9
CA 18
PA 20
DD3 Cap Blue L CA 5
DA5 Tee Red S NY 1
DA7 Tee White S PA 407
, so that I can then use the pivot_table function to make it tidy.
The method I'm using is
df2 = df.groupby(['Code', 'Item', 'Size', 'Location'])['Available'].sum()
print(df2)
However, pandas merges the values in 'Available' as the numbers are plain text, i.e. the result looks like
Code Item Size Location Available
DD2 Cap Blue S NY 36
CA 18
PA 20
DD3 Cap Blue L CA 5
DA5 Tee Red S NY 1
DA7 Tee White S PA 203204
What I can't get my head around is, if I split the data, say I only take 20 rows out and run the command, it would work perfectly.
I'm very new to python and pandas, any help is appreciated. Thanks in advance.
CodePudding user response:
Change the data-type of the Available
column, e.g. by:
df2["Available"] = df2["Available"].values.astype(float)