I have a CSV dataset like the one below:
f1 f2 f3 ... label1 label2
float float float ... 00010 00001
... 01000 00010
Each label column is in binary format. I want to split each label column into multiple columns while keeping the heading: every zero and one should be separated and placed in its own column, like below:
f1 f2 f3 ... label1 label2
float float float ... 0,0,0,1,0 0,0,0,0,1
... 0,1,0,0,0 0,0,0,1,0
Could you guide me on how to do this in Python? Thanks.
Answer:
You can do this efficiently with a pandas DataFrame (note: there are other methods, such as reading the CSV and editing each line), like this:
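import pandas as pd

# Some test data, using strings for the binary values so leading zeros are kept.
example_data = {'f': ['a', 'b', 'c'], 'binary_data': ['111', '101', '001']}
df = pd.DataFrame(example_data)
print(df)

def split_parts(row):
    # Turn a binary string such as '101' into a list of its characters: ['1', '0', '1'].
    return list(row['binary_data'])

# Apply split_parts to every row (axis=1) and store the resulting lists in a new column.
df['split_data'] = df.apply(split_parts, axis=1)
print(df)
print(type(df['split_data']))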
This is the sample input:
f binary_data
0 a 111
1 b 101
2 c 001
This is the result:
f binary_data split_data
0 a 111 [1, 1, 1]
1 b 101 [1, 0, 1]
2 c 001 [0, 0, 1]
Each value in the split_data column above is a list of single-character strings, one per digit of the binary value.
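The lists in split_data can then be expanded into separate columns, which is what the question asks for. Below is a minimal sketch, assuming the binary columns are named label1 and label2 as in the question and that every code in a column has the same number of digits. The derived headers (label1_0, label1_1, ...) and the helper expand_binary_columns are hypothetical choices for illustration, since a CSV needs one header per column. Note that pd.read_csv would otherwise parse 00010 as the integer 10 and drop the leading zeros, so the label columns are read as strings via dtype.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the question's layout; in practice pass a file path.
csv_text = """f1,f2,label1,label2
0.5,1.2,00010,00001
0.7,3.4,01000,00010
"""

label_cols = ["label1", "label2"]  # assumed names of the binary label columns

# Read the label columns as strings so leading zeros are preserved.
df = pd.read_csv(io.StringIO(csv_text), dtype={col: str for col in label_cols})


def expand_binary_columns(frame, columns):
    """Replace each binary-string column with one integer column per digit,
    keeping the original column order."""
    pieces = []
    for col in frame.columns:
        if col in columns:
            # Split each string into its characters and convert them to ints.
            digits = pd.DataFrame(
                frame[col].map(lambda s: [int(ch) for ch in s]).tolist(),
                index=frame.index,
            )
            # Hypothetical naming scheme: label1_0, label1_1, ...
            digits.columns = [f"{col}_{i}" for i in range(digits.shape[1])]
            pieces.append(digits)
        else:
            # Non-label columns are passed through unchanged.
            pieces.append(frame[[col]])
    return pd.concat(pieces, axis=1)


result = expand_binary_columns(df, label_cols)
print(result)
# result.to_csv("expanded.csv", index=False)  # hypothetical output path
```

If codes in one column have different lengths, the shorter rows are padded with NaN by the DataFrame constructor, so checking the lengths first may be worthwhile.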