Separate values of labels into multiple sub-columns with Python

Time:03-12

I have a CSV dataset like the one below:

f1       f2       f3   ...   label1    label2   
float   float    float ...   00010     00001
...                          01000     00010


Each label column is in binary format. I want to split each label column into multiple columns while keeping the heading: each zero and one should be separated and placed in its own column, like below:

f1       f2       f3   ...   label1        label2   
float   float    float ...   0,0,0,1,0     0,0,0,0,1
...                          0,1,0,0,0     0,0,0,1,0

Could you guide me on how to do this in Python? Thanks.

Answer:

You can do this efficiently with a pandas DataFrame like this (note: there are other methods, such as reading the CSV and editing each line):

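import pandas as pd

# Some test data, using strings for the binary values so leading zeros are kept.
example_data = {'f': ['a', 'b', 'c'], 'binary_data': ['111', '101', '001']}

df = pd.DataFrame(example_data)

print(df)

def split_parts(row):
    # Split the binary string into single characters, e.g. '101' -> ['1', '0', '1'].
    return list(row['binary_data'])


# Apply row by row (axis=1); each row returns a list, so the new column holds lists.
df['split_data'] = df.apply(split_parts, axis=1)

print(df)
print(type(df['split_data']))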

This is the "sample" input:

   f binary_data
0  a         111
1  b         101
2  c         001

This is the result:

   f binary_data split_data
0  a         111  [1, 1, 1]
1  b         101  [1, 0, 1]
2  c         001  [0, 0, 1]

Each value in the split_data column above is a list of strings, one string per digit of the binary data.
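To get the layout the question asks for, with each label split into separate sub-columns under its original heading, the per-digit lists can be expanded into columns and joined under a two-level header. This is a minimal sketch, not part of the original answer: the inline CSV text, the LABEL_COLUMNS list and the `b0`, `b1`, ... sub-column names are hypothetical, and it assumes every label string in a column has the same length. The label columns are read as strings (`dtype`), because otherwise pandas parses `00010` as the integer 10 and the leading zeros are lost.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real file; column names follow
# the question, the values are made up for illustration.
CSV_TEXT = """f1,f2,label1,label2
0.5,1.2,00010,00001
0.7,3.4,01000,00010
"""

# Assumption: these are the columns that hold binary labels.
LABEL_COLUMNS = ["label1", "label2"]

# Read label columns as strings so "00010" keeps its leading zeros.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={c: str for c in LABEL_COLUMNS})


def split_label(series: pd.Series) -> pd.DataFrame:
    """Turn a column of equal-length bit strings into one int column per bit."""
    bits = series.apply(list).tolist()  # "00010" -> ['0', '0', '0', '1', '0']
    width = len(bits[0])  # assumes all strings in the column have this length
    names = [f"b{i}" for i in range(width)]  # hypothetical sub-column names
    return pd.DataFrame(bits, index=series.index, columns=names).astype(int)


# Collect one block per original column; feature columns get an empty sub-name.
keys, frames = [], []
for col in df.columns:
    keys.append(col)
    if col in LABEL_COLUMNS:
        frames.append(split_label(df[col]))
    else:
        frames.append(df[[col]].rename(columns={col: ""}))

# keys= puts the original column name on the top level of a MultiIndex header.
result = pd.concat(frames, axis=1, keys=keys)

print(result)
```

Selecting `result['label1']` then returns just the bit columns for that label. If you instead want the comma-separated strings shown in the question's expected output, `df['label1'].apply(','.join)` turns `00010` into `0,0,0,1,0`.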
