Home > Software design >  pandas translate from a column that is a list to create new columns with all options as a binary yes
pandas translate from a column that is a list to create new columns with all options as a binary yes

Time:11-03

given the data set

#Create Series
s = pd.Series([[1,2,3,],[1,10,11],[2,11,12]],['buz','bas','bur'])
k = pd.Series(['y','n','o'],['buz','bas','bur'])

#Create DataFrame df from two series
df = pd.DataFrame({'first':s,'second':k})

I was able to create new columns based on all possible values of 'first'

def text_to_list(df,col):
    val=df[col].explode().unique()
    return val

unique=text_to_list(df,'first')

for options in unique :
    df[options]=0


now I need to check off (or turn the value to '1') in each row and column where that value exists in the original list of 'first'

I'm pretty sure its a combination of .isin and/or .apply, but i'm struggling

the end result should be for row
buz: cols 1,2,3 are 1
bas: cols 1,10,11 are 1
bur: cols 2,11,12 are 1

          first   second  1  2  3  10  11  12
buz     [1, 2, 3]      y  1  1  1   0   0   0
bas    [1, 10, 11]     n  1  0  0   1   1   0
bur    [2, 11, 12]     o  0  1  0   0   1   1

adding the solution provided by -https://stackoverflow.com/users/3558077/ashutosh-porwal


df1=df.join(pd.get_dummies(df['first'].apply(pd.Series).stack()).sum(level=0))
print(df1)

Note: this solution did not require my hack job of creating the columns beforehand by explode column 'first'

CodePudding user response:

From your update it seems that what you need is simply:

for opt in unique :
    df[opt]=df['first'].apply(lambda x: int(opt in x))

Output:

           first second  1  2  3  10  11  12
buz    [1, 2, 3]      y  1  1  1   0   0   0
bas  [1, 10, 11]      n  1  0  0   1   1   0
bur  [2, 11, 12]      o  0  1  0   0   1   1

CodePudding user response:

Use pd.merge and pivot_table:

out = df.reset_index().explode('first') \
        .pivot_table(values='index', index='second', columns='first',
                     aggfunc='any', fill_value=False, sort=False).astype(int)

out = df.merge(out, on='second')

Output:

>>> out
         first second  1  2  3  10  11  12
0    [1, 2, 3]      y  1  1  1   0   0   0
1  [1, 10, 11]      n  1  0  0   1   1   0
2  [2, 11, 12]      o  0  1  0   0   1   1

CodePudding user response:

Data:

>>> import pandas as pd
>>> s = pd.Series([[1,2,3,],[1,10,11],[2,11,12]],['buz','bas','bur'])
>>> k = pd.Series(['y','n','o'],['buz','bas','bur'])
>>> df = pd.DataFrame({'first':s,'second':k})
>>> df
           first second
buz    [1, 2, 3]      y
bas  [1, 10, 11]      n
bur  [2, 11, 12]      o

Solution:

>>> df[df['first'].explode().to_list()] = 0
>>> df = df[['first', 'second']].join(df.apply(lambda x:x.loc[x['first']], axis=1).replace({0 : 1, np.nan : 0}).astype(int))
>>> df 

           first second  1  2  3  10  11  12
buz    [1, 2, 3]      y  1  1  1   0   0   0
bas  [1, 10, 11]      n  1  0  0   1   1   0
bur  [2, 11, 12]      o  0  1  0   0   1   1

  • Related