I have a Pandas Dataframe that has a list column. I'd like to split this list column into multiple column based on the value. Returning yes_value or no_value for each record based on the column name.
Example input:
id | values
---|----------
1 | [A,B,C,D]
2 | [D,E,F]
3 | [A,D]
4 | [K]
Expected output:
id | values | A | B | C | D | E | F | K |
---|----------|-------|-------|-------|-------|-------|-------|-------|
1 | [A,B,C,D]| yes_A | yes_B | yes_C | yes_D | no_E | no_F | no_K |
2 | [D,E,F] | no_A | no_B | no_C | yes_D | yes_E | yes_F | no_K |
3 | [A,D] | yes_A | no_B | no_C | yes_D | no_E | no_F | no_K |
4 | [K] | no_A | no_B | no_C | no_D | no_E | no_F | yes_K |
CodePudding user response:
Code:
#Input
df = pd.DataFrame({'id':[1,2,3,4], 'Values':[['A','B','C','D'], ['D','E','F'], ['A','D'], ['K']]})
#STEP 1 merging the Values column all lists into one and findout the unique values using set
#so the output will be {'A', 'B', 'C', 'D', 'E', 'F', 'K'} and then looping on it as below
for i in sorted(set(sum(df['Values'].tolist(),[]))):
#STEP 2 Creating new column and check if column in list or not
df[i] = df['Values'].apply(lambda x: f'yes_{i}' if i in x else f'no_{i}')
df
Output:
id Values A B C D E F K
0 1 [A, B, C, D] yes_A yes_B yes_C yes_D no_E no_F no_K
1 2 [D, E, F] no_A no_B no_C yes_D yes_E yes_F no_K
2 3 [A, D] yes_A no_B no_C yes_D no_E no_F no_K
3 4 [K] no_A no_B no_C no_D no_E no_F yes_K
CodePudding user response:
You can use a crosstab
to reshape:
df2 = df.explode('values')
df3 = pd.crosstab(df2['id'], df2['values']).replace({0: 'no_', 1: 'yes_'})
out = df.merge(df3.add(df3.columns), left_on='id', right_index=True)
Or str.get_dummies
:
df2 = df['values'].agg('|'.join).str.get_dummies().replace({0: 'no_', 1: 'yes_'})
out = df.join(df2.add(df2.columns))
output:
id values A B C D E F K
0 1 [A, B, C, D] yes_A yes_B yes_C yes_D no_E no_F no_K
1 2 [D, E, F] no_A no_B no_C yes_D yes_E yes_F no_K
2 3 [A, D] yes_A no_B no_C yes_D no_E no_F no_K
3 4 [K] no_A no_B no_C no_D no_E no_F yes_K