I have this dataframe:
Text feat1 feat2 feat3 feat4
string1 1 1 0 0
string2 0 0 0 1
string3 0 0 0 0
I want to create 2 other columns this way:
Text feat1 feat2 feat3 feat4 all_feat count_feat
string1 1 1 0 0 ["feat1","feat2"] 2
string2 0 0 0 1 ["feat4"] 1
string3 0 0 0 0 [] 0
What's the best approach to do it in Python?
The column names can be any string.
CodePudding user response:
You can use:
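import pandas as pd

feats = df.filter(like='feat')  # select the feature columns
df1 = (feats.mul(feats.columns)  # 1 * name -> name, 0 * name -> ''
       .apply(lambda x: [i for i in x if i], axis=1)  # drop the empty strings
       .to_frame('all_feat')
       .assign(count_feat=lambda x: x['all_feat'].str.len()))
df = pd.concat([df, df1], axis=1)
print(df)
# Output
Text feat1 feat2 feat3 feat4 all_feat count_feat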
0 string1 1 1 0 0 [feat1, feat2] 2
1 string2 0 0 0 1 [feat4] 1
2 string3 0 0 0 0 [] 0
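Since the question says the column names can be any string, filter(like='feat') may not match them. A minimal sketch of a variant that does not depend on the names, assuming every column except Text is a 0/1 flag (the sample frame below is rebuilt from the question, and feature_cols and flags are just illustrative names):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Text": ["string1", "string2", "string3"],
    "feat1": [1, 0, 0],
    "feat2": [1, 0, 0],
    "feat3": [0, 0, 0],
    "feat4": [0, 1, 0],
})

# Assumption: every column except "Text" is a 0/1 feature flag.
feature_cols = df.columns.drop("Text")

# Boolean matrix: True where the flag is set.
flags = df[feature_cols].eq(1).to_numpy()

# For each row, keep the names of the columns whose flag is set.
df["all_feat"] = pd.Series(
    [list(feature_cols[row]) for row in flags], index=df.index
)
df["count_feat"] = flags.sum(axis=1)

print(df)
```

Here the columns are selected by position relative to Text rather than by a shared substring, so renaming the feature columns should not require changing the code.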
CodePudding user response:
You can use a groupby:
df2 = df.filter(like='feat').melt(ignore_index=False)
g = df2.groupby(level=0)
df['all_feat'] = g.apply(lambda g: list(g.loc[g['value'].eq(1), 'variable']))
df['count_feat'] = g['value'].sum()
output:
Text feat1 feat2 feat3 feat4 all_feat count_feat
0 string1 1 1 0 0 [feat1, feat2] 2
1 string2 0 0 0 1 [feat4] 1
2 string3 0 0 0 0 [] 0
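A self-contained sketch of the same long-format idea, in case it helps to run it end to end. It is a variation, not the exact code above: it drops Text instead of filtering on 'feat' (assuming all other columns are 0/1 flags), and it aggregates with agg(list) on the set flags only. The names long and names are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Text": ["string1", "string2", "string3"],
    "feat1": [1, 0, 0],
    "feat2": [1, 0, 0],
    "feat3": [0, 0, 0],
    "feat4": [0, 1, 0],
})

# Long format: one row per (original row, feature column) pair,
# keeping the original row label as the index.
long = df.drop(columns="Text").melt(ignore_index=False)

# Keep only the set flags, then collect the column names per original row.
names = long.loc[long["value"].eq(1)].groupby(level=0)["variable"].agg(list)

# Rows without any set flag are absent from `names`; give them an empty list.
df["all_feat"] = pd.Series([names.get(i, []) for i in df.index], index=df.index)
df["count_feat"] = df["all_feat"].str.len()

print(df)
```

Filtering before grouping means rows with no set flag drop out of the groupby, which is why they are filled back in with an empty list afterwards.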
CodePudding user response:
Check out this link
df = pd.DataFrame(data)  # assuming this is your dataframe
new_col = ["item1", "item2"]  # values for the new column, one per row
df["new_col"] = new_col  # add the new column