I have the following dataframe
id rule1 rule2 rule3
1 True True False
2 True True True
3 False False False
4 False True False
5 True False True
..
and a dictionary:
{'rule1': 'Rule one', 'rule2': 'Rule two', 'rule3': 'Rule three'}
And I want to get an additional column list_of_rules
, which is a list of rules from the dictionary that are True
in the dataframe above.
id rule1 rule2 rule3 list_of_rules
1 True True False ['Rule one', 'Rule two']
2 True True True ['Rule one', 'Rule two', 'Rule three']
3 False False False ['']
4 False True False ['Rule two']
5 True False True ['Rule one', 'Rule three']
..
So far, I have the following solution:
df.loc[df['rule1'] == True, 'rule1'] = 'Rule one'
df.loc[df['rule2'] == True, 'rule2'] = 'Rule two'
df.loc[df['rule3'] == True, 'rule3'] = 'Rule three'
df.loc[df['rule1'] == False, 'rule1'] = ''
df.loc[df['rule2'] == False, 'rule2'] = ''
df.loc[df['rule3'] == False, 'rule3'] = ''
df['list_of_rules'] = df[['rule1', 'rule2', 'rule3']].apply("-".join, axis=1).str.strip('-').str.split('-')
df
which gives the following output:
id rule1 rule2 rule3 list_of_rules
1 True True False ['Rule one', 'Rule two']
2 True True True ['Rule one', 'Rule two', 'Rule three']
3 False False False ['']
4 False True False ['Rule two']
5 True False True ['Rule one', , 'Rule three']
..
Is there a way to fix fifth line, so there would be no double commas? Also, I would like to use the dictionary that I have above directly.
Thank you in advance
CodePudding user response:
Given:
df = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'rule1': {0: True, 1: True, 2: False, 3: False, 4: True}, 'rule2': {0: True, 1: True, 2: False, 3: True, 4: False}, 'rule3': {0: False, 1: True, 2: False, 3: False, 4: True}})
df.set_index("id", inplace=True)
d = {'rule1': 'Rule one', 'rule2': 'Rule two', 'rule3': 'Rule three'}
You can use this:
df.apply(lambda row: [d[rule] for rule in df.columns if row[rule]], axis=1)
Or can use the fact that True evaluates to 1 and False to 0.
df.mul(d).apply(lambda row: list(filter(None, row)), axis=1)
Both gives you the desired output without having to deal with join and strip and split.
CodePudding user response:
Try this using a little trick with pandas.Dataframe.dot
:
import pandas as pd
data_dict = {'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
'rule1': {0: True, 1: True, 2: False, 3: False, 4: True},
'rule2': {0: True, 1: True, 2: False, 3: True, 4: False},
'rule3': {0: False, 1: True, 2: False, 3: False, 4: True}}
df = pd.DataFrame(data_dict)
df = df.set_index('id')
d = {'rule1': 'Rule one', 'rule2': 'Rule two', 'rule3': 'Rule three'}
dfr = df.rename(columns=d)
df['list_of_rules'] = dfr.dot(dfr.columns '-').str.strip('-').str.split('-')
df.reset_index()
Output:
id rule1 rule2 rule3 list_of_rules
0 1 True True False [Rule one, Rule two]
1 2 True True True [Rule one, Rule two, Rule three]
2 3 False False False []
3 4 False True False [Rule two]
4 5 True False True [Rule one, Rule three]