Home > Blockchain >  How to insert a list in a dataframe pandas
How to insert a list in a dataframe pandas

Time:10-19

I have the following dataframe

id  rule1 rule2 rule3
1   True  True  False
2   True  True  True
3   False False False
4   False True  False
5   True  False True
..

and a dictionary:

{'rule1': 'Rule one', 'rule2': 'Rule two', 'rule3': 'Rule three'}

And I want to get an additional column list_of_rules, which is a list of rules from the dictionary that are True in the dataframe above.

id  rule1 rule2 rule3  list_of_rules
1   True  True  False  ['Rule one', 'Rule two']
2   True  True  True   ['Rule one', 'Rule two', 'Rule three']
3   False False False  ['']
4   False True  False  ['Rule two']
5   True  False True   ['Rule one', 'Rule three']
..

So far, I have the following solution:

df.loc[df['rule1'] == True, 'rule1'] = 'Rule one'
df.loc[df['rule2'] == True, 'rule2'] = 'Rule two'
df.loc[df['rule3'] == True, 'rule3'] = 'Rule three'

df.loc[df['rule1'] == False, 'rule1'] = ''
df.loc[df['rule2'] == False, 'rule2'] = ''
df.loc[df['rule3'] == False, 'rule3'] = ''

df['list_of_rules'] = df[['rule1', 'rule2', 'rule3']].apply("-".join, axis=1).str.strip('-').str.split('-')

df

which gives the following output:

id  rule1 rule2 rule3  list_of_rules
1   True  True  False  ['Rule one', 'Rule two']
2   True  True  True   ['Rule one', 'Rule two', 'Rule three']
3   False False False  ['']
4   False True  False  ['Rule two']
5   True  False True   ['Rule one', , 'Rule three']
..

Is there a way to fix fifth line, so there would be no double commas? Also, I would like to use the dictionary that I have above directly.

Thank you in advance

CodePudding user response:

Given:

df = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'rule1': {0: True, 1: True, 2: False, 3: False, 4: True}, 'rule2': {0: True, 1: True, 2: False, 3: True, 4: False}, 'rule3': {0: False, 1: True, 2: False, 3: False, 4: True}})
df.set_index("id", inplace=True)
d = {'rule1': 'Rule one', 'rule2': 'Rule two', 'rule3': 'Rule three'}

You can use this:

df.apply(lambda row: [d[rule] for rule in df.columns if row[rule]], axis=1)

Or can use the fact that True evaluates to 1 and False to 0.

df.mul(d).apply(lambda row: list(filter(None, row)), axis=1)

Both gives you the desired output without having to deal with join and strip and split.

CodePudding user response:

Try this using a little trick with pandas.Dataframe.dot:

import pandas as pd

data_dict = {'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
             'rule1': {0: True, 1: True, 2: False, 3: False, 4: True},
             'rule2': {0: True, 1: True, 2: False, 3: True, 4: False},
             'rule3': {0: False, 1: True, 2: False, 3: False, 4: True}}
df = pd.DataFrame(data_dict)
df = df.set_index('id')
d = {'rule1': 'Rule one', 'rule2': 'Rule two', 'rule3': 'Rule three'}
dfr = df.rename(columns=d)
df['list_of_rules'] = dfr.dot(dfr.columns '-').str.strip('-').str.split('-')
df.reset_index()

Output:

   id  rule1  rule2  rule3                     list_of_rules
0   1   True   True  False              [Rule one, Rule two]
1   2   True   True   True  [Rule one, Rule two, Rule three]
2   3  False  False  False                                []
3   4  False   True  False                        [Rule two]
4   5   True  False   True            [Rule one, Rule three]
  • Related