Home > other >  pandas dataframe manipulation with DictVectorize
pandas dataframe manipulation with DictVectorize

Time:08-25

What I have:

target var1 var2
1 jack, jones, phil en-us
0 don, sam, bob vn-en
1 jones, alex, sam en-us

What I want:

var1 target no_target var2
jones 2 0 en-us

target column is how much 'jones' appears when target is 1

so far I have tried to use DictVetorize, and it helped me to count number of times target equal to 1 for jones but I done know how to get var2 column

CodePudding user response:

IIUC, you can use:

df['target'].mul(df['var1'].str.count('jones')).sum()

output: 2

For a more generic method to get all names:

out = (df
 .assign(var1=df['var1'].str.split(',\s*').where(df['target'].eq(1)))
 .explode('var1')
 .groupby(['var1', 'var2'], as_index=False)
 .sum()
)

output:

    var1   var2  target
0   alex  en-us       1
1   jack  en-us       1
2  jones  en-us       2
3   phil  en-us       1
4    sam  en-us       1

counting target/no_target

(df
 .assign(var1=df['var1'].str.split(',\s*'),
         target=np.where(df['target'].eq(1), 'target', 'no_target'),
         value=1
        )
 .explode('var1')
 .pivot_table(index=['var1', 'var2'], columns='target', values='value',
              aggfunc='sum', fill_value=0)
 .reset_index().rename_axis(columns=None)
 .groupby(['var1'], as_index=False)
 .agg({'target': 'sum', 'no_target': 'sum', 'var2': 'first'})
)

output:

    var1  target  no_target   var2
0   alex       1          0  en-us
1    bob       0          1  vn-en
2    don       0          1  vn-en
3   jack       1          0  en-us
4  jones       2          0  en-us
5   phil       1          0  en-us
6    sam       1          1  en-us
  • Related