Home > other >  Create new column based on the sum of selected columns in python
Create new column based on the sum of selected columns in python

Time:05-19

I have a dataframe like this

ID  Q001    Q002    Q003    Q004    Q005    Q006    Q007    Q008    Win
A   1       1       1       1       1       1       1       1       Yes
B   0       1       0       1       0       1       0       1       No
C   0       1       0       1       0       1       0       1       No
D   1       1       0       1       1       1       1       1       Yes
E   1       1       0       1       1       1       1       1       Yes
F   1       1       0       1       1       1       1       1       Yes
G   0       0       1       0       0       0       0       0       No
H   0       0       0       0       0       0       0       0       No
I   1       0       1       0       1       0       1       0       No

In the above dataframe, I want to create the colum 'Win' and assign the values 'Yes' if the sum of Q001 and Q002 is equal or higher than 2 and 'No', if lower than 2. How can I do this in Python?

CodePudding user response:

Use np.where() to return a value conditional on other columns.

df['Win'] = np.where(df['Q001']   df['Q002'] >= 2, 'Yes', 'No')

CodePudding user response:

Check

df['Win'] = np.where(df[['Q001','Q002']].sum(1)>=2,'Yes','No')
df
Out[680]: 
  ID  Q001  Q002  Q003  Q004  Q005  Q006  Q007  Q008  Win
0  A     1     1     1     1     1     1     1     1  Yes
1  B     0     1     0     1     0     1     0     1   No
2  C     0     1     0     1     0     1     0     1   No
3  D     1     1     0     1     1     1     1     1  Yes
4  E     1     1     0     1     1     1     1     1  Yes
5  F     1     1     0     1     1     1     1     1  Yes
6  G     0     0     1     0     0     0     0     0   No
7  H     0     0     0     0     0     0     0     0   No
8  I     1     0     1     0     1     0     1     0   No

CodePudding user response:

Simply use:

import numpy as np

cols = ['Q001', 'Q002']
df['Win'] = np.where(df[cols].sum(axis=1).ge(2),
                     'Yes', 'No')

You can scale this up to any number of columns.

Output:

  ID  Q001  Q002  Q003  Q004  Q005  Q006  Q007  Q008  Win
0  A     1     1     1     1     1     1     1     1  Yes
1  B     0     1     0     1     0     1     0     1   No
2  C     0     1     0     1     0     1     0     1   No
3  D     1     1     0     1     1     1     1     1  Yes
4  E     1     1     0     1     1     1     1     1  Yes
5  F     1     1     0     1     1     1     1     1  Yes
6  G     0     0     1     0     0     0     0     0   No
7  H     0     0     0     0     0     0     0     0   No
8  I     1     0     1     0     1     0     1     0   No

CodePudding user response:

This works

df = pd.DataFrame({'ID':   ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                   'Q001': [1, 0, 0, 1, 1, 1, 0, 0, 1],
                   'Q002': [1, 1, 1, 1, 1, 1, 0, 0, 0],
                   'Q003': [1, 0, 0, 0, 0, 0, 1, 0, 1],
                   'Q004': [1, 1, 1, 1, 1, 1, 0, 0, 0],
                   'Q005': [1, 0, 0, 1, 1, 1, 0, 0, 1],
                   'Q006': [1, 1, 1, 1, 1, 1, 0, 0, 0],
                   'Q007': [1, 0, 0, 1, 1, 1, 0, 0, 1],
                   'Q008': [1, 1, 1, 1, 1, 1, 0, 0, 0]})
# initialize Win column
df['Win'] = 'No'
# assign Yes to Win column rows where Q001 Q002>=2
df.loc[df['Q001']   df['Q002'] >= 2, 'Win'] = 'Yes'
print(df)
  ID  Q001  Q002  Q003  Q004  Q005  Q006  Q007  Q008  Win
0  A     1     1     1     1     1     1     1     1  Yes
1  B     0     1     0     1     0     1     0     1   No
2  C     0     1     0     1     0     1     0     1   No
3  D     1     1     0     1     1     1     1     1  Yes
4  E     1     1     0     1     1     1     1     1  Yes
5  F     1     1     0     1     1     1     1     1  Yes
6  G     0     0     1     0     0     0     0     0   No
7  H     0     0     0     0     0     0     0     0   No
8  I     1     0     1     0     1     0     1     0   No

CodePudding user response:

You could calculate the column as a Boolean series and replace the values with Yes and No (if you must):

df['Win'] = (df['Q001']   df['Q002'] >= 2).replace({False: 'No', True: 'Yes'})
  • Related