Home > Net >  Python create a new column and populates it based on two other columns values
Python create a new column and populates it based on two other columns values

Time:06-09

I have this df:

ID A
1 40
1 0
2 60
2 60
3 0
3 30

I need to add a new column to indicate whether each id has two values in a new column like the below:

ID A B
1 40 1
1 0 1
2 60 2
2 60 2
3 0 1
3 30 1

I tried this but it is not returning the right values

df= pd.concat(g for _, g in df.groupby('ID') if len(g) > 1)
df= df[df['A'] != 0]
df= pd.concat(g for _, g in df.groupby('A') if len(g) > 1)
df['B'] = 2

CodePudding user response:

using groupby to take a count of values greater than 0 for each ID, and then merging back with the original df

df.merge(df[df['A'] > 0].groupby('ID').size().reset_index(name='count'), on='ID')
    ID  A   count
0   1   40  1
1   1   0   1
2   2   60  2
3   2   60  2
4   3   0   1
5   3   30  1

CodePudding user response:

Another option, no need to merge, let pandas handle alignment using groupby and transform.

df['count'] = df.assign(count=df['A'] != 0).groupby('ID')['count'].transform('sum')
df

Output:

   ID   A  count
0   1  40      1
1   1   0      1
2   2  60      2
3   2  60      2
4   3   0      1
5   3  30      1
  • Related