Python create a new column and populates it based on two other columns values-CodePudding

I have this df:

ID	A
1	40
1	0
2	60
2	60
3	0
3	30

I need to add a new column to indicate whether each id has two values in a new column like the below:

ID	A	B
1	40	1
1	0	1
2	60	2
2	60	2
3	0	1
3	30	1

I tried this but it is not returning the right values

df= pd.concat(g for _, g in df.groupby('ID') if len(g) > 1)
df= df[df['A'] != 0]
df= pd.concat(g for _, g in df.groupby('A') if len(g) > 1)
df['B'] = 2

CodePudding user response：

using groupby to take a count of values greater than 0 for each ID, and then merging back with the original df

df.merge(df[df['A'] > 0].groupby('ID').size().reset_index(name='count'), on='ID')

    ID  A   count
0   1   40  1
1   1   0   1
2   2   60  2
3   2   60  2
4   3   0   1
5   3   30  1

CodePudding user response：

Another option, no need to merge, let pandas handle alignment using groupby and transform.

df['count'] = df.assign(count=df['A'] != 0).groupby('ID')['count'].transform('sum')
df

Output:

   ID   A  count
0   1  40      1
1   1   0      1
2   2  60      2
3   2  60      2
4   3   0      1
5   3  30      1