I have this df:
ID | A |
---|---|
1 | 40 |
1 | 0 |
2 | 60 |
2 | 60 |
3 | 0 |
3 | 30 |
I need to add a new column to indicate whether each id has two values in a new column like the below:
ID | A | B |
---|---|---|
1 | 40 | 1 |
1 | 0 | 1 |
2 | 60 | 2 |
2 | 60 | 2 |
3 | 0 | 1 |
3 | 30 | 1 |
I tried this but it is not returning the right values
df= pd.concat(g for _, g in df.groupby('ID') if len(g) > 1)
df= df[df['A'] != 0]
df= pd.concat(g for _, g in df.groupby('A') if len(g) > 1)
df['B'] = 2
CodePudding user response:
using groupby to take a count of values greater than 0 for each ID, and then merging back with the original df
df.merge(df[df['A'] > 0].groupby('ID').size().reset_index(name='count'), on='ID')
ID A count
0 1 40 1
1 1 0 1
2 2 60 2
3 2 60 2
4 3 0 1
5 3 30 1
CodePudding user response:
Another option, no need to merge, let pandas handle alignment using groupby and transform.
df['count'] = df.assign(count=df['A'] != 0).groupby('ID')['count'].transform('sum')
df
Output:
ID A count
0 1 40 1
1 1 0 1
2 2 60 2
3 2 60 2
4 3 0 1
5 3 30 1