I have a Dataframe with 3 columns:
ID Round Investor
X 1 A
X 2 A
X 2 B
X 3 A
Y 1 A
Y 1 B
Y 1 C
Y 2 B
Y 2 D
And want to count the number of unique investors for each round for each ID. But I dont want it to recount the investor if it has been in the previous round. The code I am using is:
print(df.groupby(['ID', 'Round'])['Investor'].nunique())
Which results in:
ID Round Unique Investor
X 1 1
2 2
2 2
3 1
Y 1 3
1 3
1 3
2 2
2 2
But I dont what it to count when an investor have invested in a earlier round for the same ID:
ID Round Unique Investor
X 1 1
2 1
2 1
3 0
Y 1 3
1 3
1 3
2 1
2 1
Any help is greatly appreciated!
CodePudding user response:
You can define a helper column Investor2
which is grouped under ID
and dropped duplicates within the same ID
with Series.drop_duplicates
Then, group by ID
and Round
as you did before on this Investor2
column with .transform()
and nunique
, as follows:
df['Unique Investor'] = (
df.assign(Investor2=df.groupby('ID')['Investor'].apply(pd.Series.drop_duplicates).droplevel(0))
.groupby(['ID', 'Round'])['Investor2'].transform('nunique')
)
Result:
print(df)
ID Round Investor Unique Investor
0 X 1 A 1
1 X 2 A 1
2 X 2 B 1
3 X 3 A 0
4 Y 1 A 3
5 Y 1 B 3
6 Y 1 C 3
7 Y 2 B 1
8 Y 2 D 1
CodePudding user response:
One possible solution is to drop the duplicates, based on 'ID' and Investor, groupby ID and Round to get number of uniques, and merge the result back to the main dataframe:
dups = ['ID', 'Investor']
group = ['ID', 'Round']
mapping = (df.drop_duplicates(subset = dups)
.groupby(group)
.Investor
.nunique()
)
(df.filter(group)
.merge(mapping, left_on = group,
right_index = True, how = 'left')
.fillna(0, downcast='infer')
)
ID Round Investor
0 X 1 1
1 X 2 1
2 X 2 1
3 X 3 0
4 Y 1 3
5 Y 1 3
6 Y 1 3
7 Y 2 1
8 Y 2 1