My df
id date dummy
0 A 2019Q1 1
1 A 2019Q2 0
2 A 2019Q3 0
3 B 2019Q1 1
4 B 2019Q2 1
5 B 2019Q3 0
How can I group by id and then set the earliest value of dummy in each group to NaN?
Expected output:
id date dummy
0 A 2019Q1 NaN
1 A 2019Q2 0
2 A 2019Q3 0
3 B 2019Q1 NaN
4 B 2019Q2 1
5 B 2019Q3 0
CodePudding user response:
Use a boolean mask (assuming the rows are already sorted by date within each group):
import numpy as np
# ~duplicated() is True only for the first occurrence of each id
df.loc[~df['id'].duplicated(), 'dummy'] = np.nan
print(df)
# Output
id date dummy
0 A 2019Q1 NaN
1 A 2019Q2 0.0
2 A 2019Q3 0.0
3 B 2019Q1 NaN
4 B 2019Q2 1.0
5 B 2019Q3 0.0
Or number the rows within each group with cumcount and select row 0:
df.loc[df.groupby('id').cumcount().eq(0), 'dummy'] = np.nan
print(df)
# Output
id date dummy
0 A 2019Q1 NaN
1 A 2019Q2 0.0
2 A 2019Q3 0.0
3 B 2019Q1 NaN
4 B 2019Q2 1.0
5 B 2019Q3 0.0
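Both masks above pick the first row of each id as it appears in the frame, so they rely on the rows being in date order. If that is not guaranteed, one possible sketch is to sort a copy first and reuse its index labels. The shuffled sample frame below is made up for illustration, and the string sort assumes the dates always look like YYYYQn so that they order correctly as text. The cast to float is a precaution, since recent pandas versions may complain when NaN is written into an integer column.

```python
import numpy as np
import pandas as pd

# Hypothetical shuffled version of the question's data
df = pd.DataFrame({
    "id": ["B", "A", "A", "B", "A", "B"],
    "date": ["2019Q2", "2019Q3", "2019Q1", "2019Q1", "2019Q2", "2019Q3"],
    "dummy": [1, 0, 1, 1, 0, 0],
})

# Sort a copy by id and date; the original index labels are preserved
ordered = df.sort_values(["id", "date"])

# Index labels of the earliest row in each id group
first_idx = ordered.index[~ordered["id"].duplicated()]

# Make the column float so it can hold NaN, then blank out those rows
df["dummy"] = df["dummy"].astype(float)
df.loc[first_idx, "dummy"] = np.nan
```

The original row order of df is left untouched; only the rows holding the earliest date per id are changed.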
CodePudding user response:
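import numpy as np
import pandas as pd

# Index label of the first row in each id group (assumes rows are sorted by date)
indices = df.reset_index().groupby("id")["index"].first().to_list()
df.loc[indices, 'dummy'] = np.nan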
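As a quick sanity check, the three approaches in this thread can be run side by side on the question's data. This is only a sketch: make_df is a helper name invented here, and dummy is created as float up front so that writing NaN does not change the column's dtype.

```python
import numpy as np
import pandas as pd

def make_df():
    # Rebuild the question's frame; dummy is float so it can hold NaN
    return pd.DataFrame({
        "id": list("AAABBB"),
        "date": ["2019Q1", "2019Q2", "2019Q3"] * 2,
        "dummy": [1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
    })

# Approach 1: first occurrence of each id
a = make_df()
a.loc[~a["id"].duplicated(), "dummy"] = np.nan

# Approach 2: position 0 within each group
b = make_df()
b.loc[b.groupby("id").cumcount().eq(0), "dummy"] = np.nan

# Approach 3: index labels of each group's first row
c = make_df()
indices = c.reset_index().groupby("id")["index"].first().to_list()
c.loc[indices, "dummy"] = np.nan

# DataFrame.equals treats NaNs in the same positions as equal
same = a.equals(b) and b.equals(c)
```

On this sorted sample, same should come out True, with NaN in rows 0 and 3.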