I have the following id, i would like to groupby id and then replace value X
with NaN
. My current df.
ID Date X other variables..
1 1/1/18 0.118758835
1 1/1/18 0.148103273
1 1/1/18 0.365541214
1 1/2/18 0.405002687
1 1/2/18 0.130580643
1 1/2/18 0.395113106
2 1/1/18 0.425580038
2 1/1/18 0.889677796
2 1/1/18 0.835311629
2 1/2/18 0.8375818
2 1/2/18 0.648162611
2 1/2/18 0.639060695
desired output
ID Date X other variables..
1 1/1/18 NaN
1 1/1/18 NaN
1 1/1/18 NaN
1 1/2/18 0.405002687
1 1/2/18 0.130580643
1 1/2/18 0.395113106
2 1/1/18 NaN
2 1/1/18 NaN
2 1/1/18 NaN
2 1/2/18 0.8375818
2 1/2/18 0.648162611
2 1/2/18 0.639060695
CodePudding user response:
You can call min
in groupby.transform
to get the earliest dates for each ID; then compare it with "Date" to get a boolean mask; finally use the mask to mask
earliest "X"s:
df['X'] = df['X'].mask(df.groupby('ID')['Date'].transform('min').eq(df['Date']))
Output:
ID Date X
0 1 1/1/18 NaN
1 1 1/1/18 NaN
2 1 1/1/18 NaN
3 1 1/2/18 0.405003
4 1 1/2/18 0.130581
5 1 1/2/18 0.395113
6 2 1/1/18 NaN
7 2 1/1/18 NaN
8 2 1/1/18 NaN
9 2 1/2/18 0.837582
10 2 1/2/18 0.648163
11 2 1/2/18 0.639061
CodePudding user response:
You can try
m = (df.groupby('ID')
.apply(lambda g: g['Date'].eq(pd.to_datetime(df['Date']).min().strftime('%-m/%-d/%y')))
.reset_index(drop=True))
df.loc[m, 'X'] = pd.NA
print(df)
index ID Date X Y Category
0 0 1 1/1/18 <NA> 0.954677 A
1 1 1 1/1/18 <NA> 0.976618 B
2 2 1 1/1/18 <NA> 0.551642 C
3 3 1 1/2/18 0.405003 0.343279 A
4 4 1 1/2/18 0.130581 0.144487 B
5 5 1 1/2/18 0.395113 0.113119 C
6 6 2 1/1/18 <NA> 0.725166 A
7 7 2 1/1/18 <NA> 0.386824 B
8 8 2 1/1/18 <NA> 0.363803 C
9 9 2 1/2/18 0.837582 0.769266 A
10 10 2 1/2/18 0.648163 0.075286 B
11 11 2 1/2/18 0.639061 0.791222 C