I have the following df. As you can see there may be multiple prices per token from the same day. So I want to take the median day price.
token_address date close
0 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 2021-08-18 1.078874e-07
1 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 2021-08-18 1.088589e-07
2 0xa80f2c8f61c56546001f5fc2eb8d6e4e72c45d4c 2021-08-18 2.835567e-07
3 0xbbc2ae13b23d715c30720f079fcd9b4a74093505 2021-08-18 1.633322e-06
4 0xd2877702675e6ceb975b4a1dff9fb7baf4c91ea9 2021-08-18 3.201045e-06
...
So I did the following
df['date'] = pd.to_datetime(df['date'])
df['median'] = df.set_index('date').groupby('token_address')['close'].median()
but somehow i get NaNs?
token_address date close median
0 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 2021-08-18 1.078874e-07 NaN
1 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 2021-08-18 1.088589e-07 NaN
2 0xa80f2c8f61c56546001f5fc2eb8d6e4e72c45d4c 2021-08-18 2.835567e-07 NaN
3 0xbbc2ae13b23d715c30720f079fcd9b4a74093505 2021-08-18 1.633322e-06 NaN
4 0xd2877702675e6ceb975b4a1dff9fb7baf4c91ea9 2021-08-18 3.201045e-06 NaN
CodePudding user response:
try this out
in your attempt above, the groupby results (having fewer rows) don't match with the count of df rows, hence the error.
Transform applies the grouped value to all rows that makes up the group
df['median'] = df.groupby(['token_address','date'])['close'].transform('median')
df
token_address date close median
0 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 2021-08-18 1.078874e-07 1.083732e-07
1 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 2021-08-18 1.088589e-07 1.083732e-07
2 0xa80f2c8f61c56546001f5fc2eb8d6e4e72c45d4c 2021-08-18 2.835567e-07 2.835567e-07
3 0xbbc2ae13b23d715c30720f079fcd9b4a74093505 2021-08-18 1.633322e-06 1.633322e-06
4 0xd2877702675e6ceb975b4a1dff9fb7baf4c91ea9 2021-08-18 3.201045e-06 3.201045e-06