I have some pandas data.
Date | A | B |
---|---|---|
2021-05-06 12.00 | 18.5 | 0 |
2021-05-06 13.00 | 20.3 | 9.7% |
2021-05-06 14.00 | 16.1 | -20.7% |
2022-05-06 10.00 | 50.4 | 4.0% |
2022-05-06 11.00 | 52.4 | 3.9% |
How to get lists of 5 most frequent value in column B for each hour of the day?
For example:
hour | most common value |
---|---|
0.00 | 3%, 7%, -9%, 0.5%, 18% |
12.00 | 6%, -3%, 4% |
13.00 | 1, 4% |
19.00 | 2%,-5% |
23.00 | 11%, -4% |
CodePudding user response:
Starting from:
df = pd.DataFrame(
[
{"Date": "2021-05-06 10.00", "A": 18.5, "B": "9.7%"},
{"Date": "2021-05-06 10.00", "A": 20.3, "B": "9.7%"},
{"Date": "2021-05-06 14.00", "A": 16.1, "B": "-20.7%"},
{"Date": "2022-05-06 10.00", "A": 50.4, "B": "4.0%"},
{"Date": "2022-05-06 11.00", "A": 52.4, "B": "3.9%"},
]
)
df
# Date A B
#0 2021-05-06 10.00 18.5 9.7%
#1 2021-05-06 10.00 20.3 9.7%
#2 2021-05-06 14.00 16.1 -20.7%
#3 2022-05-06 10.00 50.4 4.0%
#4 2022-05-06 11.00 52.4 3.9%
You can do this:
# Convert to correct pandas time format
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d %H.%M")
# Apply transformations
x = (df.groupby([df.Date.dt.hour])["B"].value_counts()
.loc[lambda x: x<5]
.reset_index(name="c")
.pivot(index="Date", columns="c", values="B"))
# Create column
x["most_freq"] = x.apply(
lambda x: ", ".join([y for y in x.astype(str) if y != "nan"]), axis=1
)
result = x[['most_freq']].rename_axis(None, axis=1).reset_index()
This is result
:
Date most_freq
0 10 4.0%, 9.7%
1 11 3.9%
2 14 -20.7%
CodePudding user response:
Above answers, didn't work, but helped me to find solution:
x = (df.groupby([df.Date.dt.hour])["B"].value_counts()
.loc[lambda x: x<1000]
.reset_index(name="c"))
for j in range(0,24):
print(x.loc[x['Date']==j][:5])
The solution is not perfect. Where x<1000 is max row for every hour, that's enough for my data. As a result, I received 24 pandas series of 5 most frequent value for each hour of the day.