Pandas: How to get lists of 5 most frequent value in column for each hour of the day?-CodePudding

I have some pandas data.

Date	A	B
2021-05-06 12.00	18.5	0
2021-05-06 13.00	20.3	9.7%
2021-05-06 14.00	16.1	-20.7%
2022-05-06 10.00	50.4	4.0%
2022-05-06 11.00	52.4	3.9%

How to get lists of 5 most frequent value in column B for each hour of the day?

For example:

hour	most common value
0.00	3%, 7%, -9%, 0.5%, 18%
12.00	6%, -3%, 4%
13.00	1, 4%
19.00	2%,-5%
23.00	11%, -4%

CodePudding user response：

Starting from:

df = pd.DataFrame(
    [
        {"Date": "2021-05-06 10.00", "A": 18.5, "B": "9.7%"},
        {"Date": "2021-05-06 10.00", "A": 20.3, "B": "9.7%"},
        {"Date": "2021-05-06 14.00", "A": 16.1, "B": "-20.7%"},
        {"Date": "2022-05-06 10.00", "A": 50.4, "B": "4.0%"},
        {"Date": "2022-05-06 11.00", "A": 52.4, "B": "3.9%"},
    ]
)
df

#               Date    A       B
#0  2021-05-06 10.00    18.5    9.7%
#1  2021-05-06 10.00    20.3    9.7%
#2  2021-05-06 14.00    16.1    -20.7%
#3  2022-05-06 10.00    50.4    4.0%
#4  2022-05-06 11.00    52.4    3.9%

You can do this:

# Convert to correct pandas time format
df["Date"] = pd.to_datetime(df["Date"],  format="%Y-%m-%d %H.%M")

# Apply transformations
x = (df.groupby([df.Date.dt.hour])["B"].value_counts()
       .loc[lambda x: x<5]
       .reset_index(name="c")
       .pivot(index="Date", columns="c", values="B"))

# Create column
x["most_freq"] = x.apply(
    lambda x: ", ".join([y for y in x.astype(str) if y != "nan"]), axis=1
)

result = x[['most_freq']].rename_axis(None, axis=1).reset_index()

This is result:

    Date    most_freq
0   10      4.0%, 9.7%
1   11      3.9%
2   14      -20.7%

CodePudding user response：

Above answers, didn't work, but helped me to find solution:

x = (df.groupby([df.Date.dt.hour])["B"].value_counts()  
   .loc[lambda x: x<1000]  
   .reset_index(name="c"))  

for j in range(0,24):  
   print(x.loc[x['Date']==j][:5])

The solution is not perfect. Where x<1000 is max row for every hour, that's enough for my data. As a result, I received 24 pandas series of 5 most frequent value for each hour of the day.