Home > other >  Pandas: How to get lists of 5 most frequent value in column for each hour of the day?
Pandas: How to get lists of 5 most frequent value in column for each hour of the day?

Time:06-06

I have some pandas data.

Date A B
2021-05-06 12.00 18.5 0
2021-05-06 13.00 20.3 9.7%
2021-05-06 14.00 16.1 -20.7%
2022-05-06 10.00 50.4 4.0%
2022-05-06 11.00 52.4 3.9%

How to get lists of 5 most frequent value in column B for each hour of the day?

For example:

hour most common value
0.00 3%, 7%, -9%, 0.5%, 18%
12.00 6%, -3%, 4%
13.00 1, 4%
19.00 2%,-5%
23.00 11%, -4%

CodePudding user response:

Starting from:

df = pd.DataFrame(
    [
        {"Date": "2021-05-06 10.00", "A": 18.5, "B": "9.7%"},
        {"Date": "2021-05-06 10.00", "A": 20.3, "B": "9.7%"},
        {"Date": "2021-05-06 14.00", "A": 16.1, "B": "-20.7%"},
        {"Date": "2022-05-06 10.00", "A": 50.4, "B": "4.0%"},
        {"Date": "2022-05-06 11.00", "A": 52.4, "B": "3.9%"},
    ]
)
df

#               Date    A       B
#0  2021-05-06 10.00    18.5    9.7%
#1  2021-05-06 10.00    20.3    9.7%
#2  2021-05-06 14.00    16.1    -20.7%
#3  2022-05-06 10.00    50.4    4.0%
#4  2022-05-06 11.00    52.4    3.9%

You can do this:

# Convert to correct pandas time format
df["Date"] = pd.to_datetime(df["Date"],  format="%Y-%m-%d %H.%M")

# Apply transformations
x = (df.groupby([df.Date.dt.hour])["B"].value_counts()
       .loc[lambda x: x<5]
       .reset_index(name="c")
       .pivot(index="Date", columns="c", values="B"))

# Create column
x["most_freq"] = x.apply(
    lambda x: ", ".join([y for y in x.astype(str) if y != "nan"]), axis=1
)

result = x[['most_freq']].rename_axis(None, axis=1).reset_index()

This is result:

    Date    most_freq
0   10      4.0%, 9.7%
1   11      3.9%
2   14      -20.7%

CodePudding user response:

Above answers, didn't work, but helped me to find solution:

x = (df.groupby([df.Date.dt.hour])["B"].value_counts()  
   .loc[lambda x: x<1000]  
   .reset_index(name="c"))  

for j in range(0,24):  
   print(x.loc[x['Date']==j][:5])  

The solution is not perfect. Where x<1000 is max row for every hour, that's enough for my data. As a result, I received 24 pandas series of 5 most frequent value for each hour of the day.

  • Related