I'm still pretty new to pandas, and can't quite wrap my head around this one. I have a table that looks like this:
Timestamp | Sport | Place |
---|---|---|
10/1/2022 12:05:00 | basketball | place1 |
10/1/2022 4:05:00 | basketball | place1 |
10/2/2022 16:05 | basketball | place1 |
10/2/2022 9:00 | basketball | place2 |
10/2/2022 11:00 | football | place3 |
10/2/2022 11:15 | soccer | place1 |
10/3/2022 11:15 | baseball | place4 |
10/3/2022 11:20 | tennis | place5 |
10/3/2022 11:30 | tennis | place5 |
And I'd like to read have a python script that uses pandas to produce this:
Date | Sport | Place |
---|---|---|
1-Oct | 2x basketball | place1 |
2-Oct | 2x basketball, 1x soccer | place1 |
1x basketball | place2 | |
1x football | place3 | |
3-Oct | 1x baseball | place4 |
2x tennis | place5 |
I can't quite figure out how to do the groupby correctly and then to concatenate the columns.
(And i'm going to want to do this for many similar tables, so i want to use a script.)
This gets me very close:
val_cnts = df.groupby(['date', 'Place'])['Sport'].value_counts()
But this returns a multi-indexed series of just the sport counts, so it looks like I need to turn some of the indices into columns, and then do some concatenations. When I do a .reset_index() on the Place and Sport indices though, that still doesn't work because the groupings don't seem right anymore.
CodePudding user response:
Try:
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
df["Date"] = df["Timestamp"].dt.strftime("%-d-%b")
df = (
df.groupby(["Date", "Place"])["Sport"]
.value_counts()
.to_frame(name="x")
.reset_index(level=2)
)
df["Sport"] = df["x"].astype(str) "x " df["Sport"]
df = df.groupby(level=[0, 1])["Sport"].agg(", ".join).reset_index()
print(df)
Prints:
Date Place Sport
0 1-Oct place1 2x basketball
1 2-Oct place1 1x basketball, 1x soccer
2 2-Oct place2 1x basketball
3 2-Oct place3 1x football
4 3-Oct place4 1x baseball
5 3-Oct place5 2x tennis