I have a dataset that has the same layout as the one below. I want to categorize where each person went in their car for a given month.
Date | Visitor | Car Name | Location |
---|---|---|---|
6-2022 | Michael | Chrysler | Work |
6-2022 | Jim | Subaru | Home |
6-2022 | Jim | Subaru | Work |
6-2022 | Michael | Chrysler | Karate |
7-2022 | Michael | Chrysler | Work |
7-2022 | Jim | Subaru | Home |
7-2022 | Jim | Subaru | Work |
The solution would look something like this:
Date | Visitor | Car Name | Location |
---|---|---|---|
6-2022 | Michael | Chrysler | Work & Karate |
6-2022 | Jim | Subaru | Work & Home |
7-2022 | Michael | Chrysler | Work |
7-2022 | Jim | Subaru | Work & Home |
I HAVE a solution but it scales linearly with time as the number of rows increases. The fastest I have gotten it to run over ~32,000 rows is ~15 minutes.
CodePudding user response:
You could try as follows:
- Use
df.groupby
onDate, Visitor, Car Name
, and applyjoin
to the values ofLocation
. - To change the last
,
delimiter to an ampersand, you could useSeries.replace
with a regex pattern.
res = df.groupby(['Date','Visitor','Car Name'],
as_index=False, sort=False)['Location'].agg(', '.join)
# changing each last `,` delimiter into an `&`
res['Location'] = res['Location'].replace(r',\s(?=[^,]*$)',' & ', regex=True)
print(res)
Date Visitor Car Name Location
0 6-2022 Michael Chrysler Work & Karate
1 6-2022 Jim Subaru Home & Work
2 7-2022 Michael Chrysler Work
3 7-2022 Jim Subaru Home & Work
If you expect duplicates per group (and want to get rid of them), you could use .agg(lambda x: ', '.join(set(x)))
instead of .agg(', '.join)
. Mind you, this will affect the order, not sure if this is a problem. If so, there are of course also ways to get rid of duplicates in a list, while maintained order.
CodePudding user response:
Try this:
df.groupby(['Date', 'Visitor', 'Car Name'], as_index=False)['Location'].agg(' & '.join)
Output:
Date Visitor Car Name Location
0 6-2022 Jim Subaru Home & Work
1 6-2022 Michael Chrysler Work & Karate
2 7-2022 Jim Subaru Home & Work
3 7-2022 Michael Chrysler Work