I have a table with ids and locations they have been to.
id | Location |
---|---|
1 | Maryland |
1 | Iowa |
2 | Maryland |
2 | Texas |
3 | Georgia |
3 | Iowa |
4 | Maryland |
4 | Iowa |
5 | Maryland |
5 | Iowa |
5 | Texas |
I'd like to perform a query that would allow me to get the number of ids per combination.
In this example table, the output would be -
Maryland, Iowa - 2
Maryland, Texas - 1
Georgia, Iowa - 1
Maryland, Iowa, Texas - 1
My original thought was to add the ASCII values of the distinct locations of each id, and see how many have each value, and what the combinations are that correspond to the value. I was not able to do that as SQL server would not let me cast an nvarchar as a numeric data type. Is there any other way I could use SQL to get the number of devices per combination? Using python to get the number of ids per combination is also acceptable, however, SQL is preferred.
CodePudding user response:
If you want to solve this in SQL and you are running SQL Server 2017 or later, you can use a CTE to aggregate the locations for each id
using STRING_AGG
, and then count the occurrences of each aggregated string:
WITH all_locations AS (
SELECT STRING_AGG(Location, ', ') WITHIN GROUP (ORDER BY Location) AS aloc
FROM locations
GROUP BY id
)
SELECT aloc, COUNT(*) AS cnt
FROM all_locations
GROUP BY aloc
ORDER BY cnt, aloc
Output:
aloc cnt
Georgia, Iowa 1
Iowa, Maryland, Texas 1
Maryland, Texas 1
Iowa, Maryland 2
Note that I have applied an ordering to the STRING_AGG
to ensure that someone who visits Maryland and then Iowa is treated the same way as someone who visits Iowa and then Maryland. If this is not the desired behaviour, simply delete the WITHIN GROUP
clause.
CodePudding user response:
Use groupby
agg
value_counts
:
new_df = df.groupby('id')['Location'].agg(list).str.join(', ').value_counts().reset_index()
Output:
>>> new_df
index Location
0 Maryland, Iowa 2
1 Maryland, Texas 1
2 Georgia, Iowa 1
3 Maryland, Iowa, Texas 1
CodePudding user response:
new_df index Location 0 Maryland, Iowa 2 1 Maryland, Texas 1 2 Georgia, Iowa 1 3 Maryland, Iowa, Texas 1