I have a dataset that I need to query from and aggregate the selected records based on a string similarity.
For example a simple SELECT, COUNT
query can return a set of records like:
NAME | COUNT |
---|---|
CAR | 6 |
CAR 1 | 12 |
CAR 22 | 2 |
CAR 45 | 9 |
Van 2 | 4 |
Van 22 | 5 |
Van 12 | 16 |
UNKNOWN | 6 |
I want the result to be aggregate by a substring as:
NAME | COUNT |
---|---|
CAR | 29 |
VAN | 25 |
UNKOWN | 6 |
A sample of the current query I have is:
SELECT "vehicle_name" AS "NAME", count(*) AS "Count"
FROM "vehicles"
WHERE ("registered" = 'true'GROUP BY "public"."players"."device_name"
ORDER BY "vehicle_name" ASC
CodePudding user response:
Please try like this
SELECT
SUM(COUNT),
SUBSTRING(vehicle_name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z.''0-9]%',LTRIM(RTRIM(vehicle_name))),0)-1,LEN(vehicle_name))) AS Value
FROM
vehicles
GROUP BY SUBSTRING(vehicle_name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z.''0-9]%',LTRIM(RTRIM(vehicle_name))),0)-1,LEN(vehicle_name)))
CodePudding user response:
You can use the metabase regexextract
function to solve your problem, matching the first word by exploiting the start of string symbol.
SELECT REGEXEXTRACT([NAME], "^[A-Za-z] ") AS "NAME",
SUM([COUNT]) AS "COUNT"
FROM vehicles
GROUP BY REGEXEXTRACT([NAME], "^[A-Za-z] ")
Does it solve your problem?