SQL aggregate similar records-CodePudding

I have a dataset that I need to query from and aggregate the selected records based on a string similarity. For example a simple SELECT, COUNT query can return a set of records like:

NAME	COUNT
CAR	6
CAR 1	12
CAR 22	2
CAR 45	9
Van 2	4
Van 22	5
Van 12	16
UNKNOWN	6

I want the result to be aggregate by a substring as:

NAME	COUNT
CAR	29
VAN	25
UNKOWN	6

A sample of the current query I have is:

SELECT "vehicle_name" AS "NAME", count(*) AS "Count"
FROM "vehicles"
WHERE ("registered" = 'true'GROUP BY "public"."players"."device_name" 
ORDER BY "vehicle_name" ASC

CodePudding user response：

Please try like this

SELECT 
SUM(COUNT),
SUBSTRING(vehicle_name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z.''0-9]%',LTRIM(RTRIM(vehicle_name))),0)-1,LEN(vehicle_name))) AS Value
FROM
vehicles
GROUP BY SUBSTRING(vehicle_name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z.''0-9]%',LTRIM(RTRIM(vehicle_name))),0)-1,LEN(vehicle_name)))

CodePudding user response：

You can use the metabase regexextract function to solve your problem, matching the first word by exploiting the start of string symbol.

SELECT REGEXEXTRACT([NAME], "^[A-Za-z] ") AS "NAME", 
       SUM([COUNT])                       AS "COUNT"
FROM vehicles
GROUP BY REGEXEXTRACT([NAME], "^[A-Za-z] ")

Does it solve your problem?