Home > Software design >  get the most common value for each column
get the most common value for each column

Time:05-13

I'm attempting to create an SQL query that retrieves the total_cost for every row in a table. Alongside that, I also need to collect the most dominant value for both columnA and columnB, with their respective values.

For example, with the following table contents:

cost columnA columnB
250 Foo Bar
200 Foo Bar
150 Bar Bar
250 Foo Bar

The result would need to be:

total_cost columnA_dominant columnB_dominant columnA_value columnB_value
850 Foo Bar 700 850

Now I can get as far as calculating the total cost - that's no issue. I can also get the most dominant value for columnA using this answer. But after this, I'm not sure how to also get the dominant value for columnB and the values too.

This is my current SQL:

SELECT 
    SUM(`cost`) AS `total_cost`,
    COUNT(`columnA`) AS `columnA_dominant` 
FROM `table`
GROUP BY `columnA_dominant`
ORDER BY `columnA_dominant` DESC

UPDATE: Thanks to @Barmar for the idea of using a subquery, I managed to get the dominant values for columnA and columnB:

SELECT 
    -- Retrieve total cost.
    SUM(`cost`) AS `total_cost`,
    -- Get dominant values.
    (
        SELECT `columnA`
        FROM `table`
        GROUP BY `columnA`
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) AS `columnA_dominant`,
    (
        SELECT `columnB`
        FROM `table`
        GROUP BY `columnB`
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) AS `columnB_dominant`
FROM `table`

However, I'm still having issues figuring out how to calculate the respective values.

CodePudding user response:

try this query

select sum(cost) as total_cost,p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
from get_common,(
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
)p,
(select top 1 columnB,columnB_percentage
from (
select columnB,count(columnB) as count_columnB, cast(count(columnB) as float)/(select count(columnB) from get_common) as columnB_percentage
from get_common
group by columnB) t
order by count_columnB desc)q
group by p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage

so if you want to get the percent and dominant value you must make their own query like this

select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc

then you can join with the sum query to get all value you want

hope this can help you

CodePudding user response:

You might get close, if we want to get percentage values we can try to add COUNT(*) at subquery to get max count by columnA and columnB then do division by total count

SELECT 
    SUM(cost),
    (
        SELECT tt.columnA
        FROM T tt
        GROUP BY tt.columnA
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )  AS columnA_dominant,
    (
        SELECT tt.columnB
        FROM T tt
        GROUP BY tt.columnB
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )  AS columnB_dominant,
    (
        SELECT COUNT(*)
        FROM T tt
        GROUP BY tt.columnA
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) / COUNT(*) AS columnA_percentage,
    (
        SELECT COUNT(*)
        FROM T tt
        GROUP BY tt.columnB
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) / COUNT(*) AS columnB_percentage
FROM T t1

If your MySQL version supports the window function, there is another way which reduce table scan might get better performance than a correlated subquery

SELECT SUM(cost) OVER(),
       FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) columnA_dominant,
       FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) columnB_dominant,
       FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) / COUNT(*) OVER() columnA_percentage,
       FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) / COUNT(*) OVER() columnB_percentage
FROM (
  SELECT *,
         COUNT(*) OVER (PARTITION BY columnA) counter1,
         COUNT(*) OVER (PARTITION BY columnB) counter2  
  FROM T
) t1
LIMIT 1

sqlfiddle

  • Related