SQL query to replace multiple occurrences with the most frequent ones

I have a products table that stores each product description in two languages: English (Product_Desc) and an alternate language (Product_Desc_Alt).
For example:

Product_Desc	Product_Desc_Alt
A	A1
A	A2
A	A1
A	A3
B	B1
B	B2
B	B2
C	C1

If I group by Product_Desc, I can see that the same product has several different alternate-language descriptions:

SELECT Product_Desc, COUNT(DISTINCT Product_Desc_Alt) AS CNT
FROM products
GROUP BY Product_Desc
ORDER BY CNT DESC

Product_Desc	CNT
A	3
B	2
C	1

I would like to replace every Product_Desc_Alt value with the most frequent one for its product, so the output for the example above would be:

Product_Desc	Product_Desc_Alt
A	A1
A	A1
A	A1
A	A1
B	B2
B	B2
B	B2
C	C1

Obviously, if a product has only one alternate-language description, just keep that one.

There may be lots of ways to do that, but I can't think of one.

I am using Azure Databricks, so this could also be done with PySpark, but I am interested in doing it the SQL way.
Thanks a lot!

CodePudding user response:

Is this what you mean?

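-- Note: TOP 1 is SQL Server (T-SQL) syntax; other dialects typically use LIMIT 1.
UPDATE products
SET Product_Desc_Alt = (
  -- Most frequent alternate description for the current row's product
  SELECT TOP 1 P2.Product_Desc_Alt
  FROM   products P2
  WHERE  P2.Product_Desc = products.Product_Desc
  GROUP BY P2.Product_Desc_Alt
  -- Ties on the count are broken alphabetically so the result is deterministic
  ORDER BY COUNT(*) DESC, P2.Product_Desc_Alt
)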
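
If an in-place UPDATE with a correlated subquery is not an option on your engine, the same result can probably be produced with a plain SELECT using window functions. The following is only a sketch, not a tested Databricks solution: it runs the query against an in-memory SQLite database (which needs SQLite 3.25 or later for window functions) so that it can be checked locally. The CTE names counts and ranked, the helper most_frequent_alt, and the alphabetical tie-break are assumptions made for illustration; the query text uses only a GROUP BY, ROW_NUMBER() and a join, so it should carry over to Spark SQL with little or no change.

```python
import sqlite3

# Count each (Product_Desc, Product_Desc_Alt) pair, rank the pairs within
# each product by frequency, then join the top-ranked value back onto
# every original row.
QUERY = """
WITH counts AS (
    SELECT Product_Desc, Product_Desc_Alt, COUNT(*) AS cnt
    FROM products
    GROUP BY Product_Desc, Product_Desc_Alt
),
ranked AS (
    SELECT Product_Desc, Product_Desc_Alt,
           ROW_NUMBER() OVER (
               PARTITION BY Product_Desc
               ORDER BY cnt DESC, Product_Desc_Alt  -- tie-break: an assumption
           ) AS rn
    FROM counts
)
SELECT p.Product_Desc, r.Product_Desc_Alt
FROM products AS p
JOIN ranked AS r
  ON r.Product_Desc = p.Product_Desc AND r.rn = 1
ORDER BY p.Product_Desc
"""


def most_frequent_alt(rows):
    """Load rows into an in-memory SQLite table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (Product_Desc TEXT, Product_Desc_Alt TEXT)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Sample data copied from the question.
SAMPLE = [
    ("A", "A1"), ("A", "A2"), ("A", "A1"), ("A", "A3"),
    ("B", "B1"), ("B", "B2"), ("B", "B2"),
    ("C", "C1"),
]

if __name__ == "__main__":
    for row in most_frequent_alt(SAMPLE):
        print(row)
```

The query keeps one output row per input row, so the row count of the table is preserved. A product with a single alternate description simply gets that description back. If two descriptions are equally frequent, this sketch picks the alphabetically first one; change the ORDER BY inside the window if you need a different rule.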