I have a simple table. It's format is:
Column 1: Integer
Column 2: String which consists of a bunch of words separated by spaces
An example of column 2 might be: "foo bar bax bux qux"
Given 2 rows:
(1, "foo bar baz bux qux")
(2, "foo baz bux qux bax")
I would like to generate the following data-structure:
{
foo: 2,
baz: 2,
bux: 1,
qux: 2,
bax: 1
}
There will be thousands of tags.
Essentially, I want to calculate the # of occurrences of each word across all rows. What might be a good way to do this?
I can either use sqlite or duckdb.
CodePudding user response:
Here's a great answer from the DuckDB people:
Two functions will do the trick for you! String_split, and unnest. Unnest is a special function that takes lists and creates a separate row for each element.
With lists_split_into_rows as (
select
col1,
unnest(string_split(col2, ' ')) as new_column
from my_table
)
Select
new_column,
count(*) as my_count
from lists_split_into_rows
Group by
New_column