Home > Net >  How to Count Duplicated Names in a Varchar/Text Column in MySQL/Any SQL
How to Count Duplicated Names in a Varchar/Text Column in MySQL/Any SQL

Time:02-26

So, this is the situation: I have a CSV file who looks like this:

show_id   title      cast
1         Batman     Robert Pattinson, Collin Farrel, Zoë Kravitz
2         Twilight   Robert Pattinson, Kristen Stewart
3         Ava        Jessica Chastain, Collin Farrel

What I need to do is open this CSV in a Python function, do some stuff to fix white spaces and that.

Then I need to upload that in a SQL database, (whatever I want, but I choose MySQL), no problem with that.

PROBLEM

My main issue is that then I need (because mi challenge said that) to create a Query to count how many times an actor appears in all the movies in the list. So, in this case the query should display something like this:

Actor               Ammount_of_movies
Robert Pattinson    2
Collin Farrel       2
Zoë Kravitz         1
Kristen Stewart     1
Jessica Chastain    1

As you can see I don't have a name to search with LIKE or CONTAINS. So, how can I do that? Because, in the CSV, the list of actors per movie has, well, more than 1 actor, and I save them in a varchar or text type in the database, so each row has more than one actor.

Or should I create another table with actors and foreing keys to movies? Or is something than I can't do in MySQL but in other SQL is possible?

CodePudding user response:

If you look for something performance efficient you should rather split the data (create 3 tables in total - movies, actors, casts) and connect actors with movies using the casts, then write simple sql with some joins like:

Select actors.name as Actor, count(movies.title) as Amount_of_movies from actors,
inner join cast on cast.actor_id = actors.actor_id
inner join movies on movies.movie_id = cast.movie_id;

You are able to do it another way using https://sebhastian.com/mysql-split-string/ or using psql / plsql and stored procedure. I would just split the data if it's possible.

Cheers.

CodePudding user response:

You need to normalize your data and you might use:

select t1.cast as Actor ,count(t1.cast) as Ammount_of_movies
from test_tbl t1
inner join (select title,cast
            from test_tbl 
            group by title,cast
            ) t2 on t1.title=t2.title and t1.cast=t2.cast
group by t1.cast
order by Ammount_of_movies desc;

Demo

  • Related