Get the most common word in a MySQL table using Python

Time:12-10

I have a table full of movie genres, like this:

id | genre
--- ----------------------------
1  | Drama, Romance, War
2  | Drama, Musical, Romance
3  | Adventure, Biography, Drama

I'm looking for a way to get the most common word in the whole genre column and store it in a variable for a later step in Python.

I'm new to Python, so I really don't know how to do it. Currently I have these lines to connect to the database, but I don't know how to get the most common word described above.


    conn = mysql.connect()
    cursor = conn.cursor()
    most_common_word = cursor.execute()
    cursor.close()
    conn.close()

CodePudding user response:

First you need to get the list of words from the genre column of each row, i.e. create another table like this:

genre_words(genre_id bigint, word varchar(50))

For hints on how to do that, you can check this question:

SQL split values to multiple rows

You can do that with a temporary table if you wish, or use a transaction and roll it back. Which one to choose depends on your data size and on the machine the database runs on.

After that, the query is really simple:

select count(*) as c, word from genre_words group by word order by count(*) desc limit 1;
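
As a rough illustration of this approach, the sketch below builds the genre_words table and runs the query above against it. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and it does the splitting in Python rather than in SQL; both choices are assumptions made for the example, not part of the answer. The sample rows are the ones from the question.

```python
import sqlite3

# Sample rows from the question; sqlite3 stands in for MySQL here (assumption).
movies = [
    (1, "Drama, Romance, War"),
    (2, "Drama, Musical, Romance"),
    (3, "Adventure, Biography, Drama"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genre_words (genre_id INTEGER, word VARCHAR(50))")

# Build one (genre_id, word) pair per genre: split on the comma, trim spaces.
pairs = [
    (movie_id, word.strip())
    for movie_id, genre in movies
    for word in genre.split(",")
]
conn.executemany("INSERT INTO genre_words (genre_id, word) VALUES (?, ?)", pairs)

# The query from the answer, run against the helper table.
count, word = conn.execute(
    "SELECT count(*) AS c, word FROM genre_words "
    "GROUP BY word ORDER BY count(*) DESC LIMIT 1"
).fetchone()
conn.close()

print(word, count)
```

With a MySQL driver such as PyMySQL the parameter placeholder is typically %s instead of ?, so the INSERT statement would need that small change.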

You can also do it in Python, but then it is no longer a MySQL question at all. You need to read the table and build a simple word counter: if a word is new, add it; if it already exists, increase its count.
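
The counter described here could look roughly like the following. This is a minimal sketch: the function name most_common_word and the sample list are hypothetical, and it assumes the genre strings have already been read from the table.

```python
def most_common_word(genre_strings):
    """Return the most frequent comma-separated word, or None if there is none."""
    counts = {}
    for genre in genre_strings:
        for word in genre.split(","):
            word = word.strip()
            if not word:
                continue  # skip empty pieces such as a trailing comma
            if word in counts:
                counts[word] += 1  # word already seen: increase its count
            else:
                counts[word] = 1   # new word: add it
    if not counts:
        return None
    # Pick the key with the highest count.
    return max(counts, key=counts.get)


# Sample data from the question (stands in for rows read from the table).
sample = [
    "Drama, Romance, War",
    "Drama, Musical, Romance",
    "Adventure, Biography, Drama",
]
print(most_common_word(sample))
```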

CodePudding user response:

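To get the most common word in the genre column of your table, you can use the following steps. Note that split() and flatten() are not built-in MySQL functions, so the queries in steps 2 to 6 work only if equivalent functions are available on your server; otherwise do the splitting in Python or with another SQL technique.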

  1. Execute a SQL query to select the genre column from the table. This will return a list of all the genre strings in the table.

    SELECT genre FROM movies
    
  2. Use the split() method to split each genre string into a list of words.

    SELECT split(genre, ', ') FROM movies
    
  3. Use the flatten() function to flatten the list of lists into a single list of words.

    SELECT flatten(split(genre, ', ')) FROM movies
    
  4. Use the count() function to count the number of times each word appears in the list.

    SELECT word, count(word) as frequency
    FROM (SELECT flatten(split(genre, ', ')) as word FROM movies) as words
    GROUP BY word
    
  5. Use the order by clause to sort the words by their frequency in descending order.

    SELECT word, count(word) as frequency
    FROM (SELECT flatten(split(genre, ', ')) as word FROM movies) as words
    GROUP BY word
    ORDER BY frequency DESC
    
  6. Use the limit clause to limit the number of results to 1, to get the most common word.

    SELECT word, count(word) as frequency
    FROM (SELECT flatten(split(genre, ', ')) as word FROM movies) as words
    GROUP BY word
    ORDER BY frequency DESC
    LIMIT 1
    

To execute these queries and get the results in Python, you can use the cursor.execute() method and the cursor.fetchone() method as follows:

    conn = mysql.connect()
    cursor = conn.cursor()

    # Execute the SQL query
    cursor.execute("SELECT word, count(word) as frequency FROM (SELECT flatten(split(genre, ', ')) as word FROM movies) as words GROUP BY word ORDER BY frequency DESC LIMIT 1")

    # Get the result from the query
    most_common_word = cursor.fetchone()

    cursor.close()
    conn.close()

This code will execute the SQL query and return the most common word in the genre column as a tuple containing the word and its frequency. You can then access the word using the tuple index, like this:

    # most_common_word is the (word, frequency) tuple returned by fetchone() above
    print(most_common_word[0]) # Prints the most common word
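
A variant that keeps the SQL down to step 1 and does the splitting and counting in Python might look like this. It is only a sketch: the helper name most_common_genre is hypothetical, the cursor is assumed to return plain tuples (not dictionaries), and an in-memory sqlite3 database stands in for the MySQL connection so the example can run on its own.

```python
import sqlite3
from collections import Counter


def most_common_genre(cursor):
    """Return (word, frequency) for the most common genre, or None if the table is empty."""
    cursor.execute("SELECT genre FROM movies")
    counts = Counter(
        word.strip()
        for (genre,) in cursor.fetchall()  # each row is a 1-tuple
        for word in genre.split(",")
    )
    top = counts.most_common(1)  # list with at most one (word, frequency) tuple
    return top[0] if top else None


# Demo with sqlite3 standing in for MySQL (assumption), using the question's rows.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE movies (id INTEGER, genre TEXT)")
cursor.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        (1, "Drama, Romance, War"),
        (2, "Drama, Musical, Romance"),
        (3, "Adventure, Biography, Drama"),
    ],
)
result = most_common_genre(cursor)
cursor.close()
conn.close()

print(result)
```

The same function should work with any DB-API cursor that yields tuples, since it relies only on execute() and fetchall().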

CodePudding user response:

    from collections import Counter

    # Connect to the database and get the rows from the table
    rows = ...

    # Create a list to hold all of the genres
    genres = []

    # Loop through each row and split the genre string on the comma character
    # to create a list of individual genres, stripping the surrounding spaces
    for row in rows:
        genre_list = [genre.strip() for genre in row['genre'].split(',')]
        genres.extend(genre_list)

    # Use a Counter to count the number of occurrences of each genre
    genre_counts = Counter(genres)

    # Get the most common genre as a list holding one (genre, count) tuple
    most_common_genre = genre_counts.most_common(1)

    # Print the most common genre
    print(most_common_genre)
