Count how often each name appears in a CSV file and plot it (Python)

I have a CSV file with several columns, one of which is called 'names' and contains many different first names. I want to count how often each name appears in the file and then plot the 10 most common names (in a bar chart or something similar).

CodePudding user response：

One way to achieve this is to create a document-term matrix with the CountVectorizer class from scikit-learn.

First, load your .csv file with the pandas library:

import pandas as pd

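# DATASET_ENCODING and DATASET_COLUMNS are placeholders; define them first, e.g. 'utf-8' and ['names']
df = pd.read_csv('./your_table.csv', encoding=DATASET_ENCODING, usecols=DATASET_COLUMNS)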
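As a minimal sketch of the loading step, assuming a UTF-8 file whose only column of interest is 'names': the values given to DATASET_ENCODING and DATASET_COLUMNS here are hypothetical, and the in-memory bytes buffer only stands in for './your_table.csv' so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical values for the two placeholders; adjust them to your file.
DATASET_ENCODING = "utf-8"
DATASET_COLUMNS = ["names"]

# Stand-in for './your_table.csv' so the example is self-contained.
csv_bytes = "names,age\nAnna,31\nBen,45\nAnna,27\n".encode(DATASET_ENCODING)

# usecols keeps only the listed columns, so 'age' is not loaded.
df = pd.read_csv(io.BytesIO(csv_bytes), encoding=DATASET_ENCODING, usecols=DATASET_COLUMNS)
print(df)
```

With a real file, pass the path instead of the buffer.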

After that, use CountVectorizer to create a document-term matrix:

from sklearn.feature_extraction.text import CountVectorizer

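# stop_words is omitted: an English stop-word list could drop words that are also first names
cv = CountVectorizer()
# 'names' is the column from the question; each cell is treated as one document
your_table_cv = cv.fit_transform(df['names'])
# one row per CSV row, one column per distinct (lowercased) name
your_dtm = pd.DataFrame(your_table_cv.toarray(), columns=cv.get_feature_names_out())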

CodePudding user response：

For the DataFrame I use value_counts with sort=True: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.value_counts.html?highlight=value_count For the plot I use matplotlib.pyplot.

Here is an example: [image not shown]
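A minimal sketch of this approach, assuming the column is called 'names' as in the question. The sample data is made up and stands in for the result of pd.read_csv. The linked page documents DataFrame.value_counts; for a single column, the equivalent Series.value_counts is used here.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample standing in for pd.read_csv(...) on the real file.
df = pd.DataFrame({"names": ["Anna", "Ben", "Anna", "Cara", "Ben", "Anna"]})

# value_counts sorts by frequency in descending order; head(10) keeps the top 10.
top10 = df["names"].value_counts(sort=True).head(10)

# Draw one bar per name, with the count as the bar height.
ax = top10.plot(kind="bar")
ax.set_xlabel("Name")
ax.set_ylabel("Count")
ax.set_title("10 most common names")
plt.tight_layout()
```

To view the chart, drop the matplotlib.use("Agg") line and call plt.show(), or write it to a file with plt.savefig.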