How can I count the frequency of all values in several CSV columns?

I have several CSV columns that store lottery numbers, plus some other info such as the date each number was drawn. As output I need a dictionary {number drawn: number of times that number occurred across all the columns}. So far I have only been able to print the number of occurrences in each column individually.

# Import libraries
import pandas as pd 
from IPython.display import display

# Turn csv file into a pandas dataframe
df = pd.read_csv("LOTTOMAX.csv")

# Select only the columns I'm interested in; the CSV file also contains info I don't need.
selection = df[['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 'NUMBER DRAWN 4',
                'NUMBER DRAWN 5', 'NUMBER DRAWN 6', 'NUMBER DRAWN 7']]

# Loop over columns and apply value_counts(). Output to terminal.
for col in selection.columns:
    # I have included this to make terminal output more readable.
    print('-' * 40 + col + '-' * 40)
    display(selection[col].value_counts().to_string())
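To get a single frequency table across all the draw columns, one option is to reshape the selected columns into one long column and call `value_counts()` once. The sketch below is a minimal illustration, assuming every draw column follows the `NUMBER DRAWN <n>` naming from the question and no other column does; the toy data and the `DRAW DATE` column are hypothetical stand-ins for `LOTTOMAX.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for LOTTOMAX.csv.
df = pd.DataFrame({
    "DRAW DATE": ["2021-01-01", "2021-01-05", "2021-01-08"],
    "NUMBER DRAWN 1": [3, 7, 3],
    "NUMBER DRAWN 2": [12, 3, 40],
    "NUMBER DRAWN 3": [40, 12, 7],
})

# Keep only columns named "NUMBER DRAWN <digits>" (assumes no other
# column matches this pattern).
selection = df.filter(regex=r"^NUMBER DRAWN \d+$")

# melt() stacks every selected column into one long "value" column, so a
# single value_counts() counts each number across all columns
# (value_counts() skips NaN by default).
counts = selection.melt()["value"].value_counts()

# Plain dict {number drawn: times drawn}, ordered by number.
freq = {int(number): int(times) for number, times in counts.sort_index().items()}
print(freq)  # {3: 3, 7: 2, 12: 2, 40: 2}
```

If you prefer to keep the explicit column list from the question, `selection = df[[...]]` works the same way; only the counting step changes.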

Answer:

I did this project for fun and wanted to replicate a feature on the BCLC website. Perhaps this will help someone.

# Import libraries
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt

# Read csv file
df = pd.read_csv("LOTTOMAX.csv")  # CSV file: https://www.playnow.com/resources/documents/downloadable-numbers/LOTTOMAX.zip

cols = ['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 'NUMBER DRAWN 4',
        'NUMBER DRAWN 5', 'NUMBER DRAWN 6', 'NUMBER DRAWN 7']

results = []

# Collect the values from every column into one list
for i in cols:
    results += df[i].tolist()

# Count occurrences 
occurr = Counter(results)

# Display histogram
plt.bar(list(occurr.keys()), list(occurr.values()), color='g')
plt.xlabel("Numbers Drawn")
plt.ylabel("Frequency")
plt.show()

This solution is imperfect, but it works.
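The intermediate `results` list is not strictly necessary: `collections.Counter.update()` adds counts from any iterable, so each column can be fed in directly. A minimal sketch with hypothetical toy data in place of the CSV; dropping missing values before counting is an assumption about how gaps in the file should be treated.

```python
from collections import Counter

import pandas as pd

# Hypothetical toy data standing in for LOTTOMAX.csv.
df = pd.DataFrame({
    "NUMBER DRAWN 1": [3, 7, 3],
    "NUMBER DRAWN 2": [12, 3, 40],
})
cols = ["NUMBER DRAWN 1", "NUMBER DRAWN 2"]

occurr = Counter()
for col in cols:
    # Skip missing values (assumption), cast to int, and add this
    # column's numbers to the running totals.
    occurr.update(df[col].dropna().astype(int).tolist())

print(sorted(occurr.items()))  # [(3, 3), (7, 1), (12, 1), (40, 1)]
```

Because `occurr` is still a `Counter`, `occurr.most_common(5)` would list the five most frequently drawn numbers with their counts.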