I have several CSV columns that store lottery numbers, plus some other information such as the date each number was drawn. I need a dictionary of {number drawn: number of times that number occurred across all the columns} as my output. So far I have only been able to print the number of occurrences in each column individually.
# Import libraries
import pandas as pd
from IPython.display import display
# Turn csv file into a pandas dataframe
df = pd.read_csv("LOTTOMAX.csv")
# Only select columns that I'm interested in. Csv file contains additional useless info.
selection = df[['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 'NUMBER DRAWN 4',
'NUMBER DRAWN 5', 'NUMBER DRAWN 6', 'NUMBER DRAWN 7']]
# Loop over columns and apply value_counts(). Output to terminal.
for col in selection.columns:
    # Separator line to make the terminal output more readable.
    print('-' * 40 + col + '-' * 40, end='\n')
    display(selection[col].value_counts().to_string())
CodePudding user response:
I did this project for fun. I wanted to replicate a feature on the BCLC website. Perhaps this will help someone.
# Import libraries
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
# Read csv file
df = pd.read_csv("LOTTOMAX.csv") #csv file https://www.playnow.com/resources/documents/downloadable-numbers/LOTTOMAX.zip
cols = ['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 'NUMBER DRAWN 4',
'NUMBER DRAWN 5', 'NUMBER DRAWN 6', 'NUMBER DRAWN 7']
results = []
# Add the values from every column to a single list
for i in cols:
    results += df[i].tolist()
# Count occurrences
occurr = Counter(results)
# Display histogram
plt.bar(list(occurr.keys()), list(occurr.values()), color='g')
plt.xlabel("Numbers Drawn")
plt.ylabel("Frequency")
plt.show()
This solution is imperfect, but it works.
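If you want the dictionary from the question without the explicit loop, pandas can also do the counting itself. The snippet below is only a minimal sketch: the small DataFrame is made-up stand-in data rather than the real LOTTOMAX.csv, the "DRAW DATE" column name is an assumption, and `counts` is just an illustrative variable name.

```python
import pandas as pd

# Made-up stand-in for LOTTOMAX.csv (assumption: the real file has seven
# "NUMBER DRAWN n" columns plus extra columns such as a draw date).
df = pd.DataFrame({
    "NUMBER DRAWN 1": [1, 2, 3],
    "NUMBER DRAWN 2": [2, 3, 4],
    "NUMBER DRAWN 3": [3, 4, 5],
    "DRAW DATE": ["2020-01-03", "2020-01-10", "2020-01-17"],  # hypothetical column
})

# Pick only the number columns by their shared prefix.
cols = [c for c in df.columns if c.startswith("NUMBER DRAWN")]

# melt() stacks the selected columns into one long "value" column,
# value_counts() tallies each number, and to_dict() gives {number: count}.
counts = df.melt(value_vars=cols)["value"].value_counts().sort_index().to_dict()
print(counts)
```

With the real file you would replace the stand-in DataFrame with `pd.read_csv("LOTTOMAX.csv")`. The result is the same kind of mapping as `Counter(results)` above; `sort_index()` only orders the keys by number so the output is easier to read.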