I'm working on a project that counts the number of occurrences of each item/letter in a dataframe (monthly data). Sample dataframe looks as follows:
df = {'Jan 2018': [a, a, b, c, d, d], 'Feb 2018': [a, b , d, e, e, f}
I am fairly new to Python so bear with me - I have managed to work out how to find the value of an individual item in a chosen month using the following
count_df = df['Jan 2018'].value_counts()['a']
which gives the output 2 as an int. My question is how can I create a large dataframe that automatically counts all instances of a value for each month?
I hope to have it in a df like this with each letter in the index, so I can plot on a bar chart
df2 = {'Jan 2018': [count for each letter], 'Feb 2018': [count for each letter}
Would hugely appreciate any help you may have!
Thanks
CodePudding user response:
Here's a naive way:
new_dict = dict()
for col in df:
new_dict[col] = dict(df[col].value_counts())
print(new_dict)
df2 = pd.DataFrame(new_dict)
print(df2)
# new_dict
{'Jan 2018': {'a': 2, 'd': 2, 'b': 1, 'c': 1}, 'Feb 2018': {'e': 2, 'a': 1, 'b': 1, 'd': 1, 'f': 1}}
# df2
Jan 2018 Feb 2018
a 2.0 1.0
d 2.0 1.0
b 1.0 1.0
c 1.0 NaN
e NaN 2.0
f NaN 1.0
Here's a bar chart using matplotlib and new_dict
:
import matplotlib.pyplot as plt
plt.bar(new_dict['Jan 2018'].keys(), new_dict['Jan 2018'].values())
plt.show()
But here's a much better chart with df2
:
df2.plot.bar()
CodePudding user response:
I think you are looking for something like
df.groupby(['month', 'value']).value_counts()
I assume that you have this date values in a column named 'month' and the values in 'value'. I group by both of them and then output the number of times, this combination of month and value appears.
Edit: If you prefer to have the value counts then rather horizontally next to the month you can from there go on like this:
df.groupby(['month', 'value']).value_counts().reset_index().pivot('month', 'value', 0)
This will pivot the dataframe such, that you have a row for every month (the first argument to pivot
) and a column for every distinct value (the second argument to pivot
) and will use the value_counts as the values for this combinations (This is the last argument to pivot
. After the reset_index() this column only has the name 0.)
CodePudding user response:
I'm not sure what you're trying to achieve, so let me know if you have any corrections.
Here's a quick solution I came up with.There are two functions in case you don't know what elements you can have in your dataframe.
import copy
a,b,c,d,e,f = 'a','b','c','d','e','f' # Because I want to
data = {'Jan 2018': [a, a, b, c, d, d], 'Feb 2018': [a, b , d, e, e, f]}
def ultimate_counter(df):
df = copy.deepcopy(df) # if you don't do that, you'll delete the lists in data dict
counter = []
for j in df:
ad = df[j]
element_counter = []
element_list = []
while ad != []: # I have no idea what to expect, so I just check the first element, count it and delete duplicates
element = ad[0]
element_list.append(element)
element_counter.append(ad.count(element))
while True:
ad.remove(element)
if ad.count(element) == 0:
break
counter.append(dict(zip(element_list,element_counter)))
return counter
def narrow_counter(df):
list = ['a','b','c','d','e','f'] # We know what to expect
value_list = []
counter = []
for j in df:
for k in list:
value_list.append(df[j].count(k))
counter.append(dict(zip(list, value_list)))
return counter
CodePudding user response:
here is one way do it
use melt to flatten the dataframe, create a temporary count variable and finally use the pivot-table
df = pd.DataFrame(
{'Jan 2018': ['a', 'a', 'b', 'c', 'd', 'd'],
'Feb 2018': ['a', 'b', 'd', 'e', 'e', 'f']})
df.melt(var_name='month').assign(count=1).pivot_table(
index='value', columns='month', values='count', aggfunc='sum').fillna(0).astype(int)
month Feb 2018 Jan 2018
value
a 1 2
b 1 1
c 0 1
d 1 2
e 2 0
f 1 0