Looping through a dataframe to extract count of each value-CodePudding

I'm working on a project that counts the number of occurrences of each item/letter in a dataframe (monthly data). Sample dataframe looks as follows:

df = {'Jan 2018': [a, a, b, c, d, d], 'Feb 2018': [a, b , d, e, e, f}

I am fairly new to Python so bear with me - I have managed to work out how to find the value of an individual item in a chosen month using the following

count_df = df['Jan 2018'].value_counts()['a']

which gives the output 2 as an int. My question is how can I create a large dataframe that automatically counts all instances of a value for each month?

I hope to have it in a df like this with each letter in the index, so I can plot on a bar chart

df2 = {'Jan 2018': [count for each letter], 'Feb 2018': [count for each letter}

Would hugely appreciate any help you may have!

Thanks

CodePudding user response：

Here's a naive way:

new_dict = dict()
for col in df:
    new_dict[col] = dict(df[col].value_counts())
print(new_dict)

df2 = pd.DataFrame(new_dict)
print(df2)

# new_dict
{'Jan 2018': {'a': 2, 'd': 2, 'b': 1, 'c': 1}, 'Feb 2018': {'e': 2, 'a': 1, 'b': 1, 'd': 1, 'f': 1}}

# df2
   Jan 2018  Feb 2018
a       2.0       1.0
d       2.0       1.0
b       1.0       1.0
c       1.0       NaN
e       NaN       2.0
f       NaN       1.0

Here's a bar chart using matplotlib and new_dict:

import matplotlib.pyplot as plt

plt.bar(new_dict['Jan 2018'].keys(), new_dict['Jan 2018'].values())
plt.show()

But here's a much better chart with df2:

df2.plot.bar()

CodePudding user response：

I think you are looking for something like

df.groupby(['month', 'value']).value_counts()

I assume that you have this date values in a column named 'month' and the values in 'value'. I group by both of them and then output the number of times, this combination of month and value appears.

Edit: If you prefer to have the value counts then rather horizontally next to the month you can from there go on like this:

df.groupby(['month', 'value']).value_counts().reset_index().pivot('month', 'value', 0)

This will pivot the dataframe such, that you have a row for every month (the first argument to pivot) and a column for every distinct value (the second argument to pivot) and will use the value_counts as the values for this combinations (This is the last argument to pivot. After the reset_index() this column only has the name 0.)

CodePudding user response：

I'm not sure what you're trying to achieve, so let me know if you have any corrections.
Here's a quick solution I came up with.There are two functions in case you don't know what elements you can have in your dataframe.

import copy

a,b,c,d,e,f = 'a','b','c','d','e','f' # Because I want to

data = {'Jan 2018': [a, a, b, c, d, d], 'Feb 2018': [a, b , d, e, e, f]}

def ultimate_counter(df):
    df = copy.deepcopy(df) # if you don't do that, you'll delete the lists in data dict
    counter = []
    for j in df:
        ad = df[j]
        element_counter = []
        element_list = []
        while ad != []:   # I have no idea what to expect, so I just check the first element, count it and delete duplicates
            element = ad[0]
            element_list.append(element)
            element_counter.append(ad.count(element))
            while True:
                ad.remove(element)
                if ad.count(element) == 0:
                    break
        counter.append(dict(zip(element_list,element_counter)))
    return counter
            


def narrow_counter(df):
    list = ['a','b','c','d','e','f'] # We know what to expect
    value_list = []
    counter = []
    for j in df:
        for k in list:
            value_list.append(df[j].count(k))
        counter.append(dict(zip(list, value_list)))
    return counter

CodePudding user response：

here is one way do it

use melt to flatten the dataframe, create a temporary count variable and finally use the pivot-table

df = pd.DataFrame(
    {'Jan 2018': ['a', 'a', 'b', 'c', 'd', 'd'], 
     'Feb 2018': ['a', 'b', 'd', 'e', 'e', 'f']})



df.melt(var_name='month').assign(count=1).pivot_table(
    index='value', columns='month', values='count', aggfunc='sum').fillna(0).astype(int)

month   Feb 2018    Jan 2018
value       
    a          1        2    
    b          1        1    
    c          0        1
    d          1        2
    e          2        0
    f          1        0