Home > Software engineering >  Python, Pandas - Trouble with code summing counts in pandas
Python, Pandas - Trouble with code summing counts in pandas

Time:06-18

Having trouble with code I've written.

The first df 'piv_1' contains a column 'Category' (Either 3.0 or 4.0) and columns ['Start', 'Issue', 'Comments Rec'] containing values 1,2,3...12 (numerical month values representing Jan-Dec).

What I'm trying to achieve in 'df_sum' is, a sum-count of occurrences for ['Start', 'Issue', 'Comments Rec'] from df 'piv_1' by 'Category' (Either 3.0 or 4.0).

First df - 'piv_1' (contains category 3 or 4, other columns month values):

Category Start Issue Comments Rec
3 5 6 7
4 6 6 7
4 6 7 9
3 6 7 8
3 7 8 8

Second df - 'df_sum' (Blank df to populate the count of months for each column, by category 3 or 4):

Month Month Value Category Start Issue Comments Rec
Jan 1 3 0 0 0
Feb 2 3 0 0 0
Mar 3 3 0 0 0
Apr 4 3 0 0 0
May 5 3 0 0 0
Jun 6 3 0 0 0
Jul 7 3 0 0 0
Aug 8 3 0 0 0
Sep 9 3 0 0 0
Oct 10 3 0 0 0
Nov 11 3 0 0 0
Dec 12 3 0 0 0
Jan 1 4 0 0 0
Feb 2 4 0 0 0
Mar 3 4 0 0 0
Apr 4 4 0 0 0
May 5 4 0 0 0
Jun 6 4 0 0 0
Jul 7 4 0 0 0
Aug 8 4 0 0 0
Sep 9 4 0 0 0
Oct 10 4 0 0 0
Nov 11 4 0 0 0
Dec 12 4 0 0 0

Expected result 'df_sum':

Month Month Value Category Start Issue Comments Rec
Jan 1 3 0 0 0
Feb 2 3 0 0 0
Mar 3 3 0 0 0
Apr 4 3 0 0 0
May 5 3 1 0 0
Jun 6 3 1 1 0
Jul 7 3 1 1 1
Aug 8 3 0 1 2
Sep 9 3 0 0 0
Oct 10 3 0 0 0
Nov 11 3 0 0 0
Dec 12 3 0 0 0
Jan 1 4 0 0 0
Feb 2 4 0 0 0
Mar 3 4 0 0 0
Apr 4 4 0 0 0
May 5 4 0 0 0
Jun 6 4 2 1 0
Jul 7 4 0 1 1
Aug 8 4 0 0 0
Sep 9 4 0 0 1
Oct 10 4 0 0 0
Nov 11 4 0 0 0
Dec 12 4 0 0 0

I've attempted with the following code:

df_sum['Start'] = sum((df_sum[df_sum['Month value']]) == (piv_1[piv_1['Start']]) and (df_sum[df_sum['Category']]) == (piv_1[piv_1['Category']]))

I planned to replicate code for each column. That was if it worked...

Appreciate the help.

Kind Regards, Mike.

CodePudding user response:

If you have df_sum dataframe for all Category values and all Month, Month Value at each Category,

It'll work for that.

from collections import Counter
df_grp = df.groupby('Category').aggregate({"Start":pd.Series.to_list, "Issue":pd.Series.to_list, "Comments Rec": pd.Series.to_list})
df_grp = df_grp.applymap(lambda x: Counter(x))

for category, row in df_grp.iterrows():
    for col in ["Start", "Issue", "Comments Rec"]:
        for key, value in row[col].items():
            idx = df_sum.query("Category == @category and `Month Value` == @key".format(col)).index
            df_sum.loc[idx, col] = value

Here is an example. If df looks like bellow,


Category    Start   Issue   Comments Rec
0   3   5   6   7
1   4   6   6   7
2   4   6   7   9
3   3   6   7   8
4   3   7   8   8

df_grp is

    Start   Issue   Comments Rec
Category            
3   {5: 1, 6: 1, 7: 1}  {6: 1, 7: 1, 8: 1}  {7: 1, 8: 2}
4   {6: 2}  {6: 1, 7: 1}    {7: 1, 9: 1}

So you can map the value to df_sum with Category and Month(key of dictionary). I got index with query string with these values and updated with iloc function.

CodePudding user response:

You can also receive sum of the total occurrences by merging parts of the the first table to second table and grouping it by Month, Month Value and Category. How it can look like:

import pandas as pd

#table_1 is your first table
#table_2 is your second table

table_3 = table_2[['Month', 'Month Value', 'Category']].copy()

for i in table_1.columns[1:]:
    temp = ((table_2[['Month', 'Month Value', 'Category']].merge(table_1[['Category', i]],
                                                              left_on=['Category','Month Value'], 
                                                              right_on=['Category',i], 
                                                              how='left')
                                .groupby(['Month','Month Value','Category'], as_index=False)
                                .count()
                                .sort_values(['Category','Month Value'])))
    table_3 = table_3.merge(temp, on=['Month','Month Value','Category'])

And the result: enter image description here

  • Related