Home > Enterprise >  Count column value in a data frame
Count column value in a data frame

Time:09-16

This is the data frame

           Date    Name     Attempt   Count
0  2015-01-02    Adam       Yes      
1  2015-01-29    Adam       Yes      
2  2015-02-02    David      No       
3  2015-03-02    David      Yes      
4  2015-03-02    Hardik     Yes       
5  2015-03-02    David      No    

Now I need to count the number of attempts of a person month-wise. Want to count only if attempt value is 'yes' and insert the value in count column.

This is the expected output

           Date    Name     Attempt    Count
0  2015-01-02    Adam        Yes          2
1  2015-01-29    Adam        Yes          2
2  2015-02-02    David       No           0
3  2015-03-02    David       Yes          1
4  2015-03-02    Hardik      Yes          1 
5  2015-03-02    David       No           1

CodePudding user response:

Convert Date column to datetime type if not already, then assign a temporary column as month extracting the month value from the Date column.

Then filter out rows with Yes value for Attempt column, and group it by month, and Name columns, then call size to get count, finally call to_frame passing count as column name to create a dataframe out of it, and merge it with the dataframe with left merge passing the common keys for on parameter, then fillna(0), drop month column, and convert count column as integer type.

df['Date'] = pd.to_datetime(df['Date'])
df['month']=df['Date'].dt.month
(df.merge(df[df['Attempt'].eq('Yes')]
             .groupby(['month', 'Name'])
             .size()
             .to_frame('count'), 
        how='left', 
        on=['month', 'Name']
        ).fillna(0).drop(columns='month').astype({'count':int})
)
        Date    Name Attempt  count
0 2015-01-02    Adam     Yes      2
1 2015-01-29    Adam     Yes      2
2 2015-02-02   David      No      0
3 2015-03-02   David     Yes      1
4 2015-03-02  Hardik     Yes      1
5 2015-03-02   David      No      1

CodePudding user response:

You will need to group the date and names on a month frequency and count how many attempts were "Yes" by previously filtering them out:

res = df[df["Attempt"].eq("Yes")]
                      .groupby([pd.Grouper(key='date', freq='M'), 'Name'])["Attempt"]
                      .count()

Then you can assign the data using map as:

my_map = dict(zip(res.index.get_level_values('Name'), res.values))
df["Counts"] = df["Name"].map(my_map)

Output:

        date    Name Attempt  Counts
0 2015-01-02    Adam     Yes       2
1 2015-01-29    Adam     Yes       2
2 2015-02-02   David      No       1
3 2015-03-02   David     Yes       1
4 2015-03-02  Hardik     Yes       1
5 2015-03-02   David      No       1

(Note that David has 1 attempt on the '2015-02-02' that you don't want apparently. In order to solve that, just use simple slicing rules as:

df.loc[df["Attempt"].eq("Yes"), "Counts"] = df.loc[df["Attempt"]
                                              .eq("Yes"), "Name"]
                                              .map(my_map)

In this case you would get:

        date    Name Attempt  Counts
0 2015-01-02    Adam     Yes     2.0
1 2015-01-29    Adam     Yes     2.0
2 2015-02-02   David      No     NaN
3 2015-03-02   David     Yes     1.0
4 2015-03-02  Hardik     Yes     1.0
5 2015-03-02   David      No     NaN

After seeing the nice answer by @ThePyGuy. I was interested on checking performance of both his and my suggestions (none "correcter" than the other I hope ;)) and found out his suggestion to take 4.53 ms and mine 2.39 ms. Note that these statistics are for this small data set. It might be performance will scale weirdly with dataframe size.

  • Related