count number of occurrences within a date range

Time:06-16

from datetime import datetime

max_date = datetime.today().strftime('%d-%m-%Y')
min_date = "06-08-2021"

I have a df that looks like this. For now it only has 1 row:

name       value 
Name1      23

Then I have another dataset df2 that looks like this:

date            group 
07-08-2021      A
07-08-2021      A
06-08-2021      A
09-08-2021      A
07-08-2021      A
07-08-2021      B
06-08-2021      B
03-08-2020      A

I want to iterate through all rows of df2 and, if the date is within the range of min_date and max_date, do a cumulative sum of all occurrences of A and B.

This means that I want to count the number of times a particular group type occurred within that range. Then I want to add that value to my first dataset. Something like this:

name       value     count_A        count_B   
Name1      23        5              2

Note that the last row:

03-08-2020      A

is not counted since the date doesn't fall in the range.

EDIT: sample df:

import pandas as pd

details = {
    'Name' : ['Name1'],
    'Value' : [23],
}
df1 = pd.DataFrame(details)

details = {
    'Date' : ['07-08-2021', '07-08-2021', '06-08-2021', '09-08-2021','07-08-2021','07-08-2021','06-08-2021','03-08-2020'],
    'Group' : ['A', 'A', 'A', 'A','A','B','B','A'],
}
df2 = pd.DataFrame(details)
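A vectorized sketch of the whole task, assuming the sample data above and that the dates are `dd-mm-yyyy` strings. It parses the dates with `pd.to_datetime`, keeps the rows inside the inclusive range with `Series.between`, counts each group with `value_counts`, and writes the counts into `df1`. The `reindex(..., fill_value=0)` step is an addition so that a group with no rows in range gets 0 instead of disappearing.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Name1'], 'Value': [23]})
df2 = pd.DataFrame({
    'Date': ['07-08-2021', '07-08-2021', '06-08-2021', '09-08-2021',
             '07-08-2021', '07-08-2021', '06-08-2021', '03-08-2020'],
    'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'A'],
})

# Range boundaries as real timestamps (min_date parsed from dd-mm-yyyy)
min_date = pd.to_datetime('06-08-2021', format='%d-%m-%Y')
max_date = pd.Timestamp.today()

# Parse the date strings once; Timestamp comparisons are chronological
dates = pd.to_datetime(df2['Date'], format='%d-%m-%Y')

# between() includes both endpoints by default
in_range = df2[dates.between(min_date, max_date)]

# Count rows per group; groups absent from the range get 0
counts = in_range['Group'].value_counts().reindex(['A', 'B'], fill_value=0)

# Write each count into its own column of df1 (broadcast to every row)
for group, n in counts.items():
    df1[f'count_{group}'] = n

print(df1)
```

With the sample data this gives 5 for `count_A` and 2 for `count_B`; the 03-08-2020 row is excluded by the range filter.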

Answer:

import pandas as pd
from datetime import datetime

details = {
    'Date' : ['07-08-2021', '07-08-2021', '06-08-2021', '09-08-2021','07-08-2021','07-08-2021','06-08-2021','03-08-2020'],
    'Group' : ['A', 'A', 'A', 'A','A','B','B','A'],
}
details1 = {
    'Name' : ['Name1'],
    'Value' : [23],
}
df1 = pd.DataFrame(details1)

df = pd.DataFrame(details)
# Parse the dd-mm-yyyy strings so the comparisons are chronological, not lexicographic
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
max_date = datetime.today()
min_date = datetime.strptime("06-08-2021", '%d-%m-%Y')
# Use >= so rows dated exactly min_date are counted too
df = df[(df['Date'] <= max_date) & (df['Date'] >= min_date)]
# Count rows per group, then turn the groups into columns
df = df.groupby('Group').count()
df1_transposed = df.T
df1_transposed = df1_transposed[['A', 'B']]
df1_transposed = df1_transposed.reset_index()
# Both frames have a single row at index 0, so join on the index
df1 = pd.merge(df1, df1_transposed, left_index=True, right_index=True)
df1 = df1[['Name', 'Value', 'A', 'B']]
df1 = df1.rename(columns={'A': 'count_A', 'B': 'count_B'})

print(df1)

output

    Name  Value  count_A  count_B
0  Name1     23        5        2
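Why the dates need to be parsed before filtering: comparing raw `dd-mm-yyyy` strings (as in a plain `df['Date'] <= max_date` filter on strings) is lexicographic, character by character, so it does not follow calendar order once dates span different months or years. A small check, using the made-up date 07-09-2020 for illustration:

```python
from datetime import datetime

fmt = '%d-%m-%Y'
a, b = '07-09-2020', '06-08-2021'  # 7 Sep 2020 is earlier than 6 Aug 2021

# String comparison sees '7' > '6' in the day part and stops there
string_says_later = a > b

# Parsed datetimes compare by actual calendar order
date_says_later = datetime.strptime(a, fmt) > datetime.strptime(b, fmt)

print(string_says_later, date_says_later)
```

This prints `True False`: the string comparison would wrongly treat the 2020 date as inside a range starting in 2021.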

Answer:

Preferably work with datetime.date objects instead of strings:

from datetime import date

max_date = date.today()
min_date = date(2021,8,6)

If the dates in df2 are strings, you may convert them to datetime.date objects first, while iterating through all the rows:

# example for first iteration of df2
from datetime import date, datetime

# iterate over all dates in your df2 and include the following:
dash_date = '07-08-2021'
py_date = datetime.strptime(dash_date, '%d-%m-%Y').date()

# check if the date of the current iteration is between min_date and max_date (inclusive)
min_date <= py_date <= max_date

Based on the comparison, you can decide whether to add the value to your first dataset.
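A minimal loop-based sketch of this approach, assuming the sample `df1`/`df2` from the question: each row's date string is converted with `datetime.strptime(...).date()`, checked against the inclusive range, and tallied per group. Using `collections.Counter` for the tally and `itertuples` for the iteration are choices made for this sketch, not part of the answer.

```python
from collections import Counter
from datetime import date, datetime

import pandas as pd

df1 = pd.DataFrame({'Name': ['Name1'], 'Value': [23]})
df2 = pd.DataFrame({
    'Date': ['07-08-2021', '07-08-2021', '06-08-2021', '09-08-2021',
             '07-08-2021', '07-08-2021', '06-08-2021', '03-08-2020'],
    'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'A'],
})

max_date = date.today()
min_date = date(2021, 8, 6)

counts = Counter()
for row in df2.itertuples(index=False):
    # Convert the dd-mm-yyyy string to a datetime.date
    py_date = datetime.strptime(row.Date, '%d-%m-%Y').date()
    # Inclusive check, so rows dated exactly min_date are counted
    if min_date <= py_date <= max_date:
        counts[row.Group] += 1

# Counter returns 0 for missing keys, so an absent group gets 0
df1['count_A'] = counts['A']
df1['count_B'] = counts['B']
print(df1)
```

For large frames a vectorized pandas filter is usually faster; the loop mainly makes the per-row logic explicit.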
