Home > database >  Create a counter of date values for a given max-min interval
Create a counter of date values for a given max-min interval

Time:11-26

Be the following python pandas DataFrame:

| date       | column_1 | column_2 |
| ---------- | -------- | -------- |
| 2022-02-01 | val      | val2     |
| 2022-02-03 | val1     | val      |
| 2022-02-01 | val      | val3     |
| 2022-02-04 | val2     | val      |
| 2022-02-27 | val2     | val4     |

I want to create a new DataFrame, where each row has a value between the minimum and maximum date value from the original DataFrame. The counter column contains a row counter for that date.

| date       | counter  |
| ---------- | -------- |
| 2022-02-01 | 2        |
| 2022-02-02 | 0        |
| 2022-02-03 | 1        | 
| 2022-02-04 | 1        |
| 2022-02-05 | 0        |
...
| 2022-02-26 | 0        |
| 2022-02-27 | 1        |

CodePudding user response:

Count dates first & remove duplicates using Drop duplicates. Fill intermidiate dates with Pandas has asfreq function for datetimeIndex, this is basically just a thin, but convenient wrapper around reindex() which generates a date_range and calls reindex.

df['counts'] = df['date'].map(df['date'].value_counts())
df = df.drop_duplicates(subset='date', keep="first")

df.date = pd.to_datetime(df.date)
df = df.set_index('date').asfreq('D').reset_index()
df =  df.fillna(0)
print(df)

Gives #

        date  counts
0  2022-02-01     2.0
1  2022-02-02     0.0
2  2022-02-03     1.0
3  2022-02-04     1.0
4  2022-02-05     0.0
5  2022-02-06     0.0
6  2022-02-07     0.0
7  2022-02-08     0.0
8  2022-02-09     0.0
9  2022-02-10     0.0
10 2022-02-11     0.0
11 2022-02-12     0.0
12 2022-02-13     0.0
13 2022-02-14     0.0
14 2022-02-15     0.0
15 2022-02-16     0.0
16 2022-02-17     0.0
17 2022-02-18     0.0
18 2022-02-19     0.0
19 2022-02-20     0.0
20 2022-02-21     0.0
21 2022-02-22     0.0
22 2022-02-23     0.0
23 2022-02-24     0.0
24 2022-02-25     0.0
25 2022-02-26     0.0

CodePudding user response:

Many ways to do this. Here is mine. Probably not optimal, but at least I am not iterating rows, nor using .apply, which are both sure recipes to create slow solutions

import pandas as pd
import datetime

# A minimal example (you should provide such an example next time)
df=pd.DataFrame({'date':pd.to_datetime(['2022-02-01', '2022-02-03', '2022-02-01', '2022-02-04', '2022-02-27']), 'c1':['val','val1','val','val2','val2'], 'c2':range(5)})

# A delta of 1 day, to create list of date
dt=datetime.timedelta(days=1)

# Result dataframe, with a count of 0 for now
res=pd.DataFrame({'date':df.date.min() dt*np.arange((df.date.max()-df.date.min()).days 1), 'count':0})

# Cound dates
countDates=df[['date', 'c1']].groupby('date').agg('count')

# Merge the counted dates with the target array, filling missing values with 0
res['count']=res.merge(countDates, on='date', how='left').fillna(0)['c1']
  • Related