The data:

userid data_date
A      2019-02-03
A      2019-04-05
B      2019-03-04
B      2019-12-31

Goal:
import pandas as pd
import janitor  # pip install pyjanitor

# df is the dataframe shown above
new_dates = {"data_date": pd.date_range('2019-01-01', '2019-12-31', freq="D")}
d1 = df.complete(new_dates, by=['userid'])
The result:

userid data_date
A      2019-01-01
……     ……
A      2019-04-05
B      2019-01-01
……     ……
B      2019-12-31
My expected result (each user's dates should run only between that user's own first and last date):

userid data_date
A      2019-02-03
……     ……
A      2019-04-05
B      2019-03-04
……     ……
B      2019-12-31
CodePudding user response:
Since you are building the dates per group (via `by`), you can use a lambda expression to get the min and max per group, so each user's range is bounded by that user's own dates rather than the full-year range passed in the question. I will reuse @MarceloAndrioni's data:
# pip install pyjanitor
import pandas as pd
import janitor

df = pd.DataFrame(data=[['A', '2019-02-03'],
                        ['A', '2019-04-05'],
                        ['B', '2019-03-04'],
                        ['B', '2019-12-31']],
                  columns=['userid', 'data_date'])
df.data_date = pd.to_datetime(df.data_date)

# The lambda is evaluated once per userid group, so each user's
# range runs from that user's own min date to its own max date.
new_dates = {'data_date': lambda df: pd.date_range(df.min(), df.max(), freq='D')}
out = df.complete(new_dates, by='userid')
out.head(10)

  userid  data_date
0      A 2019-02-03
1      A 2019-02-04
2      A 2019-02-05
3      A 2019-02-06
4      A 2019-02-07
5      A 2019-02-08
6      A 2019-02-09
7      A 2019-02-10
8      A 2019-02-11
9      A 2019-02-12
out.tail(10)

    userid  data_date
355      B 2019-12-22
356      B 2019-12-23
357      B 2019-12-24
358      B 2019-12-25
359      B 2019-12-26
360      B 2019-12-27
361      B 2019-12-28
362      B 2019-12-29
363      B 2019-12-30
364      B 2019-12-31
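As a quick sanity check (a sketch, not part of the original answer), you can aggregate the completed frame per user and confirm that each group only spans its own dates:

# Per-user min/max/count of `out` from the code above.
# Based on the sample data, A should cover 2019-02-03..2019-04-05 (62 days)
# and B should cover 2019-03-04..2019-12-31 (303 days), 365 rows in total.
out.groupby('userid')['data_date'].agg(['min', 'max', 'count'])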
You can have a look at the pyjanitor documentation for more examples.
CodePudding user response:
Maybe something like this, where you create a new dataframe per user spanning that user's min and max dates and then concatenate them:
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[['A', '2019-02-03'],
                        ['A', '2019-04-05'],
                        ['B', '2019-03-04'],
                        ['B', '2019-12-31']],
                  columns=['userid', 'data_date'])

dfs = []
for userid in df['userid'].unique():
    # Daily range between this user's first and last date
    series = pd.date_range(start=df[df['userid'] == userid]['data_date'].min(),
                           end=df[df['userid'] == userid]['data_date'].max(),
                           freq='D')
    df2 = pd.DataFrame(data=zip(np.full(series.shape, userid), series),
                       columns=['userid', 'data_date'])
    dfs.append(df2)

df = pd.concat(dfs)
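For reference, here is a more compact plain-pandas sketch of the same idea (not part of the original answer; the name `filled` is just illustrative). It rebuilds the sample data, computes one date range per user, and explodes it into one row per day:

import pandas as pd

df = pd.DataFrame(data=[['A', '2019-02-03'],
                        ['A', '2019-04-05'],
                        ['B', '2019-03-04'],
                        ['B', '2019-12-31']],
                  columns=['userid', 'data_date'])
df['data_date'] = pd.to_datetime(df['data_date'])

# One list of daily dates per user, then one row per date.
filled = (
    df.groupby('userid')['data_date']
      .agg(['min', 'max'])
      .apply(lambda r: pd.date_range(r['min'], r['max'], freq='D').tolist(), axis=1)
      .rename('data_date')
      .explode()
      .reset_index()
)
filled['data_date'] = pd.to_datetime(filled['data_date'])  # explode returns object dtype

This should produce the same rows as the loop-based version: one row per user per day between that user's first and last date.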