I have a dataframe similar to what follows:
test = {"id": ["A", "A", "A", "B", "B", "B"],
"date": ["09-02-2013", "09-03-2013", "09-05-2013", "09-15-2013", "09-17-2013", "09-18-2013"],
"country": ["Poland", "Poland", "France", "Scotland", "Scotland", "Canada"]}
and I want a table which returns this :
id | date | country |
---|---|---|
A | 09-02-2013 | Poland |
A | 09-03-2013 | Poland |
A | 09-04-2013 | Poland |
A | 09-05-2013 | France |
B | 09-15-2013 | Scotland |
B | 09-16-2013 | Scotland |
B | 09-17-2013 | Scotland |
B | 09-18-2013 | Canada |
i.e. a table that fills in any date that I am missing but will only do it to the min/max of each id
I have looked around stack overflow but usually this problem just has one index or the person wants to drop an index anyway This is what I have got so far:
test_df = pd.DataFrame(test)
# get min date per id
dates = test_df.groupby("id")["date"].min().to_frame(name="min")
# get max date
dates["max"] = test_df.groupby("id")["date"].max().to_frame(name="max")
midx = pd.MultiIndex.from_frame(dates.apply(lambda x: pd.date_range(x["min"], x["max"], freq="D"), axis=1).explode().reset_index(name="date")[["date", "id"]])
test_df = test_df.set_index(["date", "id"])
test_df = test_df.reindex(midx).fillna(method="ffill")
test_df
Which gets me really close but not quite there, with the dates all there but no country:
id | date | country |
---|---|---|
A | 09-02-2013 | NaN |
A | 09-03-2013 | NaN |
A | 09-04-2013 | NaN |
A | 09-05-2013 | NaN |
B | 09-15-2013 | NaN |
B | 09-16-2013 | NaN |
B | 09-17-2013 | NaN |
B | 09-18-2013 | NaN |
Any ideas on how to fix it?
CodePudding user response:
IIUC, you could generate a date_range
per group, explode
, then merge
and ffill
the values per group:
out = (test_df
.merge(pd
.to_datetime(test_df['date'], dayfirst=False)
.groupby(test_df['id'])
.apply(lambda g: pd.date_range(g.min(), g.max(), freq='D'))
.explode().dt.strftime('%m-%d-%Y')
.reset_index(name='date'),
how='right'
)
.assign(country=lambda d: d.groupby('id')['country'].ffill())
)
output:
id date country
0 A 09-02-2013 Poland
1 A 09-03-2013 Poland
2 A 09-04-2013 Poland
3 A 09-05-2013 France
4 B 09-15-2013 Scotland
5 B 09-16-2013 Scotland
6 B 09-17-2013 Scotland
7 B 09-18-2013 Canada