I have a pandas DF which looks like this: [dataframe image]
And I want to add a date column here which goes from '2019-01-01' to '2019-12-01' since each row here is a month's observation. Is there a simple command that can get this job done? So far I have tried
pd.date_range(start='2019-01-01', end='2020-01-01', freq='M')
but this throws an error saying:
Length of values (12) does not match length of index (6876)
I am looking for something that looks like:
state1 district1 2019-01-01
state1 district1 2019-02-01
...
state1 district1 2019-12-01
state1 district2 2019-01-01
state1 district2 2019-02-01
...
Even though I know this is a possible solution, I strongly want to avoid splitting this into several 12 row data frames and (left) joining the date column to each of them.
CodePudding user response:
A somewhat stringent implementation would be
# dictionary of index -> month
months = dict(enumerate(pd.date_range(start='2019-01-01', periods=12, freq='MS')))
# Now create an enumeration of each item and map it to its month
df['month'] = df.groupby("ADM2_NAME").transform("cumcount").map(months)
If the dataframe is very long, I would work with pd.merge
instead for higher performance, but that requires some more effort to retain the order of the input, and I don't think it's required to have a more advanced solution.
Note I'm using freq="MS"
to get the first of every month.
CodePudding user response:
The following only assumes that the rows of each (state
, district
) group are sorted chronologically. It should work even if some groups don't have all the observations from the last few months.
dates = (
pd.Series(pd.date_range(start='2019-01-01', end='2020-01-01', freq='M'))
.rename('date')
)
df['date'] = (
df.groupby(['ADM1_NAME', 'ADM2_NAME'])
.cumcount()
.to_frame('date_idx')
.merge(dates, left_on='date_idx',
right_index=True, how='left')['date']
)