I'm trying to map an integer which represents Year-Month ('Period' column in code below) into a new column that represents the Year-Quarter ('DIST_PERIOD' column).
For instance,
202101, 202102, and 202103 become '20211'
202104, 202105, and 202106 become '20212'
etc.
My code below is running but not how I thought it would.
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '01') | (str(df['Period'])[4:] == '02') | (str(df['Period'])[4:] == '03'), str(df['Period'])[:4] '1', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '04') | (str(df['Period'])[4:] == '05') | (str(df['Period'])[4:] == '06'), str(df['Period'])[:4] '2', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '07') | (str(df['Period'])[4:] == '08') | (str(df['Period'])[4:] == '09'), str(df['Period'])[:4] '3', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '10') | (str(df['Period'])[4:] == '11') | (str(df['Period'])[4:] == '12'), str(df['Period'])[:4] '4', df['DIST_PERIOD'])
Not sure how to correct my str() so that I am correctly capturing the last two characters for each row.
CodePudding user response:
not sure if the np.where
is a good choice here, instead use the map
function from pandas.
create a dictionary objects of this mapping such for example: di= {'202101':'20211', '202102':'20211', '202103':'20211'}
same way, add the more dictionary object in above dictionary to create the mapping object.
after that do this:
df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di)
Note: it will create a NAN if cannot find the all the mapping object. If you want to map only few objects and leave other as untouched, then use:
df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di).fillna(df['yourcolumnnametobemapped'])
CodePudding user response:
numpy.where
doesn't work because you have multiple cases/conditions, and you are applying them sequentially, not in one go, as it should.
A better way is to convert that column to datetime and then access and combine the datetime properties year
and quarter
.
month_year = pd.to_datetime(df['DIST_PERIOD'], format="%Y%m")
df['DIST_PERIOD'] = month_year.dt.year.astype(str) month_year.dt.quarter.astype(str)