Home > database >  np.where() conversion failing
np.where() conversion failing

Time:11-09

I'm trying to map an integer which represents Year-Month ('Period' column in code below) into a new column that represents the Year-Quarter ('DIST_PERIOD' column).

For instance,

202101, 202102, and 202103 become '20211'

202104, 202105, and 202106 become '20212'

etc.

My code below is running but not how I thought it would.

df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '01') | (str(df['Period'])[4:] == '02') | (str(df['Period'])[4:] == '03'), str(df['Period'])[:4] '1', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '04') | (str(df['Period'])[4:] == '05') | (str(df['Period'])[4:] == '06'), str(df['Period'])[:4] '2', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '07') | (str(df['Period'])[4:] == '08') | (str(df['Period'])[4:] == '09'), str(df['Period'])[:4] '3', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '10') | (str(df['Period'])[4:] == '11') | (str(df['Period'])[4:] == '12'), str(df['Period'])[:4] '4', df['DIST_PERIOD'])

Not sure how to correct my str() so that I am correctly capturing the last two characters for each row.

CodePudding user response:

not sure if the np.where is a good choice here, instead use the map function from pandas.

create a dictionary objects of this mapping such for example: di= {'202101':'20211', '202102':'20211', '202103':'20211'}

same way, add the more dictionary object in above dictionary to create the mapping object.

after that do this:

df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di)

Note: it will create a NAN if cannot find the all the mapping object. If you want to map only few objects and leave other as untouched, then use:

df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di).fillna(df['yourcolumnnametobemapped'])

CodePudding user response:

numpy.where doesn't work because you have multiple cases/conditions, and you are applying them sequentially, not in one go, as it should.

A better way is to convert that column to datetime and then access and combine the datetime properties year and quarter.

month_year = pd.to_datetime(df['DIST_PERIOD'], format="%Y%m")
df['DIST_PERIOD'] = month_year.dt.year.astype(str)   month_year.dt.quarter.astype(str)
  • Related