I have a column 'PRODUCT_ID' in my pandas dataframe. I want to create a calculated column based on it, so that rows whose PRODUCT_ID is in [3, 5, 8] get the value 'old' and all other rows get 'new'.
Right now I'm using a for loop to check every single index of the dataframe.
    portfoy['PRODUCT_TYPE'] = np.nan
    for ind in portfoy.index:
        if portfoy.loc[ind, 'PRODUCT_CODE'] in [3, 5, 8]:
            portfoy.loc[ind, 'PRODUCT_TYPE'] = 'old'
        else:
            portfoy.loc[ind, 'PRODUCT_TYPE'] = 'new'
This code seems to take a lot of time. Is there a better way to do this?
My data looks like:
CUSTOMER | PRODUCT_ID | other columns |
---|---|---|
2345 | 3 | ------------- |
3456 | 5 | ------------- |
2786 | 5 | ------------- |
CodePudding user response:
Use numpy.where with Series.isin for a fast, vectorized solution:
portfoy['PRODUCT_TYPE'] = np.where(portfoy['PRODUCT_CODE'].isin([3, 5, 8]), 'old', 'new')
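A minimal self-contained sketch of this approach. The sample rows are made up, modelled on the table in the question, and the column is called PRODUCT_CODE to match the question's code; use PRODUCT_ID instead if that is the real column name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
portfoy = pd.DataFrame({
    "CUSTOMER": [2345, 3456, 2786, 9999],
    "PRODUCT_CODE": [3, 5, 5, 7],
})

old_codes = [3, 5, 8]

# isin() builds a boolean Series in one vectorized pass over the column.
is_old = portfoy["PRODUCT_CODE"].isin(old_codes)

# np.where picks 'old' where the mask is True and 'new' everywhere else.
portfoy["PRODUCT_TYPE"] = np.where(is_old, "old", "new")

print(portfoy)
```

Because the whole column is evaluated at once, there is no per-row .loc lookup, which is what tends to make the loop in the question slow.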
CodePudding user response:
You can use boolean masks to conditionally update the dataframe:
portfoy.loc[portfoy.PRODUCT_CODE.isin([3,5,8]),'PRODUCT_TYPE'] = 'old'
portfoy.loc[~portfoy.PRODUCT_CODE.isin([3,5,8]),'PRODUCT_TYPE'] = 'new'
Here portfoy.PRODUCT_CODE.isin([3,5,8]) is the mask, and ~ negates it.
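A possible variation on the mask approach, sketched with made-up sample rows: compute the mask once, fill the new column with the default 'new', then overwrite only the masked rows. This is a small rearrangement for illustration, not the answer's exact code.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
portfoy = pd.DataFrame({
    "CUSTOMER": [2345, 3456, 2786, 9999],
    "PRODUCT_CODE": [3, 5, 5, 7],
})

# Build the boolean mask a single time and reuse it.
mask = portfoy["PRODUCT_CODE"].isin([3, 5, 8])

# Start with the default value for every row...
portfoy["PRODUCT_TYPE"] = "new"
# ...then overwrite only the rows selected by the mask.
portfoy.loc[mask, "PRODUCT_TYPE"] = "old"

print(portfoy)
```

Storing the mask in a variable avoids evaluating isin twice and removes the need for the ~ negation.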