I am trying to calculate a column in pandas which categorises the row based on the values in other columns, however I am struggling to make my logic work where I have None values in the dataset.
I have a datafame like this which shows the value of record between two weeks.
prior_week_value | current_week_value |
---|---|
None | 100 |
500 | 600 |
250 | 100 |
I want to be able to create a column which summarieses the difference between the two into 4 categories
- value increased (occurs when current_week_value > prior_week_value)
- value decreased (occurs when current_week_value < prior_week_value)
- value not changed (occurs when current_week_value == prior_week_value)
- new record (occurs when prior_week_value = None)
In this case it would look something like this
prior_week_value | current_week_value | value_category |
---|---|---|
None | 100 | new record |
500 | 600 | value increased |
250 | 100 | value decreased |
I have tried to implement this with an if statement but have failed at the first hurdle at identifying the None values. All I get when I run the below code is the rows containing the None values evaluating as false and returning the test value in the else statement.
if df['prior_week_value'].isnull().all():
df['value_category] = 'new record'
else:
df['value_category] = 'test'
Any help with this issue would be greatly appreciated
CodePudding user response:
lets use df.shift(1)
& np.select
for the various conditions.
s = df['current_week_value'].shift(1)
cond = [
s.isnull(),
df['current_week_value'] > s,
df['current_week_value'] < s,
df['current_week_value'] == s
]
results = ['new_record', 'value_increased', 'value_decreased', 'no_change']
df['value_category'] = np.select(cond, results)
print(df)
prior_week_value current_week_value value_category
0 None 100 new_record
1 500 600 value_increased
2 250 100 value_decreased
CodePudding user response:
Thank you Umar.H for your help - I made a small tweak and it worked, here is the code:
s = df['prior_week_value'] # this is the change i made to the code provided
cond = [
s.isnull(),
df['current_week_value'] > s,
df['current_week_value'] < s,
df['current_week_value'] == s,
]
results = ['new_record', 'value_increased', 'value_decreased', 'no_change']
df['value_category'] = np.select(cond, results)