Home > OS >  Pandas dataframe create calculated column based on other column but dealing with None values
Pandas dataframe create calculated column based on other column but dealing with None values

Time:08-08

I am trying to calculate a column in pandas which categorises the row based on the values in other columns, however I am struggling to make my logic work where I have None values in the dataset.

I have a datafame like this which shows the value of record between two weeks.

prior_week_value current_week_value
None 100
500 600
250 100

I want to be able to create a column which summarieses the difference between the two into 4 categories

  • value increased (occurs when current_week_value > prior_week_value)
  • value decreased (occurs when current_week_value < prior_week_value)
  • value not changed (occurs when current_week_value == prior_week_value)
  • new record (occurs when prior_week_value = None)

In this case it would look something like this

prior_week_value current_week_value value_category
None 100 new record
500 600 value increased
250 100 value decreased

I have tried to implement this with an if statement but have failed at the first hurdle at identifying the None values. All I get when I run the below code is the rows containing the None values evaluating as false and returning the test value in the else statement.

if df['prior_week_value'].isnull().all():
 df['value_category] = 'new record'
else:
 df['value_category] = 'test'

Any help with this issue would be greatly appreciated

CodePudding user response:

lets use df.shift(1) & np.select for the various conditions.

s = df['current_week_value'].shift(1)

cond = [
         s.isnull(),
         df['current_week_value'] > s, 
         df['current_week_value'] < s,
         df['current_week_value'] == s
        ]

results = ['new_record', 'value_increased', 'value_decreased', 'no_change']

df['value_category'] = np.select(cond, results)


print(df)

  prior_week_value  current_week_value   value_category
0             None                 100       new_record
1              500                 600  value_increased
2              250                 100  value_decreased

CodePudding user response:

Thank you Umar.H for your help - I made a small tweak and it worked, here is the code:

s = df['prior_week_value'] # this is the change i made to the code provided

cond = [
    s.isnull(),
    df['current_week_value'] > s,
    df['current_week_value'] < s,
    df['current_week_value'] == s,
]

results = ['new_record', 'value_increased', 'value_decreased', 'no_change']

df['value_category'] = np.select(cond, results)
  • Related