Append values in new columns based on 2 different conditions in Python

Time:01-06

I have a sample data set which is similar to the one defined below.

import pandas as pd

dict_1 = {'Id' : [1, 1, 2, 2, 3, 4],
         'boolean_val' : [True, False, True, False, True, False],
         "sal" : [1000, 2000, 1500, 2500, 3500, 4500]}

test = pd.DataFrame(dict_1)
test.head(10)

I have to create two new columns in the test dataframe, output_True and output_False, based on the following conditions:
a) If Id[0] == Id[1] and boolean_val == True, put sal[0] (the value where boolean_val == True) in output_True, else "NA".
b) If Id[0] == Id[1] and boolean_val == False, put sal[1] (the value where boolean_val == False) in output_False, else "NA".
c) If Id[0] != Id[1] and boolean_val == True, put that row's sal value in output_True; else if Id[0] != Id[1] and boolean_val == False, put that row's sal value in output_False.

If my question is not framed clearly, please see the dataframe below. I want my output to match its output_True and output_False columns.

dict_1 = {'Id' : [1, 1, 2, 2, 3, 4],
         'boolean_val' : [True, False, True, False, True, False],
         "sal" : [1000, 2000, 1500, 2500, 3500, 4500],
         "output_True" : [1000, "NA", 1500, "NA", 3500, "NA"],
         "output_False" : [2000, "NA", 2500, "NA", "NA", 4500]}

output_df = pd.DataFrame(dict_1)
output_df.head(10)

I have tried np.where() and a list comprehension, but my output does not show the correct values. Can someone please help me with this?

CodePudding user response:

Use loc to assign the values for the boolean column. For the second condition, use .shift() to check whether Id[0] == Id[1], and sum based on that:

import numpy as np
import pandas as pd

dict_1 = {'Id' : [1, 1, 2, 2, 3, 4],
         'boolean_val' : [True, False, True, False, True, False],
         "sal" : [1000, 2000, 1500, 2500, 3500, 4500]}

test = pd.DataFrame(dict_1)
test

   Id  boolean_val   sal
0   1         True  1000
1   1        False  2000
2   2         True  1500
3   2        False  2500
4   3         True  3500
5   4        False  4500

cond1 = test.boolean_val
test.loc[cond1, 'output_True'] = test.sal


cond2 = (test.Id.shift(-1).eq(test.Id))
test['output_False'] = np.nan
test.loc[cond2, 'output_False'] = test['sal'] + test['output_True']
test

   Id  boolean_val   sal  output_True  output_False
0   1         True  1000       1000.0        2000.0
1   1        False  2000          NaN           NaN
2   2         True  1500       1500.0        3000.0
3   2        False  2500          NaN           NaN
4   3         True  3500       3500.0           NaN
5   4        False  4500          NaN           NaN
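
Note that the sum gives 3000.0 for Id 2, whereas the question's expected output_False for that row is 2500, and the single-row Id 4 gets no value. Here is a minimal sketch of one way to get the expected columns. It assumes each Id has at most one True row and one False row, and that the values belong on the first row of each Id. The names first_row, true_sal and false_sal are illustrative only, and NaN stands in for "NA".

```python
import pandas as pd

test = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3, 4],
    "boolean_val": [True, False, True, False, True, False],
    "sal": [1000, 2000, 1500, 2500, 3500, 4500],
})

# True only on the first row of each Id (assumption: results go on that row)
first_row = ~test["Id"].duplicated()

# Keep sal only where boolean_val is True (or False), then broadcast the
# first non-missing value of each Id to every row of that Id
true_sal = test["sal"].where(test["boolean_val"]).groupby(test["Id"]).transform("first")
false_sal = test["sal"].where(~test["boolean_val"]).groupby(test["Id"]).transform("first")

# Blank out everything except the first row of each Id
test["output_True"] = true_sal.where(first_row)
test["output_False"] = false_sal.where(first_row)
print(test)
```

If the string "NA" is really needed instead of NaN, something like test[col].astype(object).fillna("NA") could be applied to each output column afterwards, at the cost of turning those columns into object dtype.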

CodePudding user response:

Here's a way to get your desired output:

df = test.pivot(index='Id', columns='boolean_val', values='sal')
df = df.assign(boolean_val=df.loc[:,True].notna()).set_index('boolean_val', append=True)
df = df.rename(columns={True:'output_True', False:'output_False'})[['output_True', 'output_False']]

output_df = test.join(df, on=['Id','boolean_val'])
for col in ('output_True', 'output_False'):
    output_df[col] = np.where(output_df[col].isna(), "NA", output_df[col].astype(pd.Int64Dtype()))

Output:

   Id boolean_val   sal output_False output_True
0   1        True  1000         2000        1000
1   1       False  2000           NA          NA
2   2        True  1500         2500        1500
3   2       False  2500           NA          NA
4   3        True  3500           NA        3500
5   4       False  4500         4500          NA

Explanation:

  • use pivot() to create an intermediate dataframe df with True and False columns containing the corresponding sal values for each Id
  • add a boolean_val column which contains True unless a given row's True column is NaN
  • set Id, boolean_val as the index for df
  • rename the True and False columns as output_True and output_False and swap their positions (to match the desired output)
  • use join() to create output_df which is test with added columns output_True and output_False
  • replace NaN with the string "NA" and change sal values from float to int in output_True and output_False.
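
To make the intermediate steps easier to follow, here is a rough sketch that builds the same pieces one at a time and leaves the missing values as NaN. The names wide, key and lookup are illustrative and not part of the answer's code, and it assumes each (Id, boolean_val) pair occurs at most once, since pivot() raises on duplicates.

```python
import pandas as pd

test = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3, 4],
    "boolean_val": [True, False, True, False, True, False],
    "sal": [1000, 2000, 1500, 2500, 3500, 4500],
})

# One row per Id, one column per boolean_val value, renamed right away
wide = (
    test.pivot(index="Id", columns="boolean_val", values="sal")
        .rename(columns={True: "output_True", False: "output_False"})
)

# True when the Id has a True row, so the values attach to that row;
# otherwise they attach to the Id's False row
key = wide["output_True"].notna()

# Index the lookup table by (Id, boolean_val) so join() can match on both
lookup = wide.assign(boolean_val=key).set_index("boolean_val", append=True)
lookup = lookup[["output_True", "output_False"]]

output_df = test.join(lookup, on=["Id", "boolean_val"])
print(wide)
print(output_df)
```

Printing wide shows why the key column is needed: Id 4 has no True value, so its only match in test is the row where boolean_val is False.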