Append values in new column based on 2 different conditions in Python

I have a sample data set which is similar to the one defined below.

import pandas as pd

dict_1 = {'Id' : [1, 1, 2, 2, 3, 4],
         'boolean_val' : [True, False, True, False, True, False],
         "sal" : [1000, 2000, 1500, 2500, 3500, 4500]}

test = pd.DataFrame(dict_1)
test.head(10)

I need to create 2 new columns in the test dataframe, output_True and output_False, based on the following conditions:
a) If Id[0] == Id[1] and boolean_val == True, put sal[0] (the value where boolean_val is True) in output_True, else "NA".
b) If Id[0] == Id[1] and boolean_val == False, put sal[1] (the value where boolean_val is False) in output_False, else "NA".
c) If Id[0] != Id[1] and boolean_val == True, put that row's sal value in output_True; else if Id[0] != Id[1] and boolean_val == False, put that row's sal value in output_False.
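Reading rules a) to c) as "the first row of each Id collects the True row's sal in output_True and the False row's sal in output_False", the rules can be sketched with groupby. This is only an illustration under that interpretation: it treats the first row of each Id as Id[0], and it leaves missing entries as NaN rather than the string "NA".

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'Id': [1, 1, 2, 2, 3, 4],
                     'boolean_val': [True, False, True, False, True, False],
                     'sal': [1000, 2000, 1500, 2500, 3500, 4500]})

# sal of the True row / False row within each Id, broadcast to every row of that Id
# (NaN when the Id has no such row; 'first' skips the NaNs introduced by where())
true_sal = test['sal'].where(test['boolean_val']).groupby(test['Id']).transform('first')
false_sal = test['sal'].where(~test['boolean_val']).groupby(test['Id']).transform('first')

# only the first row of each Id receives the values; the other rows stay NaN
first_row = ~test['Id'].duplicated()
test['output_True'] = true_sal.where(first_row)
test['output_False'] = false_sal.where(first_row)
print(test)
```

For the sample data this yields output_True = 1000, NaN, 1500, NaN, 3500, NaN and output_False = 2000, NaN, 2500, NaN, NaN, 4500.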

In case my question is not clearly framed, please check the dataframe below: I want my output_True and output_False columns to look like the ones shown here.

dict_1 = {'Id' : [1, 1, 2, 2, 3, 4],
         'boolean_val' : [True, False, True, False, True, False],
         "sal" : [1000, 2000, 1500, 2500, 3500, 4500],
         "output_True" : [1000, "NA", 1500, "NA", 3500, "NA"],
         "output_False" : [2000, "NA", 2500, "NA", "NA", 4500]}

output_df = pd.DataFrame(dict_1)
output_df.head(10)
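A side note on the expected columns: mixing the string "NA" with integers gives those columns object dtype. If real missing values are acceptable, a nullable integer dtype (pandas' "Int64", the same dtype the second answer converts to) keeps the numbers as integers and marks the gaps as pd.NA. A minimal sketch of the same target frame, assuming pandas 1.0 or later:

```python
import pandas as pd

output_df = pd.DataFrame({
    'Id': [1, 1, 2, 2, 3, 4],
    'boolean_val': [True, False, True, False, True, False],
    'sal': [1000, 2000, 1500, 2500, 3500, 4500],
    # nullable integer columns: missing entries become pd.NA, the rest stay integers
    'output_True': pd.array([1000, None, 1500, None, 3500, None], dtype='Int64'),
    'output_False': pd.array([2000, None, 2500, None, None, 4500], dtype='Int64'),
})
print(output_df.dtypes)
```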

I have tried using np.where() and a list comprehension, but my output does not show the correct values. Can someone please help me with this?

Answer:

Use .loc to assign the values for the True rows. For the second condition, you can use .shift() to compare consecutive Id values (Id[0] == Id[1]) and sum based on that:

import numpy as np
import pandas as pd

dict_1 = {'Id' : [1, 1, 2, 2, 3, 4],
         'boolean_val' : [True, False, True, False, True, False],
         "sal" : [1000, 2000, 1500, 2500, 3500, 4500]}

test = pd.DataFrame(dict_1)
test

   Id  boolean_val   sal
0   1         True  1000
1   1        False  2000
2   2         True  1500
3   2        False  2500
4   3         True  3500
5   4        False  4500

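# rows where boolean_val is True get their own sal in output_True (NaN elsewhere)
cond1 = test.boolean_val
test.loc[cond1, 'output_True'] = test.sal

# rows whose next row has the same Id (the first row of a pair)
cond2 = test.Id.shift(-1).eq(test.Id)
test['output_False'] = np.nan
test.loc[cond2, 'output_False'] = test['sal'] + test['output_True']
test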

   Id  boolean_val   sal  output_True  output_False
0   1         True  1000       1000.0        2000.0
1   1        False  2000          NaN           NaN
2   2         True  1500       1500.0        3000.0
3   2        False  2500          NaN           NaN
4   3         True  3500       3500.0           NaN
5   4        False  4500          NaN           NaN
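Comparing this output with the desired one, the sum gives 3000 for Id 2 where the question expects 2500, and Id 4's 4500 never reaches output_False. A minimal corrected sketch that keeps the same .loc/.shift() idea but takes the next row's sal instead of a sum, and handles a lone False row separately (that last step is my reading of rule c):

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'Id': [1, 1, 2, 2, 3, 4],
                     'boolean_val': [True, False, True, False, True, False],
                     'sal': [1000, 2000, 1500, 2500, 3500, 4500]})

# rows where boolean_val is True keep their own sal in output_True
test['output_True'] = np.nan
test.loc[test['boolean_val'], 'output_True'] = test['sal']

# first row of a pair (next row has the same Id): take the next row's sal
test['output_False'] = np.nan
paired = test['Id'].shift(-1).eq(test['Id'])
test.loc[paired, 'output_False'] = test['sal'].shift(-1)

# a False row whose Id appears only once keeps its own sal
alone = ~test['Id'].duplicated(keep=False)
test.loc[alone & ~test['boolean_val'], 'output_False'] = test['sal']
print(test)
```

The .loc assignments align on the index, so each selected row picks up the value from the same row of the shifted Series.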

Answer:

Here's a way to get your desired output:

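import numpy as np
import pandas as pd

# one row per Id, with True/False columns holding the matching sal values
df = test.pivot(index='Id', columns='boolean_val', values='sal')
# boolean_val is True unless the Id has no True row, so each Id joins onto exactly one row of test
df = df.assign(boolean_val=df.loc[:,True].notna()).set_index('boolean_val', append=True)
df = df.rename(columns={True:'output_True', False:'output_False'})[['output_True', 'output_False']]

output_df = test.join(df, on=['Id','boolean_val'])
# replace NaN with "NA" and show the remaining values as integers
for col in ('output_True', 'output_False'):
    output_df[col] = np.where(output_df[col].isna(), "NA", output_df[col].astype(pd.Int64Dtype()))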

Output:

   Id  boolean_val   sal output_True output_False
0   1         True  1000        1000         2000
1   1        False  2000          NA           NA
2   2         True  1500        1500         2500
3   2        False  2500          NA           NA
4   3         True  3500        3500           NA
5   4        False  4500          NA         4500

Explanation:

- Use pivot() to create an intermediate dataframe df with True and False columns containing the corresponding sal values for each Id.
- Add a boolean_val column which contains True unless a given row's True column is NaN.
- Set Id, boolean_val as the index for df.
- Rename the True and False columns to output_True and output_False and swap their positions (to match the desired output).
- Use join() to create output_df, which is test with the added columns output_True and output_False.
- Replace NaN with the string "NA" and change the sal values in output_True and output_False from float to int.
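To see what the first two steps produce, here is a small sketch that rebuilds the sample frame and prints the intermediate pivot table and the key that becomes boolean_val (the name has_true is only for illustration):

```python
import pandas as pd

test = pd.DataFrame({'Id': [1, 1, 2, 2, 3, 4],
                     'boolean_val': [True, False, True, False, True, False],
                     'sal': [1000, 2000, 1500, 2500, 3500, 4500]})

# step 1: one row per Id, one column per boolean_val value (False, True)
df = test.pivot(index='Id', columns='boolean_val', values='sal')
print(df)

# step 2: Ids with a True row are keyed on True; Id 4 (False row only) is keyed on False
has_true = df.loc[:, True].notna()
print(has_true)
```

Id 3 has NaN in the False column and Id 4 has NaN in the True column, which is where the "NA" entries in the final output come from.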