Home > database >  How to derieve value into a new column based on unique value across the table
How to derieve value into a new column based on unique value across the table

Time:07-02

I have the below pandas dataframe where I would like derive an output into a column named "RESULT" based on the other values in the column across of the table

enter image description here

CodePudding user response:

It needs a bit of beautification, but you can achieve it with apply on axis=1 and a for loop over the columns that you want to check.

df = pd.DataFrame({
    'AGE': [25, 25, 35, np.nan, 25, 25, 25],
    'YEAR': [1987, np.nan, 1991, 1991, 1991, 1991, 1956],
    'POST_CODE': [2150, 2150, 2150, 2150, 3125, np.nan, 2150],
    'EXPECTED_RESULT': ['YEAR=1987', 'YEAR=NULL', 'AGE=35', 'AGE=NULL', 'POST_CODE=3125', 'POST_CODE=NULL', 'YEAR=1956']
})

def custom_func(x: pd.Series, val_counts):
    return f'{x.index[0]}={x[0]}' if val_counts[x[0]] == 1 and pd.isna(x[1]) else x[1]

cols = df.columns[:-1]

df['RESULT'] = np.nan
for col in cols:
    val_counts = df.value_counts(col, dropna=False)
    df['RESULT'] = df[[col, 'RESULT']].apply(lambda x: custom_func(x, val_counts), axis=1)

enter image description here

CodePudding user response:

You can try looping through the columns and use Series.duplicated to find the unique value

df['result'] = ''
for col in ['AGE', 'YEAR', 'POST_CODE']:
    m = ~df[col].duplicated(keep=False)
    df['result'] = df['result'].mask(m, col '=' df[col].astype(str))
print(df)

    AGE  YEAR POST_CODE          result
0    25  1987      2150       YEAR=1987
1    25  NULL      2150       YEAR=NULL
2    35  1991      2150          AGE=35
3  NULL  1991      2150        AGE=NULL
4    25  1991      3125  POST_CODE=3125
5    25  1991      NULL  POST_CODE=NULL
6    25  1956      2150       YEAR=1956
  • Related