How can I use \n and \t in lambda function to replace dataframe?-CodePudding

I have very large scale dataframe with 100,000 rows and 300 columns

and I'm trying to fill out Nan rows in one columns by extracting the values from the other columns

here is the example,

let's say we have a sample dataframe such as:

     NAME    RRN_FRONT    RRN_BACK    EVENT_DTL
1    JOHN    891105       1067714     Nan
2    SHOWN   791134       1156543     Nan
3    BROWN   581104       1668314     Nan
4    MIKE    984564       0153422     1. Name : MIKE 2. BIRTHDAY : 984564 3. SSN : 0153422
5    LARRY   796515       0168165     1. Name : LARRY 2. BIRTHDAY : 796515 3. SSN : 0168165

and I want to fill out Nan values with the NAME, RRN_FRONT, RRN_BACK

Here is the input that I tried:

df.loc[df.EVENT_DTL.isnull(), 'EVENT_DTL'] = df.apply(lambda x: ('1. NAME :\n'   str(x['NAME'])   '2. BIRTHDAY :\n'   str(x['RRN_FRONT'])   '3. SSN : \n'   str(x['RRN_BACK']

and the output is not what I intended:

1. NAME :\nJOHN2. BIRTHDAY :\n8911053. SSN : \n1067714
2. ...
 .
 .
5. ...

CodePudding user response：

Pandas.apply applies the operations on axis=0 (index axis) by default, and you need to change the axis=1 in your case:

df['EVENT_DTL'] = (np.where(df['EVENT_DTL'].isna(), 
                  df.apply(lambda x: ('1. NAME :\n'   str(x['NAME'])  
                  '2. BIRTHDAY :\n'   str(x['RRN_FRONT'])   '3. SSN : \n'   
                  str(x['RRN_BACK'])), axis=1),
                  df['EVENT_DTL']))

Output:

0    1. NAME :\nJOHN2. BIRTHDAY :\n8911053. SSN : \...
1    1. NAME :\nSHOWN2. BIRTHDAY :\n7911343. SSN : ...
2    1. NAME :\nBROWN2. BIRTHDAY :\n5811043. SSN : ...
3    1. Name : MIKE 2. BIRTHDAY : 984564 3. SSN : 0...
4    1. Name : LARRY 2. BIRTHDAY : 796515 3. SSN : ...
Name: EVENT_DTL, dtype: object

CodePudding user response：

Solution without apply:

df = pd.DataFrame({'col1': ['JOHN', 'SHOWN', 'BROWN'], 'col2': [10, 20, 30], 'col3': [None, None, 'other text']})
idx = df.col3.isna()
df.loc[idx, 'col3'] = '1. Name :\n'   df.loc[idx, 'col1']   ', 2. BIRTHDAY :\n'   df.loc[idx, 'col2'].astype('str')
print(df)

    col1  col2                                 col3
0   JOHN    10   1. Name :\nJOHN, 2. BIRTHDAY :\n10
1  SHOWN    20  1. Name :\nSHOWN, 2. BIRTHDAY :\n20
2  BROWN    30                           other text