I have very large scale dataframe with 100,000 rows and 300 columns
and I'm trying to fill out Nan rows in one columns by extracting the values from the other columns
here is the example,
let's say we have a sample dataframe such as:
NAME RRN_FRONT RRN_BACK EVENT_DTL
1 JOHN 891105 1067714 Nan
2 SHOWN 791134 1156543 Nan
3 BROWN 581104 1668314 Nan
4 MIKE 984564 0153422 1. Name : MIKE 2. BIRTHDAY : 984564 3. SSN : 0153422
5 LARRY 796515 0168165 1. Name : LARRY 2. BIRTHDAY : 796515 3. SSN : 0168165
and I want to fill out Nan values with the NAME, RRN_FRONT, RRN_BACK
Here is the input that I tried:
df.loc[df.EVENT_DTL.isnull(), 'EVENT_DTL'] = df.apply(lambda x: ('1. NAME :\n' str(x['NAME']) '2. BIRTHDAY :\n' str(x['RRN_FRONT']) '3. SSN : \n' str(x['RRN_BACK']
and the output is not what I intended:
1. NAME :\nJOHN2. BIRTHDAY :\n8911053. SSN : \n1067714
2. ...
.
.
5. ...
CodePudding user response:
Pandas.apply applies the operations on axis=0
(index axis) by default, and you need to change the axis=1
in your case:
df['EVENT_DTL'] = (np.where(df['EVENT_DTL'].isna(),
df.apply(lambda x: ('1. NAME :\n' str(x['NAME'])
'2. BIRTHDAY :\n' str(x['RRN_FRONT']) '3. SSN : \n'
str(x['RRN_BACK'])), axis=1),
df['EVENT_DTL']))
Output:
0 1. NAME :\nJOHN2. BIRTHDAY :\n8911053. SSN : \...
1 1. NAME :\nSHOWN2. BIRTHDAY :\n7911343. SSN : ...
2 1. NAME :\nBROWN2. BIRTHDAY :\n5811043. SSN : ...
3 1. Name : MIKE 2. BIRTHDAY : 984564 3. SSN : 0...
4 1. Name : LARRY 2. BIRTHDAY : 796515 3. SSN : ...
Name: EVENT_DTL, dtype: object
CodePudding user response:
Solution without apply:
df = pd.DataFrame({'col1': ['JOHN', 'SHOWN', 'BROWN'], 'col2': [10, 20, 30], 'col3': [None, None, 'other text']})
idx = df.col3.isna()
df.loc[idx, 'col3'] = '1. Name :\n' df.loc[idx, 'col1'] ', 2. BIRTHDAY :\n' df.loc[idx, 'col2'].astype('str')
print(df)
col1 col2 col3
0 JOHN 10 1. Name :\nJOHN, 2. BIRTHDAY :\n10
1 SHOWN 20 1. Name :\nSHOWN, 2. BIRTHDAY :\n20
2 BROWN 30 other text