Data Frame contains two columns.
| Extraction | Actual |
| -------------------------------- | --------- |
| [1_CHECK_90,2_SAVE_43,3_GO_56] | 2_SAVE |
| [1_FIN_54,2_CHECK_22] | 1_FIN_54 |
| [1_L_32,2_Y_79,4_X_66] | 2_Y_79 |
| [5_T_88] | NA |
Convert Extraction as Actual by comparing with numbers on left side in Extraction column.
def extract_actual(row):
try:
a =[]
for i in row['Extraction']:
for j in i:
for k in j.split("_"):
# print(k)
for l in row['Actual']:
if k == l:
a.append(j)
return a
except:
a =[]
return a
I tried using above function. It's working fine but for Actual='NA' that was not returning none.
CodePudding user response:
import pandas as pd
df = pd.DataFrame({'Extraction': [['1_CHECK', '2_SAVE', '3_GO'],
['1_FIN', '2_CHECK'], ['1_L', '2_Y', '4_X'], ['5_T']],
'Actual': ['2_SAVE', '1_FIN', '2_Y', None]})
def extract_actual(row):
for i in row['Extraction']:
if i == row['Actual']:
return i
return None
df['Extraction'] = df.apply(extract_actual, axis=1)
df.rename(columns={'Extraction':'Extraction_Actual'}, inplace=True)
df.drop(columns=['Actual'], inplace=True)
CodePudding user response:
Can you try this,
import pandas as pd
df = pd.DataFrame({'Extraction': [['1_CHECK', '2_SAVE', '3_GO'],
['1_FIN', '2_CHECK'], ['1_L', '2_Y', '4_X'], ['5_T']],
'Actual': ['2_SAVE', '1_FIN', '2_Y', None]})
# get equal values index true false values
tFdf = df[df.columns.difference(["Expected"])].eq(df["Actual"], axis=0)
# Assign matched values
df["Extraction"].loc[tFdf["Actual"]] = df["Actual"][tFdf["Actual"]]