I have the following dataframe:
df = pd.DataFrame([['50030', '36 @ 3159 W/270, LWD[GR,RES,PWD] @ 4015', '3159'],
['50030', '36 @ 3159 W/270, LWD[GR,RES,PWD] @ 4015', '3994'],
['50030', '36 @ 3159 W/270, LWD[GR,RES,PWD] @ 4015', '5401'],
['50030', '26 @ 3994, LWD[GR,RES,PWD] @ 5430, 20 @ 5401', '3159'],
['50030', '26 @ 3994, LWD[GR,RES,PWD] @ 5430, 20 @ 5401', '3994'],
['50030', '26 @ 3994, LWD[GR,RES,PWD] @ 5430, 20 @ 5401', '5401']],
columns = ["WKEY", "Description", "DEPTH"])
I want to compare the value in the DEPTH column (which is a string value) to the string in the Description column only for the same row. A new column called "Compare" would have a yes or no depending on if it exists.
It should be Yes, No, No, No, Yes, Yes.
The ultimate plan is to delete all rows which do not have the value in the DEPTH column also listed somewhere in the Description column in the same row.
I feel like I'm one step away from getting this to work, so any direction would be appreciated.
Thanks!
CodePudding user response:
In your case
df['Compare'] = df.apply(lambda x: 'Yes' if x['DEPTH'] in x['Description'] else 'No',axis=1)
df
Out[133]:
WKEY Description DEPTH Compare
0 50030 36 @ 3159 W/270, LWD[GR,RES,PWD] @ 4015 3159 Yes
1 50030 36 @ 3159 W/270, LWD[GR,RES,PWD] @ 4015 3994 No
2 50030 36 @ 3159 W/270, LWD[GR,RES,PWD] @ 4015 5401 No
3 50030 26 @ 3994, LWD[GR,RES,PWD] @ 5430, 20 @ 5401 3159 No
4 50030 26 @ 3994, LWD[GR,RES,PWD] @ 5430, 20 @ 5401 3994 Yes
5 50030 26 @ 3994, LWD[GR,RES,PWD] @ 5430, 20 @ 5401 5401 Yes