Hello, I have a DataFrame:
name ; id ; firstname ; lastname
MD ALEXIA DORTMINEX ; 1 ; ALEXIA ; DORTMINEX
DOC PAULO RODRIGEZ ; 3 ; PAOLO ; SANCHEZ
I want to keep only the rows where name contains lastname (i.e. lastname is a substring of name).
In our case, we keep only:
name ; id ; firstname ; lastname
MD ALEXIA DORTMINEX ; 1 ; ALEXIA ; DORTMINEX
because DORTMINEX is in MD ALEXIA DORTMINEX.
Thanks.
CodePudding user response:
You can use apply with boolean indexing:
df[df.apply(lambda r: r['lastname'] in r['name'], axis=1)]
output:
name id firstname lastname
0 MD ALEXIA DORTMINEX 1 ALEXIA DORTMINEX
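For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample from the question and applies the same row-wise test. It assumes the columns are plain strings with no surrounding whitespace; the variable names mask and filtered are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["MD ALEXIA DORTMINEX", "DOC PAULO RODRIGEZ"],
    "id": [1, 3],
    "firstname": ["ALEXIA", "PAOLO"],
    "lastname": ["DORTMINEX", "SANCHEZ"],
})

# Row-wise substring test: True where lastname occurs inside name.
mask = df.apply(lambda r: r["lastname"] in r["name"], axis=1)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

Note that apply with axis=1 calls the lambda once per row, so it is convenient rather than fast on large frames.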
CodePudding user response:
You can check whether each value in your lastname column is contained in the corresponding value of your name column using a list comprehension, which returns a list of booleans (True / False). Passing that list to loc filters your dataframe, which will give you what you require:
>>> [name[0] in name[1] for name in zip(df['lastname'], df['name'])]
[True, False]
>>> df.loc[[name[0] in name[1] for name in zip(df['lastname'], df['name'])]]
name id firstname lastname
0 MD ALEXIA DORTMINEX 1 ALEXIA DORTMINEX
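The same comprehension can be written with tuple unpacking, which may read a little more clearly. The sketch below also strips whitespace from lastname before testing; that is an assumption on my part, in case the values were read from a semicolon-separated file with padding around the separators as in the question. The padded sample values and the names keep and result are illustrative only.

```python
import pandas as pd

# Sample from the question, with padding added to mimic "a ; b" style input (assumed).
df = pd.DataFrame({
    "name": ["MD ALEXIA DORTMINEX ", " DOC PAULO RODRIGEZ "],
    "id": [1, 3],
    "firstname": [" ALEXIA ", " PAOLO "],
    "lastname": [" DORTMINEX", " SANCHEZ"],
})

# Unpack each (lastname, name) pair and test the stripped lastname as a substring.
keep = [last.strip() in full for last, full in zip(df["lastname"], df["name"])]

# A plain list of booleans works as a row selector for loc.
result = df.loc[keep]
print(result)
```

Without the strip() call, " DORTMINEX" would still match here because the name happens to contain a space before it, but relying on that seems fragile.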
CodePudding user response:
You can check for each row that lastname is in name with the apply() function and then filter your data using this mask, as follows:
mask = df.apply(lambda x: x['lastname'] in x['name'], axis=1)
df = df[mask]
This will output:
name id firstname lastname
0 MD ALEXIA DORTMINEX 1 ALEXIA DORTMINEX
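One caveat with the mask approach: the in operator raises a TypeError if either value is missing (None or NaN) instead of being a string. If your real data might have gaps, a guarded version could look like the sketch below. The helper lastname_in_name and the third sample row are hypothetical, added only to show the behaviour.

```python
import pandas as pd

def lastname_in_name(row):
    """Hypothetical helper: substring test that treats non-string values as no match."""
    last, full = row["lastname"], row["name"]
    if not isinstance(last, str) or not isinstance(full, str):
        return False
    return last in full

# Question sample plus one invented row with a missing name (illustration only).
df = pd.DataFrame({
    "name": ["MD ALEXIA DORTMINEX", "DOC PAULO RODRIGEZ", None],
    "id": [1, 3, 4],
    "firstname": ["ALEXIA", "PAOLO", "ANNA"],
    "lastname": ["DORTMINEX", "SANCHEZ", "SMITH"],
})

mask = df.apply(lastname_in_name, axis=1)
filtered = df[mask]
print(filtered)
```

Whether a missing value should count as "no match" or be handled some other way depends on your data, so treat the return False branch as a choice rather than a rule.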