Home > Back-end >  How can I determine that the value of column1 in df1 contains in column1 in df2 starting strictly fr
How can I determine that the value of column1 in df1 contains in column1 in df2 starting strictly fr

Time:12-29

I have 2 dataframes:

df1=pd.DataFrame({'name': ['123456789', '5486544546', '65464564', '478615454']})
df2=pd.DataFrame({'num': ['12345', '54865', '5464','7861']})

My question how modify the following function

df2['yes/no'] = df2['num'].apply(lambda x: df1['name'][df1['name'].astype(str).str.contains(str(x))].values[0] if len(df1['name'][df1['name'].astype(str).str.contains(str(x))].values)> 0 else 0)

the value of column1 in df1 contains in column1 in df2 starting strictly from the left (that function works in any variations)

Desirable result:

name yes/no
12345 123456789
54865 5486544546
5464 0
7861 0

Wrong result:

num yes/no
12345 123456789
54865 5486544546
5464 65464564
7861 478615454

CodePudding user response:

We combine (concat) two frames to get a temporary frame with two columns, which we then process line by line, checking if one text begins with another. If yes, return the value from the column, if not - 0

df1 = pd.DataFrame({'name': ['123456789', '5486544546', '65464564', '478615454']})
df2 = pd.DataFrame({'num': ['12345', '54865', '5464', '7861']})
df2['yes/no'] = pd.concat([df2, df1], ignore_index=True, axis=1).apply(lambda x: x.iat[1] if x.iat[1].startswith(x.iat[0]) else 0, axis=1)
print(df2)
     num      yes/no
0  12345   123456789
1  54865  5486544546
2   5464           0
3   7861           0

CodePudding user response:

if i get your point you need that :

l=[]
for i in range(len(df1)):
    if str(df1['name'][i]).startswith(str(df2['num'][i])):
        l.append(df1['name'][i])
    else:l.append(0)
df2['yes/no']=l

out:

 num      yes/no
0  12345   123456789
1  54865  5486544546
2   5464           0
3   7861           0

CodePudding user response:

Since you want to match in the starting you need to use startswith function.

Code:

#df1=pd.DataFrame({'name': ['123456789', '5486544546', '65464564', '478615454']})
df1=pd.DataFrame({'name': ['123456789', '5486544546', '65464564', '123456789']})
df2=pd.DataFrame({'num': ['12345', '54865', '5464','7861']})
df2["yes/no"] = df2['num'].apply(lambda x: (df1['name'][df1['name'].astype(str).str.startswith(x)].to_list() or [0])[0])

Explanation:

df1['name'][df1['name'].astype(str).str.startswith(x)].to_list()

will create a list of value where df1 start with df2 value.

(df1['name'].astype(str).str.startswith(x)].to_list() or [0])[0]

if no values got we return a list with 0 i.e. [0] then retrieve the first element. Which will be the value if it matches or 0

Output:

0     123456789
1    5486544546
2             0
3             0
Name: yes/no, dtype: object
  • Related