I have 2 dataframes:
df1=pd.DataFrame({'name': ['123456789', '5486544546', '65464564', '478615454']})
df2=pd.DataFrame({'num': ['12345', '54865', '5464','7861']})
My question how modify the following function
df2['yes/no'] = df2['num'].apply(lambda x: df1['name'][df1['name'].astype(str).str.contains(str(x))].values[0] if len(df1['name'][df1['name'].astype(str).str.contains(str(x))].values)> 0 else 0)
the value of column1 in df1 contains in column1 in df2 starting strictly from the left (that function works in any variations)
Desirable result:
name | yes/no |
---|---|
12345 | 123456789 |
54865 | 5486544546 |
5464 | 0 |
7861 | 0 |
Wrong result:
num | yes/no |
---|---|
12345 | 123456789 |
54865 | 5486544546 |
5464 | 65464564 |
7861 | 478615454 |
CodePudding user response:
We combine (concat
) two frames to get a temporary frame with two columns, which we then process line by line, checking if one text begins with another. If yes, return the value from the column, if not - 0
df1 = pd.DataFrame({'name': ['123456789', '5486544546', '65464564', '478615454']})
df2 = pd.DataFrame({'num': ['12345', '54865', '5464', '7861']})
df2['yes/no'] = pd.concat([df2, df1], ignore_index=True, axis=1).apply(lambda x: x.iat[1] if x.iat[1].startswith(x.iat[0]) else 0, axis=1)
print(df2)
num yes/no
0 12345 123456789
1 54865 5486544546
2 5464 0
3 7861 0
CodePudding user response:
if i get your point you need that :
l=[]
for i in range(len(df1)):
if str(df1['name'][i]).startswith(str(df2['num'][i])):
l.append(df1['name'][i])
else:l.append(0)
df2['yes/no']=l
out:
num yes/no
0 12345 123456789
1 54865 5486544546
2 5464 0
3 7861 0
CodePudding user response:
Since you want to match in the starting you need to use startswith function.
Code:
#df1=pd.DataFrame({'name': ['123456789', '5486544546', '65464564', '478615454']})
df1=pd.DataFrame({'name': ['123456789', '5486544546', '65464564', '123456789']})
df2=pd.DataFrame({'num': ['12345', '54865', '5464','7861']})
df2["yes/no"] = df2['num'].apply(lambda x: (df1['name'][df1['name'].astype(str).str.startswith(x)].to_list() or [0])[0])
Explanation:
df1['name'][df1['name'].astype(str).str.startswith(x)].to_list()
will create a list of value where df1 start with df2 value.
(df1['name'].astype(str).str.startswith(x)].to_list() or [0])[0]
if no values got we return a list with 0 i.e. [0] then retrieve the first element. Which will be the value if it matches or 0
Output:
0 123456789
1 5486544546
2 0
3 0
Name: yes/no, dtype: object