I am trying to reverse engineer an error message.
In the code below I merge 2 dataframes.
import pandas as pd
data = pd.DataFrame({
'PB website': ["http://www.ghi.de", "http://www.jkl.de", "http://www.def.de", "http://www.abc.de", "http://www.xyz.de"],
'PB match': [21, 22, 23, 24, 25],
'PB location': ["Süd 4", "Süd 2", "Süd 5", "Süd 3", "Süd 8"],
'PB country': ['Deutschland', 'Deutschland', 'Deutschland', 'Deutschland', 'Deutschland'],
})
processed_urls = ['http://www.abc.de', 'http://www.def.de', 'http://www.ghi.de', 'http://www.xyz.de', 'http://www.jkl.de']
flags = [False, True, True, False, True]
processed = pd.merge(left=data.loc[data['PB website'].isin(processed_urls)],
right=pd.DataFrame({'url': processed_urls, 'verlinkt': flags}),
left_on='PB website', right_on='url', how='right')
processed
The result looks like this:
          PB website  PB match PB location   PB country                url  verlinkt
0  http://www.abc.de        24       Süd 3  Deutschland  http://www.abc.de     False
1  http://www.def.de        23       Süd 5  Deutschland  http://www.def.de      True
2  http://www.ghi.de        21       Süd 4  Deutschland  http://www.ghi.de      True
3  http://www.xyz.de        25       Süd 8  Deutschland  http://www.xyz.de     False
4  http://www.jkl.de        22       Süd 2  Deutschland  http://www.jkl.de      True
Now I want to change the code in a way that I get the following error message:
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
I know that, in order to do so, PB website and url have to have different data types. But for some reason I cannot generate the ValueError mentioned above.
I am using pandas version 1.4.3.
CodePudding user response:
With the data as given, this is impossible: neither PB website nor url can be represented as floats, except as a column of nothing but NaNs. To force that case, you can coerce url to numeric:
processed = pd.merge(left=data.loc[data['PB website'].isin(processed_urls)],
right=pd.DataFrame({'url': processed_urls, 'verlinkt': flags}).assign(url=lambda x: pd.to_numeric(x.url, errors='coerce')),
left_on='PB website', right_on='url', how='right')
which throws a ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat.
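To see why the coercion triggers the error, it can help to look at the key dtypes right before the merge. The following is only a minimal sketch on a two-row subset of the data; the names left, right and error_message are made up for illustration, and the explicit dtype=object is an assumption added so that the key columns have the same dtype the question's pandas 1.4.3 would give them by default.

```python
import pandas as pd

urls = ["http://www.abc.de", "http://www.def.de"]

# Left key: strings stored with object dtype.
left = pd.DataFrame({"PB website": pd.Series(urls, dtype=object)})

# Right key: the same strings, then coerced to numeric.
# None of them parse as numbers, so every value becomes NaN (float64).
right = pd.DataFrame({"url": pd.Series(urls, dtype=object),
                      "verlinkt": [False, True]})
right["url"] = pd.to_numeric(right["url"], errors="coerce")

print(left["PB website"].dtype)  # dtype of the left merge key
print(right["url"].dtype)        # dtype of the right merge key

# Merging a string key against a float key is rejected by pandas.
try:
    pd.merge(left, right, left_on="PB website", right_on="url", how="right")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message may differ slightly between pandas versions, so it is safer to check for the ValueError itself than to compare the full text.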