How to get a ValueError from merge with pandas dataframes

Time:07-06

I am trying to reverse engineer an error message.

In the code below I merge 2 dataframes.

import pandas as pd

data = pd.DataFrame({
     'PB website': ["http://www.ghi.de", "http://www.jkl.de", "http://www.def.de", "http://www.abc.de", "http://www.xyz.de"],
     'PB match': [21, 22, 23, 24, 25],
     'PB location': ["Süd 4", "Süd 2", "Süd 5", "Süd 3", "Süd 8"],
     'PB country': ['Deutschland', 'Deutschland', 'Deutschland', 'Deutschland', 'Deutschland'],
     })

processed_urls = ['http://www.abc.de', 'http://www.def.de', 'http://www.ghi.de', 'http://www.xyz.de', 'http://www.jkl.de']
flags = [False, True, True, False, True]

processed = pd.merge(left=data.loc[data['PB website'].isin(processed_urls)],
                    right=pd.DataFrame({'url': processed_urls, 'verlinkt': flags}),
                    left_on='PB website', right_on='url', how='right')

processed

The result looks like this:

          PB website  PB match  PB location   PB country                url  verlinkt
0  http://www.abc.de        24        Süd 3  Deutschland  http://www.abc.de     False
1  http://www.def.de        23        Süd 5  Deutschland  http://www.def.de      True
2  http://www.ghi.de        21        Süd 4  Deutschland  http://www.ghi.de      True
3  http://www.xyz.de        25        Süd 8  Deutschland  http://www.xyz.de     False
4  http://www.jkl.de        22        Süd 2  Deutschland  http://www.jkl.de      True

Now I want to change the code so that it raises the following error:

ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat

I know that, for this to happen, PB website and url must have different dtypes. But for some reason I cannot trigger the ValueError mentioned above.

I am using pandas version 1.4.3

CodePudding user response:

With the data as given this is impossible: neither PB website nor url can be represented as floats, except as a column of NaNs. If an all-NaN key column is acceptable, you can coerce url with pd.to_numeric:

processed = pd.merge(left=data.loc[data['PB website'].isin(processed_urls)],
                    right=pd.DataFrame({'url': processed_urls, 'verlinkt': flags}).assign(url=lambda x: pd.to_numeric(x.url, errors='coerce')),
                    left_on='PB website', right_on='url', how='right')

which raises ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat.
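
To see why the merge fails, it can help to look at the dtype of the coerced key column and catch the exception explicitly. The following is only a minimal sketch on a shortened version of the data; the names left, right, right_numeric and error_message are made up for illustration, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Shortened, illustrative versions of the two frames from the question.
left = pd.DataFrame({
    'PB website': ['http://www.abc.de', 'http://www.def.de'],
    'PB match': [24, 23],
})
right = pd.DataFrame({
    'url': ['http://www.abc.de', 'http://www.def.de'],
    'verlinkt': [False, True],
})

# URL strings cannot be parsed as numbers, so errors='coerce' turns every
# value into NaN and the column ends up with a float dtype.
right_numeric = right.assign(url=pd.to_numeric(right['url'], errors='coerce'))
right_dtype = str(right_numeric['url'].dtype)
print(right_dtype)

# Merging a string key against a float key is rejected by pandas.
try:
    pd.merge(left, right_numeric,
             left_on='PB website', right_on='url', how='right')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Note that the key column on the right contains only NaN after the coercion, so this is useful for reproducing the error message, not for producing a meaningful join.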
