Hi there stack overflow community,
I have the following dataframe in an excel:
sparte sparten status stati gesellschaft gesellschaften
10 Krankenvoll B beantragt 0 - Allgemein -
11 Reisekranken A aktiv 10000 nordinvest
12 Krankenkasse N beitragsfrei M552D SV SparkassenVersicherung
and the folliwing column for merging in a csv:
sparten status gesellschaft
10 B 0
11 A 10000
12 N M552D
to merge some columns from an excel and a csv file I'm using the following code:
df1 = pd.read_csv(r'path', sep=',').drop(columns = ['risiko'])
df2 = pd.read_excel(r'path')
df3 = pd.merge(df1,df2[['status','stati']],on='status', how='left').drop(columns = ['status'])
df4 = df3.merge(df2[['sparte','sparten']],on='sparte', how='left').drop(columns = ['sparte'])
It works fine for me, but now i want to me merge the following column:
df4 = df3.merge(df2[['gesellschaft','gesellschaften']],on='gesellschaft', how='left')
print(df4)
...and it does not work. It merges only the cells with this format M552D
, but leaves the cells with numbers untouched. I don't understand what I'm doing wrong. If I try to put how='right'
the merge works, but the other columns disappear.
Maybe someone has an idea what is happening here! Thanks for any hint!
CodePudding user response:
The problem is that the geselschaft
column contains only strings in df1
which is loaded with read_csv
, because the column is not fully numeric. But in df2
which is loaded with read_excel
, it contains a mix of int and string values. And at Pandas level and int and a string cannot be equal.
A possible workaround is to force a string conversion at merge time:
df4 = df3.merge(df2[['gesellschaft','gesellschaften']], left_on='gesellschaft',
right_on = df2['gesellschaft'].astype('str'), how='left')