Unexpected result when concatenating strings of cells in (geo)pandas

Time:12-02

I am working with the data provided here: https://www.opengeodata.nrw.de/produkte/transport_verkehr/unfallatlas/

I am trying to create a concatenated string like this

import geopandas as gp
accidents2020 = gp.read_file("Unfallorte2020_LinRef.shp")
accidents2020['joined'] = f"{accidents2020['ULAND']}{accidents2020['UREGBEZ']}{accidents2020['UKREIS']}{accidents2020['UGEMEINDE']}"

However, this gives me some weird string

0         0         12\n1         12\n2         12\n3   ...
1         0         12\n1         12\n2         12\n3   ...
2         0         12\n1         12\n2         12\n3   ...
3         0         12\n1         12\n2         12\n3   ...
4         0         12\n1         12\n2         12\n3   ...

which is unexpected. When I print accidents2020['ULAND'], I get

0         12
1         12
2         12
3         12
4         12

there are no \n1. Where are the \n1 etc. coming from?

Answer:

accidents2020['ULAND'] is a Series. If you convert a Series to a string, for example by putting it in an f-string, the result also includes the index and a linefeed at the end of each line:

print(repr(f"{accidents2020.loc[0:1, 'ULAND']}"))
# '0    12\n1    12\nName: ULAND, dtype: object'

print(f"{accidents2020.loc[0:1, 'ULAND']}")
# 0    12
# 1    12
# Name: ULAND, dtype: object

"When I do a accidents2020['ULAND'] there are no \n1."

They are there. You simply see them as actual line breaks in the printed output rather than as \n escape sequences.

"Where are the \n1 etc. coming from?"

\n is the newline character that ends one row, and the 1 after it is the index label of the next row.
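
To see why every row ends up holding the same long string, here is a minimal sketch using a small plain-pandas DataFrame. The toy data and the names df and text are made up for illustration and are not taken from the shapefile. The f-string is evaluated once into a single Python str, and assigning a single value to a column copies it into every row.

```python
import pandas as pd

# Toy stand-in for the shapefile columns (hypothetical values).
df = pd.DataFrame({"ULAND": ["12", "12", "12"], "UREGBEZ": ["0", "0", "0"]})

# The f-string calls str() on each whole Series, producing one ordinary
# Python string that contains the index labels and newline characters.
text = f"{df['ULAND']}{df['UREGBEZ']}"
print(type(text))    # a plain str, not a Series
print("\n" in text)  # the newlines from the Series printout are part of it

# Assigning a single value to a column broadcasts it to every row.
df["joined"] = text
print(df["joined"].nunique())  # number of distinct values in the new column
```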


So what you need is simply accidents2020['joined'] = accidents2020['ULAND'] + accidents2020['UREGBEZ'] + accidents2020['UKREIS'] + accidents2020['UGEMEINDE'], without any f-strings. The + operator concatenates string columns element by element, row by row.

An alternative is str.cat, where you can optionally specify a separator via the sep argument: accidents2020['joined'] = accidents2020['ULAND'].str.cat(accidents2020[['UREGBEZ', 'UKREIS', 'UGEMEINDE']])
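
Both approaches can be tried without the shapefile. The following is a minimal sketch on a toy DataFrame; the values, the name df, and the helper names cols and parts are hypothetical. It assumes the columns hold strings, as the dtype: object output above suggests. The astype(str) call is only a safeguard in case they do not, since + on numeric columns would add the numbers instead of joining them.

```python
import pandas as pd

# Toy stand-in for the shapefile columns (hypothetical values, stored as strings).
df = pd.DataFrame({
    "ULAND": ["12", "12"],
    "UREGBEZ": ["0", "0"],
    "UKREIS": ["53", "54"],
    "UGEMEINDE": ["000", "000"],
})
cols = ["ULAND", "UREGBEZ", "UKREIS", "UGEMEINDE"]

# Safeguard: make sure every column is a string column before concatenating.
parts = df[cols].astype(str)

# Element-wise concatenation with +.
df["joined_plus"] = parts["ULAND"] + parts["UREGBEZ"] + parts["UKREIS"] + parts["UGEMEINDE"]

# The same result with str.cat; sep is optional.
df["joined_cat"] = parts["ULAND"].str.cat(parts[cols[1:]])
df["joined_sep"] = parts["ULAND"].str.cat(parts[cols[1:]], sep="-")

print(df[["joined_plus", "joined_cat", "joined_sep"]])
```

With geopandas the same expressions should apply unchanged, since a GeoDataFrame column selected this way is a regular pandas Series.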
