I am working the data provided here https://www.opengeodata.nrw.de/produkte/transport_verkehr/unfallatlas/
I am trying to create a concatenated string like this
import geopandas as gp
accidents2020 = gp.read_file("Unfallorte2020_LinRef.shp")
accidents2020['joined'] = f"{accidents2020['ULAND']}{accidents2020['UREGBEZ']}{accidents2020['UKREIS']}{accidents2020['UGEMEINDE']}"
However, this gives every row the same weird string:
0 0 12\n1 12\n2 12\n3 ...
1 0 12\n1 12\n2 12\n3 ...
2 0 12\n1 12\n2 12\n3 ...
3 0 12\n1 12\n2 12\n3 ...
4 0 12\n1 12\n2 12\n3 ...
which is unexpected. When I just look at accidents2020['ULAND'], I get
0 12
1 12
2 12
3 12
4 12
there are no \n1. Where are the \n1 etc. coming from?
Answer:
accidents2020['ULAND'] is a Series. If you convert this Series to a string (which is what an f-string does), the result also includes the index, a linefeed at the end of each line, and a footer with the name and dtype:
print(repr(f"{accidents2020.loc[0:1, 'ULAND']}"))
# '0 12\n1 12\nName: ULAND, dtype: object'
print(f"{accidents2020.loc[0:1, 'ULAND']}")
# 0 12
# 1 12
# Name: ULAND, dtype: object
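The same effect can be reproduced without the shapefile. This is a minimal sketch that assumes a small, made-up DataFrame standing in for the real columns; it shows that formatting a Series in an f-string yields the same text as str() on it, and that every line after a "\n" begins with a row index.

```python
import pandas as pd

# Hypothetical stand-in for two columns of the shapefile (values are made up)
df = pd.DataFrame({"ULAND": ["12", "12"], "UREGBEZ": ["0", "0"]})

# An f-string with an empty format spec falls back to str() for a Series,
# so the whole multi-line table ends up in the string
text = f"{df['ULAND']}"
assert text == str(df["ULAND"])

# Splitting on "\n" shows each line: "<index> <value>", then the footer
for line in text.split("\n"):
    print(repr(line))
```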
"When I do accidents2020['ULAND'] there are no \n1."
No, they are there; you simply don't see them as \n escape sequences but as actual line breaks in the output.
"Where are the \n1 etc. coming from?"
\n is the newline character and 1 is the row index.
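So what you need is simply element-wise concatenation of the Series with +, without any f-strings:
accidents2020['joined'] = accidents2020['ULAND'] + accidents2020['UREGBEZ'] + accidents2020['UKREIS'] + accidents2020['UGEMEINDE']
This assumes the columns hold strings (dtype object, as in the output above); numeric columns would be added arithmetically instead, so convert them with .astype(str) first.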
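A minimal runnable sketch of the + approach, assuming a few hypothetical rows in place of the real shapefile data. The .astype(str) calls are a defensive addition for the case where a column was read as numbers; they are a no-op for columns that already hold strings.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from Unfallorte2020_LinRef.shp
df = pd.DataFrame({
    "ULAND": ["12", "12"],
    "UREGBEZ": ["0", "0"],
    "UKREIS": ["60", "61"],
    "UGEMEINDE": ["005", "020"],
})

# + on string Series concatenates element-wise, row by row.
# astype(str) only matters if a column is numeric; note that leading
# zeros would already be lost if a code column had been read as int.
df["joined"] = (
    df["ULAND"].astype(str)
    + df["UREGBEZ"].astype(str)
    + df["UKREIS"].astype(str)
    + df["UGEMEINDE"].astype(str)
)
print(df["joined"].tolist())
```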
An alternative is Series.str.cat, where you can optionally specify a separator with sep:
accidents2020['joined'] = accidents2020['ULAND'].str.cat(accidents2020[['UREGBEZ', 'UKREIS', 'UGEMEINDE']])
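A small sketch of str.cat with and without a separator, again on hypothetical sample rows. Passing a DataFrame as others joins its columns row by row onto the calling Series; the "-" separator is only an illustrative choice.

```python
import pandas as pd

# Hypothetical sample rows standing in for the shapefile columns
df = pd.DataFrame({
    "ULAND": ["12", "12"],
    "UREGBEZ": ["0", "0"],
    "UKREIS": ["60", "61"],
    "UGEMEINDE": ["005", "020"],
})
others = df[["UREGBEZ", "UKREIS", "UGEMEINDE"]]

# Without sep the pieces are glued together directly
plain = df["ULAND"].str.cat(others)

# With sep the separator is inserted between every pair of pieces
dashed = df["ULAND"].str.cat(others, sep="-")

print(plain.tolist())
print(dashed.tolist())
```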