I have the following dataframe:
Senior | Location |
---|---|
False | Warszawa |
True | Warszawa\n 1 |
I am trying to remove the "\n 1", which looks like a hidden character to me. At first, I tried:
df['Location']=df['Location'].str.replace('Warszawa\n 1','Warszawa')
but nothing happened.
I managed to remove those characters manually with a long chain of splits and replaces, but that is not a viable solution, because it gives me weird results in a later part of the program: although both rows of the df now show "Warszawa", they are treated as two different locations when there is only one.
What I want is this:
Senior | Location |
---|---|
False | Warszawa |
True | Warszawa |
How can I correctly remove that "\n 1"? And what character is it?
CodePudding user response:
df['Location'] = df['Location'].str.replace(r'Warszawa\n 1', 'Warszawa', regex=False)
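Note that the raw string above matches a literal backslash followed by "n", not a real newline character. If you are not sure which of the two your data contains, printing the values with repr() makes it visible. Below is a minimal sketch on made-up sample data (the three rows are an assumption, not your actual frame) that shows both cases and strips either one together with whatever follows it:

```python
import pandas as pd

# Hypothetical sample: row 1 holds a real newline, row 2 a literal backslash + "n".
df = pd.DataFrame({
    "Senior": [False, True, True],
    "Location": ["Warszawa", "Warszawa\n 1", "Warszawa\\n 1"],
})

# repr() prints escape sequences explicitly, so the two cases can be told apart.
for value in df["Location"]:
    print(repr(value))

# Match either a real newline or a literal "\n", plus everything after it.
# (?s) lets "." also match newlines, so the rest of the cell is always consumed.
pattern = r"(?s)(?:\n|\\n).*"
df["Location"] = df["Location"].str.replace(pattern, "", regex=True).str.strip()
```

After this, all three rows hold the same string, so grouping or counting by Location should see a single value.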
CodePudding user response:
When using str.replace(), the regex parameter defaults to True in pandas versions before 2.0 (from 2.0 on, the default is False). Since you just want to replace the literal string, you can either do what @Amir Py has done and pass regex=False, or use the replace() method for a literal string replacement. The regex parameter in replace() is set to False by default, and without regex it only replaces cells whose whole value equals the search string.
Code:
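# Assign the result back; inplace=True on a selected column may not update df when Copy-on-Write is active.
df['Location'] = df['Location'].replace('Warszawa\n 1', 'Warszawa')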
This can also be useful if you have similar issues in other columns of your dataframe. For more information, there is a great question and answer on Stack Overflow: str.replace v replace
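To make the difference concrete, here is a small sketch on a made-up Series (the values are assumptions for illustration only). It compares the whole-value matching of replace() with the substring matching of str.replace():

```python
import pandas as pd

# Hypothetical values: the second cell has extra text after the unwanted part.
s = pd.Series(["Warszawa\n 1", "Warszawa\n 1 (centrum)"])

# Series.replace with a plain string only swaps cells that equal it exactly.
whole = s.replace("Warszawa\n 1", "Warszawa")

# Series.str.replace with regex=False swaps the literal substring wherever it occurs.
part = s.str.replace("Warszawa\n 1", "Warszawa", regex=False)

print(whole.tolist())
print(part.tolist())
```

So replace() is fine when the entire cell is the bad value, while str.replace() is the safer choice when the unwanted text is only part of the cell.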