Could you please shed some light on this?
The spaces are not handled properly (tests C and E), and I don't understand what is wrong.
Thanks a lot.
import pandas as pd
foo={'testing':['this is test A',' this is test B',' this is test C ',' this is test D',' this is test E ']}
foo=pd.DataFrame(foo,columns=['testing'])
print("Before:")
print(foo,"\n")
foo.replace(r'\s+', ' ', regex=True, inplace=True)
print("After:")
print(foo)
Before:
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
After:
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
CodePudding user response:
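It's probably easier to process the dictionary before constructing the dataframe. You also need to account for leading and trailing spaces in any of the strings, since a regex that collapses whitespace leaves one space at each end instead of removing it.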
import pandas as pd
import re
foo={'testing':['this is test A',' this is test B',' this is test C ',' this is test D',' this is test E ']}
foo['testing'] = [re.sub(r'\s+', ' ', s.strip()) for s in foo['testing']]
foo = pd.DataFrame(foo, columns=['testing'])
print(foo)
Output:
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
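To see why stripping matters, here is a minimal sketch on a single string. The `sample` value is a made-up input with extra spaces (not taken from the question's data), and the variable names are only for illustration.

```python
import re

# Hypothetical sample with leading, trailing, and repeated inner spaces
sample = "  this  is   test C  "

# Collapsing alone turns every whitespace run into one space, including the ends
collapsed_only = re.sub(r"\s+", " ", sample)

# Stripping first removes the ends, so only the inner runs are left to collapse
stripped_then_collapsed = re.sub(r"\s+", " ", sample.strip())

print(repr(collapsed_only))           # ' this is test C '
print(repr(stripped_then_collapsed))  # 'this is test C'
```

The first result still has one space at each end, which matches the behaviour described for tests C and E.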
CodePudding user response:
# strip leading and trailing whitespace first, then collapse each run of whitespace inside the strings to a single space
foo['testing'] = foo['testing'].str.strip().str.replace(r'\s+', ' ', regex=True)
print(foo)
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
CodePudding user response:
You can do it without regex:
foo["testing"] = foo["testing"].str.split().str.join(" ")
print(foo)
Prints:
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
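As a rough check that the split/join idea and the strip-plus-regex idea agree, here is a small sketch in plain Python. The helper names `normalize_regex` and `normalize_split` and the `samples` list are hypothetical, chosen just for this comparison.

```python
import re

def normalize_regex(text):
    # Strip the ends, then collapse each inner whitespace run to one space
    return re.sub(r"\s+", " ", text.strip())

def normalize_split(text):
    # str.split() with no arguments splits on whitespace runs and drops
    # empty pieces at the ends, so joining with " " does the same job
    return " ".join(text.split())

# Hypothetical inputs with assorted extra whitespace
samples = ["this is test A", "  this is test B", " this  is test C  ", "\tthis is   test D\n"]

regex_results = [normalize_regex(s) for s in samples]
split_results = [normalize_split(s) for s in samples]

print(regex_results == split_results)  # True
print(split_results)
```

Either helper could presumably be applied to a column with `foo["testing"].map(...)`, though the `.str` accessor versions above are likely the more idiomatic choice in pandas.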