Removing unnecessary spaces in text

Time:05-22


Could you please shed some light on this?
The spaces are not handled properly (tests C and E) and I don't understand what is wrong.
Thanks a lot.

import pandas as pd

foo = {'testing': ['this    is test A', '  this is test B', ' this is test C ', '   this is test D', '   this is test E  ']}
foo = pd.DataFrame(foo, columns=['testing'])
print("Before:")
print(foo, "\n")
# Collapse every run of whitespace into a single space
foo.replace(r'\s+', ' ', regex=True, inplace=True)
print("After:")
print(foo)

Before:
               testing
0    this    is test A
1       this is test B
2      this is test C 
3       this is test D
4     this is test E   

After:
            testing
0    this is test A
1    this is test B
2   this is test C 
3    this is test D
4   this is test E 

CodePudding user response:

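It's probably easier to process the dictionary before constructing the dataframe. You also need to account for leading and trailing whitespace in the strings: the regex collapses each run of whitespace to a single space but never removes it, so strip each string first.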

import pandas as pd
import re

foo={'testing':['this    is test A','  this is test B',' this is test C ','   this is test D','   this is test E  ']}

foo['testing'] = [re.sub(r'\s+', ' ', s.strip()) for s in foo['testing']]

foo = pd.DataFrame(foo, columns=['testing'])

print(foo)

Output:

          testing
0  this is test A
1  this is test B
2  this is test C
3  this is test D
4  this is test E
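
To see why stripping matters, here is a minimal sketch that uses only the standard library and two of the strings from the question. The variable names `collapsed_only` and `stripped_first` are made up for illustration. Collapsing alone still leaves one space at each end, which is what happened with tests C and E.

```python
import re

samples = [' this is test C ', '   this is test E  ']

# Collapsing runs of whitespace keeps one space at each edge
collapsed_only = [re.sub(r'\s+', ' ', s) for s in samples]

# Stripping first removes the edges, then the inner runs are collapsed
stripped_first = [re.sub(r'\s+', ' ', s.strip()) for s in samples]

print(collapsed_only)   # [' this is test C ', ' this is test E ']
print(stripped_first)   # ['this is test C', 'this is test E']
```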

CodePudding user response:

# Remove leading and trailing whitespace first, then use a regex to collapse the whitespace inside the strings
foo['testing'] = foo['testing'].str.strip().str.replace(r'\s+', ' ', regex=True)
print(foo)

Output:

          testing
0  this is test A
1  this is test B
2  this is test C
3  this is test D
4  this is test E

CodePudding user response:

You can do it without regex:

foo["testing"] = foo["testing"].str.split().str.join(" ")
print(foo)

Prints:

          testing
0  this is test A
1  this is test B
2  this is test C
3  this is test D
4  this is test E
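
The same idea can be sketched in plain Python, which may make it clearer why no regex is needed. Calling `str.split()` with no arguments splits on runs of whitespace and drops empty strings at the edges, so joining the pieces with a single space strips and collapses in one step. The helper name `normalize_spaces` is hypothetical and is not part of pandas.

```python
def normalize_spaces(text: str) -> str:
    """Hypothetical helper: strip the edges and collapse inner whitespace runs."""
    # split() with no separator splits on any run of whitespace and
    # discards the empty strings produced by leading/trailing whitespace
    return " ".join(text.split())


samples = ['this    is test A', '  this is test B', ' this is test C ',
           '   this is test D', '   this is test E  ']

cleaned = [normalize_spaces(s) for s in samples]
print(cleaned)
```

Note that this treats tabs and newlines like spaces as well, which may or may not be what you want for your data.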