'TypeError: expected string or bytes-like object' while trying to replace consecutive white spaces

I have a DataFrame where every entry is a string value and a given entry may contain consecutive white spaces. For example:

import re
import pandas as pd
df = pd.DataFrame({'col1':['a--b','c  d'], 'col2':['e   f','g---h']})
print(df)

Output of print(df) (this is the initial df):

   col1   col2
0  a--b  e   f
1  c  d  g---h

I want to replace any run of consecutive white spaces with a single space in every entry of df. So in this example, 'c  d' (two consecutive white spaces) should become 'c d', and 'e   f' (three consecutive white spaces) should become 'e f'.

Approach 1: I get the correct result using df.replace, like so

# Approach 1 - works fine
df = df.replace(r'\s+', ' ', regex=True)
print(df)

Output of print(df) (this is the correct result expected):

   col1   col2
0  a--b    e f
1   c d  g---h

Approach 2: However, I get TypeError: expected string or bytes-like object while using df.transform, like so

# Approach 2 - gives TypeError
df = df.transform(lambda s: re.sub(r'\s+', ' ', s))
print(df)

Output:

...
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

Approach 3: I get ValueError: Transform function failed if I do

# Approach 3 - gives ValueError
df = df.transform(lambda s: ' '.join(s.split()))
print(df)

Output:

...
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/apply.py", line 227, in transform
    raise ValueError("Transform function failed") from err
ValueError: Transform function failed

So where am I going wrong with Approaches 2 and 3? I am asking because df.transform seems more powerful for transforming each cell in a DataFrame, and I will need it in my project for more complex transformations. Thank you!

Answer:

You need DataFrame.applymap for element-wise processing, because both functions (re.sub and str.split) work on scalars, that is, on one string at a time:

df = df.applymap(lambda s: re.sub(r'\s+', ' ', s))
print(df)
   col1   col2
0  a--b    e f
1   c d  g---h

df = df.applymap(lambda s: ' '.join(s.split()))
print(df)
   col1   col2
0  a--b    e f
1   c d  g---h
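
A note for newer pandas versions: DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, which does the same element-wise work. The sketch below is one way to stay compatible with both; the names elementwise and cleaned are only illustrative and are not part of the original answer.

```python
import re
import pandas as pd

df = pd.DataFrame({'col1': ['a--b', 'c  d'], 'col2': ['e   f', 'g---h']})

# DataFrame.map exists from pandas 2.1 onwards; older versions only have
# applymap. Pick whichever is available (illustrative helper name).
elementwise = df.map if hasattr(df, 'map') else df.applymap

# The lambda receives one cell (a str) at a time, so re.sub works as expected.
cleaned = elementwise(lambda s: re.sub(r'\s+', ' ', s))
print(cleaned)
```

Either method calls the function once per cell, which is why plain string functions work here.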

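DataFrame.transform passes each column to the function as a Series, not each cell as a string. re.sub cannot accept a Series (hence the TypeError), and a Series has no split method (the resulting error is re-raised as ValueError: Transform function failed), so both approaches failed.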
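
To see this for yourself, a small check along these lines can help. It records the type of whatever transform hands to the function, and then reproduces the TypeError by calling re.sub on a column directly. The names record_type, seen_types and error_name are made up for this illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'col1': ['a--b', 'c  d'], 'col2': ['e   f', 'g---h']})

seen_types = []

def record_type(col):
    # Note the type of the argument, then return it unchanged.
    seen_types.append(type(col).__name__)
    return col

df.transform(record_type)
print(seen_types)  # every entry is 'Series', never 'str'

# re.sub needs a str (or bytes), so passing a whole column fails.
try:
    re.sub(r'\s+', ' ', df['col1'])
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
print(error_name)
```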

You can rewrite the second solution with Series.str.split and Series.str.join so that it processes whole columns (Series):

def f(x):
    # x is a whole column (a Series), not a single string
    # print(x)  # uncomment to inspect what transform passes in
    return x.str.split().str.join(' ')

df = df.transform(f)
print(df)

   col1   col2
0  a--b    e f
1   c d  g---h
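
The regex version from Approach 2 can be made to work with transform in the same way, by using the column-level Series.str.replace instead of re.sub. This is a minimal sketch, assuming every column holds strings; the variable name cleaned is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a--b', 'c  d'], 'col2': ['e   f', 'g---h']})

# col is a Series here, so use the vectorised .str accessor.
# regex=True makes the pattern a regular expression rather than a literal.
cleaned = df.transform(lambda col: col.str.replace(r'\s+', ' ', regex=True))
print(cleaned)
```

For more complex transformations, the same rule applies: inside transform, write the function in terms of Series operations; for per-cell Python functions, use the element-wise method instead.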