How to remove excess spaces in-between words in dataframe index?

The index of my df consists of company-name strings, e.g. Wells Fargo.

Sometimes there are excess spaces between the words, and I want to collapse them into single spaces. I tried the two attempts below, but each raised an error.

**TypeError: expected string or bytes-like object**

df.index=re.sub('  ', ' ', df.index.astype('str').str.strip())


**AttributeError: 'Index' object has no attribute 'apply'**

df.index=df.index.astype('str').str.strip().apply(lambda x: re.sub('  ', ' ', x))

Input df

                  | Revenue |
Wells   Fargo     | 1       |
  Bank of American| 3       |

Desired output

                | Revenue |
Wells Fargo     | 1       |
Bank of American| 3       |

CodePudding user response：

df.index = df.index.str.replace(r'\s+', ' ', regex=True).str.strip()
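As a quick check, here is a minimal sketch that applies a `\s+` replacement (one or more whitespace characters) to a frame modeled on the input in the question. The Revenue values are copied from the question; everything else is only illustrative.

```python
import pandas as pd

# Sample frame modeled on the question's input.
df = pd.DataFrame(
    {"Revenue": [1, 3]},
    index=["Wells   Fargo", "  Bank of American"],
)

# Collapse each run of whitespace to a single space, then trim both ends.
df.index = df.index.str.replace(r"\s+", " ", regex=True).str.strip()

print(df.index.tolist())
```

Doing the strip last also removes the single leading space that the replacement leaves on labels that started with whitespace.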

In your first attempt, you pass a pandas Index of strings to re.sub, which expects a single string.

apply would work if the company names were stored in a DataFrame column. However, as the error message says, Index has no apply method; the .str accessor or Index.map does the same job on an index.
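If you would rather keep re.sub, a rough equivalent of the apply attempt is Index.map, which calls a function on each label. This is only a sketch on made-up sample data, and the lambda is illustrative, not the only way to write it.

```python
import re

import pandas as pd

# Sample frame modeled on the question's input.
df = pd.DataFrame(
    {"Revenue": [1, 3]},
    index=["Wells   Fargo", "  Bank of American"],
)

# Index has no .apply, but Index.map applies a function to every label.
df.index = df.index.map(lambda name: re.sub(r"\s+", " ", str(name)).strip())

print(df.index.tolist())
```

The .str version is usually shorter, but map is handy when the cleanup logic does not fit in a single regex.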

CodePudding user response：

Use str.split() on each index string, rejoin the pieces with single spaces, and then pass the old-to-new mapping to df.rename. See each step below.

import pandas as pd

# build a sample df with extra spaces in the index label
d = {'index': ['Wells     Fargo'], 'col1': [123], 'col2': [123]}
df = pd.DataFrame(d)
df = df.set_index('index')

# get list of index strings
index_str_list = list(df.index)

# split on any run of whitespace and rejoin with a single space
# (works for names with any number of words, not just two)
new_list = []
for i in index_str_list:
    s = " ".join(i.split())
    new_list.append(s)

# change index values with one rename call
df = df.rename(index=dict(zip(index_str_list, new_list)))

print(df)

Output:
             col1  col2
index                  
Wells Fargo   123   123
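
The same split-and-rejoin idea can probably be written more compactly, since df.rename also accepts a function for the index. In the sketch below, normalize_spaces is a hypothetical helper name and the second row is made-up data to show a name with more than two words.

```python
import pandas as pd

# Made-up sample data; the second label has leading, trailing and inner runs of spaces.
df = pd.DataFrame(
    {"col1": [123, 456]},
    index=["Wells     Fargo", "  Bank   of  America "],
)


def normalize_spaces(label):
    """Split on any whitespace run and rejoin with single spaces."""
    return " ".join(str(label).split())


# rename calls the function once per index label and returns a new frame.
df = df.rename(index=normalize_spaces)

print(df.index.tolist())
```

Because split() with no arguments drops leading and trailing whitespace as well, no separate strip is needed here.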