How to remove excess spaces in-between words in dataframe index?

Time:01-03

The index of my df consists of company names stored as strings, e.g. Wells Fargo.

Sometimes there are excess spaces between the words, and I want to collapse them into single spaces. I tried the two attempts below, but each raised an error.

**TypeError: expected string or bytes-like object**

df.index=re.sub('  ', ' ', df.index.astype('str').str.strip())


**AttributeError: 'Index' object has no attribute 'apply'**

df.index=df.index.astype('str').str.strip().apply(lambda x: re.sub('  ', ' ', x))

Input df

                  | Revenue |
Wells   Fargo     | 1       |
  Bank of American| 3       |

Desired output

                | Revenue |
Wells Fargo     | 1       |
Bank of American| 3       |

CodePudding user response:

df.index = df.index.str.replace(r'\s+', ' ', regex=True).str.strip()

In your first attempt, you pass a pandas Index of strings to re.sub, which expects a single string (or bytes-like object), hence the TypeError.

apply would work if the company names were stored in a DataFrame column. However, as the error message says, an Index has no apply method; Index.map is the closest equivalent.
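A minimal sketch of both points, assuming a frame reconstructed from the question's sample input: the vectorised `.str` route works on the whole index at once, while `re.sub` has to be applied to one label at a time, here through `Index.map`.

```python
import re
import pandas as pd

# Assumed reconstruction of the question's input frame
df = pd.DataFrame({"Revenue": [1, 3]},
                  index=["Wells   Fargo", "  Bank of American"])

# Vectorised: collapse each run of whitespace to one space, then trim the ends
vectorised = df.index.str.replace(r"\s+", " ", regex=True).str.strip()

# Per label: Index has no .apply, but Index.map calls a function on each label
mapped = df.index.map(lambda name: re.sub(r"\s+", " ", name).strip())

df.index = vectorised
print(df)
```

Both routes should produce the same labels; the `.str` version is usually the more idiomatic choice for string indexes.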

CodePudding user response:

Use str.split() on each index string and then df.rename on the index. See each step below.

import pandas as pd 

# making your df 
d = {'index':['Wells     Fargo'], 'col1':[123], 'col2':[123]}
df = pd.DataFrame(d)
df = df.set_index('index')

# get list of index strings
index_str_list = list(df.index)

# split on any run of whitespace and rejoin with single spaces;
# this works for names with any number of words, not just two
new_list = [" ".join(s.split()) for s in index_str_list]

# change index values by mapping each old label to its cleaned version
df = df.rename(index=dict(zip(index_str_list, new_list)))

print(df)
Output:
             col1  col2
index                  
Wells Fargo   123   123
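
The split-and-rename idea also extends to names with more than two words and to leading or trailing spaces. The sketch below assumes the question's sample data; `squeeze_spaces` is a hypothetical helper name, not part of pandas. It relies on `str.split()` with no arguments splitting on runs of whitespace and on `DataFrame.rename` accepting a function for `index`.

```python
import pandas as pd

# Assumed reconstruction of the question's input frame
df = pd.DataFrame({"Revenue": [1, 3]},
                  index=["Wells   Fargo", "  Bank of American"])

def squeeze_spaces(name: str) -> str:
    """Hypothetical helper: split on whitespace runs, rejoin with single spaces."""
    return " ".join(name.split())

# rename applies the function to every index label
df = df.rename(index=squeeze_spaces)
print(df)
```

Passing a function avoids building the intermediate old-to-new mapping by hand.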