How to remove spaces between numbers but keep spaces between names in the same DataFrame column

I would like to clean a DataFrame so that cells containing only numbers have their spaces removed, while cells containing names stay unchanged.

Author
07 07 34    
08 26 20 
08 26 20 
Tata Smith
Jhon Doe
08 26 22
3409243

Here is my approach, which fails:

df.loc[df["Author"].str.isdigit(), "Author"] = df["Author"].strip()

How can I handle this?

CodePudding user response:

You might want to use regex.

import io
import re

import pandas as pd

# Create a sample dataframe
df = pd.read_csv(io.StringIO('Author\n 07 07 34    \n 08 26 20 \n 08 26 20 \n Tata Smith\n Jhon Doe\n 08 26 22\n 3409243'))

# Select the cells that contain only digits and spaces
mask = df['Author'].str.fullmatch(r'[\d ]*')
# Remove the spaces from those cells only (literal replacement, not a regex)
df.loc[mask, 'Author'] = df.loc[mask, 'Author'].str.replace(' ', '', regex=False)

# Equivalent one-liner using apply (this is what the `re` import is for)
# df['Author'] = df['Author'].apply(lambda s: s.replace(' ', '') if re.match(r'[\d ]*$', s) else s)

Output:

Author
070734
082620
082620
Tata Smith
Jhon Doe
082622
3409243
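
The same idea can be wrapped in a small helper. This is only a sketch, assuming the column holds strings (possibly with missing values); the function name `squeeze_numeric_cells` is made up for illustration. It also shows why the mask in the question does not work: `str.isdigit()` is False for a value like "07 07 34" because the spaces are not digits, and a Series has no `.strip()` method (string methods live under `.str`).

```python
import pandas as pd


def squeeze_numeric_cells(series: pd.Series) -> pd.Series:
    """Remove all whitespace from cells made up only of digits and whitespace."""
    # The pattern requires at least one digit, so blank cells are not selected.
    # na=False treats missing values as "not numeric", so they are left alone.
    mask = series.str.fullmatch(r"\s*\d[\d\s]*", na=False)
    cleaned = series.copy()
    cleaned[mask] = series[mask].str.replace(r"\s+", "", regex=True)
    return cleaned


df = pd.DataFrame(
    {"Author": ["07 07 34    ", "08 26 20 ", "Tata Smith", "Jhon Doe", "3409243", None]}
)

# Only "3409243" counts as a digit string here; the spaced values do not.
print(df["Author"].str.isdigit().tolist())

df["Author"] = squeeze_numeric_cells(df["Author"])
print(df["Author"].tolist())
```

Unlike `[\d ]*`, this pattern also treats tabs and other whitespace as removable, which may or may not be what you want.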

CodePudding user response:

How about this?

import pandas as pd

df = pd.read_csv('two.csv')

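# Remove spaces on a copy of the column
df['Author_clean'] = df['Author'].str.replace(" ", "", regex=False)

# Try conversion to numeric; values that cannot be converted become NaN
df['Author_clean'] = pd.to_numeric(df['Author_clean'], errors='coerce')

# Fill the missing values with the original strings
# (assign back instead of using inplace=True on a column selection,
# which newer pandas versions warn about)
df['Author_clean'] = df['Author_clean'].fillna(df['Author'])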

print(df.head(10))

Output:

         Author Author_clean
0  07 07 34          70734.0
1     08 26 20       82620.0
2     08 26 20       82620.0
3    Tata Smith   Tata Smith
4      Jhon Doe     Jhon Doe
5      08 26 22      82622.0
6       3409243    3409243.0
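
Note that converting to numeric drops the leading zeros and produces floats ("07 07 34" becomes 70734.0), which is not the same as the "070734" in the first answer. If the digits should stay as text, one possible variation is to use `pd.to_numeric` only to detect which cells are numeric and keep the space-free string. This is a sketch with sample data built inline instead of read from `two.csv`; be aware that `pd.to_numeric` would also accept strings such as "1e5", so it is a looser test than the regex.

```python
import pandas as pd

df = pd.DataFrame(
    {"Author": ["07 07 34    ", "08 26 20 ", "Tata Smith", "Jhon Doe", "3409243"]}
)

# Remove spaces on a copy of the column
no_spaces = df["Author"].str.replace(" ", "", regex=False)

# to_numeric is used only as a test: NaN marks the cells that are not numeric
is_numeric = pd.to_numeric(no_spaces, errors="coerce").notna()

# Keep the space-free string where numeric, the original string elsewhere
df["Author_clean"] = no_spaces.where(is_numeric, df["Author"])

print(df)
```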