Delete spaces based on conditions

I have a dataframe in which some names are written with a space between every letter, for example:

import pandas as pd

people = pd.DataFrame({'id': [1,2,3,4,5],
                      'name': ['THOMAS BETA EDIMON','MIKHAIL LAUNDRUPP','M A R C U S','ANTONIO','R O B E R T S O N']})

How can I remove the spaces from a name only when it contains more than 3 or 4 spaces, so that the result looks like this?

id name
1  THOMAS BETA EDIMON
2  MIKHAIL LAUNDRUPP
3  MARCUS
4  ANTONIO
5  ROBERTSON

CodePudding user response:

Example

df = pd.DataFrame({'id': [1,2,3,4,5],
                      'name': ['THOMAS BETA EDIMON','MIKHAIL LAUNDRUPP','M A R C U S','ANTONIO','R O B E R T S O N']})

df

    id  name
0   1   THOMAS BETA EDIMON
1   2   MIKHAIL LAUNDRUPP
2   3   M A R C U S
3   4   ANTONIO
4   5   R O B E R T S O N

Code

Remove all spaces from a name if it contains more than 3 spaces.

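# True for rows whose name contains more than 3 spaces
cond1 = df['name'].str.count(' ').gt(3)
# Where cond1 is True, use the name with every space stripped; otherwise keep it
df['name'].mask(cond1, df['name'].str.replace(' ', ''))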

result:

0    THOMAS BETA EDIMON
1     MIKHAIL LAUNDRUPP
2                MARCUS
3               ANTONIO
4             ROBERTSON
Name: name, dtype: object

Assign the result back to the name column:

df.assign(name=df['name'].mask(cond1, df['name'].str.replace(' ', '')))

desired output:

    id  name
0   1   THOMAS BETA EDIMON
1   2   MIKHAIL LAUNDRUPP
2   3   MARCUS
3   4   ANTONIO
4   5   ROBERTSON
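
Since the question leaves the cutoff open ("more than 3 or 4 spaces"), the same mask-based idea can be wrapped in a small helper with the threshold as a parameter. This is only a sketch: the function name collapse_spaced_names and the max_spaces parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def collapse_spaced_names(names: pd.Series, max_spaces: int = 3) -> pd.Series:
    """Strip every space from entries containing more than max_spaces spaces.

    Hypothetical helper; the name and the default threshold are assumptions.
    """
    # Boolean Series: True where the entry has too many spaces
    too_many = names.str.count(" ").gt(max_spaces)
    # Replace only the flagged entries with their space-free version
    return names.mask(too_many, names.str.replace(" ", "", regex=False))


df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["THOMAS BETA EDIMON", "MIKHAIL LAUNDRUPP", "M A R C U S",
             "ANTONIO", "R O B E R T S O N"],
})
df["name"] = collapse_spaced_names(df["name"])
print(df["name"].tolist())
```

Passing max_spaces=4 instead would apply the stricter of the two cutoffs mentioned in the question.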

CodePudding user response:

Try applying this function to the name column. It checks whether every whitespace-separated piece of the name is a single character and, if so, joins the pieces together.

def remove_spaces(name: str) -> str:
    # Split on whitespace; a spaced-out name yields only one-character pieces
    pieces = name.split()
    if all(len(piece) == 1 for piece in pieces):
        return "".join(pieces)
    # At least one piece is a real word, so leave the name untouched
    return name

usage:

people.name = people.name.apply(remove_spaces)

Output:

   id                name
0   1  THOMAS BETA EDIMON
1   2   MIKHAIL LAUNDRUPP
2   3              MARCUS
3   4             ANTONIO
4   5           ROBERTSON
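
The two answers agree on the sample data, but they implement different rules: one counts spaces, the other requires every piece to be a single letter. The sketch below runs both on a few hypothetical names (not from the question) where the rules disagree, which may help in deciding which one fits the real data.

```python
import pandas as pd


def remove_spaces(name: str) -> str:
    # Join the pieces only when every piece is a single character
    pieces = name.split()
    if all(len(piece) == 1 for piece in pieces):
        return "".join(pieces)
    return name


# Hypothetical names chosen to show where the two rules differ
sample = pd.Series(["M A R C U S", "A B", "JOHN P A B SMITH"])

# Rule 1: strip all spaces when a name has more than 3 of them
by_count = sample.mask(sample.str.count(" ").gt(3),
                       sample.str.replace(" ", "", regex=False))
# Rule 2: strip spaces only when every piece is one character long
by_pieces = sample.apply(remove_spaces)

print(by_count.tolist())   # ['MARCUS', 'A B', 'JOHNPABSMITH']
print(by_pieces.tolist())  # ['MARCUS', 'AB', 'JOHN P A B SMITH']
```

The count-based rule merges a name that mixes full words with several initials, while the piece-based rule also collapses short names such as "A B" that fall under the space threshold.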