I have a dataframe in which some names are written with a space between every letter, for example:
import pandas as pd
people = pd.DataFrame({'id': [1,2,3,4,5],
'name': ['THOMAS BETA EDIMON','MIKHAIL LAUNDRUPP','M A R C U S','ANTONIO','R O B E R T S O N']})
How can I remove the spaces from a name when it contains more than 3 or 4 of them, so that the result looks like this?
id name
1 THOMAS BETA EDIMON
2 MIKHAIL LAUNDRUPP
3 MARCUS
4 ANTONIO
5 ROBERTSON
CodePudding user response:
Example
df = pd.DataFrame({'id': [1,2,3,4,5],
'name': ['THOMAS BETA EDIMON','MIKHAIL LAUNDRUPP','M A R C U S','ANTONIO','R O B E R T S O N']})
df
id name
0 1 THOMAS BETA EDIMON
1 2 MIKHAIL LAUNDRUPP
2 3 M A R C U S
3 4 ANTONIO
4 5 R O B E R T S O N
Code
Remove all spaces from a name if it contains more than 3 spaces.
cond1 = df['name'].str.count(' ').gt(3)
df['name'].mask(cond1, df['name'].str.replace(' ', ''))
result:
0 THOMAS BETA EDIMON
1 MIKHAIL LAUNDRUPP
2 MARCUS
3 ANTONIO
4 ROBERTSON
Name: name, dtype: object
Assign the result to the name column:
df.assign(name=df['name'].mask(cond1, df['name'].str.replace(' ', '')))
desired output:
id name
0 1 THOMAS BETA EDIMON
1 2 MIKHAIL LAUNDRUPP
2 3 MARCUS
3 4 ANTONIO
4 5 ROBERTSON
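For reference, the steps above can be put together as one self-contained script. This is only a sketch: the variable name result is introduced here for illustration, and regex=False is passed so that the space is treated as a literal string regardless of the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['THOMAS BETA EDIMON', 'MIKHAIL LAUNDRUPP', 'M A R C U S',
             'ANTONIO', 'R O B E R T S O N'],
})

# True for rows whose name contains more than 3 spaces.
cond1 = df['name'].str.count(' ').gt(3)

# mask() replaces values where cond1 is True with the space-free version.
# Both mask() and assign() return new objects, so df itself is unchanged.
result = df.assign(
    name=df['name'].mask(cond1, df['name'].str.replace(' ', '', regex=False))
)

print(result)
```

Because neither mask nor assign modifies df in place, the result has to be kept in a variable (or assigned back to df) to be used later.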
CodePudding user response:
Try applying this function to your dataframe. It checks whether every space-separated piece of the name is a single character, and if so, joins the pieces together.
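from typing import Iterator

def remove_spaces(name: str) -> str:
    # Split the name on whitespace.
    pieces: list[str] = name.split()
    it: Iterator[str] = iter(pieces)
    the_len = 1
    # Join the pieces only when every one of them is a single character.
    if all(len(piece) == the_len for piece in it):
        return "".join(pieces)
    return name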
usage:
people.name = people.name.apply(remove_spaces)
Output:
id name
0 1 THOMAS BETA EDIMON
1 2 MIKHAIL LAUNDRUPP
2 3 MARCUS
3 4 ANTONIO
4 5 ROBERTSON
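The two approaches in this thread give the same result on the question's data, but they can differ on other names. The sketch below compares them on two hypothetical names that are not from the question; samples, by_count and by_pieces are names made up for this illustration, and remove_spaces is a condensed version of the function above.

```python
import pandas as pd

def remove_spaces(name: str) -> str:
    # Join the pieces only when every piece is a single character.
    pieces = name.split()
    if all(len(piece) == 1 for piece in pieces):
        return "".join(pieces)
    return name

# Hypothetical names, chosen to show where the two approaches disagree.
samples = pd.Series(['J O E', 'JOHN PAUL VAN DER BERG'])

# Approach 1: strip spaces when a name has more than 3 of them.
by_count = samples.mask(
    samples.str.count(' ').gt(3),
    samples.str.replace(' ', '', regex=False),
)

# Approach 2: strip spaces when every piece is one character long.
by_pieces = samples.apply(remove_spaces)

print(by_count.tolist())   # ['J O E', 'JOHNPAULVANDERBERG']
print(by_pieces.tolist())  # ['JOE', 'JOHN PAUL VAN DER BERG']
```

The count-based rule leaves a short spaced-out name alone and merges a name made of five ordinary words, while the piece-length rule does the opposite, so which one fits depends on what the real data looks like.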