My question is very simple and I tried all possible SO answers before asking it.
I have a dataframe
{'article_journal': {0: 'lipids', 1: 'nature reviews cardiology', 2: 'clinical research
cardiology supplements', 3: 'jacc basic to translational science', 4: 'clinical research
cardiology supplements', 5: 'diabetes obesity metabolism', 6: 'journal'}, 'Title': {0:
'chemistry physics lipids', 1: 'nature reviews cardiology', 2: 'clinical research
cardiology', 3: 'jacc basic to translational science', 4: 'clinical research cardiology', 5:
'diabetes obesity metabolism', 6: 'journal'}}
There are extra white spaces in almost all strings. I checked it in Excel.
I print only those where values are identical
df[df.apply(pd.Series.nunique, axis = 1) == 1]
and receive only one row, which I created myself for testing purposes
article_journal Title
6 journal journal
How to trim white spaces in similar strings?
Expected output
article_journal Title
1 nature reviews cardiology nature reviews cardiology
3 jacc basic to translational science jacc basic to translational science
5 diabetes obesity metabolism diabetes obesity metabolism
6 journal journal
I tried all possible questions with str.strip()
, but nothing works. Columns are objects. What am I missing? Thanks!
CodePudding user response:
You can replace the double space with a single space in the entire dataframe and also trim the leading and trailing space(s) if they exist:
df = df.replace('\s ',' ', regex=True) # remove double spacing in strings
df = df.replace('^\s |\s $','', regex=True) # remove leading or trailing spaces