At the moment, I am setting my whole "name" column to lowercase with this line:
df["name"] = df["name"].str.lower()
My problem, however, is that I only want to apply str.lower() to the cells that do not contain the string "Foo".
I tried the following, but it doesn't work:
df["name"] = df["name"](lambda x: str(x) if "Foo" in str(x) else str.lower(x))
TypeError: 'Series' object is not callable
Answer:
Use .loc:
df.loc[~df['name'].str.contains('Foo'), 'name'] = df.loc[~df['name'].str.contains('Foo'), 'name'].str.lower()
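A self-contained sketch of the .loc approach on a small, made-up DataFrame. It assumes the column may contain missing values, so it passes na=False to str.contains to treat missing names as "no Foo". Without that, the mask would hold NaN for those rows, which generally cannot be inverted with ~ or used for boolean indexing.

```python
import pandas as pd

# Made-up sample data; None stands in for a missing name
df = pd.DataFrame({"name": ["Foo Bar", "ALICE", None, "FooFighters", "Bob FOO"]})

# True where the name contains "Foo" (case-sensitive); missing values count as False
has_foo = df["name"].str.contains("Foo", na=False)

# Lowercase only the rows that do NOT contain "Foo"; missing values stay missing
df.loc[~has_foo, "name"] = df.loc[~has_foo, "name"].str.lower()

print(df["name"].tolist())
```

"Bob FOO" gets lowercased here because str.contains is case-sensitive by default, so "FOO" does not count as containing "Foo".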
Answer:
The np.where function also works here:
import numpy as np
df['name'] = np.where(df['name'].str.contains('Foo'), df['name'], df['name'].str.lower())
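A small runnable version of the np.where approach on made-up data. np.where(condition, x, y) picks element-wise from x where the condition is True and from y elsewhere, so names containing "Foo" are kept and the rest are taken from the lowercased column. Note that both branches are computed in full (every name is lowercased first), and the result is a NumPy array that pandas turns back into a column on assignment.

```python
import numpy as np
import pandas as pd

# Made-up sample data
df = pd.DataFrame({"name": ["Foo Bar", "ALICE", "Bob", "FooFighters"]})

# Keep the original name where "Foo" appears, otherwise use the lowercased name
df["name"] = np.where(df["name"].str.contains("Foo"), df["name"], df["name"].str.lower())

print(df["name"].tolist())  # ['Foo Bar', 'alice', 'bob', 'FooFighters']
```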
Answer:
Your solution is missing Series.apply:
df["name"] = df["name"].apply(lambda x: str(x) if "Foo" in str(x) else str.lower(x))
Or use Series.str.contains and lowercase only the filtered rows, using ~ to invert the mask (i.e. to select the rows that do NOT match):
m = df["name"].str.contains('Foo')
df.loc[~m, "name"] = df.loc[~m, "name"].str.lower()
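Series.str.contains treats its pattern as a regular expression by default. For a plain substring like "Foo" that makes no difference, but if the text you match on contains regex metacharacters (., +, ( and so on), passing regex=False makes it a literal substring test. The case parameter controls case sensitivity. A sketch on made-up data comparing the two:

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({"name": ["Foo Bar", "ALICE", "bob FOO", "FooFighters"]})

# Case-sensitive literal match: "bob FOO" does not count as containing "Foo"
m = df["name"].str.contains("Foo", regex=False)
exact = df.copy()
exact.loc[~m, "name"] = exact.loc[~m, "name"].str.lower()

# Case-insensitive literal match: "bob FOO" now counts as a "Foo" row and is left alone
m_ci = df["name"].str.contains("Foo", case=False, regex=False)
loose = df.copy()
loose.loc[~m_ci, "name"] = loose.loc[~m_ci, "name"].str.lower()

print(exact["name"].tolist())  # ['Foo Bar', 'alice', 'bob foo', 'FooFighters']
print(loose["name"].tolist())  # ['Foo Bar', 'alice', 'bob FOO', 'FooFighters']
```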
Answer:
df[~df["name"].str.contains("Foo")]['name'].str.lower()
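The expression df[~df["name"].str.contains("Foo")] returns the rows of the DataFrame whose name column does not contain Foo. .str.lower() then lowercases the name column of those remaining rows. Note that the result is a new Series; df itself is not modified.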
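Because that expression only produces a filtered copy, one way to write the values back (a sketch on made-up data) is to assign them by row label with .loc, using the index the filtered Series kept:

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({"name": ["Foo Bar", "ALICE", "Bob"]})

# New Series holding only the lowercased non-"Foo" names,
# indexed by the surviving row labels (1 and 2 here); df is unchanged at this point
lowered = df[~df["name"].str.contains("Foo")]["name"].str.lower()

# Write the lowercased values back into df at those same row labels
df.loc[lowered.index, "name"] = lowered

print(df["name"].tolist())  # ['Foo Bar', 'alice', 'bob']
```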
Answer:
A list comprehension also works:
df['name'] = [str.lower(x) if 'Foo' not in x else x for x in df['name']]
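The comprehension assumes every value is a string; a missing value (NaN) would make the "Foo" test or the lowercasing raise a TypeError or AttributeError. A hedged variant on made-up data that leaves non-string values untouched and uses a substring test, so any name containing Foo is kept as-is:

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value
df = pd.DataFrame({"name": ["Foo Bar", "ALICE", np.nan, "FooFighters"]})

# Leave non-strings (e.g. NaN) and names containing "Foo" unchanged;
# lowercase everything else
df["name"] = [
    x if not isinstance(x, str) or "Foo" in x else x.lower()
    for x in df["name"]
]

print(df["name"].tolist())
```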
I roughly timed the other methods on a list of 300 names (100 of which are "Foo"), running each method 1000 times. (I know there are better timing methods; I just used time.time() around the for loops.) These are the timings, in seconds, that I found:
list-comprehension : 0.049
.apply : 0.179
np.where : 0.349
.loc : 0.880
Some of these methods may perform better on data sets much bigger than 300 rows, so I understand that this is not a rigorous test. Others may be able to say more.
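For a somewhat more careful comparison, Python's timeit module can repeat each measurement and keep the best batch, and the same code can be rerun on a larger frame to see whether the ranking changes with size. The sketch below assumes made-up names (300 rows, 100 of them "Foo", plus a 300-times-larger copy) and substring-based versions of each method. It copies the frame before every call so each run starts from the same input; that copy cost is included equally for every method. It first checks that all four methods agree, then prints per-call timings, which will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Made-up data shaped like the test above: 300 names, 100 of which are "Foo"
names = ["Foo"] * 100 + [f"NAME{i}" for i in range(200)]
base = pd.DataFrame({"name": names})


def with_list_comp(df):
    df["name"] = [x if "Foo" in x else x.lower() for x in df["name"]]


def with_apply(df):
    df["name"] = df["name"].apply(lambda x: x if "Foo" in x else x.lower())


def with_where(df):
    df["name"] = np.where(df["name"].str.contains("Foo"), df["name"], df["name"].str.lower())


def with_loc(df):
    m = df["name"].str.contains("Foo")
    df.loc[~m, "name"] = df.loc[~m, "name"].str.lower()


methods = [with_list_comp, with_apply, with_where, with_loc]

# Sanity check: every method should produce the same column
outputs = []
for func in methods:
    d = base.copy()
    func(d)
    outputs.append(d["name"].tolist())
assert all(out == outputs[0] for out in outputs)

results = {}
for label, factor, number in [("300 rows", 1, 200), ("90,000 rows", 300, 3)]:
    data = pd.concat([base] * factor, ignore_index=True)
    for func in methods:
        # Each call gets a fresh copy so earlier runs don't alter the input
        times = timeit.repeat(lambda: func(data.copy()), number=number, repeat=3)
        per_call = min(times) / number  # best of 3 batches, seconds per call
        results[(label, func.__name__)] = per_call
        print(f"{label:>12}  {func.__name__:<15} {per_call * 1e3:8.3f} ms")
```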