Python Pandas: How to only str.lower() Rows or Cells that contain a certain string

Time:09-23

At the moment I am setting my whole "name" column to lowercase with this line:

df["name"] = df["name"].str.lower()

However, I only want to apply str.lower() to the cells that do not contain the string "Foo".

I tried the following, but it doesn't work:

df["name"] = df["name"](lambda x: str(x) if "Foo" in str(x) else str.lower(x))

TypeError: 'Series' object is not callable

CodePudding user response:

Use .loc:

# Parentheses let the assignment span two lines without a syntax error
df.loc[~df['name'].str.contains('Foo'), 'name'] = (
    df.loc[~df['name'].str.contains('Foo'), 'name'].str.lower())

CodePudding user response:

The np.where function also works here:

import numpy as np
df['name'] = np.where(df['name'].str.contains('Foo'),df['name'],df['name'].str.lower())
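As a small worked example of the np.where approach, the sketch below runs it on a made-up frame (the sample names are hypothetical, not from the question). It passes regex=False so that "Foo" is matched as a literal substring, which is an assumption about what the question intends.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({"name": ["Foo Bar", "ALICE", "Bob FOO", "xFoox"]})

# True where the name contains the literal, case-sensitive substring "Foo".
has_foo = df["name"].str.contains("Foo", regex=False)

# Keep the original value where has_foo is True, otherwise use the lowercased one.
df["name"] = np.where(has_foo, df["name"], df["name"].str.lower())

print(df["name"].tolist())
```

Note that the match is case-sensitive, so "Bob FOO" counts as not containing "Foo" and gets lowercased.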

CodePudding user response:

Your solution is missing Series.apply:

df["name"] = df["name"].apply(lambda x: str(x) if "Foo" in str(x) else str.lower(x))

Or use Series.str.contains to build a boolean mask, invert it with ~ to select the rows that do NOT match, and lowercase only those rows:

m = df["name"].str.contains('Foo')
df.loc[~m, "name"] = df.loc[~m, "name"].str.lower()
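If the column may hold missing values, the mask approach probably needs one extra argument. The sketch below assumes a made-up frame with a None entry and passes na=False, so a missing name is treated as "does not contain Foo". Without it the mask can itself contain missing values, which cannot be used directly for boolean indexing.

```python
import pandas as pd

# Hypothetical sample data with one missing name.
df = pd.DataFrame({"name": ["Foo Bar", "ALICE", None, "Bob"]})

# na=False: a missing value counts as "no match", so the mask is purely boolean.
m = df["name"].str.contains("Foo", regex=False, na=False)

# Lowercase only the rows that do not match; the missing value stays missing.
df.loc[~m, "name"] = df.loc[~m, "name"].str.lower()

print(df)
```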

CodePudding user response:

df[~df["name"].str.contains("Foo")]['name'].str.lower()

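The expression df[~df["name"].str.contains("Foo")] returns the rows of the DataFrame whose name column does not contain Foo. The trailing ['name'].str.lower() then lowercases the name column of those rows. Note that this returns a new Series and does not modify df.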
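A quick check on a made-up frame (hypothetical sample names) suggests how this expression behaves: it yields only the filtered, lowercased rows and leaves df as it was. One way to write the result back, shown here as a sketch rather than as this answer's intended method, is to assign through .loc using the index of the filtered result.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({"name": ["Foo Bar", "ALICE", "Bob"]})

# Filter out the "Foo" rows, then lowercase what is left.
lowered = df[~df["name"].str.contains("Foo")]["name"].str.lower()

print(lowered.tolist())          # only the rows without "Foo"
before = df["name"].tolist()     # snapshot: df itself has not changed yet
print(before)

# Write the lowercased values back into the matching rows by index label.
df.loc[lowered.index, "name"] = lowered
print(df["name"].tolist())
```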

CodePudding user response:

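A list comprehension also works. Note the containment test with "in", so that cells which contain "Foo" (not only cells equal to "Foo") are left unchanged:

df['name'] = [x if 'Foo' in x else x.lower() for x in df['name']]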

I timed the methods on a list of 300 names (of which 100 are Foo), running each one 1000 times. The timing is rough (I know there are better timing methods, but I used time.time() around for loops). These are the timings in seconds that I found:

list-comprehension : 0.049
.apply : 0.179
np.where : 0.349
.loc : 0.880

It may be that some of these methods would perform better on data sets much bigger than 300 rows, so I understand that this is not a rigorous test. Others may be able to say more.
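For anyone who wants to repeat the comparison, here is a rough sketch using the standard timeit module instead of time.time(). The data (300 names, 100 of them "Foo"), the helper function names, and the use of a containment test in every method are assumptions made for illustration. It is not the script that produced the numbers above, and results will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data mirroring the description: 300 names, 100 of them "Foo".
names = ["Foo"] * 100 + ["ALICE", "BOB"] * 100
base = pd.DataFrame({"name": names})


def with_list_comprehension(df):
    values = [x if "Foo" in x else x.lower() for x in df["name"]]
    return pd.Series(values, index=df.index)


def with_apply(df):
    return df["name"].apply(lambda x: x if "Foo" in x else x.lower())


def with_np_where(df):
    s = df["name"]
    return pd.Series(np.where(s.str.contains("Foo"), s, s.str.lower()), index=df.index)


def with_loc(df):
    out = df.copy()  # copy so the input is not mutated; this adds to the measured time
    m = out["name"].str.contains("Foo")
    out.loc[~m, "name"] = out.loc[~m, "name"].str.lower()
    return out["name"]


methods = [with_list_comprehension, with_apply, with_np_where, with_loc]
for f in methods:
    # Total wall-clock time for 1000 calls of each method.
    seconds = timeit.timeit(lambda: f(base), number=1000)
    print(f"{f.__name__}: {seconds:.3f} s")
```

Increasing the length of names is a simple way to see whether the ranking changes on larger inputs.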
