How to lowercase all values in dataframe except specific column

Time:02-24

I've been searching around for a while now, but I can't seem to find the answer to this small problem.

I have this code to make a function for lowercase values:

import pandas as pd

df = {'name':['AL', 'EL', 'NAILA', 'DORI', 'KAKAEKA', 'GENTA', 'RUBY'],
      'living':['lagoa','sangiang','penjaringan','warakas','jonggol','cikarang', 'cikarang'],
      'food':['PIZZA','MEATBALL','CHICKEN','CAKE','CAKE','ONION','NOODLE'],
      'sub':['KOTA','KAB','WILAYAH','KAB','DAERAH','KOTA','WILAYAH'],
      'job':['Chef','Teacher','Police','Doctor','Students','Programmer','Lecturer'],
      'side_job':['Designer','Nurse','Designer','Programmer','Programmer','Teacher','Mentor'],
      'status':['Single','Single','Married','Single','Single','Divorced','Married'],
      'age':[20,25,20,18,25,40,37]
}

df = pd.DataFrame(df)

def content_consistent(df):
    cols = df.select_dtypes(object).columns
    df[cols] = df[cols].apply(lambda x: x.str.lower())
    return df

df = content_consistent(df)

The result is that every value is lowercased, but I want some columns, such as 'sub' and 'status', to keep their original case.

This is the output I expect, ideally with simple code that does not use a loop:

   name     living       food      sub      job         side_job    status    age
0  al       lagoa        pizza     KOTA     chef        designer    Single    20
1  el       sangiang     meatball  KAB      teacher     nurse       Single    25
2  naila    penjaringan  chicken   WILAYAH  police      designer    Married   20
3  dori     warakas      cake      KAB      doctor      programmer  Single    18
4  kakaeka  jonggol      cake      DAERAH   students    programmer  Single    25
5  genta    cikarang     onion     KOTA     programmer  teacher     Divorced  40
6  ruby     cikarang     noodle    WILAYAH  lecturer    mentor      Married   37

CodePudding user response:

Use Index.difference to exclude some of the non-numeric columns, given as a list:

def content_consistent(df):
    cols = df.select_dtypes(object).columns.difference(['sub', 'status'])
    df[cols] = df[cols].apply(lambda x: x.str.lower())
    return df
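
As a self-contained sketch of the same idea, the version below works on a copy instead of modifying the input and takes the protected columns as an argument. The helper name lowercase_except and its keep parameter are made up for illustration, and the sample frame is a shortened version of the one in the question. Including "string" next to "object" in select_dtypes is an assumption about newer pandas versions, where text columns may use a dedicated string dtype.

```python
import pandas as pd


def lowercase_except(df, keep):
    """Return a copy of df with every text column lowercased, except those in keep.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    # Text columns are usually "object"; "string" covers the dedicated string dtype.
    text_cols = out.select_dtypes(include=["object", "string"]).columns
    # Index.difference drops the protected names (and returns the rest sorted,
    # which does not matter for the assignment below).
    cols = text_cols.difference(keep)
    out[cols] = out[cols].apply(lambda s: s.str.lower())
    return out


df = pd.DataFrame({
    "name": ["AL", "EL"],
    "food": ["PIZZA", "MEATBALL"],
    "sub": ["KOTA", "KAB"],
    "status": ["Single", "Married"],
    "age": [20, 25],
})

result = lowercase_except(df, ["sub", "status"])
print(result)
```

Because the numeric age column is never selected, it does not need to appear in the keep list.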

CodePudding user response:

You can exclude those columns with a list comprehension, as shown below:

df = pd.DataFrame(df)

def content_consistent(df):
    cols = df.select_dtypes(object).columns
    cols = [x for x in cols if x not in ['sub', 'status']]
    df[cols] = df[cols].apply(lambda x: x.str.lower())
    return df

df = content_consistent(df)

CodePudding user response:

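Select every column except sub, status and age by using a negative lookahead on the full column name, lowercase the selected columns, and then update df in place:

df.update(df.filter(regex=r'^(?!(?:sub|status|age)$)', axis=1).apply(lambda x: x.str.lower()))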
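
Before calling update, it can help to list which columns a filter pattern really selects. DataFrame.filter applies the regex with a search on each column name, so a character class such as [^subage] keeps any name that contains at least one character outside that set. This means status is still selected, because of its "t". The sketch below, on a shortened sample frame (an assumption for brevity), compares that pattern with a negative lookahead that tests the whole name.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["AL", "EL"],
    "sub": ["KOTA", "KAB"],
    "status": ["Single", "Married"],
    "age": [20, 25],
})

# Character class: keeps any name with a character outside the set s, u, b, a, g, e.
char_class_cols = list(df.filter(regex="[^subage]", axis=1).columns)

# Negative lookahead: keeps any name that is not exactly sub, status or age.
lookahead_cols = list(df.filter(regex=r"^(?!(?:sub|status|age)$)", axis=1).columns)

print(char_class_cols)  # ['name', 'status']
print(lookahead_cols)   # ['name']

# Lowercase only the columns picked by the lookahead and write them back in place.
df.update(df[lookahead_cols].apply(lambda x: x.str.lower()))
print(df)
```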