Find pattern in pandas column names and change such columns using pipe

Time:05-08

Let's say I have the calculation below:

import pandas as pd
dat = pd.DataFrame({'xx1' : [1,2,3], 'aa2' : ['qq', '4', 'd'], 'xx3' : [4,5,6]})
dat2 = (dat
        .assign(xx1 = lambda x : [str(i) for i in x['xx1'].values])
        .assign(xx3 = lambda x : [str(i) for i in x['xx3'].values])
    )

Basically, I need to find the columns whose names match the pattern xx followed by a sequence of digits (i.e. xx1, xx2, xx3, etc.) and then apply some transformation to those columns (e.g. apply the str function).

One way to do this is as above, i.e. find those columns manually and transform each one. I wonder if there is a way to generalise this approach. I would prefer to keep a chained (pipe) style like the one above.

Any pointers would be very helpful.

CodePudding user response:

You could do:

# Matches all columns starting with 'xx' with a sequence of numbers afterwards. 
cols_to_transform = dat.columns[dat.columns.str.match('^xx[0-9]+$')]

# Transform to apply (column-wise).
transform_function = lambda c: c.astype(str)

# If you want a new DataFrame and not modify the other in-place.
dat2 = dat.copy()

dat2[cols_to_transform] = dat2[cols_to_transform].transform(transform_function, axis=0)

To use it within assign:

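# Here I put a lambda to avoid precomputing all the transformations in the dict comprehension.
# col is bound as a default argument; without it every lambda would see the last value of col
# by the time assign calls it, and all matched columns would get the same data.
dat.assign(**{col: lambda df, col=col: df[col].astype(str) for col in cols_to_transform})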
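Since the question asks for something that fits into a method chain, the same idea can also be wrapped in a small helper and passed to pipe. The following is only a sketch: the helper name stringify_matching and its pattern parameter are made up for illustration, and it assumes that casting with astype(str) is the transformation you want.

```python
import re

import pandas as pd


def stringify_matching(df, pattern=r"xx[0-9]+"):
    """Return a new DataFrame with every column whose name fully matches
    `pattern` cast to str. (Hypothetical helper, not part of pandas.)"""
    cols = [c for c in df.columns if re.fullmatch(pattern, str(c))]
    # astype with a {column: dtype} mapping returns a new frame and
    # leaves the input untouched.
    return df.astype({c: str for c in cols})


dat = pd.DataFrame({'xx1': [1, 2, 3], 'aa2': ['qq', '4', 'd'], 'xx3': [4, 5, 6]})

# The helper slots into a chain like any other step.
dat2 = dat.pipe(stringify_matching)
```

Because the helper receives the DataFrame at the point where it sits in the chain, it also sees columns created by earlier steps, which a precomputed cols_to_transform would miss.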

CodePudding user response:

import pandas as pd
frame = pd.DataFrame({'xx1' : [1,2,3], 'aa2' : ['qq', '4', 'd'], 'xx3' : [4,5,6]})

def parse_column(col, vals):
    if "xx" == col[:2] and col[2:].isdigit():
        return [str(i) for i in vals]
    return vals

# DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent.
for (name, col) in frame.items():
    frame[name] = parse_column(name, col.values)
  1. You can iterate over the columns, getting each column's name and its values as a Series.
  2. The incredibly niche str.isdigit() method is built into Python for some reason, but it comes in useful here.
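To see which names the check in parse_column accepts, here is a small standalone sketch. The function name is_xx_column and the sample names are invented for illustration; only the slicing and isdigit() test come from the answer above.

```python
def is_xx_column(name):
    # Same test as in parse_column: the name starts with "xx" and
    # everything after it consists of digits only.
    return name[:2] == "xx" and name[2:].isdigit()


names = ["xx1", "aa2", "xx3", "xx", "xx12", "xx1a"]
matches = [n for n in names if is_xx_column(n)]
```

Here matches ends up as ['xx1', 'xx3', 'xx12']. A bare "xx" is rejected because isdigit() returns False for an empty string, so at least one digit is required.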