Home > Software engineering >  apply function to varying number of columns using regex
apply function to varying number of columns using regex

Time:06-30

p1 p2 p3 p4
0 1 4 3 40
1 2 5 70 8
2 3 6 6 500

I have a table that looks like this i want to apply regex = '^p\d' to search for specific column names and create new columns name n{1}, n{2}, ...

  • n{1} = p{1}/ SQUARE ROOT OF (p{1}**2 p{2}**2 p{3}**2 ...)

  • n{2} = p{2}/ SQUARE ROOT OF (p{1}**2 p{2}**2 p{3}**2 ...)

  • and so on upto... n_{n}

  • index location of p1, p2, p3 varies for different csv files. So I am trying search using regex to search column names.

  • number of p rows also varies for different csv files. I am trying to stop creating new columns loop if the next p column does not exist. Not sure how to do it.

import pandas as pd
import numpy as np

data_1 = {'p1': [1, 2, 3],
          'p2': [4, 5, 6],
          'p3': [3, 70 ,6],
          'p4': [40, 8, 500],
         }



df1 = pd.DataFrame(data = data_1)
df1

CodePudding user response:

You can use python string function to achive the same, but since you asked for regex solution, here is how you can do using the same.

r = re.compile("^p\d")
qualified_columns = list(filter(r.match, df1.columns))
new_col_names = [re.sub(r'p(\d )', r'n\1', col) for col in qualified_columns]
df1[new_col_names] = df1[qualified_columns].div(df1.pow(2).sum(axis=1).pow(1./2), axis=0)

Result :

   p1  p2  p3   p4        n1        n2        n3        n4
0   1   4   3   40  0.024799  0.099197  0.074398  0.991973
1   2   5  70    8  0.028304  0.070760  0.990643  0.113216
2   3   6   6  500  0.005999  0.011998  0.011998  0.999838

CodePudding user response:

The index location of the columns does not matter if you are reading in the column names correctly.

pcols = [col for col in df1.columns if col.startswith('p')]
pnums = [col.split('p')[-1] for col in pcols]
pcols_rss = df1[pcols].apply(lambda row: np.sqrt(np.sum(row**2)), axis=1)
for col, num in zip(pcols, pnums):
    df1.loc[:, f'n{{{num}}}'] = df1[col] / pcols_rss

Output

   p1  p2  p3   p4      n{1}      n{2}      n{3}      n{4}
0   1   4   3   40  0.024799  0.099197  0.074398  0.991973
1   2   5  70    8  0.028304  0.070760  0.990643  0.113216
2   3   6   6  500  0.005999  0.011998  0.011998  0.999838
  • Related