detect column names and create new columns based on a custom function-CodePudding

I am trying to make 'n' = 1, 2, 3.. number of columns named t_{n}
t(1, 2, 3...) = p(1, 2, 3...) - current respectively
example: t1 = p1 - current
My problem is data is kinda messy and p1, p2, p3 can lie any .iloc (index) in the dataset. but the p{n} will always be there.
I dont want to make a column if p lies in the name of other columns.
Not sure how to best proceed further.

current	w	p1	p2	p3	other	p4
1.765	84	1.765	1.765	1.765	ad	1.765
6.527	79	8.91	8.936	8.916	ad	8.82
9.429	79	8.91	8.936	8.916	ad	8.82

import pandas as pd
import numpy as np

data_1 = {'p1': [1, 2, 3],
          'p2': [4, 5, 6],
          'p3': [3, 70 ,6],
          'p4': [40, 8, 500],
          'current': [500, 600, 700],
          't1': ['x', 'y', 'z']
         }

df1 = pd.DataFrame(data = data_1)
df1

CodePudding user response：

Use DataFrame.filter wth regex for seelct all column with pattern p with digit, then subtract from right side by DataFrame.rsub and rename columns names:

f = lambda x: x.replace('p','t')
df = df1.filter(regex='^p\d').rsub(df1['current'], axis=0).rename(columns=f)
print (df)
    t1   t2   t3   t4
0  499  496  497  460
1  598  595  530  592
2  697  694  694  200

If need append columns to original first remove t1 column (because in sample data):

f = lambda x: x.replace('p','t')
df = df1.drop('t1',1).join(df1.filter(regex='^p\d').rsub(df1['current'], axis=0).rename(columns=f))
print (df)
   p1  p2  p3   p4  current   t1   t2   t3   t4
0   1   4   3   40      500  499  496  497  460
1   2   5  70    8      600  598  595  530  592
2   3   6   6  500      700  697  694  694  200

Details:

print (df1.filter(regex='^p\d'))
   p1  p2  p3   p4
0   1   4   3   40
1   2   5  70    8
2   3   6   6  500

EDIT: If need select only consecutive p columns use:

data_1 = {'p1': [1, 2, 3],
          'p2': [4, 5, 6],
          'p3': [3, 70 ,6],
          'p4': [40, 8, 500],
          'current': [500, 600, 700],
          'p5': [4, 8, 500],
          'p6': [40, 80, 50],
          'other': ['x', 'y', 'z'],
          'p7': [4, 8, 5],
         }

df1 = pd.DataFrame(data = data_1)
print (df1)
   p1  p2  p3   p4  current   p5  p6 other  p7
0   1   4   3   40      500    4  40     x   4
1   2   5  70    8      600    8  80     y   8
2   3   6   6  500      700  500  50     z   5

pos = df1.columns.get_indexer(df1.filter(regex='^p\d').columns)
print (pos)
[0 1 2 3 5 6 8]

s = pd.Series(pos).diff().ne(1).cumsum().duplicated(keep=False)
print (s)
0     True
1     True
2     True
3     True
4     True
5     True
6    False
dtype: bool

df = df1.iloc[:, pos[s]]
print (df)
   p1  p2  p3   p4   p5  p6
0   1   4   3   40    4  40
1   2   5  70    8    8  80
2   3   6   6  500  500  50