- I am trying to make 'n' = 1, 2, 3.. number of columns named t_{n}
- t(1, 2, 3...) = p(1, 2, 3...) - current respectively
- example: t1 = p1 - current
- My problem is data is kinda messy and p1, p2, p3 can lie any .iloc (index) in the dataset. but the
p{n}
will always be there. - I dont want to make a column if p lies in the name of other columns.
- Not sure how to best proceed further.
current | w | p1 | p2 | p3 | other | p4 | t1 |
t2 |
t3 |
---|---|---|---|---|---|---|---|---|---|
1.765 | 84 | 1.765 | 1.765 | 1.765 | ad | 1.765 | |||
6.527 | 79 | 8.91 | 8.936 | 8.916 | ad | 8.82 | |||
9.429 | 79 | 8.91 | 8.936 | 8.916 | ad | 8.82 |
import pandas as pd
import numpy as np
data_1 = {'p1': [1, 2, 3],
'p2': [4, 5, 6],
'p3': [3, 70 ,6],
'p4': [40, 8, 500],
'current': [500, 600, 700],
't1': ['x', 'y', 'z']
}
df1 = pd.DataFrame(data = data_1)
df1
CodePudding user response:
Use DataFrame.filter
wth regex
for seelct all column with pattern p
with digit, then subtract from right side by DataFrame.rsub
and rename
columns names:
f = lambda x: x.replace('p','t')
df = df1.filter(regex='^p\d').rsub(df1['current'], axis=0).rename(columns=f)
print (df)
t1 t2 t3 t4
0 499 496 497 460
1 598 595 530 592
2 697 694 694 200
If need append columns to original first remove t1
column (because in sample data):
f = lambda x: x.replace('p','t')
df = df1.drop('t1',1).join(df1.filter(regex='^p\d').rsub(df1['current'], axis=0).rename(columns=f))
print (df)
p1 p2 p3 p4 current t1 t2 t3 t4
0 1 4 3 40 500 499 496 497 460
1 2 5 70 8 600 598 595 530 592
2 3 6 6 500 700 697 694 694 200
Details:
print (df1.filter(regex='^p\d'))
p1 p2 p3 p4
0 1 4 3 40
1 2 5 70 8
2 3 6 6 500
EDIT: If need select only consecutive p
columns use:
data_1 = {'p1': [1, 2, 3],
'p2': [4, 5, 6],
'p3': [3, 70 ,6],
'p4': [40, 8, 500],
'current': [500, 600, 700],
'p5': [4, 8, 500],
'p6': [40, 80, 50],
'other': ['x', 'y', 'z'],
'p7': [4, 8, 5],
}
df1 = pd.DataFrame(data = data_1)
print (df1)
p1 p2 p3 p4 current p5 p6 other p7
0 1 4 3 40 500 4 40 x 4
1 2 5 70 8 600 8 80 y 8
2 3 6 6 500 700 500 50 z 5
pos = df1.columns.get_indexer(df1.filter(regex='^p\d').columns)
print (pos)
[0 1 2 3 5 6 8]
s = pd.Series(pos).diff().ne(1).cumsum().duplicated(keep=False)
print (s)
0 True
1 True
2 True
3 True
4 True
5 True
6 False
dtype: bool
df = df1.iloc[:, pos[s]]
print (df)
p1 p2 p3 p4 p5 p6
0 1 4 3 40 4 40
1 2 5 70 8 8 80
2 3 6 6 500 500 50