I have the following pandas DataFrame:
A | B | C | D |
---|---|---|---|
N,M | N | 1 | N-D, M-D |
N,M | M | 1 | N-D, M-D |
N,M | N | 2 | N-D, M-D |
N,M | M | 2 | N-D, M-D |
X,Y | Y | 1 | X-D, Y-D |
X,Y | X | 1 | X-D, Y-D |
And I need to simplify it to:
B | C | D |
---|---|---|
N | 1 | N-D |
M | 1 | M-D |
N | 2 | N-D |
M | 2 | M-D |
Y | 1 | Y-D |
X | 1 | X-D |
Basically in column D we need to keep the values that corresponds to the B column-index. It seems more like a logical problem. I've read that for iterations are not recomended because are really slow in pandas but that is the only approach I could think of. I accept any suggestions. Column A indicates the index order in D column values (if it is an N or M value in the first 4 rows, or an X/Y value in the last two). Thanks.
CodePudding user response:
Please check this:
import pandas as pd
df1 = pd.DataFrame([['N,M', 'N', 1,'N-D, M-D'],['N,M', 'M', 1,'N-D, M-D'],
['N,M', 'N', 2,'N-D, M-D'],['N,M', 'M', 2,'N-D, M-D'],
['X,Y', 'Y', 1,'X-D, Y-D'],['X,Y', 'X', 1,'X-D, Y-D']],columns=['A','B','C','D'])
def f(x):
'''
Function to find the matching substring in column D w.r.t column B
'''
b,d = x[0],x[1]
d = d.split(',',1)
d = [d[0] if b in d[0] else d[1] if b in d[1] else None]
return d[0]
df1['D'] = df1[['B','D']].apply(f, axis=1)
df1 = df1.drop(columns='A',axis=1)
print(df1)
CodePudding user response:
Use df.explode
with Series.str.split
:
In [1375]: x = df.assign(D=df.D.str.split(',')).explode('D')
In [1370]: output = x[x.B.eq(x.D.str.split('-').str[0])]