How to replace columns values based on substring order in a pandas dataframe-CodePudding

I have the following pandas DataFrame:

A	B	C	D
N,M	N	1	N-D, M-D
N,M	M	1	N-D, M-D
N,M	N	2	N-D, M-D
N,M	M	2	N-D, M-D
X,Y	Y	1	X-D, Y-D
X,Y	X	1	X-D, Y-D

And I need to simplify it to:

B	C	D
N	1	N-D
M	1	M-D
N	2	N-D
M	2	M-D
Y	1	Y-D
X	1	X-D

Basically in column D we need to keep the values that corresponds to the B column-index. It seems more like a logical problem. I've read that for iterations are not recomended because are really slow in pandas but that is the only approach I could think of. I accept any suggestions. Column A indicates the index order in D column values (if it is an N or M value in the first 4 rows, or an X/Y value in the last two). Thanks.

CodePudding user response：

Please check this:

import pandas as pd

df1 = pd.DataFrame([['N,M', 'N', 1,'N-D, M-D'],['N,M', 'M', 1,'N-D, M-D'],
                    ['N,M', 'N', 2,'N-D, M-D'],['N,M',  'M', 2,'N-D, M-D'],
                    ['X,Y', 'Y', 1,'X-D, Y-D'],['X,Y', 'X', 1,'X-D, Y-D']],columns=['A','B','C','D'])

def f(x):
  '''
Function to find the matching substring in column D w.r.t column B
  '''
  b,d = x[0],x[1]
  d = d.split(',',1)
  d = [d[0] if b in d[0] else d[1] if b in d[1] else None]
  return d[0]

df1['D'] = df1[['B','D']].apply(f, axis=1)
df1  = df1.drop(columns='A',axis=1)
print(df1)

CodePudding user response：

Use df.explode with Series.str.split:

In [1375]: x = df.assign(D=df.D.str.split(',')).explode('D')
In [1370]: output = x[x.B.eq(x.D.str.split('-').str[0])]