Home > Mobile >  How to replace columns values based on substring order in a pandas dataframe
How to replace columns values based on substring order in a pandas dataframe

Time:02-18

I have the following pandas DataFrame:

A B C D
N,M N 1 N-D, M-D
N,M M 1 N-D, M-D
N,M N 2 N-D, M-D
N,M M 2 N-D, M-D
X,Y Y 1 X-D, Y-D
X,Y X 1 X-D, Y-D

And I need to simplify it to:

B C D
N 1 N-D
M 1 M-D
N 2 N-D
M 2 M-D
Y 1 Y-D
X 1 X-D

Basically in column D we need to keep the values that corresponds to the B column-index. It seems more like a logical problem. I've read that for iterations are not recomended because are really slow in pandas but that is the only approach I could think of. I accept any suggestions. Column A indicates the index order in D column values (if it is an N or M value in the first 4 rows, or an X/Y value in the last two). Thanks.

CodePudding user response:

Please check this:

import pandas as pd

df1 = pd.DataFrame([['N,M', 'N', 1,'N-D, M-D'],['N,M', 'M', 1,'N-D, M-D'],
                    ['N,M', 'N', 2,'N-D, M-D'],['N,M',  'M', 2,'N-D, M-D'],
                    ['X,Y', 'Y', 1,'X-D, Y-D'],['X,Y', 'X', 1,'X-D, Y-D']],columns=['A','B','C','D'])

def f(x):
  '''
Function to find the matching substring in column D w.r.t column B
  '''
  b,d = x[0],x[1]
  d = d.split(',',1)
  d = [d[0] if b in d[0] else d[1] if b in d[1] else None]
  return d[0]

df1['D'] = df1[['B','D']].apply(f, axis=1)
df1  = df1.drop(columns='A',axis=1)
print(df1)

CodePudding user response:

Use df.explode with Series.str.split:

In [1375]: x = df.assign(D=df.D.str.split(',')).explode('D')
In [1370]: output = x[x.B.eq(x.D.str.split('-').str[0])]
  • Related