Home > Mobile >  Pandas update column to reflect number of times string occurs
Pandas update column to reflect number of times string occurs

Time:04-08

Is there a way to search for the pattern "MV=" within each column and update the column values to reflect the number of times "MV=" occurs.

Dataframe:

d = [{'AJ':['Rec(POS=4,,REF=FF,, MV=55), Rec(POS=2,, REF=GH,, MV=23)'], 'AVF12':[], 'HI':['Rec(POS=2,,REF=RTD,, MV=23), Rec(POS=234,, REF=FFRE,, MV=00)'],'AV1':[], 'v1':[]},
      {'AJ':[], 'AVF12':['Rec(POS=43,,REF=FeF,, MV=5455), Rec(POS=2,, REF=GH,, MV=23), Rec(POS=231,, REF=JK, MV=TR)'], 'HI':[],'AV1':[], 'v2':[]},
      {'AJ':['Rec(POS=2342,,REF=FhF,, MV=1)'], 'AVF12':['Rec(POS=11,,REF=FF11,, MV=551)'], 'HI':[],'AV1':[], 'v3':[]}]

frame = pd.DataFrame(d)

f = frame.T

lst = []
f['temp'] = f.index
for i in f.iloc[-3:, -1]:
  lst.append(i)
f = f.drop(columns={'temp'})

f.columns = [lst]
f = f.drop(f.index[[-1,-2,-3]])

Basically, since MV= occurs twice in the first row of the first column the value would be 2.

I tried str.count but it just returned 0 when there was an MV= in a given row.

Desired output:

           v1      v2     v3
AJ         2       0      1
AVF12      0       3      1
HI         2       0      0
AV1        0       0      0

CodePudding user response:

First remove last 3 columns v and for each column convert values to strings with call Series.str.count, then transpose values and rename columns:

pat = "MV="

df = (frame.iloc[:, :-3]
            .apply(lambda x: x.astype(str).str.count(pat))
            .T
            .rename(columns=lambda x: f'v{x 1}'))
print (df)
       v1  v2  v3
AJ      2   0   1
AVF12   0   3   1
HI      2   0   0
AV1     0   0   0

CodePudding user response:

The issue is that you have lists as elements, you need to extract the elements first:

f.apply(lambda c: c.str[0].str.count('MV=')).fillna(0, downcast='infer')

output:

       0  1  2
AJ     2  0  1
AVF12  0  3  1
HI     2  0  0
AV1    0  0  0
v1     0  0  0
v2     0  0  0
v3     0  0  0

NB. It is unclear how you want to map the names v1/v2/v2 from the original layout, please make it more explicit

Assuming the last 3 rows really determine those labels:

drop = 3
(f
 .iloc[:-drop]
 .apply(lambda c: c.str[0].str.count('MV='))
 .fillna(0, downcast='infer')
 .set_axis(f.index[-drop:], axis=1)
)

output:

       v1  v2  v3
AJ      2   0   1
AVF12   0   3   1
HI      2   0   0
AV1     0   0   0
  • Related