Is there a way to search for the pattern "MV=" within each column and update the column values to reflect the number of times "MV=" occurs.
Dataframe:
d = [{'AJ':['Rec(POS=4,,REF=FF,, MV=55), Rec(POS=2,, REF=GH,, MV=23)'], 'AVF12':[], 'HI':['Rec(POS=2,,REF=RTD,, MV=23), Rec(POS=234,, REF=FFRE,, MV=00)'],'AV1':[], 'v1':[]},
{'AJ':[], 'AVF12':['Rec(POS=43,,REF=FeF,, MV=5455), Rec(POS=2,, REF=GH,, MV=23), Rec(POS=231,, REF=JK, MV=TR)'], 'HI':[],'AV1':[], 'v2':[]},
{'AJ':['Rec(POS=2342,,REF=FhF,, MV=1)'], 'AVF12':['Rec(POS=11,,REF=FF11,, MV=551)'], 'HI':[],'AV1':[], 'v3':[]}]
frame = pd.DataFrame(d)
f = frame.T
lst = []
f['temp'] = f.index
for i in f.iloc[-3:, -1]:
lst.append(i)
f = f.drop(columns={'temp'})
f.columns = [lst]
f = f.drop(f.index[[-1,-2,-3]])
Basically, since MV= occurs twice in the first row of the first column the value would be 2.
I tried str.count but it just returned 0 when there was an MV= in a given row.
Desired output:
v1 v2 v3
AJ 2 0 1
AVF12 0 3 1
HI 2 0 0
AV1 0 0 0
CodePudding user response:
First remove last 3 columns v
and for each column convert values to strings with call Series.str.count
, then transpose values and rename
columns:
pat = "MV="
df = (frame.iloc[:, :-3]
.apply(lambda x: x.astype(str).str.count(pat))
.T
.rename(columns=lambda x: f'v{x 1}'))
print (df)
v1 v2 v3
AJ 2 0 1
AVF12 0 3 1
HI 2 0 0
AV1 0 0 0
CodePudding user response:
The issue is that you have lists as elements, you need to extract the elements first:
f.apply(lambda c: c.str[0].str.count('MV=')).fillna(0, downcast='infer')
output:
0 1 2
AJ 2 0 1
AVF12 0 3 1
HI 2 0 0
AV1 0 0 0
v1 0 0 0
v2 0 0 0
v3 0 0 0
NB. It is unclear how you want to map the names v1/v2/v2 from the original layout, please make it more explicit
Assuming the last 3 rows really determine those labels:
drop = 3
(f
.iloc[:-drop]
.apply(lambda c: c.str[0].str.count('MV='))
.fillna(0, downcast='infer')
.set_axis(f.index[-drop:], axis=1)
)
output:
v1 v2 v3
AJ 2 0 1
AVF12 0 3 1
HI 2 0 0
AV1 0 0 0