I have the following code that masks values equal to ten, and then the next closest value. But actually I need to apply it only if 10 occurs once in the column ending in '_ans'. So the mask should only occur for the column 'a_ans', because there are two 10s in 'b_ans. any comments welcome. thanks
df = pd.DataFrame(data={'a_ans':[0,1,1,10,11],
'a_num': [1,8,90,2,8],
'b_ans': [0,10,139,10,18],
'b_num': [15,43,90,14,87]}).astype(float)
out=[]
for i in ['a_', 'b_']:
pairs = (df.loc[:,df.columns.str.startswith(i)]) # pair columns
mask1 = pairs[i 'ans'] == 10 # mask values equal to 10
mask2 = pairs[i 'ans'].eq(pairs[i 'ans'].mask(mask1).max())# get the next highest value
pairs = pairs.mask(mask1, 1001).mask(mask2, 1002) # replacing values
out.append(pairs)
CodePudding user response:
you can use value_counts()
to get the occurrence of each row value within each column:
if pairs[i 'ans'].value_counts()[10] == 1:
# apply mask logic
CodePudding user response:
Following modifications could be useful, but it is not clear what should be the next values closest or highest?
df = pd.DataFrame(data={'a_ans':[0,1,1,10,11],
'a_num': [1,8,90,2,8],
'b_ans': [0,10,139,10,18],
'b_num': [15,43,90,14,87]}).astype(float)
out=[]
for i in ['a_', 'b_']:
pairs = df.loc[:,df.columns.str.startswith(i "ans")] # for only _ans columns
if len(pairs[pairs[i 'ans'] == 10]) == 1: # for only one ten
mask1 = pairs[i 'ans'] == 10 # mask values equal to 10
mask2 = pairs[i 'ans'].eq(pairs[i 'ans'].mask(mask1).max())
pairs = pairs.mask(mask1, 1001).mask(mask2, 1002)
out.append(pairs)