i'm trying to create a new column(tpost) where the values are from column(post) based on the conditions of the function. However, a value in the new column came back as empty.
goal is using the first value of the same group with the condition of excluding place=1, however, still retaining the value from place=1 in the column(tpost)
my attempt with the missing value:
╔═══════════╦═════════╦═════════╦═══════╦════╦═══════╦═══════╗ ║ id ║ subject ║ class ║ place ║ hf ║ post ║ tpost ║ ╠═══════════╬═════════╬═════════╬═══════╬════╬═══════╬═══════╣ ║ 202006106 ║ B_Math ║ Class_4 ║ 4 ║ 1 ║ 0.048 ║ 0.048 ║ ║ 202006106 ║ B_Math ║ Class_4 ║ 6 ║ 1 ║ 0.045 ║ 0.048 ║ ║ 202006106 ║ B_Math ║ Class_4 ║ 7 ║ 1 ║ 0.043 ║ 0.048 ║ ║ 202007153 ║ B_Math ║ Class_4 ║ 4 ║ 1 ║ 0.042 ║ 0.042 ║ ║ 202007155 ║ B_Math ║ Class_4 ║ 6 ║ 1 ║ 0.040 ║ 0.040 ║ ║ 202009094 ║ B_Math ║ Class_4 ║ 2 ║ 1 ║ 0.038 ║ 0.038 ║ ║ 202009094 ║ B_Math ║ Class_4 ║ 8 ║ 1 ║ 0.037 ║ 0.038 ║ ║ 202009095 ║ B_Math ║ Class_4 ║ 4 ║ 1 ║ 0.036 ║ 0.036 ║ ║ 202010143 ║ B_Math ║ Class_4 ║ 2 ║ 1 ║ 0.034 ║ 0.034 ║ ║ 202010143 ║ B_Math ║ Class_4 ║ 7 ║ 1 ║ 0.033 ║ 0.034 ║ ║ 202010145 ║ B_Math ║ Class_4 ║ 1 ║ 1 ║ 0.065 ║ ║ ╚═══════════╩═════════╩═════════╩═══════╩════╩═══════╩═══════╝
correct result:
╔═══════════╦═════════╦═════════╦═══════╦════╦═══════╦═══════╗ ║ id ║ subject ║ class ║ place ║ hf ║ post ║ tpost ║ ╠═══════════╬═════════╬═════════╬═══════╬════╬═══════╬═══════╣ ║ 202006106 ║ B_Math ║ Class_4 ║ 4 ║ 1 ║ 0.048 ║ 0.048 ║ ║ 202006106 ║ B_Math ║ Class_4 ║ 6 ║ 1 ║ 0.045 ║ 0.048 ║ ║ 202006106 ║ B_Math ║ Class_4 ║ 7 ║ 1 ║ 0.043 ║ 0.048 ║ ║ 202007153 ║ B_Math ║ Class_4 ║ 4 ║ 1 ║ 0.042 ║ 0.042 ║ ║ 202007155 ║ B_Math ║ Class_4 ║ 6 ║ 1 ║ 0.040 ║ 0.040 ║ ║ 202009094 ║ B_Math ║ Class_4 ║ 2 ║ 1 ║ 0.038 ║ 0.038 ║ ║ 202009094 ║ B_Math ║ Class_4 ║ 8 ║ 1 ║ 0.037 ║ 0.038 ║ ║ 202009095 ║ B_Math ║ Class_4 ║ 4 ║ 1 ║ 0.036 ║ 0.036 ║ ║ 202010143 ║ B_Math ║ Class_4 ║ 2 ║ 1 ║ 0.034 ║ 0.034 ║ ║ 202010143 ║ B_Math ║ Class_4 ║ 7 ║ 1 ║ 0.033 ║ 0.034 ║ ║ 202010145 ║ B_Math ║ Class_4 ║ 1 ║ 1 ║ 0.065 ║ 0.065 ║ ╚═══════════╩═════════╩═════════╩═══════╩════╩═══════╩═══════╝
here are the 2 codes i've used:
code1:
df=(df.set_index(['id', 'subject', 'class', 'hf'])
.assign(tpost=df[df['place']>1].groupby(['id', 'subject', 'class', 'hf'])['post'].first())
.reset_index())
code2:
df=(df[df['place']>1].groupby(['id', 'subject', 'class', 'hf']).post.transform('first')
.pipe(lambda x: df.assign(tpost = x))
.fillna(''))
orig df code:
data = {'id':['202006106','202006106','202006106','202007153','202007155','202009094',
'202009094','202009095','202010143','202010143','202010145'],
'subject':['B_Math','B_Math','B_Math','B_Math','B_Math','B_Math',
'B_Math','B_Math','B_Math','B_Math','B_Math'],
'class':['Class_4','Class_4','Class_4','Class_4','Class_4','Class_4',
'Class_4','Class_4','Class_4','Class_4','Class_4'],
'hf':[1,1,1,1,1,1,1,1,1,1,1],
'place':[4,6,7,4,6,2,8,4,2,7,1],
'post':[0.048,0.045,0.043,0.042,0.040,0.038,0.037,0.036,0.034,0.033,0.065]}
df = pd.DataFrame(data)
kindly advise. Many thanks
CodePudding user response:
Here is one approach. You can mask the values for place 1 and restore them after the groupby
transform
:
group = df.groupby(['id', 'subject', 'class', 'hf']).ngroup()
new = df['post'].mask(df['place'].eq(1)).groupby(group).transform('first')
df['tpost'] = new.mask(new.isna(), df['post'])
output:
id subject class hf place post tpost
0 202006106 B_Math Class_4 1 4 0.048 0.048
1 202006106 B_Math Class_4 1 6 0.045 0.048
2 202006106 B_Math Class_4 1 7 0.043 0.048
3 202007153 B_Math Class_4 1 4 0.042 0.042
4 202007155 B_Math Class_4 1 6 0.040 0.040
5 202009094 B_Math Class_4 1 2 0.038 0.038
6 202009094 B_Math Class_4 1 8 0.037 0.038
7 202009095 B_Math Class_4 1 4 0.036 0.036
8 202010143 B_Math Class_4 1 2 0.034 0.034
9 202010143 B_Math Class_4 1 7 0.033 0.034
10 202010145 B_Math Class_4 1 1 0.065 0.065