Home > Software design >  creating new column using first value within group and conditions(pandas, python, groupby)
creating new column using first value within group and conditions(pandas, python, groupby)

Time:10-29

i'm trying to create a new column(tpost) where the values are from column(post) based on the conditions of the function. However, a value in the new column came back as empty.

goal is using the first value of the same group with the condition of excluding place=1, however, still retaining the value from place=1 in the column(tpost)

my attempt with the missing value:

╔═══════════╦═════════╦═════════╦═══════╦════╦═══════╦═══════╗
║    id     ║ subject ║  class  ║ place ║ hf ║ post  ║ tpost ║
╠═══════════╬═════════╬═════════╬═══════╬════╬═══════╬═══════╣
║ 202006106 ║ B_Math  ║ Class_4 ║     4 ║  1 ║ 0.048 ║ 0.048 ║
║ 202006106 ║ B_Math  ║ Class_4 ║     6 ║  1 ║ 0.045 ║ 0.048 ║
║ 202006106 ║ B_Math  ║ Class_4 ║     7 ║  1 ║ 0.043 ║ 0.048 ║
║ 202007153 ║ B_Math  ║ Class_4 ║     4 ║  1 ║ 0.042 ║ 0.042 ║
║ 202007155 ║ B_Math  ║ Class_4 ║     6 ║  1 ║ 0.040 ║ 0.040 ║
║ 202009094 ║ B_Math  ║ Class_4 ║     2 ║  1 ║ 0.038 ║ 0.038 ║
║ 202009094 ║ B_Math  ║ Class_4 ║     8 ║  1 ║ 0.037 ║ 0.038 ║
║ 202009095 ║ B_Math  ║ Class_4 ║     4 ║  1 ║ 0.036 ║ 0.036 ║
║ 202010143 ║ B_Math  ║ Class_4 ║     2 ║  1 ║ 0.034 ║ 0.034 ║
║ 202010143 ║ B_Math  ║ Class_4 ║     7 ║  1 ║ 0.033 ║ 0.034 ║
║ 202010145 ║ B_Math  ║ Class_4 ║     1 ║  1 ║ 0.065 ║       ║
╚═══════════╩═════════╩═════════╩═══════╩════╩═══════╩═══════╝

correct result:

╔═══════════╦═════════╦═════════╦═══════╦════╦═══════╦═══════╗
║    id     ║ subject ║  class  ║ place ║ hf ║ post  ║ tpost ║
╠═══════════╬═════════╬═════════╬═══════╬════╬═══════╬═══════╣
║ 202006106 ║ B_Math  ║ Class_4 ║     4 ║  1 ║ 0.048 ║ 0.048 ║
║ 202006106 ║ B_Math  ║ Class_4 ║     6 ║  1 ║ 0.045 ║ 0.048 ║
║ 202006106 ║ B_Math  ║ Class_4 ║     7 ║  1 ║ 0.043 ║ 0.048 ║
║ 202007153 ║ B_Math  ║ Class_4 ║     4 ║  1 ║ 0.042 ║ 0.042 ║
║ 202007155 ║ B_Math  ║ Class_4 ║     6 ║  1 ║ 0.040 ║ 0.040 ║
║ 202009094 ║ B_Math  ║ Class_4 ║     2 ║  1 ║ 0.038 ║ 0.038 ║
║ 202009094 ║ B_Math  ║ Class_4 ║     8 ║  1 ║ 0.037 ║ 0.038 ║
║ 202009095 ║ B_Math  ║ Class_4 ║     4 ║  1 ║ 0.036 ║ 0.036 ║
║ 202010143 ║ B_Math  ║ Class_4 ║     2 ║  1 ║ 0.034 ║ 0.034 ║
║ 202010143 ║ B_Math  ║ Class_4 ║     7 ║  1 ║ 0.033 ║ 0.034 ║
║ 202010145 ║ B_Math  ║ Class_4 ║     1 ║  1 ║ 0.065 ║ 0.065 ║
╚═══════════╩═════════╩═════════╩═══════╩════╩═══════╩═══════╝

here are the 2 codes i've used:

code1:
df=(df.set_index(['id', 'subject', 'class', 'hf'])
.assign(tpost=df[df['place']>1].groupby(['id', 'subject', 'class', 'hf'])['post'].first())
.reset_index())

code2:    
df=(df[df['place']>1].groupby(['id', 'subject', 'class', 'hf']).post.transform('first')
.pipe(lambda x: df.assign(tpost = x))
.fillna(''))
orig df code:
data = {'id':['202006106','202006106','202006106','202007153','202007155','202009094',
               '202009094','202009095','202010143','202010143','202010145'],
        'subject':['B_Math','B_Math','B_Math','B_Math','B_Math','B_Math',
               'B_Math','B_Math','B_Math','B_Math','B_Math'],
        'class':['Class_4','Class_4','Class_4','Class_4','Class_4','Class_4',
               'Class_4','Class_4','Class_4','Class_4','Class_4'],
        'hf':[1,1,1,1,1,1,1,1,1,1,1],
        'place':[4,6,7,4,6,2,8,4,2,7,1],
        'post':[0.048,0.045,0.043,0.042,0.040,0.038,0.037,0.036,0.034,0.033,0.065]}

df = pd.DataFrame(data)

kindly advise. Many thanks

CodePudding user response:

Here is one approach. You can mask the values for place 1 and restore them after the groupby transform:

group = df.groupby(['id', 'subject', 'class', 'hf']).ngroup()
new = df['post'].mask(df['place'].eq(1)).groupby(group).transform('first')
df['tpost'] = new.mask(new.isna(), df['post'])

output:

           id subject    class  hf  place   post  tpost
0   202006106  B_Math  Class_4   1      4  0.048  0.048
1   202006106  B_Math  Class_4   1      6  0.045  0.048
2   202006106  B_Math  Class_4   1      7  0.043  0.048
3   202007153  B_Math  Class_4   1      4  0.042  0.042
4   202007155  B_Math  Class_4   1      6  0.040  0.040
5   202009094  B_Math  Class_4   1      2  0.038  0.038
6   202009094  B_Math  Class_4   1      8  0.037  0.038
7   202009095  B_Math  Class_4   1      4  0.036  0.036
8   202010143  B_Math  Class_4   1      2  0.034  0.034
9   202010143  B_Math  Class_4   1      7  0.033  0.034
10  202010145  B_Math  Class_4   1      1  0.065  0.065
  • Related