How do I create a function to add a new column to a pandas DataFrame based on conditions? (error)

Time:07-21

I have been trying to create a new column in a dataset, but it has not been working.

df2 = pd.DataFrame([[1, 'born'], [2, '8 a 14'], [3,'born'], [4,'14 a 21'], [8,'0 a 7'], [10,'die'], [7,'lost']], columns = ["Pen",'Result'])
def myFunc(record):
    for i in df['Result']:
        if (df['Result']=='born').any():
            return 'eclosion'
        elif (df['Result']=='1 a 7').any():
            return 'early'
        elif (df['Result']=='8 a 14').any():
            return 'mediun'
        elif (df['Result']=='15 a 21').any():
            return 'late'
df['Final'] = df.apply(myFunc, axis=1)
df

This is the result:

[Image: screenshot of the resulting DataFrame; not available]

CodePudding user response:

First, if your goal is simply to map each value to a label, use:

d = {'born': 'eclosion', '1 a 7': 'early', '8 a 14':'mediun', '15 a 21': 'late'}

df2['Final'] = df2['Result'].map(d)
# or to keep original values on no match:
df2['Final2'] = df2['Result'].map(d).fillna(df2['Result'])

output:

   Pen   Result     Final    Final2
0    1     born  eclosion  eclosion
1    2   8 a 14    mediun    mediun
2    3     born  eclosion  eclosion
3    4  14 a 21       NaN   14 a 21
4    8    0 a 7       NaN     0 a 7
5   10      die       NaN       die
6    7     lost       NaN      lost

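If you want the output shown in the question (the same label on every row), find the first key of d, in the desired order, that occurs in Result and assign its mapped value to the whole column: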

d = {'born': 'eclosion', '1 a 7': 'early', '8 a 14':'mediun', '15 a 21': 'late'}
idx = (df2.drop_duplicates('Result').set_index('Result')
          .reindex(list(d)).first_valid_index()
       )

df2['Final'] = d.get(idx, None)

output:

   Pen   Result     Final
0    1     born  eclosion
1    2   8 a 14  eclosion
2    3     born  eclosion
3    4  14 a 21  eclosion
4    8    0 a 7  eclosion
5   10      die  eclosion
6    7     lost  eclosion
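
The reindex / first_valid_index chain above can be hard to read at a glance. As a rough sketch of the same idea (not the answerer's code), the first matching key can also be found with a plain loop over the dictionary, relying on the fact that Python dicts keep insertion order. The names present and first_key are introduced here only for illustration.

```python
import pandas as pd

df2 = pd.DataFrame(
    [[1, 'born'], [2, '8 a 14'], [3, 'born'], [4, '14 a 21'],
     [8, '0 a 7'], [10, 'die'], [7, 'lost']],
    columns=['Pen', 'Result'],
)

# Keys are listed in priority order; dicts preserve insertion order.
d = {'born': 'eclosion', '1 a 7': 'early', '8 a 14': 'mediun', '15 a 21': 'late'}

# Values that actually occur in the column (hypothetical helper name).
present = set(df2['Result'])

# First key of d, in order, that occurs in Result; None if there is no match.
first_key = next((k for k in d if k in present), None)

# Assign the same label to every row, as in the output above.
df2['Final'] = d.get(first_key)
```

If none of the keys occurs in Result, first_key is None and the whole column is filled with None.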

CodePudding user response:

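The problem in your code is that the Result column contains 'born', so (df['Result']=='born').any() is always True: it checks the whole column rather than the current row. The function therefore returns 'eclosion' for every row and never reaches the elif branches.

You can use np.select instead: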

df['Final'] = np.select(
    [df['Result']=='born', df['Result']=='1 a 7',
     df['Result']=='8 a 14', df['Result']=='15 a 21'],
    ['eclosion', 'early', 'mediun', 'late'],
    df['Result']
)
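
If you would rather keep a function with apply, a minimal sketch of a row-wise version could look like the following. It assumes the DataFrame is named df and reuses the labels from the question (spelling unchanged); my_func and value are names chosen here for illustration. With axis=1, apply passes one row at a time, so the function only needs to look at that row's own Result.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'born'], [2, '8 a 14'], [3, 'born'], [4, '14 a 21'],
     [8, '0 a 7'], [10, 'die'], [7, 'lost']],
    columns=['Pen', 'Result'],
)

def my_func(record):
    # `record` is a single row (a Series); test only its own value.
    value = record['Result']
    if value == 'born':
        return 'eclosion'
    elif value == '1 a 7':
        return 'early'
    elif value == '8 a 14':
        return 'mediun'
    elif value == '15 a 21':
        return 'late'
    # No condition matched: keep the original value,
    # mirroring the default passed to np.select above.
    return value

df['Final'] = df.apply(my_func, axis=1)
```

On large frames the vectorized np.select (or map) approach is usually faster, since apply with axis=1 calls the Python function once per row.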

CodePudding user response:

The entire code works perfectly for me once you change df2 to df in the first line so that it matches the rest of the code.
