Assigning new column based on other columns in Python

Time:10-26

In Python I am trying to create a new column (degree) in a dataframe and set its value with if logic based on two other columns, namely whether one or both of those columns are null in a given row. For each row, the new column should take the value of one of these two columns, depending on which of them contains a null value.

I have tried the below code, which gives me the following error message:

KeyError: 'degree'

The code is:

for i in basicdataframe.index:
    if pd.isnull(basicdataframe['section_degree'][i]) and pd.isnull(basicdataframe['model_degree'][i]):
        basicdataframe['degree'][i] = basicdataframe['model_degree'][i]
    elif pd.notnull(basicdataframe['section_degree'][i]) and pd.isnull(basicdataframe['model_degree'][i]):
        basicdataframe['degree'][i] = basicdataframe['section_degree'][i]
    elif pd.isnull(basicdataframe['section_degree'][i]) and pd.notnull(basicdataframe['model_degree'][i]):
        basicdataframe['degree'][i] = basicdataframe['model_degree'][i]
    elif pd.notnull(basicdataframe['section_degree'][i]) and pd.notnull(basicdataframe['model_degree'][i]):
        basicdataframe['degree'][i] = basicdataframe['model_degree'][i]

Does anybody know how to achieve this?

Answer:

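The error occurs because you are trying to assign values to a column that does not exist yet: basicdataframe['degree'][i] = ... first reads the column basicdataframe['degree'], and that read fails before anything is assigned.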
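A minimal reproduction, using a hypothetical two-row frame df as a stand-in for basicdataframe. It shows that the KeyError comes from the column lookup on the left-hand side, not from the value being assigned.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for basicdataframe
df = pd.DataFrame({"section_degree": [1.0, np.nan], "model_degree": [np.nan, 3.0]})

error = None
try:
    # df["degree"] is evaluated first as a plain column lookup; the column
    # does not exist yet, so pandas raises KeyError before any assignment.
    df["degree"][0] = df["section_degree"][0]
except KeyError as exc:
    error = exc
    print("KeyError:", exc)  # prints: KeyError: 'degree'
```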

Since degree is a new column, it makes sense to create it first with a default value:

basicdataframe['degree'] = ''

This sets every row of the new column to an empty string.
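For illustration, on a hypothetical three-row frame: assigning a scalar to a new column label broadcasts it to every row. The np.nan default shown second is an alternative to the answer's empty string, not part of it; it keeps the new column numeric (float64), like section_degree and model_degree.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame
df = pd.DataFrame({"section_degree": [1.0, 2.0, np.nan]})

# A scalar on the right-hand side is broadcast to every row.
df["degree"] = ""
print(df["degree"].tolist())  # ['', '', '']

# Alternative default: NaN keeps the new column numeric.
df["degree_nan"] = np.nan
print(df["degree_nan"].dtype)  # float64
```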

After that, the loop can fill in the values.

P.S. Because the loop uses chained indexing (basicdataframe['degree'][i] = ...), it is likely to trigger SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.

To fix that, see https://stackoverflow.com/a/20627316/1388513
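One way to avoid both the KeyError and the chained-indexing warning, sketched on a hypothetical df standing in for basicdataframe: create the column with a NaN default, then write each value through a single .loc[row, column] call. The question's four branches are condensed into one equivalent test here, since every branch except the second returns model_degree.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for basicdataframe
df = pd.DataFrame({
    "section_degree": [1, 2, np.nan, np.nan, 5],
    "model_degree": [np.nan, np.nan, np.nan, 3, 6],
})

# Create the column first so it exists before any row is written.
df["degree"] = np.nan

for i in df.index:
    section = df.loc[i, "section_degree"]
    model = df.loc[i, "model_degree"]
    # The question's branches reduce to: use model_degree unless it is null,
    # in which case fall back to section_degree (which may be NaN as well).
    value = section if pd.isnull(model) else model
    # A single .loc[row, column] call writes into df itself,
    # instead of the chained df["degree"][i] form.
    df.loc[i, "degree"] = value

print(df)
```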

Answer:

Suppose you have a pandas DataFrame like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(data={
    "section_degree": [1, 2, np.nan, np.nan], 
    "model_degree": [np.nan, np.nan, np.nan, 3]
})

You can define a function and apply it to each row of the DataFrame:

def define_degree(x):
    # x is one row of the DataFrame (apply is called with axis=1 below)
    if pd.isnull(x["section_degree"]) and pd.isnull(x["model_degree"]):
        return x["model_degree"]  # both missing: the result is NaN
    elif pd.notnull(x["section_degree"]) and pd.isnull(x["model_degree"]):
        return x["section_degree"]
    elif pd.isnull(x["section_degree"]) and pd.notnull(x["model_degree"]):
        return x["model_degree"]
    elif pd.notnull(x["section_degree"]) and pd.notnull(x["model_degree"]):
        return x["model_degree"]

# axis=1 passes each row, not each column, to define_degree
df["degree"] = df.apply(define_degree, axis=1)

df

# output

    section_degree  model_degree    degree
0   1.0             NaN             1.0
1   2.0             NaN             2.0
2   NaN             NaN             NaN
3   NaN             3.0             3.0
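Since every branch of define_degree except the second returns model_degree, the function appears to reduce to "take model_degree and fill its gaps from section_degree". A vectorized sketch of that observation using Series.fillna (a swapped-in technique, not what the answer above uses), checked against a condensed row-wise version on the same sample data. df["model_degree"].combine_first(df["section_degree"]) should give the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    "section_degree": [1, 2, np.nan, np.nan],
    "model_degree": [np.nan, np.nan, np.nan, 3],
})

# Vectorized: keep model_degree where present, otherwise use section_degree.
df["degree"] = df["model_degree"].fillna(df["section_degree"])

# Condensed row-wise version of define_degree, for comparison.
rowwise = df.apply(
    lambda x: x["section_degree"] if pd.isnull(x["model_degree"]) else x["model_degree"],
    axis=1,
)

print(df)
print(df["degree"].equals(rowwise))  # True
```

The vectorized form avoids calling a Python function once per row, which usually matters on large frames.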