In Python, I am trying to create a new column (degree) in a dataframe and set its value using if logic based on two other columns in the same dataframe (whether the value in one or both of these columns is null for a given row). For each row, the new column should take the value of one of these two columns, depending on which of them is null.
I have tried the code below, which gives me the following error message:
KeyError: 'degree'
The code is:
for i in basicdataframe.index:
    if pd.isnull(basicdataframe['section_degree'][i]) and pd.isnull(basicdataframe['model_degree'][i]):
        basicdataframe['degree'][i] = basicdataframe['model_degree'][i]
    elif pd.notnull(basicdataframe['section_degree'][i]) and pd.isnull(basicdataframe['model_degree'][i]):
        basicdataframe['degree'][i] = basicdataframe['section_degree'][i]
    elif pd.isnull(basicdataframe['section_degree'][i]) and pd.notnull(basicdataframe['model_degree'][i]):
        basicdataframe['degree'][i] = basicdataframe['model_degree'][i]
    elif pd.notnull(basicdataframe['section_degree'][i]) and pd.notnull(basicdataframe['model_degree'][i]):
        basicdataframe['degree'][i] = basicdataframe['model_degree'][i]
Does anybody know how to achieve this?
CodePudding user response:
The error occurs because you are trying to assign values into a column that does not exist yet: basicdataframe['degree'] has to look the column up before the assignment can happen, and that lookup raises the KeyError.
Since degree is a new column, it makes sense to add it first with some default value:
basicdataframe['degree'] = ''
This would set an empty string for all rows of the dataframe for this column.
After that, you can set the values.
P.S. Your code is also likely to trigger this warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
To fix that, see https://stackoverflow.com/a/20627316/1388513
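As a minimal sketch of the fix described above: create the column first, then write each cell with .loc[row, column] instead of chained indexing, which also avoids the warning. The sample data is made up for illustration (only the column names come from the question), and NaN is used as the default instead of an empty string, on the assumption that the degree values are numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
basicdataframe = pd.DataFrame({
    "section_degree": [1.0, 2.0, np.nan, np.nan],
    "model_degree": [np.nan, np.nan, np.nan, 3.0],
})

# Create the column before assigning into it.
# Assumption: NaN as the default keeps the column numeric.
basicdataframe["degree"] = np.nan

for i in basicdataframe.index:
    section = basicdataframe.loc[i, "section_degree"]
    model = basicdataframe.loc[i, "model_degree"]
    # Same outcome as the four branches in the question:
    # use section_degree only when model_degree is null and section_degree is not.
    if pd.isnull(model) and pd.notnull(section):
        basicdataframe.loc[i, "degree"] = section
    else:
        basicdataframe.loc[i, "degree"] = model
```

The two-branch form works because three of the four original branches assign model_degree.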
CodePudding user response:
Let's say you have a pandas DataFrame like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(data={
    "section_degree": [1, 2, np.nan, np.nan],
    "model_degree": [np.nan, np.nan, np.nan, 3]
})
You can define a function that will be applied to each row of the DataFrame:
def define_degree(x):
    # x is a single row; return the value that should go into "degree"
    if pd.isnull(x["section_degree"]) and pd.isnull(x["model_degree"]):
        return x["model_degree"]
    elif pd.notnull(x["section_degree"]) and pd.isnull(x["model_degree"]):
        return x["section_degree"]
    elif pd.isnull(x["section_degree"]) and pd.notnull(x["model_degree"]):
        return x["model_degree"]
    elif pd.notnull(x["section_degree"]) and pd.notnull(x["model_degree"]):
        return x["model_degree"]
df["degree"] = df.apply(define_degree, axis=1)
df
# output
   section_degree  model_degree  degree
0             1.0           NaN     1.0
1             2.0           NaN     2.0
2             NaN           NaN     NaN
3             NaN           3.0     3.0
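As a possible alternative to apply, the same rule ("use model_degree when it is present, otherwise fall back to section_degree") can be sketched without a row-wise function by filling the nulls of one column from the other. This is a swapped-in technique rather than what the answer above does, and it assumes the rule really is the one encoded by the four branches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    "section_degree": [1, 2, np.nan, np.nan],
    "model_degree": [np.nan, np.nan, np.nan, 3],
})

# Take model_degree where it is not null; otherwise use section_degree.
# Rows where both are null stay NaN.
df["degree"] = df["model_degree"].fillna(df["section_degree"])
```

On the sample frame this gives the same degree column as define_degree, and it operates on whole columns rather than one row at a time.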