Values in a dataframe column will not change

Time:03-14

I have a data frame where I want to replace the values '<=50K' and '>50K' in the 'Salary' column with '0' and '1' respectively. I have tried the replace function, but it does not change anything. I have tried many other approaches and none of them work. I am trying to run a logistic regression on these columns, but the formula does not work because of the datatype. The real data set has over 20,000 rows.

Age  Workclass  fnlwgt  education  education-num  Salary
39   state-gov  455     Bachelors  13             <=50K
25   private    22      Masters    89             >50K

df['Salary']= df['Salary'].replace(['<=50K'],'0')
df['Salary']

This is the error I get when I try to use smf.logit(). See the code below. I don't understand why I get an error, because Age and education-num are both int64.

mod = smf.logit(formula = 'education-num ~ Age', data= dftrn)

resmod = modelAdm.fit()

ValueError: endog has evaluated to an array with multiple columns that has shape (26049, 16). This occurs when the variable converted to endog is non-numeric (e.g., bool or str).

CodePudding user response:

You can try this. For checking purposes I created a new column; to change the original column instead, replace new_salary with Salary.

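df['new_salary'] = df['Salary']  # copy, so the original column can still be compared
# .loc with a column label updates only that column, not every column of the matching rows
df.loc[df['new_salary'] == '<=50K', 'new_salary'] = 0
df.loc[df['new_salary'] == '>50K', 'new_salary'] = 1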

CodePudding user response:

Regarding the first question, you should use single square brackets on the left side of the assignment.

df['Salary']= df['Salary'].replace(['<=50K'],'0')
df['Salary']= df['Salary'].replace(['>50K'],'1')
df['Salary']
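
If replace still appears to do nothing, one possible cause is that the stored strings differ from the literals, for example through leading or trailing spaces. That is an assumption, not something shown in the question. A minimal sketch on made-up rows that strips whitespace, maps the labels to integers, and leaves Salary with an integer dtype:

```python
import pandas as pd

# Made-up sample rows; the stray spaces are an assumption for illustration.
df = pd.DataFrame({
    "Age": [39, 25, 52],
    "Salary": [" <=50K", " >50K", "<=50K "],
})

# Strip surrounding whitespace, then map each label to an integer code.
salary_codes = {"<=50K": 0, ">50K": 1}
df["Salary"] = df["Salary"].str.strip().map(salary_codes)

# map() yields NaN for labels missing from the dict, so check before casting.
assert df["Salary"].notna().all(), "unexpected Salary label"
df["Salary"] = df["Salary"].astype(int)

print(df["Salary"].tolist())
print(df["Salary"].dtype)
```

Printing df['Salary'].unique() before the replacement shows the exact strings stored in the column, which helps confirm whether the literals match.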

As for the second part of the question, you named the model mod but you are calling the fit function on modelAdm.
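
A minimal sketch of the fit call with one consistent variable name, on made-up data. It rests on two points beyond the naming fix: the formula parser reads the hyphen in education-num as an operator, so the column is wrapped in Q(), and logit expects a response between 0 and 1, so Salary is used as the outcome here. The data values and the choice of formula are assumptions for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up rows for illustration; Salary is already coded as 0/1.
dftrn = pd.DataFrame({
    "education-num": [9, 9, 10, 10, 13, 13, 14, 14],
    "Salary":        [0, 1, 0, 0, 1, 0, 1, 1],
})

# Q() quotes a column name that is not a valid Python identifier.
mod = smf.logit(formula="Salary ~ Q('education-num')", data=dftrn)

# Call fit() on the same object that was created above.
resmod = mod.fit(disp=0)  # disp=0 silences the optimizer output

print(resmod.params)
```

Renaming the column, for example to education_num, would avoid the need for Q() altogether.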

In any case, these are two different questions and should be asked in separate posts.
