I have a DataFrame and I want to predict each user's Income using linear regression. My score is very poor, and I think the cause is the programming languages column: I encoded all of the data, but this approach does not work well for that column. How can I encode the programming languages column better?
CodePudding user response:
You can create one additional column per programming language. Each column is boolean: it says whether the user uses that programming language or not.
One way to do this:
# One boolean column per language: True if the user's entry contains that language
df['Python'] = df['The_programming_languages_you_use'].apply(lambda languages: 'Python' in languages)
df['Go'] = df['The_programming_languages_you_use'].apply(lambda languages: 'Go' in languages)
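If there are many languages, writing one line per language gets tedious. Here is a minimal sketch of the same idea that builds all the indicator columns at once with pandas' `Series.str.get_dummies`. It assumes each cell is a single string with the languages separated by ", "; the sample rows and the separator are made up for illustration, so adjust them to your data. Splitting on the separator also avoids accidental substring matches when the cell is a string (for example, `'C' in 'C++'` is True).

```python
import pandas as pd

# Hypothetical sample data; the ", " separator and the values are assumptions.
df = pd.DataFrame({
    "The_programming_languages_you_use": ["Python, Go", "Java", "Python, Java, C"],
    "Income": [5000, 4200, 6100],
})

# One 0/1 indicator column per distinct language found in the column.
language_flags = df["The_programming_languages_you_use"].str.get_dummies(sep=", ")

# Replace the raw text column with the indicator columns.
encoded = pd.concat(
    [df.drop(columns="The_programming_languages_you_use"), language_flags],
    axis=1,
)
print(encoded)
```

The resulting 0/1 columns can be passed to the linear regression together with the other features.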