I am trying to generate dummy variables from a string variable using the code below:
import pandas as pd
data = {
'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
'crops': ['[maize]', '[maize, cassava]', '[beans, cassava, potato]', '[beans, potato]', '[beans, cassava, maize, potato]', '[beans]', '[cassava, maize, potato]', '[beans, maize]', '[cassava, maize, potato]', '[cassava]', '[beans, cassava, potato]', '[maize, potato]', '[beans, maize, potato]', '[beans, cassava, maize, potato]', '[potato]', '[cassava, potato]', '[beans]', '[maize]', '[potato]', '[cassava]'],
}
df = pd.DataFrame(data)
df['crops'] = df['crops'].str.replace('[', '', regex=False)
df['crops'] = df['crops'].str.replace(']', '', regex=False)
res = df.join(df.pop('crops').str.get_dummies(','))
res
However, some of the resulting dummy columns seem to be repeated, and I don't know why.
CodePudding user response:
Just add a space after the comma in the separator you pass to str.get_dummies:
res = df.join(df.pop('crops').str.get_dummies(', '))
Without the space, splitting on ',' leaves a leading space on every item after the first, so ' maize' and 'maize' are treated as different values and each one gets its own column.
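To see the effect in isolation, here is a minimal sketch on a small made-up Series (the values in `s` and `messy` are illustrative, not the asker's data). It compares the two separators, and as an optional extra that goes beyond the one-character fix, it normalizes the whitespace around commas first, which may help if the spacing in the real data is inconsistent.

```python
import pandas as pd

# Illustrative data only: the same crops listed in different orders.
s = pd.Series(['maize, cassava', 'cassava, maize', 'maize'])

# Splitting on ',' keeps the leading space, so ' maize' and 'maize'
# become separate columns.
bad = s.str.get_dummies(',')

# Splitting on ', ' removes the space along with the comma.
good = s.str.get_dummies(', ')

print(len(bad.columns))    # 4 columns: two of them have a leading space
print(len(good.columns))   # 2 columns: cassava and maize

# Optional, assumed scenario: spacing around commas is inconsistent.
# Strip the brackets, collapse whitespace around commas, then split on ','.
messy = pd.Series(['[maize,cassava]', '[cassava,  maize]'])
clean = (
    messy.str.strip('[]')
         .str.replace(r'\s*,\s*', ',', regex=True)
         .str.get_dummies(',')
)
print(sorted(clean.columns))
```

If the data always uses exactly ', ' between items, as in the question, passing ', ' to str.get_dummies is enough and the normalization step is not needed.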