How to encode a dataframe in Python from strings to integers?

I have a dataframe that consists of 2 columns, df[['c1','c2']]. Those columns contain only 3 unique string values: a, b and c. I would like to convert those values into 3 numbers to perform data analysis. I think it should be done with a map or a dictionary, but I keep getting errors.

CodePudding user response：

You can use

Or you can plot the frequency of each value as a pie chart, as follows:

import matplotlib.pyplot as plt

# One pie chart of value counts per column, side by side
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
df["c1"].value_counts().plot(kind="pie", ax=ax[0])
df["c2"].value_counts().plot(kind="pie", ax=ax[1])
plt.show()

Or, if you are working with seaborn, it is even easier, since countplot accepts the string columns directly and no conversion is needed.

import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
sns.countplot(x="c1", data=df, ax=ax[0])
sns.countplot(x="c2", data=df, ax=ax[1])
plt.show()

Or you can draw a scatter plot like this:

fig, ax = plt.subplots(1, 1, figsize=(10, 5))
sns.scatterplot(x="c2", y="c1", data=df, ax=ax)
plt.show()

With that being said, plotting won't make your data ready for a machine learning model, so you'll need to use OneHotEncoder or LabelEncoder from sklearn to convert it to a numeric form.
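For the one-hot option, here is a minimal sketch. It uses pandas' get_dummies rather than sklearn's OneHotEncoder, purely to keep the example short, and it assumes a small made-up dataframe: the column names come from the question, the values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names c1/c2 come from the question.
df = pd.DataFrame({"c1": ["a", "b", "c", "a"], "c2": ["b", "b", "a", "c"]})

# One indicator column per (column, value) pair, stored as 0/1 integers.
one_hot = pd.get_dummies(df, columns=["c1", "c2"], dtype=int)
print(one_hot.columns.tolist())
print(one_hot)
```

Each row ends up with exactly one 1 among the c1_* columns and one among the c2_* columns, so no artificial ordering between a, b and c is introduced.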

You can do it with sklearn as follows.

For example with LabelEncoder,

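from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# Each fit_transform call refits the encoder on that column's own sorted unique values
df["c1"] = le.fit_transform(df["c1"])
df["c2"] = le.fit_transform(df["c2"])
print(df)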

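This maps a, b and c to integers. LabelEncoder numbers the sorted distinct values of each column starting from 0, so a column that contains all three values is encoded as 0, 1 and 2.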
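The dictionary approach mentioned in the question also works, and it keeps the codes identical across both columns even if one column happens to lack a value. A minimal sketch, assuming a small made-up dataframe (the values and the `mapping` dictionary are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data; only the column names c1/c2 come from the question.
df = pd.DataFrame({"c1": ["a", "b", "c", "a"], "c2": ["b", "b", "a", "c"]})

# Hypothetical mapping; any distinct integers would do.
mapping = {"a": 0, "b": 1, "c": 2}

# Series.map looks each value up in the dictionary;
# values missing from the dictionary would become NaN.
encoded = df[["c1", "c2"]].apply(lambda col: col.map(mapping))
print(encoded)
```

Because the mapping is written out explicitly, it is easy to check which integer stands for which string, at the cost of having to list every value by hand.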