I have a dataframe with two columns, df[['c1','c2']]. Those columns contain only 3 unique string values: a, b and c. I would like to convert those values into 3 numbers so I can perform data analysis. I think it should be done with a map or a dictionary, but I keep getting errors.
CodePudding user response:
If you only need to visualize the data, you can plot the frequency of each value as pie charts without converting anything:
import matplotlib.pyplot as plt

# One pie chart per column, side by side
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
df["c1"].value_counts().plot(kind="pie", ax=ax[0])
df["c2"].value_counts().plot(kind="pie", ax=ax[1])
plt.show()
Or, if you are working with seaborn, countplot makes this easier because it counts the string categories directly, with no conversion involved at all:
import matplotlib.pyplot as plt
import seaborn as sns

# One bar chart of value counts per column
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
sns.countplot(x="c1", data=df, ax=ax[0])
sns.countplot(x="c2", data=df, ax=ax[1])
plt.show()
Or you can draw a scatter plot of one column against the other:
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
sns.scatterplot(x="c2", y="c1", data=df, ax=ax)
plt.show()
With that being said, none of this makes your data ready for a machine learning model. For that you'll need to use OneHotEncoder or LabelEncoder from sklearn to convert the strings to a numeric form.
You can do it with sklearn as follows, for example with LabelEncoder:
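from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# fit_transform refits the encoder on each call, so each column gets its own mapping
df["c1"] = le.fit_transform(df["c1"])
df["c2"] = le.fit_transform(df["c2"])
print(df)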
This maps each of a, b and c to an integer (LabelEncoder assigns the codes in sorted order), and the result will be:
   c1  c2
0   0   0
1   0   1
2   0   0
3   1   0
4   2   0
5   2   1
6   2   1
7   2   2
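Since the question mentions a map or a dictionary, here is a minimal sketch of that route in plain pandas. The sample data is an assumption, reconstructed so that it reproduces the output above (the original dataframe is not shown), and the numbers in `mapping` are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen to reproduce the encoded output shown above.
df = pd.DataFrame({
    "c1": ["a", "a", "a", "b", "c", "c", "c", "c"],
    "c2": ["a", "b", "a", "a", "a", "b", "b", "c"],
})

# Explicit string-to-number mapping; pick whatever numbers suit the analysis.
mapping = {"a": 0, "b": 1, "c": 2}

# Series.map looks each value up in the dict; apply runs it on both columns.
encoded = df[["c1", "c2"]].apply(lambda col: col.map(mapping))
print(encoded)
```

Unlike LabelEncoder, this lets you choose the numbers yourself and guarantees the same mapping for both columns. Any value missing from the dictionary comes back as NaN, which is one possible source of errors if the data contains stray values.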