I am trying to read a CSV file that contains a column, SpType, holding string values. Because of that column, my array ends up with dtype object, but I need it to be float. Here's the snippet:
data = pd.read_csv("/content/Star3642_balanced.csv")
X_orig = data[["Vmag", "Plx", "e_Plx", "B-V", "SpType", "Amag"]].to_numpy()
Here's what's giving me the error:
X = torch.tensor(X_orig, dtype=torch.float32)
The error reads "can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool."
I tried doing this after reading the csv file, but it didn't help:
data["SpType"] = data.SpType.astype(float)
Can someone please tell me what can be done about this?
CodePudding user response:
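Strings must be encoded as numeric values before the array can be converted to a tensor; astype(float) does not help because the SpType strings cannot be parsed as numbers. The easiest way is one-hot encoding with pandas (this creates many extra columns in this case, but a neural network should handle them without much trouble):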
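# dtype=float keeps the dummy columns numeric (recent pandas versions default to bool)
ohe = pd.get_dummies(data["SpType"], drop_first=True, dtype=float)
data[ohe.columns] = ohe
data = data.drop(["SpType"], axis=1)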
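To show the whole path from strings to a float array, here is a minimal sketch. It uses a small made-up DataFrame in place of the CSV (the rows and SpType values are invented for illustration) and assumes only pandas and NumPy:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV; values are invented for illustration.
data = pd.DataFrame({
    "Vmag": [5.99, 8.70, 5.77, 6.72],
    "Plx": [13.73, 2.31, 5.50, 5.26],
    "SpType": ["K5III", "B2", "K5III", "G5"],
})

# One-hot encode SpType; dtype=float avoids bool columns, which would
# turn the combined array back into dtype object.
ohe = pd.get_dummies(data["SpType"], drop_first=True, dtype=float)
data = data.drop(columns=["SpType"]).join(ohe)

# Every column is numeric now, so the array can be cast to float32.
X_orig = data.to_numpy(dtype=np.float32)
print(X_orig.dtype, X_orig.shape)
```

An array like this should be accepted by torch.tensor(X_orig, dtype=torch.float32). Note that in the question's code the column selection would have to list the new dummy columns instead of "SpType", since that column no longer exists after the drop.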
Alternatively, you can use the scikit-learn encoders or the category_encoders library. With more complex encodings, the encoder may need to be fit on the training set only and then applied to the test set separately, to avoid target leakage.
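As a rough sketch of the scikit-learn route, the example below fits a OneHotEncoder on a training frame and reuses it on a test frame. The two small DataFrames and their values are hypothetical, and this is only one possible setup, not necessarily the best one for this dataset:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical train/test frames; values are invented for illustration.
train = pd.DataFrame({"Vmag": [5.99, 8.70, 5.77], "SpType": ["K5III", "B2", "G5"]})
test = pd.DataFrame({"Vmag": [6.72, 7.10], "SpType": ["G5", "M0"]})  # "M0" is unseen

# Fit on the training set only; unseen test categories become all-zero rows.
enc = OneHotEncoder(handle_unknown="ignore", dtype=np.float32)
train_ohe = enc.fit_transform(train[["SpType"]]).toarray()
test_ohe = enc.transform(test[["SpType"]]).toarray()

# Stack the numeric column with the encoded columns into float32 arrays.
X_train = np.hstack([train[["Vmag"]].to_numpy(dtype=np.float32), train_ohe])
X_test = np.hstack([test[["Vmag"]].to_numpy(dtype=np.float32), test_ohe])
```

Fitting the encoder once and reusing it keeps the column layout identical between the two sets, which a separate pd.get_dummies call on each set would not guarantee.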