I'm trying to convert my encoded labels to integers. The column has dtype object, so I first cast it to strings:
train_df["labels"] = train_df["labels"].astype(str).astype(int)
I am getting this error:
invalid literal for int() with base 10: '[0, 1, 0, 0]'
An example of a row from the dataset is
text labels
[word1,word2,word3,word4] [1,0,1,0]
CodePudding user response:
At first glance, the problem may be that the numbers are stored as strings that represent floats. If so, the line below should solve it:
train_df["labels"] = train_df["labels"].astype(str).astype(float).astype(int)
(In Python, int() cannot parse the string representation of a float such as "1.0", so you have to convert to float first.)
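A minimal sketch of that point in plain Python, without pandas. The helper name to_int is hypothetical and only used for illustration; note that going through float truncates any fractional part.

```python
def to_int(text):
    """Convert a numeric string such as "1" or "1.0" to int by going through float."""
    return int(float(text))

# Calling int() directly on a float-like string raises ValueError.
try:
    int("1.0")
    direct_ok = True
except ValueError:
    direct_ok = False

print(direct_ok)      # False
print(to_int("1.0"))  # 1
```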
From the error, however, I suspect that your strings actually include the brackets and commas (which is not crystal clear from the question). If that's the case, you need to tell Python how to deal with them. For example, if each value in train_df["labels"] is a string like "[1,0,1,0]", you can use the following:
train_df["labels"] = train_df["labels"].apply(
    lambda s: [int(label.strip()) for label in s[1:-1].split(",")]
)
# s[1:-1] first gets rid of the brackets in the string,
# split(",") then splits the string at the commas and strip() removes the spaces,
# finally, int() converts the values one by one and the results are collected in a list
CodePudding user response:
It's because after train_df["labels"].astype(str), each element of the Series is the string form of a whole list, e.g. '[0, 1, 0, 0]', and a string like that can't be converted to type int.
If each element in train_df["labels"] is of type list, you can do:
train_df["labels"].apply(lambda x: [int(el) for el in x])
If it's of type str, you can do:
train_df["labels"].apply(lambda x: [int(el) for el in x.strip("[]").split(",")])
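If you are not sure which of the two cases you have, here is a rough sketch that handles both. The sample data and the helper name parse_labels are made up for illustration, and it uses ast.literal_eval from the standard library instead of the strip/split approach above, assuming every string is a valid Python list literal.

```python
import ast

import pandas as pd

# Made-up sample data: labels stored as strings, with and without spaces.
train_df = pd.DataFrame({"labels": ["[1,0,1,0]", "[0, 1, 0, 0]"]})

def parse_labels(value):
    """Return a list of ints from either a list or its string representation."""
    if isinstance(value, str):
        # literal_eval safely parses a Python literal such as "[1, 0, 1, 0]".
        value = ast.literal_eval(value)
    return [int(el) for el in value]

train_df["labels"] = train_df["labels"].apply(parse_labels)
print(train_df["labels"].tolist())  # [[1, 0, 1, 0], [0, 1, 0, 0]]
```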
Presumably you want to train some model, but you can't use a pd.Series of lists for that. You'll need to convert it into a DataFrame. I can't say exactly how to do that without looking at more than one row of data.
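As a rough idea only: if every row's list has the same length (an assumption, since only one row was shown), something like the following would spread the lists over one column per position. The sample data and the label_0, label_1, ... column names are invented for illustration.

```python
import pandas as pd

# Made-up sample: a Series whose elements are equal-length lists of ints.
labels = pd.Series([[1, 0, 1, 0], [0, 1, 0, 0]], name="labels")

# Build one column per list position, keeping the original index.
label_df = pd.DataFrame(labels.tolist(), index=labels.index)
label_df.columns = [f"label_{i}" for i in range(label_df.shape[1])]

print(label_df)
```

If the lists have different lengths, the shorter rows get padded with NaN, so you would have to decide how to handle that first.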