Convert column of lists to integer

Time:12-01

I am trying to convert a column to integers after encoding. The values are objects, so I first turn them into strings:

train_df["labels"] = train_df["labels"].astype(str).astype(int)

I am getting this error

invalid literal for int() with base 10: '[0, 1, 0, 0]'

An example of a row from the dataset is

text                        labels
[word1,word2,word3,word4]    [1,0,1,0]

CodePudding user response:

From the looks of it, your problem may be that the numbers stored as strings are actually floats. If that is the case, the line below should solve it:

train_df["labels"] = train_df["labels"].astype(str).astype(float).astype(int)

(In Python, int() cannot directly parse the string representation of a float such as "1.0"; the string has to be converted to float first.)
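To make the point about float strings concrete, here is a minimal sketch in plain Python. The helper name to_int is made up for illustration and is not part of pandas.

```python
def to_int(text):
    """Parse a numeric string such as "1" or "1.0" into an int.

    Goes through float first, so the fractional part is truncated.
    """
    return int(float(text))


# Calling int() directly on a float-looking string raises ValueError.
try:
    int("1.0")
    direct_ok = True
except ValueError:
    direct_ok = False

# Going through float first succeeds.
via_float = to_int("1.0")
```

This only helps when each string holds a single number. It does not help with a string that contains a whole list.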

From the error, I suspect that your strings actually include brackets and commas (which is not crystal clear from the question). If that's the case, you need to tell Python how to deal with them. For example, if each value in train_df["labels"] is a string like "[1,0,1,0]", then you can use the following:

train_df["labels"] = train_df["labels"].apply(lambda s: [int(label.strip()) for label in s[1:-1].split(",")])

# first, get rid of the brackets at both ends of each string,
# then split the string at commas and strip the spaces around each piece,
# finally, convert the values to int one by one and make a list out of them
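As an alternative to slicing and splitting by hand, the standard library's ast.literal_eval can parse a string that looks like a Python list. This is a different technique from the one above, sketched here on the assumption that every value is a well-formed list literal. The function name parse_labels is hypothetical.

```python
import ast


def parse_labels(text):
    """Turn a string such as "[1, 0, 1, 0]" into a list of ints."""
    # literal_eval only evaluates Python literals, never arbitrary code.
    values = ast.literal_eval(text)
    return [int(v) for v in values]


parsed = parse_labels("[1, 0, 1, 0]")
```

On a DataFrame this could be applied row by row, for example train_df["labels"].apply(parse_labels). Unlike the split-based version, it also copes with an empty list "[]".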

CodePudding user response:

It's because after train_df["labels"].astype(str), each element of the Series is the string form of a whole list, such as "[0, 1, 0, 0]", and int() can't parse that as a single integer.
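A small sketch that reproduces the situation. The two-row Series is made up for illustration and assumes the column holds real Python lists.

```python
import pandas as pd

# Hypothetical data: a Series whose elements are lists of ints.
labels = pd.Series([[0, 1, 0, 0], [1, 0, 1, 0]])

# astype(str) converts each whole list to its text form.
as_text = labels.astype(str)
first = as_text.iloc[0]

# int() then fails because the text is not a single number.
try:
    int(first)
    error = None
except ValueError as exc:
    error = str(exc)
```

Here first is the string "[0, 1, 0, 0]", and error holds the same "invalid literal for int()" message as in the question.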

If each element in train_df["labels"] is of type list, you can do:

train_df["labels"].apply(lambda x: [int(el) for el in x])

If it's of type str, you can do:

train_df["labels"].apply(lambda x: [int(el) for el in x.strip("[]").split(",")])

Presumably you want to train some model, but you can't use a pd.Series of lists to do that. You'll need to convert it into a DataFrame. I can't say how to do that without looking at more than one line of data.
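For what it's worth, one common way to do that conversion is sketched below. It assumes every list in the column has the same length. The second row and the label_0 ... label_3 column names are invented for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data shaped like the row shown in the question.
train_df = pd.DataFrame({
    "text": [["word1", "word2", "word3", "word4"],
             ["word5", "word6", "word7", "word8"]],
    "labels": [[1, 0, 1, 0], [0, 1, 0, 0]],
})

# Expand each list into its own row of integer columns,
# keeping the original index so the rows still line up.
label_df = pd.DataFrame(train_df["labels"].tolist(), index=train_df.index)
label_df.columns = [f"label_{i}" for i in label_df.columns]
```

If the lists have different lengths, the shorter rows would be padded with NaN and the columns would become floats, so that case needs separate handling.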
