Convert list to int inside a DataFrame

Time:09-17

I have a problem. The following column contains a list in each row, and inside that list there is an int. How can I convert that kind of list to an int value?

0       [1]
1       [0]
2       [1]
3       [0]
4       [0]
       ... 
9869    [1]
9870    [1]
9871    [1]
9872    [0]
9873    [0]
Name: predicted, Length: 9874, dtype: object

What I tried

df_testing['predicted'] = df_testing['predicted'].astype('str') 
df_testing['predicted'].replace('[','',inplace=True)
df_testing['predicted'].replace(']','',inplace=True)
df_testing.predicted = pd.to_numeric(df_testing.predicted, errors='coerce')

[OUT]
0      NaN
1      NaN
---

df_testing['predicted'] = df_testing['predicted'].apply(lambda x: list(map(int, x)))
[OUT]
ValueError: invalid literal for int() with base 10: '['
--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [45], in <cell line: 1>()
----> 1 df_testing['predicted'] = df_testing['predicted'].apply(lambda x: list(map(int, x)))

File ~\Anaconda3\lib\site-packages\pandas\core\series.py:4433, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4323 def apply(
   4324     self,
   4325     func: AggFuncType,
   (...)
   4328     **kwargs,
   4329 ) -> DataFrame | Series:
   4330     """
   4331     Invoke function on values of Series.
   4332 
   (...)
   4431     dtype: float64
   4432     """
-> 4433     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File ~\Anaconda3\lib\site-packages\pandas\core\apply.py:1082, in SeriesApply.apply(self)
   1078 if isinstance(self.f, str):
   1079     # if we are a string, try to dispatch
   1080     return self.apply_str()
-> 1082 return self.apply_standard()

File ~\Anaconda3\lib\site-packages\pandas\core\apply.py:1137, in SeriesApply.apply_standard(self)
   1131         values = obj.astype(object)._values
   1132         # error: Argument 2 to "map_infer" has incompatible type
   1133         # "Union[Callable[..., Any], str, List[Union[Callable[..., Any], str]],
   1134         # Dict[Hashable, Union[Union[Callable[..., Any], str],
   1135         # List[Union[Callable[..., Any], str]]]]]"; expected
   1136         # "Callable[[Any], Any]"
-> 1137         mapped = lib.map_infer(
   1138             values,
   1139             f,  # type: ignore[arg-type]
   1140             convert=self.convert_dtype,
   1141         )
   1143 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1144     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1145     #  See also GH#25959 regarding EA support
   1146     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~\Anaconda3\lib\site-packages\pandas\_libs\lib.pyx:2870, in pandas._libs.lib.map_infer()

Input In [45], in <lambda>(x)
----> 1 df_testing['predicted'] = df_testing['predicted'].apply(lambda x: list(map(int, x)))

ValueError: invalid literal for int() with base 10: '['
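The `'['` in the error hints that the cells may hold strings such as `"[1]"` rather than real Python lists, because both print the same way in an `object` column. The minimal sketch below uses hypothetical sample data to show how to check the element type and why `map(int, x)` fails on a string.

```python
import pandas as pd

# Hypothetical sample: the same values stored as real lists and as strings
as_lists = pd.Series([[1], [0], [1]], name="predicted")
as_strings = pd.Series(["[1]", "[0]", "[1]"], name="predicted")

# Both display as [1], [0], ... with dtype object, so check the element type
print(type(as_lists.iloc[0]).__name__)    # list
print(type(as_strings.iloc[0]).__name__)  # str

# map(int, x) iterates over x: a list yields its numbers,
# while the string "[1]" yields the characters "[", "1", "]"
lists_result = list(map(int, as_lists.iloc[0]))
print(lists_result)  # [1]

try:
    list(map(int, as_strings.iloc[0]))
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # invalid literal for int() with base 10: '['
```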

Answer:

Here is one more way to do it: since there is a single value inside the square brackets, treat it as a string, strip off the brackets, and convert it to int (assuming all values are integers).

df['predicted'] = df['predicted'].astype(str).str.strip('[]').astype(int)
df
    predicted
0       1
1       0
2       1
3       0
4       0
9869    1
9870    1
9871    1
9872    0
9873    0
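The NaN result in the question's first attempt likely comes from `Series.replace`, which by default only replaces cells whose whole value equals `'['`, so `'[1]'` is left untouched and `pd.to_numeric(..., errors='coerce')` then turns it into NaN. A minimal sketch, assuming the cells are already strings like `"[1]"` (the sample data is hypothetical), contrasting that with the substring-level `.str.replace`:

```python
import pandas as pd

# Hypothetical sample: cells already converted to strings such as "[1]"
s = pd.Series(["[1]", "[0]", "[1]"], name="predicted")

# Series.replace matches whole cell values by default, so "[1]" is unchanged
unchanged = s.replace("[", "")
print(unchanged.tolist())  # ['[1]', '[0]', '[1]']

# The .str accessor replaces substrings inside each cell instead;
# regex=False makes "[" and "]" literal characters, not regex syntax
cleaned = (
    s.str.replace("[", "", regex=False)
     .str.replace("]", "", regex=False)
)
numbers = pd.to_numeric(cleaned, errors="coerce")
print(numbers.tolist())  # [1, 0, 1]
```

Once the cells are strings, either this or `.str.strip('[]')` removes the brackets; `str.strip` treats its argument as a set of characters to trim from both ends.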

Answer:

If you are sure that every row contains a non-empty list, you can use a lambda function to take its first element:

df_testing['predicted'] = df_testing['predicted'].apply(lambda x: x[0])
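A minimal sketch of this on hypothetical sample data, assuming the cells are real lists (not strings). It also shows `.str[0]`, which pandas applies element-wise to list cells in an `object` column and which gives the same values here.

```python
import pandas as pd

# Hypothetical sample where every cell is a one-element list
df_testing = pd.DataFrame({"predicted": [[1], [0], [1], [0]]})

# Take the first (and only) element of each list
first = df_testing["predicted"].apply(lambda x: x[0])

# .str[0] indexes into each list cell as well
also_first = df_testing["predicted"].str[0]

print(first.tolist())       # [1, 0, 1, 0]
print(also_first.tolist())  # [1, 0, 1, 0]
```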

Otherwise, you can write a function that checks whether the list has any elements and, for example, takes their mean.

This assumes the list elements are numeric; if they are not, you can convert their type inside the function as well.
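A minimal sketch of such a function. The name `list_to_number` and its `cast` parameter are hypothetical, and returning NaN for an empty list is one possible choice, not the only one. Each element is converted with `cast` before averaging, so text elements such as `"1"` are handled too.

```python
import math
import pandas as pd

def list_to_number(values, cast=float):
    """Collapse a list cell to one number (hypothetical helper).

    Returns NaN for an empty list, otherwise the mean of the elements
    after converting each one with `cast` (e.g. float for text like "1").
    """
    if len(values) == 0:
        return math.nan
    numbers = [cast(v) for v in values]
    return sum(numbers) / len(numbers)

# Hypothetical sample: numeric lists, an empty list, and a text element
df_testing = pd.DataFrame({"predicted": [[1], [0], [], [1, 0], ["1"]]})
df_testing["predicted"] = df_testing["predicted"].apply(list_to_number)
print(df_testing["predicted"].tolist())  # [1.0, 0.0, nan, 0.5, 1.0]
```

Because the mean of a multi-element list can be fractional, the result is a float column; if every list is known to hold a single integer, `x[0]` keeps the values as ints.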
