Take first numeric value from string array in pandas dataframe

I have columns in my pandas DataFrame in the following format, for example:

df['X']:

0      [0.8242424242424241, 1.511111111111111, 2.9191...
1      [1.236363636363636, 2.438383838383838, 3.09090...
2                [1.064646464646464, 2.5757575757575752]
3      [0.583838383838383, 1.373737373737373, 2.02626...
4      [0.7898989898989891, 1.751515151515151, 2.6444...
                             ...                        
135    [1.236363636363636, 1.751515151515151, 2.26666...
136    [1.202020202020202, 2.1292929292929292, 2.7818...
137    [0.583838383838383, 1.476767676767676, 3.15959...
138    [1.236363636363636, 2.61010101010101, 3.090909...
139    [1.339393939393939, 2.7818181818181813, 3.1252...
Name: X, Length: 140, dtype: object

where df['X'][0], for example, is the entire array stored as one string:

'[0.8242424242424241, 1.511111111111111, 2.919191919191919]'

Essentially, each entry is an array/vector stored as a single string (note that it is NOT just the individual numeric values that are strings, but the array as a whole).

I want to take just the first numeric value of each string array and put it in that cell of the column, replacing the string array. How can I do this?
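A minimal sketch that reproduces this format, assuming only the two rows whose values are shown in full above (rows 0 and 2 of the display); the other rows are truncated there, so they are left out. It shows why plain indexing does not help: the cell is one string, so `[0]` returns a character.

```python
import pandas as pd

# Rows 0 and 2 from the display above; each cell is ONE string, not a list
df = pd.DataFrame({'X': [
    '[0.8242424242424241, 1.511111111111111, 2.919191919191919]',
    '[1.064646464646464, 2.5757575757575752]',
]})

print(type(df['X'][0]))   # <class 'str'>
print(df['X'][0][0])      # prints: [  (a character, not the first number)
```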

Answer:

Use pd.eval to parse each string into a Python list:

df['X'] = pd.eval(df['X'])

# Setup: df = pd.DataFrame({'X': ['[0, 1, 2]', '[3, 4, 5]']})
>>> df
           X
0  [0, 1, 2]
1  [3, 4, 5]

# Before pd.eval
>>> df['X'][0]
'[0, 1, 2]'

>>> type(df['X'][0])
<class 'str'>

# After pd.eval
>>> df['X'][0]
[0, 1, 2]

>>> type(df['X'][0])
<class 'list'>
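pd.eval only turns each string into a list; the first value the question asks for still has to be selected. A minimal sketch of that last step, assuming the column already holds Python lists (as in the "After pd.eval" output); the lists are built by hand here so the snippet does not depend on pd.eval itself:

```python
import pandas as pd

# Assumed state after conversion: each cell is a real Python list
df = pd.DataFrame({'X': [[0, 1, 2], [3, 4, 5]]})

# .str[0] indexes into each list element-wise (equivalent to .str.get(0))
df['X'] = df['X'].str[0]
print(df['X'].tolist())   # [0, 3]
```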

Answer:

To convert each string representation of a list (str_lst) into a list, use ast.literal_eval. Then you just need to index the first element of each list, i.e., ast.literal_eval(str_lst)[0].

To apply this logic to each str_lst in column 'X', you can use Series.map.

import ast 

df['X'] = df['X'].map(lambda str_lst: ast.literal_eval(str_lst)[0])
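A self-contained run of the Series.map approach, using the two fully shown rows from the question as assumed sample data. ast.literal_eval only accepts Python literals (numbers, strings, lists, and so on), so unlike eval it will not execute arbitrary code that happens to be in the column:

```python
import ast

import pandas as pd

df = pd.DataFrame({'X': [
    '[0.8242424242424241, 1.511111111111111, 2.919191919191919]',
    '[1.064646464646464, 2.5757575757575752]',
]})

# Parse each string into a list of floats, then keep element 0
df['X'] = df['X'].map(lambda str_lst: ast.literal_eval(str_lst)[0])

print(df['X'].tolist())   # first value of each row
print(df['X'].dtype)      # float64
```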

Answer:

import ast

# Parse each string into a list, then take element 0 of each list
df['X'] = df['X'].apply(ast.literal_eval).str[0]
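One practical difference between the Series.map version and this one: indexing with [0] inside the lambda raises IndexError when a cell holds an empty list '[]', whereas .str[0] returns NaN for that row. A small sketch, assuming a hypothetical empty entry that does not appear in the question's data:

```python
import ast

import pandas as pd

# Hypothetical data: the second row is an empty array
s = pd.Series(['[0.5, 1.5]', '[]'])

parsed = s.apply(ast.literal_eval)   # Series of Python lists
first = parsed.str[0]                # NaN where the list is empty
print(first.tolist())                # 0.5, then NaN for the empty list

map_failed = False
try:
    s.map(lambda str_lst: ast.literal_eval(str_lst)[0])
except IndexError as exc:
    map_failed = True
    print('map + [0] fails:', exc)   # list index out of range
```

If empty arrays can occur in the real data, the .str[0] form avoids wrapping the lambda in error handling.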