Each cell in a dataframe contains a list of values - how do I select the first value?

I have a dataframe and I want to select the first value in the list of each cell.

I have tried:

for i in clean_columns:
    print(clean_columns[i][:][0][0])

But it only selects from the first row. How do I select the first value from every row, so that I'm left with a dataframe minus the values in parentheses? Thanks

CodePudding user response：

You can use applymap (deprecated since pandas 2.1 in favor of DataFrame.map, which takes the same function):

df.applymap(lambda x: x[0])

or stack the frame into one long Series, take the first element with the str accessor, then unstack:

df.stack().str[0].unstack()

or apply the str accessor column by column:

df.apply(lambda c: c.str[0])

Example:

# input
df = pd.DataFrame([[[0, 1] for _ in range(3)] for _ in range(3)])
#         0       1       2
# 0  [0, 1]  [0, 1]  [0, 1]
# 1  [0, 1]  [0, 1]  [0, 1]
# 2  [0, 1]  [0, 1]  [0, 1]

# output
   0  1  2
0  0  0  0
1  0  0  0
2  0  0  0
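
For anyone who wants to run the three variants side by side, here is a minimal sketch on the same toy frame. The names first_a, first_b and first_c are only illustrative, and the hasattr check is an assumption about which pandas version you have installed (DataFrame.map exists from 2.1 onward).

```python
import pandas as pd

# Same toy frame as in the example: every cell holds the list [0, 1].
df = pd.DataFrame([[[0, 1] for _ in range(3)] for _ in range(3)])

# 1) Element-wise: DataFrame.map on pandas >= 2.1, applymap on older versions.
if hasattr(pd.DataFrame, "map"):
    first_a = df.map(lambda x: x[0])
else:
    first_a = df.applymap(lambda x: x[0])

# 2) Reshape to one long Series, take element 0 of each list, reshape back.
first_b = df.stack().str[0].unstack()

# 3) Column by column with the .str accessor.
first_c = df.apply(lambda c: c.str[0])

print(first_a)
```

All three should hold the same values, though the resulting dtypes can differ between the element-wise version and the .str versions, so compare values rather than relying on DataFrame.equals.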

CodePudding user response：

If your lists are always the same length, you can use NumPy to slice out the element you want and then reconstruct the DataFrame. This might be faster since it avoids explicit looping.

import numpy as np
import pandas as pd

df = pd.DataFrame([[list('abcde') for _ in range(4)] for _ in range(3)],
                  columns=['Jan', 'Feb', 'Mar', 'April'])
#               Jan              Feb              Mar            April
#0  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]
#1  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]
#2  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]   


Nelem = 0  # Element in the list you want
pd.DataFrame(np.array(df.to_numpy().tolist(), dtype='object')[:, :, Nelem],
             index=df.index, columns=df.columns)

  Jan Feb Mar April
0   a   a   a     a
1   a   a   a     a
2   a   a   a     a
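
The NumPy slice relies on every list having the same length. If that assumption does not hold, a rough fallback is the .str accessor, which tolerates uneven and even empty lists. The frame below (ragged) is made-up data, just to illustrate the behaviour.

```python
import pandas as pd

# Hypothetical frame whose lists differ in length, including one empty list.
ragged = pd.DataFrame({
    "Jan": [["a", "b"], ["c"], []],
    "Feb": [["d"], ["e", "f", "g"], ["h"]],
})

# .str[0] takes position 0 of each list and gives NaN where no such element exists.
first = ragged.apply(lambda c: c.str[0])
print(first)
```

The empty list in Jan comes back as NaN instead of raising, so you can decide afterwards whether to drop or fill those cells.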