I have a dataframe and I want to select the first value in the list of each cell.
I have tried:
for i in clean_columns:
    print(clean_columns[i][:][0][0])
But it selects only the first row. How do I select the first value from every row, so that I'm left with a dataframe minus the values in parentheses? Thanks
CodePudding user response:
You can use applymap:
df.applymap(lambda x: x[0])
or stack, use the str accessor, then unstack:
df.stack().str[0].unstack()
or apply the str accessor to each column:
df.apply(lambda c: c.str[0])
Example:
# input
df = pd.DataFrame([[[0, 1] for _ in range(3)] for _ in range(3)])
#         0       1       2
# 0  [0, 1]  [0, 1]  [0, 1]
# 1  [0, 1]  [0, 1]  [0, 1]
# 2  [0, 1]  [0, 1]  [0, 1]
# output
#    0  1  2
# 0  0  0  0
# 1  0  0  0
# 2  0  0  0
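For anyone who wants to paste and run this, here is a minimal sketch that puts the three variants side by side on the same toy input. One assumption worth flagging: DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so the sketch picks whichever of the two the installed version offers. The names elementwise, first_a, first_b and first_c are only illustrative.

```python
import pandas as pd

# Same toy input as above: every cell holds the list [0, 1].
df = pd.DataFrame([[[0, 1] for _ in range(3)] for _ in range(3)])

# Variant 1: call a function on every cell.
# DataFrame.map replaces the deprecated applymap from pandas 2.1 on,
# so use whichever one this pandas version provides.
elementwise = df.map if hasattr(df, "map") else df.applymap
first_a = elementwise(lambda x: x[0])

# Variant 2: go column by column through the .str accessor,
# which indexes into lists as well as strings.
first_b = df.apply(lambda c: c.str[0])

# Variant 3: reshape to one long Series, index once, reshape back.
first_c = df.stack().str[0].unstack()

print(first_a)
```

All three should give the same frame of zeros on this input, so the choice is mostly a matter of taste and pandas version.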
CodePudding user response:
If your lists are always the same length, you can use numpy to slice out the element you want and then reconstruct the DataFrame. This might be faster, since it avoids explicit looping.
import numpy as np
import pandas as pd
df = pd.DataFrame([[list('abcde') for _ in range(4)] for _ in range(3)],
                  columns=['Jan', 'Feb', 'Mar', 'April'])
#               Jan              Feb              Mar            April
#0  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]
#1  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]
#2  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]  [a, b, c, d, e]
Nelem = 0 # Element in the list you want
pd.DataFrame(np.array(df.to_numpy().tolist(), dtype='object')[:, :, Nelem],
             index=df.index, columns=df.columns)
#  Jan Feb Mar April
#0   a   a   a     a
#1   a   a   a     a
#2   a   a   a     a
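If you need this for more than one position, the same numpy slicing could be wrapped in a small function. This is only a sketch: nth_of_each_cell is a made-up name, not a pandas or numpy function, and it still assumes every cell holds a list of the same length.

```python
import numpy as np
import pandas as pd


def nth_of_each_cell(df, n=0):
    """Hypothetical helper: return element n of the list in every cell.

    Assumes every cell holds a list and all lists have the same length.
    """
    # Nested lists of equal length become a 3-D array: rows x columns x list items.
    cube = np.array(df.to_numpy().tolist(), dtype=object)
    # Slice along the last axis, then put the original labels back.
    return pd.DataFrame(cube[:, :, n], index=df.index, columns=df.columns)


df = pd.DataFrame([[list('abcde') for _ in range(4)] for _ in range(3)],
                  columns=['Jan', 'Feb', 'Mar', 'April'])

first = nth_of_each_cell(df, 0)   # the 'a' from each cell
last = nth_of_each_cell(df, -1)   # negative positions count from the end
print(first)
```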