My goal is to write a loop that iterates through a DataFrame's columns and selects only the columns whose values are lists. My table looks like this:
  | a | b | c |
---|---|---|---|
0 | a | ['bb', 'cc'] | d |
1 | z | ['b', 'c'] | 3 |
My code looks like this, but does not work.
df = pd.DataFrame([['a', ['bb', 'cc'], 'd'], ['z', ['b', 'c'], '3']], columns = ['a', 'b', 'c'])
df_list = [col for col in df.columns if type(list) in col]
The desired output is:
  | b |
---|---|
0 | ['bb', 'cc'] |
1 | ['b', 'c'] |
CodePudding user response:
df_list = [i for i in df.columns if len(pd.DataFrame(df[i].to_list()).T) > 1]
df[df_list]
Output:
b
0 [bb, cc]
1 [b, c]
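If a column contains no lists, converting it with to_list and passing the result to pd.DataFrame gives an n x 1 DataFrame, whereas a column of lists expands to one column per list element.
So we can detect list columns by checking len(pd.DataFrame(...).T) > 1, i.e. whether the expanded frame has more than one column. Note that this misses columns whose lists all have a single element.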
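To make the reasoning above concrete, here is a small sketch that prints the shapes involved. The variable names and the extra single Series are only for illustration and are not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', ['bb', 'cc'], 'd'], ['z', ['b', 'c'], '3']],
    columns=['a', 'b', 'c'],
)

# A column of plain strings stays one column wide after the round trip.
scalar_shape = pd.DataFrame(df['a'].to_list()).shape
# A column of two-element lists expands to two columns.
list_shape = pd.DataFrame(df['b'].to_list()).shape
print(scalar_shape, list_shape)

# Hypothetical data: lists with a single element also give one column,
# so the len(...T) > 1 test would not flag them as a list column.
single = pd.Series([['x'], ['y']])
single_shape = pd.DataFrame(single.to_list()).shape
print(single_shape)
```

So the check is really "does any row expand to more than one element", which happens to coincide with "is a list column" for the data in the question.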
CodePudding user response:
There is no built-in pandas way to check this. You would need to use pure Python.
If you can rely on testing only the first row, this should be efficient:
mask = df.iloc[0].apply(lambda x: isinstance(x, list))
df.loc[:, mask]
If you need to test all cells, use applymap and all (or any if a single list is sufficient to select a column). Note that this might be slow on large dataframes.
mask = df.applymap(lambda x: isinstance(x, list)).all()
df.loc[:, mask]
Output:
b
0 [bb, cc]
1 [b, c]
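As a rough illustration of how all and any differ, here is a sketch with a hypothetical mixed column that holds a list in the first row only. It also assumes you may be on a recent pandas: DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so the snippet picks whichever is available.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['a', 'z'],
    'b': [['bb', 'cc'], ['b', 'c']],
    'mixed': [['x'], 'y'],  # hypothetical column: list in the first row only
})

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
is_list = elementwise(lambda x: isinstance(x, list))

# Columns where every cell is a list.
all_cols = df.loc[:, is_list.all()].columns.tolist()
# Columns where at least one cell is a list.
any_cols = df.loc[:, is_list.any()].columns.tolist()

print(all_cols)
print(any_cols)
```

Note that the first-row check would select mixed here as well, since it only looks at row 0.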
CodePudding user response:
You didn't specify if all the rows of a specific column in your DataFrame are of the same type. Looking at column c, it seems like they might be of different types. In that case, do you want columns that have ALL rows as a list or columns that have ANY of its rows as a list?
In either case, you can use boolean indexing to filter the DataFrame as follows.
To find out the columns that have a list in all rows:
status = (df.applymap(type).astype(str) == "<class 'list'>").all()
Or, to find out the columns that have a list in any of its rows:
status = (df.applymap(type).astype(str) == "<class 'list'>").any()
Afterwards you can obtain the result by:
target_columns = status[status].index.tolist()
df = df[target_columns]
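If comparing type names as strings feels fragile, one possible variation is to build the same status Series with isinstance, column by column. This is only a sketch: the helper is_list and the names status_all, status_any and result are made up for the example, and unlike the string comparison, isinstance also accepts subclasses of list.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', ['bb', 'cc'], 'd'], ['z', ['b', 'c'], '3']],
    columns=['a', 'b', 'c'],
)

def is_list(value):
    # True for list instances (including subclasses of list).
    return isinstance(value, list)

# One boolean per column: is every cell (or at least one cell) a list?
status_all = df.apply(lambda col: col.map(is_list).all())
status_any = df.apply(lambda col: col.map(is_list).any())

# Boolean indexing on the columns; df itself is left unchanged.
result = df.loc[:, status_all]
print(result)
```

Keeping the selection in a new variable rather than overwriting df makes it easier to try both the all and any variants on the same data.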