Return a subset of list elements by condition (contains a string) in Python pandas

Assuming I'm dealing with this dataframe:

Element List
[123, 1234, abc-123, abc-1234]
[abc-321]
nan
...

As the square brackets suggest, each value in the 'Element List' column is a Python list.
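
For reproducibility, a frame like this could be built as in the sketch below. The values are an assumption: the elements are stored as strings here, and the missing row is a NaN rather than an empty list.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: every element is a string,
# and the missing row is NaN rather than an empty list.
df = pd.DataFrame(
    {
        "Element List": [
            ["123", "1234", "abc-123", "abc-1234"],
            ["abc-321"],
            np.nan,
        ]
    }
)

# The column has dtype object; each non-missing cell holds a Python list,
# while the NaN cell is a float.
cell_types = [type(v) for v in df["Element List"]]
print(df["Element List"].dtype)
print(cell_types)
```

Checking the cell types this way is a quick test of whether a column really holds lists or only their string representations (for example after reading from CSV).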

My desired output should be like this:

Element List	abc	Others
[123, 1234, abc-123, abc-1234]	[abc-123, abc-1234]	[123,1234]
[abc-321]	[abc-321]	nan
nan	nan	nan
...	...	...

The question is how to extract the subset of each list that satisfies a condition (such as containing a given string) and place the results in separate columns. Here, the 'abc' column holds the elements that contain the string abc, and the 'Others' column holds the complement, that is, the remaining elements of the list.

I have no idea how to deal with list data in a column. Sorry for the very naive question.
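
The splitting step described in the question can be sketched in plain Python for a single list, before pandas is involved at all. The function name split_by_substring is hypothetical, and converting each element with str() is an assumption made so that numeric elements such as 123 do not raise an error.

```python
def split_by_substring(items, needle):
    """Split items into (matching, others) by whether str(item) contains needle."""
    matching = [v for v in items if needle in str(v)]
    others = [v for v in items if needle not in str(v)]
    return matching, others


# A list mixing integers and strings, as the question's first row might be.
matching, others = split_by_substring([123, 1234, "abc-123", "abc-1234"], "abc")
print(matching)
print(others)
```

Both result lists keep the original element order, and every element lands in exactly one of them.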

CodePudding user response:

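Try this. It treats an element as a match when it starts with "abc"; replace the startswith call with an "in" test if the string may appear anywhere in the element: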

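import numpy as np
import pandas as pd

# Only rows that actually hold a list are processed; NaN rows are skipped.
mask = pd.notna(df["Element List"])

# str(v) guards against non-string elements such as the integer 123.
# An empty list is falsy, so "or np.nan" turns an empty result into NaN.
df[["abc", "Others"]] = df.loc[mask, "Element List"].apply(
    lambda x: pd.Series(
        {
            "abc": [v for v in x if str(v).startswith("abc")] or np.nan,
            "Others": [v for v in x if not str(v).startswith("abc")] or np.nan,
        }
    )
)

print(df)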

Prints:

                     Element List                  abc       Others
0  [123, 1234, abc-123, abc-1234]  [abc-123, abc-1234]  [123, 1234]
1                       [abc-321]            [abc-321]          NaN
2                             NaN                  NaN          NaN
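
As a possible variation, the sketch below builds each column with Series.map instead of constructing a pd.Series per row, and it matches "abc" anywhere in the element rather than only at the start. The helper name pick and the sample data are assumptions made for illustration; they are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing integers, strings, and a missing row.
df = pd.DataFrame(
    {"Element List": [[123, 1234, "abc-123", "abc-1234"], ["abc-321"], np.nan]}
)


def pick(cell, needle, keep_matches):
    """Return the elements of a list cell that do (or do not) contain needle."""
    if not isinstance(cell, list):  # NaN or any other non-list value
        return np.nan
    picked = [v for v in cell if (needle in str(v)) == keep_matches]
    return picked or np.nan  # an empty result becomes NaN


df["abc"] = df["Element List"].map(lambda c: pick(c, "abc", True))
df["Others"] = df["Element List"].map(lambda c: pick(c, "abc", False))
print(df)
```

The isinstance check replaces the notna mask, so rows without a list pass through as NaN without a separate filtering step.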