Suppose I have this dataframe:
Element List |
---|
[123, 1234, abc-123, abc-1234] |
[abc-321] |
nan |
... |
As you may have recognized, the '[]' indicates that each value in the 'Element List' column is a list.
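For reproducibility, here is a minimal sketch of how such a frame could be built. It assumes every element is stored as a string (whether 123 is an int or a str is not stated above) and that the missing row is NaN:

```python
import numpy as np
import pandas as pd

# Assumption: all list elements are strings; the third row is a missing value.
df = pd.DataFrame(
    {
        "Element List": [
            ["123", "1234", "abc-123", "abc-1234"],
            ["abc-321"],
            np.nan,
        ]
    }
)
print(df)
```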
My desired output should be like this:
Element List | abc | Others |
---|---|---|
[123, 1234, abc-123, abc-1234] | [abc-123, abc-1234] | [123, 1234] |
[abc-321] | [abc-321] | nan |
nan | nan | nan |
... | ... | ... |
The question is how to extract a subset of each list by a condition (such as "contains" or "is in") and place the results in separate columns. Here, the 'abc' column holds the elements that contain the string abc, and the 'Others' column holds the complement, i.e. the remaining elements of the list.
I have no idea how to deal with the list data type in a column... Sorry for the very naive question.
CodePudding user response:
Try:
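import numpy as np
import pandas as pd

# Assumes every list element is a string; rows holding NaN are skipped.
mask = pd.notna(df["Element List"])
df[["abc", "Others"]] = df.loc[mask, "Element List"].apply(
    lambda x: pd.Series(
        {
            # an empty list is falsy, so `or np.nan` turns it into NaN
            "abc": [v for v in x if v.startswith("abc")] or np.nan,
            "Others": [v for v in x if not v.startswith("abc")] or np.nan,
        }
    )
)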
print(df)
Prints:
                     Element List                  abc       Others
0  [123, 1234, abc-123, abc-1234]  [abc-123, abc-1234]  [123, 1234]
1                       [abc-321]            [abc-321]          NaN
2                             NaN                  NaN          NaN
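If the lists might also hold non-string values (for example the int 123), `v.startswith` would raise an AttributeError. One possible variation is a small helper that splits each list in a single pass. This is only a sketch: `split_by_prefix` is a hypothetical name, the sample frame is an assumption, and converting with `str(v)` before the test is my own choice rather than something the question requires.

```python
import numpy as np
import pandas as pd


def split_by_prefix(values, prefix="abc"):
    """Return (matching, others); an empty side or a non-list input becomes NaN."""
    if not isinstance(values, list):
        return np.nan, np.nan
    matching, others = [], []
    for v in values:
        # str(v) lets non-string elements such as the int 123 be tested safely
        (matching if str(v).startswith(prefix) else others).append(v)
    # an empty list is falsy, so `or np.nan` replaces it with NaN
    return matching or np.nan, others or np.nan


# Assumed sample data mirroring the question.
df = pd.DataFrame(
    {"Element List": [["123", "1234", "abc-123", "abc-1234"], ["abc-321"], np.nan]}
)

pairs = df["Element List"].map(split_by_prefix)
df["abc"] = pd.Series([p[0] for p in pairs], index=df.index)
df["Others"] = pd.Series([p[1] for p in pairs], index=df.index)
print(df)
```

Because the helper takes the prefix as a parameter, the same function can be reused for other conditions by changing the argument or swapping the `startswith` test.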