I have multiple columns with the same name mixed with other columns. Some of these columns that I want to combine have null in the rows. For instance,
apple_0 apple_1
0 abc None
1 abc efg
2 hig None
3 dsf None
and I want:
apple
0 abc
1 abc, efg
2 hig
3 dsf
I have like 85 of these columns. The actual names are: scheduleSettings_nodes_0_name, scheduleSettings_nodes_1_name and so on How can I combine these?
CodePudding user response:
In addition to the other answers, you could also try using agg
something like this.
df = pd.DataFrame({'apple_0': ['abc', 'abc', 'hig', 'dsf'], 'apple_1': [None, 'efg', None, None]})
selected_cols = [col for col in df.columns if col.startswith('apple')]
df['apple'] = df[selected_cols].agg(lambda x: ', '.join(map(str, filter(None, x))), axis=1)
CodePudding user response:
Option 1: using stacking to drop the null, then aggregation per group.
(df.filter(like='apple_')
.replace('None', pd.NA)
.stack()
.groupby(level=0).agg(','.join)
.reindex(df.index)
.to_frame('apple')
)
Option 2: using an internal loop with agg
.
(df.filter(like='apple_')
.replace('None', pd.NA)
.agg(lambda r: ','.join(x for x in r if pd.notna(x)), axis=1)
.to_frame('apple')
)
Output:
apple
0 abc
1 abc,efg
2 hig
3 dsf
CodePudding user response:
You can use df.apply
with axis=1
to apply a function to each row. This function can use str.join
to join the elements of all columns that match apple_*
if the elements are not NA
def join_cols(row, sep, col_names):
return sep.join(row[c]
for c in col_names
if not pd.isna(row[c]))
cols_to_combine = ["apple_0", "apple_1"]
df["apple"] = df.apply(join_cols, axis=1, args=(",", cols_to_combine))
df.drop(columns=cols_to_combine, inplace=True)
Which gives your desired output:
apple
0 abc
1 abc,efg
2 hig
3 dsf
To figure out which columns match your pattern, you could do:
cols_to_combine = [c for c in df.columns if c.startswith("apple_")]
There's probably a way to vectorize this, which would be preferred over apply()
in terms of speed