In a data frame similar to the one below how can I create the Concatenation column based on the date of each activity?
Activity A | Activity B | Activity C | Concatenation |
---|---|---|---|
1/1/2022 | 1/15/2022 | 2/3/2022 | Activity A --> Activity B --> Activity C |
1/15/2022 | 2/3/2022 | 1/1/2022 | Activity C --> Activity A --> Activity B |
CodePudding user response:
You can use numpy's argsort
:
df2 = df.filter(like='Activity').apply(pd.to_datetime, dayfirst=False)
df['Concatenation'] = list(map(' -> '.join, df2.columns.to_numpy()[np.argsort(df2.to_numpy())]))
Or with pandas only (less efficient):
df['Concatenation'] = (df
.filter(like='Activity')
.apply(pd.to_datetime, dayfirst=False)
.apply(lambda s: ' -> '.join(s.sort_values().index), axis=1)
)
output:
Activity A Activity B Activity C Concatenation
0 1/1/2022 1/15/2022 2/3/2022 Activity A -> Activity B -> Activity C
1 1/15/2022 2/3/2022 1/1/2022 Activity C -> Activity A -> Activity B
CodePudding user response:
Suppose you have the data in a file called sample.csv, I would do lke this
df = pd.read_csv('sample.csv')
df
Activity A Activity B Activity C
0 1/1/2022 1/15/2022 2/3/2022
1 1/15/2022 2/3/2022 1/1/2022
df['Concatenation'] = df[['Activity A', 'Activity B', 'Activity C']].apply(lambda x: ' -> '.join(list(x)), axis=1)
df
Activity A Activity B Activity C Concatenation
0 1/1/2022 1/15/2022 2/3/2022 1/1/2022 -> 1/15/2022 -> 2/3/2022
1 1/15/2022 2/3/2022 1/1/2022 1/15/2022 -> 2/3/2022 -> 1/1/2022