Home > Software design >  Concanate cells from a column with respect to an indicator
Concanate cells from a column with respect to an indicator

Time:10-27

I have the following DataFrame:

import pandas as pd

DataExe = [[2 , 1],
           [4 , 1],
           [7 , 2],
           [9 , 3],
           [10, 3],
           [15, 3],
           [19, 2],
           [24, 2],
           [27, 2],
           [29, 2],
           [37, 2],
           [38, 2]]


DataExe = pd.DataFrame(data = DataExe, columns = ['ColA', 'Indicator'])

What I am looking for is that the 'ColA' values are concented according to what 'Indicator' indicates. For example:

As it is observed in 'ColA' [2, 4] they have in common that 'Indicator' = 1. Then we create a new 'ColC' column where for each row place all the values of 'ColA'.

I must have the following:

ColA    Indicator   ColC
2       1           2, 4
4       1           2, 4
7       2           7, 19, 24, 27, 29, 37, 38
9       3           9, 10, 15
10      3           9, 10, 15
15      3           9, 10, 15
19      2           7, 19, 24, 27, 29, 37, 38
24      2           7, 19, 24, 27, 29, 37, 38
27      2           7, 19, 24, 27, 29, 37, 38
29      2           7, 19, 24, 27, 29, 37, 38
37      2           7, 19, 24, 27, 29, 37, 38
38      2           7, 19, 24, 27, 29, 37, 38

Note: ColC is a new column.

CodePudding user response:

You could use groupby transform:

DataExe['ColC'] = DataExe.groupby('Indicator')['ColA'].transform(lambda g: ','.join(map(str, g)))

Output:

    ColA  Indicator                 ColC
0      2          1                  2,4
1      4          1                  2,4
2      7          2  7,19,24,27,29,37,38
3      9          3              9,10,15
4     10          3              9,10,15
5     15          3              9,10,15
6     19          2  7,19,24,27,29,37,38
7     24          2  7,19,24,27,29,37,38
8     27          2  7,19,24,27,29,37,38
9     29          2  7,19,24,27,29,37,38
10    37          2  7,19,24,27,29,37,38
11    38          2  7,19,24,27,29,37,38

CodePudding user response:

This is an alternative that will return the groups of numbers as lists:

def GroupList(group):
    group['ColC'] = [np.array(group.ColA)]*len(group)
    return group

DataExe.groupby('Indicator').apply(GroupList)

Output:

   ColA Indicator   ColC
0   2   1   [2, 4]
1   4   1   [2, 4]
2   7   2   [7, 19, 24, 27, 29, 37, 38]
6   19  2   [7, 19, 24, 27, 29, 37, 38]
7   24  2   [7, 19, 24, 27, 29, 37, 38]
8   27  2   [7, 19, 24, 27, 29, 37, 38]
9   29  2   [7, 19, 24, 27, 29, 37, 38]
10  37  2   [7, 19, 24, 27, 29, 37, 38]
11  38  2   [7, 19, 24, 27, 29, 37, 38]
3   9   3   [9, 10, 15]
4   10  3   [9, 10, 15]
5   15  3   [9, 10, 15]

Or, more complicated syntax but in one line:

DataExe.groupby('Indicator').apply(lambda g:g.assign(ColC=[np.array(g.ColA)]*len(g))).droplevel(0)

    ColA    Indicator   ColC
0   2   1   [2 4]
1   4   1   [2 4]
2   7   2   [ 7 19 24 27 29 37 38]
6   19  2   [ 7 19 24 27 29 37 38]
7   24  2   [ 7 19 24 27 29 37 38]
8   27  2   [ 7 19 24 27 29 37 38]
9   29  2   [ 7 19 24 27 29 37 38]
10  37  2   [ 7 19 24 27 29 37 38]
11  38  2   [ 7 19 24 27 29 37 38]
3   9   3   [ 9 10 15]
4   10  3   [ 9 10 15]
5   15  3   [ 9 10 15]
  • Related