Home > Software engineering >  How can two arrays be joined by a common column in Pandas?
How can two arrays be joined by a common column in Pandas?

Time:06-21

I have 2 Pandas arrays like:

A = 
        Date       Value
0    2022-03-01     50
1    2022-03-01     50
2    2022-03-03     50
3    2022-03-04     50
4    2022-03-04     50
5    2022-03-04     50
6    2022-03-04     50
7    2022-03-04     50
8    2022-03-15     50
9    2022-03-19     50
10   2022-03-23     50


B =
            Value   Date
Date                                                                             
2022-03-01  0.0   2022-03-01
2022-03-02  0.0   2022-03-02
2022-03-03  0.0   2022-03-03
2022-03-04  0.0   2022-03-04
2022-03-05  0.0   2022-03-05
2022-03-06  0.0   2022-03-06
2022-03-07  0.0   2022-03-07
2022-03-08  0.0   2022-03-08

In addition, the A and B lists are of different lengths.

I need to form a list 'C=' in which the dates will go in order (as in the list 'B='), and 'Values' for each date equal the sum of the 'Values' of the list 'A=' for this date... All 'Values' of the list 'A=' for those dates that are not included in the dates in the list 'B=' should not be in the list 'C='.

I'm new to python I've tried many options already... But everything doesn't work:

for i in range(len(A)):
    for n in range(len(B)):
        if A['Date'][i] == B['Date'][i]:
            C['Date'] == B['Date'][i]
            C['Value'] == A['Value'][i]
        else:
            C['Date'] == B['Date'][i]
            C['Value'] == B['Value'][i]

The resulting array should be as follows:

B =
   Value   Date                                                                             
1  100.0   2022-03-01
2  0.0     2022-03-02
3  50.0    2022-03-03
4  250.0   2022-03-04
5  0.0     2022-03-05
6  0.0     2022-03-06
7  0.0     2022-03-07
8  0.0     2022-03-08

///////// Real arrays are:

     Date:           Value:
0   2022-05-06       0.000000e 00
1   2022-05-04      -4.042888e 06
2   2022-04-15      -2.742782e 06
3   2022-04-15      -9.152107e 05
4   2022-04-01      -3.608648e 07
5   2022-04-01      -1.072268e 07
6   2022-04-01      -1.885982e 07
7   2022-04-01      -1.087363e 07
8   2022-03-04       0.000000e 00
9   2022-03-04       0.000000e 00
10  2022-03-04       0.000000e 00
11  2022-03-04       0.000000e 00
12  2022-03-04       0.000000e 00
13  2022-03-04       0.000000e 00
14  2022-03-04       0.000000e 00
15  2022-03-04       0.000000e 00
16  2022-02-03      -4.369548e 06
17  2022-02-01       0.000000e 00
18  2022-02-01       0.000000e 00
19  2022-02-01       0.000000e 00
20  2022-02-01      -3.441539e 05
21  2022-02-01       0.000000e 00
22  2022-02-01       0.000000e 00
23  2022-02-01       0.000000e 00
24  2022-02-01       0.000000e 00
25  2021-11-12      -1.350750e 06
26  2021-11-09       0.000000e 00
27  2021-11-09       0.000000e 00
28  2021-11-01      -3.716895e 06
29  2021-10-19      -2.464551e 07
30  2021-10-15      -5.662351e 06
31  2021-10-15      -1.243596e 06
32  2021-10-01      -4.992055e 07
33  2021-10-01      -2.695814e 07
34  2021-10-01      -2.689851e 07
35  2021-10-01      -2.689851e 07
36  2021-09-26       0.000000e 00
37  2021-09-26       0.000000e 00
38  2021-09-26       0.000000e 00
39  2021-09-26       0.000000e 00
40  2021-09-26       0.000000e 00
41  2021-09-26       0.000000e 00
42  2021-08-24      -7.515941e 08
43  2021-08-20       0.000000e 00
44  2021-08-05      -4.674707e 06
45  2021-08-02      -2.480048e 06
46  2021-05-03      -1.607934e 07
47  2021-05-03      -2.257631e 06
48  2021-04-15      -5.224200e 06
                        Value:   Date:
Date                                    
2021-11-01                0.0 2021-11-01
2021-11-02                0.0 2021-11-02
2021-11-03                0.0 2021-11-03
2021-11-04                0.0 2021-11-04
2021-11-05                0.0 2021-11-05
...                       ...        ...
2022-05-10                0.0 2022-05-10
2022-05-11                0.0 2022-05-11
2022-05-12                0.0 2022-05-12
2022-05-13                0.0 2022-05-13
2022-05-16                0.0 2022-05-16

[136 rows x 2 columns]

After applying the approach described by Nick, all 'Values' in 'C=' are equal to 0.0

                Date:               Value:
Date                                    
2021-11-01 2021-11-01                0.0
2021-11-02 2021-11-02                0.0
2021-11-03 2021-11-03                0.0
2021-11-04 2021-11-04                0.0
2021-11-05 2021-11-05                0.0
...               ...                ...
2022-05-10 2022-05-10                0.0
2022-05-11 2022-05-11                0.0
2022-05-12 2022-05-12                0.0
2022-05-13 2022-05-13                0.0
2022-05-16 2022-05-16                0.0

[136 rows x 2 columns]

CodePudding user response:

You should be able to inner join B to A grouped by Date and summed:

C = B.join(A.groupby('Date').sum(), how='inner', lsuffix='_l').drop('Value_l', axis=1)

Output (for your sample data):

                  Date  Value
Date
2022-03-01  2022-03-01    100
2022-03-03  2022-03-03     50
2022-03-04  2022-03-04    250

If you want to keep dates from B with no sum, use a left join (the default) instead and fillna with 0:

C = B.join(A.groupby('Date').sum(), lsuffix='_l').drop('Value_l', axis=1).fillna(0.0)

Output:

                  Date  Value
Date
2022-03-01  2022-03-01  100.0
2022-03-02  2022-03-02    0.0
2022-03-03  2022-03-03   50.0
2022-03-04  2022-03-04  250.0
2022-03-05  2022-03-05    0.0
2022-03-06  2022-03-06    0.0
2022-03-07  2022-03-07    0.0
2022-03-08  2022-03-08    0.0

Finally if you want to remove Date as the index, use reset_index:

C = C.reset_index(drop=True)

Output:

         Date  Value
0  2022-03-01  100.0
1  2022-03-02    0.0
2  2022-03-03   50.0
3  2022-03-04  250.0
4  2022-03-05    0.0
5  2022-03-06    0.0
6  2022-03-07    0.0
7  2022-03-08    0.0
  • Related