I have 2 Pandas arrays like:
A =
Date Value
0 2022-03-01 50
1 2022-03-01 50
2 2022-03-03 50
3 2022-03-04 50
4 2022-03-04 50
5 2022-03-04 50
6 2022-03-04 50
7 2022-03-04 50
8 2022-03-15 50
9 2022-03-19 50
10 2022-03-23 50
B =
Value Date
Date
2022-03-01 0.0 2022-03-01
2022-03-02 0.0 2022-03-02
2022-03-03 0.0 2022-03-03
2022-03-04 0.0 2022-03-04
2022-03-05 0.0 2022-03-05
2022-03-06 0.0 2022-03-06
2022-03-07 0.0 2022-03-07
2022-03-08 0.0 2022-03-08
In addition, the A and B lists are of different lengths.
I need to form a list 'C=' in which the dates will go in order (as in the list 'B='), and 'Values' for each date equal the sum of the 'Values' of the list 'A=' for this date... All 'Values' of the list 'A=' for those dates that are not included in the dates in the list 'B=' should not be in the list 'C='.
I'm new to python I've tried many options already... But everything doesn't work:
for i in range(len(A)):
for n in range(len(B)):
if A['Date'][i] == B['Date'][i]:
C['Date'] == B['Date'][i]
C['Value'] == A['Value'][i]
else:
C['Date'] == B['Date'][i]
C['Value'] == B['Value'][i]
The resulting array should be as follows:
B =
Value Date
1 100.0 2022-03-01
2 0.0 2022-03-02
3 50.0 2022-03-03
4 250.0 2022-03-04
5 0.0 2022-03-05
6 0.0 2022-03-06
7 0.0 2022-03-07
8 0.0 2022-03-08
///////// Real arrays are:
Date: Value:
0 2022-05-06 0.000000e 00
1 2022-05-04 -4.042888e 06
2 2022-04-15 -2.742782e 06
3 2022-04-15 -9.152107e 05
4 2022-04-01 -3.608648e 07
5 2022-04-01 -1.072268e 07
6 2022-04-01 -1.885982e 07
7 2022-04-01 -1.087363e 07
8 2022-03-04 0.000000e 00
9 2022-03-04 0.000000e 00
10 2022-03-04 0.000000e 00
11 2022-03-04 0.000000e 00
12 2022-03-04 0.000000e 00
13 2022-03-04 0.000000e 00
14 2022-03-04 0.000000e 00
15 2022-03-04 0.000000e 00
16 2022-02-03 -4.369548e 06
17 2022-02-01 0.000000e 00
18 2022-02-01 0.000000e 00
19 2022-02-01 0.000000e 00
20 2022-02-01 -3.441539e 05
21 2022-02-01 0.000000e 00
22 2022-02-01 0.000000e 00
23 2022-02-01 0.000000e 00
24 2022-02-01 0.000000e 00
25 2021-11-12 -1.350750e 06
26 2021-11-09 0.000000e 00
27 2021-11-09 0.000000e 00
28 2021-11-01 -3.716895e 06
29 2021-10-19 -2.464551e 07
30 2021-10-15 -5.662351e 06
31 2021-10-15 -1.243596e 06
32 2021-10-01 -4.992055e 07
33 2021-10-01 -2.695814e 07
34 2021-10-01 -2.689851e 07
35 2021-10-01 -2.689851e 07
36 2021-09-26 0.000000e 00
37 2021-09-26 0.000000e 00
38 2021-09-26 0.000000e 00
39 2021-09-26 0.000000e 00
40 2021-09-26 0.000000e 00
41 2021-09-26 0.000000e 00
42 2021-08-24 -7.515941e 08
43 2021-08-20 0.000000e 00
44 2021-08-05 -4.674707e 06
45 2021-08-02 -2.480048e 06
46 2021-05-03 -1.607934e 07
47 2021-05-03 -2.257631e 06
48 2021-04-15 -5.224200e 06
Value: Date:
Date
2021-11-01 0.0 2021-11-01
2021-11-02 0.0 2021-11-02
2021-11-03 0.0 2021-11-03
2021-11-04 0.0 2021-11-04
2021-11-05 0.0 2021-11-05
... ... ...
2022-05-10 0.0 2022-05-10
2022-05-11 0.0 2022-05-11
2022-05-12 0.0 2022-05-12
2022-05-13 0.0 2022-05-13
2022-05-16 0.0 2022-05-16
[136 rows x 2 columns]
After applying the approach described by Nick, all 'Values' in 'C=' are equal to 0.0
Date: Value:
Date
2021-11-01 2021-11-01 0.0
2021-11-02 2021-11-02 0.0
2021-11-03 2021-11-03 0.0
2021-11-04 2021-11-04 0.0
2021-11-05 2021-11-05 0.0
... ... ...
2022-05-10 2022-05-10 0.0
2022-05-11 2022-05-11 0.0
2022-05-12 2022-05-12 0.0
2022-05-13 2022-05-13 0.0
2022-05-16 2022-05-16 0.0
[136 rows x 2 columns]
CodePudding user response:
You should be able to inner join
B
to A
grouped by Date
and summed:
C = B.join(A.groupby('Date').sum(), how='inner', lsuffix='_l').drop('Value_l', axis=1)
Output (for your sample data):
Date Value
Date
2022-03-01 2022-03-01 100
2022-03-03 2022-03-03 50
2022-03-04 2022-03-04 250
If you want to keep dates from B
with no sum, use a left join (the default) instead and fillna
with 0:
C = B.join(A.groupby('Date').sum(), lsuffix='_l').drop('Value_l', axis=1).fillna(0.0)
Output:
Date Value
Date
2022-03-01 2022-03-01 100.0
2022-03-02 2022-03-02 0.0
2022-03-03 2022-03-03 50.0
2022-03-04 2022-03-04 250.0
2022-03-05 2022-03-05 0.0
2022-03-06 2022-03-06 0.0
2022-03-07 2022-03-07 0.0
2022-03-08 2022-03-08 0.0
Finally if you want to remove Date
as the index, use reset_index
:
C = C.reset_index(drop=True)
Output:
Date Value
0 2022-03-01 100.0
1 2022-03-02 0.0
2 2022-03-03 50.0
3 2022-03-04 250.0
4 2022-03-05 0.0
5 2022-03-06 0.0
6 2022-03-07 0.0
7 2022-03-08 0.0