How to join/merge and sum columns with the same name


How can I merge and sum the columns with the same name?

So the output should be a single column named Canada that is the sum of the four columns named Canada.

Country/Region  Brazil  Canada  Canada  Canada  Canada  
Week 1               0       3       0       0       0   
Week 2               0      17       0       0       0   
Week 3               0      21       0       0       0   
Week 4               0      21       0       0       0   
Week 5               0      23       0       0       0   
Week 6               0      80       0       5       0   
Week 7               0     194       0      20       0   
Week 8              12     702       3     199      20   
Week 9             182    2679      16    2395     260   
Week 10            737    8711      80   17928     892   
Week 11           1674   25497     153   48195    1597   
Week 12           2923   46392     175   85563    2003   
Week 13           4516   76095     182  122431    2180   
Week 14           6002  105386     183  163539    2431   
Week 15           6751  127713     189  210409    2995   
Week 16           7081  147716     189  258188    3845 
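
A minimal sketch for reproducing a frame like this locally, with four columns sharing the label 'Canada' (values copied from the first three weeks of the table above):

import pandas as pd

# Duplicate column labels are allowed; pandas keeps all four 'Canada' columns
df = pd.DataFrame(
    [[0, 3, 0, 0, 0],
     [0, 17, 0, 0, 0],
     [0, 21, 0, 0, 0]],
    index=['Week 1', 'Week 2', 'Week 3'],
    columns=['Brazil', 'Canada', 'Canada', 'Canada', 'Canada'])
df.columns.name = 'Country/Region'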

CodePudding user response:

Try:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 100, (20, 5)), columns=[*'ZAABC'])

# group columns by name (the duplicate 'A' columns fall into one group) and sum
df.groupby(df.columns, axis=1, sort=False).sum()

Output:

     Z    A   B   C
0   44  111  67  67
1    9  104  36  87
2   70  176  12  58
3   65  126  46  88
4   81   62  77  72
5    9  100  69  79
6   47  146  99  88
7   49   48  19  14
8   39   97   9  57
9   32  105  23  35
10  75   83  34   0
11   0   89   5  38
12  17   83  42  58
13  31   66  41  57
14  35   57  82  91
15   0  113  53  12
16  42  159  68   6
17  68   50  76  52
18  78   35  99  58
19  23   92  85  48
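
Note that column-wise grouping with axis=1 is deprecated in newer pandas releases (2.x emits a FutureWarning for it). A sketch of an equivalent that avoids it, by transposing, grouping on the index, and transposing back:

# same result as df.groupby(df.columns, axis=1, sort=False).sum() on newer pandas
df.T.groupby(level=0, sort=False).sum().T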

CodePudding user response:

From its current state, this should give the outcome you're looking for:

df = df.set_index('Country/Region') # optional
df.groupby(df.columns, axis=1).sum() # Stolen from Scott Boston as it's a superior method.

Output:

                Brazil  Canada
Country/Region
Week 1               0       3
Week 2               0      17
Week 3               0      21
Week 4               0      21
Week 5               0      23
Week 6               0      85
Week 7               0     214
Week 8              12     924
Week 9             182    5350
Week 10            737   27611
Week 11           1674   75442
Week 12           2923  134133
Week 13           4516  200888
Week 14           6002  271539
Week 15           6751  341306
Week 16           7081  409938
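
As a quick sanity check against the original table, Week 16 for Canada is 147716 + 189 + 258188 + 3845 = 409938, which matches the merged column above.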

I found your dataset interesting; here's how I would clean it up from step 1:

import numpy as np
import pandas as pd

df = pd.read_csv('file.csv')
df = df.set_index(['Province/State', 'Country/Region', 'Lat', 'Long']).stack().reset_index()
df.columns = ['Province/State', 'Country/Region', 'Lat', 'Long', 'date', 'value']
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.pivot_table(index=df.index, columns='Country/Region', values='value', aggfunc=np.sum)
print(df)

Output:

Country/Region  Afghanistan  Albania  Algeria  Andorra  Angola  ...  West Bank and Gaza  Western Sahara  Yemen  Zambia  Zimbabwe
date                                                            ...
2020-01-22                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-23                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-24                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-25                0        0        0        0       0  ...                   0               0      0       0         0
2020-01-26                0        0        0        0       0  ...                   0               0      0       0         0
...                     ...      ...      ...      ...     ...  ...                 ...             ...    ...     ...       ...
2020-07-30            36542     5197    29831      922    1109  ...               11548              10   1726    5555      3092
2020-07-31            36675     5276    30394      925    1148  ...               11837              10   1728    5963      3169
2020-08-01            36710     5396    30950      925    1164  ...               12160              10   1730    6228      3659
2020-08-02            36710     5519    31465      925    1199  ...               12297              10   1734    6347      3921
2020-08-03            36747     5620    31972      937    1280  ...               12541              10   1734    6580      4075

If you now want to do weekly aggregations, it's as simple as:

print(df.resample('w').sum())

Output:

Country/Region  Afghanistan  Albania  Algeria  Andorra  Angola  ...  West Bank and Gaza  Western Sahara  Yemen  Zambia  Zimbabwe
date                                                            ...
2020-01-26                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-02                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-09                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-16                0        0        0        0       0  ...                   0               0      0       0         0
2020-02-23                0        0        0        0       0  ...                   0               0      0       0         0
2020-03-01                7        0        6        0       0  ...                   0               0      0       0         0
2020-03-08               10        0       85        7       0  ...                  43               0      0       0         0
2020-03-15               57      160      195        7       0  ...                 209               0      0       0         0
2020-03-22              175      464      705      409       5  ...                 309               0      0      11         7
2020-03-29              632     1142     2537     1618      29  ...                 559               0      0     113        31
2020-04-05             1783     2000     6875     2970      62  ...                1178               4      0     262        59
2020-04-12             3401     2864    11629     4057     128  ...                1847              30      3     279        84
2020-04-19             5838     3603    16062     4764     143  ...                2081              42      7     356       154
2020-04-26             8918     4606    21211     5087     174  ...                2353              42      7     541       200
2020-05-03            15149     5391    27943     5214     208  ...                2432              42     41     738       244
2020-05-10            25286     5871    36315     5265     274  ...                2607              42    203    1260       241
2020-05-17            39634     6321    45122     5317     327  ...                2632              42    632    3894       274
2020-05-24            61342     6798    54185     5332     402  ...                2869              45   1321    5991       354
2020-05-31            91885     7517    62849     5344     536  ...                3073              63   1932    7125       894
2020-06-07           126442     8378    68842     5868     609  ...                3221              63   3060    7623      1694
2020-06-14           159822     9689    74147     5967     827  ...                3396              63   4236    8836      2335
2020-06-21           191378    12463    79737     5981    1142  ...                4466              63   6322    9905      3089
2020-06-28           210487    15349    87615     5985    1522  ...               10242              70   7360   10512      3813
2020-07-05           224560    18707   102918     5985    2186  ...               21897              70   8450   11322      4426
2020-07-12           237087    22399   124588     5985    2940  ...               36949              70   9489   13002      6200
2020-07-19           245264    26845   149611     6098    4279  ...               52323              70  10855   16350      9058
2020-07-26           250970    31255   178605     6237    5919  ...               68154              70  11571   26749     14933
2020-08-02           255739    36370   208457     6429    7648  ...               80685              70  12023   38896     22241
2020-08-09            36747     5620    31972      937    1280  ...               12541              10   1734    6580      4075
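
To tie this back to the original question, the weekly figures for a single country are then just a column lookup on that result, e.g.:

print(df.resample('w').sum()['Canada'])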

CodePudding user response:

You can try a transpose and groupby, e.g. something like the following.

df_T = df.transpose()                          # countries become the index
df_T.groupby(df_T.index).sum().loc['Canada']   # sum the duplicate 'Canada' rows
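
If you want the merged result for every country rather than just Canada, you can transpose back afterwards; a sketch under the same assumptions:

# sum duplicate-named columns for all countries and restore the original orientation
df_merged = df.transpose().groupby(level=0).sum().transpose()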

CodePudding user response:

Here's a way to do it:

df.columns = [(col + str(i)) if col.startswith('Canada') else col for i, col in enumerate(df.columns)]
df = df.assign(Canada=df.filter(like='Canada').sum(axis=1)).drop(columns=[x for x in df.columns if x.startswith('Canada') and x != 'Canada'])

First we rename the columns starting with Canada by appending their integer position, which ensures they are no longer duplicates.

Then we use sum() to add across columns like Canada, put the result in a new column named Canada, and drop the columns that were originally named Canada.

Full test code is:

import pandas as pd
df = pd.DataFrame(
    columns=[x.strip() for x in 'Brazil  Canada  Canada  Canada  Canada'.split()],
    index=['Week ' + str(i) for i in range(1, 17)],
    data=[[i] * 5 for i in range(1, 17)])
df.columns.names = ['Country/Region']
print(df)

df.columns = [(col + str(i)) if col.startswith('Canada') else col for i, col in enumerate(df.columns)]
df = df.assign(Canada=df.filter(like='Canada').sum(axis=1)).drop(columns=[x for x in df.columns if x.startswith('Canada') and x != 'Canada'])
print(df)

Output:

Country/Region  Brazil  Canada  Canada  Canada  Canada
Week 1               1       1       1       1       1
Week 2               2       2       2       2       2
Week 3               3       3       3       3       3
Week 4               4       4       4       4       4
Week 5               5       5       5       5       5
Week 6               6       6       6       6       6
Week 7               7       7       7       7       7
Week 8               8       8       8       8       8
Week 9               9       9       9       9       9
Week 10             10      10      10      10      10
Week 11             11      11      11      11      11
Week 12             12      12      12      12      12
Week 13             13      13      13      13      13
Week 14             14      14      14      14      14
Week 15             15      15      15      15      15
Week 16             16      16      16      16      16
         Brazil  Canada
Week 1        1       4
Week 2        2       8
Week 3        3      12
Week 4        4      16
Week 5        5      20
Week 6        6      24
Week 7        7      28
Week 8        8      32
Week 9        9      36
Week 10      10      40
Week 11      11      44
Week 12      12      48
Week 13      13      52
Week 14      14      56
Week 15      15      60
Week 16      16      64
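
An alternative sketch that skips the renaming step entirely: even with duplicate labels, a boolean mask on df.columns selects every column named Canada, which can then be summed row-wise and reattached (df here is assumed to be the frame from the question):

# sum every column literally named 'Canada' (duplicates included), row by row
canada_total = df.loc[:, df.columns == 'Canada'].sum(axis=1)

# keep the other columns and attach the merged total as a single 'Canada' column
df = df.loc[:, df.columns != 'Canada']
df['Canada'] = canada_total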