How can I merge and sum the columns with the same name?
So the output should be 1 Column named Canada as a result of the sum of the 4 columns named Canada.
Country/Region Brazil Canada Canada Canada Canada
Week 1 0 3 0 0 0
Week 2 0 17 0 0 0
Week 3 0 21 0 0 0
Week 4 0 21 0 0 0
Week 5 0 23 0 0 0
Week 6 0 80 0 5 0
Week 7 0 194 0 20 0
Week 8 12 702 3 199 20
Week 9 182 2679 16 2395 260
Week 10 737 8711 80 17928 892
Week 11 1674 25497 153 48195 1597
Week 12 2923 46392 175 85563 2003
Week 13 4516 76095 182 122431 2180
Week 14 6002 105386 183 163539 2431
Week 15 6751 127713 189 210409 2995
Week 16 7081 147716 189 258188 3845
CodePudding user response:
Try:
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0,100,(20,5)), columns=[*'ZAABC'])
df.groupby(df.columns, axis=1, sort=False).sum()
Output:
Z A B C
0 44 111 67 67
1 9 104 36 87
2 70 176 12 58
3 65 126 46 88
4 81 62 77 72
5 9 100 69 79
6 47 146 99 88
7 49 48 19 14
8 39 97 9 57
9 32 105 23 35
10 75 83 34 0
11 0 89 5 38
12 17 83 42 58
13 31 66 41 57
14 35 57 82 91
15 0 113 53 12
16 42 159 68 6
17 68 50 76 52
18 78 35 99 58
19 23 92 85 48
CodePudding user response:
From its current state, this should give the outcome you're looking for:
df = df.set_index('Country/Region') # optional
df.groupby(df.columns, axis=1).sum() # Stolen from Scott Boston as it's a superior method.
Output:
index Brazil Canada
Country/Region
Week 1 0 3
Week 2 0 17
Week 3 0 21
Week 4 0 21
Week 5 0 23
Week 6 0 85
Week 7 0 214
Week 8 12 924
Week 9 182 5350
Week 10 737 27611
Week 11 1674 75442
Week 12 2923 134133
Week 13 4516 200888
Week 14 6002 271539
Week 15 6751 341306
Week 16 7081 409938
I found your dataset interesting, here's how I would clean it up from step 1:
df = pd.read_csv('file.csv')
df = df.set_index(['Province/State', 'Country/Region', 'Lat', 'Long']).stack().reset_index()
df.columns = ['Province/State', 'Country/Region', 'Lat', 'Long', 'date', 'value']
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.pivot_table(index=df.index, columns='Country/Region', values='value', aggfunc=np.sum)
print(df)
Output:
Country/Region Afghanistan Albania Algeria Andorra Angola ... West Bank and Gaza Western Sahara Yemen Zambia Zimbabwe
date ...
2020-01-22 0 0 0 0 0 ... 0 0 0 0 0
2020-01-23 0 0 0 0 0 ... 0 0 0 0 0
2020-01-24 0 0 0 0 0 ... 0 0 0 0 0
2020-01-25 0 0 0 0 0 ... 0 0 0 0 0
2020-01-26 0 0 0 0 0 ... 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ...
2020-07-30 36542 5197 29831 922 1109 ... 11548 10 1726 5555 3092
2020-07-31 36675 5276 30394 925 1148 ... 11837 10 1728 5963 3169
2020-08-01 36710 5396 30950 925 1164 ... 12160 10 1730 6228 3659
2020-08-02 36710 5519 31465 925 1199 ... 12297 10 1734 6347 3921
2020-08-03 36747 5620 31972 937 1280 ... 12541 10 1734 6580 4075
If you now want to do weekly aggregations, it's as simple as:
print(df.resample('w').sum())
Output:
Country/Region Afghanistan Albania Algeria Andorra Angola ... West Bank and Gaza Western Sahara Yemen Zambia Zimbabwe
date ...
2020-01-26 0 0 0 0 0 ... 0 0 0 0 0
2020-02-02 0 0 0 0 0 ... 0 0 0 0 0
2020-02-09 0 0 0 0 0 ... 0 0 0 0 0
2020-02-16 0 0 0 0 0 ... 0 0 0 0 0
2020-02-23 0 0 0 0 0 ... 0 0 0 0 0
2020-03-01 7 0 6 0 0 ... 0 0 0 0 0
2020-03-08 10 0 85 7 0 ... 43 0 0 0 0
2020-03-15 57 160 195 7 0 ... 209 0 0 0 0
2020-03-22 175 464 705 409 5 ... 309 0 0 11 7
2020-03-29 632 1142 2537 1618 29 ... 559 0 0 113 31
2020-04-05 1783 2000 6875 2970 62 ... 1178 4 0 262 59
2020-04-12 3401 2864 11629 4057 128 ... 1847 30 3 279 84
2020-04-19 5838 3603 16062 4764 143 ... 2081 42 7 356 154
2020-04-26 8918 4606 21211 5087 174 ... 2353 42 7 541 200
2020-05-03 15149 5391 27943 5214 208 ... 2432 42 41 738 244
2020-05-10 25286 5871 36315 5265 274 ... 2607 42 203 1260 241
2020-05-17 39634 6321 45122 5317 327 ... 2632 42 632 3894 274
2020-05-24 61342 6798 54185 5332 402 ... 2869 45 1321 5991 354
2020-05-31 91885 7517 62849 5344 536 ... 3073 63 1932 7125 894
2020-06-07 126442 8378 68842 5868 609 ... 3221 63 3060 7623 1694
2020-06-14 159822 9689 74147 5967 827 ... 3396 63 4236 8836 2335
2020-06-21 191378 12463 79737 5981 1142 ... 4466 63 6322 9905 3089
2020-06-28 210487 15349 87615 5985 1522 ... 10242 70 7360 10512 3813
2020-07-05 224560 18707 102918 5985 2186 ... 21897 70 8450 11322 4426
2020-07-12 237087 22399 124588 5985 2940 ... 36949 70 9489 13002 6200
2020-07-19 245264 26845 149611 6098 4279 ... 52323 70 10855 16350 9058
2020-07-26 250970 31255 178605 6237 5919 ... 68154 70 11571 26749 14933
2020-08-02 255739 36370 208457 6429 7648 ... 80685 70 12023 38896 22241
2020-08-09 36747 5620 31972 937 1280 ... 12541 10 1734 6580 4075
CodePudding user response:
You can try a transpose and groupby, e.g. something similar to the below.
df_T = df.tranpose()
df_T.groupby(df_T.index).sum()['Canada']
CodePudding user response:
Here's a way to do it:
df.columns = [(col str(i)) if col.startswith('Canada') else col for i, col in enumerate(df.columns)]
df = df.assign(Canada=df.filter(like='Canada').sum(axis=1)).drop(columns=[x for x in df.columns if x.startswith('Canada') and x != 'Canada'])
First we rename the columns starting with Canada
by appending their integer position, which ensures they are no longer duplicates.
Then we use sum()
to add across columns like
Canada, put the result in a new column named Canada
, and drop the columns that were originally named Canada
.
Full test code is:
import pandas as pd
df = pd.DataFrame(
columns=[x.strip() for x in 'Brazil Canada Canada Canada Canada'.split()],
index=['Week ' str(i) for i in range(1, 17)],
data=[[i] * 5 for i in range(1, 17)])
df.columns.names=['Country/Region']
print(df)
df.columns = [(col str(i)) if col.startswith('Canada') else col for i, col in enumerate(df.columns)]
df = df.assign(Canada=df.filter(like='Canada').sum(axis=1)).drop(columns=[x for x in df.columns if x.startswith('Canada') and x != 'Canada'])
print(df)
Output:
Country/Region Brazil Canada Canada Canada Canada
Week 1 1 1 1 1 1
Week 2 2 2 2 2 2
Week 3 3 3 3 3 3
Week 4 4 4 4 4 4
Week 5 5 5 5 5 5
Week 6 6 6 6 6 6
Week 7 7 7 7 7 7
Week 8 8 8 8 8 8
Week 9 9 9 9 9 9
Week 10 10 10 10 10 10
Week 11 11 11 11 11 11
Week 12 12 12 12 12 12
Week 13 13 13 13 13 13
Week 14 14 14 14 14 14
Week 15 15 15 15 15 15
Week 16 16 16 16 16 16
Brazil Canada
Week 1 1 4
Week 2 2 8
Week 3 3 12
Week 4 4 16
Week 5 5 20
Week 6 6 24
Week 7 7 28
Week 8 8 32
Week 9 9 36
Week 10 10 40
Week 11 11 44
Week 12 12 48
Week 13 13 52
Week 14 14 56
Week 15 15 60
Week 16 16 64