please i need help on the restructuring of the dataframe below, i would like to group by batch of 7 day(week) so by summing the column values for each 7 days and the 7th days represents the final index. so for example for the first 7 lines I will have
Uruguay Uzbekistan
2020-01-28 0 0
sum of value in colone from 2020-01-22 to 2020-01-28
you can see the dataframe in the snippet. thank
<table border="0" >
<thead>
<tr style="text-align: right;">
<th>Country/Region</th>
<th>Uruguay</th>
<th>Uzbekistan</th>
<th>Venezuela</th>
<th>Vietnam</th>
<th>West Bank and Gaza</th>
<th>Western Sahara</th>
<th>Yemen</th>
<th>Zambia</th>
<th>Zimbabwe</th>
</tr>
</thead>
<tbody>
<tr>
<th>2020-01-22</th>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<th>2020-01-23</th>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<th>2020-01-24</th>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<th>2020-01-25</th>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<th>2020-01-26</th>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>2020-11-03</th>
<td>2788</td>
<td>65389</td>
<td>88751</td>
<td>1104</td>
<td>48245</td>
<td>9</td>
<td>1976</td>
<td>16112</td>
<td>8188</td>
</tr>
<tr>
<th>2020-11-04</th>
<td>2831</td>
<td>65578</td>
<td>89140</td>
<td>1104</td>
<td>48728</td>
<td>9</td>
<td>1976</td>
<td>16168</td>
<td>8215</td>
</tr>
<tr>
<th>2020-11-05</th>
<td>2875</td>
<td>65833</td>
<td>89520</td>
<td>1104</td>
<td>49188</td>
<td>9</td>
<td>1976</td>
<td>16176</td>
<td>8223</td>
</tr>
<tr>
<th>2020-11-06</th>
<td>2914</td>
<td>65912</td>
<td>89920</td>
<td>1105</td>
<td>50048</td>
<td>9</td>
<td>1977</td>
<td>16211</td>
<td>8233</td>
</tr>
<tr>
<th>2020-11-07</th>
<td>2964</td>
<td>66131</td>
<td>90326</td>
<td>1105</td>
<td>50487</td>
<td>9</td>
<td>1977</td>
<td>16222</td>
<td>8246</td>
</tr>
</tbody>
</table>
CodePudding user response:
This should give you some place to start. Please come back with a solid question in mind if you run into any specific issues:
pip install lxml
import pandas as pd
df = pd.read_html(your_html)[0]
print(df)
Output:
Country/Region Uruguay Uzbekistan Venezuela Vietnam West Bank and Gaza Western Sahara Yemen Zambia Zimbabwe
0 2020-01-22 0 0 0 0 0 0 0 0 0
1 2020-01-23 0 0 0 0 0 0 0 0 0
2 2020-01-24 0 0 0 0 0 0 0 0 0
3 2020-01-25 0 0 0 0 0 0 0 0 0
4 2020-01-26 0 0 0 0 0 0 0 0 0
5 ... ... ... ... ... ... ... ... ... ...
6 2020-11-03 2788 65389 88751 1104 48245 9 1976 16112 8188
7 2020-11-04 2831 65578 89140 1104 48728 9 1976 16168 8215
8 2020-11-05 2875 65833 89520 1104 49188 9 1976 16176 8223
9 2020-11-06 2914 65912 89920 1105 50048 9 1977 16211 8233
10 2020-11-07 2964 66131 90326 1105 50487 9 1977 16222 8246
CodePudding user response:
If your each 7 days are all in the same week, you can try sum
after group by then set the index by last row in each group.
g = df.groupby(df['Country/Region'].dt.isocalendar().week)
out = g.sum().set_index(g.tail(1)['Country/Region'])
print(out)
Uruguay Uzbekistan Venezuela Vietnam West Bank and Gaza Western Sahara Yemen Zambia Zimbabwe
Country/Region
2020-01-26 0 0 0 0 0 0 0 0 0
2020-11-07 14372 328843 447657 5522 246696 45 9882 80889 41105