I have a table such as the following, with vaccine manufacturer, date and total_vaccinations as a cumulative sum of the vaccine manufacturer over time
I am trying to 'backfill' the nulls, basically such that since this is a cumulative sum, dates with null values would have the previous date's total vaccinations
| Vaccine Manufacturer | Date | Total_Vaccinations |
| Pfizer | 10/10/21 | 100 |
| Pfizer | 10/11/21 | NaN |
| Pfizer | 10/12/21 | NaN |
| Pfizer | 10/13/21 | 130 |
| Pfizer | 10/14/21 | NaN |
| Moderna | 10/10/21 | 220 |
| Moderna | 10/11/21 | NaN |
| Moderna | 10/12/21 | 280 |
| Moderna | 10/13/21 | NaN |
I've tried the following:
df['Total_Vaccinations'] = df.groupby(['Vaccine Manufacturer', 'Date'])['Total_Vaccinations'].bfill()
and, after consulting stack overflow:
df['Total_Vaccinations'] = df.groupby(['Vaccine Manufacturer', 'Date'])['Total_Vaccinations'].ffill().bfill()
Neither of these return what I'm looking for. The first does nothing, and returns the same values as the original dataframe, and the second one basically just forward fills.
For reference, this is my desired output:
| Vaccine Manufacturer | Date | Total_Vaccinations |
| Pfizer | 10/10/21 | 100 |
| Pfizer | 10/11/21 | 100 |
| Pfizer | 10/12/21 | 100 |
| Pfizer | 10/13/21 | 130 |
| Pfizer | 10/14/21 | 130 |
| Moderna | 10/10/21 | 220 |
| Moderna | 10/11/21 | 220 |
| Moderna | 10/12/21 | 280 |
| Moderna | 10/13/21 | 280 |
What am I missing here?
CodePudding user response:
Example
data = [['Pfizer', '10/10/21', 100],
['Pfizer', '10/11/21', None],
['Pfizer', '10/12/21', None],
['Pfizer', '10/13/21', 130],
['Pfizer', '10/14/21', None],
['Moderna', '10/10/21', 220],
['Moderna', '10/11/21', None],
['Moderna', '10/12/21', 280],
['Moderna', '10/13/21', None]]
df = pd.DataFrame(data, columns=['M', 'Date', 'Total'])
df
M Date Total
0 Pfizer 10/10/21 100.0
1 Pfizer 10/11/21 NaN
2 Pfizer 10/12/21 NaN
3 Pfizer 10/13/21 130.0
4 Pfizer 10/14/21 NaN
5 Moderna 10/10/21 220.0
6 Moderna 10/11/21 NaN
7 Moderna 10/12/21 280.0
8 Moderna 10/13/21 NaN
Code
df.assign(Total=df.groupby('M')['Total'].ffill().bfill())
output:
M Date Total
0 Pfizer 10/10/21 100.0
1 Pfizer 10/11/21 100.0
2 Pfizer 10/12/21 100.0
3 Pfizer 10/13/21 130.0
4 Pfizer 10/14/21 130.0
5 Moderna 10/10/21 220.0
6 Moderna 10/11/21 220.0
7 Moderna 10/12/21 280.0
8 Moderna 10/13/21 280.0