Home > Net >  How to fill the gaps, and assigne the values, when using cumulative sum with Pandas?
How to fill the gaps, and assigne the values, when using cumulative sum with Pandas?

Time:12-31

I have following dataset (much larger, this is just small sample from it):

City    Year    Votes
Detroit 1964    23
Detroit 1977    61
Detroit 1978    89
Detroit 1986    116
Detroit 1993    144
Baltimore   1964    42
Baltimore   1965    91
Baltimore   1966    161
Baltimore   1967    219
Baltimore   1968    263
Baltimore   1969    312
Baltimore   1970    346
Baltimore   1978    375
Baltimore   1980    415
Baltimore   1981    449
Baltimore   1995    484
Baltimore   1996    529
Baltimore   1997    578
Baltimore   1998    619
Baltimore   1999    660
Baltimore   2000    713
Baltimore   2001    757
Baltimore   2002    807
Baltimore   2003    852
Baltimore   2004    884
Boston  1968    47
Boston  1969    101
Boston  1970    123
Boston  2007    157
Phoenix 1971    41
Phoenix 1972    41
Phoenix 1979    76
Phoenix 1981    112
Phoenix 1982    154
Phoenix 1983    197
Phoenix 1984    242
Phoenix 1985    279
Phoenix 1997    319
Phoenix 1998    351
Phoenix 2000    381
Phoenix 2003    417
Phoenix 2005    457
Phoenix 2006    494
Phoenix 2007    536
Phoenix 2008    570
Phoenix 2009    598
Phoenix 2021    633
Phoenix 2022    661

Years should be in range from 1950 to 2023, and I would like to populate years for each city that are missing:

  • if city has votes at the starting year (1950) then use that value
  • if city doesn't have the votes at the starting year (1950), then use 0 as a start
  • for every city I would like to fill missing years with the next logic: always use value of votes for the previous year.

Result, (only Detroit in, as I did it manually, but for all cities) should look like this:

City    Year    Votes
Detroit 1950    0
Detroit 1951    0
Detroit 1952    0
Detroit 1953    0
Detroit 1954    0
Detroit 1955    0
Detroit 1956    0
Detroit 1957    0
Detroit 1958    0
Detroit 1959    0
Detroit 1960    0
Detroit 1961    0
Detroit 1962    0
Detroit 1963    0
Detroit 1964    23
Detroit 1965    23
Detroit 1966    23
Detroit 1967    23
Detroit 1968    23
Detroit 1969    23
Detroit 1970    23
Detroit 1971    23
Detroit 1972    23
Detroit 1973    23
Detroit 1974    23
Detroit 1975    23
Detroit 1976    23
Detroit 1977    61
Detroit 1978    89
Detroit 1979    89
Detroit 1980    89
Detroit 1981    89
Detroit 1982    89
Detroit 1983    89
Detroit 1984    89
Detroit 1985    89
Detroit 1986    116
Detroit 1987    116
Detroit 1988    116
Detroit 1989    116
Detroit 1990    116
Detroit 1991    116
Detroit 1992    116
Detroit 1993    144
Detroit 1994    144
Detroit 1995    144
Detroit 1996    144
Detroit 1997    144
Detroit 1998    144
Detroit 1999    144
Detroit 2000    144
Detroit 2001    144
Detroit 2002    144
Detroit 2003    144
Detroit 2004    144
Detroit 2005    144
Detroit 2006    144
Detroit 2007    144
Detroit 2008    144
Detroit 2009    144
Detroit 2010    144
Detroit 2011    144
Detroit 2012    144
Detroit 2013    144
Detroit 2014    144
Detroit 2015    144
Detroit 2016    144
Detroit 2017    144
Detroit 2018    144
Detroit 2019    144
Detroit 2020    144
Detroit 2021    144
Detroit 2022    144
Detroit 2023    144

CodePudding user response:

import pandas as pd

df = pd.read_clipboard() # Your df here

cities = df["City"].unique()
years = range(1950, 2024)

index = pd.MultiIndex.from_product([cities, years], names=["City", "Year"])

out = df.set_index(["City", "Year"]).reindex(index).groupby(level=0).ffill().fillna(0).astype(int).reset_index()
  • Related