I have following dataset (much larger, this is just small sample from it):
City Year Votes
Detroit 1964 23
Detroit 1977 61
Detroit 1978 89
Detroit 1986 116
Detroit 1993 144
Baltimore 1964 42
Baltimore 1965 91
Baltimore 1966 161
Baltimore 1967 219
Baltimore 1968 263
Baltimore 1969 312
Baltimore 1970 346
Baltimore 1978 375
Baltimore 1980 415
Baltimore 1981 449
Baltimore 1995 484
Baltimore 1996 529
Baltimore 1997 578
Baltimore 1998 619
Baltimore 1999 660
Baltimore 2000 713
Baltimore 2001 757
Baltimore 2002 807
Baltimore 2003 852
Baltimore 2004 884
Boston 1968 47
Boston 1969 101
Boston 1970 123
Boston 2007 157
Phoenix 1971 41
Phoenix 1972 41
Phoenix 1979 76
Phoenix 1981 112
Phoenix 1982 154
Phoenix 1983 197
Phoenix 1984 242
Phoenix 1985 279
Phoenix 1997 319
Phoenix 1998 351
Phoenix 2000 381
Phoenix 2003 417
Phoenix 2005 457
Phoenix 2006 494
Phoenix 2007 536
Phoenix 2008 570
Phoenix 2009 598
Phoenix 2021 633
Phoenix 2022 661
Years should be in range from 1950 to 2023, and I would like to populate years for each city that are missing:
- if city has votes at the starting year (1950) then use that value
- if city doesn't have the votes at the starting year (1950), then use 0 as a start
- for every city I would like to fill missing years with the next logic: always use value of votes for the previous year.
Result, (only Detroit in, as I did it manually, but for all cities) should look like this:
City Year Votes
Detroit 1950 0
Detroit 1951 0
Detroit 1952 0
Detroit 1953 0
Detroit 1954 0
Detroit 1955 0
Detroit 1956 0
Detroit 1957 0
Detroit 1958 0
Detroit 1959 0
Detroit 1960 0
Detroit 1961 0
Detroit 1962 0
Detroit 1963 0
Detroit 1964 23
Detroit 1965 23
Detroit 1966 23
Detroit 1967 23
Detroit 1968 23
Detroit 1969 23
Detroit 1970 23
Detroit 1971 23
Detroit 1972 23
Detroit 1973 23
Detroit 1974 23
Detroit 1975 23
Detroit 1976 23
Detroit 1977 61
Detroit 1978 89
Detroit 1979 89
Detroit 1980 89
Detroit 1981 89
Detroit 1982 89
Detroit 1983 89
Detroit 1984 89
Detroit 1985 89
Detroit 1986 116
Detroit 1987 116
Detroit 1988 116
Detroit 1989 116
Detroit 1990 116
Detroit 1991 116
Detroit 1992 116
Detroit 1993 144
Detroit 1994 144
Detroit 1995 144
Detroit 1996 144
Detroit 1997 144
Detroit 1998 144
Detroit 1999 144
Detroit 2000 144
Detroit 2001 144
Detroit 2002 144
Detroit 2003 144
Detroit 2004 144
Detroit 2005 144
Detroit 2006 144
Detroit 2007 144
Detroit 2008 144
Detroit 2009 144
Detroit 2010 144
Detroit 2011 144
Detroit 2012 144
Detroit 2013 144
Detroit 2014 144
Detroit 2015 144
Detroit 2016 144
Detroit 2017 144
Detroit 2018 144
Detroit 2019 144
Detroit 2020 144
Detroit 2021 144
Detroit 2022 144
Detroit 2023 144
CodePudding user response:
import pandas as pd
df = pd.read_clipboard() # Your df here
cities = df["City"].unique()
years = range(1950, 2024)
index = pd.MultiIndex.from_product([cities, years], names=["City", "Year"])
out = df.set_index(["City", "Year"]).reindex(index).groupby(level=0).ffill().fillna(0).astype(int).reset_index()