Hey guys I have a data frame in python which looks something like the following
FirstName | MiddleName | LastName |
---|---|---|
Jim | J | Jones |
Anthony | K | White |
Ravi | K | Khanna |
I want to add a column for year and repeat the rows with the number of years, something like
FirstName | MiddleName | LastName | Year |
---|---|---|---|
Jim | J | Jones | 2020 |
Jim | J | Jones | 2021 |
Jim | J | Jones | 2022 |
Anthony | K | White | 2020 |
Anthony | K | White | 2021 |
Anthony | K | White | 2022 |
Ravi | K | Khanna | 2020 |
Ravi | K | Khanna | 2021 |
Ravi | K | Khanna | 2022 |
Does anyone have any advice on how I can go about doing this on Pandas? Thanks in advance!
CodePudding user response:
Try this:
l = [2020, 2021, 2022]
df.reindex(df.index.repeat(len(l))).assign(Year=np.tile(l, len(df)))
Output:
FirstName MiddleName LastName Year
0 Jim J Jones 2020
0 Jim J Jones 2021
0 Jim J Jones 2022
1 Anthony K White 2020
1 Anthony K White 2021
1 Anthony K White 2022
2 Ravi K Khanna 2020
2 Ravi K Khanna 2021
2 Ravi K Khanna 2022
Details, use reindex
and repeat
to expand dataframe to get the appropriate number of rows, then add a column 'Year' with assign
and np.tile
.
CodePudding user response:
You can do:
years = [2020,2021,2022]
df['Year'] = [years]*len(df)
df = df.explode('Year')
Output:
FirstName MiddleName LastName Year
0 Jim J Jones 2020
0 Jim J Jones 2021
0 Jim J Jones 2022
1 Anthony K White 2020
1 Anthony K White 2021
1 Anthony K White 2022
2 Ravi K Khanna 2020
2 Ravi K Khanna 2021
2 Ravi K Khanna 2022