sorry, if my title sounds a bit confusing. What I'm basically trying to do is adding new rows in a data frame, where I duplicate the value of each unique value of one column, while another column's new values are changing.
This is what my data frame looks like:
id | year |
---|---|
01 | 2022 |
02 | 2022 |
03 | 2022 |
... | ... |
99 | 2022 |
And I want it to look like this:
id | year |
---|---|
01 | 2022 |
01 | 2023 |
01 | 2024 |
02 | 2022 |
02 | 2023 |
02 | 2024 |
03 | 2022 |
... | ... |
99 | 2024 |
I.e. I want for every id to add the years 2023 and 2024 in the year column. I tried doing this with an apply function, but it always didn't work out, could you guys help me out in solving this?
CodePudding user response:
years = [2022 i for i in range(3)]
# or
years = [2022,2023, 2024]
pd.DataFrame({
'id': np.repeat((data:=df.id.to_numpy()), len(years)).reshape(-1,len(years)).flatten(),
'year': np.repeat(np.array(years), data.shape[0]).reshape(len(years), data.shape[0]).T.flatten()
})
CodePudding user response:
You can simply make a list comprehension and concat all dataframe years wirh increments of your desire. For example:
pd.concat([df.assign(year=df.year increment) for increment in range(0,3)]).sort_values(by='id').reset_index(drop=True)
This will increment your dataframe to three years as follows. You can play around with range for the desired number of extensions:
id | year |
---|---|
1 | 2022 |
1 | 2023 |
1 | 2024 |
2 | 2022 |
2 | 2023 |
2 | 2024 |
3 | 2022 |
3 | 2023 |
3 | 2024 |
CodePudding user response:
A quick solution would be to make two copies of your current dataframe and change accordingly the year date to 2023 and 2024. After you do that, concatenate all 3 datasets together using pd.concat
.