I have a dataframe that looks like this:
df = pd.DataFrame({'id':[1,3,500, 53, 1, 500], 'code1':['a0', 'b0', 'b0', 'c0', 'b0', 'a0'], 'code2':['aa', 'bb', 'cc', 'bb', 'cc', 'bb'], 'date':['2022-10-01', '2022-09-01', '2022-10-01', '2022-11-01', '2022-09-01', '2022-11-01']})
I want to expand (copy) this dataframe N times, but each time has to have a different IDs, keeping the original ID and the original combination (e.g., id=1 has code1=[a0,b0], code2=[aa, cc], date=['2022-10-01', 2022-08-01'], the new id replacing id=1 should have the same).
For N=1, I can do:
df1 = df.loc[df.index.repeat(1)] # repeat 1 time my dataframe, I can also just copy
ids = df.id.unique() # Get the original IDs
new_ids = df.id.max() 1 # Create new IDs starting from the max ID
nids = df.id.nunique() # Get the number of unique IDs
new_ids = new_ids range(0,nids) # Create a list with the new IDs
df1['id'] = df1['id'].replace(ids, new_ids) # Replace the old IDs with the new ones
df_final = pd.concat(df, df1, axis=0) # Concacatenate
For N=2 and larger, I thought of doing a for loop, but I guess there is a better way? Important thing is to keep the combinations for each IDs with code1/code2/date and keep the original IDs.
Thank you!
CodePudding user response:
You can use the key
parameter of concat
to increment a step based on the max id in the original DataFrame:
N = 4
step = df['id'].max()
out = pd.concat([df]*N, keys=range(N))
out['id'] = out.index.get_level_values(0)*step
out = out.droplevel(0)
More simple variant with numpy:
import numpy as np
N = 4
step = df['id'].max()
a = np.repeat(np.arange(N), len(df))
out = pd.concat([df]*N)
out['id'] = a*step
output:
id code1 code2 date
0 1 a0 aa 2022-10-01
1 3 b0 bb 2022-09-01
2 500 b0 cc 2022-10-01
3 53 c0 bb 2022-11-01
4 1 b0 cc 2022-09-01
5 500 a0 bb 2022-11-01
0 501 a0 aa 2022-10-01 # new id starts at 501
1 503 b0 bb 2022-09-01
2 1000 b0 cc 2022-10-01
3 553 c0 bb 2022-11-01
4 501 b0 cc 2022-09-01
5 1000 a0 bb 2022-11-01
0 1001 a0 aa 2022-10-01 # new id starts at 1001
1 1003 b0 bb 2022-09-01
2 1500 b0 cc 2022-10-01
3 1053 c0 bb 2022-11-01
4 1001 b0 cc 2022-09-01
5 1500 a0 bb 2022-11-01
0 1501 a0 aa 2022-10-01 # new id starts at 1501
1 1503 b0 bb 2022-09-01
2 2000 b0 cc 2022-10-01
3 1553 c0 bb 2022-11-01
4 1501 b0 cc 2022-09-01
5 2000 a0 bb 2022-11-01