My goal is to cumulatively add rows for each group in the data frame, as I have done manually below, but without using a for loop or df.apply() (so basically in one operation).
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]]),
                   columns=['group', 'a', 'b'])
df2 = pd.DataFrame(np.array([[1, 1, 1], [2, 1, 1], [2, 2, 2], [3, 1, 1], [3, 1, 1], [3, 2, 2]]),
                   columns=['group', 'a', 'b'])
df1 = df1.set_index('group').sort_index()
df2 = df2.set_index('group').sort_index()
print(df1)
       a  b
group
1      1  1
2      2  2
3      3  3
print(df2)
       a  b
group
1      1  1
2      1  1
2      2  2
3      1  1
3      1  1
3      2  2
Answer:
If I understand correctly, you can build the filler rows of ones and concatenate them with df1:
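# Filler rows of ones: the i-th group (0-based) gets i extra rows
tmp = pd.DataFrame(1, columns=df1.columns, index=df1.index.repeat(range(len(df1))))
# A stable sort keeps the filler rows ahead of the original row within each group
df2 = pd.concat([tmp, df1]).sort_index(kind='stable')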
print(df2)
# Output
       a  b
group
1      1  1
2      1  1
2      2  2
3      1  1
3      1  1
3      3  3
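To see where the filler rows come from, the repeat step can be looked at in isolation. The following is a minimal sketch, assuming an index with the same labels as df1; the names idx and repeated are only illustrative.

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name="group")

# Index.repeat takes one count per label:
# label 1 is repeated 0 times, label 2 once, label 3 twice.
repeated = idx.repeat(list(range(len(idx))))

print(repeated.tolist())  # [2, 3, 3]
```

Each label in the repeated index becomes one filler row, so group 1 gets none, group 2 gets one, and group 3 gets two.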
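As a one-liner (with a stable sort, so the filler rows are guaranteed to stay ahead of the original row in each group):
df2 = pd.concat([pd.DataFrame(1, columns=df1.columns, index=df1.index.repeat(range(len(df1)))), df1]).sort_index(kind='stable')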
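If this is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch under the assumption that the i-th row (0-based) should receive i filler rows; the function name expand_groups and its fill parameter are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def expand_groups(df, fill=1):
    """Prepend i filler rows to the i-th row (0-based) of df."""
    counts = np.arange(len(df))  # 0, 1, 2, ... filler rows per row of df
    filler = pd.DataFrame(fill, columns=df.columns, index=df.index.repeat(counts))
    # kind="stable" preserves the concat order among equal index labels,
    # so the filler rows stay ahead of the original row in each group.
    return pd.concat([filler, df]).sort_index(kind="stable")


df1 = pd.DataFrame({"group": [1, 2, 3], "a": [1, 2, 3], "b": [1, 2, 3]}).set_index("group")
result = expand_groups(df1)
print(result)
```

The default sort algorithm of sort_index is not documented as stable, which is why the sketch asks for a stable sort explicitly rather than relying on the default happening to preserve order.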