I have a pandas dataframe like so:
ID A B C D
1 a1 b1 c1 d1
1 a2 b2 c2 d2
1 a3 b3 c3 d3
1 a4 b4 c4 d4
1 a5 b5 c5 d5
2 a6 b6 c6 d6
2 a7 b7 c7 d7
3 a8 b8 c8 d8
Is there some way I can add rows (with dummy values a0 b0 c0 d0 for remaining columns) for ID 2 and 3 (and others) so all ID values have the same number of rows (5). Please note that I only need to add rows as I already have executed a groupby to have at max 5 rows per ID.
df = df.groupby('id').head(5)
The dummy rows need to have same values (a0, b0, c0, d0) aside from the ID. Please ask for any further information that might be required.
EXPECTED OUTPUT
ID A B C D
1 a1 b1 c1 d1
1 a2 b2 c2 d2
1 a3 b3 c3 d3
1 a4 b4 c4 d4
1 a5 b5 c5 d5
2 a6 b6 c6 d6
2 a7 b7 c7 d7
2 a0 b0 c0 d0
2 a0 b0 c0 d0
2 a0 b0 c0 d0
3 a8 b8 c8 d8
3 a0 b0 c0 d0
3 a0 b0 c0 d0
3 a0 b0 c0 d0
3 a0 b0 c0 d0
CodePudding user response:
We can use reindex
per group, then use fillna
with a dictionary:
rows = np.arange(5)
df = df.groupby("ID").apply(lambda x: x.reset_index(drop=True).reindex(rows)).reset_index(drop=True)
df["ID"] = df["ID"].ffill().astype(int)
df = df.fillna({"A": "a0", "B": "b0", "C": "c0", "D": "d0"})
ID A B C D
0 1 a1 b1 c1 d1
1 1 a2 b2 c2 d2
2 1 a3 b3 c3 d3
3 1 a4 b4 c4 d4
4 1 a5 b5 c5 d5
5 2 a6 b6 c6 d6
6 2 a7 b7 c7 d7
7 2 a0 b0 c0 d0
8 2 a0 b0 c0 d0
9 2 a0 b0 c0 d0
10 3 a8 b8 c8 d8
11 3 a0 b0 c0 d0
12 3 a0 b0 c0 d0
13 3 a0 b0 c0 d0
14 3 a0 b0 c0 d0
CodePudding user response:
Append missing ID
values and fill nan values
out = df.append(pd.DataFrame(df['ID'].unique().repeat(5 - df['ID'].value_counts()),
columns=['ID'])).fillna({'A': 'a0', 'B': 'b0', 'C': 'c0', 'D': 'd0'}) \
.sort_values('ID').reset_index(drop=True)
print(out)
# Output:
ID A B C D
0 11 a1 b1 c1 d1
1 11 a2 b2 c2 d2
2 11 a3 b3 c3 d3
3 11 a4 b4 c4 d4
4 11 a5 b5 c5 d5
5 12 a6 b6 c6 d6
6 12 a7 b7 c7 d7
7 12 a0 b0 c0 d0
8 12 a0 b0 c0 d0
9 12 a0 b0 c0 d0
10 13 a8 b8 c8 d8
11 13 a0 b0 c0 d0
12 13 a0 b0 c0 d0
13 13 a0 b0 c0 d0
14 13 a0 b0 c0 d0