Home > Net >  How to add dummy rows to make number of rows equal for each column value of a particular column in p
How to add dummy rows to make number of rows equal for each column value of a particular column in p

Time:11-26

I have a pandas dataframe like so:

ID A  B  C  D
1  a1 b1 c1 d1
1  a2 b2 c2 d2
1  a3 b3 c3 d3
1  a4 b4 c4 d4
1  a5 b5 c5 d5
2  a6 b6 c6 d6
2  a7 b7 c7 d7
3  a8 b8 c8 d8

Is there some way I can add rows (with dummy values a0 b0 c0 d0 for remaining columns) for ID 2 and 3 (and others) so all ID values have the same number of rows (5). Please note that I only need to add rows as I already have executed a groupby to have at max 5 rows per ID.

df = df.groupby('id').head(5)

The dummy rows need to have same values (a0, b0, c0, d0) aside from the ID. Please ask for any further information that might be required.

EXPECTED OUTPUT

ID A  B  C  D
1  a1 b1 c1 d1
1  a2 b2 c2 d2
1  a3 b3 c3 d3
1  a4 b4 c4 d4
1  a5 b5 c5 d5
2  a6 b6 c6 d6
2  a7 b7 c7 d7
2  a0 b0 c0 d0
2  a0 b0 c0 d0
2  a0 b0 c0 d0
3  a8 b8 c8 d8
3  a0 b0 c0 d0
3  a0 b0 c0 d0
3  a0 b0 c0 d0
3  a0 b0 c0 d0

CodePudding user response:

We can use reindex per group, then use fillna with a dictionary:

rows = np.arange(5)
df = df.groupby("ID").apply(lambda x: x.reset_index(drop=True).reindex(rows)).reset_index(drop=True)
df["ID"] = df["ID"].ffill().astype(int)
df = df.fillna({"A": "a0", "B": "b0", "C": "c0", "D": "d0"})

    ID   A   B   C   D
0    1  a1  b1  c1  d1
1    1  a2  b2  c2  d2
2    1  a3  b3  c3  d3
3    1  a4  b4  c4  d4
4    1  a5  b5  c5  d5
5    2  a6  b6  c6  d6
6    2  a7  b7  c7  d7
7    2  a0  b0  c0  d0
8    2  a0  b0  c0  d0
9    2  a0  b0  c0  d0
10   3  a8  b8  c8  d8
11   3  a0  b0  c0  d0
12   3  a0  b0  c0  d0
13   3  a0  b0  c0  d0
14   3  a0  b0  c0  d0

CodePudding user response:

Append missing ID values and fill nan values

out = df.append(pd.DataFrame(df['ID'].unique().repeat(5 - df['ID'].value_counts()),
                columns=['ID'])).fillna({'A': 'a0', 'B': 'b0', 'C': 'c0', 'D': 'd0'}) \
        .sort_values('ID').reset_index(drop=True)
print(out)

# Output:
    ID   A   B   C   D
0   11  a1  b1  c1  d1
1   11  a2  b2  c2  d2
2   11  a3  b3  c3  d3
3   11  a4  b4  c4  d4
4   11  a5  b5  c5  d5
5   12  a6  b6  c6  d6
6   12  a7  b7  c7  d7
7   12  a0  b0  c0  d0
8   12  a0  b0  c0  d0
9   12  a0  b0  c0  d0
10  13  a8  b8  c8  d8
11  13  a0  b0  c0  d0
12  13  a0  b0  c0  d0
13  13  a0  b0  c0  d0
14  13  a0  b0  c0  d0
  • Related