How do I replicate rows in a group with size < 3 within a DataFrame?

Time:09-29

I'm new to pandas. I have the following dataframe:

ID A B
0 Data Data
1 Data Data
2 Data Data
3 Data Data
3 Data Data
3 Data Data
3 Data Data

For every group (by ID) with fewer than 3 rows, I want to replicate each row 3 times. The resulting dataframe should look like this:

ID A B
0 Data Data
0 Data Data
0 Data Data
1 Data Data
1 Data Data
1 Data Data
2 Data Data
2 Data Data
2 Data Data
3 Data Data
3 Data Data
3 Data Data
3 Data Data

Does anyone have ideas? Thanks in advance.

CodePudding user response:

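IIUC, you can clip the group sizes to a minimum of 3, repeat each ID that many times, and select the rows with loc. Note that loc looks the repeated values up as index labels, so this relies on each ID also being the index label of a row in that group, as in the example: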

s = df.groupby('ID')['ID'].size().clip(lower=3)

out = df.loc[s.index.repeat(s).rename(None)]

output:

   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
2   2  Data  Data
3   3  Data  Data
3   3  Data  Data
3   3  Data  Data
3   3  Data  Data

intermediates:

s
# ID
# 0    3
# 1    3
# 2    3
# 3    4
# Name: ID, dtype: int64

s.index.repeat(s).rename(None)
# Int64Index([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3], dtype='int64')
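
Because loc matches the repeated values against index labels, a position-based variant may be safer when the index does not line up with ID. The following is a minimal sketch, not the answerer's code: the sample frame, the string index, and the choice to pad a small group by duplicating its first row are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the string index shows that labels are not used.
df = pd.DataFrame(
    {"ID": [0, 1, 2, 2, 3, 3, 3], "A": list("abcdefg")},
    index=list("pqrstuv"),
)

min_size = 3
# Size of the group each row belongs to, aligned with df's rows.
size = df.groupby("ID")["ID"].transform("size")
# True for the first row of every group.
is_first = ~df["ID"].duplicated()
# Every row is kept once; the first row of a small group is repeated
# enough extra times to bring that group up to min_size rows.
extra = (min_size - size).clip(lower=0) * is_first
repeats = (1 + extra).to_numpy()

# Select by position so that index labels never matter.
out = df.iloc[np.repeat(np.arange(len(df)), repeats)]
print(out)
```

With this approach every group ends up with at least min_size rows, and all original rows of larger groups are kept unchanged.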

CodePudding user response:

Use Series.value_counts to count the rows per ID, then set the repeat count to 3 where the count is less than 3 and to 1 (no repeat) otherwise. Map these counts back onto the rows with Series.map and repeat the rows with Index.repeat inside DataFrame.loc:

s = df['ID'].value_counts().lt(3).map({True:3, False:1})

df = df.loc[df.index.repeat(df['ID'].map(s))]
print (df)
   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
2   2  Data  Data
3   3  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data
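
The same idea can be wrapped in a small function so the threshold and repeat count are not hard-coded. This is only a sketch: the name repeat_small_groups and its parameters are hypothetical, the sample frame is rebuilt from the question with placeholder values, and a unique index is assumed so that loc returns each row exactly as often as its label is repeated.

```python
import numpy as np
import pandas as pd

def repeat_small_groups(df, key="ID", threshold=3, times=3):
    """Repeat each row `times` times if its group has fewer than `threshold` rows."""
    # Number of rows sharing each row's key value.
    counts = df[key].map(df[key].value_counts())
    # Repeat count per row: `times` for small groups, 1 (no repeat) otherwise.
    repeats = np.where(counts < threshold, times, 1)
    return df.loc[df.index.repeat(repeats)]

# Sample frame from the question (default RangeIndex, placeholder values).
df = pd.DataFrame({"ID": [0, 1, 2, 3, 3, 3, 3], "A": "Data", "B": "Data"})
out = repeat_small_groups(df)
print(out)
```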

If a group has 2 rows, each of those rows is repeated 3 times, so the group ends up with 6 rows:

print (df)
   ID     A     B
0   0  Data  Data
1   1  Data  Data
2   2  Data  Data
3   2  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data

s = df['ID'].value_counts().lt(3).map({True:3, False:1})
print (s)
3    1
2    3
0    3
1    3
Name: ID, dtype: int64

df = df.loc[df.index.repeat(df['ID'].map(s))]
print (df)
   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
2   2  Data  Data
3   2  Data  Data
3   2  Data  Data
3   2  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data

If you instead need to repeat rows 3 times when the group has 1 row, 2 times when it has 2 rows, and not at all otherwise (repeat count 1), change the mapping:

s = df['ID'].value_counts().map({1:3, 2:2}).fillna(1)
print (s)
3    1.0
2    2.0
0    3.0
1    3.0
Name: ID, dtype: float64

df = df.loc[df.index.repeat(df['ID'].map(s))]
print (df)
   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
3   2  Data  Data
3   2  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data
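
To check what such a size-dependent rule does to each group, the repeat counts can be computed per row and the resulting group sizes inspected. A minimal sketch, assuming the 7-row input from the example above; the dict name rule is hypothetical, and the cast to int is just a choice to keep the repeat counts as integers.

```python
import pandas as pd

# Input from the example above: ID 2 has two rows, ID 3 has three.
df = pd.DataFrame({"ID": [0, 1, 2, 2, 3, 3, 3], "A": "Data", "B": "Data"})

# Hypothetical rule table: group size -> repeat count; other sizes repeat once.
rule = {1: 3, 2: 2}

# Size of the group each row belongs to, then the repeat count for that size.
group_size = df["ID"].map(df["ID"].value_counts())
repeats = group_size.map(rule).fillna(1).astype(int)

out = df.loc[df.index.repeat(repeats)]
# Resulting rows per ID, to check the effect of the rule.
print(out.groupby("ID").size())
```

Under this rule a 2-row group grows to 4 rows, because each of its rows is repeated twice.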