I'm new to pandas. I have the following dataframe:
ID | A | B |
---|---|---|
0 | Data | Data |
1 | Data | Data |
2 | Data | Data |
3 | Data | Data |
3 | Data | Data |
3 | Data | Data |
3 | Data | Data |
I want to repeat each row 3 times for every group whose size is less than 3. The resulting dataframe should look like this:
ID | A | B |
---|---|---|
0 | Data | Data |
0 | Data | Data |
0 | Data | Data |
1 | Data | Data |
1 | Data | Data |
1 | Data | Data |
2 | Data | Data |
2 | Data | Data |
2 | Data | Data |
3 | Data | Data |
3 | Data | Data |
3 | Data | Data |
3 | Data | Data |
Does anyone have ideas? Thanks in advance.
CodePudding user response:
IIUC, you can reindex using the group counts, clipped to a minimum of 3:
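# Group sizes per ID, raised to a minimum of 3
s = df.groupby('ID')['ID'].size().clip(lower=3)
# Repeat each ID label by its clipped size and select those index labels
out = df.loc[s.index.repeat(s).rename(None)]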
output:
ID A B
0 0 Data Data
0 0 Data Data
0 0 Data Data
1 1 Data Data
1 1 Data Data
1 1 Data Data
2 2 Data Data
2 2 Data Data
2 2 Data Data
3 3 Data Data
3 3 Data Data
3 3 Data Data
3 3 Data Data
intermediates:
s
# ID
# 0 3
# 1 3
# 2 3
# 3 4
# Name: ID, dtype: int64
s.index.repeat(s).rename(None)
# Int64Index([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3], dtype='int64')
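For reference, the snippet above can be run end to end as follows. This is a minimal sketch on the question's sample data, assuming a default RangeIndex and placeholder column values; the name labels is only for illustration. Note that df.loc selects by index label here, so the result depends on the index labels lining up with the ID values, as they do in this sample.

```python
import pandas as pd

# Sample data from the question (placeholder values, default RangeIndex)
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 3, 3, 3],
    "A": ["Data"] * 7,
    "B": ["Data"] * 7,
})

# Group sizes per ID, raised to a minimum of 3
s = df.groupby("ID")["ID"].size().clip(lower=3)

# Repeat each ID label by its clipped size, then select those index labels
labels = s.index.repeat(s).rename(None)
out = df.loc[labels]

print(out)
```

With real data, where the index usually does not mirror the ID column, a per-row repeat count is likely the safer route.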
CodePudding user response:
Use Series.value_counts to count the rows per ID, then map counts less than 3 to 3 and all other counts to 1 (no repeat). Map these values back onto the ID column with Series.map and repeat the rows with Index.repeat inside DataFrame.loc:
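# Repeat count per ID: 3 for groups with fewer than 3 rows, otherwise 1
s = df['ID'].value_counts().lt(3).map({True:3, False:1})
# Look up the repeat count for each row and repeat its index label that many times
df = df.loc[df.index.repeat(df['ID'].map(s))]
print(df)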
ID A B
0 0 Data Data
0 0 Data Data
0 0 Data Data
1 1 Data Data
1 1 Data Data
1 1 Data Data
2 2 Data Data
2 2 Data Data
2 2 Data Data
3 3 Data Data
4 3 Data Data
5 3 Data Data
6 3 Data Data
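The same steps as a self-contained sketch, assuming the question's sample data with a default RangeIndex. The names repeats and out are only for illustration.

```python
import pandas as pd

# Sample data from the question (placeholder values, default RangeIndex)
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 3, 3, 3],
    "A": ["Data"] * 7,
    "B": ["Data"] * 7,
})

# Count rows per ID, then map small groups (< 3 rows) to 3 repeats, others to 1
s = df["ID"].value_counts().lt(3).map({True: 3, False: 1})

# Look up the repeat count for every row via its ID
repeats = df["ID"].map(s)

# Repeat each index label that many times and select the rows
out = df.loc[df.index.repeat(repeats)]

print(out)
```

Because the repeat count is looked up per row, this does not rely on the index labels matching the ID values, only on the index being unique.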
If some group has 2 rows, each of those rows is also repeated 3 times. For example, with this input, where ID 2 has 2 rows:
print (df)
ID A B
0 0 Data Data
1 1 Data Data
2 2 Data Data
3 2 Data Data
4 3 Data Data
5 3 Data Data
6 3 Data Data
s = df['ID'].value_counts().lt(3).map({True:3, False:1})
print (s)
3 1
2 3
0 3
1 3
Name: ID, dtype: int64
df = df.loc[df.index.repeat(df['ID'].map(s))]
print (df)
ID A B
0 0 Data Data
0 0 Data Data
0 0 Data Data
1 1 Data Data
1 1 Data Data
1 1 Data Data
2 2 Data Data
2 2 Data Data
2 2 Data Data
3 2 Data Data
3 2 Data Data
3 2 Data Data
4 3 Data Data
5 3 Data Data
6 3 Data Data
If instead you need to repeat rows 3 times for groups of size 1, 2 times for groups of size 2, and not at all otherwise (repeat 1), change the mapping:
s = df['ID'].value_counts().map({1:3, 2:2}).fillna(1)
print (s)
3 1.0
2 2.0
0 3.0
1 3.0
Name: ID, dtype: float64
df = df.loc[df.index.repeat(df['ID'].map(s))]
print (df)
ID A B
0 0 Data Data
0 0 Data Data
0 0 Data Data
1 1 Data Data
1 1 Data Data
1 1 Data Data
2 2 Data Data
2 2 Data Data
3 2 Data Data
3 2 Data Data
4 3 Data Data
5 3 Data Data
6 3 Data Data
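The size-dependent variant can also be wrapped in a small function. This is only a sketch: repeat_by_group_size and its parameters are hypothetical names, it uses groupby(...).transform('size') in place of value_counts plus map to get each row's group size, and it casts the repeat counts to int so they are not left as floats after fillna. It assumes a unique index.

```python
import pandas as pd


def repeat_by_group_size(df, key, size_to_repeats, default=1):
    """Repeat each row according to the size of its group (hypothetical helper)."""
    # Size of the group each row belongs to, aligned with df's rows
    sizes = df.groupby(key)[key].transform("size")
    # Translate group size into a repeat count; unmapped sizes use the default
    repeats = sizes.map(size_to_repeats).fillna(default).astype(int)
    # Repeat each index label that many times and select the rows
    return df.loc[df.index.repeat(repeats)]


# Sample input from the answer: ID 2 has 2 rows, ID 3 has 3 rows
df = pd.DataFrame({
    "ID": [0, 1, 2, 2, 3, 3, 3],
    "A": ["Data"] * 7,
    "B": ["Data"] * 7,
})

# Groups of size 1 are repeated 3 times, groups of size 2 twice, others once
out = repeat_by_group_size(df, "ID", {1: 3, 2: 2})
print(out)
```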