Replicate specific rows of a pandas DataFrame X times

I have a dataframe and want to replicate specific rows based on the value in a column, and I am looking for a simple way to do it. For example, in the dummy data below, how can I replicate the rows where Age is 24 or 22 so that each of them appears twice?

import pandas as pd

data= [['Karan',23],['Rohit',22],['Macron',22], ['Sahil',21],['Aryan',24]]

df = pd.DataFrame(data, columns=['Name','Age'])

So my final dataframe should look like this (row order is not important):

Name Age
Karan 23
Rohit 22
Rohit 22
Macron 22
Macron 22
Sahil 21
Aryan 24
Aryan 24

I tried [https://stackoverflow.com/questions/24029659/python-pandas-replicate-rows-in-dataframe], but it did not work for data containing strings.

CodePudding user response：

You can use Index.repeat:

df.loc[df.index.repeat(df['Age'].isin([22,24]).add(1))]

How it works:

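determine whether Age is in [22,24], which gives a boolean Series
add 1 (the False values become 1, the True values become 2), which gives a repeat count per row
repeat each index label by its count, then select those labels with df.loc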
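
The steps above can be traced one at a time on the sample data from the question. This is only a minimal sketch; the intermediate names (in_targets, counts, repeated_labels) are not part of the answer and are introduced purely for illustration:

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Step 1: boolean mask, True where Age is 22 or 24
in_targets = df['Age'].isin([22, 24])

# Step 2: True + 1 == 2 and False + 1 == 1, so this is a repeat count per row
counts = in_targets.add(1)

# Step 3: repeat each index label by its count, then select those labels
repeated_labels = df.index.repeat(counts)
result = df.loc[repeated_labels]

print(counts.tolist())           # [1, 2, 2, 1, 2]
print(repeated_labels.tolist())  # [0, 1, 1, 2, 2, 3, 4, 4]
```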

Or, for more flexibility, use numpy.where, which lets you pick any repeat count you want:

import numpy as np
df.loc[df.index.repeat(np.where(df['Age'].isin([22,24]), 2, 1))]

output:

     Name  Age
0   Karan   23
1   Rohit   22
1   Rohit   22
2  Macron   22
2  Macron   22
3   Sahil   21
4   Aryan   24
4   Aryan   24

To reset the index afterwards:

df.loc[df.index.repeat(df['Age'].isin([22,24]).add(1))].reset_index(drop=True)

output:

     Name  Age
0   Karan   23
1   Rohit   22
2   Rohit   22
3  Macron   22
4  Macron   22
5   Sahil   21
6   Aryan   24
7   Aryan   24
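
If different Age values need different repeat counts, the same Index.repeat idea seems to extend naturally by building the counts from a lookup table instead of a single condition. A sketch under that assumption; the mapping copies_by_age and its values are hypothetical and not taken from the question:

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Hypothetical mapping: Age value -> total number of copies wanted
copies_by_age = {22: 2, 24: 3}

# Ages missing from the mapping become NaN; fill those with 1 (keep the row once)
counts = df['Age'].map(copies_by_age).fillna(1).astype(int)

# Repeat the index labels by the per-row counts and renumber the rows
result = df.loc[df.index.repeat(counts)].reset_index(drop=True)
print(result)
```

With this mapping, the rows with Age 22 appear twice, the row with Age 24 appears three times, and every other row is kept once.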

CodePudding user response：

This is a possible solution (row order is not preserved):

import numpy as np
import pandas as pd

mask = df['Age'].isin([22, 24])  # rows to duplicate
# Repeat the matching rows twice, then append the remaining rows once
df = pd.DataFrame(np.concatenate((
    np.repeat(df[mask].values, 2, axis=0),
    df[~mask].values
)), columns=df.columns)

Output:

     Name Age
0   Rohit  22
1   Rohit  22
2  Macron  22
3  Macron  22
4   Aryan  24
5   Aryan  24
6   Karan  23
7   Sahil  21
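
One thing to be aware of with the NumPy approach above: because the frame mixes strings and integers, .values yields an object array, so the rebuilt Age column ends up with object dtype rather than an integer dtype. If that matters, a pandas-only variant along the following lines should keep the column dtypes. It is a sketch rather than a drop-in replacement, and its row order differs from the output above:

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

mask = df['Age'].isin([22, 24])

# Matching rows twice, then the remaining rows once; concat keeps column dtypes
result = pd.concat([df[mask], df[mask], df[~mask]], ignore_index=True)

print(result.dtypes)
```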