I have a dataframe and want to replicate specific rows based on the value in a column, and I am looking for a simple way to do it. For example, in the dummy data below, how can I replicate the rows where Age is 24 and the rows where Age is 22 so that each of them appears 2 times?
import pandas as pd
data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name','Age'])
So my final dataframe should look like this (row order is not important):
Name Age
Karan 23
Rohit 22
Rohit 22
Macron 22
Macron 22
Sahil 21
Aryan 24
Aryan 24
I tried [https://stackoverflow.com/questions/24029659/python-pandas-replicate-rows-in-dataframe] but it does not work for data with strings.
CodePudding user response:
You can use Index.repeat:
df.loc[df.index.repeat(df['Age'].isin([22,24]).add(1))]
How it works:
- determine whether Age is in [22,24]
- add 1 (the False values become 1, the True values become 2)
- repeat the index and reindex
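The steps above can be sketched one at a time. This is only an illustrative breakdown of the one-liner, using the dummy data from the question; the intermediate names (mask, counts, repeated_index, result) are made up for this sketch.

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Step 1: boolean mask, True where Age is 22 or 24
mask = df['Age'].isin([22, 24])

# Step 2: adding 1 turns False into 1 and True into 2 (the repeat count per row)
counts = mask.add(1)

# Step 3: repeat each index label as many times as its count
repeated_index = df.index.repeat(counts)

# Step 4: select rows by the repeated labels
result = df.loc[repeated_index]
print(result)
```

Splitting it up this way makes it easy to inspect counts before repeating, which helps when the condition is more involved than a single isin call.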
or, for more flexibility, with numpy.where, you can pick any value you want:
import numpy as np
df.loc[df.index.repeat(np.where(df['Age'].isin([22,24]), 2, 1))]
output:
Name Age
0 Karan 23
1 Rohit 22
1 Rohit 22
2 Macron 22
2 Macron 22
3 Sahil 21
4 Aryan 24
4 Aryan 24
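As a small sketch of what picking a different value looks like with numpy.where, the example below uses a repeat count of 3 for the matching rows. The count of 3 is an arbitrary choice for illustration and is not taken from the answer.

```python
import numpy as np
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Rows with Age 22 or 24 get a count of 3 (illustrative), every other row a count of 1
counts = np.where(df['Age'].isin([22, 24]), 3, 1)

# Repeat the index labels by those counts and select the rows
result = df.loc[df.index.repeat(counts)]
print(result)
```

Here the three matching rows each appear three times and the other two rows appear once, for 11 rows in total.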
resetting the index:
df.loc[df.index.repeat(df['Age'].isin([22,24]).add(1))].reset_index(drop=True)
output:
Name Age
0 Karan 23
1 Rohit 22
2 Rohit 22
3 Macron 22
4 Macron 22
5 Sahil 21
6 Aryan 24
7 Aryan 24
CodePudding user response:
This is a possible solution (row order is not preserved):
import numpy as np
import pandas as pd
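# Boolean mask of the rows to duplicate
mask = df['Age'].isin([22, 24])
# Repeat the matching rows twice, then append the remaining rows unchanged
df = pd.DataFrame(np.concatenate((
    np.repeat(df[mask].values, 2, axis=0),
    df[~mask].values
)), columns=df.columns)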
Output:
Name Age
0 Rohit 22
1 Rohit 22
2 Macron 22
3 Macron 22
4 Aryan 24
5 Aryan 24
6 Karan 23
7 Sahil 21
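One thing to be aware of: because this approach goes through a mixed string/integer NumPy array, the Age column typically comes back with object dtype rather than an integer dtype. A possible variant that stays in pandas, sketched below, keeps the original dtypes. It is an alternative written for illustration, not the answerer's code, and it assumes the same dummy data as the question.

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

mask = df['Age'].isin([22, 24])

# Duplicate the matching rows by repeating their index labels twice
duplicated = df.loc[df[mask].index.repeat(2)]

# Append the non-matching rows and renumber the index from 0
result = pd.concat([duplicated, df[~mask]], ignore_index=True)
print(result)
print(result.dtypes)
```

The row order matches the output shown above (matching rows first, then the rest), while Age remains an integer column.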