Python - expand rows in dataframe n-times

I need to write a function that expands a dataframe. For example, the input of the function is:

df = pd.DataFrame({
    'Name':['Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha'],
    'Cart':['book', 'phonecase', 'shirt', 'phone', 'food', 'bag']
})

Suppose n is 3. Then, for each person in the Name column, I have to add 3 new rows and leave Cart as np.nan. The output should look like this:

df = pd.DataFrame({
    'Name':['Ali', 'Ali', 'Ali', 'Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha', 'Sasha', 'Sasha', 'Sasha'],
    'Cart':['book', 'phonecase', 'shirt', np.nan, np.nan, np.nan, 'phone', 'food', 'bag', np.nan, np.nan, np.nan]
})

How can I solve this using copy() and append()?

CodePudding user response：

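You can use np.repeat with pd.Series.unique. Note that the new rows are added at the end of the frame, not after each group:

n = 3
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
print(pd.concat([df, pd.DataFrame(np.repeat(df["Name"].unique(), n), columns=["Name"])]))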

    Name       Cart
0    Ali       book
1    Ali  phonecase
2    Ali      shirt
3  Sasha      phone
4  Sasha       food
5  Sasha        bag
0    Ali        NaN
1    Ali        NaN
2    Ali        NaN
3  Sasha        NaN
4  Sasha        NaN
5  Sasha        NaN
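
If the blank rows should sit directly after each person's existing rows, as in the question's expected output, one option is to follow the concat with a stable sort on Name. This is a minimal sketch, not part of the answer above; the names extra and result are only illustrative, and it assumes that ordering the groups alphabetically by Name is acceptable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Ali", "Ali", "Sasha", "Sasha", "Sasha"],
    "Cart": ["book", "phonecase", "shirt", "phone", "food", "bag"],
})
n = 3

# One Name-only frame with n rows per unique name; Cart becomes NaN on concat.
extra = pd.DataFrame({"Name": np.repeat(df["Name"].unique(), n)})

# A stable sort keeps the original rows ahead of the blank rows within each name.
result = (
    pd.concat([df, extra])
    .sort_values("Name", kind="stable")
    .reset_index(drop=True)
)
print(result)
```

Because the sort is stable, rows with the same Name keep their concat order, so the original Cart values stay first and the NaN rows follow.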

CodePudding user response：

Try this one. It adds n rows to each group of rows that share the same Name value:

import pandas as pd
import numpy as np

n = 3
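# one sub-frame per unique name, in order of first appearance
list_of_df_unique_names = [df[df["Name"] == name] for name in df["Name"].unique()]
# DataFrame.append was removed in pandas 2.0, so each group is extended with pd.concat
df2 = pd.concat(
    [pd.concat([d, pd.DataFrame({"Name": np.repeat(d["Name"].values[-1], n)})])
     for d in list_of_df_unique_names]
).reset_index(drop=True)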
print(df2)

Output:

     Name       Cart
0     Ali       book
1     Ali  phonecase
2     Ali      shirt
3     Ali        NaN
4     Ali        NaN
5     Ali        NaN
6   Sasha      phone
7   Sasha       food
8   Sasha        bag
9   Sasha        NaN
10  Sasha        NaN
11  Sasha        NaN
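
Since the question asks for a function, the same idea could be wrapped up roughly as follows. This is only a sketch: the function name expand_rows and its key parameter are made up for illustration, and it relies on groupby(..., sort=False) to keep the groups in their order of first appearance.

```python
import pandas as pd


def expand_rows(df, n, key="Name"):
    """Return a new frame with n extra rows per key value; other columns are NaN."""
    pieces = []
    # sort=False keeps the groups in order of first appearance
    for name, group in df.groupby(key, sort=False):
        blanks = pd.DataFrame({key: [name] * n})
        pieces.append(pd.concat([group, blanks]))
    if not pieces:
        # nothing to expand in an empty frame
        return df.copy()
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({
    "Name": ["Ali", "Ali", "Ali", "Sasha", "Sasha", "Sasha"],
    "Cart": ["book", "phonecase", "shirt", "phone", "food", "bag"],
})
out = expand_rows(df, 3)
print(out)
```

The input frame is left untouched; the function builds and returns a new one.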

CodePudding user response：

Maybe not the most beautiful of all solutions, but it works. Say that you want to add 4 NaN rows by group. Then, given your df:

df = pd.DataFrame({
    'Name':['Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha'],
    'Cart':['book', 'phonecase', 'shirt', 'phone', 'food', 'bag']
})

you can create an empty list DF, loop over range(4), filter df by name, and on every pass add one empty row per group:

DF = []
names = list(set(df.Name))
for i in range(4):
    for name in names:
        gf = df[df['Name'] == name]
        # shift(-1).iloc[-1] is always NaN, so this builds one (Name, NaN) row for the group
        blank = gf.groupby('Name')['Cart'].apply(lambda x: x.shift(-1).iloc[-1]).reset_index()
        a = pd.concat([gf, blank]).sort_values('Name').reset_index(drop=True)
        DF.append(a)
DF_full = pd.concat(DF)

Now you'll end up with repeated copies of your original df, so you need to drop the duplicates without dropping the NaN rows:

DFF = DF_full.sort_values(['Name','Cart'])
DFF = DFF[(~DFF.duplicated()) | (DFF['Cart'].isnull())]

which gives:

    Name       Cart
0    Ali       book
1    Ali  phonecase
2    Ali      shirt
3    Ali        NaN
3    Ali        NaN
3    Ali        NaN
3    Ali        NaN
2  Sasha        bag
1  Sasha       food
0  Sasha      phone
3  Sasha        NaN
3  Sasha        NaN
3  Sasha        NaN
3  Sasha        NaN
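
The filter used above keeps a row when it is either the first occurrence of its values or has a missing Cart. Here is a small sketch of that mask on toy data; the frame toy and the names is_repeat, keep and filtered are made up for illustration. The final reset_index(drop=True) is optional and only replaces the repeated index labels with 0..n-1.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Name": ["Ali", "Ali", "Ali", "Ali"],
    "Cart": ["book", "book", np.nan, np.nan],
})

# duplicated() flags every repeat after the first occurrence;
# rows whose Cart is NaN count as repeats of each other too.
is_repeat = toy.duplicated()

# Keep first occurrences, plus every NaN row even if it was flagged as a repeat.
keep = ~is_repeat | toy["Cart"].isnull()

filtered = toy[keep].reset_index(drop=True)
print(filtered)
```

Without the isnull() part of the mask, only one NaN row per name would survive the de-duplication.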