I need to make a function to expand a dataframe. For example, the input of the function is :
df = pd.DataFrame({
'Name':['Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha'],
'Cart':['book', 'phonecase', 'shirt', 'phone', 'food', 'bag']
})
suppose the n value is 3. Then, for each person inside the Name column, I have to add 3 more new rows and leave the Cart as np.nan. The output should be like this :
df = pd.DataFrame({
'Name':['Ali', 'Ali', 'Ali', 'Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha', 'Sasha', 'Sasha', 'Sasha'],
'Cart':['book', 'phonecase', 'shirt', np.nan, np.nan, np.nan, 'phone', 'food', 'bag', np.nan, np.nan, np.nan]
})
How can I solve this with using copy() and append()?
CodePudding user response:
You can use np.repeat
with pd.Series.unique
:
n = 3
print (df.append(pd.DataFrame(np.repeat(df["Name"].unique(), n), columns=["Name"])))
Name Cart
0 Ali book
1 Ali phonecase
2 Ali shirt
3 Sasha phone
4 Sasha food
5 Sasha bag
0 Ali NaN
1 Ali NaN
2 Ali NaN
3 Sasha NaN
4 Sasha NaN
5 Sasha NaN
CodePudding user response:
Try this one: (it adds n rows to each group of rows with the same Name
value)
import pandas as pd
import numpy as np
n = 3
list_of_df_unique_names = [df[df["Name"]==name] for name in df["Name"].unique()]
df2 = pd.concat([d.append(pd.DataFrame({"Name":np.repeat(d["Name"].values[-1], n)}))\
for d in list_of_df_unique_names]).reset_index(drop=True)
print(df2)
Output:
Name Cart
0 Ali book
1 Ali phonecase
2 Ali shirt
3 Ali NaN
4 Ali NaN
5 Ali NaN
6 Sasha phone
7 Sasha food
8 Sasha bag
9 Sasha NaN
10 Sasha NaN
11 Sasha NaN
CodePudding user response:
Maybe not the most beautiful of all solutions, but it works. Say that you want to add 4 NaN rows by group. Then, given your df:
df = pd.DataFrame({
'Name':['Ali', 'Ali', 'Ali', 'Sasha', 'Sasha', 'Sasha'],
'Cart':['book', 'phonecase', 'shirt', 'phone', 'food', 'bag']
})
you can creat an empty dataframe DF
and loop trough the range (1,4), filter the df you had and in every loop add an empty row:
DF = []
names = list(set(df.Name))
for i in range(4):
for name in names:
gf = df[df['Name']=='{}'.format(name)]
a = pd.concat([gf, gf.groupby('Name')['Cart'].apply(lambda x: x.shift(-1).iloc[-1]).reset_index()]).sort_values('Name').reset_index(drop=True)
DF.append(a)
DF_full = pd.concat(DF)
Now, you'll end up with copies of your original df, so you need to dump them without dumping the NaN rows:
DFF = DF_full.sort_values(['Name','Cart'])
DFF = DFF[(~DFF.duplicated()) | (DFF['Cart'].isnull())]
which gives:
Name Cart
0 Ali book
1 Ali phonecase
2 Ali shirt
3 Ali NaN
3 Ali NaN
3 Ali NaN
3 Ali NaN
2 Sasha bag
1 Sasha food
0 Sasha phone
3 Sasha NaN
3 Sasha NaN
3 Sasha NaN
3 Sasha NaN