Is there a pandas function to duplicate each row of a dataframe n times, assigning each of n categor

Time:12-22

What is the easiest way to go from:

import pandas as pd
df = pd.DataFrame({'col1': [1,1,2,3], 'col2': [2,4,3,5]})
group_l = ['a', 'b']
df

    col1    col2
0   1   2
1   1   4
2   2   3
3   3   5

to

    col1    col2    group
0   1   2   a
1   1   4   a
2   2   3   a
3   3   5   a
0   1   2   b
1   1   4   b
2   2   3   b
3   3   5   b

I've thought of a couple of solutions, but neither seems great:

  • Use pd.MultiIndex.from_product, then reset_index. This only works cleanly if the initial DataFrame has a single column.
  • Add a new column group where each element is the list ['a', 'b'], then call pd.DataFrame.explode on it. This feels inefficient.
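
For reference, the explode idea from the second bullet might look like the sketch below. It assumes the df and group_l defined above; the name exploded and the final stable sort (added only so the row order matches the desired output) are illustrative choices, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 3], 'col2': [2, 4, 3, 5]})
group_l = ['a', 'b']

# Give every row the full list of groups as a single cell value.
with_lists = df.assign(group=pd.Series([group_l] * len(df), index=df.index))

# explode turns each list element into its own row, repeating the other columns.
# The stable sort puts all 'a' rows before all 'b' rows while keeping
# the original row order within each group.
exploded = with_lists.explode('group').sort_values('group', kind='stable')
print(exploded)
```

Each list is built once per row before being expanded, which is the overhead the bullet is worried about.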

CodePudding user response:

You could create one copy of the DataFrame per group, set the group column on each copy, and concatenate them:

import pandas as pd
df = pd.DataFrame({'col1': [1,1,2,3], 'col2': [2,4,3,5]})
df1 = df.copy()
df2 = df.copy()
df1['group'] = 'A'
df2['group'] = 'B'
df_out = pd.concat([df1,df2])
print(df_out)

This gives the output:

   col1  col2 group
0     1     2     A
1     1     4     A
2     2     3     A
3     3     5     A
0     1     2     B
1     1     4     B
2     2     3     B
3     3     5     B

CodePudding user response:

One approach, using pd.concat:

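import pandas as pd
df = pd.DataFrame({'col1': [1,1,2,3], 'col2': [2,4,3,5]})
group_l = ['a', 'b']
# assign returns a new DataFrame each time, so df itself is left unchanged
res = pd.concat([df.assign(group=e) for e in group_l])
print(res)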

Output

   col1  col2 group
0     1     2     a
1     1     4     a
2     2     3     a
3     3     5     a
0     1     2     b
1     1     4     b
2     2     3     b
3     3     5     b
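
A cross join may be another option. It does not come from either answer above; it is only a minimal sketch, assuming pandas 1.2 or later, where merge accepts how='cross'. The names groups and crossed are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 3], 'col2': [2, 4, 3, 5]})
group_l = ['a', 'b']

# One-column frame holding the categories to pair with every row of df.
groups = pd.DataFrame({'group': group_l})

# how='cross' pairs every row of df with every row of groups (Cartesian product).
crossed = df.merge(groups, how='cross')

# The cross join interleaves the groups row by row; a stable sort regroups them.
crossed = crossed.sort_values('group', kind='stable', ignore_index=True)
print(crossed)
```

Unlike the pd.concat results, this sketch produces a fresh 0..7 index rather than repeating the original 0..3 labels.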