Pandas making a dataframe with a repeating column

Time:12-11

Given the two lists below, I am trying to build the desired output without a for loop.

re = [1,2]
po = [1, 3, 5, 10, 20]

Desired output:

re  po
1   1
1   3
1   5
1   10
1   20
2   1
2   3
2   5
2   10
2   20

Any assistance is appreciated.

CodePudding user response:

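You can multiply each list by the length of the other and pass it to pd.DataFrame. Note that this pairs the elements position by position, so it only produces every combination when the two lengths share no common factor (as with 2 and 5 here):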

out = pd.DataFrame([re*len(po), po*len(re)], index=['re','po']).T.sort_values(by=['re','po'])

Output:

   re  po
0   1   1
6   1   3
2   1   5
8   1  10
4   1  20
5   2   1
1   2   3
7   2   5
3   2  10
9   2  20
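
A minimal sketch of a variant that does not depend on how the two list lengths interleave. It swaps in NumPy's np.repeat and np.tile, which are not part of the answer above, and assumes NumPy is installed alongside pandas:

```python
import numpy as np
import pandas as pd

re = [1, 2]
po = [1, 3, 5, 10, 20]

# Repeat each element of `re` len(po) times: 1,1,1,1,1,2,2,2,2,2
# Tile the whole of `po` len(re) times:      1,3,5,10,20,1,3,5,10,20
out = pd.DataFrame({
    "re": np.repeat(re, len(po)),
    "po": np.tile(po, len(re)),
})
print(out)
```

The rows already come out ordered by re and then po, so no sort_values call is needed and the index runs from 0 to 9.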

CodePudding user response:

You can use the merge function to do a cross merge. To do this, make each list a DataFrame, then merge them like this:

import pandas as pd

re = [1,2]
po = [1, 3, 5, 10, 20]
re_df = pd.DataFrame({"re": re})
po_df = pd.DataFrame({"po": po})
repo = re_df.merge(po_df, how="cross")

Or if you don't want to define new variables:

re = [1,2]
po = [1, 3, 5, 10, 20]
repo = pd.DataFrame({"re":re}).merge(pd.DataFrame({"po":po}), how="cross")

Output:

   re  po
0   1   1
1   1   3
2   1   5
3   1  10
4   1  20
5   2   1
6   2   3
7   2   5
8   2  10
9   2  20
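
The how="cross" option was added in pandas 1.2. On older versions, a commonly used workaround is to merge on a constant helper column. The sketch below is one way to do that; the column name "_key" is a hypothetical choice for this example, not something pandas requires:

```python
import pandas as pd

re = [1, 2]
po = [1, 3, 5, 10, 20]

# Give both frames the same constant key so every row matches every row.
# "_key" is a hypothetical helper column name chosen for this example.
re_df = pd.DataFrame({"re": re}).assign(_key=1)
po_df = pd.DataFrame({"po": po}).assign(_key=1)

# An ordinary merge on the constant key yields the cross product;
# the helper column is dropped afterwards.
repo = re_df.merge(po_df, on="_key").drop(columns="_key")
print(repo)
```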

CodePudding user response:

You can use a list comprehension: output = [[x,y] for x in re for y in po]

Here's the full code.

import pandas as pd

re = [1,2]
po = [1, 3, 5, 10, 20]

output = [[x,y] for x in re for y in po]

df = pd.DataFrame(output, columns=['re', 'po'])

print(df)  # display(df) also works inside a Jupyter notebook

Output:

   re  po
0   1   1
1   1   3
2   1   5
3   1  10
4   1  20
5   2   1
6   2   3
7   2   5
8   2  10
9   2  20

CodePudding user response:

You can use itertools.product and the DataFrame constructor:

re = [1,2]
po = [1, 3, 5, 10, 20]

from itertools import product
import pandas as pd

# The DataFrame constructor takes columns=, not names=
df = pd.DataFrame(list(product(re, po)), columns=['re', 'po'])

You can also use pandas.MultiIndex.from_product and convert the result with to_frame:

re = [1,2]
po = [1, 3, 5, 10, 20]

df = pd.MultiIndex.from_product([re, po], names=['re', 'po']).to_frame(index=False)

Output:

   re  po
0   1   1
1   1   3
2   1   5
3   1  10
4   1  20
5   2   1
6   2   3
7   2   5
8   2  10
9   2  20
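
As a quick sanity check, the sketch below builds the frame with the list comprehension, itertools.product, and MultiIndex.from_product, then compares the results with pd.testing.assert_frame_equal. The variable names a, b, and c are only illustrative:

```python
from itertools import product

import pandas as pd

re = [1, 2]
po = [1, 3, 5, 10, 20]

# Approach 1: nested list comprehension.
a = pd.DataFrame([[x, y] for x in re for y in po], columns=["re", "po"])

# Approach 2: itertools.product, materialised as a list of tuples.
b = pd.DataFrame(list(product(re, po)), columns=["re", "po"])

# Approach 3: MultiIndex.from_product converted to a regular frame.
c = pd.MultiIndex.from_product([re, po], names=["re", "po"]).to_frame(index=False)

# assert_frame_equal raises AssertionError if the frames differ.
pd.testing.assert_frame_equal(a, b)
pd.testing.assert_frame_equal(a, c)
print(a)
```

All three iterate over re in the outer position and po in the inner one, which is why the row order matches the desired output without any sorting.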