Home > Back-end >  Itertools product between list and df with multiple columns
Itertools product between list and df with multiple columns

Time:10-15

I have the following df:

df = pd.DataFrame({'A': ['foo', 'bar', 'dex', 'tru'],
                   'B': ['abc', 'def', 'ghi', 'jkl']})

Which looks like:

     A
0  foo
1  bar
2  dex
3  tru

I also have the following list:

block_number = [1000, 2000]

I want the following output new_df:

df = pd.DataFrame({'A': ['foo', 'bar', 'dex', 'tru', 'foo', 'bar', 'dex', 'tru'],
                   'B': ['abc', 'def', 'ghi', 'jkl', 'abc', 'def', 'ghi', 'jkl'],
                   'block_number': [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000]})

Which looks like:

     A    B  block_number
0  foo  abc          1000
1  bar  def          1000
2  dex  ghi          1000
3  tru  jkl          1000
4  foo  abc          2000
5  bar  def          2000
6  dex  ghi          2000
7  tru  jkl          2000

I basically need have each item in my_list per row on df, but do that for all the different items in my list.

from itertools import product

df_out = pd.DataFrame(
    product(df.A, block_number), columns=["A", "block_number"]
).sort_values(by="block_number")

print(df_out)

from Andrej at Multiply and repeat every row of df by n-item list only gives me correct results for a single column df. What if my df has 2 or more (even thousands of columns)?

CodePudding user response:

Try using the keys parameter in pd.concat()

(pd.concat([df]*len(block_number),
keys = block_number,
names = ['block_number'])).reset_index(level=0)

Output:

   block_number    A    B
0          1000  foo  abc
1          1000  bar  def
2          1000  dex  ghi
3          1000  tru  jkl
0          2000  foo  abc
1          2000  bar  def
2          2000  dex  ghi
3          2000  tru  jkl

CodePudding user response:

Instead of using itertools.product which requires extracting each cell value in the dataframe, create a new dataframe with the final row count by concatentation, then assign block numbers by slice.

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'dex', 'tru'],
                   'B': ['abc', 'def', 'ghi', 'jkl']})

block_number = [1000, 2000]

new_df = pd.concat([df] * len(block_number), ignore_index=True)
new_df['block_number'] = block_number[0]
for i, num in enumerate(block_number[1:], 1):
    new_df.loc[i*len(df):(i 1)*len(df), 'block_number'] = num

print(new_df)

CodePudding user response:

You can try this for any number of columns:

import itertools
df_new = pd.DataFrame(itertools.product(*[df.values,block_number]))
df_new = df_new[0].apply(pd.Series).rename(columns = lambda x: chr(x 65)).assign(
    block_number = df_new[1]).sort_values(by="block_number").reset_index(drop=True)
print(df_new)

Output:

     A    B  block_number
0  foo  abc          1000
1  bar  def          1000
2  dex  ghi          1000
3  tru  jkl          1000
4  foo  abc          2000
5  bar  def          2000
6  dex  ghi          2000
7  tru  jkl          2000
  • Related