I have the following df:
df = pd.DataFrame({'A': ['foo', 'bar', 'dex', 'tru'],
'B': ['abc', 'def', 'ghi', 'jkl']})
Which looks like:
A
0 foo
1 bar
2 dex
3 tru
I also have the following list:
block_number = [1000, 2000]
I want the following output new_df
:
df = pd.DataFrame({'A': ['foo', 'bar', 'dex', 'tru', 'foo', 'bar', 'dex', 'tru'],
'B': ['abc', 'def', 'ghi', 'jkl', 'abc', 'def', 'ghi', 'jkl'],
'block_number': [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000]})
Which looks like:
A B block_number
0 foo abc 1000
1 bar def 1000
2 dex ghi 1000
3 tru jkl 1000
4 foo abc 2000
5 bar def 2000
6 dex ghi 2000
7 tru jkl 2000
I basically need have each item in my_list
per row on df
, but do that for all the different items in my list.
from itertools import product
df_out = pd.DataFrame(
product(df.A, block_number), columns=["A", "block_number"]
).sort_values(by="block_number")
print(df_out)
from Andrej at Multiply and repeat every row of df by n-item list only gives me correct results for a single column df
. What if my df has 2 or more (even thousands of columns)?
CodePudding user response:
Try using the keys parameter in pd.concat()
(pd.concat([df]*len(block_number),
keys = block_number,
names = ['block_number'])).reset_index(level=0)
Output:
block_number A B
0 1000 foo abc
1 1000 bar def
2 1000 dex ghi
3 1000 tru jkl
0 2000 foo abc
1 2000 bar def
2 2000 dex ghi
3 2000 tru jkl
CodePudding user response:
Instead of using itertools.product
which requires extracting each cell value in the dataframe, create a new dataframe with the final row count by concatentation, then assign block numbers by slice.
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'dex', 'tru'],
'B': ['abc', 'def', 'ghi', 'jkl']})
block_number = [1000, 2000]
new_df = pd.concat([df] * len(block_number), ignore_index=True)
new_df['block_number'] = block_number[0]
for i, num in enumerate(block_number[1:], 1):
new_df.loc[i*len(df):(i 1)*len(df), 'block_number'] = num
print(new_df)
CodePudding user response:
You can try this for any number of columns:
import itertools
df_new = pd.DataFrame(itertools.product(*[df.values,block_number]))
df_new = df_new[0].apply(pd.Series).rename(columns = lambda x: chr(x 65)).assign(
block_number = df_new[1]).sort_values(by="block_number").reset_index(drop=True)
print(df_new)
Output:
A B block_number
0 foo abc 1000
1 bar def 1000
2 dex ghi 1000
3 tru jkl 1000
4 foo abc 2000
5 bar def 2000
6 dex ghi 2000
7 tru jkl 2000