I have this dataset:
NAME VALUE1 VALUE2
0 Alpha 100 A1
1 Alpha 100 A1
2 Alpha 200 A2
I would like a for loop that can loop through each individual value and make every possible solution, without repeating, and then push each pair into an array, like so:
[
[ALPHA, 100]
[ALHA, 100]
[ALPHA, 200]
[ALPHA, A1]
[ALPHA, A1]
[ALPHA, A2]
[100, A1]
[100, A1]
[200, A2]
[ALPHA, 100, A1]
[ALPHA, 100, A1]
[ALPHA, 200, A2]
]
I currently have this which gives me my desired result, but if I was to consume a larger dataset, I'm thinking there must be an easier way to automate the process.
A = []
for index, row in df.iterrows():
A.append([df.iloc[index,0], df.iloc[index,1]])
for index, row in df.iterrows():
A.append([df.iloc[index,0], df.iloc[index,2]])
for index, row in df.iterrows():
A.append([df.iloc[index,1], df.iloc[index,2]])
for index, row in df.iterrows():
A.append([df.iloc[index,0], df.iloc[index, 1], df.iloc[index,2]])
CodePudding user response:
This is easier without converting to pandas first, but this will do it:
import pandas as pd
import itertools
data = [
['Alpha', 100, 'A1'],
['Alpha', 100, 'A1'],
['Alpha', 200, 'A2']
]
df = pd.DataFrame(data, columns=['NAME','VALUE1','VALUE2'])
print(df)
# Extract the columns.
c1 = set(df['NAME'].tolist())
c2 = set(df['VALUE1'].tolist())
c3 = set(df['VALUE2'].tolist())
for z in itertools.product(c1,c2):
print(z)
for z in itertools.product(c1,c3):
print(z)
for z in itertools.product(c2,c3):
print(z)
for z in itertools.product(c1,c2,c3):
print(z)
Output:
NAME VALUE1 VALUE2
0 Alpha 100 A1
1 Alpha 100 A1
2 Alpha 200 A2
('Alpha', 200)
('Alpha', 100)
('Alpha', 'A1')
('Alpha', 'A2')
(200, 'A1')
(200, 'A2')
(100, 'A1')
(100, 'A2')
('Alpha', 200, 'A1')
('Alpha', 200, 'A2')
('Alpha', 100, 'A1')
('Alpha', 100, 'A2')
itertools.product
is relatively recent. You can actually do the two-level loops using itertools.permutations
instead of three different loops.
FOLLOWUP
Here's how you generalize it to more than 3 columns. Basically, in a loop, you do 2 at a time, then 3 at a time, etc. itertools.combinations
can return all the combinations we need.
import itertools
import pandas as pd
data = [
['Alpha', 100, 'A1', 999],
['Alpha', 100, 'A1', 888],
['Alpha', 200, 'A2', 777]
]
columns=['NAME','VALUE1','VALUE2','VALUE3']
df = pd.DataFrame(data, columns=columns)
print(df)
for r in range(2,len(columns) 1):
for cols in itertools.combinations( columns, r ):
cdata = [set(df[c].tolist()) for c in cols]
for z in itertools.product(*cdata):
print(z)
Output:
NAME VALUE1 VALUE2 VALUE3
0 Alpha 100 A1 999
1 Alpha 100 A1 888
2 Alpha 200 A2 777
('Alpha', 200)
('Alpha', 100)
('Alpha', 'A1')
('Alpha', 'A2')
('Alpha', 888)
('Alpha', 777)
('Alpha', 999)
(200, 'A1')
(200, 'A2')
(100, 'A1')
(100, 'A2')
(200, 888)
(200, 777)
(200, 999)
(100, 888)
(100, 777)
(100, 999)
('A1', 888)
('A1', 777)
('A1', 999)
('A2', 888)
('A2', 777)
('A2', 999)
('Alpha', 200, 'A1')
('Alpha', 200, 'A2')
('Alpha', 100, 'A1')
('Alpha', 100, 'A2')
('Alpha', 200, 888)
('Alpha', 200, 777)
('Alpha', 200, 999)
('Alpha', 100, 888)
('Alpha', 100, 777)
('Alpha', 100, 999)
('Alpha', 'A1', 888)
('Alpha', 'A1', 777)
('Alpha', 'A1', 999)
('Alpha', 'A2', 888)
('Alpha', 'A2', 777)
('Alpha', 'A2', 999)
(200, 'A1', 888)
(200, 'A1', 777)
(200, 'A1', 999)
(200, 'A2', 888)
(200, 'A2', 777)
(200, 'A2', 999)
(100, 'A1', 888)
(100, 'A1', 777)
(100, 'A1', 999)
(100, 'A2', 888)
(100, 'A2', 777)
(100, 'A2', 999)
('Alpha', 200, 'A1', 888)
('Alpha', 200, 'A1', 777)
('Alpha', 200, 'A1', 999)
('Alpha', 200, 'A2', 888)
('Alpha', 200, 'A2', 777)
('Alpha', 200, 'A2', 999)
('Alpha', 100, 'A1', 888)
('Alpha', 100, 'A1', 777)
('Alpha', 100, 'A1', 999)
('Alpha', 100, 'A2', 888)
('Alpha', 100, 'A2', 777)
('Alpha', 100, 'A2', 999)
FOLLOWUP #2
This creates "sub dataframes" with every combination of columns. I'll leave it as an exercise for the reader to convert those to lists of rows.
import itertools
import pandas as pd
data = [
['Alpha', 100, 'A1',999],
['Alpha', 100, 'A1',888],
['Alpha', 200, 'A2',777]
]
columns=['NAME','VALUE1','VALUE2','VALUE3']
df = pd.DataFrame(data, columns=columns)
print(df)
for r in range(2,len(columns) 1):
for cols in itertools.combinations( columns, r ):
print(df[list(cols)])
Output:
NAME VALUE1 VALUE2 VALUE3
0 Alpha 100 A1 999
1 Alpha 100 A1 888
2 Alpha 200 A2 777
NAME VALUE1
0 Alpha 100
1 Alpha 100
2 Alpha 200
NAME VALUE2
0 Alpha A1
1 Alpha A1
2 Alpha A2
NAME VALUE3
0 Alpha 999
1 Alpha 888
2 Alpha 777
VALUE1 VALUE2
0 100 A1
1 100 A1
2 200 A2
VALUE1 VALUE3
0 100 999
1 100 888
2 200 777
VALUE2 VALUE3
0 A1 999
1 A1 888
2 A2 777
NAME VALUE1 VALUE2
0 Alpha 100 A1
1 Alpha 100 A1
2 Alpha 200 A2
NAME VALUE1 VALUE3
0 Alpha 100 999
1 Alpha 100 888
2 Alpha 200 777
NAME VALUE2 VALUE3
0 Alpha A1 999
1 Alpha A1 888
2 Alpha A2 777
VALUE1 VALUE2 VALUE3
0 100 A1 999
1 100 A1 888
2 200 A2 777
NAME VALUE1 VALUE2 VALUE3
0 Alpha 100 A1 999
1 Alpha 100 A1 888
2 Alpha 200 A2 777