Home > Net >  Issue in executing a specific type of nested 'for' loop on columns of a panda dataframe
Issue in executing a specific type of nested 'for' loop on columns of a panda dataframe

Time:12-06

I have a panda dataframe that has values like below. Though in real I am working with lot more columns and historical data

         AUD    USD JPY EUR
      0 0.67    1   140 1.05

I want to iterate over columns to create dataframe with columns AUDUSD, AUDJPY, AUDEUR, USDJPY, USDEUR and JPYEUR where for eg AUDUSD is calculated as product of AUD column and USD colum

I tried below

for col in df:
    for cols in df:
        cf[col cols]=df[col]*df[cols]

But it generates table with unneccessary values like AUDAUD, USDUSD or duplicate value like AUDUSD and USDAUD. I think if i can somehow set "cols =col 1 till end of df" in second for loop I should be able to resolve the issue. But i don't know how to do that ??

Result i am looking for is a table with below columns and their values

AUDUSD, AUDJPY, AUDEUR, USDJPY, USDEUR, JPYEUR

CodePudding user response:

You can use itertools.combinations with pandas.Series.mul and pandas.concat.

Try this :

from itertools import combinations
​
combos = list(combinations(df.columns, 2))
​
out = pd.concat([df[col[1]].mul(df[col[0]]) for col in combos], axis=1, keys=combos)
​
out.columns = out.columns.map("".join)

# Output :

print(out)
   AUDUSD  AUDJPY  AUDEUR  USDJPY  USDEUR  JPYEUR
0    0.67    93.8  0.7035     140    1.05   147.0

# Used input :

df = pd.DataFrame({'AUD': [0.67], 'USD': [1], 'JPY': [140], 'EUR': [1.05]})

CodePudding user response:

I thought it intuitive that your first approach was to use an inner / outer loop and think this solution works in the same spirit:

# Added a Second Row for testing
df = pd.DataFrame(
    {'AUD': [0.67, 0.91], 'USD': [1, 1], 'JPY': [140, 130], 'EUR': [1.05, 1]},
)

# Instantiated the Second DataFrame
cf = pd.DataFrame()

# Call the index of the columns as an integer
for i in range(len(df.columns)):

    # Increment the index   1, so you aren't looking at the same column twice
    # Also, limit the range to the length of your columns
    for j in range(i 1, len(df.columns)):
        print(f'{df.columns[i]}'   f'{df.columns[j]}')  # VERIFY
    
        # Create a variable of the column names mashed together
        combine = f'{df.columns[i]}'   f'{df.columns[j]}

        # Assign the rows to be a product of the mashed column series
        cf[combine] = df[df.columns[i]] * df[df.columns[j]]
    

print(cf)  # VERIFY

The console Log looks like this:

AUDUSD
AUDJPY
AUDEUR
USDJPY
USDEUR
JPYEUR
   AUDUSD  AUDJPY  AUDEUR  USDJPY  USDEUR  JPYEUR
0    0.67    93.8  0.7035     140    1.05   147.0
1    0.91   118.3  0.9100     130    1.00   130.0
  • Related