How to do math operations on a dataframe with an undefined number of columns?

Time:12-10

I have a data frame in which there is an indefinite number of columns, to be defined later. Like this:

index   GDP   2004  2005  ...
brasil  1000  0.10  0.10  ...
china   1000  0.15  0.10  ...
india   1000  0.05  0.10  ...

import pandas as pd

df = pd.DataFrame({'index': ['brasil', 'china', 'india'],
                   'GDP': [1000, 1000, 1000],
                   '2004': [0.10, 0.15, 0.05],
                   '2005': [0.10, 0.10, 0.10]})

The GDP column holds the initial GDP, and the columns from 2004 onwards are floats representing the GDP growth rate in each year as a fraction (0.10 means 10%).

I want to use these rates to get the absolute GDP in each year, compounding from the initial GDP. For brasil, for example, 1000 * 1.10 = 1100 in 2004, and then 1100 * 1.10 = 1210 in 2005. I need a dataframe like this:

index   GDP   2004  2005
brasil  1000  1100  1210
china   1000  1150  1265
india   1000  1050  1155

I tried using itertuples, df.columns and for loops, but I am probably missing something.

Keep in mind that the number of columns is indefinite.

Thank you very much in advance!

CodePudding user response:

A simple way is to count the columns and loop over:

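num = df.shape[1]  # total number of columns
start = 2          # position of the first year column (after 'index' and 'GDP')

for idx in range(start, num):
    # previous column already holds an absolute GDP, so compound it by this year's rate
    df.iloc[:, idx] = df.iloc[:, idx-1] * (1 + df.iloc[:, idx])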

print(df)

which gives

    index   GDP    2004    2005
0  brasil  1000  1100.0  1210.0
1   china  1000  1150.0  1265.0
2   india  1000  1050.0  1155.0
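
The same compounding can also be sketched without an explicit loop, using a cumulative product across the year columns. This is only a minimal sketch: it assumes every column after index and GDP is a growth rate, it uses 0.05 for india in 2004 (matching the table in the question), and the names year_cols, factors and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'index': ['brasil', 'china', 'india'],
                   'GDP': [1000, 1000, 1000],
                   '2004': [0.10, 0.15, 0.05],
                   '2005': [0.10, 0.10, 0.10]})

# Assumption: everything after the first two columns is a yearly growth rate.
year_cols = list(df.columns[2:])

# Cumulative product of (1 + rate) along each row gives the compound growth factor.
factors = (1 + df[year_cols]).cumprod(axis=1)

# Multiply each row's factors by that row's initial GDP.
result = df.copy()
result[year_cols] = factors.mul(df['GDP'], axis=0)

print(result)
```

Because the year columns are selected by position, this works for any number of them, as long as the column order stays index, GDP, then the years.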

CodePudding user response:

You can use df.columns to access the dataframe's column labels.

Then you can loop over all of these column names. Here is an example with your data frame where I multiplied every value by 2. If you want to apply different operations to different columns, you can add conditions inside the loop.

import pandas as pd

df = pd.DataFrame({'index': ['brasil', 'china', 'india'],
                   'GDP': [1000, 1000, 1000],
                   '2004': [0.10, 0.15, 0.5],
                   '2005': [0.10, 0.10, 0.10]})

# multiply every column by 2 (string columns get repeated, not doubled)
for colName in df.columns:
    df[colName] *= 2

print(df)

This returns:

          index   GDP  2004  2005
0  brasilbrasil  2000   0.2   0.2
1    chinachina  2000   0.3   0.2
2    indiaindia  2000   1.0   0.2
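
A condition in the loop could look like the sketch below, which reuses the data frame from this answer and doubles only the numeric columns, leaving the text column untouched. Which columns to skip is an assumption made for illustration; pd.api.types.is_numeric_dtype is just one way to test a column's type.

```python
import pandas as pd

df = pd.DataFrame({'index': ['brasil', 'china', 'india'],
                   'GDP': [1000, 1000, 1000],
                   '2004': [0.10, 0.15, 0.5],
                   '2005': [0.10, 0.10, 0.10]})

for colName in df.columns:
    # Skip non-numeric columns such as 'index', so strings are not repeated.
    if pd.api.types.is_numeric_dtype(df[colName]):
        df[colName] = df[colName] * 2

print(df)
```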

Hope this helps!
