How to update a df using a for loop and arrays on Python?

Time:03-12

Suppose that I create the following df:

import pandas as pd

#column names
column_names = ["Time", "Currency", "Volatility expected", "Event", "Actual", "Forecast", "Previous"]

#create a dataframe including the column names
df = pd.DataFrame(columns=column_names)

Then, I create the following array that will have the cell values to add to my df:

rows = ["2:00", "GBP", "", "Construction Output (MoM) (Jan)", "1.1%", "0.5%", "2.0%",
        "2:00", "GBP", "", "U.K. Construction Output (YoY) (Jan)", "9.9%", "9.2%", "7.4%"]

So, how can I use a for loop to update my df so it ends up like this:

|Time   |Currency  |Volatility expected    |Event                               |Actual   |Forecast   |Previous  |
------------------------------------------------------------------------------------------------------------------
|02:00  |GBP       |                       |Construction Output (MoM) (Jan)     |1.1%     |0.5%       |2.0%      |
|04:00  |GBP       |                       |U.K. Construction Output (YoY) (Jan)|9.9%     |9.2%       |7.4%      |

I tried:

column_name_location = 0
for row in rows:
    df.at['0', df[column_name_location]] = row
    column_name_location += 1

print(df)

But got:

KeyError: 0

May I get some advice here?

CodePudding user response:

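Assuming rows is actually a list of sub-lists, each sub-list being a row, you can create a pd.Series from each row, using the dataframe's column names as the Series index, and then pass them all to df.append. Note that df.append returns a new dataframe rather than modifying df in place, so the result has to be assigned back. It was also deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat is the replacement:

# pandas < 2.0 only; append returns a new dataframe, so assign it back
df = df.append([pd.Series(r, index=df.columns) for r in rows], ignore_index=True)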

If rows is actually just a flat list, as in the question, you'll first need to convert it to a numpy array and reshape it into rows of 7 values:

import numpy as np
rows = np.array(rows).reshape(-1, 7).tolist()
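
Putting the two steps together, here is a minimal sketch that starts from the flat list in the question. It assumes a recent pandas, so it swaps pd.concat in for df.append; the names nested and new_rows are only illustrative.

```python
import numpy as np
import pandas as pd

column_names = ["Time", "Currency", "Volatility expected", "Event",
                "Actual", "Forecast", "Previous"]
df = pd.DataFrame(columns=column_names)

# Flat list of 14 values, as in the question (two rows of 7 cells each).
rows = ["2:00", "GBP", "", "Construction Output (MoM) (Jan)", "1.1%", "0.5%", "2.0%",
        "2:00", "GBP", "", "U.K. Construction Output (YoY) (Jan)", "9.9%", "9.2%", "7.4%"]

# reshape(-1, n) lets NumPy infer the number of rows from the column count.
nested = np.array(rows).reshape(-1, len(column_names)).tolist()

# pd.concat stands in for df.append here (an assumption about the pandas version);
# ignore_index=True renumbers the resulting rows from 0.
new_rows = pd.DataFrame(nested, columns=df.columns)
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```

Using len(column_names) instead of a hard-coded 7 keeps the reshape correct if the column list changes.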

CodePudding user response:

It looks like you have created one list containing 14 items. You could instead make it a list containing 2 items, where each item is a list of 7 values.

rows = [["2:00", "GBP", "", "Construction Output (MoM) (Jan)", "1.1%", "0.5%", "2.0%"],
       ["2:00", "GBP", "", "U.K. Construction Output (YoY) (Jan)", "9.9%", "9.2%", "7.4%"]]

With this, we can create the dataframe directly, as shown below:

df = pd.DataFrame(rows, columns=column_names)
print(df)

This outputs 2 rows:

   Time Currency Volatility expected                                 Event Actual Forecast Previous
0  2:00      GBP                           Construction Output (MoM) (Jan)   1.1%     0.5%     2.0%
1  2:00      GBP                      U.K. Construction Output (YoY) (Jan)   9.9%     9.2%     7.4%
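
Since the question asked specifically for a for loop, here is one possible sketch that keeps the original flat list. The KeyError: 0 in the attempt comes from df[column_name_location], which looks up a column labelled 0, and no such column exists. The version below instead slices the list 7 values at a time and assigns each slice as a whole row; n_cols and start are illustrative names.

```python
import pandas as pd

column_names = ["Time", "Currency", "Volatility expected", "Event",
                "Actual", "Forecast", "Previous"]
df = pd.DataFrame(columns=column_names)

# Flat list from the question: 14 values, i.e. two rows of 7 cells.
rows = ["2:00", "GBP", "", "Construction Output (MoM) (Jan)", "1.1%", "0.5%", "2.0%",
        "2:00", "GBP", "", "U.K. Construction Output (YoY) (Jan)", "9.9%", "9.2%", "7.4%"]

n_cols = len(column_names)

# Walk the flat list one row's worth of values at a time.
for start in range(0, len(rows), n_cols):
    # len(df) is the next unused integer label, so this assignment adds a new row.
    df.loc[len(df)] = rows[start:start + n_cols]

print(df)
```

Growing a dataframe row by row like this is fine for a handful of rows, but building it in one call, as above, is usually faster for larger inputs.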