Home > Net >  What code will iterate rows using pandas and append the data to a new df?
What code will iterate rows using pandas and append the data to a new df?

Time:01-19

I am trying to reorganize a temperature data set to get it in the same format as other data sets I have been using. I am having trouble iterating through the data frame and appending the data to a new data frame.

  • Here is the data:
     Year   Jan   Feb   Mar    Apr    May    Jun    Jul    Aug    Sep    Oct   Nov   Dec
0    1901 -3.16 -4.14  2.05   6.85  13.72  18.27  22.22  20.54  15.30  10.50  2.60 -2.68
1    1902 -3.73 -2.67  1.78   7.62  14.35  18.21  20.51  19.81  14.97   9.93  3.20 -4.02
2    1903 -3.93 -4.39  2.44   7.18  13.07  17.22  20.25  19.67  15.00   9.35  1.52 -2.84
3    1904 -5.49 -3.92  1.83   7.22  13.46  17.78  20.22  19.25  15.87   9.60  3.20 -2.31
4    1905 -4.89 -4.40  4.54   8.01  13.20  18.24  20.25  20.21  16.15   8.42  3.47 -3.28
..    ...   ...   ...   ...    ...    ...    ...    ...    ...    ...    ...   ...   ...
116  2017 -2.07  1.77  3.84  10.02  14.21  19.69  22.57  20.38  17.15  10.85  4.40 -0.77
117  2018 -2.36 -0.56  3.39   7.49  16.39  20.09  22.39  21.01  17.57  10.37  2.48 -0.57
118  2019 -2.38 -1.85  2.93   9.53  14.10  19.21  22.38  21.31  18.41   9.37  3.00 -0.08
119  2020 -1.85 -0.98  4.50   8.34  14.61  19.66  22.42  21.69  16.75   9.99  4.92 -0.38
120  2021 -0.98 -3.86  3.94   8.41  14.06  20.63  22.22  21.23  17.48  11.47  3.54  0.88
  • Here is the code that I have tried:
df = pds.read_excel("Temp_Data.xlsx")
data = pds.dataframe()

for i in range(len(df)):
    data1 = df.iloc[i]
    data.append(data1)

  • Here is the result of that code:
print(data)

Feb       -0.72
Mar        0.75
Apr        6.77
May       14.44
Jun       18.40
Jul       20.80
Aug       20.13
Sep       16.17
Oct       10.64
Nov        2.71
Dec       -2.80
Name: 43, dtype: float64, Year    1945.00
Jan       -2.62
Feb       -0.75
Mar        4.00
Apr        7.29
May       12.31
Jun       16.98
Jul       20.76
Aug       20.11
Sep       16.08
Oct        9.82
Nov        2.09
Dec       -3.87

Note: for some reason the data starts at 1945 and goes to 2021.

  • Here is how I am trying to format the data eventually:
      Date   Temp
0   190101  -3.16
1   190102  -4.14
2   190103   2.05
3   190104   6.85
4   190105  13.72
5   190106  18.27
6   190107  22.22
7   190108  20.54
8   190109  15.30
9   190110  10.50
10  190111   2.60
11  190112  -2.68
12  190201  -3.73
13  190202  -2.67
14  190203   1.78
15  190204   7.62
16  190205  14.35
17  190206  18.21
18  190207  20.51
19  190208  19.81
20  190209  14.97
21  190210   9.93
22  190211   3.20
23  190212  -4.02

CodePudding user response:

You can use melt to reshape your dataframe then create the Date column from Year and Month columns:

months = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04', 
          'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08',
          'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

# Convert Year and Month columns to YYYMM
to_date = lambda x: x.pop('Year').astype(str)   x.pop('Month').map(months)

out = (df.melt(id_vars='Year', var_name='Month', value_name='Temp')
         .assign(Date=to_date).set_index('Date').sort_index().reset_index())

Output:

>>> out
       Date   Temp
0    190101  -3.16
1    190102  -4.14
2    190103   2.05
3    190104   6.85
4    190105  13.72
..      ...    ...
115  202108  21.23
116  202109  17.48
117  202110  11.47
118  202111   3.54
119  202112   0.88

[120 rows x 2 columns]
  • Related