python stock data, fill open with previous close-CodePudding

I import a csv with s&p 500 data and some of the opening price data is set to 0.

import pandas as pd

#import spx data
spx = pd.read_csv('C:/Users/joshu/Desktop/SPX1970.csv')
spx['DateTime'] = pd.to_datetime(spx['Date'],utc=False, format="%d-%b-%y")
spx.sort_values(by=['DateTime'],ascending=True, inplace=True)
spx

and the output is below. Some of the opening price data is missing

    Date    Open    High    Low     Close   DateTime
13285   5-Jan-70    0   94.25   92.53   93.46   1970-01-05
13284   6-Jan-70    0   93.81   92.13   92.82   1970-01-06
13283   7-Jan-70    0   93.38   91.93   92.63   1970-01-07
13282   8-Jan-70    0   93.47   91.99   92.68   1970-01-08
13281   9-Jan-70    0   93.25   91.82   92.4    1970-01-09
...     ...     ...     ...     ...     ...     ...
4   29-Aug-22   4,034.58    4,062.99    4,017.42    4,030.61    2022-08-29
3   30-Aug-22   4,041.25    4,044.98    3,965.21    3,986.16    2022-08-30
2   31-Aug-22   4,000.67    4,015.37    3,954.53    3,955.00    2022-08-31
1   1-Sep-22    3,936.73    3,970.23    3,903.65    3,966.85    2022-09-01
0   2-Sep-22    3,994.66    4,018.43    3,906.21    3,924.26    2022-09-02

13286 rows × 6 columns

I could easily use excel and assign my 0 - in the open column - to the previous close but would like to figure out how to do in with python.

I looked at using various methods from the "working with missing data" section of the pandas documentation but that doesn't really fit my case. Searched the web, not really finding my case either. So I wrote the following, which is not working. The if condition for open = = 0 is never true

# for open value = 0 copy previous close into open and copy this row's close for next iteration

    for index, row in spx.iterrows():
        
        if row['Open'] == 0:
            row['Open'] = C
            C = row['Close']
            print (C) # I added this to see if this was ever true
        else:
            C = row['Close']
            continue
    spx

Any help is appreciated.

EDIT: I corrected my data types and the logic prints out the close if the open is 0 - so i believe it works so I'm stuck at row['Open'] = C not working.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13287 entries, 0 to 13286
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   index     13287 non-null  int64         
 1   Date      13287 non-null  object        
 2   Open      13287 non-null  float64       
 3   High      13287 non-null  float64       
 4   Low       13287 non-null  float64       
 5   Close     13287 non-null  float64       
 6   DateTime  13287 non-null  datetime64[ns]
dtypes: datetime64[ns](1), float64(4), int64(1), object(1)
memory usage: 726.8  KB

CodePudding user response：

There are three main problems you are trying to solve, and I believe this code covers the three. See the comments in the below code for what each of the steps do.

import pandas as pd


df = pd.DataFrame({"a": [7, 0, 0, 3], "b": [2, 4, 6, 8]})

# Find all instances of zeros in your data
zeros = (df['a'] == 0)


# Set those to None, which is what fillna wants. Note you can use masks to do the following too
df.loc[zeros, 'a'] = None

# Make a shifted copy of b - this covers your "previous close value" requirement
previous_b = df['b'].shift()

# Replace missing values (Nones) in a with the value in b
df['a'] = df['a'].fillna(previous_b)