I import a csv with s&p 500 data and some of the opening price data is set to 0.
import pandas as pd
#import spx data
spx = pd.read_csv('C:/Users/joshu/Desktop/SPX1970.csv')
spx['DateTime'] = pd.to_datetime(spx['Date'],utc=False, format="%d-%b-%y")
spx.sort_values(by=['DateTime'],ascending=True, inplace=True)
spx
and the output is below. Some of the opening price data is missing
Date Open High Low Close DateTime
13285 5-Jan-70 0 94.25 92.53 93.46 1970-01-05
13284 6-Jan-70 0 93.81 92.13 92.82 1970-01-06
13283 7-Jan-70 0 93.38 91.93 92.63 1970-01-07
13282 8-Jan-70 0 93.47 91.99 92.68 1970-01-08
13281 9-Jan-70 0 93.25 91.82 92.4 1970-01-09
... ... ... ... ... ... ...
4 29-Aug-22 4,034.58 4,062.99 4,017.42 4,030.61 2022-08-29
3 30-Aug-22 4,041.25 4,044.98 3,965.21 3,986.16 2022-08-30
2 31-Aug-22 4,000.67 4,015.37 3,954.53 3,955.00 2022-08-31
1 1-Sep-22 3,936.73 3,970.23 3,903.65 3,966.85 2022-09-01
0 2-Sep-22 3,994.66 4,018.43 3,906.21 3,924.26 2022-09-02
13286 rows × 6 columns
I could easily use excel and assign my 0 - in the open column - to the previous close but would like to figure out how to do in with python.
I looked at using various methods from the "working with missing data" section of the pandas documentation but that doesn't really fit my case. Searched the web, not really finding my case either. So I wrote the following, which is not working. The if condition for open = = 0 is never true
# for open value = 0 copy previous close into open and copy this row's close for next iteration
for index, row in spx.iterrows():
if row['Open'] == 0:
row['Open'] = C
C = row['Close']
print (C) # I added this to see if this was ever true
else:
C = row['Close']
continue
spx
Any help is appreciated.
EDIT:
I corrected my data types and the logic prints out the close if the open is 0 - so i believe it works so I'm stuck at row['Open'] = C
not working.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13287 entries, 0 to 13286
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 13287 non-null int64
1 Date 13287 non-null object
2 Open 13287 non-null float64
3 High 13287 non-null float64
4 Low 13287 non-null float64
5 Close 13287 non-null float64
6 DateTime 13287 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(4), int64(1), object(1)
memory usage: 726.8 KB
CodePudding user response:
There are three main problems you are trying to solve, and I believe this code covers the three. See the comments in the below code for what each of the steps do.
import pandas as pd
df = pd.DataFrame({"a": [7, 0, 0, 3], "b": [2, 4, 6, 8]})
# Find all instances of zeros in your data
zeros = (df['a'] == 0)
# Set those to None, which is what fillna wants. Note you can use masks to do the following too
df.loc[zeros, 'a'] = None
# Make a shifted copy of b - this covers your "previous close value" requirement
previous_b = df['b'].shift()
# Replace missing values (Nones) in a with the value in b
df['a'] = df['a'].fillna(previous_b)