I have a dataframe with two columns.
I then apply a condition: set the value in the second column to 'ON' if the value in the first column is greater than or equal to 13.
I would like to save the dataframe to a .csv file whenever that comparison changes a value.
import pandas as pd
data = {'numbers': [11, 12, 13, 14, 15],
        'switch': ['OFF', 'OFF', 'OFF', 'OFF', 'OFF']}
df = pd.DataFrame(data)
df.loc[df['numbers'] >= 13, 'switch'] = 'ON'
print(df)
Output:
   numbers switch
0       11    OFF
1       12    OFF
2       13     ON
3       14     ON
4       15     ON
The first change happens in the third row (index 2). That is the point at which I would like to save the dataframe for the first time. The comparison should then carry on and save again when a change is detected in the fourth row, and again in the fifth. (The dataframe would be saved 3 times, each save overwriting the same file.)
If that is too troublesome to implement, saving the changed dataframe ONCE at the end, when the comparison has finished running, would suffice.
If the question is not clear enough, please do not hesitate to ask for clarification and I will try my best to provide additional information.
CodePudding user response:
You are changing the switch values in a single vectorized operation, so all matching rows are updated at once and there is no point in time at which only the first value has changed. I would therefore go with the other idea you suggested, i.e. save the changed dataframe at the end, but only if there has been a change. You could do that like this:
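import pandas as pd
data = {'numbers': [11, 12, 13, 14, 15],
        'switch': ['OFF', 'OFF', 'OFF', 'OFF', 'OFF']}
df = pd.DataFrame(data)
# Work on a copy so the original stays available for the comparison.
df_new = df.copy()
df_new.loc[df_new['numbers'] >= 13, 'switch'] = 'ON'
# Save only if at least one switch value differs from the original.
if (df_new['switch'] != df['switch']).any():
    df_new.to_csv('data_updated.csv')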
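If you really do need a save after every individual change, one option is to give up the vectorized assignment and loop over the rows instead. The following is only a rough sketch of that idea, not a recommendation: it will be much slower on large dataframes. The names out_path and save_count are made up for illustration, the file is written to a temporary directory so the example is self-contained, and index=False is my own choice to keep the row index out of the file.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'numbers': [11, 12, 13, 14, 15],
    'switch': ['OFF', 'OFF', 'OFF', 'OFF', 'OFF'],
})

# Hypothetical output location (assumption): a temp directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), 'data_updated.csv')

save_count = 0  # hypothetical counter, only here to show how often the file is written
for idx in df.index:
    # Change a row only if it meets the condition and is not already 'ON'.
    if df.at[idx, 'numbers'] >= 13 and df.at[idx, 'switch'] != 'ON':
        df.at[idx, 'switch'] = 'ON'
        # Overwrite the same file after each individual change.
        df.to_csv(out_path, index=False)
        save_count += 1

print(save_count)
```

With the sample data, three rows satisfy the condition, so the file is written three times and the last write contains the fully updated dataframe.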