Can those problematic elements be changed to zeros with a loop in order to keep the data consistent?


I have a dataframe that has 'date' as its index and a 'Sample Result' column, which holds the sample results for those dates. However, some of the results are repeated because a sample could not be taken at that time. For instance, the first sample was taken at 2019-08-17 07:30:00. Since the second sample could not be taken at 2019-08-17 08:00:00, that row shows the same result as the previous one. To clean the data, I need to replace these repeated values with zeros, but I could not figure out how to do it. Is there any way to do this? The desired outcome is shown below as dataframe_desired.

dataframe =     
date                    Sample Result
2019-08-17 07:30:00     548.700012
2019-08-17 08:00:00     548.700012
2019-08-17 08:30:00     548.700012
2019-08-17 09:00:00     553.099976
2019-08-17 09:30:00     555.346976
2019-08-17 10:00:00     548.700012
2019-08-17 10:30:00     548.700012
2019-08-17 11:00:00     546.750000
2019-08-17 11:30:00     546.750000

dataframe_desired = 

date                    Sample Result
2019-08-17 07:30:00     548.700012
2019-08-17 08:00:00     0.000000
2019-08-17 08:30:00     0.000000
2019-08-17 09:00:00     553.099976
2019-08-17 09:30:00     555.346976
2019-08-17 10:00:00     548.700012
2019-08-17 10:30:00     0.000000
2019-08-17 11:00:00     546.750000
2019-08-17 11:30:00     0.000000
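Since the question asks about a loop, here is a minimal sketch of a plain-Python loop, assuming that "repeated" means equal to the value in the row directly above (which is what dataframe_desired implies: the 548.700012 at 10:00 is kept even though it appeared earlier). The helper name `zero_consecutive_repeats_loop` is hypothetical, and the data is rebuilt from the tables above.

```python
import pandas as pd

# Sample data from the question, rebuilt with a DatetimeIndex named "date".
dates = pd.date_range("2019-08-17 07:30:00", periods=9, freq="30min", name="date")
dataframe = pd.DataFrame(
    {"Sample Result": [548.700012, 548.700012, 548.700012, 553.099976,
                       555.346976, 548.700012, 548.700012, 546.750000, 546.750000]},
    index=dates,
)


def zero_consecutive_repeats_loop(df, column="Sample Result"):
    """Hypothetical helper: return a copy where each value equal to the
    previous row's original value is replaced by 0.0."""
    values = df[column].tolist()
    cleaned = values.copy()
    for i in range(1, len(values)):
        # Compare against the previous *original* value, not the cleaned one,
        # so a run like 548.7, 548.7, 548.7 becomes 548.7, 0.0, 0.0.
        if values[i] == values[i - 1]:
            cleaned[i] = 0.0
    out = df.copy()
    out[column] = cleaned
    return out


dataframe_desired = zero_consecutive_repeats_loop(dataframe)
print(dataframe_desired)
```

A loop works, but on larger frames the vectorized comparisons used in the answers are usually preferred.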

Answer:

This should do the job...

dataframe["Dup Result"] = dataframe["Sample Result"].duplicated(keep='first')
dataframe["Clean Result"] = dataframe.apply(lambda row: 0.0 if row["Dup Result"] else row["Sample Result"], axis=1)
dataframe = dataframe.drop("Dup Result", axis=1)

Answer:

If I understood your question correctly, you're trying to set values that are equal to the previous one to zero. This can be done with the diff method:

df.loc[df.diff().squeeze()==0]=0

Here diff returns a dataframe with diff[i, j] = df[i, j] - df[i-1, j]. squeeze is used to cast the single-column dataframe to a Series so that we can pass it as an index to loc. We then set the value to 0 where the differences were 0.
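To make the diff/squeeze steps concrete, here is a small sketch on the question's data, assuming the dataframe has exactly one column ("Sample Result") and a date index. It shows the intermediate objects before the assignment.

```python
import pandas as pd

# Question's data, assuming a single-column DataFrame indexed by date.
df = pd.DataFrame(
    {"Sample Result": [548.700012, 548.700012, 548.700012, 553.099976,
                       555.346976, 548.700012, 548.700012, 546.750000, 546.750000]},
    index=pd.date_range("2019-08-17 07:30:00", periods=9, freq="30min", name="date"),
)

diffs = df.diff()              # DataFrame: row i minus row i-1; first row is NaN
mask = diffs.squeeze() == 0    # one column -> squeeze() returns a boolean Series
print(type(diffs).__name__, type(mask).__name__)  # DataFrame Series

df.loc[mask] = 0               # sets every column of the flagged rows to 0
print(df)
```

Because squeeze() leaves a multi-column DataFrame unchanged, this form relies on the single-column assumption; with several columns the comparison would not yield a row mask.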

Answer:

You can use diff:

df.loc[df['Sample Result'].diff().eq(0), 'Sample Result'] = 0
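Unlike df.loc[mask] = 0, passing the column label to .loc restricts the assignment to 'Sample Result'. A runnable sketch on the question's data follows; the extra "Operator" column is hypothetical and only there to show that other columns are left untouched.

```python
import pandas as pd

# Question's data plus a hypothetical "Operator" column to show that
# only "Sample Result" is modified.
df = pd.DataFrame(
    {
        "Sample Result": [548.700012, 548.700012, 548.700012, 553.099976,
                          555.346976, 548.700012, 548.700012, 546.750000, 546.750000],
        "Operator": ["A", "A", "B", "B", "A", "C", "C", "A", "B"],
    },
    index=pd.date_range("2019-08-17 07:30:00", periods=9, freq="30min", name="date"),
)

# diff() on the single column; eq(0) is False for the leading NaN,
# so the first row is always kept.
repeat = df["Sample Result"].diff().eq(0)
df.loc[repeat, "Sample Result"] = 0

print(df)
```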