So I'm working on a Python script using pandas where I sometimes get output that looks like this:
Time index | Value 1 | Value 2 |
---|---|---|
19:30 | 5 | 0 |
21:30 | 4 | 0 |
21:31 | "NaN" | 1 |
23:30 | 4 | 0 |
Basically the 21:30/21:31 values correspond to each other but are split across two rows due to faulty sensors or rounding errors. Does anyone know of a way to merge two rows when their time indexes are very similar (within a minute of one another)? The result would look as follows:
Time index | Value 1 | Value 2 |
---|---|---|
19:30 | 5 | 0 |
21:30 | 4 | 1 |
23:30 | 4 | 0 |
Thanks for the help in advance!
CodePudding user response:
You can build a custom grouper and aggregate with first/max:
import pandas as pd

# start a new group whenever the gap to the previous row exceeds 2 minutes
m = ~pd.to_datetime(df['Time index']).diff().le('2min')
# aggregate every column with max, except "Time index" which keeps the first value
agg_f = {c: 'max' for c in df}
agg_f['Time index'] = 'first'
# groupby aggregation
out = df.groupby(m.cumsum(), as_index=False).agg(agg_f)
NB. This assumes NaN is a real NaN; if it arrives as the string "NaN", replace it as a first step: df = df.replace({'"NaN"': pd.NA})
output:
Time index Value 1 Value 2
0 19:30 5.0 0
1 21:30 4.0 1
2 23:30 4.0 0
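Putting the pieces together, here is a self-contained sketch. The sample frame and the string-"NaN" cleanup are assumptions based on the question's table, and the grouping is done on the raw key array to keep the example deterministic:

```python
import pandas as pd

# Sample frame mirroring the question; "NaN" arrives as a quoted string here
df = pd.DataFrame({
    'Time index': ['19:30', '21:30', '21:31', '23:30'],
    'Value 1': [5, 4, '"NaN"', 4],
    'Value 2': [0, 0, 1, 0],
})

# Step 1: turn the quoted string into a real missing value, then make the
# column numeric so max() can skip the NaN
df = df.replace({'"NaN"': float('nan')})
df['Value 1'] = pd.to_numeric(df['Value 1'])

# Step 2: start a new group whenever the gap to the previous row exceeds 2 min
m = ~pd.to_datetime(df['Time index']).diff().le(pd.Timedelta('2min'))

# Step 3: aggregate each group with max, keeping the first timestamp as label
agg_f = {c: 'max' for c in df}
agg_f['Time index'] = 'first'
out = df.groupby(m.cumsum().values).agg(agg_f).reset_index(drop=True)
print(out)
```

The diff of the first row is NaT, which compares as False and so correctly opens the first group.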
CodePudding user response:
Code:
# one [Value 1, Value 2] slot per unique time, initialised to zero;
# rows only merge when their "Time index" values are identical
df_dic = {row['Time index']: [0, 0] for _, row in df.iterrows()}
# walk the rows and keep the per-time maximum of each value column
# (assumes numeric values with a real NaN: NaN > x is False, so NaNs are skipped)
for _, row in df.iterrows():
    vals = row.tolist()
    for j in range(1, len(vals)):
        if vals[j] > df_dic[vals[0]][j - 1]:
            df_dic[vals[0]][j - 1] = vals[j]
# convert the dict back to a DataFrame
pd.DataFrame([[k, *v] for k, v in df_dic.items()], columns=df.columns)
Output:
Time index Value 1 Value 2
0 19:30 5.0 0.0
1 21:30 4.0 1.0
2 23:30 4.0 0.0
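For completeness, a runnable sketch of this dict/loop approach. Note that the dict is keyed on the exact time string, so 21:30 and 21:31 would otherwise stay separate rows; flooring the times to 2-minute bins first is an added assumption here, to make near-identical stamps share a key:

```python
import pandas as pd

df = pd.DataFrame({
    'Time index': ['19:30', '21:30', '21:31', '23:30'],
    'Value 1': [5.0, 4.0, float('nan'), 4.0],
    'Value 2': [0, 0, 1, 0],
})

# Snap near-identical times onto one key (assumption: flooring to 2-minute
# bins is acceptable); without this, 21:30 and 21:31 remain separate keys
df['Time index'] = (pd.to_datetime(df['Time index'])
                      .dt.floor('2min').dt.strftime('%H:%M'))

# one [Value 1, Value 2] slot per unique time, initialised to zero
df_dic = {t: [0, 0] for t in df['Time index']}

# keep the per-time maximum of each value column; NaN > x is False, so NaNs
# never overwrite a stored value
for row in df.itertuples(index=False):
    for j, v in enumerate(row[1:]):
        if v > df_dic[row[0]][j]:
            df_dic[row[0]][j] = v

out = pd.DataFrame([[k, *v] for k, v in df_dic.items()], columns=df.columns)
print(out)
```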