Home > Enterprise >  Removing observation with min value in a column out of the dataframe one by one using loop in Python
Removing observation with min value in a column out of the dataframe one by one using loop in Python

Time:09-21

I have a dataframe "data" that looks like this:

f1 f2 f3
11 34 a
14 10 a
20 12 a
15 19 b
19 29 b
29 30 b

I want to find the minimum value of f2 IF f3 is a. I don't want to find min value of f2 for f3 = a or f3 = b, just when f3 = a. And then I want to remove the observation that is associated with that min value in f2 in the dataframe. So I had this code:

a_part = data[data['f3'] == 'a'
min1 = a_part['f2'].min()
min1 = data['f2'] = min1
data_new_1 = pd.dataframe(data.loc[~min1])

Which works well. And now my dataframe looks like:

f1 f2 f3
11 34 a
20 12 a
15 19 b
19 29 b
29 30 b

However, I want to remove the min value of f2 when f3 = a and the associated observation one by one by using a loop, and having a new dataframe each time. So essentially data_new_2 looks like:

f1 f2 f3
11 34 a
15 19 b
19 29 b
29 30 b

Until there's only b left in f3. I tried to do a loop for it:

for i in range(1,6):
    IN = data_new_i[['f3'] == 'a']
    min1 = a_part['f2'].min() 
    min1 = data_new_i['f2'] == min1
    vars()[data_new_i  ] = pd.DataFrame(data.loc[~min1])

And this doesn't work. I'm very unfamiliar with the way Python treats new dataframe name with loop index. I think I have to use dict to put new dataframe in, but I don't know how I can extract a column of a dataframe from a dict, and how I can save the new dataframe into the dict. Can someone help me please?

CodePudding user response:

In my solution ouput is list of DataFrames.

If there are always unique values in column f2 use loop by index values of sorted column and drop row by minimal value:

out = []
data1 = data.sort_values('f2')
for i in data1.loc[data1['f3'] == 'a', 'f2'].index:
    data = data.drop(i)
    out.append(data)
print (out)
[   f1  f2 f3
0  11  34  a
2  20  12  a
3  15  19  b
4  19  29  b
5  29  30  b,    f1  f2 f3
0  11  34  a
3  15  19  b
4  19  29  b
5  29  30  b,    f1  f2 f3
3  15  19  b
4  19  29  b
5  29  30  b]

If possible duplicates and need remove all dupes, like here in first loop all rows with f2=10 use:

print (data)
   f1  f2 f3
0  11  10  a
1  14  10  a
2  20  12  a
3  15  19  b
4  19  29  b
5  29  30  b

out = []
data1 = data.sort_values('f2')
for i, g in data1.groupby(data1.loc[data1['f3'] == 'a', 'f2']):
    data = data.drop(g.index)
    out.append(data)
print (out)
[   f1  f2 f3
2  20  12  a
3  15  19  b
4  19  29  b
5  29  30  b,    f1  f2 f3
3  15  19  b
4  19  29  b
5  29  30  b]

It is not recommended, but possible create DataFrames by groups:

data1 = data.sort_values('f2')
for j, (i, g) in enumerate(data1.groupby(data1.loc[data1['f3'] == 'a', 'f2']), 1):
    data = data.drop(g.index)
    globals()[f'data_new_{j}'] = data
print (data_new_1)
   f1  f2 f3
2  20  12  a
3  15  19  b
4  19  29  b
5  29  30  b

print (data_new_2)
   f1  f2 f3
3  15  19  b
4  19  29  b
5  29  30  b
  • Related