Hello I want to tidy up this list to keep only the highest result. Example like: 'M', 'm' and 'O', 'o' are duplicate in the list I want to keep them as their greatest result.
Here is my list :
list_no_ok = [['A', 'a', 'Finished', '30 September 2020 12:46 PM', '8.08'],
['B', 'b', 'Finished', '30 September 2020 2:52 PM', '3.46'],
['J', 'j', 'In progress', '-', '-'],
['K', 'k', 'Finished', '1 October 2020 11:32 AM', '10.00'],
['M', 'm', 'Finished', '1 October 2020 2:12 PM', '6.15'],
['M', 'm', 'Finished', '1 October 2020 2:20 PM', '9.10'],
['N', 'n', 'Finished', '1 October 2020 3:38 PM', '7.69'],
['O', 'o', 'Finished', '1 October 2020 5:06 PM', '4.87'],
['O', 'o', 'Finished', '1 October 2020 5:37 PM', '5.90']]
and I want this as output:
list_ok = [['A', 'a', 'Finished', '30 September 2020 12:46 PM', '8.08'],
['B', 'b', 'Finished', '30 September 2020 2:52 PM', '3.46'],
['J', 'j', 'In progress', '-', '-'],
['K', 'k', 'Finished', '1 October 2020 11:32 AM', '10.00'],
['M', 'm', 'Finished', '1 October 2020 2:20 PM', '9.10'],
['N', 'n', 'Finished', '1 October 2020 3:38 PM', '7.69'],
['O', 'o', 'Finished', '1 October 2020 5:37 PM', '5.90']]
The problem is, there is a dash in a list and I don't know how to arrange the list while keeping it.
Thanks.
CodePudding user response:
I think the best way is to just loop over the list. Hopefully this works for you.
list_by_letters = {}
for element in list_no_ok:
if element[0] in list_by_letters:
element2 = list_by_letters[element[0]]
if '-' in element2 or '-' in element:
continue
if element[-1] > element2[-1]:
list_by_letters[element[0]] = element[1:]
continue
list_by_letters[element[0]] = element[1:]
list_ok = [[key] list_by_letters[key] for key in list_by_letters]
CodePudding user response:
The Python package pandas
(https://pandas.pydata.org/) is a great way to do this, especially when your 2d list gets very large.
To start we would need to import the package:
import pandas as pd
We can put your 2d list in a pandas Dataframe and give it some sensible colums:
list_no_ok = [['A', 'a', 'Finished', '30 September 2020 12:46 PM', '8.08'],
['B', 'b', 'Finished', '30 September 2020 2:52 PM', '3.46'],
['J', 'j', 'In progress', '-', '-'],
['K', 'k', 'Finished', '1 October 2020 11:32 AM', '10.00'],
['M', 'm', 'Finished', '1 October 2020 2:12 PM', '6.15'],
['M', 'm', 'Finished', '1 October 2020 2:20 PM', '9.10'],
['N', 'n', 'Finished', '1 October 2020 3:38 PM', '7.69'],
['O', 'o', 'Finished', '1 October 2020 5:06 PM', '4.87'],
['O', 'o', 'Finished', '1 October 2020 5:37 PM', '5.90']]
df = pd.DataFrame(data=list_no_ok, columns=["col1", "col2", "status", "date", "value"])
df["value"] = pd.to_numeric(df["value"], errors='ignore')
df = df.sort_values(by=["value"])
df
col1 col2 status date value
2 J j In progress - -
3 K k Finished 1 October 2020 11:32 AM 10.00
1 B b Finished 30 September 2020 2:52 PM 3.46
7 O o Finished 1 October 2020 5:06 PM 4.87
8 O o Finished 1 October 2020 5:37 PM 5.90
4 M m Finished 1 October 2020 2:12 PM 6.15
6 N n Finished 1 October 2020 3:38 PM 7.69
0 A a Finished 30 September 2020 12:46 PM 8.08
5 M m Finished 1 October 2020 2:20 PM 9.10
Here we have put the 2d list in a Dataframe, gave it column names, we converted the column value
from string to numbers. We have added errors='ignore'
for the rows that are still in progress. Finally we have sorted it from high to low based on the value
.
We have sorted it based value
so that we can make use of the drop_duplicates
function of pandas:
df = df.drop_duplicates(['col1','col2'],keep='first')
df = df.sort_values(by=['col1','col2'])
df
col1 col2 status date value
0 A a Finished 30 September 2020 12:46 PM 8.08
1 B b Finished 30 September 2020 2:52 PM 3.46
2 J j In progress - -
3 K k Finished 1 October 2020 11:32 AM 10.00
4 M m Finished 1 October 2020 2:12 PM 6.15
6 N n Finished 1 October 2020 3:38 PM 7.69
7 O o Finished 1 October 2020 5:06 PM 4.87
Rows get dropped when they have identical col1
and col2
. Whenever this happends, it keeps the first occurrence. In this case that is were value
is the highest since we sorted on value
before we dropped rows.
We can go back to your list_ok
by sorting again on col1
and col2
.
If you really want to go back to a 2d list that can be done with:
list_ok = df.values.tolist()
list_ok
[['A', 'a', 'Finished', '30 September 2020 12:46 PM', '8.08'],
['B', 'b', 'Finished', '30 September 2020 2:52 PM', '3.46'],
['J', 'j', 'In progress', '-', '-'],
['K', 'k', 'Finished', '1 October 2020 11:32 AM', '10.00'],
['M', 'm', 'Finished', '1 October 2020 2:12 PM', '6.15'],
['N', 'n', 'Finished', '1 October 2020 3:38 PM', '7.69'],
['O', 'o', 'Finished', '1 October 2020 5:06 PM', '4.87']]