How to retain order in a list?

I want to return a string listing the new names of all the photos, in the same order as the lines of the original string. However, my final_string currently comes out in a different order (grouped by city).

def fetch_date_time(photo):
    return photo.split(", ")[2]

def prefixed_number(n, max_n):
    len_n = len(str(n))
    len_max_n = len(str(max_n))
    prefix = "".join(["0" for i in range(len_max_n - len_n)]) + str(n)
    return prefix

def solution(S):
    list_of_pics = S.split("\n")
    city_dict = {}

    for pic in list_of_pics:
        city = pic.split(", ")[1]
        if city in city_dict:
            city_dict[city].append(pic)
        else:
            city_dict[city] = [pic]

    final_string = ""

    for city_group in city_dict:
        city_dict[city_group].sort(key=fetch_date_time)
        for ind, photo in enumerate(city_dict[city_group]):
            city = photo.split(",")[1]
            ext = photo.split(", ")[0].split(".")[-1]
            max_len = len(city_dict[city_group])
            number = prefixed_number(ind + 1, max_len)
            city_dict[city_group][ind] = city + number + "." + ext + "\n"
        final_string += "".join(city_dict[city_group])

    return final_string

string = """photo.jpg, Warsaw, 2013-09-05 14:08:15
john.png, London, 2015-06-20 15:13:22
myFriends.png, Warsaw, 2013-09-05 14:07:13
Eiffel.jpg, Paris, 2015-07-23 08:03:02
pisatower.jpg, Paris, 2015-07-22 23:59:59
BOB.jpg, London, 2015-08-05 00:02:03
notredame.png, Paris, 2015-09-01 12:00:00
me.jpg, Warsaw, 2013-09-06 15:40:22
a.png, Warsaw, 2016-02-13 13:33:50
b.jpg, Warsaw, 2016-01-02 15:12:22
c.jpg, Warsaw, 2016-01-02 14:34:30
d.jpg, Warsaw, 2016-01-02 15:15:01
e.png, Warsaw, 2016-01-02 09:49:09
f.png, Warsaw, 2016-01-02 10:55:32
g.jpg, Warsaw, 2016-02-29 22:13:11"""

print(solution(string))

My current output:

Warsaw01.png
 Warsaw02.jpg
 Warsaw03.jpg
 Warsaw04.png
 Warsaw05.png
 Warsaw06.jpg
 Warsaw07.jpg
 Warsaw08.jpg
 Warsaw09.png
 Warsaw10.jpg
 London1.png
 London2.jpg
 Paris1.jpg
 Paris2.jpg
 Paris3.png

Expected output:

Warsaw02.jpg
London1.png
Warsaw01.png
Paris2.jpg
Paris1.jpg
London2.jpg
Paris3.png
Warsaw03.jpg
Warsaw09.png
Warsaw07.jpg
Warsaw06.jpg
Warsaw08.jpg
Warsaw04.png
Warsaw05.png
Warsaw10.jpg

CodePudding user response：

The code below may help.

string = """photo.jpg, Warsaw, 2013-09-05 14:08:15
john.png, London, 2015-06-20 15:13:22
myFriends.png, Warsaw, 2013-09-05 14:07:13
Eiffel.jpg, Paris, 2015-07-23 08:03:02
pisatower.jpg, Paris, 2015-07-22 23:59:59
BOB.jpg, London, 2015-08-05 00:02:03
notredame.png, Paris, 2015-09-01 12:00:00
me.jpg, Warsaw, 2013-09-06 15:40:22
a.png, Warsaw, 2016-02-13 13:33:50
b.jpg, Warsaw, 2016-01-02 15:12:22
c.jpg, Warsaw, 2016-01-02 14:34:30
d.jpg, Warsaw, 2016-01-02 15:15:01
e.png, Warsaw, 2016-01-02 09:49:09
f.png, Warsaw, 2016-01-02 10:55:32
g.jpg, Warsaw, 2016-02-29 22:13:11"""

class row:
  def __init__(self, image, city, date):
    self.image=image
    self.city=city
    self.date=date

def read_rows(text):
  rows=[]
  for line in text.split('\n'):
    image,city,date=line.split(',')
    rows.append(row(image,city,date))
  return rows

def rename_city(rows):
  known_cities={}
  for row in rows:
    if row.city in known_cities:
      known_cities[row.city] += 1
      row.city="%s%02d"%(row.city,known_cities[row.city])
    else:
      known_cities[row.city]=1
      row.city += "01"

def get_citynames(rows):
  cities=[]
  for row in rows:
    cities.append(row.city)
  return cities

def solution(input):
  rows=read_rows(input)
  sorted_rows=sorted(rows, key=lambda x: x.date)
  rename_city(sorted_rows)
  return get_citynames(rows)


print("\n".join(solution(string)))

Output

 Warsaw02
 London01
 Warsaw01
 Paris02
 Paris01
 London02
 Paris03
 Warsaw03
 Warsaw09
 Warsaw07
 Warsaw06
 Warsaw08
 Warsaw04
 Warsaw05
 Warsaw10

CodePudding user response：

To solve this problem you need to:

Group your data by city;
Sort the entries that belong to the same city by date;
Generate the new filenames and restore the original order.
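
As a rough sketch of how these three steps fit together, here is a walk-through on four made-up lines. The names `sample`, `groups` and `renamed` are only illustrative and are not used in the rest of this answer, and zero-padding is left out to keep it short:

```python
# Hypothetical four-line input in the same "name, city, date" format.
sample = """a.jpg, Paris, 2015-07-23 08:03:02
b.png, Rome, 2014-01-01 10:00:00
c.jpg, Paris, 2015-07-22 23:59:59
d.jpg, Rome, 2013-05-05 05:05:05"""

# Step 1: group by city, remembering each line's original position.
groups = {}
for position, line in enumerate(sample.splitlines()):
    name, city, date = line.split(", ")
    groups.setdefault(city, []).append((date, name, position))

# Step 2: sort each city's entries; tuples compare by their first item (the date) first.
for entries in groups.values():
    entries.sort()

# Step 3: build the new names and write them back to the original positions.
renamed = [None] * len(sample.splitlines())
for city, entries in groups.items():
    for number, (date, name, position) in enumerate(entries, 1):
        extension = name.rsplit(".", 1)[1]
        renamed[position] = f"{city}{number}.{extension}"

print(renamed)
# ['Paris2.jpg', 'Rome2.png', 'Paris1.jpg', 'Rome1.jpg']
```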

First of all, we need to split each line of your string by ", ":

lines = [s.split(", ") for s in string.splitlines()]

To group our lines by city we can use two different methods:

1.1. Build a dictionary in which each city is a unique key and the value is a list of all lines for that city:

grouped_photos = {}
for line in lines:
    city = line[1]
    if city in grouped_photos:
        grouped_photos[city].append(line)
    else:
        grouped_photos[city] = [line]

Notice that with this method there is no point in building lines first, since that costs an extra pass over the data; we can iterate over string.splitlines() directly:

grouped_photos = {}
for line in string.splitlines():
    splitted = line.split(", ")
    city = splitted[1]
    if city in grouped_photos:
        grouped_photos[city].append(splitted)
    else:
        grouped_photos[city] = [splitted]

We can also shorten the code a bit using defaultdict:

from collections import defaultdict

...

grouped_photos = defaultdict(list)
for line in string.splitlines():
    splitted = line.split(", ")
    grouped_photos[splitted[1]].append(splitted)
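
To check that the shorter version groups the data the same way, here is a small comparison on a made-up three-line input (the names `sample`, `plain` and `auto` are illustrative only). It also shows that the keys come out in order of first appearance, which holds for dicts in Python 3.7 and later:

```python
from collections import defaultdict

# Hypothetical short input in the same format as the question.
sample = (
    "x.jpg, Warsaw, 2013-09-05 14:08:15\n"
    "y.png, London, 2015-06-20 15:13:22\n"
    "z.png, Warsaw, 2013-09-05 14:07:13"
)

plain = {}
auto = defaultdict(list)
for line in sample.splitlines():
    fields = line.split(", ")
    plain.setdefault(fields[1], []).append(fields)  # explicit "create if missing"
    auto[fields[1]].append(fields)                  # list is created on first access

print(dict(auto) == plain)  # True
print(list(auto))           # ['Warsaw', 'London']
```

One thing to keep in mind: reading a missing key with `auto[key]` creates an empty list for it, whereas a membership test such as `key in auto` does not.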

1.2. Use groupby(). The main difference from the previous method is that groupby() only merges consecutive items with equal keys, so the data must be sorted by the grouping key first.

from itertools import groupby
from operator import itemgetter

...

lines.sort(key=itemgetter(1))
grouped_photos = {c: list(p) for c, p in groupby(lines, itemgetter(1))}

I've used a dict comprehension only as temporary storage for the groupby() result; we won't need it later.
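
To illustrate why the sort matters, here is a small sketch on a made-up list of city names (`cities` is not part of the code above). Without sorting, the same city ends up in several groups, and a dict comprehension built from them would silently keep only the last group for each key:

```python
from itertools import groupby

cities = ["Warsaw", "London", "Warsaw", "Paris", "Paris"]

# Without sorting, "Warsaw" shows up as two separate groups.
unsorted_groups = [(city, len(list(group))) for city, group in groupby(cities)]
print(unsorted_groups)
# [('Warsaw', 1), ('London', 1), ('Warsaw', 1), ('Paris', 2)]

# After sorting, every city forms exactly one group.
sorted_groups = [(city, len(list(group))) for city, group in groupby(sorted(cities))]
print(sorted_groups)
# [('London', 1), ('Paris', 2), ('Warsaw', 2)]
```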

Now we need to sort each city's list by date. The common way to compare dates stored as strings (which is necessary for sorting) is to build a datetime object, either with datetime.strptime() and an explicit format, or with datetime.fromisoformat() if the string matches the standard format.
```
from datetime import datetime

...

grouped_photos["Warsaw"].sort(key=lambda x: datetime.fromisoformat(x[2]))
```
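With the format you have, however, we can also exploit the lexicographic order that Python uses to compare sequences (a string is a sequence too). Because every field is zero-padded and the fields run from most to least significant (year down to second), the text order matches the chronological order, so we can leave the date strings as they are.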
```
grouped_photos["Warsaw"].sort(key=itemgetter(2))
```
So, basically we need to sort every value in grouped_photos:
```
for value in grouped_photos.values():
    value.sort(key=itemgetter(2))
```
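
As a quick sanity check of the claim about lexicographic order, the following sketch sorts a few timestamps from the question both ways and compares the results. The counter-example at the end uses a made-up, non-padded date to show when the shortcut stops working:

```python
from datetime import datetime

# A few timestamps taken from the question's input.
dates = [
    "2016-02-13 13:33:50",
    "2013-09-05 14:08:15",
    "2016-01-02 09:49:09",
    "2013-09-05 14:07:13",
]

by_text = sorted(dates)
by_datetime = sorted(dates, key=datetime.fromisoformat)
print(by_text == by_datetime)  # True

# Counter-example: without zero-padding, text order and date order disagree.
print("2016-2-13" < "2016-10-01")  # False as strings, although February precedes October
```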
To generate the new filenames and put them in the original order, we first need to store each line's original index. For this we modify the initial split so that it also records the index of the line:
```
lines = [s.split(", ") + [i] for i, s in enumerate(string.splitlines())]
```
The result list will have exactly the same size as the source, so to avoid sorting a second time we can initialize it as a list of None values with the same length as lines, then iterate over grouped_photos and store each generated filename at its original index.
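
Stripped of the photo details, the idea looks roughly like this. The `items` list is made up for illustration; each element carries the index it had before it was reordered:

```python
# Each item carries the index it had in the original list (hypothetical data).
items = [("c", 2), ("a", 0), ("d", 3), ("b", 1)]

restored = [None] * len(items)  # one empty slot per original line
for value, original_index in items:
    restored[original_index] = value.upper()

print(restored)  # ['A', 'B', 'C', 'D']
```

Each element is written exactly once, so restoring the order is a single linear pass and needs no second sort.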

To generate a filename we need the name of the city, the index in the sorted list and the original file extension. To extract the extension from a filename we can use splitext() or simply call str.rsplit():
```
from os.path import splitext

ext = splitext("pisatower.jpg")[1]
# OR
ext = "." + "pisatower.jpg".rsplit(".", 1)[1]
```
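
The two approaches agree for ordinary names but behave differently at the edges. A small sketch, using two made-up filenames next to one from the question, shows that splitext() returns an empty extension for a name without a dot, while indexing the rsplit() result with [1] would raise IndexError in that case:

```python
from os.path import splitext

for name in ["pisatower.jpg", "archive.tar.gz", "README"]:
    print(name, "->", repr(splitext(name)[1]))
# pisatower.jpg -> '.jpg'
# archive.tar.gz -> '.gz'
# README -> ''

# rsplit() returns a one-element list when there is no dot,
# so indexing it with [1] would raise IndexError for "README".
parts = "README".rsplit(".", 1)
print(parts)  # ['README']
```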
Let's restore the original order and set the new filenames:
```
from os.path import splitext

...

result = [None] * len(lines)
for photos in grouped_photos.values():
    for i, (name, city, _, index) in enumerate(photos, 1):
        result[index] = f"{city}{i}{splitext(name)[1]}"
```
The only thing left is zero-padding the index. The length of each list is also its largest index, so the required width is the number of digits in that length, i.e. len(str(len(photos))). There are plenty of ways to pad a number; I'll use a nested format field in this example:
```
for photos in grouped_photos.values():
    padding = len(str(len(photos)))
    for i, (name, city, _, index) in enumerate(photos, 1):
        result[index] = f"{city}{i:0{padding}}{splitext(name)[1]}"
```
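
For comparison, here are a few of those other ways to pad a number side by side. This is only a sketch with made-up values for `number` and `width`; all four expressions are expected to give the same text:

```python
number, width = 7, 2

padded = [
    f"{number:0{width}}",           # nested replacement field in an f-string
    str(number).zfill(width),       # str.zfill
    str(number).rjust(width, "0"),  # str.rjust with a fill character
    "%0*d" % (width, number),       # printf-style formatting with a dynamic width
]
print(padded)  # ['07', '07', '07', '07']

# The width follows the group size: 10 photos need 2 digits, 3 photos need 1.
print(len(str(10)), len(str(3)))  # 2 1
```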

Now we need to put it all together. With a few optimizations, the code above combines into:

from operator import itemgetter
from itertools import groupby
from os.path import splitext

string = """photo.jpg, Warsaw, 2013-09-05 14:08:15
john.png, London, 2015-06-20 15:13:22
myFriends.png, Warsaw, 2013-09-05 14:07:13
Eiffel.jpg, Paris, 2015-07-23 08:03:02
pisatower.jpg, Paris, 2015-07-22 23:59:59
BOB.jpg, London, 2015-08-05 00:02:03
notredame.png, Paris, 2015-09-01 12:00:00
me.jpg, Warsaw, 2013-09-06 15:40:22
a.png, Warsaw, 2016-02-13 13:33:50
b.jpg, Warsaw, 2016-01-02 15:12:22
c.jpg, Warsaw, 2016-01-02 14:34:30
d.jpg, Warsaw, 2016-01-02 15:15:01
e.png, Warsaw, 2016-01-02 09:49:09
f.png, Warsaw, 2016-01-02 10:55:32
g.jpg, Warsaw, 2016-02-29 22:13:11"""

lines = [s.split(", ") + [i] for i, s in enumerate(string.splitlines())]
lines.sort(key=itemgetter(1, 2))
result = [None] * len(lines)
for city, [*photos] in groupby(lines, itemgetter(1)):
    padding = len(str(len(photos)))
    for i, (name, _, _, index) in enumerate(photos, 1):
        result[index] = f"{city}{i:0{padding}}{splitext(name)[1]}"

I've noticed that you haven't used any imports in your code. Maybe that's some odd requirement, so here is the same code without imports and syntactic sugar:

string = """photo.jpg, Warsaw, 2013-09-05 14:08:15
john.png, London, 2015-06-20 15:13:22
myFriends.png, Warsaw, 2013-09-05 14:07:13
Eiffel.jpg, Paris, 2015-07-23 08:03:02
pisatower.jpg, Paris, 2015-07-22 23:59:59
BOB.jpg, London, 2015-08-05 00:02:03
notredame.png, Paris, 2015-09-01 12:00:00
me.jpg, Warsaw, 2013-09-06 15:40:22
a.png, Warsaw, 2016-02-13 13:33:50
b.jpg, Warsaw, 2016-01-02 15:12:22
c.jpg, Warsaw, 2016-01-02 14:34:30
d.jpg, Warsaw, 2016-01-02 15:15:01
e.png, Warsaw, 2016-01-02 09:49:09
f.png, Warsaw, 2016-01-02 10:55:32
g.jpg, Warsaw, 2016-02-29 22:13:11"""

grouped_photos = {}
for i, line in enumerate(string.splitlines()):
    splitted = line.split(", ") + [i]
    city = splitted[1]
    if city in grouped_photos:
        grouped_photos[city].append(splitted)
    else:
        grouped_photos[city] = [splitted]

result = [None] * (i + 1)
for photos in grouped_photos.values():
    photos.sort(key=lambda x: x[2])
    padding = len(str(len(photos)))
    for i, (name, city, _, index) in enumerate(photos, 1):
        result[index] = city + str(i).zfill(padding) + "." + name.rsplit(".", 1)[1]

Add print(*result, sep="\n") to either version to print the output to the console.

Output:

Warsaw02.jpg
London1.png
Warsaw01.png
Paris2.jpg
Paris1.jpg
London2.jpg
Paris3.png
Warsaw03.jpg
Warsaw09.png
Warsaw07.jpg
Warsaw06.jpg
Warsaw08.jpg
Warsaw04.png
Warsaw05.png
Warsaw10.jpg
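
Since the question asks for a function that returns a single string, the same logic could be wrapped along these lines. This is a sketch rather than a drop-in replacement: the name `rename_photos` is made up, and it assumes every line has exactly three ", "-separated fields and every filename contains a dot:

```python
def rename_photos(text):
    """Return the new photo names, one per line, in the original order."""
    lines = text.splitlines()
    groups = {}
    for position, line in enumerate(lines):
        name, city, date = line.split(", ")
        groups.setdefault(city, []).append((date, name, position))

    result = [None] * len(lines)
    for city, entries in groups.items():
        entries.sort(key=lambda entry: entry[0])  # zero-padded dates sort correctly as text
        width = len(str(len(entries)))            # digits needed for the largest number
        for number, (_, name, position) in enumerate(entries, 1):
            extension = name.rsplit(".", 1)[1]
            result[position] = f"{city}{str(number).zfill(width)}.{extension}"
    return "\n".join(result)


sample = """photo.jpg, Warsaw, 2013-09-05 14:08:15
john.png, London, 2015-06-20 15:13:22
myFriends.png, Warsaw, 2013-09-05 14:07:13"""
print(rename_photos(sample))
# Warsaw2.jpg
# London1.png
# Warsaw1.png
```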
