Replace substrings of elements within a list and keep the original elements

Time:07-06

I have a list called names, and I normalize the party labels and strip titles and first names with re.sub:

names = ['Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN', 'Dirk Adams, GRÜNE', 'Blechschmidt, DIE LINKE', 'Steffen Harzer, LINKE', 'Gerd Schuchardt, Minister für Wissenschaft, Forschung und Kultur', 'David-Christian Eckardt, SPD', 'Christine Ursula Klaus, SPD', 'Klaus von der Krone, CDU', 'Antje Ehrlich-Strathausen, SPD', 'Benno Lemke, PDS']

import re

# Normalize party labels (the lookbehind skips labels already prefixed with "DIE").
names = [re.sub(r'(?<!DIE)\sLINKE', ' DIE LINKE', line) for line in names]
names = [re.sub(r'(?<!DIE)\sGRÜNE', ' BÜNDNIS 90/DIE GRÜNEN', line) for line in names]
names = [re.sub(r'Die Linke', 'DIE LINKE', line) for line in names]
names = [re.sub(r'PDS', 'DIE LINKE', line) for line in names]
# Drop the "Dr." title, then one leading first name.
names = [re.sub(r'Dr\.\s', '', line) for line in names]
actual_names = [re.sub(r'((?:^|(?:[.!?]\s))(\w+)\s)', '', line) for line in names]

print(actual_names)

This currently gives:

actual_names = ['Augsten, BÜNDNIS 90/DIE GRÜNEN', 'Adams, BÜNDNIS 90/DIE GRÜNEN', 'Blechschmidt, DIE LINKE', 'Harzer, DIE LINKE', 'Schuchardt, Minister für Wissenschaft, Forschung und Kultur', 'David-Christian Eckardt, SPD', 'Ursula Klaus, SPD', 'von der Krone, CDU', 'Ehrlich-Strathausen, SPD', 'Lemke, DIE LINKE']

Questions:

  1. How do I need to change the regex to handle first names that contain a hyphen (see 'David-Christian Eckardt, SPD')?
  2. How do I need to change the code to keep the original elements in addition to the normalized ones?

desired_names = ['Augsten, BÜNDNIS 90/DIE GRÜNEN', 'Adams, BÜNDNIS 90/DIE GRÜNEN', 'Adams, GRÜNE', 'Blechschmidt, DIE LINKE', 'Harzer, DIE LINKE', 'Harzer, LINKE', 'Schuchardt, Minister für Wissenschaft, Forschung und Kultur', 'Eckardt, SPD', 'Klaus, SPD', 'von der Krone, CDU', 'Ehrlich-Strathausen, SPD', 'Lemke, PDS', 'Lemke, DIE LINKE']

The order within the list does not matter.

CodePudding user response:

Is a regex necessary in this case? You can use str.split with the maxsplit=1 parameter to separate the person from the party, then map the party label through a dictionary:

names = [
    "Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN",
    "Dirk Adams, GRÜNE",
    "Blechschmidt, DIE LINKE",
    "Steffen Harzer, LINKE",
    "Gerd Schuchardt, Minister für Wissenschaft, Forschung und Kultur",
    "David-Christian Eckardt, SPD",
    "Christine Ursula Klaus, SPD",
    "Klaus von der Krone, CDU",
    "Antje Ehrlich-Strathausen, SPD",
    "Benno Lemke, PDS",
]

m = {"LINKE": "DIE LINKE", "GRÜNE": "BÜNDNIS 90/DIE GRÜNEN", "PDS": "DIE LINKE"}

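# Split each entry into [person, party] at the first ", ".
out = [n.split(", ", maxsplit=1) for n in names]
# Keep the last word of the person and normalize the party via m.
out = [", ".join([a.split()[-1], m.get(b, b)]) for a, b in out]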

print(out)

Prints:

[
    "Augsten, BÜNDNIS 90/DIE GRÜNEN",
    "Adams, BÜNDNIS 90/DIE GRÜNEN",
    "Blechschmidt, DIE LINKE",
    "Harzer, DIE LINKE",
    "Schuchardt, Minister für Wissenschaft, Forschung und Kultur",
    "Eckardt, SPD",
    "Klaus, SPD",
    "Krone, CDU",
    "Ehrlich-Strathausen, SPD",
    "Lemke, DIE LINKE",
]
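
Compared with desired_names, this output differs in two ways: 'Klaus von der Krone' is reduced to 'Krone' rather than 'von der Krone', and the original party labels (GRÜNE, LINKE, PDS) are replaced instead of being kept alongside the normalized ones. The following is a minimal sketch of one way to cover both. It assumes that every entry contains ", ", that first names start with an uppercase letter, and that name particles such as "von" or "der" are lowercase. The helper names surname and expand are hypothetical and not part of the original answer.

```python
names = [
    "Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN",
    "Dirk Adams, GRÜNE",
    "Blechschmidt, DIE LINKE",
    "Steffen Harzer, LINKE",
    "Gerd Schuchardt, Minister für Wissenschaft, Forschung und Kultur",
    "David-Christian Eckardt, SPD",
    "Christine Ursula Klaus, SPD",
    "Klaus von der Krone, CDU",
    "Antje Ehrlich-Strathausen, SPD",
    "Benno Lemke, PDS",
]

m = {"LINKE": "DIE LINKE", "GRÜNE": "BÜNDNIS 90/DIE GRÜNEN", "PDS": "DIE LINKE"}


def surname(person):
    """Drop 'Dr.' and leading capitalized first names; keep lowercase particles."""
    words = [w for w in person.split() if w != "Dr."]
    i = 0
    # Skip capitalized words from the left, but never the final word,
    # which is assumed to be the family name.
    while i < len(words) - 1 and words[i][0].isupper():
        i += 1
    return " ".join(words[i:])


def expand(entries, mapping):
    """Return each entry with its original party and, if mapped, the normalized party."""
    out = []
    for entry in entries:
        person, party = entry.split(", ", maxsplit=1)
        last = surname(person)
        out.append(f"{last}, {party}")  # original label is always kept
        if party in mapping:
            out.append(f"{last}, {mapping[party]}")  # normalized label added on top
    return out


result = expand(names, m)
print(result)
```

Because a hyphenated first name is a single whitespace-separated word, 'David-Christian' needs no special handling here. Entries whose party is not a key of m appear once; entries with a mapped party appear twice.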
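
If a regular expression is still preferred, the first question can be approached by letting the first-name pattern include hyphens and repeat, so that several first names are removed in one pass. This is only a sketch: it assumes that first names begin with an uppercase letter (including Ä, Ö, Ü), that lowercase particles belong to the family name, and that the family name is followed directly by the comma. The names TITLE, FIRST_NAMES and strip_first_names are hypothetical.

```python
import re

# A leading "Dr." title followed by whitespace.
TITLE = re.compile(r"^Dr\.\s+")

# One or more capitalized words (hyphens allowed), each followed by whitespace.
# The family name is not consumed because it is followed by "," rather than a space.
FIRST_NAMES = re.compile(r"^(?:[A-ZÄÖÜ][\w-]*\s+)+")


def strip_first_names(line):
    """Remove a leading title and leading capitalized first names from one entry."""
    line = TITLE.sub("", line)
    return FIRST_NAMES.sub("", line)


examples = [
    "Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN",
    "David-Christian Eckardt, SPD",
    "Christine Ursula Klaus, SPD",
    "Klaus von der Krone, CDU",
    "Antje Ehrlich-Strathausen, SPD",
]

for example in examples:
    print(strip_first_names(example))
```

The pattern stops at the first lowercase word, which is why 'von der Krone' stays intact, and it leaves 'Ehrlich-Strathausen' alone because that word is followed by a comma, not by whitespace.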