How to group same words in dictionary in Pandas?


I have a German-to-English dictionary with multiple entries for some words. I want to group those entries so that all English translations of the same German word end up in one row, separated by commas.

I have the following dataframe:

Deutsch                Englisch
spindeldürr            spindly
Garn {n} [auch fig.]   yarn
Schnur {f}             twine
Naht {f}               suture
zunähen                to suture
Faden {m}              strand [thread]
Faden {m}              thread [also fig.: of conversation]
Flussbett {n}          riverbed
Flussbett {n}          channel [of a river]
streuen                to strew

And I want to produce:

Deutsch                Englisch
spindeldürr            spindly
Garn {n} [auch fig.]   yarn
Schnur {f}             twine
Naht {f}               suture
zunähen                to suture
Faden {m}              strand [thread], thread [also fig.: of conversation]
Flussbett {n}          riverbed, channel [of a river]
streuen                to strew

I created this dataframe from a .txt file using the following code:

import pandas as pd

df = pd.read_csv('test.txt', delimiter='::', engine='python')
df.columns = df.columns.str.strip()
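
Below is a self-contained sketch of the loading step. It assumes test.txt has a header line and one `German :: English` entry per line, with spaces around the `::`. That layout, and the in-memory sample standing in for the file, are assumptions. pandas interprets a separator longer than one character as a regular expression, so engine='python' is passed explicitly. The cell values are stripped along with the header so that repeated German words compare equal when grouped:

```python
import io

import pandas as pd

# Hypothetical sample standing in for test.txt: a header line plus
# "German :: English" entries (the real file layout is an assumption).
sample = io.StringIO(
    "Deutsch :: Englisch\n"
    "Faden {m} :: strand [thread]\n"
    "Faden {m} :: thread [also fig.: of conversation]\n"
    "streuen :: to strew\n"
)

# A multi-character separator is treated as a regular expression, which
# requires the Python parsing engine; naming it avoids a ParserWarning.
df = pd.read_csv(sample, sep="::", engine="python")

# Strip the spaces around "::" from the header and from every cell, so that
# identical German headwords compare equal when grouping later.
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip())
print(df)
```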

How can I achieve this using Pandas or other common packages?

Answer:

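Try groupby: group on the German word and join each group's English translations with ', '. Passing sort=False keeps the words in their original order: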

# Works in old and new versions of pandas
>>> df.groupby('Deutsch', sort=False)['Englisch'].agg(', '.join).reset_index()

# Newer versions
>>> df.groupby('Deutsch', sort=False, as_index=False)['Englisch'].agg(', '.join)

                Deutsch                                           Englisch
0           spindeldürr                                            spindly
1  Garn {n} [auch fig.]                                               yarn
2            Schnur {f}                                              twine
3              Naht {f}                                             suture
4               zunähen                                          to suture
5             Faden {m}  strand [thread], thread [also fig.: of convers...
6         Flussbett {n}                     riverbed, channel [of a river]
7               streuen                                           to strew
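
The following minimal sketch reproduces the answer on a subset of the question's rows built directly in memory, so it runs without test.txt. The second aggregation is an assumption-driven variant, not part of the original answer. If the file contains blank translations (read as NaN), ', '.join raises a TypeError, so the lambda drops missing values and exact duplicate translations before joining:

```python
import pandas as pd

# A subset of the rows shown in the question.
df = pd.DataFrame({
    "Deutsch": ["spindeldürr", "Faden {m}", "Faden {m}",
                "Flussbett {n}", "Flussbett {n}", "streuen"],
    "Englisch": ["spindly", "strand [thread]",
                 "thread [also fig.: of conversation]",
                 "riverbed", "channel [of a river]", "to strew"],
})

# Group on the German headword in order of first appearance (sort=False)
# and join the English translations of each group with ", ".
grouped = (
    df.groupby("Deutsch", sort=False)["Englisch"]
      .agg(", ".join)
      .reset_index()
)

# Variant (assumption: some translations may be missing or repeated):
# drop NaN and exact duplicates in each group before joining.
safe = (
    df.groupby("Deutsch", sort=False)["Englisch"]
      .agg(lambda s: ", ".join(s.dropna().astype(str).unique()))
      .reset_index()
)

print(grouped.to_string(index=False))
```

With the default sort=True, the headwords would come out in alphabetical order instead of file order.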