Home > Mobile >  How to create a column that iterates every unique value for an item from a pandas dataset in python?
How to create a column that iterates every unique value for an item from a pandas dataset in python?

Time:09-03

I'm new to Python and could use some help.

Say I have a dataset that looks like this:

      Serial Number        Source
0          AB100          Donatelle
1          AB200          Qure
2          AB100          Donatelle
3          AB200          Qure
4          AB100          Grand Avenue
5          AB200          Eagle Services
6          AB300          Donatelle
7          AB300          Donatelle
8          AB100          Qure
9          AB100          Eagle Services

And I need to add a column to it, like this:

       Serial Number        Source         SN Data Sources
0          AB100          Donatelle        Donatelle, Grand Avenue, Qure, Eagle Services
1          AB200          Qure             Qure, Eagle Services
2          AB100          Donatelle        Donatelle, Grand Avenue, Qure, Eagle Services
3          AB200          Qure             Qure, Eagle Services
4          AB100          Grand Avenue     Donatelle, Grand Avenue, Qure, Eagle Services
5          AB200          Eagle Services   Qure, Eagle Services
6          AB300          Donatelle        Donatelle
7          AB300          Donatelle        Donatelle
8          AB100          Qure             Donatelle, Grand Avenue, Qure, Eagle Services
9          AB100          Eagle Services   Donatelle, Grand Avenue, Qure, Eagle Services

My knowledge is still limited, pardon me.

I'm working with a 40k rows dataframe and I need to generate a column that brings all different sources that are present in the dataframe for the serial number in each row.

Can anybody help me out? Thanks and kind regards

CodePudding user response:

You can groupby on "Serial Number" column and apply a list to "Source" column.

Next create a dictionary of the grouped df and convert it to a df.

Finally merge the dfs together and clean up the columns.

data = {
    "Serial Number": ["AB100", "AB200", "AB100", "AB200", "AB100", "AB200"],
    "Source": ["Donatelle", "Qure", "Grand Avenue", "Eagle Services", "Qure", "Grand Avenue"]
}
df = pd.DataFrame(data)

grouped_df = df.groupby("Serial Number")["Source"].apply(list).reset_index()
mapping = grouped_df.set_index("Serial Number")["Source"].to_dict()
mapping_df = pd.DataFrame.from_dict(mapping, orient="index").unstack().reset_index()

final_df = pd.merge(
    grouped_df,
    mapping_df,
    left_on="Serial Number",
    right_on="level_1"
).rename(columns={0: "Source", "Source": "SN Data Sources"}[["Serial Number", "Source", "SN Data Sources"]]

print(final_df)

  Serial Number          Source                       SN Data Sources
0         AB100       Donatelle       [Donatelle, Grand Avenue, Qure]
1         AB100    Grand Avenue       [Donatelle, Grand Avenue, Qure]
2         AB100            Qure       [Donatelle, Grand Avenue, Qure]
3         AB200            Qure  [Qure, Eagle Services, Grand Avenue]
4         AB200  Eagle Services  [Qure, Eagle Services, Grand Avenue]
5         AB200    Grand Avenue  [Qure, Eagle Services, Grand Avenue]

CodePudding user response:

df = pd.DataFrame(
    {'Serial Number': ['AB100', 'AB200', 'AB100', 'AB200', 'AB100', 'AB200', 'AB300', 'AB300', 'AB100', 'AB100'],
     'Source': ['Donatelle', 'Qure', 'Donatelle', 'Qure', 'Grand Avenue', 'Eagle Services', 'Donatelle', 'Donatelle',
                'Qure', 'Eagle Services']})

df = df.merge(df.groupby('Serial Number').agg(set)['Source'].apply(lambda x: ', '.join(x)), how='left',
              on='Serial Number').rename(columns={'Source_x': 'Source', 'Source_y': 'SN Data Sources'})
print(df)

Prints:

  Serial Number          Source                                SN Data Sources
0         AB100       Donatelle  Donatelle, Eagle Services, Grand Avenue, Qure
1         AB200            Qure                           Eagle Services, Qure
2         AB100       Donatelle  Donatelle, Eagle Services, Grand Avenue, Qure
3         AB200            Qure                           Eagle Services, Qure
4         AB100    Grand Avenue  Donatelle, Eagle Services, Grand Avenue, Qure
5         AB200  Eagle Services                           Eagle Services, Qure
6         AB300       Donatelle                                      Donatelle
7         AB300       Donatelle                                      Donatelle
8         AB100            Qure  Donatelle, Eagle Services, Grand Avenue, Qure
9         AB100  Eagle Services  Donatelle, Eagle Services, Grand Avenue, Qure
  • Related