Home > Back-end >  Python: Concatenating Strings from Grouped analysis
Python: Concatenating Strings from Grouped analysis

Time:06-22

I need some help concatenating some strings from a DF.

I have the below "original" df:

df = pd.DataFrame({'Material': [1,1,1,2,2,2,3,3,3,3],
                'BOM': ['A','B',np.nan,'A',np.nan,'C','A','A','B','C']})
Material BOM
0 1 A
1 1 B
2 1 NaN
3 2 A
4 2 NaN
5 2 C
6 3 A
7 3 A
8 3 B
9 3 C

Expected Result is:

Material BOM
1 A, B
2 A, C
3 A, B, C

I'm new with Python and am thinking of doing it with a mix of Loop and Dict but definitely think there must be a better way.

The tricky bit is that I think I need to group by material and check column "BOM" for the distinct options (different than NaN) then some sort of ", ".join them.

I've tried the following:

df.groupby('Material')['BOM'].apply(', '.join)

And got the error: "TypeError: sequence item 2: expected str instance, float found"

If anyone could help me I would be really grateful

CodePudding user response:

You can dropna and drop_duplicates, then GroupBy.agg:

df.dropna().drop_duplicates().groupby('Material').agg(', '.join)

or:

df.dropna().groupby('Material').agg(lambda s: ', '.join(s.drop_duplicates()))

output:

              BOM
Material         
1            A, B
2            A, C
3         A, B, C

CodePudding user response:

3 solutions depending on output requirements:

Using .agg: df.dropna().drop_duplicates().groupby('Material').agg(', '.join)

Output:

              BOM
Material         
1            A, B
2            A, C
3         A, B, C

Using .apply(list): df.dropna().drop_duplicates().groupby('Material')['BOM'].apply(list).reset_index()

Output:

   Material        BOM
0         1     [A, B]
1         2     [A, C]
2         3  [A, B, C]

Using .apply(set): df.dropna().groupby('Material')['BOM'].apply(set).reset_index()

Output:

   Material        BOM
0         1     {A, B}
1         2     {A, C}
2         3  {A, C, B}
  • Related