How to show multiple matches in the same row separated by comma in Pandas Dataframe?

I've got a problem with showing every matching value in the same row as my lookup value. For example:

My first table:

some_primary_key
unique_value_1
unique_value_2
unique_value_3

The other table:

some_primary_key values
unique_value_1 some_value_1
unique_value_1 some_value_2
unique_value_2 some_value_3
unique_value_2 some_value_4
unique_value_3 some_value_5
unique_value_3 some_value_6

And finally, I'd like to get this:

some_primary_key values
unique_value_1 some_value_1, some_value_2
unique_value_2 some_value_3, some_value_4
unique_value_3 some_value_5, some_value_6

Should I use a list comprehension, iterating through the DataFrame items, to create a list of lists of matching values? Any ideas?

Answer:

Here is my sample solution:

import pandas as pd

data = {'some_primary_key': ['unique_value_1',
                             'unique_value_2',
                             'unique_value_3'] * 2,
        'values': ['some_value_1', 'some_value_3', 'some_value_5',
                   'some_value_2', 'some_value_4', 'some_value_6']}

df = pd.DataFrame(data=data)

# For each row, collect every value sharing its primary key and
# join them into one comma-separated string.
list_of_values = []
for item in df['some_primary_key']:
    filtered_values = df[df['some_primary_key'] == item]
    list_of_values.append(','.join(filtered_values['values']))

df['values'] = list_of_values
df = df.drop_duplicates()  # the loop leaves identical rows per key
print(df)
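
For the sample data above, this should print:

  some_primary_key                     values
0   unique_value_1  some_value_1,some_value_2
1   unique_value_2  some_value_3,some_value_4
2   unique_value_3  some_value_5,some_value_6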

Any other neat solutions? :)

CodePudding user response:

Possible alternatives could be:

import pandas as pd

data = {
    'some_primary_key':['unique_value_1', 'unique_value_2', 'unique_value_3']*2,
    'values':['some_value_1', 'some_value_3', 'some_value_5', 'some_value_2', 'some_value_4', 'some_value_6']
}
                                                                      
df = pd.DataFrame(data=data)

df = df.groupby(['some_primary_key'])['values'].apply(lambda x: ', '.join(x)).reset_index()

# OR

df = df.groupby(['some_primary_key'])['values'].apply(', '.join).reset_index()

# OR

df = df.groupby('some_primary_key')['values'].apply(', '.join)  # returns a Series

# OR

df = df.groupby('some_primary_key')['values'].agg(', '.join)  # returns a Series

# OR

df = df.groupby(['some_primary_key'], as_index=False).agg({'values': ', '.join})

# OR

df = df.groupby('some_primary_key').agg({'values': ', '.join}).reset_index()

print(df)

Output:

  some_primary_key                     values
0   unique_value_1  some_value_1, some_value_2
1   unique_value_2  some_value_3, some_value_4
2   unique_value_3  some_value_5, some_value_6
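
A further spelling that should also work (a sketch, assuming pandas 0.25 or newer for named aggregation; the values= keyword is just the output column name chosen here):

df = df.groupby('some_primary_key', as_index=False).agg(values=('values', ', '.join))

It is equivalent to the agg({'values': ', '.join}) variants above, but makes the output column name explicit.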

CodePudding user response:

What you are looking for is groupby. You can apply any custom function with transform after grouping over some_primary_key.

You can try this:

concat_func = lambda x: ','.join(map(str, x.sort_values(ascending=True).unique()))
df['values'] = df.groupby(['some_primary_key'])['values'].transform(concat_func)

Then df will hold the concatenated values for each some_primary_key, which leaves duplicated rows. Therefore, just remove them:

df = df.drop_duplicates()

Output:

  some_primary_key                     values
0   unique_value_1  some_value_1,some_value_2
1   unique_value_2  some_value_3,some_value_4
2   unique_value_3  some_value_5,some_value_6

PS:

In concat_func, the sort_values() and unique() methods are applied to give a nicer view and to prevent the same value appearing twice in a row. Otherwise, if df is:

  some_primary_key        values
0   unique_value_1  some_value_1
1   unique_value_1  some_value_1

the output will be:

  some_primary_key                     values
0   unique_value_1  some_value_1,some_value_1

If that is the desired output, just use the following concat_func instead:

concat_func = lambda x: ','.join(map(str, x))
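
It plugs into the same pattern as before; since identical rows per key remain after the transform, drop_duplicates() is still needed (a sketch reusing the earlier lines):

df['values'] = df.groupby(['some_primary_key'])['values'].transform(concat_func)
df = df.drop_duplicates()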