I have a problem with showing every matching value in the same row as my lookup value. For example,
my first table:
some_primary_key |
---|
unique_value_1 |
unique_value_2 |
unique_value_3 |
my other table:
some_primary_key | values |
---|---|
unique_value_1 | some_value_1 |
unique_value_1 | some_value_2 |
unique_value_2 | some_value_3 |
unique_value_2 | some_value_4 |
unique_value_3 | some_value_5 |
unique_value_3 | some_value_6 |
and finally I'd like to have this:
some_primary_key | values |
---|---|
unique_value_1 | some_value_1, some_value_2 |
unique_value_2 | some_value_3, some_value_4 |
unique_value_3 | some_value_5, some_value_6 |
Should I use a list comprehension iterating through the DataFrame items and build a list of lists of matching values? Any ideas?
Answer:
Here is my sample solution:
import pandas as pd

data = {'some_primary_key': ['unique_value_1',
                             'unique_value_2',
                             'unique_value_3'] * 2,
        'values': ['some_value_1', 'some_value_3', 'some_value_5',
                   'some_value_2', 'some_value_4', 'some_value_6']}
df = pd.DataFrame(data=data)

list_of_values = []
for item in df['some_primary_key']:
    # Collect every row that shares this key and join its values.
    filtered_values = df[df['some_primary_key'] == item]
    list_of_values.append(','.join(x for x in filtered_values['values']))
df['values'] = list_of_values
df = df.drop_duplicates()
print(df)
Any other neat solutions? :)
CodePudding user response:
Possible alternatives could be:
import pandas as pd

data = {
    'some_primary_key': ['unique_value_1', 'unique_value_2', 'unique_value_3'] * 2,
    'values': ['some_value_1', 'some_value_3', 'some_value_5', 'some_value_2', 'some_value_4', 'some_value_6']
}
df = pd.DataFrame(data=data)

df = df.groupby(['some_primary_key'])['values'].apply(lambda x: ', '.join(x)).reset_index()
# OR
df = df.groupby(['some_primary_key'])['values'].apply(', '.join).reset_index()
# OR
df = df.groupby('some_primary_key')['values'].apply(', '.join)  # Will return a Series
# OR
df = df.groupby('some_primary_key')['values'].agg(', '.join)  # Will return a Series
# OR
df = df.groupby(['some_primary_key'], as_index=False).agg({'values': ', '.join})
# OR
df = df.groupby('some_primary_key').agg({'values': ', '.join}).reset_index()
print(df)
Output:
some_primary_key values
0 unique_value_1 some_value_1, some_value_2
1 unique_value_2 some_value_3, some_value_4
2 unique_value_3 some_value_5, some_value_6
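On newer pandas (0.25+), named aggregation should give the same result while keeping the output column name explicit; a minimal sketch with the same sample data (the `out` name is just illustrative):

import pandas as pd

data = {
    'some_primary_key': ['unique_value_1', 'unique_value_2', 'unique_value_3'] * 2,
    'values': ['some_value_1', 'some_value_3', 'some_value_5', 'some_value_2', 'some_value_4', 'some_value_6']
}
df = pd.DataFrame(data=data)

# Named aggregation: output column name on the left, (input column, function) on the right.
out = df.groupby('some_primary_key', as_index=False).agg(values=('values', ', '.join))
print(out)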
CodePudding user response:
What you are looking for is `groupby`. You can apply any custom function with `transform` after grouping over `some_primary_key`.
You can try this:
concat_func = lambda x: ','.join(map(str, x.sort_values(ascending=True).unique()))
df['values'] = df.groupby(['some_primary_key'])['values'].transform(concat_func)
Then, `df` will hold the concatenated `values` for each `some_primary_key`, which leaves duplicated rows. Therefore, just remove the duplicates:
df = df.drop_duplicates()
Output:
some_primary_key values
0 unique_value_1 some_value_1,some_value_2
1 unique_value_2 some_value_3,some_value_4
2 unique_value_3 some_value_5,some_value_6
PS: In `concat_func`, the `sort_values()` and `unique()` methods are applied to give a nicer view and to prevent the same value occurring more than once in a row. Otherwise, if `df` is:
some_primary_key values
0 unique_value_1 some_value_1
1 unique_value_1 some_value_1
the output will be:
some_primary_key values
0 unique_value_1 some_value_1,some_value_1
If this is the desired output, just use the following `concat_func` instead:
concat_func = lambda x: ','.join(map(str, x))
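For completeness, here is the whole flow as one self-contained sketch, using the sample data from the question and the sorted, de-duplicating `concat_func`:

import pandas as pd

data = {
    'some_primary_key': ['unique_value_1', 'unique_value_2', 'unique_value_3'] * 2,
    'values': ['some_value_1', 'some_value_3', 'some_value_5', 'some_value_2', 'some_value_4', 'some_value_6']
}
df = pd.DataFrame(data=data)

# Sort and de-duplicate within each group before joining.
concat_func = lambda x: ','.join(map(str, x.sort_values(ascending=True).unique()))

# transform broadcasts each group's joined string back onto every row of that group...
df['values'] = df.groupby(['some_primary_key'])['values'].transform(concat_func)
# ...so rows within a group become identical and the extras can be dropped.
df = df.drop_duplicates()
print(df)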