First of all, sorry if the title is hard to understand.
Target:
I'm trying to group by source, type
columns, add the min,max
columns for each group by result
,and then add the related target
column to both min
and max
columns (in front of the value).
I can't figure out how to get the Pandas result in this format:
source | type | min | max |
---|---|---|---|
Person1 | bow | Person 2: 0.001 | Person 3: 0.05 |
I have a list of dictionaries as followed:
`[{'source': 'Person1', 'target': 'Person2', 'type': 'bow', 'similarity': 0.636}, {'source': 'Person1', 'target': 'Person2', 'type': 'bigram', 'similarity': 0.040}, {'source': 'Person1', 'target': 'Person2', 'type': 'tfidf', 'similarity': 0.433}, {'source': 'Person1', 'target': 'Person3', 'type': 'bow', 'similarity': 0.699}, {'source': 'Person1', 'target': 'Person3', 'type': 'bigram', 'similarity': 0.171}, {'source': 'Person1', 'target': 'Person3', 'type': 'tfidf', 'similarity': 0.522}]`
In table from this looks like:
source | target | type | similarity |
---|---|---|---|
Person1 | Person2 | bow | 0.636 |
Person1 | Person2 | bigram | 0.040 |
Person1 | Person2 | tfidf | 0.433 |
Person1 | Person3 | bow | 0.699 |
Person1 | Person3 | bigram | 0.171 |
Person1 | Person3 | tfidf | 0.522 |
For the group by, min/max I'm using the following:
df = df.groupby(['source','type']).similarity.agg(['min','max'])
Which results into:
source | type | min | max |
---|---|---|---|
Person1 | bow | 0.636 | 0.699 |
Person1 | bigram | 0.040 | 0.171 |
Person1 | tfidf | 0.433 | 0.522 |
All good to this point, but how do I get the output into the following structure:
[Source]:source;[Type]: type; [min]: Target:min(similarity); [max]: Target:max(similarity)
source | type | min | max |
---|---|---|---|
Person1 | bow | Person2: 0.636 | Person3: 0.699 |
Person1 | bigram | Person2: 0.040 | Person3: 0.171 |
Person1 | tfidf | Person3: 0.433 | Person3: 0.522 |
Am I supposed to use .loc to find the row for the min / max values and then somehow add those to the result?
CodePudding user response:
Here is an approach with GroupBy
and pandas.merge
:
g = df.groupby(by=['source', 'type'], sort=False)
out = (
pd.merge(df.loc[g['similarity'].idxmin()]
.rename(columns= {'similarity': 'sim_min', 'target': 'target_min'}),
df.loc[g['similarity'].idxmax()]
.rename(columns= {'similarity': 'sim_max', 'target': 'target_max'}),
on=['source','type'])
.assign(min=lambda x: x.pop('target_min') ': ' x.pop('sim_min').astype(str),
max=lambda x: x.pop('target_max') ': ' x.pop('sim_max').astype(str))
)
Another variant :
g = df.groupby(by=['source', 'type'], sort=False)
out = (
pd.merge(df.loc[g['similarity'].idxmin()]
.assign(min= lambda x: x[['target', 'similarity']]
.astype(str).agg(": ".join, axis=1)),
df.loc[g['similarity'].idxmax()]
.assign(max= lambda x: x[['target', 'similarity']]
.astype(str).agg(": ".join, axis=1)),
on=['source','type'], suffixes=('', '_'))
.loc[:, ['source', 'type', 'min', 'max']]
)
# Output :
print(out)
source type min max
0 Person1 bow Person2: 0.636 Person3: 0.699
1 Person1 bigram Person2: 0.04 Person3: 0.171
2 Person1 tfidf Person2: 0.433 Person3: 0.522
CodePudding user response:
Example
data = [['Person1', 'Person2', 'bow', 0.636],
['Person1', 'Person2', 'bigram', 0.04],
['Person1', 'Person2', 'tfidf', 0.433],
['Person1', 'Person3', 'bow', 0.699],
['Person1', 'Person3', 'bigram', 0.171],
['Person1', 'Person3', 'tfidf', 0.522]]
df = pd.DataFrame(data, columns=['source', 'target', 'type', 'similarity'])
df
source target type similarity
0 Person1 Person2 bow 0.6
1 Person1 Person2 bigram 0.0
2 Person1 Person2 tfidf 0.4
3 Person1 Person3 bow 0.7
4 Person1 Person3 bigram 0.2
5 Person1 Person3 tfidf 0.5
Process
df.groupby(['source','type']).agg([min, max])
result:
target similarity
min max min max
source type
Person1 bigram Person2 Person3 0.0 0.2
bow Person2 Person3 0.6 0.7
tfidf Person2 Person3 0.4 0.5
make result to your desired output
Add :
to value of target column of df, change value of result to str, and combine them.
Full Code and Output
(df.assign(target=df['target'] ': ')
.groupby(['source','type']).agg([min, max]).astype('str')
.groupby(level=1, axis=1, sort=False).sum().reset_index())
output:
source type min max
0 Person1 bigram Person2: 0.04 Person3: 0.171
1 Person1 bow Person2: 0.636 Person3: 0.699
2 Person1 tfidf Person2: 0.433 Person3: 0.522