I have a dataframe that have sorted codes according to volume and trade.
Code | Volume | Trade |
---|---|---|
ApplA | 500 | 1000 |
Amazon | 1000 | 500 |
250 | 750 | |
ApplE | 100 | 1500 |
df_samecompanies['Codes'] #This DF consist of same companies and their different shares.Like ApplA,ApplE and AmazA and AmazonB
df['Volume Order'] = df['Volume'].rank(ascending=False)
df['Trade Order'] = df['Trade'].rank(ascending=False)
df['Trade Order2'] = df['Trade'].rank(ascending=True)
df['Max Ordered Number'] = df[['Volume Order', 'Trade Order']].max(axis=1)
df['Final Sorted Number'] = df[['Max Ordered Number', 'Trade Order2']].apply(tuple, axis=1).rank(ascending=False)
df.drop('Trade Order2', axis=1)
When I try this code, my out
Code Volume Trade Value Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplA 500 1000 <NA> 2 2 2 4
1 Amazon 1000 500 <NA> 1 4 4 2
2 Facebook 250 750 <NA> 3 3 3 3
3 ApplE 100 1500 <NA> 4 1 4 1
as you can see, 2 same code were in the final sorted range, how could I delete the second one. I mean that, APPLA and APPLE same code(starting with APPL). APPLA has "4 final sorted number." APPLE has "1 final sorted number". Since, they are same code(APPL), I want to delete from list the smallest one who has the "4 final sorted number".
I want to delete from my list who has the smallest final sorted number , if they are in the same codes(APPL)
**df_samecompanies['Codes'] #This DF consist of same companies and their different shares.Like ApplA,ApplE and AmazA and AmazonB
I have another dataframe which consist of same companies.(like amazA,amazonB and apple,appla). This df include only 'E' , 'D', 'A', 'B' as a last characters.
if same companies DF in DF(main df):
if last character == 'D' or 'A':
check final sorted number and who has smallest final sort number, delete another code from list**
df_samecompanies =
Codes | Code | Shares |
---|---|---|
ApplA | Apple | A |
Amazon | Amazon | Empty |
Empty | ||
ApplE | Apple | E |
AmazA | Amazon | A |
AmazonB | Amazon | B |
CodePudding user response:
Use df_companies
dataframe to group same rows and keep only the best ranking for each group.
Input data:
>>> df
Code Volume Trade Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplA 500 1000 2.0 2.0 2.0 4.0
1 Amazon 1000 500 1.0 4.0 4.0 2.0
2 Facebook 250 750 3.0 3.0 3.0 3.0
3 ApplE 100 1500 4.0 1.0 4.0 1.0
>>> df_companies
Codes Code Shares
0 ApplA Apple A
1 Amazon Amazon Empty
2 Facebook Facebook Empty
3 ApplE Apple E
4 AmazA Amazon A
5 AmazonB Amazon B
out = df.sort_values('Final Sorted Number') \
.merge(df_companies[['Code', 'Codes']], how='left',
left_on='Code', right_on='Codes', suffixes=('', '2')) \
.drop_duplicates('Code2') \
.drop(columns=['Code2', 'Codes'])
Output result:
>>> out
Code Volume Trade Volume Order Trade Order Max Ordered Number Final Sorted Number
0 ApplE 100 1500 4.0 1.0 4.0 1.0
1 Amazon 1000 500 1.0 4.0 4.0 2.0
2 Facebook 250 750 3.0 3.0 3.0 3.0
Step by step:
# Reduce number of columns for readability
>>> df = df[['Code', 'Final Sorted Number']]
Code Final Sorted Number
0 ApplA 4.0
1 Amazon 2.0
2 Facebook 3.0
3 ApplE 1.0
# Sort rows by 'Final Sorted Number'
>>> df = df.sort_values('Final Sorted Number')
Code Final Sorted Number
3 ApplE 1.0
1 Amazon 2.0
2 Facebook 3.0
0 ApplA 4.0
# Merge dataframes on a common key: 'Code' for left, 'Codes' for right
>>> df = df.merge(df_companies[['Code', 'Codes']], how='left',
left_on='Code', right_on='Codes', suffixes=('', '2'))
Code Final Sorted Number Code2 Codes
0 ApplE 1.0 Apple ApplE
1 Amazon 2.0 Amazon Amazon
2 Facebook 3.0 Facebook Facebook
3 ApplA 4.0 Apple ApplA
Now, we have 2 new columns from df_companies
: Code (renamed into Code2 by suffix) and Codes. Let's continue:
# Keep only the first row for each 'Code2' name
>>> df = df.drop_duplicates('Code2')
Code Final Sorted Number Code2 Codes
0 ApplE 1.0 Apple ApplE
1 Amazon 2.0 Amazon Amazon
2 Facebook 3.0 Facebook Facebook
# Remove added columns from the other dataframe
>>> df = df.drop(columns=['Code2', 'Codes'])
Code Final Sorted Number
0 ApplE 1.0
1 Amazon 2.0
2 Facebook 3.0