Home > database >  Check if 2 or more items in array are the same and delete one of them according to sorting number -
Check if 2 or more items in array are the same and delete one of them according to sorting number -

Time:10-05

I have a dataframe that have sorted codes according to volume and trade.

Code Volume Trade
ApplA 500 1000
Amazon 1000 500
Facebook 250 750
ApplE 100 1500
df_samecompanies['Codes'] #This DF consist of same companies and their different shares.Like ApplA,ApplE and AmazA and AmazonB

df['Volume Order'] = df['Volume'].rank(ascending=False)
df['Trade Order']  = df['Trade'].rank(ascending=False)
df['Trade Order2']  = df['Trade'].rank(ascending=True)
df['Max Ordered Number']  = df[['Volume Order', 'Trade Order']].max(axis=1)
df['Final Sorted Number'] = df[['Max Ordered Number', 'Trade Order2']].apply(tuple, axis=1).rank(ascending=False)
df.drop('Trade Order2', axis=1)

When I try this code, my out

       Code  Volume  Trade  Value  Volume Order  Trade Order  Max Ordered Number  Final Sorted Number
0     ApplA     500   1000   <NA>             2            2                   2                    4
1    Amazon    1000    500   <NA>             1            4                   4                    2
2  Facebook     250    750   <NA>             3            3                   3                    3
  3   ApplE     100   1500   <NA>             4            1                   4                    1

as you can see, 2 same code were in the final sorted range, how could I delete the second one. I mean that, APPLA and APPLE same code(starting with APPL). APPLA has "4 final sorted number." APPLE has "1 final sorted number". Since, they are same code(APPL), I want to delete from list the smallest one who has the "4 final sorted number".

I want to delete from my list who has the smallest final sorted number , if they are in the same codes(APPL)

**df_samecompanies['Codes'] #This DF consist of same companies and their different shares.Like ApplA,ApplE and AmazA and AmazonB

I have another dataframe which consist of same companies.(like amazA,amazonB and apple,appla). This df include only 'E' , 'D', 'A', 'B' as a last characters.


if same companies DF in DF(main df): 
   if last character == 'D' or 'A': 

 check final sorted number and who has smallest final sort number, delete another code from list**

df_samecompanies =

Codes Code Shares
ApplA Apple A
Amazon Amazon Empty
Facebook Facebook Empty
ApplE Apple E
AmazA Amazon A
AmazonB Amazon B

CodePudding user response:

Use df_companies dataframe to group same rows and keep only the best ranking for each group.

Input data:

>>> df
       Code  Volume  Trade  Volume Order  Trade Order  Max Ordered Number  Final Sorted Number
0     ApplA     500   1000           2.0          2.0                 2.0                  4.0
1    Amazon    1000    500           1.0          4.0                 4.0                  2.0
2  Facebook     250    750           3.0          3.0                 3.0                  3.0
3     ApplE     100   1500           4.0          1.0                 4.0                  1.0

>>> df_companies
      Codes      Code Shares
0     ApplA     Apple      A
1    Amazon    Amazon  Empty
2  Facebook  Facebook  Empty
3     ApplE     Apple      E
4     AmazA    Amazon      A
5   AmazonB    Amazon      B
out = df.sort_values('Final Sorted Number') \
        .merge(df_companies[['Code', 'Codes']], how='left', 
               left_on='Code', right_on='Codes', suffixes=('', '2')) \
        .drop_duplicates('Code2') \
        .drop(columns=['Code2', 'Codes'])

Output result:

>>> out
       Code  Volume  Trade  Volume Order  Trade Order  Max Ordered Number  Final Sorted Number
0     ApplE     100   1500           4.0          1.0                 4.0                  1.0
1    Amazon    1000    500           1.0          4.0                 4.0                  2.0
2  Facebook     250    750           3.0          3.0                 3.0                  3.0

Step by step:

# Reduce number of columns for readability
>>> df = df[['Code', 'Final Sorted Number']]
       Code  Final Sorted Number
0     ApplA                  4.0
1    Amazon                  2.0
2  Facebook                  3.0
3     ApplE                  1.0


# Sort rows by 'Final Sorted Number'
>>> df = df.sort_values('Final Sorted Number')
       Code  Final Sorted Number
3     ApplE                  1.0
1    Amazon                  2.0
2  Facebook                  3.0
0     ApplA                  4.0

# Merge dataframes on a common key: 'Code' for left, 'Codes' for right
>>> df = df.merge(df_companies[['Code', 'Codes']], how='left', 
                  left_on='Code', right_on='Codes', suffixes=('', '2'))
       Code  Final Sorted Number     Code2     Codes
0     ApplE                  1.0     Apple     ApplE
1    Amazon                  2.0    Amazon    Amazon
2  Facebook                  3.0  Facebook  Facebook
3     ApplA                  4.0     Apple     ApplA

Now, we have 2 new columns from df_companies: Code (renamed into Code2 by suffix) and Codes. Let's continue:

# Keep only the first row for each 'Code2' name
>>> df = df.drop_duplicates('Code2')
       Code  Final Sorted Number     Code2     Codes
0     ApplE                  1.0     Apple     ApplE
1    Amazon                  2.0    Amazon    Amazon
2  Facebook                  3.0  Facebook  Facebook

# Remove added columns from the other dataframe
>>> df = df.drop(columns=['Code2', 'Codes'])
       Code  Final Sorted Number
0     ApplE                  1.0
1    Amazon                  2.0
2  Facebook                  3.0
  • Related