Check if elements from different lists are in df column and append to another column

Time:03-20

I have a df like this:

Casa             Name                              Clase_jfs   Categoria
Just_For_Sports  mochila reebok active             ACCESORIOS  mochila
Just_For_Sports  tubo lejopi de pelotas softee     ACCESORIOS  tubo
Just_For_Sports  pack de medias puma x2            ACCESORIOS  pack
Just_For_Sports  gorro adidas de natación 3 rayas  ACCESORIOS  natacion

And 27 different lists like these:

MODA=['mochila','wear', 'urban', 'pack']
TENIS=['tubo', 'raqueta','red']
NATACION=['natacion', 'pileta','tapon']

On the other hand, I have an empty list:

intermedia1=[]

This is my current script:

for element in df_JFS['Categoria']:
    if element in VOLEY:
        intermedia1.append('VOLEY')
    elif element in UNIFORMES:
        intermedia1.append('UNIFORMES')
    elif element in TREKKING_OUTDOOR_ADVENTURE:
        intermedia1.append('TREKKING_OUTDOOR_ADVENTURE')        
    elif element in TRAINING:
        intermedia1.append('TRAINING')        
    elif element in TENIS:
        intermedia1.append('TENIS')        
    elif element in SURF:
        intermedia1.append('SURF')        
    elif element in SQUASH:
        intermedia1.append('SQUASH')  
    elif element in SKATEBOARD:
        intermedia1.append('SKATEBOARD')                    
    elif element in RUNNING:
        intermedia1.append('RUNNING')        
    elif element in RUGBY:
        intermedia1.append('RUGBY')
    elif element in PING_PONG:
        intermedia1.append('PING_PONG')
    elif element in PESAS:
        intermedia1.append('PESAS')
    elif element in PADDLE:
        intermedia1.append('PADDLE')
    elif element in NATACION:
        intermedia1.append('NATACION')
    elif element in MODA:
        intermedia1.append('MODA')
    elif element in INFANTIL:
        intermedia1.append('INFANTIL')
    elif element in HOCKEY:
        intermedia1.append('HOCKEY')
    elif element in HANDBALL:
        intermedia1.append('HANDBALL')
    elif element in GOLF:
        intermedia1.append('GOLF')
    elif element in FUTBOL:
        intermedia1.append('FUTBOL')
    elif element in FRONTON:
        intermedia1.append('FRONTON')
    elif element in CICLISMO:
        intermedia1.append('CICLISMO')
    elif element in BASQUET:
        intermedia1.append('BASQUET')
    elif element in BASICOS:
        intermedia1.append('BASICOS')
    elif element in BASEBALL_SOFTBALL:
        intermedia1.append('BASEBALL_SOFTBALL')
    elif element in ARTES_MARCIALES_Y_BOX:
        intermedia1.append('ARTES_MARCIALES_Y_BOX')
    elif element in AEROBICS_Y_FITNESS:
        intermedia1.append('AEROBICS_Y_FITNESS')
    else:
        intermedia1.append('OTROS')
        
df_JFS['Categoria']=intermedia1

How can it be done efficiently?

The output should look like this:

Casa             Name                              Clase_jfs   Categoria
Just_For_Sports  mochila reebok active             ACCESORIOS  MODA
Just_For_Sports  tubo lejopi de pelotas softee     ACCESORIOS  TENIS
Just_For_Sports  pack de medias puma x2            ACCESORIOS  MODA
Just_For_Sports  gorro adidas de natación 3 rayas  ACCESORIOS  NATACION

Each df['Categoria'] value should be the name of the list in which the word was found.

Thanks!

Answer:

I'm not sure about the time efficiency, but if you want to avoid boilerplate code, you can put your lists in a dictionary and use the apply function:

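import pandas as pd

# Define the lists of data (MODA, TENIS, NATACION, ... rest of the code)

# Map each category name to its list of keywords
myDict = {'MODA': MODA, 'TENIS': TENIS, 'NATACION': NATACION}

def search(valueToSearch):
    # Return the name of the first list that contains the value
    for key, valuesList in myDict.items():
        if valueToSearch in valuesList:
            return key
    # No list matched: same fallback as the question's else branch
    return 'OTROS'

df['Categoria'] = df['Categoria'].apply(search)
df  # displays the DataFrame in a notebook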

Output

   Casa             Name                              Clase_jfs   Categoria
0  Just_For_Sports  mochila reebok active             ACCESORIOS  MODA
1  Just_For_Sports  tubo lejopi de pelotas softee     ACCESORIOS  TENIS
2  Just_For_Sports  pack de medias puma x2            ACCESORIOS  MODA
3  Just_For_Sports  gorro adidas de natación 3 rayas  ACCESORIOS  NATACION

Note that you need to define myDict as shown above. Add each of your other lists to myDict in the same way.
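A self-contained version of this approach, run on the sample rows from the question, might look like the sketch below. It assumes only the three lists shown in the question (the other 24 would be added to myDict the same way) and uses 'OTROS' as the fallback, mirroring the question's else branch. Since Python 3.7, dicts keep insertion order, so the order of the keys in myDict decides which category wins if a word appears in more than one list, just like the order of the if/elif chain.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "Casa": ["Just_For_Sports"] * 4,
    "Name": [
        "mochila reebok active",
        "tubo lejopi de pelotas softee",
        "pack de medias puma x2",
        "gorro adidas de natación 3 rayas",
    ],
    "Clase_jfs": ["ACCESORIOS"] * 4,
    "Categoria": ["mochila", "tubo", "pack", "natacion"],
})

# Only three of the 27 lists, as shown in the question
MODA = ['mochila', 'wear', 'urban', 'pack']
TENIS = ['tubo', 'raqueta', 'red']
NATACION = ['natacion', 'pileta', 'tapon']

# Insertion order sets precedence, like the order of the if/elif chain
myDict = {"MODA": MODA, "TENIS": TENIS, "NATACION": NATACION}

def search(valueToSearch):
    """Return the name of the first list containing the value, else 'OTROS'."""
    for key, valuesList in myDict.items():
        if valueToSearch in valuesList:
            return key
    return "OTROS"

df["Categoria"] = df["Categoria"].apply(search)
print(df["Categoria"].tolist())
# ['MODA', 'TENIS', 'MODA', 'NATACION']
```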

Answer:

There are a few approaches I would suggest:

Approach 1

Checking whether an element is in a list is O(n). To optimise that, you can use a set instead, where a membership check is O(1) on average.

MODA = set(['mochila', 'wear', 'urban', 'pack'])
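Applied to all the lists, Approach 1 might look like the sketch below: each keyword list is converted to a set once, up front, and the lookup function walks the categories in order. The names `categories` and `categorize` are hypothetical, and only the three sample lists are included. Each row still checks up to 27 sets, but every check is a hash lookup instead of a scan through the list.

```python
import pandas as pd

MODA = ['mochila', 'wear', 'urban', 'pack']
TENIS = ['tubo', 'raqueta', 'red']
NATACION = ['natacion', 'pileta', 'tapon']

# Hypothetical name: category -> set of keywords, converted once before the lookups
categories = {
    "MODA": set(MODA),
    "TENIS": set(TENIS),
    "NATACION": set(NATACION),
}

def categorize(word):
    # Categories are checked in insertion order; each `in` is an average O(1) set lookup
    for name, keywords in categories.items():
        if word in keywords:
            return name
    return "OTROS"

serie = pd.Series(["mochila", "tubo", "pack", "natacion", "zapatilla"])
print(serie.apply(categorize).tolist())
# ['MODA', 'TENIS', 'MODA', 'NATACION', 'OTROS']
```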

Approach 2

If every value across all the lists is unique, you can create a dict that maps each value to the name of its list. A simple loop can build this mapping, and the result should look like this:

{
 'mochila': "MODA",
 'wear': "MODA",
 'urban': "MODA",
 'pack': "MODA",
 ...
}
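A sketch of Approach 2, assuming the lists are first collected into a dict of category name to list (the names `lists_by_category` and `keyword_to_category` are hypothetical). The loop inverts that dict into keyword to category, and `Series.map` then does one dict lookup per row; keywords that are not in the dict come back as NaN and are filled with 'OTROS'. Using `setdefault` keeps the first category a keyword was seen in, so if a keyword does appear in two lists, the result still matches the elif chain instead of silently taking the last one.

```python
import pandas as pd

MODA = ['mochila', 'wear', 'urban', 'pack']
TENIS = ['tubo', 'raqueta', 'red']
NATACION = ['natacion', 'pileta', 'tapon']

# Hypothetical mapping; add the other lists in the same order as the elif chain
lists_by_category = {"MODA": MODA, "TENIS": TENIS, "NATACION": NATACION}

# Invert: keyword -> category name. setdefault keeps the first category seen.
keyword_to_category = {}
for category, keywords in lists_by_category.items():
    for keyword in keywords:
        keyword_to_category.setdefault(keyword, category)

df = pd.DataFrame({"Categoria": ["mochila", "tubo", "pack", "natacion", "zapatilla"]})

# One dict lookup per row; unmatched keywords become NaN and are filled with 'OTROS'
df["Categoria"] = df["Categoria"].map(keyword_to_category).fillna("OTROS")
print(df["Categoria"].tolist())
# ['MODA', 'TENIS', 'MODA', 'NATACION', 'OTROS']
```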