Delete specific strings from pandas dataframe with operators chaining

Time:10-13

I want to remove rows from my dataframe file_df whose value in the column Sorte matches certain strings or regular expressions. I tried the following code:

file_df = file_df[(file_df.Sorte != 'sonstige') & (file_df.Sorte != 'verauslagte Portokosten')
                  & (file_df.Sorte != 'erhaltenenzahlung Re  vom')
                  & (file_df.Sorte != 'geleistetenzahlung aus Re-Nr')
                  & (file_df.Sorte != '^.*Holzkisten geliefert.*$')
                  & (file_df.Sorte != '^.*Infomaterialktionspakete.*$')
                  & (file_df.Sorte != '^.*Aloe Vera  haben wir nicht im Sortiment.*$') 
                  & (file_df.Sorte != '^.*Anzeigenvorlage Planten ut`norden.*$')]

But when I run this code, these strings are still in the dataset, and I cannot figure out why. I wanted to chain the conditions into one expression to avoid creating many intermediate copies.

CodePudding user response:

Maybe something like this:

ls = ['sonstige', 'verauslagte Portokosten', 'erhaltenenzahlung Re  vom', ...]
file_df = file_df[~file_df.Sorte.str.contains('|'.join(ls))]
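
A minimal sketch of why this approach can succeed where the chained != comparisons do not: != compares each value against the whole string literally, so a pattern such as '^.*Holzkisten geliefert.*$' is never interpreted as a regular expression, whereas str.contains treats it as one. The toy data and the Menge column below are made up for illustration, and na=False is an added assumption to keep the mask boolean when Sorte has missing values.

```python
import pandas as pd

# Toy data, invented for illustration only.
file_df = pd.DataFrame({
    "Sorte": ["sonstige", "Rose", "10 Holzkisten geliefert am Montag", None, "Tulpe"],
    "Menge": [1, 2, 3, 4, 5],
})

# != is a literal, whole-string comparison: no value equals the pattern text,
# so the "Holzkisten" row is not removed.
literal = file_df[file_df.Sorte != "^.*Holzkisten geliefert.*$"]

# str.contains interprets the joined string as a regex alternation.
# na=False makes missing values count as "no match" so that ~ works on the mask.
patterns = ["sonstige", "Holzkisten geliefert"]
mask = file_df.Sorte.str.contains("|".join(patterns), na=False)
filtered = file_df[~mask]

print(filtered)
```

Since str.contains already searches anywhere in the string, the surrounding '^.*' and '.*$' are not needed for single-line values. Literal strings that contain regex metacharacters could be passed through re.escape before joining.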

CodePudding user response:

Thank you for your answers! I also noticed that the code I posted in the question worked for some strings in the dataset but not for others.

But I found another solution that worked for me, derived from the answer at https://stackoverflow.com/a/54410702/14553595, which is basically a combination of your suggestions:

file_df = file_df.loc[:,~(file_df.columns.str.contains('^.*Fracht.*$', case=False)
                          | file_df.columns.str.contains('^.*Angebotspaket.*$', case=False)
                          | file_df.columns.str.contains('^.*Werbe.*$', case=False)
                          | file_df.columns.str.contains('^.*Vita.Verde.*$', case=False)
                          | file_df.columns.str.contains('^.*zahlung.*$', case=False)
                          | file_df.columns.str.contains('^.*Europalette.*$', case=False)
                          | file_df.columns.str.contains('^.*Aufkleber fuer Saeule.*$', case=False)
                          | file_df.columns.str.contains('^.*Aufsetzer.*$', case=False)
                          | file_df.columns.str.contains('^.*Ausstellen der Pflanzen in die Beete  pauschal.*$', case=False)
                          | file_df.columns.str.contains('^.*Ausstellungsflaeche.*$', case=False)
                          | file_df.columns.str.contains('^.*Auswaschen.*$', case=False)
                          | file_df.columns.str.contains('^.*Bild.*$', case=False)
                          | file_df.columns.str.contains('^.*etikette.*$', case=False)
                          )]
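
The chain of | masks can probably be collapsed into a single str.contains call by joining the fragments into one alternation. The sketch below is only an illustration: the column labels are hypothetical, the fragment list is shortened, and it filters column labels because that is what the snippet above does.

```python
import pandas as pd

# Hypothetical column labels, invented for illustration only.
file_df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["Rose", "Frachtkosten", "Anzahlung", "Tulpe", "WERBEmittel", "Vita Verde"],
)

# A shortened list of the fragments used above; '^.*' and '.*$' are dropped
# because str.contains already matches anywhere in the label.
fragments = ["Fracht", "Werbe", "Vita.Verde", "zahlung"]

# One regex alternation replaces the chain of separate masks.
pattern = "|".join(fragments)
drop = file_df.columns.str.contains(pattern, case=False)

# Keep only the columns whose label matches none of the fragments.
file_df = file_df.loc[:, ~drop]

print(list(file_df.columns))
```

To filter rows by the values of Sorte instead, the same pattern could be applied as file_df[~file_df.Sorte.str.contains(pattern, case=False, na=False)].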