Is there any solution to remove the 0th value from multiple values of a column separated by comma-CodePudding

I want to remove the word DISCONTINUED from a column using a loop and train the model for further use. The code that I have tried:

# 1.

for i in  df['DESCRIPTION'][0]:
       if i[0]== 'DISCONTINUED':
           df.i[0].pop(0)

# 2.

for item in df['DESCRIPTION']:
       if str(item)[0]=='DISCONTINUED':
           df.remove(item[0])

Note: df is the dataset name and DESCRIPTION is the column name. The column has the dtype object. I have tried to convert that into str, but it didn't work.

The values in the column DESCRIPTION are:

data = {'DESCRIPTION': 
        ['ANDREW, 245173, 1/2-1/2 COLD SHRINK KIT, CEQ.24038',
         'COMMSCOPE, 245174, 1/2, 3/8 COLDSRINK WTHRPRFNG KIT, CEQ.24753',
         'DISCONTINUED, COMMSCOPE, 252107, LACE UP I-LINE HOISTING GRIP FOR 1/2 CABLES',
         'COMMSCOPE, 252110, LACE UP HOISTING GRIP FOR 1-1/4 COAX & EW63/64 WAVEGUIDE',
         'ANDREW, 252111, 1-5/8 HOISTING GRIP, LACE UP']}

I want to remove DISCONTINUED from a column having multiple values in it separated by a comma.

CodePudding user response：

Your initial approaches fail, in part because you are treating the values in df.DESCRIPTION as if they are lists that contain strings, rather than simply strings. E.g.:

print(type(df['DESCRIPTION'][0]))
<class 'str'>

You can use Series.replace in this case:

Data

import pandas as pd

# just adding some extra `DISCONTINUED` in `3, 4`
data = {'DESCRIPTION': ['ANDREW, 245173, 1/2-1/2 COLD SHRINK KIT, CEQ.24038',
                        'COMMSCOPE, 245174, 1/2, 3/8 COLDSRINK WTHRPRFNG KIT, CEQ.24753',
                        'DISCONTINUED, COMMSCOPE, 252107, LACE UP I-LINE HOISTING GRIP FOR 1/2 CABLES',
                        'COMMSCOPE, 252110, DISCONTINUED, LACE UP HOISTING GRIP FOR 1-1/4 COAX & EW63/64 WAVEGUIDE',
                        'ANDREW, 252111, 1-5/8 HOISTING GRIP, LACE UP, DISCONTINUED']}
df = pd.DataFrame(data)

print(df)

                                                                                 DESCRIPTION
0                                         ANDREW, 245173, 1/2-1/2 COLD SHRINK KIT, CEQ.24038
1                             COMMSCOPE, 245174, 1/2, 3/8 COLDSRINK WTHRPRFNG KIT, CEQ.24753
2               DISCONTINUED, COMMSCOPE, 252107, LACE UP I-LINE HOISTING GRIP FOR 1/2 CABLES
3  COMMSCOPE, 252110, DISCONTINUED, LACE UP HOISTING GRIP FOR 1-1/4 COAX & EW63/64 WAVEGUIDE
4                                 ANDREW, 252111, 1-5/8 HOISTING GRIP, LACE UP, DISCONTINUED

# So, we have `DISCONTINUED` in `2` (at start), in `3` (third "elem"), and `4` (at end).

Code

df.DESCRIPTION = df.DESCRIPTION.replace(r',?\s?DISCONTINUED,?\s?','', regex=True)

print(df)
                                                                 DESCRIPTION
0                         ANDREW, 245173, 1/2-1/2 COLD SHRINK KIT, CEQ.24038
1             COMMSCOPE, 245174, 1/2, 3/8 COLDSRINK WTHRPRFNG KIT, CEQ.24753
2             COMMSCOPE, 252107, LACE UP I-LINE HOISTING GRIP FOR 1/2 CABLES
3  COMMSCOPE, 252110LACE UP HOISTING GRIP FOR 1-1/4 COAX & EW63/64 WAVEGUIDE
4                               ANDREW, 252111, 1-5/8 HOISTING GRIP, LACE UP

Explanation regex pattern ,?\s?DISCONTINUED,?\s?:

? matchees the previous token between zero and one times. E.g. we use it for potential commas (,) and spaces (\s) before and/or after DISCONTINUED.

Alternative method could be:

df.DESCRIPTION = df.DESCRIPTION.apply(lambda row: ', '.join(
    [c for c in row.split(', ') if c != 'DISCONTINUED']))

CodePudding user response：

If your data set is a matrix where the first column could contain the word discontinued then it should probably look something like:

for row in df['DESCRIPTION']:
    if row[0] == 'DISCONTINUED'
        df.remove(row[0])

The issues with the code you have tried are that #1 only iterates the first row of df and #2 is only checking the first character of the string created by str(item) so it will never equal Discontinued. Changing str(item)[0] to str(item[0]) could also work, but if it's already a string there's no reason to 'reconvert' it to a string.