Removing rows from a Data Frame column which contains lists if a specific string is within the list

Time:09-07

Suppose I have a DataFrame df2 with a column called 'elements' in which each row contains a list of objects, as shown below:

print(df2['elements'])

0       [Element B, Element Cr, Element Re]
1       [Element B, Element Rh, Element Sc]
2        [Element B, Element Mo, Element Y]
3       [Element Al, Element B, Element Lu]
4       [Element B, Element Dy, Element Os]

I would like to search through the column and, if for example Element Mo appears in a row's list, delete that whole row, so the result looks like this:

print(df2['elements'])

0       [Element B, Element Cr, Element Re]
1       [Element B, Element Rh, Element Sc]
2       [Element Al, Element B, Element Lu]
3       [Element B, Element Dy, Element Os]

I'm currently trying to do it with a for loop and if statements like this:

for entry in df2['elements']:
    if 'Element Mo' in entry:
        df2.drop(index=[entry],axis=0, inplace=True)
    else:
        continue

But it is not working; it raises KeyError: [] not found in axis.

CodePudding user response:

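Here is one way to do it (the DataFrame here is df and the column is 'col1'; substitute df2 and 'elements' for the data in the question):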

string='Element Mo'

df[df['col1'].apply(lambda x: string not in x)]
col1
0   [Element B, Element Cr, Element Re]
1   [Element B, Element Rh, Element Sc]
3   [Element Al, Element B, Element Lu]
4   [Element B, Element Dy, Element Os]
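For reference, the same filter can be run end to end on the data from the question. This is a minimal sketch that assumes each list holds plain strings (the question does not show how df2 was built, so the construction below is a guess); `target`, `keep`, and `filtered` are illustrative names. The `reset_index(drop=True)` call renumbers the rows 0 to 3, which is what the desired output in the question shows.

```python
import pandas as pd

# Assumption: the lists contain plain strings; the question does not show how df2 was created.
df2 = pd.DataFrame({
    "elements": [
        ["Element B", "Element Cr", "Element Re"],
        ["Element B", "Element Rh", "Element Sc"],
        ["Element B", "Element Mo", "Element Y"],
        ["Element Al", "Element B", "Element Lu"],
        ["Element B", "Element Dy", "Element Os"],
    ]
})

target = "Element Mo"

# Boolean mask: True for rows whose list does NOT contain the target string.
keep = df2["elements"].apply(lambda items: target not in items)

# Keep only the matching rows, then renumber the index from 0.
filtered = df2[keep].reset_index(drop=True)
print(filtered)
```

If the lists hold objects rather than strings, the membership test would need to compare against an equivalent object (or against `str(item)`) instead.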

CodePudding user response:

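A pandas Series is sort of like a dictionary, where the keys are the index labels and the values are the series values.

Iterating over df2['elements'] yields the values (the lists), not the index labels, so entry isn't in the index and drop cannot find it. You could instead loop over the index and use each label to look up the value, e.g.: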

for ind in df2.index.values:
    entry = df2.loc[ind, "elements"]
    if 'Element Mo' in entry:
        df2.drop(index=ind, axis=0, inplace=True)
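The dictionary analogy above can be checked with `Series.items()`, which yields (index label, value) pairs in the same way `dict.items()` yields (key, value) pairs. The sketch below uses a small made-up frame with non-default index labels (10, 11, 12) so that labels and positions are easy to tell apart; it collects the labels first and calls `drop` once rather than inside the loop.

```python
import pandas as pd

# Hypothetical data with a non-default index so labels differ from positions.
df2 = pd.DataFrame(
    {"elements": [["Element B", "Element Cr"],
                  ["Element B", "Element Mo"],
                  ["Element Al", "Element B"]]},
    index=[10, 11, 12],
)

# Plain iteration over a Series gives the values only (here, the lists).
values_seen = [entry for entry in df2["elements"]]

# .items() gives (index label, value) pairs, so the label is available for drop().
labels_to_drop = [label for label, entry in df2["elements"].items()
                  if "Element Mo" in entry]

# Drop all matching labels in one call; this returns a new DataFrame.
result = df2.drop(index=labels_to_drop)
print(result)
```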

However, it would be far better to use a vectorized solution. That isn't really possible with a Series of lists (storing lists in cells works against the pandas data model), but you can at least build a boolean mask once and subset the DataFrame in a single step instead of dropping rows one at a time. For example:

in_values = df2["elements"].apply(lambda x: "Element Mo" in x)
dropped = df2.loc[~in_values]
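One way to get closer to column-style operations, not taken from the answers here, is to flatten the lists with `Series.explode()` and then group by the original index. This is only a sketch under the assumption that the lists hold plain strings; the sample data and the names `exploded` and `has_target` are illustrative.

```python
import pandas as pd

# Assumption: list items are plain strings.
df2 = pd.DataFrame({
    "elements": [
        ["Element B", "Element Cr", "Element Re"],
        ["Element B", "Element Mo", "Element Y"],
        ["Element Al", "Element B", "Element Lu"],
    ]
})

# One row per list item; the original index label is repeated for each item.
exploded = df2["elements"].explode()

# For each original row, is any of its items equal to the target?
has_target = exploded.eq("Element Mo").groupby(level=0).any()

# Keep the rows whose list does not contain the target.
dropped = df2.loc[~has_target]
print(dropped)
```

For a single membership test the `apply` mask is simpler; the exploded form tends to pay off when several tests or counts are needed on the same items.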

CodePudding user response:

Here's an alternative using apply with axis=1, for the case where the values are spread across columns rather than stored in lists (below, any row that contains 2 is removed):

import pandas as pd

df = pd.DataFrame(
    [[-1, 0, 1],
     [1, 2, 3],
     [4, 5, 2],
     [6, 7, 8]])

# True for rows in which no column holds the value 2
ix = ~df.apply(lambda x: 2 in x.values, axis=1)
df[ix]

returns:

   0  1  2
0 -1  0  1
3  6  7  8