Dropping Rows that Contain a Specific String wrapped in square brackets?

Time:01-31

I'm trying to drop rows whose value in a column is a string wrapped in square brackets. I want to drop every row that contains the string '[removed]' or '[deleted]'. My df looks like this:

  Comments

1 The main thing is the price appreciation of the token (this determines the gains or losses more 
  than anything). Followed by the ecosystem for the liquid staking asset, the more opportunities 
  and protocols that accept the asset as collateral, the better. Finally, the yield for staking 
  comes into play.

2 [deleted]

3 [removed]

4 I could be totally wrong, but sounds like destroying an asset and claiming a loss, which I 
  believe is fraudulent. Like someone else said, get a tax guy - for this year anyway and then 
  you'll know for sure. Peace of mind has value too.

I have tried df[df["Comments"].str.contains("removed")==False], but when I save the DataFrame, the rows are still there.

EDIT: My full code

import pandas as pd
sol2020 = pd.read_csv("Solana_2020_Comments_Time_Adjusted.csv")
sol2021 = pd.read_csv("Solana_2021_Comments_Time_Adjusted.csv")
df = pd.concat([sol2021, sol2020], ignore_index=True, sort=False)
df[df["Comments"].str.contains("deleted")==False]
df[df["Comments"].str.contains("removed")==False]

Answer:

Try this:

I created a DataFrame with a Comments column and filled it with my own sample comments, but it should work for your data too.

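import pandas as pd

sample_data = {'Comments': ['first comment whatever', '[deleted]', '[removed]', 'last comments whatever']}

df = pd.DataFrame(sample_data)

# Keep rows whose Comments value contains neither "deleted" nor "removed"
# (the pattern is a regex, so "|" means "or"); the result is assigned to a new name
data = df[~df["Comments"].str.contains("deleted|removed")]

print(data)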

Output I got:

 Comments
0  first comment whatever
3  last comments whatever
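In the question's code, each df[...] line builds a filtered DataFrame and then discards it, because boolean indexing returns a new object instead of modifying df in place; both answers assign the result to a name. A minimal sketch of that fix follows, under two assumptions: the sample rows are made up, and an exact match with Series.isin is used instead of a substring search, so a genuine comment that merely mentions the word "removed" is kept, whereas str.contains("deleted|removed") would drop it.

```python
import pandas as pd

# Hypothetical sample standing in for the concatenated CSV data from the question.
df = pd.DataFrame({
    "Comments": [
        "The main thing is the price appreciation of the token",
        "[deleted]",
        "[removed]",
        "I removed my stake last year",
    ]
})

# Boolean indexing returns a NEW DataFrame and leaves df untouched,
# so the filtered result must be assigned back (here, to df itself).
placeholders = ["[deleted]", "[removed]"]
df = df[~df["Comments"].isin(placeholders)].reset_index(drop=True)

print(df)
# Saving only writes the filtered rows after the reassignment, e.g.:
# df.to_csv("comments_filtered.csv", index=False)  # hypothetical file name
```

With an exact match, only cells equal to "[deleted]" or "[removed]" are removed; use the substring or regex version if the placeholders can appear with extra text around them.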

Answer:

You can do it like this:

new_df = df[~(df['Comments'].str.startswith('[') & df['Comments'].str.endswith(']'))].reset_index(drop=True)

Output:

>>> new_df
                                            Comments
0  The main thing is the price appreciation of th...
1  I could be totally wrong, but sounds like dest...

This removes every row whose Comments value starts with [ and ends with ], then renumbers the remaining rows from 0.
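The same bracket test can be written as a single anchored regex with Series.str.fullmatch (available in pandas 1.1 and later). This is a sketch, not the answerer's code: the sample rows, including the missing comment, are hypothetical, re.DOTALL is added so a multi-line bracketed comment still matches the way startswith/endswith would, and na=False is passed so missing comments are treated as "not bracketed" rather than leaving missing values in the mask.

```python
import re

import pandas as pd

# Hypothetical sample data, including a missing comment as often found in scraped data.
df = pd.DataFrame({
    "Comments": [
        "The main thing is the price appreciation of the token",
        "[deleted]",
        "[removed]",
        "I could be totally wrong, but sounds like destroying an asset",
        None,
    ]
})

# fullmatch anchors the pattern to the whole string, so this is True exactly when
# the value starts with "[" and ends with "]"; DOTALL lets ".*" span newlines.
bracketed = df["Comments"].str.fullmatch(r"\[.*\]", flags=re.DOTALL, na=False)

# Drop the bracketed rows and renumber the survivors from 0.
new_df = df[~bracketed].reset_index(drop=True)
print(new_df)
```

Note that either bracket check removes any comment wrapped in square brackets, not only "[deleted]" and "[removed]".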
