I have a very large pandas DataFrame in which the columns are users and the rows are yes/no questions about each user, so every cell contains "yes" or "no". I only want to see the first 3 "no"s in each column. I have already replaced every "yes" with an empty string "". How can I keep the first 3 "no"s in every column (i.e., for every user) and replace the rest with an empty string? I thought I could use the limit parameter of df.replace() to do this, but I haven't found a good explanation of what it does, and experimenting with it hasn't helped. Thanks in advance for any help. This is my first time posting on Stack Overflow, so apologies in advance for any mistakes I made while asking this question.
Initial:
User 1 | User 2 | User 3 |
---|---|---|
no | no | |
no | no | |
no | no | |
no | no | |
no | no | |
no | ||
no | no |
Expected Output:
User 1 | User 2 | User 3 |
---|---|---|
no | no | |
no | no | |
no | no | |
no | ||
no | ||
CodePudding user response:
Use cumsum:
import pandas as pd

df = pd.DataFrame({'User1': ['', 'no', 'no', 'no', 'no'],
                   'User2': ['no', 'no', 'no', 'no', '']})
# Running count of "no" per column; blank every cell once that count exceeds 3
df[(df == 'no').cumsum() > 3] = ''
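To see why this works, it may help to look at the intermediate steps on the same sample data. The sketch below splits the one-liner into parts; the names is_no, running and result are only illustrative, and it uses DataFrame.mask to return a new frame instead of assigning in place.

```python
import pandas as pd

df = pd.DataFrame({'User1': ['', 'no', 'no', 'no', 'no'],
                   'User2': ['no', 'no', 'no', 'no', '']})

is_no = df == 'no'                 # boolean frame: True where the cell is "no"
running = is_no.cumsum()           # per-column running count of "no" (True counts as 1)
result = df.mask(running > 3, '')  # blank every cell once the count exceeds 3

print(running)
print(result)
```

For User1 the running count is 0, 1, 2, 3, 4, so only the last row is blanked; for User2 it is 1, 2, 3, 4, 4, so the last two rows are blanked.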
CodePudding user response:
Just an addition to @psarka's answer: does your DataFrame contain values other than "no"? As written, @psarka's answer would not remove other values that appear before a column's fourth "no".
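If the frame can contain other values, one possible variant is to keep only the cells that are "no" and fall within the first 3 per column, and blank everything else. This is a rough sketch on made-up data ("maybe" and "yes" stand in for any other value), and the names is_no, keep and result are only illustrative.

```python
import pandas as pd

# Hypothetical data: "maybe" and "yes" stand in for any value other than "no"
df = pd.DataFrame({'User1': ['maybe', 'no', 'no', 'no', 'no'],
                   'User2': ['no', 'yes', 'no', 'no', 'no']})

is_no = df == 'no'
keep = is_no & (is_no.cumsum() <= 3)  # True only for the first 3 "no"s in each column
result = df.where(keep, '')           # every other cell becomes ''

print(result)
```

DataFrame.where keeps values where the condition is True and substitutes '' elsewhere, so both the extra "no"s and the other values are cleared.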