I have a DataFrame named df_y and I want to find rows that contain certain strings ('K', 'M') in several columns, so I wrote this code:
df_y[df_y[['column1', 'column2', 'column3']].str.contains('K|M')]
Then I got the error "'DataFrame' object has no attribute 'str'".
I think the problem is that the code selects several columns at once.
I don't know how to do this correctly.
CodePudding user response:
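You can try df.apply. The .str accessor exists only on a Series, so apply str.contains to each column, then reduce the result to one boolean per row with .any(axis=1) (use .all(axis=1) if every column must match):
df_y[df_y[['column1', 'column2', 'column3']].apply(lambda x: x.str.contains('K|M')).any(axis=1)]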
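A minimal sketch of the apply approach on made-up data, in case it helps to see the intermediate mask. The sample values are hypothetical, and na=False is an added assumption so that missing values count as non-matches:

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df_y = pd.DataFrame({
    "column1": ["10K", "abc", "xyz", "7"],
    "column2": ["foo", "3M", "bar", "8"],
    "column3": ["baz", "qux", "quux", "9"],
})
cols = ["column1", "column2", "column3"]

# One boolean column per searched column; na=False treats missing values as non-matches.
hits = df_y[cols].apply(lambda s: s.str.contains("K|M", na=False))

# Keep the rows in which at least one of the columns matched.
matching_rows = df_y[hits.any(axis=1)]
print(matching_rows)
```

Only the first two rows contain an uppercase K or M, so only they are kept.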
CodePudding user response:
For indexing and selecting data, combine per-column boolean masks with the '&' (and) and '|' (or) operators. See https://pandas.pydata.org/docs/user_guide/indexing.html
df_y[df_y['column1'].str.contains('K|M') & df_y['column2'].str.contains('K|M') & df_y['column3'].str.contains('K|M')]
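To illustrate the difference between the two operators, here is a small sketch on hypothetical data: '&' keeps rows where every column matches, while '|' keeps rows where at least one does.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df_y = pd.DataFrame({
    "column1": ["1K", "2M", "x"],
    "column2": ["3K", "y", "z"],
    "column3": ["4M", "w", "v"],
})

# One boolean Series per column.
m1 = df_y["column1"].str.contains("K|M")
m2 = df_y["column2"].str.contains("K|M")
m3 = df_y["column3"].str.contains("K|M")

all_match = df_y[m1 & m2 & m3]  # every column contains K or M
any_match = df_y[m1 | m2 | m3]  # at least one column contains K or M
print(all_match)
print(any_match)
```

Which one is appropriate depends on whether a match is required in all of the columns or in any of them.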
CodePudding user response:
You have an inconveniently large number of columns and want to find where a rare string appears in any of those columns. Ok. A pair of text processing solutions come to mind.
1. CSV file
Serialize out to the filesystem with df.to_csv('y.csv') and then
$ egrep -n --color 'K|M' y.csv
2. str()
Perhaps you prefer to use that approach while remaining entirely within python.
There are good reasons for people criticizing the slow speed of non-vectorized operations like .iterrows().
But if you want a quick'n'dirty solution, this should suffice:
for i in range(len(df)):
    row = str(df.iloc[i])
    # 'K|M' is a regex alternation, not a literal substring, so test each letter
    if 'K' in row or 'M' in row:
        print(i)
        print(df.iloc[i])
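If the row-by-row loop is too slow, the same "turn each row into text" idea can be sketched without an explicit loop. This is only an illustration on hypothetical data. Unlike str(df.iloc[i]), it joins the cell values only, so a column name that happens to contain K or M does not cause a false match:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"],
    "size": ["12K", "40", "3M"],
    "note": ["ok", "ok", "ok"],
})

# Join each row's cell values into one string (column names are not included).
row_text = df.astype(str).apply(lambda row: " ".join(row), axis=1)

# 'K|M' is interpreted as a regular expression here: K or M.
matches = df[row_text.str.contains("K|M")]
print(matches)
```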
CodePudding user response:
You can try the following code:
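temp = df.copy()
num_of_columns = 2
# replace the selected columns with True/False depending on whether each cell contains K or M
temp.iloc[:, 1:3] = temp.iloc[:, 1:3].apply(lambda x: x.str.contains('K|M'))
# index labels of the rows where every selected column matched
index = temp[temp.iloc[:, 1:3].eq([True] * num_of_columns).all(1)].index.to_numpy()
# index holds labels, so select with .loc rather than .iloc
df.loc[index]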
- Replace num_of_columns with the number of columns you want to perform the operation on
- Replace 1:3 inside temp.iloc with the positions of the columns you want to work on (1:3 selects the columns at positions 1 and 2)