How to find specific data in many columns?

Time:06-27

I have a DataFrame named df_y. I want to find the rows that contain 'K' or 'M' in several columns, so I wrote this code:

df_y[df_y[['column1', 'column2', 'column3']].str.contains('K|M')]

Then I got the error "'DataFrame' object has no attribute 'str'".

I think the problem is that the code selects several columns at once.

I don't know how to do this correctly.

CodePudding user response:

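You can try DataFrame.apply, which runs .str.contains on each column in turn, and then reduce the result to one boolean per row with any(axis=1). Without that last step the mask is a DataFrame, and indexing with it blanks out cells instead of filtering rows.

df_y[df_y[['column1', 'column2', 'column3']].apply(lambda x: x.str.contains('K|M')).any(axis=1)]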
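For reference, here is a minimal, self-contained sketch of the apply approach. The sample data is made up for illustration; only the column names and the 'K|M' pattern come from the question. It assumes a match in any one of the columns is enough.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question
df_y = pd.DataFrame({
    'column1': ['10K', '250', '3M'],
    'column2': ['abc', 'def', 'ghi'],
    'column3': ['1', '2', '3K'],
})
cols = ['column1', 'column2', 'column3']

# .str exists only on a Series, so apply it column by column.
# na=False treats missing values as "no match".
matches = df_y[cols].apply(lambda s: s.str.contains('K|M', na=False))

# One boolean per row: True if any of the selected columns matched
row_mask = matches.any(axis=1)
result = df_y[row_mask]
print(result)
```

Swap any(axis=1) for all(axis=1) if every column has to match.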

CodePudding user response:

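For indexing and selecting data, combine boolean masks with the '&' and '|' operators. See https://pandas.pydata.org/docs/user_guide/indexing.html. The version below uses '&', so it keeps only the rows where every column matches; use '|' instead if a match in any one column is enough.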

df_y[df_y['column1'].str.contains('K|M') & df_y['column2'].str.contains('K|M') & df_y['column3'].str.contains('K|M')]
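With many columns, writing each condition by hand gets tedious. One possible sketch builds the per-column masks in a list and folds them together with functools.reduce. The sample data and the variable names (masks, all_match, any_match) are made up for illustration.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data
df_y = pd.DataFrame({
    'column1': ['10K', '250', '3M'],
    'column2': ['5K', 'def', 'ghi'],
    'column3': ['1M', '2', '3'],
})
cols = ['column1', 'column2', 'column3']

# One boolean Series per column
masks = [df_y[c].str.contains('K|M', na=False) for c in cols]

# AND: every column must match; OR: at least one column must match
all_match = reduce(operator.and_, masks)
any_match = reduce(operator.or_, masks)

rows_all = df_y[all_match]
rows_any = df_y[any_match]
```

Here rows_all holds the rows that match in all three columns, and rows_any holds the rows that match in at least one.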

CodePudding user response:

You have an inconveniently large number of columns and want to find where a rare string appears in any of those columns. Ok. A pair of text processing solutions come to mind.

1. CSV file

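Serialize out to the filesystem with df.to_csv('y.csv') and then search the file. With extended regular expressions the alternation is written as a plain '|'; an escaped '\|' would be treated as a literal pipe character.

$ grep -En --color 'K|M' y.csv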

2. str()

Perhaps you prefer to use that approach while remaining entirely within Python.

There are good reasons for people criticizing the slow speed of non-vectorized operations like .iterrows(). But if you want a quick'n'dirty solution, this should suffice:

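for i in range(len(df)):
    # Join the cell values only, so column names and the dtype footer are not searched
    row = ' '.join(map(str, df.iloc[i]))
    # The in operator does plain substring matching, so test each letter separately
    if 'K' in row or 'M' in row:
        print(i)
        print(df.iloc[i])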
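If the loop turns out to be too slow, the same idea of searching each row as text can be sketched without an explicit Python loop. This is only one possible variant, and the sample data below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'id': [1, 2, 3],
    'column1': ['10K', '250', '300'],
    'column2': ['abc', 'def', '3M'],
})

# Convert every cell to text and join each row into one string
row_text = df.astype(str).agg(' '.join, axis=1)

# Regex search over the joined text: 'K|M' means "K or M"
mask = row_text.str.contains('K|M')
hits = df[mask]
print(hits)
```

Note that this searches every column, including numeric ones converted to text, so it fits the case where you do not want to list the columns by name.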

CodePudding user response:

You can try the following code:

temp = df.copy()
num_of_columns = 2
# Turn the selected columns into boolean "contains K or M" columns
temp.iloc[:, 1:3] = temp.iloc[:, 1:3].apply(lambda x: x.str.contains('K|M'))
# Keep the index labels of the rows where every selected column matched
index = temp[temp.iloc[:, 1:3].eq([True] * num_of_columns).all(axis=1)].index
# These are index labels, not positions, so select with .loc
df.loc[index]
  • Replace num_of_columns with the number of columns you want to check.
  • Replace 1:3 inside temp.iloc with the positions of the columns you want to work on.