df:
first last email
0 Corey Schafer [email protected]
1 Jane Doe [email protected]
2 John Doe [email protected]
From a big CSV file, how can I find a specific word like John without knowing which column or row it is in? If several rows contain John, can I get all the info in the rows or columns where the name appears?
CodePudding user response:
This is the way to do it, I believe.
import pandas as pd
df = pd.read_csv('data.csv')
df[df['first'].str.contains('John')] # returns all rows where John in the column 'first'
df[df['first'].str.contains('John')].index.tolist() # get the index of the rows
The contains method is case-sensitive. To make it case-insensitive, pass case=False:
df["first"].str.contains("John", case=False)
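One thing to watch for in a real CSV: if the column has missing values, str.contains returns NaN for those rows, and using that result as a filter can fail. A minimal sketch of one way around it, assuming a small made-up frame in place of data.csv (the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "first": ["Corey", "Jane", "john", None],
    "last": ["Schafer", "Doe", "Doe", "Smith"],
})

# case=False ignores letter case; na=False treats missing values as "no match"
mask = df["first"].str.contains("john", case=False, na=False)
matches = df[mask]
print(matches)
```

With na=False the mask is purely boolean, so it can be used directly for row selection.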
To find the position of a column from its header name:
df.columns.get_loc("first") # Output : 0 (the column index)
To check whether a specific column contains the value:
df["first"].str.contains("John").any() # Output : True
To check whether a specific row contains the value:
df.loc[2].str.contains("John").any() # Output : True (row 2 is John Doe; row 0 would give False)
If you only want the index of the first row that matches exactly:
df[df["first"] == "John"].index[0]
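Note that .index[0] raises an IndexError when nothing matches. A small sketch of a safer lookup, assuming the same three-row frame as in the question; the helper name first_match_index is made up for illustration and is not part of pandas:

```python
import pandas as pd

# Sample data mirroring the question (emails omitted)
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
})

def first_match_index(frame, column, value):
    """Return the index label of the first row where column == value, or None."""
    hits = frame.index[frame[column] == value]
    return hits[0] if len(hits) > 0 else None

print(first_match_index(df, "first", "John"))
print(first_match_index(df, "first", "Alice"))
```

Returning None keeps the "no match" case explicit instead of letting the exception bubble up.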
CodePudding user response:
To search an entire DataFrame for a given value without knowing which column that value will be in, we can use .applymap() in conjunction with .any(axis=1):
search_str = 'John'
search = df[df.applymap(lambda x: x == search_str).any(axis=1)]
search will be a new DataFrame (a copy, not a view) containing only the rows of your original df where any column value is exactly 'John'.
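In recent pandas releases (2.1 and later), DataFrame.applymap is deprecated in favor of DataFrame.map, so it may be worth avoiding it altogether. Below is a rough sketch of two alternatives: an exact match with DataFrame.eq, and a substring match across all columns. The city column and the variable names are assumptions added for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "city": ["Denver", "Johnstown", "Austin"],  # hypothetical extra column
})

search_str = "John"

# Exact match in any column, without applymap
exact_rows = df[df.eq(search_str).any(axis=1)]

# Substring match in any column: cast each column to str, then test it.
# regex=False treats search_str as plain text rather than a pattern.
mask = df.apply(
    lambda col: col.astype(str).str.contains(search_str, case=False, regex=False)
)
substring_rows = df[mask.any(axis=1)]

# Names of the columns that contain at least one match
matching_columns = [col for col in mask.columns if mask[col].any()]

print(exact_rows)
print(substring_rows)
print(matching_columns)
```

The exact-match version only finds cells equal to 'John', while the substring version also picks up values such as 'Johnstown', so choose whichever fits the data.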