I have a dataframe with two columns, Date and Name. I want to search for a specific name and get the different Dates on which that name occurs. Then, I only want to keep those rows that have those dates. This is my data:
Meeting Dates | Name |
---|---|
1746-06-02 | Sa Majesté (#p32) |
1746-06-02 | Maupertuis (#p4) |
1746-06-02 | Jordan (#p31) |
1746-06-09 | Maupertuis (#p4) |
1746-06-09 | Voltaire (#p37) |
1746-06-09 | de la Condamine (#p38) |
1746-06-09 | Süssmilch (#p16) |
1746-06-09 | Sa Majesté (#p32) |
1746-06-09 | Formey (#p27) |
1746-06-16 | Marggraf (#p20) |
1746-06-23 | Dohna (#p39) |
1746-06-23 | Euler (#p10) |
I have used the following code to find all occurrences of my name:
df["Name"].value_counts()["Sa Majesté (#p32)"]
I know what to store the corresponding dates but I am unsure how. Once I have those dates, I plan to store the values and filter through my original dataframe and keep only those rows with those dates.
I would really appreciate any help/pointers with this.
CodePudding user response:
IIUC, is this what you're looking for? a function that takes a name and return the filtered dataset?
def find_name(name):
return df.loc[df['Name'].eq(name)]
find_name('Sa Majesté (#p32)')
OR
def find_name(name):
return df.loc[df['Name'].str.contains(name, regex=False)]
find_name('(#p32)')
Meeting Dates Name
0 1746-06-02 Sa Majesté (#p32)
7 1746-06-09 Sa Majesté (#p32)
OR
if you just need the list of dates
def find_name(name):
return df.loc[df['Name'].eq(name)]['Meeting Dates'].to_list()
find_name('Sa Majesté (#p32)')
['1746-06-02 ', '1746-06-09 ']
CodePudding user response:
You can filter on the string value, then get the unique for Meeting Dates column.
>>> df[df['Name'].eq("Sa Majesté (#p32)")]['Meeting Dates'].unique()
array(['1746-06-02', '1746-06-09'], dtype=object)
And if you want to do it for all names, then you can use groupby
and get the unique dates:
>>> df.groupby('Name')['Meeting Dates'].agg('unique')
Name
Dohna (#p39) [1746-06-23]
Euler (#p10) [1746-06-23]
Formey (#p27) [1746-06-09]
Jordan (#p31) [1746-06-02]
Marggraf (#p20) [1746-06-16]
Maupertuis (#p4) [1746-06-02, 1746-06-09]
Sa Majesté (#p32) [1746-06-02, 1746-06-09]
Süssmilch (#p16) [1746-06-09]
Voltaire (#p37) [1746-06-09]
de la Condamine (#p38) [1746-06-09]
Name: Meeting Dates, dtype: object
CodePudding user response:
My provided solution is not based on your specific dataset, however is valid for your problem. My hope is you are able to see the logic and implement in your own use case.
id | name |
---|---|
1 | .. |
2 | .. |
3 | .. |
According to this general dataset, we are trying to find all rows
with a certain value
under the columns name "name"
.. then we take the id of those rows and then find ALL rows with those values.
# IMPORT
import pandas as pd
# DEFINE DATAFRAME
df = pd.read_csv("MOCK_DATA.csv")
# FIND ALL ROWS THAT THE SECOND COLUMNS HAS NAME "Two-toed tree sloth"
tempOne = df[(df == "Two-toed tree sloth").any(axis=1)]
# SUM UP ALL OF THE ID'S THAT THIS IS TRUE
tempTwo = df[(df == "Two-toed tree sloth").any(axis=1)]["id"].to_list()
# OUTPUT ALL ROWS WITH THESE LIST ITEMS
df.loc[df["id"].isin(tempTwo)]