How to append a dataframe row to a list if a specific value is found?-CodePudding

I am attempting to add all rows that contain a specific DateTime value to a list and then print said list. I iterate through all rows in a dataframe, looking for a specific value within a column. If the value appears, I would like to add that specific row to a list.

Code:

with open('Layer1.csv', newline = '') as csvfile2:
        df = pd.read_csv('Layer1.csv')

        AudioMothIDs = getID()
        AudioMothIDs.remove('NA')

        csv_reader = csv.reader(csvfile2)
        
        for row in csv_reader:
            orig_list = []
            #Iterates through each unique ID
            for x in AudioMothIDs:
                ID_df = df[df['AudioMothID'] == x]
          
                #Iterates through all rows in the ID dataframe
                for index, rows in ID_df.iterrows():
                    #Searches for a specific DateTime within the StartDateTime column
                    if '16.06.2019 15:00' in ID_df.StartDateTime.values:
                        #Attempts to add rows with the specific DateTime to a list
                        current_list = [rows.AudioMothID,rows.StartDateTime]
                        orig_list.append(current_list)
            print(orig_list)

Appending rows.AudioMothID and rows.StartDatetime appends all rows within the ID to the list instead of just the rows with '16.06.2019 15:00' in the StartDateTime column. I have tried using ID_df.iloc[index] as well which similarly adds all rows instead of only the rows that contain the specified string.

How would I only append the rows to a list that contain '16.06.2019 15:00' in the StartDateTime column?

CodePudding user response：

Your code could be condensed to be more pandas-idiomatic. Maybe something like:

# Load CSV into a pandas DataFrame, no need for csv.reader or with open()
df = pd.read_csv('Layer1.csv')

# Copy all rows with the desired StartDateTime as a new DataFrame
res = df[df['StartDateTime'] == '16.06.2019 15:00'].copy()

print(res)

CodePudding user response：

I think you're making quite a few unnecessary steps, e.g. reading the csv file twice, iterating over each row and then over each ID..

Let's simply use pandas for this:

# read dataframe using pandas
df = pd.read_csv('Layer1.csv')

# filter for date, select specific columns and convert to list
df[df.StartDateTime == '16.06.2019 15:00'][[df.AudioMothID, df.StartDateTime]].values.tolist()

[EDIT] adding on to address your comments:

How to select values within a certain hour

# ensure date column is in the right format
df['StartDateTime'] = pd.to_datetime(df['StartDateTime'])

# filter by hour of date
df[df.StartDateTime.dt.hour == 15]

How to randomly sample for each hour and each ID

# ensure date column is in the right format
df['StartDateTime'] = pd.to_datetime(df['StartDateTime'])

# round date column to nearest hour
df['StartDateTime_nearest_hour'] = df['StartDateTime'].dt.round('H')

# randomly sample 1 by each hour of each day and ID
df.groupby(['AudioMothID', 'StartDateTime_nearest_hour'].sample(n=1)

# if you want to sample an hour of a random day instead:
df['StartDateTime_hour'] = df['StartDateTime'].dt.hour
df.groupby(['AudioMothID', 'StartDateTime_hour'].sample(n=1)