Home > database >  Partial string filter pandas
Partial string filter pandas

Time:11-09

On Pandas 1.3.4 and Python 3.9.

So I'm having issues filtering for a partial piece of the string. The "Date" column is listed in the format of MM/DD/YYYY HH:MM:SS A/PM where the most recent one is on top. If the date is single digit (example: November 3rd), it does not have the 0 such that it is 11/3 instead of 11/03. Basically I'm looking to go look at column named "Date" and have python read parts of the string to filter for only today.

This is what the original csv looks like. This is what I want to do to the file. Basically looking for a specific date but not any time of that date and implement the =RIGHT() formula. However this is what I end up with with the following code.

from datetime import date
import pandas as pd


df = pd.read_csv(r'file.csv', dtype=str)

today = date.today()
d1 = today.strftime("%m/%#d/%Y")  # to find out what today is

df = pd.DataFrame(df, columns=['New Phone', 'Phone number', 'Date'])
df['New Phone'] = df['Phone number'].str[-10:]

df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)

df_today.to_csv(r'file.csv', index=False)

CodePudding user response:

This line is wrong:

df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)

All you're doing there is creating a mask; essentially what that is is just a Pandas series, containg True or False in each row, according to the condition you created the mask in. The spreadsheet get's only FALSE as you showed because non of the items in the Date contain the string that the variable d1 holds...

Instead, try this:

from datetime import date
import pandas as pd

# Load the CSV file, and change around the columns
df = pd.DataFrame(pd.read_csv(r'file.csv', dtype=str), columns=['New Phone', 'Phone number', 'Date'])

# Take the last ten chars of each phone number
df['New Phone'] = df['Phone number'].str[-10:]

# Convert each date string to a pd.Timestamp, removing the time
df['Date'] = pd.to_datetime(df['Date'].str.split(r'\s ', n=1).str[0])

# Get the phone numbers that are from today
df_today = df[df['Date'] == date.today().strftime('%m/%d/%Y')]

# Write the result to the CSV file
df_today.to_csv(r'file.csv', index=False)
  • Related