Filter pandas dataframe using a single character of a string column

Time:07-02

In the following dataframe, "day" is a string column holding a 7-character binary code that specifies whether or not an event occurs on a particular day. The first character indicates whether the event occurs on Monday, and the final character indicates whether it occurs on Sunday.

For example:

    event  day
 0  A      1000010
 1  B      1010100
 2  C      0100010
 3  D      0000011

Event A occurs on Monday and Saturday, event B occurs on Monday, Wednesday and Friday, and event D occurs on Saturday and Sunday.

Question: How can I filter a dataframe using a specific character of the "day" column? For example, if I want to show all rows for events on Saturday, something like day[5]=="1" should output rows 0, 2 and 3 (containing events "A", "C" and "D").

I've tried various combinations such as df.loc[(df['day'][5]=="1")] based on other examples but they don't work for filtering by a single character of a string.

(I know it's unconventional but the system has served me well using Bash scripts with Awk; just trying to develop it further in Python with Pandas).

CodePudding user response:

As you have strings, you can use slicing and comparison to '1':

day = 0
df[df['day'].str[day].eq('1')]    # if Monday = 0
# or
day = 1
df[df['day'].str[day-1].eq('1')]  # if Monday = 1

output:

  event      day
0     A  1000010
1     B  1010100
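For the Saturday case from the question, a minimal self-contained sketch looks like the following. It rebuilds the sample dataframe from the question and assumes Monday is position 0, so Saturday is position 5. The attempt df['day'][5] from the question fails because it looks up the row labelled 5 instead of the sixth character of each string; the .str accessor is what applies the indexing to every string in the column.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "event": ["A", "B", "C", "D"],
    "day": ["1000010", "1010100", "0100010", "0000011"],
})

# .str[5] takes the sixth character of every string in the column
# (Monday = 0, so Saturday = 5); .eq("1") turns that into a boolean mask.
saturday = df[df["day"].str[5].eq("1")]
print(saturday)
```

With the sample data, events A, C and D all have a 1 in that position.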

CodePudding user response:

You can expand the string into a dataframe with one column per weekday:

s = df.day.apply(lambda x : pd.Series(list(x)))


df[s[0]=='1']

CodePudding user response:

You can use this to check for a 1 at index 1 (the second character, i.e. Tuesday):

index = 1
df.loc[df['day'].str[index] == "1"]

output is

  event      day
2     C  0100010
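If remembering the numeric position is inconvenient, the same index lookup could be wrapped in a small helper. This is only a sketch: the names WEEKDAYS and events_on are made up for illustration and are not part of pandas, and it assumes the code always starts on Monday as described in the question.

```python
import pandas as pd

# Hypothetical lookup table: position in the 7-character code per weekday
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def events_on(df, weekday):
    """Return the rows whose "day" code has a 1 at the position of `weekday`."""
    index = WEEKDAYS.index(weekday)  # raises ValueError for an unknown name
    return df[df["day"].str[index] == "1"]


df = pd.DataFrame({
    "event": ["A", "B", "C", "D"],
    "day": ["1000010", "1010100", "0100010", "0000011"],
})

tuesday = events_on(df, "Tue")
saturday = events_on(df, "Sat")
print(tuesday)
print(saturday)
```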

CodePudding user response:

You could create a column for each day:

import pandas as pd

data = {'event': ['A','B','C','D'], 'day': ['1000010','1010100','0100010','0000011']}
df = pd.DataFrame(data=data)

# One column per weekday, taken from the matching character of "day"
for i, name in enumerate(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']):
    df[name] = df['day'].astype(str).str[i]

print(df)
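Once the per-day columns exist, filtering becomes an ordinary column comparison, and several days can be combined with & and |. The sketch below rebuilds the same columns in a loop and assumes they hold the characters "0" and "1" as strings, as in the code above.

```python
import pandas as pd

df = pd.DataFrame({
    "event": ["A", "B", "C", "D"],
    "day": ["1000010", "1010100", "0100010", "0000011"],
})

# One string column per weekday, each holding "0" or "1"
for i, name in enumerate(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]):
    df[name] = df["day"].str[i]

# Events that occur on both Monday and Saturday
mon_and_sat = df[(df["Mon"] == "1") & (df["Sat"] == "1")]

# Events that occur on Saturday or Sunday
weekend = df[(df["Sat"] == "1") | (df["Sun"] == "1")]

print(mon_and_sat["event"].tolist())
print(weekend["event"].tolist())
```

The parentheses around each comparison are needed because & and | bind more tightly than == in Python.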