Home > OS >  How to loop through pandas dataframe for same values and extract data from other columns?
How to loop through pandas dataframe for same values and extract data from other columns?

Time:09-20

I have a dataframe like this:

enter image description here

data = [['a', 10, '8/5/2021'], ['a', 15, '8/12/2021'], ['a', 8, '8/18/2021'], ['a', 5, '8/23/2021'], ['b', 3, '8/18/2021'], ['b', 10, '8/23/2021'], ['c', 8, '7/30/2021'], ['c', 12, '8/5/2021'], ['c', 3, '8/12/2021']]
df = pd.DataFrame(data, columns=['Name', 'Hours Slept', 'Date'])

What I'm trying to do is create a new dataframe that shows the name, # of times the person slept 10 hours or more, and the dates that correspond to those times. Something like this:

enter image description here

The code should be able to account/ignore NaN and the symbol '*' in the hours slept column.

I know I can do something like this to figure out the # of times the person slept 10 hours or more:

counter = 0    
for i in df['Hours Slept']:
   if i >= 10:
       counter =1
print(counter)

But I can't seem to figure out how to do it in combination with the duplicate Names and how to extract the date.

I cannot select specific Names like using df.loc["a"] because I'll like something that can iterate through all the names (my code has more than a, b, c).

Thank you in advance!

CodePudding user response:

You can use groupby and use aggregation functions:

(
    df
    .query("`Hours Slept` >= 10")
    .groupby("Name", as_index=False)
    .agg({"Hours Slept": "size", "Date": ", ".join})
)
  Name  Hours Slept                 Date
0    a            2  8/5/2021, 8/12/2021
1    b            1            8/23/2021
2    c            1             8/5/2021

CodePudding user response:

another approach:

df['over10h'] = np.where(df['Hours Slept'].ge(10), 1, 0)

df
    Name    Hours Slept Date        over10h
0   a       10          8/5/2021    1
1   a       15          8/12/2021   1
2   a       8           8/18/2021   0
3   a       5           8/23/2021   0
4   b       3           8/18/2021   0
5   b       10          8/23/2021   1
6   c       8           7/30/2021   0
7   c       12          8/5/2021    1
8   c       3           8/12/2021   0

df1 = df.loc[df['over10h'].ge(1)].groupby(['Name', 'over10h'])['Date'].apply(list).reset_index(name='Dates')

df1["# of times >=10"] = df1['Dates'].map(len)

df1 = df1.drop(columns='over10h')

df1
    Name    Dates                   # of times >=10
0   a       [8/5/2021, 8/12/2021]   2
1   b       [8/23/2021]             1
2   c       [8/5/2021]              1
  • Related