first time poster and python beginner so hopefully asking correctly.
I need to loop through all rows in a dataframe, checking for a string match in one column. If there is a match then I want to insert a date into a new column, if not then use a different date.
I need to iterate through the rows as each time the condition is met I want to advance the date by one day.
For now I'm just trying to make sure the string match is working so I'm just inserting two different strings depending on the match result. The problem is that my condition seem to never be met. I can't see where its going wrong.
My code so far is
for index in df.index:
if df.loc[index, 'time'] == '00:00':
df['date'] = "wooo"
else:
df['date'] = "blah"
The result I'm seeing is in the image below. As you can see it is not picking up the row that matches the condition.
CodePudding user response:
df['date'] = "wooo"
sets the entire column to "wooo"
, which I'm sure is not your intention. You should just set that row's date
column
for index in df.index:
if df.loc[index, 'time'] == '00:00':
df.loc[index, 'date'] = "wooo"
else:
df.loc[index, 'date'] = "blah"
Better yet, use logical indexing and set multiple rows at once instead of iterating over rows and setting each value individually:
df['date'] = 'blah' # Set everything to blah
zero_rows = df['time'] == '00:00' # Find the rows where time is 00:00
df.loc[zero_rows, 'date'] = 'wooo' # Set these rows to wooo
For example, with
df = pd.DataFrame([['00:00', 2, 3], ['00:01', 5, 6], ['00:02', 8, 9], ['00:00', 10, 11]], columns=['time', 'b', 'c'])
we get:
time b c date
0 00:00 2 3 wooo
1 00:01 5 6 blah
2 00:02 8 9 blah
3 00:00 10 11 wooo
Re. your comment:
The issue with the logical indexing is that eventually I’m trying to assign a date which counts forwards at the start of each new day (time = 00:00), so until a 00:00 is found insert todays date in each row. Once found, insert today 1 and continue until found again and then insert today 2 etc. The number of rows until the first 00:00 is found varies depending on when the dataframe is created.
This is easily implemented. Let's start with a longer dataframe:
df = pd.DataFrame([['00:00', 2, 3], ['12:00', 5, 6], ['00:00', 8, 9], ['12:00', 10, 11], ['00:00', 18, 19], ['12:00', 110, 111], ['00:00', 28, 29], ['12:00', 210, 211]], columns=['time', 'b', 'c'])
Let's pull out all the columns with time == '00:00'
.
zero_rows = df['time'] == '00:00'
midnights = df.loc[zero_rows, :]
time b c
0 00:00 2 3
2 00:00 8 9
4 00:00 18 19
6 00:00 28 29
Now, create a new column 'increment'
and fill it with nan
. Then, fill only the midnight values with numbers:
if df.iloc[0]['time'] == '00:00':
start_increment = 0 # First row is midnight, so we start with an increment of zero
else:
start_increment = 1 # First row is not midnight, so the first midnight row has an increment of 1
df['increment'] = pd.NA
df.loc[midnights.index, 'increment'] = range(start_increment, len(midnights) start_increment)
time b c increment
0 00:00 2 3 0
1 12:00 5 6 <NA>
2 00:00 8 9 1
3 12:00 10 11 <NA>
4 00:00 18 19 2
5 12:00 110 111 <NA>
6 00:00 28 29 3
7 12:00 210 211 <NA>
Next, we can use ffill()
to fill values in the increment column that are NA
.
df['increment'].ffill(inplace=True)
time b c increment
0 00:00 2 3 0
1 12:00 5 6 0
2 00:00 8 9 1
3 12:00 10 11 1
4 00:00 18 19 2
5 12:00 110 111 2
6 00:00 28 29 3
7 12:00 210 211 3
Any NA
s that still remain were before the first midnight, so they should be zeros:
df['increment'].fillna(0, inplace=True)
Now, we want to add a column 'date'
which has a value of today
increment
days. To do this, we first convert the increment
column to pd.TimeDelta
:
df['increment'] = pd.to_timedelta(df['increment'], unit='D')
time b c increment
0 00:00 2 3 0 days
1 12:00 5 6 0 days
2 00:00 8 9 1 days
3 12:00 10 11 1 days
4 00:00 18 19 2 days
5 12:00 110 111 2 days
6 00:00 28 29 3 days
7 12:00 210 211 3 days
Finally, add today's date to the increment column:
import datetime
df['date'] = datetime.datetime.today() df['increment']
time b c increment date
0 00:00 2 3 0 days 2022-02-15
1 12:00 5 6 0 days 2022-02-15
2 00:00 8 9 1 days 2022-02-16
3 12:00 10 11 1 days 2022-02-16
4 00:00 18 19 2 days 2022-02-17
5 12:00 110 111 2 days 2022-02-17
6 00:00 28 29 3 days 2022-02-18
7 12:00 210 211 3 days 2022-02-18
CodePudding user response:
You can use apply
with a Pythonic ternary-expression inside a lambda.
Try in the Python shell:
>>> import pandas as pd
>>> df = pd.DataFrame([['00:00', 2, 3], ['00:01', 5, 6], ['00:02', 8, 9], ['00:00', 10, 11]], columns=['time', 'b', 'c'])
>>> df['date'] = df.time.apply(lambda t: 'wooo' if t == '00:00' else 'blah')
>>> df
time b c date
0 00:00 2 3 wooo
1 00:01 5 6 blah
2 00:02 8 9 blah
3 00:00 10 11 wooo