I want to replace values from a column in a dataframe to NaN and then replace Nan to the value from the previous row. This is the code that I'm currently using:
import pandas as pd
file = ("C:/Users/eestrada/Desktop/Copy of Routing State Reason.xls")
finalfile = ("C:/Users/eestrada/Desktop/Copy of Routing State Reason.csv")
df = pd.read_excel(file)
date = df.iat[1,1]
date = date
date = date[0:9]
#abb = ['Jan *','Feb *','Mar *','Apr *','May *','Jun *','Jul *','Aug *','Sep *','Oct *','Nov *','Dec *']
df = pd.read_csv(finalfile)
df= df.drop([0,1,2,3,4,5,6,7])
df.columns = ['Name','UserID','Reason','PresenceState','Total_Time']
And this is how my data looks like:
Name UserID Reason PresenceState Total_Time
8 Aguilar, Bertrand 2653 NaN NaN 00:41:10
9 Jun 15 NaN NaN NaN 00:41:10
10 NaN NaN Unavailable - Break Away 00:09:03
11 NaN NaN Unavailable - Lunch Away 00:32:07
12 NaN NaN Work - Outbound Busy 00:00:00
13 Aguilar, Sarah 2546 NaN NaN 00:51:38
14 Jun 15 NaN NaN NaN 00:51:38
15 NaN NaN Unavailable - Break Away 00:21:38
16 NaN NaN Unavailable - Lunch Away 00:30:00
17 Alcala, Jose 2584 NaN NaN 02:28:09
18 Jun 15 NaN NaN NaN 02:28:09
19 NaN NaN Unavailable - Service Out Away 00:00:15
20 NaN NaN Work - IT Issues Busy 02:26:29
21 NaN NaN Work - Outbound Busy 00:01:25
22 Alonso, Eduardo 2648 NaN NaN 00:55:32
23 Jun 15 NaN NaN NaN 00:55:32
24 NaN NaN Unavailable - Break Away 00:23:20
25 NaN NaN Unavailable - Lunch Away 00:32:10
26 NaN NaN Work - 1on1Sessions Busy 00:00:02
27 Andrade, Guillermo 2526 NaN NaN 01:00:49
Now this is the desired output:
Hope you all can help me :c
CodePudding user response:
IIUC, one way using ffill
with groupby.transform
:
df["UserID"] = df["UserID"].ffill()
df["Name"] = df.groupby("UserID")["Name"].transform("first")
Output:
Name UserID Reason PresenceState \
8 Aguilar, Bertrand 2653.0 NaN NaN
9 Aguilar, Bertrand 2653.0 NaN NaN
10 Aguilar, Bertrand 2653.0 Unavailable - Break Away
11 Aguilar, Bertrand 2653.0 Unavailable - Lunch Away
12 Aguilar, Bertrand 2653.0 Work - Outbound Busy
13 Aguilar, Sarah 2546.0 NaN NaN
14 Aguilar, Sarah 2546.0 NaN NaN
15 Aguilar, Sarah 2546.0 Unavailable - Break Away
16 Aguilar, Sarah 2546.0 Unavailable - Lunch Away
17 Alcala, Jose 2584.0 NaN NaN
18 Alcala, Jose 2584.0 NaN NaN
19 Alcala, Jose 2584.0 Unavailable - Service Out Away
20 Alcala, Jose 2584.0 Work - IT Issues Busy
21 Alcala, Jose 2584.0 Work - Outbound Busy
22 Alonso, Eduardo 2648.0 NaN NaN
23 Alonso, Eduardo 2648.0 NaN NaN
24 Alonso, Eduardo 2648.0 Unavailable - Break Away
25 Alonso, Eduardo 2648.0 Unavailable - Lunch Away
26 Alonso, Eduardo 2648.0 Work - 1on1Sessions Busy
27 Andrade, Guillermo 2526.0 NaN NaN
Total_Time
8 00:41:10
9 00:41:10
10 00:09:03
11 00:32:07
12 00:00:00
13 00:51:38
14 00:51:38
15 00:21:38
16 00:30:00
17 02:28:09
18 02:28:09
19 00:00:15
20 02:26:29
21 00:01:25
22 00:55:32
23 00:55:32
24 00:23:20
25 00:32:10
26 00:00:02
27 01:00:49
CodePudding user response:
I found this amazing article that might help you https://www.geeksforgeeks.org/how-to-replace-values-in-column-based-on-condition-in-pandas/