Home > Enterprise >  Replace values in Columns with previous cell value
Replace values in Columns with previous cell value

Time:06-17

I want to replace values from a column in a dataframe to NaN and then replace Nan to the value from the previous row. This is the code that I'm currently using:

import pandas as pd

file = ("C:/Users/eestrada/Desktop/Copy of Routing State Reason.xls")
finalfile = ("C:/Users/eestrada/Desktop/Copy of Routing State Reason.csv")
df = pd.read_excel(file)
date = df.iat[1,1]
date = date
date = date[0:9]
#abb = ['Jan *','Feb *','Mar *','Apr *','May *','Jun *','Jul *','Aug *','Sep *','Oct *','Nov *','Dec *']

df = pd.read_csv(finalfile)
df= df.drop([0,1,2,3,4,5,6,7])
df.columns = ['Name','UserID','Reason','PresenceState','Total_Time']

And this is how my data looks like:

                  Name UserID                     Reason PresenceState Total_Time
8    Aguilar, Bertrand   2653                        NaN           NaN   00:41:10
9               Jun 15    NaN                        NaN           NaN   00:41:10
10                 NaN    NaN        Unavailable - Break          Away   00:09:03
11                 NaN    NaN        Unavailable - Lunch          Away   00:32:07
12                 NaN    NaN            Work - Outbound          Busy   00:00:00
13      Aguilar, Sarah   2546                        NaN           NaN   00:51:38
14              Jun 15    NaN                        NaN           NaN   00:51:38
15                 NaN    NaN        Unavailable - Break          Away   00:21:38
16                 NaN    NaN        Unavailable - Lunch          Away   00:30:00
17        Alcala, Jose   2584                        NaN           NaN   02:28:09
18              Jun 15    NaN                        NaN           NaN   02:28:09
19                 NaN    NaN  Unavailable - Service Out          Away   00:00:15
20                 NaN    NaN           Work - IT Issues          Busy   02:26:29
21                 NaN    NaN            Work - Outbound          Busy   00:01:25
22     Alonso, Eduardo   2648                        NaN           NaN   00:55:32
23              Jun 15    NaN                        NaN           NaN   00:55:32
24                 NaN    NaN        Unavailable - Break          Away   00:23:20
25                 NaN    NaN        Unavailable - Lunch          Away   00:32:10
26                 NaN    NaN        Work - 1on1Sessions          Busy   00:00:02
27  Andrade, Guillermo   2526                        NaN           NaN   01:00:49

Now this is the desired output:

enter image description here

Hope you all can help me :c

CodePudding user response:

IIUC, one way using ffill with groupby.transform:

df["UserID"] = df["UserID"].ffill()
df["Name"] = df.groupby("UserID")["Name"].transform("first")

Output:

                  Name  UserID                     Reason PresenceState  \
8    Aguilar, Bertrand  2653.0                        NaN           NaN   
9    Aguilar, Bertrand  2653.0                        NaN           NaN   
10   Aguilar, Bertrand  2653.0        Unavailable - Break          Away   
11   Aguilar, Bertrand  2653.0        Unavailable - Lunch          Away   
12   Aguilar, Bertrand  2653.0            Work - Outbound          Busy   
13      Aguilar, Sarah  2546.0                        NaN           NaN   
14      Aguilar, Sarah  2546.0                        NaN           NaN   
15      Aguilar, Sarah  2546.0        Unavailable - Break          Away   
16      Aguilar, Sarah  2546.0        Unavailable - Lunch          Away   
17        Alcala, Jose  2584.0                        NaN           NaN   
18        Alcala, Jose  2584.0                        NaN           NaN   
19        Alcala, Jose  2584.0  Unavailable - Service Out          Away   
20        Alcala, Jose  2584.0           Work - IT Issues          Busy   
21        Alcala, Jose  2584.0            Work - Outbound          Busy   
22     Alonso, Eduardo  2648.0                        NaN           NaN   
23     Alonso, Eduardo  2648.0                        NaN           NaN   
24     Alonso, Eduardo  2648.0        Unavailable - Break          Away   
25     Alonso, Eduardo  2648.0        Unavailable - Lunch          Away   
26     Alonso, Eduardo  2648.0        Work - 1on1Sessions          Busy   
27  Andrade, Guillermo  2526.0                        NaN           NaN   

   Total_Time  
8    00:41:10  
9    00:41:10  
10   00:09:03  
11   00:32:07  
12   00:00:00  
13   00:51:38  
14   00:51:38  
15   00:21:38  
16   00:30:00  
17   02:28:09  
18   02:28:09  
19   00:00:15  
20   02:26:29  
21   00:01:25  
22   00:55:32  
23   00:55:32  
24   00:23:20  
25   00:32:10  
26   00:00:02  
27   01:00:49  

CodePudding user response:

I found this amazing article that might help you https://www.geeksforgeeks.org/how-to-replace-values-in-column-based-on-condition-in-pandas/

  • Related