Home > Enterprise >  Iterate over dataframe with NaN condition
Iterate over dataframe with NaN condition

Time:05-26

I have a table where I want to extract data from table A and collate it as table B:

Table A:

Day City A City B City C
Mon NaN Mike NaN
Tue NaN NaN Joe
Wed Jack Charlie NaN

Table B:

Day Name City
Mon Mike City B
Tue Joe City C
Wed Jack City A
Wed Charlie City B

I have pulled this information from an excel sheet and am using python to undertake this task. My thinking is I need to draw the data in as a dataframe, iterate over the rows looking for entries that do not contain NaN and store their location and associated data in a new dataframe.

Unfortunately, i'm getting stuck on placing conditions to ignore the NaN entries, I am trying to test this out step-by-step and have got this far:

    import pandas as pd
    df = pd.read_excel('./csvtasks/rosta.xlsx',sheet_name='Sheet2')
    
    #open new excel to write to with new variable df2
    #determine whether null or not
    dg=df.notnull()
    #loop over rows
    for i,j in dg.iteritems():
        if dg.bool==FALSE:
            print('skipped something') #i want this to skip but using this print to see if it's actually skipped anything
        else:
            print (i,j)
            #this will be replace by some command that uses the df.iloc[something] and writes to df2, printing for now so i can see what it does
    #loop to end
    #close file

All this does is give me the whole dataframe as a bool like this:

Day City A City B City C
True False True False
True False False True
True True True False

CodePudding user response:

Try with stack

s = df.set_index('Day').stack().reset_index()
s.columns = ['Day','City','Name']
s
Out[43]: 
   Day    City     Name
0  Mon  City B     Mike
1  Tue  City C      Joe
2  Wed  City A     Jack
3  Wed  City B  Charlie
  • Related