I have a table where I want to extract data from table A and collate it as table B:
Table A:
Day | City A | City B | City C |
---|---|---|---|
Mon | NaN | Mike | NaN |
Tue | NaN | NaN | Joe |
Wed | Jack | Charlie | NaN |
Table B:
Day | Name | City |
---|---|---|
Mon | Mike | City B |
Tue | Joe | City C |
Wed | Jack | City A |
Wed | Charlie | City B |
I have pulled this information from an excel sheet and am using python to undertake this task. My thinking is I need to draw the data in as a dataframe, iterate over the rows looking for entries that do not contain NaN and store their location and associated data in a new dataframe.
Unfortunately, i'm getting stuck on placing conditions to ignore the NaN entries, I am trying to test this out step-by-step and have got this far:
import pandas as pd
df = pd.read_excel('./csvtasks/rosta.xlsx',sheet_name='Sheet2')
#open new excel to write to with new variable df2
#determine whether null or not
dg=df.notnull()
#loop over rows
for i,j in dg.iteritems():
if dg.bool==FALSE:
print('skipped something') #i want this to skip but using this print to see if it's actually skipped anything
else:
print (i,j)
#this will be replace by some command that uses the df.iloc[something] and writes to df2, printing for now so i can see what it does
#loop to end
#close file
All this does is give me the whole dataframe as a bool like this:
Day | City A | City B | City C |
---|---|---|---|
True | False | True | False |
True | False | False | True |
True | True | True | False |
CodePudding user response:
Try with stack
s = df.set_index('Day').stack().reset_index()
s.columns = ['Day','City','Name']
s
Out[43]:
Day City Name
0 Mon City B Mike
1 Tue City C Joe
2 Wed City A Jack
3 Wed City B Charlie