Pandas - Filling empty cells with a string depending on one or more column values

Python beginner here, and this is my first question on Stack Overflow.

I have a DataFrame similar to this one (some columns omitted):

ID Sex Unknown Male Female
1         5     2     1
2         1     0     4
3         3     3     2

With the help of this thread I got part of the way to what I want: adding X empty rows under each original row, where X depends on the sum of Unknown, Male and Female. I calculated that sum for each row, used it in the code, and dropped it afterwards.

ID Sex Unknown Male Female
1         5     2     1







2         1     0     4




3         3     3     2

Now I would like to fill the "Sex" column, including the newly created empty rows, with the strings "U", "M" and "F", each repeated as many times as the value in Unknown, Male and Female respectively.

Something like this:

ID Sex Unknown Male Female
1   U      5     2     1
    U
    U
    U
    U
    M
    M
    F 
2   U      1     0     4 
    F
    F
    F
    F      
3   U      3     3     2
    U
    U
    M
    M
    M
    F
    F

I can't work out how to do this. What approaches are there? Thanks!

CodePudding user response：

Does this answer your question?

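# assumes the first column of df is the id column and that it is named 'Id'
# multiply each count by a one-letter list, e.g. 5 * ['U'] -> ['U', 'U', 'U', 'U', 'U']
df[['Unknown', 'Male', 'Female']] = df[['Unknown', 'Male', 'Female']].apply(
    lambda row: row.multiply( [[col[0]] for col in df.columns[1:]] ), 
    axis=1
)

# reshape to long format, then expand every list into one row per letter
pd.melt(df, id_vars='Id', value_name='Sex')[['Id', 'Sex']].explode('Sex').dropna()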
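
For anyone who wants to try the melt/explode idea end to end, here is a minimal self-contained sketch. It is not the exact code above: the letter lists are built after melting rather than with apply, and a stable sort is added so the rows stay grouped by Id. The sample values come from the question; the helper names (count_cols, long, result) and the column name 'Id' are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question; 'Id' follows the naming used in the answer above.
df = pd.DataFrame({
    'Id': [1, 2, 3],
    'Unknown': [5, 1, 3],
    'Male': [2, 0, 3],
    'Female': [1, 4, 2],
})
count_cols = ['Unknown', 'Male', 'Female']

# Long format: one row per (Id, count column) pair.
long = df.melt(id_vars='Id', value_vars=count_cols,
               var_name='column', value_name='count')

# Turn each count into a list of initials, e.g. ('Unknown', 5) -> ['U'] * 5.
long['Sex'] = [[name[0]] * int(n)
               for name, n in zip(long['column'], long['count'])]

result = (
    long[['Id', 'Sex']]
    .explode('Sex')                    # one row per letter; empty lists become NaN
    .dropna(subset=['Sex'])            # drop the rows that came from a count of 0
    .sort_values('Id', kind='stable')  # group by Id, keeping U, M, F order within each
    .reset_index(drop=True)
)
print(result)
```

Without the sort, the rows come out grouped by column (all U rows first, then M, then F) rather than by Id, because that is the order melt produces.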

CodePudding user response：

IIUC, you could use repeat to repeat your index:

# compute the MultiIndex
s = df.rename(columns=lambda x: x[0]).stack()
idx = s.repeat(s).to_frame().index

# reshape and set MultiIndex
df2 = df.loc[df.index.repeat(df.sum(1))].set_axis(idx)

# mask values
m = df2.index.get_level_values(0).duplicated()
df2.loc[m] = ''

output:

Sex   Unknown Male Female
  Sex                    
1 U         5    2      1
  U                      
  U                      
  U                      
  U                      
  M                      
  M                      
  F                      
2 U         1    0      4
  F                      
  F                      
  F                      
  F                      
3 U         3    3      2
  U                      
  U                      
  M                      
  M                      
  M                      
  F                      
  F

Used input:

import pandas as pd

df = pd.DataFrame({'Unknown': {1: 5, 2: 1, 3: 3},
                   'Male': {1: 2, 2: 0, 3: 3},
                   'Female': {1: 1, 2: 4, 3: 2}}).rename_axis(columns='Sex')