Python beginner here, and this is my first question on Stack Overflow.
I have a dataframe similar to this one (some columns omitted):
ID  Sex  Unknown  Male  Female
1        5        2     1
2        1        0     4
3        3        3     2
With the help of this thread I was able to get part of what I wanted: creating X empty rows under each original row, where X depends on the sum of Unknown, Male and Female. I calculated that sum for each row, used it in the code, and dropped it again afterwards.
ID  Sex  Unknown  Male  Female
1        5        2     1
2        1        0     4
3        3        3     2
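For reference, a minimal sketch of that row-expansion step. It is not necessarily the code from the linked thread: the frame name `df`, the helper names `count_cols`, `total`, `expanded` and `is_copy`, and the choice of one row per counted person (so ID 1 ends up with 5 + 2 + 1 = 8 rows in total, as in the desired output further down) are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the sample frame from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Sex": ["", "", ""],
    "Unknown": [5, 1, 3],
    "Male": [2, 0, 3],
    "Female": [1, 4, 2],
})

count_cols = ["Unknown", "Male", "Female"]
total = df[count_cols].sum(axis=1)  # number of rows needed per ID

# Repeat every row `total` times; cast to object so "" can be stored
# in the numeric columns.
expanded = df.loc[df.index.repeat(total.to_numpy())].astype(object)

# Blank out every repeated copy, keeping only the first row of each ID.
is_copy = expanded.index.duplicated()
expanded.loc[is_copy] = ""
expanded = expanded.reset_index(drop=True)

print(expanded)
```

The helper sum lives in its own Series here, so there is no extra column to drop afterwards.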
Now I would like to fill the "Sex" column of the newly created empty rows with the strings "U", "M" and "F", repeating each one as many times as the value in Unknown, Male and Female respectively.
Something like this:
ID  Sex  Unknown  Male  Female
1   U    5        2     1
    U
    U
    U
    U
    M
    M
    F
2   U    1        0     4
    F
    F
    F
    F
3   U    3        3     2
    U
    U
    M
    M
    M
    F
    F
I can't figure out how to do that. What approaches are there? Thanks!
CodePudding user response:
Does this answer your question?
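cols = ['Unknown', 'Male', 'Female']
# turn each count into a list of initials, e.g. 5 in 'Unknown' -> ['U', 'U', 'U', 'U', 'U']
for col in cols:
    df[col] = df[col].map(lambda n, c=col[0]: [c] * int(n))
# assumes ID is a regular column; zero counts explode to NaN and are dropped
pd.melt(df, id_vars='ID', value_vars=cols, value_name='Sex')[['ID', 'Sex']].explode('Sex').dropna()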
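The melt-and-explode idea can also be sketched end to end so that the rows come out grouped by ID in U, M, F order. This is only an illustration under assumptions: ID is taken to be an ordinary column, the sample frame is rebuilt by hand, and the names `count_cols`, `long` and `result` are made up for the example.

```python
import pandas as pd

# Hypothetical sample frame matching the numbers in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Unknown": [5, 1, 3],
    "Male": [2, 0, 3],
    "Female": [1, 4, 2],
})
count_cols = ["Unknown", "Male", "Female"]

# Long format: one row per (ID, category) pair with its count n.
long = df.melt(id_vars="ID", value_vars=count_cols,
               var_name="Category", value_name="n")

# Build a list with n copies of the category's initial for each row.
long["Sex"] = [[cat[0]] * int(n) for cat, n in zip(long["Category"], long["n"])]

# A stable sort keeps the Unknown, Male, Female order within each ID.
# Empty lists explode to NaN, which dropna() removes.
result = (
    long.sort_values("ID", kind="stable")[["ID", "Sex"]]
        .explode("Sex")
        .dropna()
        .reset_index(drop=True)
)

print(result)
```

Unlike the desired layout in the question, this keeps the ID on every row instead of blanking the repeats.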
CodePudding user response:
IIUC, you could use repeat to repeat your index:
# compute the MultiIndex
s = df.rename(columns=lambda x: x[0]).stack()
idx = s.repeat(s).to_frame().index
# reshape and set MultiIndex
df2 = df.loc[df.index.repeat(df.sum(1))].set_axis(idx)
# mask values
m = df2.index.get_level_values(0).duplicated()
df2.loc[m] = ''
output:
Sex   Unknown Male Female
  Sex
1 U         5    2      1
  U
  U
  U
  U
  M
  M
  F
2 U         1    0      4
  F
  F
  F
  F
3 U         3    3      2
  U
  U
  M
  M
  M
  F
  F
Used input:
import pandas as pd

df = pd.DataFrame({'Unknown': {1: 5, 2: 1, 3: 3},
                   'Male': {1: 2, 2: 0, 3: 3},
                   'Female': {1: 1, 2: 4, 3: 2}}).rename_axis(columns='Sex')
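To see what `s.repeat(s)` does in isolation, here is a small sketch for a single ID. The Series below is a hand-built stand-in for one row of the stacked data, and `labels` is a name introduced only for this example.

```python
import pandas as pd

# Counts for one ID, indexed by the initial of each category.
s = pd.Series([5, 2, 1], index=["U", "M", "F"])

# Repeating the Series by its own values duplicates each index label
# as many times as its count.
labels = s.repeat(s).index.tolist()

print(labels)
```

A label whose count is 0 is repeated zero times, which is why ID 2 gets no "M" rows in the output above.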