I have a df like this :
PRODUCTNUMBER | Jerarquía principal | Jerarquía secundaria marcas | COT | Ecommerce | dabra-catalog | Dexter-ecommerce | Stockcenter-ecommerce |
---|---|---|---|---|---|---|---|
AD802309 | Medias-Hombre | ADIDAS | 950699 | NaN | NaN | NaN | NaN |
AD481076 | NaN | Adidas | 950699 | NaN | NaN | NaN | NaN |
AD481137 | Medias-Hombre | Adidas | 950699 | Medias-Hombre | Medias-Hombre | Medias-Hombre | Medias-Hombre |
and I need to get this output:
PRODUCTNUMBER | PRODUCTCATEGORYNAME | PRODUCTCATEGORYHIERARCHYNAME |
---|---|---|
AD802309 | Medias-Hombre | Jerarquía principal |
AD802309 | ADIDAS | Jerarquía secundaria marcas |
AD802309 | 950699 | COT |
AD481076 | Adidas | Jerarquía secundaria marcas |
AD481076 | 950699 | COT |
AD481137 | Medias-Hombre | Jerarquía principal |
AD481137 | Adidas | Jerarquía secundaria marcas |
AD481137 | 950699 | COT |
AD481137 | Medias-Hombre | Ecommerce |
AD481137 | Medias-Hombre | dabra-catalog |
AD481137 | Medias-Hombre | Dexter-ecommerce |
AD481137 | Medias-Hombre | Stockcenter-ecommerce |
is it possible? "NaN" values must not be transposed
CodePudding user response:
Try:
df = (
df.set_index("PRODUCTNUMBER")
.stack()
.reset_index()
.rename(
columns={
0: "PRODUCTCATEGORYNAME",
"level_1": "PRODUCTCATEGORYHIERARCHYNAME",
}
)
)
df = df[["PRODUCTNUMBER", "PRODUCTCATEGORYNAME", "PRODUCTCATEGORYHIERARCHYNAME"]]
print(df)
Prints:
PRODUCTNUMBER PRODUCTCATEGORYNAME PRODUCTCATEGORYHIERARCHYNAME
0 AD802309 Medias-Hombre Jerarquía principal
1 AD802309 ADIDAS Jerarquía secundaria marcas
2 AD802309 950699 COT
3 AD481076 Adidas Jerarquía secundaria marcas
4 AD481076 950699 COT
5 AD481137 Medias-Hombre Jerarquía principal
6 AD481137 Adidas Jerarquía secundaria marcas
7 AD481137 950699 COT
8 AD481137 Medias-Hombre Ecommerce
9 AD481137 Medias-Hombre dabra-catalog
10 AD481137 Medias-Hombre Dexter-ecommerce
11 AD481137 Medias-Hombre Stockcenter-ecommerce
CodePudding user response:
Try with melt
out = df.melt('PRODUCTNUMBER',
value_name='PRODUCTCATEGORYHIERARCHYNAME',
var_name='PRODUCTCATEGORYNAME').dropna()
Out[201]:
PRODUCTNUMBER PRODUCTCATEGORYNAME PRODUCTCATEGORYHIERARCHYNAME
0 AD802309 Jerarquía principal Medias-Hombre
2 AD481137 Jerarquía principal Medias-Hombre
3 AD802309 Jerarquía secundaria marcas ADIDAS
4 AD481076 Jerarquía secundaria marcas Adidas
5 AD481137 Jerarquía secundaria marcas Adidas
6 AD802309 COT 950699
7 AD481076 COT 950699
8 AD481137 COT 950699
11 AD481137 Ecommerce Medias-Hombre
14 AD481137 dabra-catalog Medias-Hombre
17 AD481137 Dexter-ecommerce Medias-Hombre
20 AD481137 Stockcenter-ecommerce Medias-Hombre
CodePudding user response:
need simple example for answer
Example
data = {'A': {'a': 'val1', 'b': 'val3'},
'B': {'a': None, 'b': 'val4'},
'C': {'a': 'val2', 'b': None}}
df = pd.DataFrame(data)
output(df
):
A B C
a val1 None val2
b val3 val4 None
Code
when unpivot by stack
, we can drop null automatic
df.stack().reset_index().set_axis(['col1', 'col2', 'col3'], axis=1)
result:
col1 col2 col3
0 a A val1
1 a C val2
2 b A val3
3 b B val4