I have a dataset where I would like to de aggregate the values into their own unique rows as well as perform a pivot.
Data
ID type Q1 24 Q2 24
AA hello 1 1
AA hi 2 1
AA bye 1 0
AA ok 0 1
BB hello 1 0
BB hi 1 0
BB bye 0 1
BB ok 1 1
Desired
ID date type
AA Q1 24 hello
AA Q1 24 hi
AA Q1 24 hi
AA Q1 24 bye
AA Q2 24 hello
AA Q2 24 hi
AA Q2 24 ok
BB Q1 24 hello
BB Q1 24 hi
BB Q1 24 ok
BB Q1 24 bye
BB Q2 24 ok
Doing
import pandas as pd
import janitor
df.pivot_longer(
index = 'ID',
names_to = ('date', '.value'),
names_pattern = r"(Q\d )_?(. )",
sort_by_appearance = True)
I am currently researching, any suggestion is appreciated.
CodePudding user response:
For this use case, you do not need the .value
for pivot_longer
; the .value
is used if you wish to retain part of the column header, in this case you are not, you are simply flipping everything into long form and filtering for rows where the value is greater than 0 :
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
index = ['ID', 'type'],
names_to = 'date')
.loc[lambda df: df.value.gt(0), ['ID', 'type', 'date']]
)
ID type date
0 AA hello Q1 24
1 AA hi Q1 24
2 AA bye Q1 24
4 BB hello Q1 24
5 BB hi Q1 24
7 BB ok Q1 24
8 AA hello Q2 24
9 AA hi Q2 24
11 AA ok Q2 24
14 BB bye Q2 24
15 BB ok Q2 24
Of course, this transformation is pretty simple and you don't even need pivot_longer
, you can do it with pd.melt
:
(df
.melt(
['ID', 'type'],
var_name='date')
.loc[lambda df: df.value.gt(0), ['ID', 'type', 'date']]
)
ID type date
0 AA hello Q1 24
1 AA hi Q1 24
2 AA bye Q1 24
4 BB hello Q1 24
5 BB hi Q1 24
7 BB ok Q1 24
8 AA hello Q2 24
9 AA hi Q2 24
11 AA ok Q2 24
14 BB bye Q2 24
15 BB ok Q2 24
CodePudding user response:
IIUC, one way using pandas.Index.repeat
with pandas.concat
:
df = df.set_index(["ID", "type"])
new_df = pd.concat([pd.Series(c, index=df.index.repeat(df[c]))
for c in df]).reset_index(name="date")
# Optionally, sort values
new_df = new_df.sort_values(["ID", "date"], ignore_index=True)
Output:
ID type date
0 AA hello Q1 24
1 AA hi Q1 24
2 AA hi Q1 24
3 AA bye Q1 24
4 AA hello Q2 24
5 AA hi Q2 24
6 AA ok Q2 24
7 BB hello Q1 24
8 BB hi Q1 24
9 BB ok Q1 24
10 BB bye Q2 24
11 BB ok Q2 24