I have a dataframe with user IDs in the rows and dates in the columns; each value is the number of times that user visited the website on that date. I would like a day-wise view of all the user IDs that visited the website, i.e. every entry whose value is greater than or equal to 1.
Df
Codes 2022-09-04 2022-09-03 2022-09-02
A1AA 1 0 0
A1BB 0 0 2
A1CC 5 0 0
A2DD 0 5 0
A1EE 0 1 0
A1AA 0 0 1
Expected Output
Dates Codes
2022-09-04 A1AA
2022-09-04 A1CC
2022-09-03 A2DD
2022-09-03 A1EE
2022-09-02 A1BB
2022-09-02 A1AA
CodePudding user response:
You can use stack, which (with its default dropna=True) drops the NA values left after replacing the zeros:
(df.set_index('Codes')
.replace(0, pd.NA)
.rename_axis(columns='Dates').stack()
.reset_index().drop(columns=0)
)
Or with melt and loc, although with a different order (you can use sort_values if needed):
df.melt('Codes', var_name='Dates').loc[lambda d: d.pop('value').ne(0)]
output:
Codes Dates
0 A1AA 2022-09-04
1 A1BB 2022-09-02
2 A1CC 2022-09-04
3 A2DD 2022-09-03
4 A1EE 2022-09-03
5 A1AA 2022-09-02
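For anyone who wants to try the melt variant end to end, here is a minimal sketch that rebuilds the sample frame from the question and also reorders the columns and sorts by date. The variable names long, visited and result are only illustrative, and the explicit ge(1) filter and stable sort are my own choices rather than part of the answer above.

```python
import pandas as pd

# Sample data copied from the question; note that A1AA appears in two rows.
df = pd.DataFrame({
    "Codes": ["A1AA", "A1BB", "A1CC", "A2DD", "A1EE", "A1AA"],
    "2022-09-04": [1, 0, 5, 0, 0, 0],
    "2022-09-03": [0, 0, 0, 5, 1, 0],
    "2022-09-02": [0, 2, 0, 0, 0, 1],
})

# One row per (code, date) pair, with the visit count in a 'value' column.
long = df.melt("Codes", var_name="Dates")

# Keep only the pairs with at least one visit, in the column order the question asks for.
visited = long.loc[long["value"].ge(1), ["Dates", "Codes"]]

# Newest date first; a stable sort keeps the original row order within each date.
result = visited.sort_values("Dates", ascending=False, kind="stable", ignore_index=True)
print(result)
```

Because melt keeps every (code, date) pair before filtering, this also handles a code that has visits on several dates in the same row.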
CodePudding user response:
Using dot:
s = df.set_index('Codes')
s = s.gt(0).dot(s.columns).reset_index(name='Dates')
Output:
Codes Dates
0 A1AA 2022-09-04
1 A1BB 2022-09-02
2 A1CC 2022-09-04
3 A2DD 2022-09-03
4 A1EE 2022-09-03
5 A1AA 2022-09-02
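One caveat with dot: it works cleanly here because every row in the sample has exactly one non-zero date. If a row has visits on several dates, the date labels are concatenated into a single string. The sketch below uses hypothetical data (not from the question) to show that, together with one possible workaround that appends a separator and then splits and explodes; treat it as an illustration rather than the only way to handle this case.

```python
import pandas as pd

# Hypothetical data: A1AA has visits on two different dates in the same row.
df = pd.DataFrame({
    "Codes": ["A1AA", "A1BB"],
    "2022-09-04": [1, 0],
    "2022-09-03": [0, 3],
    "2022-09-02": [2, 0],
})
s = df.set_index("Codes")

# Plain dot joins the matching labels with no separator between them.
joined = s.gt(0).dot(s.columns)

# Appending a separator to each label keeps the individual dates recoverable.
out = (
    s.gt(0).dot(s.columns + ",")
    .str.rstrip(",")        # drop the trailing separator
    .str.split(",")         # list of dates per code
    .explode()              # one row per (code, date)
    .reset_index(name="Dates")
)
print(out)
```

Rows with no visits at all would come out with an empty string in Dates, so they would need to be filtered out separately.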
CodePudding user response:
To get the desired output with the indicated column sequence and row sorting (descending by Dates), you can do this:
df = (
    ((df.set_index('Codes') != 0) @ df.columns[1:])
    .reset_index()
    .iloc[:, ::-1]
    .set_axis(['Dates', 'Codes'], axis=1)
    .sort_values('Dates', ascending=False, ignore_index=True)
)
Explanation:
- The @ operator is equivalent to the dot() method of DataFrame
- set_index() gets the Codes column out of the way to perform dot() against the remaining column labels in df
- reset_index() restores Codes to a column
- iloc[] reverses the column sequence
- set_axis() labels the columns as specified in OP
- sort_values() puts the dates in descending order.
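The steps can also be run one at a time to see the intermediate results. This is only a sketch on the sample data from the question; the names mask, dates and step are introduced here for illustration and do not appear in the chained version.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Codes": ["A1AA", "A1BB", "A1CC", "A2DD", "A1EE", "A1AA"],
    "2022-09-04": [1, 0, 5, 0, 0, 0],
    "2022-09-03": [0, 0, 0, 5, 1, 0],
    "2022-09-02": [0, 2, 0, 0, 0, 1],
})

# Boolean frame indexed by Codes: True where a code has visits on a date.
mask = df.set_index("Codes") != 0

# Each True picks up its column label; same result as mask.dot(df.columns[1:]).
dates = mask @ df.columns[1:]

step = dates.reset_index()                          # columns: 'Codes' and 0
step = step.iloc[:, ::-1]                           # columns: 0 and 'Codes'
step = step.set_axis(["Dates", "Codes"], axis=1)    # final column labels
print(step)
```

At this point the rows are still in the original order of df; only the final sort_values() call changes that.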
Output:
Dates Codes
0 2022-09-04 A1AA
1 2022-09-04 A1CC
2 2022-09-03 A2DD
3 2022-09-03 A1EE
4 2022-09-02 A1BB
5 2022-09-02 A1AA