Replace values greater than 0 with column name

Time:09-20

I have a DataFrame with user IDs in the rows and one column per date; each value is the number of times that user visited the website on that date. I would like a day-wise view of all user IDs that visited the website, i.e. every entry whose value is greater than or equal to 1.

Df

Codes  2022-09-04  2022-09-03  2022-09-02
A1AA        1           0          0
A1BB        0           0          2
A1CC        5           0          0
A2DD        0           5          0 
A1EE        0           1          0
A1AA        0           0          1

Expected Output

Dates          Codes

2022-09-04     A1AA
2022-09-04     A1CC
2022-09-03     A2DD
2022-09-03     A1EE
2022-09-02     A1BB
2022-09-02     A1AA

CodePudding user response:

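You can replace the zeros with NA and then use stack, which drops NA values in its classic implementation (newer pandas releases may need an explicit dropna() after stack()):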

(df.set_index('Codes')
 .replace(0, pd.NA)
 .rename_axis(columns='Dates').stack()
 .reset_index().drop(columns=0)
)

Or with melt and loc, which gives a different row order (use sort_values if needed):

df.melt('Codes', var_name='Dates').loc[lambda d: d.pop('value').ne(0)]

output:

  Codes       Dates
0  A1AA  2022-09-04
1  A1BB  2022-09-02
2  A1CC  2022-09-04
3  A2DD  2022-09-03
4  A1EE  2022-09-03
5  A1AA  2022-09-02
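
For reference, a self-contained sketch of the melt approach, with the sample frame rebuilt from the question. The names long, visits and result are illustrative choices, not part of the original answer. Because melt walks the date columns from left to right, and the sample columns already run from newest to oldest, the filtered rows come out in the order the question asks for without an explicit sort; with other column orders a sort would still be needed.

```python
import pandas as pd

# Sample data copied from the question (note that A1AA appears twice).
df = pd.DataFrame({
    "Codes": ["A1AA", "A1BB", "A1CC", "A2DD", "A1EE", "A1AA"],
    "2022-09-04": [1, 0, 5, 0, 0, 0],
    "2022-09-03": [0, 0, 0, 5, 1, 0],
    "2022-09-02": [0, 2, 0, 0, 0, 1],
})

# Wide -> long: one row per (code, date) pair, processed column by column.
long = df.melt(id_vars="Codes", var_name="Dates", value_name="visits")

# Keep only the pairs with at least one visit and put Dates first.
result = (
    long.loc[long["visits"] > 0, ["Dates", "Codes"]]
    .reset_index(drop=True)
)
print(result)
```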

CodePudding user response:

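Using dot (this assumes each row has a nonzero count in only one date column, since the names of all matching columns are concatenated):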

s = df.set_index('Codes')
s = s.gt(0).dot(s.columns).reset_index(name='Dates')

Output:

  Codes       Dates
0  A1AA  2022-09-04
1  A1BB  2022-09-02
2  A1CC  2022-09-04
3  A2DD  2022-09-03
4  A1EE  2022-09-03
5  A1AA  2022-09-02
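
If a single row can have visits on more than one date, the plain dot call glues the matching column names together into one string. One possible workaround, sketched here on hypothetical data that is not from the question, is to append a separator to each label before the dot, then split and explode the result. The comma separator and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: A1AA has visits on two different days in one row.
df = pd.DataFrame({
    "Codes": ["A1AA", "A1BB"],
    "2022-09-04": [1, 0],
    "2022-09-03": [3, 2],
})
s = df.set_index("Codes")

# Append a separator to each label so multiple matches stay distinguishable,
# then strip the trailing separator left on every row.
joined = s.gt(0).dot(s.columns + ",").str.rstrip(",")

# Split the joined labels and give each date its own row.
result = (
    joined.str.split(",")
    .explode()
    .rename("Dates")
    .reset_index()
)
print(result)
```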

CodePudding user response:

To get the desired output with the indicated column sequence and row sorting (descending by Dates), you can do this:

df = (
    ((df.set_index('Codes') != 0) @ df.columns[1:])
    .reset_index()
    .iloc[:, ::-1]
    .set_axis(['Dates','Codes'], axis=1)
    .sort_values('Dates', ascending=False, ignore_index=True) )

Explanation:

  • The @ operator is equivalent to the dot() method of DataFrame
  • set_index() gets the Codes column out of the way to perform dot() against the remaining column labels in df
  • reset_index() restores Codes to a column
  • iloc[] reverses the column sequence
  • set_axis() labels the columns as specified in the question
  • sort_values() puts the dates in descending order

Output:

        Dates Codes
0  2022-09-04  A1AA
1  2022-09-04  A1CC
2  2022-09-03  A2DD
3  2022-09-03  A1EE
4  2022-09-02  A1BB
5  2022-09-02  A1AA
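
One detail worth hedging: sort_values defaults to kind='quicksort', which the pandas documentation does not describe as stable, so the order of codes within the same date is not formally guaranteed. A minimal variant of the pipeline above, assuming you want ties to keep their original row order, passes kind='stable' explicitly. The names counts, labels and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Codes": ["A1AA", "A1BB", "A1CC", "A2DD", "A1EE", "A1AA"],
    "2022-09-04": [1, 0, 5, 0, 0, 0],
    "2022-09-03": [0, 0, 0, 5, 1, 0],
    "2022-09-02": [0, 2, 0, 0, 0, 1],
})
counts = df.set_index("Codes")

# Boolean mask @ column labels yields the date label of each row's nonzero cell.
labels = (counts != 0) @ counts.columns

result = (
    labels.rename("Dates")
    .reset_index()[["Dates", "Codes"]]
    # kind="stable" keeps rows with equal dates in their original order.
    .sort_values("Dates", ascending=False, kind="stable", ignore_index=True)
)
print(result)
```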