Home > Enterprise >  Pandas : How to use one value from a column (value repeats itself) as a header from another column,
Pandas : How to use one value from a column (value repeats itself) as a header from another column,

Time:11-29

I have a data with multiple inputs from a semi-structured csv and I am trying to use one single (first) value from a set of columns (more then 500) as a header for other set of columns containing similar headers (another 500 rows)

After reading it I got something like this

import pandas as pd, numpy as np

df = pd.DataFrame({'Service': np.arange(8),
               'Ticket': np.random.rand(8),
               'Var_1': np.random.rand(8), # values column
               'Var_1_View': 'temp temp temp temp temp temp temp temp'.split(), # header of values of column
               'Var_2': np.arange(8), 
               'Var_2_View': 'pres pres pres pres pres pres pres pres'.split(),
               'Var_3': np.arange(8) * 2,
               'Var_3_View': 'shift shift shift shift shift shift shift shift'.split(),
               'D': np.arange(8) * 5,
               'Mess_3': np.random.rand(8),
               'Mess_3_View': 'id id id id id id id id'.split(),
               'E': np.arange(8)})

Headers containing values end with up to a 3 Digit number _# up to _### (more then 500 to be precise). Headers with description about values end with text : _View

I've created two dfs, one containing and other not containing the expression _View

df_headers =df.iloc[:,df.columns.str.contains('View')] # wanted headers on columns containing values
df_values =df.iloc[:,~df.columns.str.contains('View')] # headers should be replaced here

My idea was to extract first values from the df_headers as a list and using df.replace or df.rename, change headers on the df_values containing the values.

I could do it manually, but i have a huge df with different prefixxes and suffixes, but always using the _View as description to nearest column containing values.

As a result I would have the df_dont with new headers and with columns were this rule doesnt apply (Ticket, D, E, etc).

Since its my first question I would be great to have feedback, about clearness, explanation, or any other positive comments are welcome.

CodePudding user response:

It's not completely clear to me what you want to achieve so this might be off:

view_cols = {col for col in df.columns if col.endswith("_View")}
rename_dict = {
    col.replace("_View", ""): df[col].iat[0] for col in view_cols
}
new_cols = [col for col in df.columns if col not in view_cols]
df_new = df[new_cols].rename(columns=rename_dict)

Result:

   Service    Ticket      temp  pres  shift   D        id  E
0        0  0.623941  0.934402     0      0   0  0.644999  0
1        1  0.122866  0.918892     1      2   5  0.675976  1
2        2  0.472081  0.790443     2      4  10  0.825020  2
3        3  0.914086  0.849609     3      6  15  0.357074  3
4        4  0.684477  0.729126     4      8  20  0.010928  4
5        5  0.132002  0.673680     5     10  25  0.884599  5
6        6  0.841921  0.224638     6     12  30  0.197387  6
7        7  0.721800  0.412439     7     14  35  0.875199  7
  • Related