I have a data with multiple inputs from a semi-structured csv and I am trying to use one single (first) value from a set of columns (more then 500) as a header for other set of columns containing similar headers (another 500 rows)
After reading it I got something like this
import pandas as pd, numpy as np
df = pd.DataFrame({'Service': np.arange(8),
'Ticket': np.random.rand(8),
'Var_1': np.random.rand(8), # values column
'Var_1_View': 'temp temp temp temp temp temp temp temp'.split(), # header of values of column
'Var_2': np.arange(8),
'Var_2_View': 'pres pres pres pres pres pres pres pres'.split(),
'Var_3': np.arange(8) * 2,
'Var_3_View': 'shift shift shift shift shift shift shift shift'.split(),
'D': np.arange(8) * 5,
'Mess_3': np.random.rand(8),
'Mess_3_View': 'id id id id id id id id'.split(),
'E': np.arange(8)})
Headers containing values end with up to a 3 Digit number _# up to _### (more then 500 to be precise). Headers with description about values end with text : _View
I've created two dfs, one containing and other not containing the expression _View
df_headers =df.iloc[:,df.columns.str.contains('View')] # wanted headers on columns containing values
df_values =df.iloc[:,~df.columns.str.contains('View')] # headers should be replaced here
My idea was to extract first values from the df_headers as a list and using df.replace or df.rename, change headers on the df_values containing the values.
I could do it manually, but i have a huge df with different prefixxes and suffixes, but always using the _View as description to nearest column containing values.
As a result I would have the df_dont with new headers and with columns were this rule doesnt apply (Ticket, D, E, etc).
Since its my first question I would be great to have feedback, about clearness, explanation, or any other positive comments are welcome.
CodePudding user response:
It's not completely clear to me what you want to achieve so this might be off:
view_cols = {col for col in df.columns if col.endswith("_View")}
rename_dict = {
col.replace("_View", ""): df[col].iat[0] for col in view_cols
}
new_cols = [col for col in df.columns if col not in view_cols]
df_new = df[new_cols].rename(columns=rename_dict)
Result:
Service Ticket temp pres shift D id E
0 0 0.623941 0.934402 0 0 0 0.644999 0
1 1 0.122866 0.918892 1 2 5 0.675976 1
2 2 0.472081 0.790443 2 4 10 0.825020 2
3 3 0.914086 0.849609 3 6 15 0.357074 3
4 4 0.684477 0.729126 4 8 20 0.010928 4
5 5 0.132002 0.673680 5 10 25 0.884599 5
6 6 0.841921 0.224638 6 12 30 0.197387 6
7 7 0.721800 0.412439 7 14 35 0.875199 7