Say I have a df:
import pandas as pd
df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})
Note that the columns of interest are those whose names include "C.1_" followed by a letter.
I have built a list of the selected columns: col_list = ['A.C.1_v', 'C.C.1_f']
Objective: create several dataframes as follows (this illustration builds only 2 dfs, but in practice there could be many more).
The first df:
- takes its name from the naming convention "df_AC1_v";
- is composed of the values of column A.C.1_v and the values of columns D and E.
So, for df_AC1_v, we would have the following output: (image: Output 1 without iteration)
The second df:
- takes its name from the naming convention "df_CC1_f";
- is composed of the values of column C.C.1_f and the values of columns D and E.
So, for df_CC1_f, we would have the following output: (image: Output 2 without iteration)
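Since the two target outputs were only shown as images, here is a minimal sketch of what they amount to when built by hand, without any iteration. It assumes each output is exactly the selected ".C.1" column followed by D and E, as described above:

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'],
                   'E': [7, 8, 9]})

# Output 1: the ".C.1" column coming from "A", plus the shared columns D and E
df_AC1_v = df[['A.C.1_v', 'D', 'E']]
# Output 2: the ".C.1" column coming from "C", plus the same D and E columns
df_CC1_f = df[['C.C.1_f', 'D', 'E']]

print(df_AC1_v)
print(df_CC1_f)
```

The iterative version only has to reproduce these two selections, one per matching column.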
My goal is to do this iteratively, but so far nothing I have tried works.
Here is the code I have written. It fails in the for loop, and I do not understand why. First, I extract the column names and keep those containing '.C.1':
col_list = list(df)
list_c1 = list(filter(lambda x:'.C.1' in x, col_list))
list_c1 = [str(r) for r in list_c1]
in: list_c1
out:['A.C.1_v', 'C.C.1_f']
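Since the first steps may be modified, the filtering can be done in one expression. This is a sketch, assuming every column of interest ends in ".C.1_" followed by exactly one lowercase letter (the regex below is my assumption based on the "C.1_" plus letter convention, not something given in the question):

```python
import re

import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'],
                   'E': [7, 8, 9]})

# Keep only the names ending in ".C.1_<one lowercase letter>"
list_c1 = [col for col in df.columns if re.search(r'\.C\.1_[a-z]$', col)]
print(list_c1)  # ['A.C.1_v', 'C.C.1_f']
```

The column names here are already strings, so the extra str() conversion step is not needed.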
Second, I isolate the '.C.1' part by splitting each name on it:
list_c1_bis = []
for element in list_c1:
    stock = element.split('.C.1')
    list_c1_bis.append(stock)
in: list_c1_bis
out:[['A', '_v'], ['C', '_f']]
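The split-and-rejoin step can also be skipped: the target name is just the column name with its dots removed, prefixed with "df_". A minimal sketch, assuming that naming rule holds for every selected column:

```python
list_c1 = ['A.C.1_v', 'C.C.1_f']

# 'A.C.1_v' -> 'AC1_v' -> 'df_AC1_v'
names = ['df_' + col.replace('.', '') for col in list_c1]
print(names)  # ['df_AC1_v', 'df_CC1_f']
```

Keeping the original column name around (instead of only its split parts) also makes it easy to select the right column later.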
So far, so good. The problem is in the code below:
for line in list_c1_bis:
    name1 = 'df' + '_' + line[0] + 'C1' + line[1]
    vars()[name1] = df[[list_c1[0], 'D', 'E']]
My outputs are as follows:
in: df_AC1_v
==> OK, correct
out: (image: output1)
in: df_CC1_f
==> Wrong: it has taken the column A.C.1_v instead of the expected C.C.1_f
out: (image: output2)
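The cause is the hard-coded list_c1[0] in the loop body: the name changes on each iteration, but the selected column is always the first one, A.C.1_v. Walking the column names and their split parts together, for instance with zip, gives each frame its own column. A sketch of the corrected loop; it stores the frames in a dictionary called frames (a name chosen here for illustration), but the same one-line fix applies to the vars() version:

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'],
                   'E': [7, 8, 9]})
list_c1 = ['A.C.1_v', 'C.C.1_f']
list_c1_bis = [col.split('.C.1') for col in list_c1]  # [['A', '_v'], ['C', '_f']]

frames = {}
# zip walks both lists in step, so `col` is always the column matching `parts`
for col, parts in zip(list_c1, list_c1_bis):
    name1 = 'df' + '_' + parts[0] + 'C1' + parts[1]
    frames[name1] = df[[col, 'D', 'E']]  # previously df[[list_c1[0], 'D', 'E']]

print(frames['df_CC1_f'])
```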
Any suggestions are welcome!
Thanks a lot for your time and help; it is truly appreciated.
NB: please feel free to modify the first steps (the ones that work) if you think there is a better solution.
Kindest regards
Answer:
I strongly discourage creating variables dynamically with vars(), locals() or globals(); prefer a dictionary.
Try:
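for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    # At module level (a script or notebook cell) locals() is globals(),
    # so this creates real variables; inside a function it would not.
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]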
Output:
>>> df_AC1_v
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9
>>> df_CC1_f
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9
Alternative with a dictionary:
dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]
Output:
>>> dfs['AC1_v']
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9
>>> dfs['CC1_f']
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9
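One practical benefit of the dictionary is that all the frames can be processed in one loop without knowing their names in advance. A short usage sketch; the per-frame sum (and the totals dictionary holding it) is only an illustration, not part of the original question:

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'],
                   'E': [7, 8, 9]})

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]

# Iterate over every generated frame by key
totals = {}
for name, frame in dfs.items():
    # the first column of each frame is its ".C.1" column
    totals[name] = int(frame.iloc[:, 0].sum())
    print(name, frame.shape, totals[name])
```

Adding a third matching column to df would add a third entry to dfs with no change to the loop.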
Reply from the asker:
Hi Corralien, first let me thank you for your prompt reply; it is truly appreciated.
I have tried the first code:
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]
But I get the following error:
  File "", line 3
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]
    ^
SyntaxError: invalid syntax
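A SyntaxError on exactly that line suggests the interpreter does not understand the f-string f"df_{name}". f-strings were added in Python 3.6, so one likely cause (an assumption, since the Python version is not stated) is an older interpreter; printing sys.version shows which one is running. A sketch of the same loop using str.format instead, written with globals(), which is the same dictionary as locals() at module level:

```python
import sys

import pandas as pd

print(sys.version)  # f-strings require Python 3.6 or later

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'],
                   'E': [7, 8, 9]})

for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    # 'df_{}'.format(name) builds the same string as f"df_{name}"
    globals()['df_{}'.format(name)] = df[[col, 'D', 'E']]
```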
I have also tried the second proposed code, which stores the results in a dictionary:
dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]
It runs without error, but when I check whether the dfs exist:
in: df_AC1_v
I get the following error: NameError: name 'df_AC1_v' is not defined
I understand that to get the df, I need to write dfs['AC1_v'].
The second solution is acceptable, but I would prefer the first solution if it worked.
Kindest regards
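If named variables are really wanted, one compromise is to build the dictionary first, keyed by the full "df_..." name, and then copy its entries into the namespace with globals().update(...). This is a sketch, not a recommendation over the dictionary: globals() always refers to the module namespace, so this creates module-level variables even when it is called from inside a function:

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'],
                   'E': [7, 8, 9]})

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    # key with the full variable name, e.g. 'df_AC1_v'
    dfs['df_' + col.replace('.', '')] = df[[col, 'D', 'E']]

# Copy every entry into the module namespace as a top-level variable
globals().update(dfs)

print(df_AC1_v)
```

Both access styles then work: df_AC1_v and dfs['df_AC1_v'] refer to the same frame.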