Consider the case where I have three dataframes called df1, df2 and df3.
df1 has 5 columns: x1, x2, x3, age, height. df2 has 3 columns: x1, x2, weight. df3 has 4 columns: x1, x2, x3, bmi.
For each of these dataframes, I want to create a list of the demographic variables e.g.
df1_demographics=['age', 'height']
df2_demographics=['weight']
df3_demographics=['bmi']
I want to be able to call the list in a loop in the following way:
for dataset in df1, df2, df3:
    print(dataset_demographics)
My actual loop is very long and has to iterate over the dataframes, so I specifically need a way to look up each dataframe's list from inside that loop.
The desired output of this loop would be:
['age', 'height']
['weight']
['bmi']
CodePudding user response:
I'm not entirely sure what your question is asking. Are you looking to associate a subset of columns with each of your dataframes like so?
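demographic_cols = [
    ['age', 'height'],
    ['weight'],
    ['bmi'],
]
dataframes = [df1, df2, df3]
# Pair each dataframe with its own list of demographic columns.
# The loop variable is named cols so it does not overwrite demographic_cols.
for dataset, cols in zip(dataframes, demographic_cols):
    print(dataset, cols)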
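If pairing by position feels fragile, another option is a dictionary that keeps each dataframe next to its list under a name. This is only a minimal sketch: the `datasets` dictionary and its keys are hypothetical names, and plain dicts stand in for the real dataframes so that it runs without pandas. Real DataFrames can be stored in the tuples the same way.

```python
# Hypothetical stand-ins for the real dataframes (column name -> values).
df1 = {"x1": [1], "x2": [2], "x3": [3], "age": [30], "height": [170]}
df2 = {"x1": [1], "x2": [2], "weight": [65]}
df3 = {"x1": [1], "x2": [2], "x3": [3], "bmi": [22]}

# Each entry keeps a dataframe together with its demographic column names.
datasets = {
    "df1": (df1, ["age", "height"]),
    "df2": (df2, ["weight"]),
    "df3": (df3, ["bmi"]),
}

for name, (dataset, demographic_cols) in datasets.items():
    # dataset is available here for the rest of the (long) loop body.
    print(demographic_cols)
```

With this layout the list can never drift out of step with its dataframe, and the name is available inside the loop if it is needed for logging.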
CodePudding user response:
Try renaming your variables. If you do something similar to this, you'll get your desired output. Note that df1, df2 and df3 then hold the lists themselves, not the dataframes.
df1=['age', 'height']
df2=['weight']
df3=['bmi']
for dataset in df1, df2, df3:
    print(dataset)
Output:
['age', 'height']
['weight']
['bmi']
CodePudding user response:
I think this can be done using zip:
df1_demographics=['age', 'height']
df2_demographics=['weight']
df3_demographics=['bmi']
demographics = [df1_demographics, df2_demographics, df3_demographics]
dfs = [df1, df2, df3]
for df, cols in zip(dfs, demographics):
    # do whatever you want to do with each pair,
    # for example print every demographic column
    for val in cols:
        print(df[val])
CodePudding user response:
I believe the question needs some more clarity, but here's what I understand: you wish to create a list of column names that you can extract from any of the three datasets you have created. Here's how you can do that:
import numpy as np
import pandas as pd

demographics = ['age', 'height', 'weight', 'bmi']
df1 = pd.DataFrame(np.random.randint(0, 100, size=(5, 5)), columns=['x1', 'x2', 'x3', 'age', 'height'])
df2 = pd.DataFrame(np.random.randint(0, 100, size=(5, 3)), columns=['x1', 'x2', 'weight'])
df3 = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)), columns=['x1', 'x2', 'x3', 'bmi'])
for dataset in df1, df2, df3:
    # Keep only the columns whose names appear in demographics.
    print(dataset.loc[:, dataset.columns.isin(demographics)])
Each iteration prints only the demographic columns of that dataframe (the values themselves are random).
Hope this helps a bit.
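To get the list of names itself rather than the data, which is the output the question asks for, the same shared list can be used to filter each dataframe's column names. This is a minimal sketch, assuming plain dicts as stand-ins for the dataframes so that it runs without pandas. Iterating over a real DataFrame (or over dataset.columns) yields the column labels in the same way.

```python
# Hypothetical stand-ins for df1, df2, df3 (column name -> values).
df1 = {"x1": [1], "x2": [2], "x3": [3], "age": [30], "height": [170]}
df2 = {"x1": [1], "x2": [2], "weight": [65]}
df3 = {"x1": [1], "x2": [2], "x3": [3], "bmi": [22]}

demographics = ["age", "height", "weight", "bmi"]

found = []  # collected only so the result can be inspected afterwards
for dataset in df1, df2, df3:
    # Keep the column names that are demographic variables, in column order.
    cols = [col for col in dataset if col in demographics]
    found.append(cols)
    print(cols)
```

This relies on every demographic variable having a distinct name across the dataframes. If a non-demographic column in one dataframe shared a name with a demographic column in another, a separate list per dataframe would be the safer choice.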