pandas: using a list of dfs, perform some action on all dfs and list df names

Time:08-06

I made a list of a set of dfs

dflist = [df1, df2, df3, df4]

I want to loop through all the dfs in the list, print the df name, and print the first 2 lines.

In Unix it is simple to do what I want:

cat dflist | while read i; do echo $i && head -2 $i; done

Which returns

US_distribution_of_soybean_aphid_2022.csv
State,Status of Aphis glycines in 2022,Source,
Connecticut,Present,Rutledge (2004),

But in pandas,

for i in dflist:
    print('i')
    print(i.head(2))

returns literal i followed by the desired head(2) results.

i

   Year           Analyte            Class  SortOrder  PercentAcresTreated  \
0  1991          Methomyl        carbamate       2840              0.00125   
1  1991  Methyl parathion  organophosphate       2900              0.01000   

Using

for i in dflist:
    print(i)

prints each df in its entirety.

Very frustrating to a newbie trying to understand the Python equivalents of commands I use every day. I'm currently working in a Jupyter notebook, if that matters.

Thank you! Sara

CodePudding user response:

You can attach a name to each DataFrame by setting an attribute such as df.name after you create it, then print it with print(df.name). Just be careful not to have a column named name. The attribute itself can be called anything; df.dataframe_name works as well. Just make it unique, so that it is not one of your column names:

import pandas as pd

# Create each DataFrame, then attach a label as a plain attribute.
test = pd.DataFrame({"Column":[1,2,3,4,5]})
test.name = "Test"

new_test = pd.DataFrame({"Column":[1,2,3,4,5]})
new_test.name = "New Test"

third_test = pd.DataFrame({"Column":[1,2,3,4,5]})
third_test.name = "Third Test"

last_test = pd.DataFrame({"Column":[1,2,3,4,5]})
last_test.name = "Last Test"

dflist = [test, new_test, third_test, last_test]

# Print each label followed by the first two rows.
for i in dflist:
    print(i.name)
    print(i.head(2))

Output:

Test
   Column
0       1
1       2
New Test
   Column
0       1
1       2
Third Test
   Column
0       1
1       2
Last Test
   Column
0       1
1       2
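
To see why the warning about a column named name matters, here is a minimal sketch. The crop data is made up for illustration, and dataframe_name is just an arbitrary attribute name, not a pandas feature.

```python
import pandas as pd

# Made-up data with a column that happens to be called "name".
df = pd.DataFrame({"name": ["soybean", "corn"], "acres": [10, 20]})

# Reading df.name returns the "name" column, not a label for the frame.
print(df.name.tolist())

# Assigning to df.name overwrites every value in that column
# instead of attaching a label to the frame.
df.name = "Crops"
print(df["name"].tolist())

# An attribute name that matches no column avoids the clash.
df.dataframe_name = "Crops"
print(df.dataframe_name)
```

Also keep in mind that a label set this way is an ordinary Python attribute. As far as I know, pandas does not carry it over to new DataFrames returned by methods such as head() or copy(), so read it from the original object.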

CodePudding user response:

Use a container. A DataFrame does not know the name of the variable it is bound to, and looking up variable names programmatically is highly discouraged.

For instance, a list:

dfs = [df1, df2, df3]

for df in dfs:
    print(df.head())
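
If the position in the list is enough of a label, enumerate can supply one. This is only a sketch: the two small frames are hypothetical stand-ins for df1 and df2, and head(2) matches the two lines asked for in the question.

```python
import pandas as pd

# Hypothetical stand-in frames; any DataFrames would do.
df1 = pd.DataFrame({"Column": [1, 2, 3]})
df2 = pd.DataFrame({"Column": [4, 5, 6]})
dfs = [df1, df2]

labels = []
for position, df in enumerate(dfs):
    # Build a label from the list position, since the list holds no names.
    label = f"dfs[{position}]"
    labels.append(label)
    print(label)
    print(df.head(2))
```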

Or, if you need "names", a dictionary:

dfs = {'start': df1, 'middle': df2, 'end': df3}

for name, df in dfs.items():
    print(name)
    print(df.head())
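
To get closer to the Unix one-liner in the question, the dictionary keys can be the file names the data came from. The sketch below assumes made-up file names and contents, and reads them from in-memory strings so it runs on its own; with real files you would pass each path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-ins for CSV files on disk: file name -> file contents.
csv_sources = {
    "aphid.csv": "State,Status\nConnecticut,Present\nDelaware,Present\nGeorgia,Absent\n",
    "pesticide.csv": "Year,Analyte\n1991,Methomyl\n1991,Methyl parathion\n1992,Methomyl\n",
}

# Key each DataFrame by the name it was loaded from.
dfs = {name: pd.read_csv(io.StringIO(text)) for name, text in csv_sources.items()}

# Equivalent of: echo $i && head -2 $i
for name, df in dfs.items():
    print(name)
    print(df.head(2))
```

Because the name lives in the dictionary key rather than on the DataFrame, it stays available even after the frames are filtered or copied.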