Make a new dataframe from multiple dataframes

Time:08-13

Suppose I have 3 dataframes that are wrapped in a list. The dataframes are:

df_1 = pd.DataFrame({'text':['a','b','c','d','e'],'num':[2,1,3,4,3]})
df_2 = pd.DataFrame({'text':['f','g','h','i','j'],'num':[1,2,3,4,3]})
df_3 = pd.DataFrame({'text':['k','l','m','n','o'],'num':[6,5,3,1,2]})

The list of the dfs is:

df_list = [df_1, df_2, df_3]

Now I want to write a for loop that goes over df_list and, for each df, takes the text column and merges it into a new dataframe under a new column header called topic. Since the text column differs between dataframes, I want to name the headers topic_1, topic_2, etc. The desired outcome is as follows:

  topic_1 topic_2 topic_3
0       a       f       k
1       b       g       l
2       c       h       m
3       d       i       n
4       e       j       o

I can easily extract the text columns as:

lst = []
for i in range(len(df_list)):
    lst.append(df_list[i]['text'].tolist())

It is just that I am stuck on the last part, namely bringing the columns into 1 df without using brute force.

CodePudding user response:

You can extract the wanted columns with a list comprehension and concat them:

pd.concat([d['text'].rename(f'topic_{i}')
           for i,d in enumerate(df_list, start=1)],
          axis=1)

output:

  topic_1 topic_2 topic_3
0       a       f       k
1       b       g       l
2       c       h       m
3       d       i       n
4       e       j       o
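
One thing worth keeping in mind with this approach: pd.concat with axis=1 pairs rows by index label, not by position. The frames in the question all use the default 0..4 index, so that is not an issue there. The sketch below assumes a hypothetical case where one frame carries a different index (for example after filtering) and resets it first so the rows still line up:

```python
import pandas as pd

df_1 = pd.DataFrame({'text': ['a', 'b', 'c'], 'num': [2, 1, 3]})
# Hypothetical frame whose index does not start at 0 (assumption, not from the question)
df_2 = pd.DataFrame({'text': ['f', 'g', 'h'], 'num': [1, 2, 3]},
                    index=[10, 11, 12])
df_list = [df_1, df_2]

# reset_index(drop=True) gives each column a fresh 0..n-1 index,
# so rows are paired by position rather than by their original labels
result = pd.concat(
    [d['text'].reset_index(drop=True).rename(f'topic_{i}')
     for i, d in enumerate(df_list, start=1)],
    axis=1,
)
print(result)
```

Without the reset, the mismatched labels would produce extra rows filled with NaN instead of three aligned rows.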

CodePudding user response:

Generally speaking, you want to avoid looping over a pandas DataFrame. However, this solution does use a loop (a list comprehension) to rename the columns. It works assuming you have just these 3 dataframes:

import pandas as pd

df_1 = pd.DataFrame({'text':['a','b','c','d','e'],'num':[2,1,3,4,3]})
df_2 = pd.DataFrame({'text':['f','g','h','i','j'],'num':[1,2,3,4,3]})
df_3 = pd.DataFrame({'text':['k','l','m','n','o'],'num':[6,5,3,1,2]})

df_list = [df_1.text, df_2.text, df_3.text]
df_combined = pd.concat(df_list,axis=1)
df_combined.columns = [f"topic_{i + 1}" for i in range(len(df_combined.columns))]
>>> df_combined
  topic_1 topic_2 topic_3
0       a       f       k
1       b       g       l
2       c       h       m
3       d       i       n
4       e       j       o
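
If the number of dataframes is not fixed at 3, the same idea can be wrapped in a small function. This is only a sketch: combine_text_columns and its column and prefix parameters are made-up names for illustration. It relies on the keys argument of pd.concat, which supplies the column names when Series are concatenated along axis=1:

```python
import pandas as pd

def combine_text_columns(frames, column='text', prefix='topic'):
    """Hypothetical helper: put one column from each frame side by side."""
    # One label per frame: topic_1, topic_2, ...
    keys = [f'{prefix}_{i}' for i in range(1, len(frames) + 1)]
    # With axis=1, the keys become the column names of the result
    return pd.concat([df[column] for df in frames], axis=1, keys=keys)

df_1 = pd.DataFrame({'text': ['a', 'b', 'c', 'd', 'e'], 'num': [2, 1, 3, 4, 3]})
df_2 = pd.DataFrame({'text': ['f', 'g', 'h', 'i', 'j'], 'num': [1, 2, 3, 4, 3]})
df_3 = pd.DataFrame({'text': ['k', 'l', 'm', 'n', 'o'], 'num': [6, 5, 3, 1, 2]})

df_combined = combine_text_columns([df_1, df_2, df_3])
print(df_combined)
```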
