Python: concat data frames then save them to one csv

Time:03-02

I have multiple data frames. I want to select some rows from each one based on a condition, combine them into a single data frame, and save that to one CSV file. I have tried several methods; DataFrame.append is deprecated.

Here is a simplified version of the code. For every row with an ID larger than 2, I want to retrieve that row together with the rows above and below it. result = pd.concat() returns the required rows with the headers, so on every iteration of the for loop it prints the required rows. However, when I save them to CSV, only the last three rows are saved. How do I accumulate the rows before writing them to the CSV? What am I missing here?

import pandas as pd

df_sorted = pd.DataFrame({"ID": [1,2,3,4,5,6],
                          "User": ['a','b','c','d','e','f']})
Max = pd.DataFrame()
above = pd.DataFrame()
below = pd.DataFrame()
for i in range(len(df_sorted)):

    if df_sorted.ID[i] > 2:
        Max = df_sorted.iloc[[i]]  # first df

        if i < len(df_sorted) - 1:
            above = df_sorted.iloc[[i + 1]]  # second df

        if i > 0:
            below = df_sorted.iloc[[i - 1]]  # third df

    frames = [above, Max, below]
    result = pd.concat(frames)
    result.to_csv('new_df.csv')

The desired result is:

ID    User
 2     b
 3     c
 4     d
 3     c
 4     d
 5     e
 4     d
 5     e
 6     f
 5     e
 6     f

What I get from result is:

ID    User   
 5     e
 6     f
 6     f
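
The "only the last three saved" symptom is consistent with how to_csv writes files: its default mode is "w", so each call replaces whatever the previous call wrote. A minimal sketch of the difference between overwriting and appending follows. The two chunks are made-up stand-ins for the per-iteration result frames, and the temporary path is only there so the example does not touch real files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical chunks standing in for the per-iteration `result` frames.
chunks = [
    pd.DataFrame({"ID": [2, 3, 4], "User": ["b", "c", "d"]}),
    pd.DataFrame({"ID": [3, 4, 5], "User": ["c", "d", "e"]}),
]

path = os.path.join(tempfile.mkdtemp(), "new_df.csv")

# Default mode="w": every call replaces the file, so only the last chunk survives.
for chunk in chunks:
    chunk.to_csv(path, index=False)
overwritten = pd.read_csv(path)

# mode="a" appends instead; write the header only with the first chunk.
os.remove(path)
for n, chunk in enumerate(chunks):
    chunk.to_csv(path, mode="a", header=(n == 0), index=False)
appended = pd.read_csv(path)

print(len(overwritten), len(appended))
```

Appending works, but building the full frame in memory and writing it once is usually simpler when the data is small.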

CodePudding user response:

Here it is:

columns = ['id', 'user']
Max = pd.DataFrame(columns=columns)
above = pd.DataFrame(columns=columns)
below = pd.DataFrame(columns=columns)

for i in range(len(df_sorted)):

    if df_sorted.ID[i] > 2:
        Max.loc[i, 'id'] = df_sorted.iloc[i, 0]
        Max.loc[i, 'user'] = df_sorted.iloc[i, 1]

    if i < len(df_sorted) - 1:
        above.loc[i, 'id'] = df_sorted.iloc[i + 1, 0]
        above.loc[i, 'user'] = df_sorted.iloc[i + 1, 1]

    elif i > 0:
        below.loc[i, 'id'] = df_sorted.iloc[i - 1, 0]
        below.loc[i, 'user'] = df_sorted.iloc[i - 1, 1]

result = pd.concat([above, Max, below], axis=0)
result


CodePudding user response:

It seems that Max, above and below are not set up to accumulate rows. Each one holds only a single row, and it is overwritten on every iteration.

You should define Max = pd.DataFrame(columns=columns) (or use a list or an array), and do the same for above and below. That way each iteration adds to these data frames instead of replacing them, and a single concat at the end keeps all the data.
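
One way to apply this idea, as a sketch rather than the only fix: collect one small frame per matching row in a plain Python list, then call pd.concat and to_csv once after the loop. The names pieces, start, stop and last are introduced here for illustration. The slice takes the row before, the matching row, and the row after, clipped at the edges of df_sorted, which gives the order shown in the question's desired output.

```python
import pandas as pd

df_sorted = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                          "User": ["a", "b", "c", "d", "e", "f"]})

pieces = []  # one small frame per matching row
last = len(df_sorted) - 1
for i in range(len(df_sorted)):
    if df_sorted["ID"].iloc[i] > 2:
        # Positions i-1 .. i+1, clipped to the bounds of the frame.
        start = max(i - 1, 0)
        stop = min(i + 1, last)
        pieces.append(df_sorted.iloc[start:stop + 1])

# Concatenate once, after the loop, and write the file once.
result = pd.concat(pieces, ignore_index=True)
result.to_csv("new_df.csv", index=False)
```

Note that pd.concat raises a ValueError on an empty list, so if it is possible that no row matches the condition, check pieces before concatenating.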
