(Automatically) clean up data coming from a Python DataFrame/CSV

Time:01-23

I've created a DataFrame in Python from some web pages I've scraped. When I imported the resulting CSV file into Google Sheets, I ran into an issue: the values in each row are shuffled across the columns. A row might hold its data in the order col1, col4, col2, col8, etc.

Question: Is there a way to prevent the columns from being randomized in the CSV file? If not, is there an easy way to put the columns back in order in Google Sheets?

My current code (I've left out most of the unnecessary lines):

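from bs4 import BeautifulSoup
import requests
import pandas as pd

randomDB = []

for page in pages:  # hypothetical loop header; the scraping code is left out
    # col1 ... col8 hold the values scraped from one page
    random_information = {
        col1,
        col2,
        col3,
        col4,
        col5,
        col6,
        col7,
        col8
    }

    randomDB.append(random_information)
    print(random_information)

df = pd.DataFrame(randomDB)
print(df)
df.to_csv('random.csv')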

Any suggestions are welcome! :-)

CodePudding user response:

Yes: pass a dict to pd.DataFrame instead. Each key becomes a column of your DataFrame.

pd.DataFrame({"col1": [1, 2], "col2": [1, 3]})

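The difference from your code is that you build the DataFrame from a list of sets: {col1, col2, ...} without keys is a set literal, not a dict, and a set has no guaranteed order, so the values of each row can come out in a different order. The code above uses a dictionary, whose keys fix each column's name and position.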
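A minimal sketch of the fix applied row by row, assuming the scraped values arrive in a fixed order for each page (the scraped_rows data below is made up for illustration, and only three of the eight columns are used). Either name every value with a dict, or keep each row as an ordered tuple and pass the column names once:

```python
import pandas as pd

# Hypothetical scraped values standing in for col1 ... col3 of the question
# (the question has eight columns; three keep the sketch short).
scraped_rows = [
    ("a1", "b1", "c1"),
    ("a2", "b2", "c2"),
]

# {c1, c2, c3} would be a set: no keys and no guaranteed order, so pandas
# could not tell which value belongs in which column.
# A dict names every value, so each row lands in the same columns.
as_dicts = [{"col1": c1, "col2": c2, "col3": c3} for c1, c2, c3 in scraped_rows]
df = pd.DataFrame(as_dicts)

# Equivalent alternative: keep each row as an ordered tuple and name the columns once.
df_from_tuples = pd.DataFrame(scraped_rows, columns=["col1", "col2", "col3"])

# With no path argument, to_csv returns the CSV text; index=False drops the row index.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name such as 'random.csv' as the first argument of to_csv writes the same text to disk, with col1 as the first column in Google Sheets.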

CodePudding user response:

pandas provides a dedicated method for this: DataFrame.sort_values, which reorders the rows of a DataFrame by the values in one or more columns.

Here is a working example:

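import pandas as pd
from random import random

# Build 100 columns (keys 0..99), each holding 10 random floats
series = {}
for x in range(100):
    arr1 = []
    for i in range(10):
        arr1.append(random())
    series[x] = arr1

df = pd.DataFrame(series)

# Sort the rows by the values in column 1
df.sort_values(by=1, inplace=True)
print(df)

# Sort the rows by the values in column 2
df.sort_values(by=2, inplace=True)
print(df)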

Output 1 (excerpt, rows sorted by column 1):

        0         1         2   ...        97        98        99
3  0.816715  0.008932  0.950971  ...  0.919954  0.407322  0.682435
5  0.455805  0.075427  0.502535  ...  0.686747  0.504749  0.217507
4  0.310290  0.151038  0.061864  ...  0.077576  0.783444  0.784403

Output 2 (excerpt, rows sorted by column 2):

        0         1         2   ...        97        98        99
8  0.677506  0.438093  0.032239  ...  0.055174  0.242884  0.794950
4  0.310290  0.151038  0.061864  ...  0.077576  0.783444  0.784403
6  0.006972  0.604672  0.251232  ...  0.496487  0.674959  0.308529
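Note that sort_values changes the order of the rows, not the order of the columns. If the goal is to put the columns themselves back in a fixed order, a minimal sketch using DataFrame.sort_index(axis=1) or an explicit column list might look like this (the small frame is made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose columns arrived out of order.
df = pd.DataFrame({"col3": [3, 30], "col1": [1, 10], "col2": [2, 20]})

# sort_values reorders rows: here by the values in col1, largest first.
# The column order stays col3, col1, col2.
by_rows = df.sort_values(by="col1", ascending=False)

# sort_index(axis=1) reorders the columns by their labels instead.
by_columns = df.sort_index(axis=1)

# Selecting with an explicit list gives exactly the column order you choose.
explicit = df[["col1", "col2", "col3"]]

print(by_rows)
print(by_columns)
print(explicit)
```

Label sorting is lexicographic, so a column named col10 would land between col1 and col2; the explicit list avoids that.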