I've created a DataFrame in Python from some webpages I've scraped. When I imported the CSV file into Google Sheets, I ran into an issue: the data cells are mixed up. Sometimes a row contains the values in an order like this: col1, col4, col2, col8, etc.
Question: Is there a way to prevent columns from being randomized in the CSV file? If not, is there an easy way to organize columns again in Google Sheets?
My current code (most of the unnecessary lines left out):
from bs4 import BeautifulSoup
import requests
import pandas as pd
random_information = {
    col1,
    col2,
    col3,
    col4,
    col5,
    col6,
    col7,
    col8
}
randomDB.append(random_information)
print(random_information)
df = pd.DataFrame(randomDB)
print(df)
df.to_csv('random.csv')
Any suggestions are welcome! :-)
CodePudding user response:
Yes. You can pass a dict to pd.DataFrame instead; each key becomes a column of your DataFrame:
pd.DataFrame({"col1": [1, 2], "col2": [1, 3]})
The difference from your code is that you build the DataFrame from a list, whereas the code above uses a dictionary with explicit column names.
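To make the idea above concrete for scraped records, here is a minimal sketch. The column names and values are made up for illustration, and `build_frame` is a hypothetical helper, not something from the question. Note also that if the record in the question really is written with bare values inside braces, Python builds a set, and sets have no defined order, which could explain the shuffling. Using dicts with named keys plus an explicit `columns=` list pins the order:

```python
import io
import pandas as pd

# Hypothetical column names standing in for the scraped fields.
COLUMNS = ["col1", "col2", "col3", "col4"]

def build_frame(records, columns=COLUMNS):
    """Build a DataFrame whose columns always appear in `columns` order."""
    # `columns=` selects and orders the dict keys; keys missing from a record become NaN.
    return pd.DataFrame(records, columns=columns)

# Records whose keys arrive in different orders (made-up sample data).
records = [
    {"col1": "a", "col2": "b", "col3": "c", "col4": "d"},
    {"col4": "h", "col2": "f", "col1": "e", "col3": "g"},
]

df = build_frame(records)

# index=False keeps the row index out of the file, so no extra unnamed first column appears.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

In real use you would pass a file name such as 'random.csv' to `to_csv` instead of the in-memory buffer, which is only used here so the sketch has no side effects.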
CodePudding user response:
pandas DataFrames provide a dedicated method for sorting: DataFrame.sort_values, which sorts the rows by the values in a given column.
Here is a working example:
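import pandas as pd
from random import random

# Build 100 columns (keys 0..99), each holding 10 random floats
series = {}
for x in range(100):
    arr1 = []
    for i in range(10):
        arr1.append(random())
    series[x] = arr1
df = pd.DataFrame(series)

# Sort the rows by the values in column 1
df.sort_values(by=1, inplace=True)
print(df)

# Sort the rows again, this time by the values in column 2
df.sort_values(by=2, inplace=True)
print(df)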
output1:
          0         1         2  ...        97        98        99
3  0.816715  0.008932  0.950971  ...  0.919954  0.407322  0.682435
5  0.455805  0.075427  0.502535  ...  0.686747  0.504749  0.217507
4  0.310290  0.151038  0.061864  ...  0.077576  0.783444  0.784403
output2:
          0         1         2  ...        97        98        99
8  0.677506  0.438093  0.032239  ...  0.055174  0.242884  0.794950
4  0.310290  0.151038  0.061864  ...  0.077576  0.783444  0.784403
6  0.006972  0.604672  0.251232  ...  0.496487  0.674959  0.308529
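Keep in mind that sort_values reorders rows, not columns. If the goal is to fix the column order, as the question asks, a rough sketch could look like the following. The frame and its column names are invented for illustration; selecting with a list of labels and `sort_index(axis=1)` are standard pandas operations:

```python
import pandas as pd

# Hypothetical frame whose columns arrived in a scrambled order.
df = pd.DataFrame({"col4": [4, 40], "col1": [1, 10], "col8": [8, 80], "col2": [2, 20]})

# sort_values(by=...) reorders rows by the values in one column; column order is untouched.
by_rows = df.sort_values(by="col1", ascending=False)

# Selecting with a list of labels returns the columns in exactly that order.
wanted = ["col1", "col2", "col4", "col8"]
explicit = df[wanted]

# sort_index(axis=1) sorts the column labels themselves (plain string order here).
alphabetical = df.sort_index(axis=1)

print(list(by_rows.columns))
print(list(explicit.columns))
print(list(alphabetical.columns))
```

The explicit list is usually the safer choice, since string sorting would place a label like "col10" before "col2".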