I want to create a dataframe from a textfile. I scrapped some data from a website and wrote it into a .txt file. There are 10 'columns', as shown in the first 10 lines of the text file. Can anyone help me with seperating the lines into the respective columns in a pandas dataframe format? Much appreciated!
The following is an example of the text file. I would like the first 10 lines to be the column names and the subsequent lines to be under the respective columns.
NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago
CodePudding user response:
Assuming your text file is called foo.txt
, first we can build a dictionary for your data, using:
foo = {}
with open('foo.txt') as f:
head = [next(f).strip() for x in range(10)]
for i in range(500):
foo[i] = [next(f).strip() for x in range(10)]
Then simply create the dataframe using the from_dict
method:
pd.DataFrame.from_dict(foo, columns=head, orient='index')
Giving you:
NFT Collection Volume (ETH) Market Cap (ETH) Max price (ETH) Avg price (ETH) Min price (ETH) % Opensea Rarible #Transactions #Wallets Contract date
0 Axies | Axie Infinity 4,884 480,695 5.24 .0563 .0024 0 86,807 2,389,981 189d ago
1 Sandbox's LANDs 578 112,989 6 1.11 .108 100% 394 12,879 144d ago
CodePudding user response:
Like this :
text = """NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago"""
text = text.split('\n')
text = [text[i:(i 10)] for i in range(0,len(text),10)]
df = pd.DataFrame(text[1:],columns=text[0])
CodePudding user response:
You can read your file and append the lines to a list of list. Starting a new sublist every ten lines:
txt_file = "path/to/your/file"
table = []
with open(txt_file, "r") as f:
row = []
for i, line in enumerate(f.readlines()):
if i % 10 == 0:
row = []
row.append(line.strip())
if i % 10 == 9:
table.append(row)
df = pd.DataFrame(table)
print(df)
Output:
0 1 2 3 4 5 6 7 8 9
0 NFT Collection Volume (ETH) Market Cap (ETH) Max price (ETH) Avg price (ETH) Min price (ETH) % Opensea Rarible #Transactions #Wallets Contract date
1 Axies | Axie Infinity 4,884 480,695 5.24 .0563 .0024 0 86,807 2,389,981 189d ago
2 Sandbox's LANDs 578 112,989 6 1.11 .108 100% 394 12,879 700d ago
Edit: building on Serge's suggestion you can get rid of the row/column index with the following:
df = pd.DataFrame(table[1:], columns=table[0])
df.set_index(table[0][0], inplace=True)
You then get:
Volume (ETH) Market Cap (ETH) Max price (ETH) Avg price (ETH) Min price (ETH) % Opensea Rarible #Transactions #Wallets Contract date
NFT Collection
Axies | Axie Infinity 4,884 480,695 5.24 .0563 .0024 0 86,807 2,389,981 189d ago
Sandbox's LANDs 578 112,989 6 1.11 .108 100% 394 12,879 700d ago