Home > Software design >  Convert Text File into Pandas Dataframe
Convert Text File into Pandas Dataframe

Time:11-03

I want to create a dataframe from a textfile. I scrapped some data from a website and wrote it into a .txt file. There are 10 'columns', as shown in the first 10 lines of the text file. Can anyone help me with seperating the lines into the respective columns in a pandas dataframe format? Much appreciated!

The following is an example of the text file. I would like the first 10 lines to be the column names and the subsequent lines to be under the respective columns.

NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago

CodePudding user response:

Assuming your text file is called foo.txt, first we can build a dictionary for your data, using:

foo = {}
with open('foo.txt') as f:
    head = [next(f).strip() for x in range(10)]
    for i in range(500):
        foo[i] = [next(f).strip() for x in range(10)]

Then simply create the dataframe using the from_dict method:

pd.DataFrame.from_dict(foo, columns=head, orient='index')

Giving you:

    NFT Collection  Volume (ETH)    Market Cap (ETH)    Max price (ETH) Avg price (ETH) Min price (ETH) % Opensea Rarible   #Transactions   #Wallets    Contract date
0   Axies | Axie Infinity   4,884   480,695 5.24    .0563   .0024   0   86,807  2,389,981   189d ago
1   Sandbox's LANDs 578 112,989 6   1.11    .108    100%    394 12,879  144d ago

CodePudding user response:

Like this :

text = """NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago"""

text = text.split('\n')
text = [text[i:(i 10)] for i in range(0,len(text),10)]
df = pd.DataFrame(text[1:],columns=text[0])

CodePudding user response:

You can read your file and append the lines to a list of list. Starting a new sublist every ten lines:

txt_file = "path/to/your/file"

table = []
with open(txt_file, "r") as f:
    row = []
    for i, line in enumerate(f.readlines()):
        if i % 10 == 0:
            row = []
        row.append(line.strip())
        if i % 10 == 9:
            table.append(row)

df = pd.DataFrame(table)
print(df)

Output:

                       0             1                 2                3                4                5                  6              7          8              9
0         NFT Collection  Volume (ETH)  Market Cap (ETH)  Max price (ETH)  Avg price (ETH)  Min price (ETH)  % Opensea Rarible  #Transactions   #Wallets  Contract date
1  Axies | Axie Infinity         4,884           480,695             5.24            .0563            .0024                  0         86,807  2,389,981       189d ago
2        Sandbox's LANDs           578           112,989                6             1.11             .108               100%            394     12,879       700d ago

Edit: building on Serge's suggestion you can get rid of the row/column index with the following:

df = pd.DataFrame(table[1:], columns=table[0])
df.set_index(table[0][0], inplace=True)

You then get:

                      Volume (ETH) Market Cap (ETH) Max price (ETH) Avg price (ETH) Min price (ETH) % Opensea Rarible #Transactions   #Wallets Contract date
NFT Collection
Axies | Axie Infinity        4,884          480,695            5.24           .0563           .0024                 0        86,807  2,389,981      189d ago
Sandbox's LANDs                578          112,989               6            1.11            .108              100%           394     12,879      700d ago
  • Related