Merging multiple findall outputs into a dataframe

Time:07-23

My code is meant to read multiple files (two examples below), match the digits on each line, and combine all matches with the filename they came from into a dataframe. My first issue is that findall produces a separate output for each line, and I'm not sure how to append them properly. The findall outputs look like:

65
45
78
etc

Two file examples are below:

F1:

trust 65
musca 
linca 75
trig 
torst 50

F2:

munk 65
liki 34
grub

I want my code to generate the following final dataframe:

Filename score
F1  65
F1  75
F1  50
F2  65
F2  34

My code attempt:

import glob
import os
import re
import pandas as pd

final={}
for f in glob.glob('*.txt'):
    with open(f,"r") as In1:
        (filename,ext)=os.path.splitext(f)
        for line in In1:
            m=re.findall(r'\d+',line)
            if len(m) > 0:
                all=[]
                all.append(m)
                final[filename]=all

df=pd.DataFrame(final.items(),columns=['Filename','Score'])

Can someone point me in the right direction please?

CodePudding user response:

You can try

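import pandas as pd

# The sample lines are space-separated, so sep=' ' is needed to get two columns
df1 = pd.read_csv('file1', header=None, sep=' ')
df2 = pd.read_csv('file2', header=None, sep=' ')

df = (pd.concat([df1.assign(Filename='F1'),
                 df2.assign(Filename='F2')],
                ignore_index=True)
      .dropna(subset=[1])            # keep only rows that have a number
      .rename(columns={1: 'score'})
      .drop(columns=0))              # drop the word column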
print(df)

   score Filename
0   65.0       F1
2   75.0       F1
4   50.0       F1
5   65.0       F2
6   34.0       F2
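
If there are more than two files, the same idea can be looped over a folder. This is only a minimal sketch: it assumes the files are named F1.txt, F2.txt and so on, and that each line is a word optionally followed by a space and a number. The helper name scores_from_files and the column names word and score are made up for illustration, and the demo writes the two sample files from the question into a temporary folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def scores_from_files(paths):
    """Stack the numeric second column of each file, tagged with the file stem."""
    frames = []
    for path in paths:
        path = Path(path)
        # names=[...] fixes the column count, so lines without a number get NaN
        frame = pd.read_csv(path, header=None, sep=" ", names=["word", "score"])
        frames.append(frame.assign(Filename=path.stem))
    combined = pd.concat(frames, ignore_index=True)
    return (combined.dropna(subset=["score"])      # keep rows that have a number
                    .drop(columns="word")
                    .reset_index(drop=True)[["Filename", "score"]])


# Demo: the two sample files from the question, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "F1.txt").write_text("trust 65\nmusca \nlinca 75\ntrig \ntorst 50\n")
    Path(tmp, "F2.txt").write_text("munk 65\nliki 34\ngrub\n")
    df = scores_from_files(sorted(Path(tmp).glob("*.txt")))

print(df)
```

The scores come back as floats because the column held NaN before dropna; adding .astype(int) to the score column afterwards should give whole numbers if that matters.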

CodePudding user response:

Here's a way to do what your question asks:

import pandas as pd
from io import StringIO

fileStrings = {
'F1': '''
trust 65
musca 
linca 75
trig 
torst 50
''',

'F2': '''
munk 65
liki 34
grub
'''
}

res = pd.concat([
    pd.DataFrame({
        'Filename': k,
        'score':pd.read_csv(StringIO(v), header=None, sep=' ').iloc[:,1].dropna()}) 
    for k, v in fileStrings.items()]).reset_index(drop=True)
print(res)

Output:

  Filename  score
0       F1   65.0
1       F1   75.0
2       F1   50.0
3       F2   65.0
4       F2   34.0

The above example uses the strings read from the two files detailed in the question. Changing the variable fileStrings to contain the names and string contents of any number of files will also work.
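
If you would rather keep the re.findall approach from the question, one option is to collect one row per match in a flat list and build the dataframe once at the end, instead of storing a new list per file inside the loop. A rough sketch, assuming the inputs are .txt files in a single folder; collect_scores is a hypothetical helper name, and the demo again writes the two sample files to a temporary directory.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def collect_scores(folder, pattern="*.txt"):
    """Return one row per digit run found in each matching file."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, "r") as handle:
            for line in handle:
                # findall returns a (possibly empty) list of digit runs on this line
                for number in re.findall(r"\d+", line):
                    rows.append({"Filename": path.stem, "score": int(number)})
    # Build the dataframe once, after every file has been scanned
    return pd.DataFrame(rows, columns=["Filename", "score"])


# Demo: the two sample files from the question, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "F1.txt").write_text("trust 65\nmusca \nlinca 75\ntrig \ntorst 50\n")
    Path(tmp, "F2.txt").write_text("munk 65\nliki 34\ngrub\n")
    df = collect_scores(tmp)

print(df)
```

Because every match becomes its own row, lines without digits simply contribute nothing, and the scores stay integers since no NaN is ever introduced.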
