extract string from line in a file

Time:04-27

I have two files. One of them contains lines like:

 0    rho is 2313.22
 1    rho is 6456.01
 .....
 18811 rho is 2154.78
 18812 rho is 2279.565
 18813 rho is 1813.690
 18814 rho is 346.20664

The second file contains some of those numbers, not in sequential order, like:

18812
758
2623
12569
1392

I need to extract the corresponding rho values from file1. I tried to compare the two files so that, whenever a number exists in both, its rho value is returned, but I couldn't get this part to work:

with open('file1', 'r') as file1:
    with open('file2', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('results.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
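The attempt above compares whole lines, so a line such as " 18812 rho is 2279.565" can never equal "18812" from file2, and the intersection ends up empty. A minimal standard-library sketch, assuming every useful line in file1 has the shape "<id> rho is <value>", instead builds a dictionary keyed on the id and looks up each id from file2 in it. The names build_rho_map, lookup and extract_rho are hypothetical helpers, not taken from the question.

```python
def build_rho_map(lines):
    """Map each id to its rho value, assuming lines like ' 18812 rho is 2279.565'."""
    rho = {}
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines, such as a '.....' placeholder
        if len(parts) != 4 or parts[1:3] != ['rho', 'is']:
            continue
        # Keep the value as the original string so formatting like '1813.690' survives
        rho[parts[0]] = parts[3]
    return rho


def lookup(rho, id_lines):
    """Yield (id, rho) pairs in file2's order, skipping ids that file1 lacks."""
    for line in id_lines:
        key = line.strip()
        if key in rho:
            yield key, rho[key]


def extract_rho(file1_path, file2_path, out_path):
    """Write one 'id rho' line per id from file2 that has a rho value in file1."""
    with open(file1_path) as file1:
        rho = build_rho_map(file1)
    with open(file2_path) as file2, open(out_path, 'w') as file_out:
        for key, value in lookup(rho, file2):
            file_out.write(f'{key} {value}\n')
```

For the files in the question this would be called as extract_rho('file1', 'file2', 'results.txt'). Dictionary lookups keep the whole job linear in the size of the two files.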

CodePudding user response:

This is how you can do it with pandas:

import pandas as pd

# load file1: split on whitespace, name the columns, and drop the literal "rho" and "is" columns
df1 = pd.read_csv('file1.txt', sep=r'\s+', names=['id', 0, 1, 'value']).drop(columns=[0, 1])

# load file2, naming its single column
df2 = pd.read_csv('file2.txt', names=['id'])

# merge on id (keeping only ids present in both files) and write the output to a csv file
df2.merge(df1, on='id').to_csv('output.csv', index=False)
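To try this approach without creating files, the sketch below feeds the sample lines from the question through io.StringIO (an assumption made only for the demo; with real files you would pass the paths instead). It uses usecols to keep just the id and value columns rather than dropping the text columns afterwards; the column names rho_word and is_word are placeholders.

```python
import io

import pandas as pd

# In-memory stand-ins for file1 and file2, using sample lines from the question
file1 = io.StringIO(
    " 18811 rho is 2154.78\n"
    " 18812 rho is 2279.565\n"
    " 18813 rho is 1813.690\n"
    " 18814 rho is 346.20664\n"
)
file2 = io.StringIO("18812\n758\n2623\n12569\n1392\n")

# Whitespace-separated columns: id, the literal words "rho" and "is", then the value
df1 = pd.read_csv(file1, sep=r'\s+', header=None,
                  names=['id', 'rho_word', 'is_word', 'rho'],
                  usecols=['id', 'rho'])
df2 = pd.read_csv(file2, header=None, names=['id'])

# Inner merge: keeps only ids present in both frames, in file2's order
result = df2.merge(df1, on='id')
print(result)
```

Because merge defaults to an inner join, only 18812 appears in result; the other ids in file2 do not occur in this subset of file1.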

CodePudding user response:

You could take a more "data engineering" approach: read the two files as CSV with pandas, then merge them.

Sample code:

import pandas as pd

# read the first file as CSV, using "rho is" as the separator;
# a multi-character separator is only supported by the Python engine
rho_map = pd.read_csv('file1', sep="rho is", engine='python',
                      header=None, names=['id', 'rho'])

# read the second file
data = pd.read_csv('file2', names=['id'])

# Then merge
results = data.merge(rho_map, on='id')

With a subset from your test data, you could have file1 with:

 18811 rho is 2154.78
 18812 rho is 2279.565
 18813 rho is 1813.690
 18814 rho is 346.20664

and file2 with:

18812
758
2623
12569
1392

This gives the following result:

      id       rho
0  18812  2279.565
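The inner merge silently drops ids from file2 that have no rho value in file1 (here 758, 2623, 12569 and 1392). The following is a sketch under a few assumptions: the same in-memory sample stands in for the real files; both fields are read as strings with a regex separator and converted explicitly, in case stray whitespace around "rho is" would otherwise keep the ids from matching; and a left merge with indicator=True reports which ids were not found.

```python
import io

import pandas as pd

# In-memory stand-ins for the file1/file2 subset shown above
file1 = io.StringIO(
    " 18811 rho is 2154.78\n"
    " 18812 rho is 2279.565\n"
    " 18813 rho is 1813.690\n"
    " 18814 rho is 346.20664\n"
)
file2 = io.StringIO("18812\n758\n2623\n12569\n1392\n")

# A regex separator needs the Python engine; \s* also absorbs the spaces around "rho is".
# Reading as strings makes the type conversions below explicit rather than inferred.
rho_map = pd.read_csv(file1, sep=r'\s*rho is\s*', engine='python',
                      header=None, names=['id', 'rho'], dtype=str)
rho_map['id'] = rho_map['id'].str.strip().astype(int)
rho_map['rho'] = rho_map['rho'].str.strip().astype(float)

data = pd.read_csv(file2, header=None, names=['id'])

# how='left' keeps every id from file2; indicator=True adds a '_merge' column
merged = data.merge(rho_map, on='id', how='left', indicator=True)
found = merged[merged['_merge'] == 'both'].drop(columns='_merge')
missing = merged.loc[merged['_merge'] == 'left_only', 'id'].tolist()

print(found)
print('not in file1:', missing)
```

A left merge preserves the order of file2, so found and missing both follow the order of the ids you asked for.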