How to split the text file

04-05-1993:1.068

04-12-1993:1.079

04-19-1993:1.079

06-06-1994:1.065

06-13-1994:1.073

06-20-1994:1.079

I have a text file of dates (month-day-year) and gas prices, and I want to calculate the average gas price for each year. So I tried to split each line:

with open('c:/Gasprices.txt', 'r') as f:
    fullfile = [x.strip() for x in f.readlines()]
datesprices = [(x.split('-')[0], x.split(':')[1]) for x in fullfile]
print(datesprices)

But instead of the year and the price, I get data like this:

('04', '1.068'), ('04', '1.079')

Please let me know what I am doing wrong.

Also, if you can, please show me how to use the split data to calculate the average price per year with a dictionary.

Answer:

As already mentioned, to get the year you need a slightly more complex split. But since your format looks very consistent (a fixed-width MM-DD-YYYY date followed by ':' and the price), you could simply slice:

datesprices=[(x[6:10], x[11:]) for x in fullfile]
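For comparison, here is a minimal sketch of the split-based approach: split on ':' first to separate the date from the price, then split the date on '-' and take the third field. It uses one sample line from the question; the variable names are only illustrative.

```python
line = '04-05-1993:1.068'

# The date and the price are separated by the single ':'
date, price = line.split(':')
# The date is MM-DD-YYYY, so the year is the third '-'-separated field
year = date.split('-')[2]

print(year, price)            # 1993 1.068
print(line[6:10], line[11:])  # 1993 1.068  (slicing gives the same result)
```

Splitting is more forgiving if the month or day is ever written without a leading zero; slicing assumes every date has exactly 10 characters.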

But how do you get the average? You need to collect the prices for each year somewhere, for example in a dictionary that maps each year to a list of its prices.

from statistics import mean

my_dict = {} # could be defaultdict too
for year, price in datesprices:
    if year not in my_dict:
        my_dict[year] = []
    # the prices are still strings, so convert them before averaging
    my_dict[year].append(float(price))

for year, prices in my_dict.items():
    print(year, mean(prices))
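The comment above mentions defaultdict. A minimal sketch of that variant follows; it uses the six sample lines from the question in place of reading the file, and converts each price to float so that mean() receives numbers rather than strings.

```python
from collections import defaultdict
from statistics import mean

# Sample lines from the question, standing in for the contents of the file
fullfile = ['04-05-1993:1.068', '04-12-1993:1.079', '04-19-1993:1.079',
            '06-06-1994:1.065', '06-13-1994:1.073', '06-20-1994:1.079']

# (year, price) pairs via fixed-width slicing; the price becomes a float
datesprices = [(x[6:10], float(x[11:])) for x in fullfile]

# A year that has not been seen yet automatically starts with an empty list
prices_by_year = defaultdict(list)
for year, price in datesprices:
    prices_by_year[year].append(price)

averages = {year: mean(prices) for year, prices in prices_by_year.items()}
for year, avg in averages.items():
    print(year, round(avg, 3))
```

With the sample data this prints 1993 1.075 and 1994 1.072. The only difference from the plain dict version is that the `if year not in my_dict` check disappears.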

Answer:

Try this:

with open('c:/Gasprices.txt','r') as f: 
    fullfile=[x.strip() for x in f.readlines()]
datesprices=[(x.split('-')[0],x.split('-')[-1].split(':')[0], x.split(':')[1]) for x in fullfile]
print(datesprices)

Output:

[('04', '1993', '1.068'), ('04', '1993', '1.079'), ('04', '1993', '1.079'), ('06', '1994', '1.065'), ('06', '1994', '1.073'), ('06', '1994', '1.079')]

To keep only the year and the price, drop the month:

with open('c:/Gasprices.txt','r') as f:
    fullfile=[x.strip() for x in f.readlines()]
datesprices=[(x.split('-')[-1].split(':')[0], x.split(':')[1]) for x in fullfile]
print(datesprices)

Output:

[('1993', '1.068'), ('1993', '1.079'), ('1993', '1.079'), ('1994', '1.065'), ('1994', '1.073'), ('1994', '1.079')]
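These tuples still need to be averaged. Since the sample lines appear to be in date order, one alternative is itertools.groupby. This is a sketch under that assumption: groupby only merges consecutive items with the same key, so it works only if all lines for a given year are next to each other.

```python
from itertools import groupby
from statistics import mean

# (year, price) pairs as produced above; the prices are still strings
datesprices = [('1993', '1.068'), ('1993', '1.079'), ('1993', '1.079'),
               ('1994', '1.065'), ('1994', '1.073'), ('1994', '1.079')]

averages = {}
# groupby starts a new group whenever the key changes, so this relies on
# the pairs already being grouped by year
for year, group in groupby(datesprices, key=lambda pair: pair[0]):
    averages[year] = mean(float(price) for _, price in group)

print(averages)
```

If the file is not sorted, sort `datesprices` first, or use a dictionary of lists as in the other answers.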

Answer:

    txt = ['04-05-1993:1.068', '04-12-1993:1.079', '04-19-1993:1.079', '06-06-1994:1.065', '06-13-1994:1.073', '06-20-1994:1.079']

    price_per_year = {}   # running total of the prices for each year
    number_of_years = {}  # how many prices were seen for each year
    for line in txt:
        date, price = line.split(':')
        price = float(price)          # the price is a string until converted
        year = date.split('-')[2]

        if year not in price_per_year:
            price_per_year[year] = price
            number_of_years[year] = 1
        else:
            price_per_year[year] += price
            number_of_years[year] += 1

    # the keys are strings, because they come from split()
    av_price_1993 = price_per_year['1993'] / number_of_years['1993']
    av_price_1994 = price_per_year['1994'] / number_of_years['1994']
    print(av_price_1993, av_price_1994)
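Instead of hard-coding one variable per year, the same running-total idea can produce the averages for every year at once. A minimal sketch, using dict.get to supply 0 the first time a year is seen (the `averages` name is illustrative):

```python
txt = ['04-05-1993:1.068', '04-12-1993:1.079', '04-19-1993:1.079',
       '06-06-1994:1.065', '06-13-1994:1.073', '06-20-1994:1.079']

price_per_year = {}   # running total of the prices for each year
number_of_years = {}  # how many prices were seen for each year
for line in txt:
    date, price = line.split(':')
    year = date.split('-')[2]
    # dict.get returns 0 for a year that is not in the dict yet
    price_per_year[year] = price_per_year.get(year, 0) + float(price)
    number_of_years[year] = number_of_years.get(year, 0) + 1

# One average per year, without listing the years by hand
averages = {year: price_per_year[year] / number_of_years[year]
            for year in price_per_year}
print(averages)
```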

Answer:

I see no need to split the input lines: the date has a fixed format, so its length is known and we can simply slice.

with open('gas.txt') as gas:
    td = dict()
    for line in gas:
        year = line[6:10]
        price = float(line[11:])
        td.setdefault(year, []).append(price)
    for k, v in td.items():
        print(f'{k} {sum(v)/len(v):.3f}')

Output:

1993 1.075
1994 1.072

Note:

There is no check here for blank lines. It is assumed that there are none and that the sample shown in the question is malformed.

Also, there is no need to strip the incoming lines, because float() ignores leading and trailing whitespace.
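If the file may contain blank lines (as the sample in the question suggests), one option is to skip lines that are empty after stripping. In this sketch the parsing is wrapped in a hypothetical helper, `average_by_year`, that accepts any iterable of lines, so it can be tried with io.StringIO instead of a real file. It still assumes every non-blank line starts with a fixed-width MM-DD-YYYY date.

```python
import io


def average_by_year(lines):
    """Return {year: average price} for lines like '04-05-1993:1.068'.

    Hypothetical helper: blank or whitespace-only lines are skipped.
    """
    td = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        year = line[6:10]
        price = float(line[11:])
        td.setdefault(year, []).append(price)
    return {k: sum(v) / len(v) for k, v in td.items()}


# The sample from the question, including its blank lines
sample = io.StringIO(
    '04-05-1993:1.068\n\n04-12-1993:1.079\n\n04-19-1993:1.079\n\n'
    '06-06-1994:1.065\n\n06-13-1994:1.073\n\n06-20-1994:1.079\n'
)
for k, v in average_by_year(sample).items():
    print(f'{k} {v:.3f}')
```

With a real file, the call would be `average_by_year(gas)` inside `with open('gas.txt') as gas:`, since a file object also iterates over its lines.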