how to automatically classify a list of numbers

Time:12-23

For context: I have a list of wind speeds, say 100 measurements between 0 and 50 km/h, loaded from a CSV file. I want to automatically split them into lists by 5 km/h intervals: the values from 0 to 5, the values from 5 to 10, and so on.

Let's go to the code:

import pandas as pd

df = pd.read_csv("wind.csv")  # read_csv already returns a DataFrame
x = df["Value"]
d = sorted(x)
lst = [[] for i in range(0, int(x.max()) + 1, 5)]

This gives me a list of empty lists; for example, if the winds go from 0 to 54 km/h, it creates 11 empty lists.

Now, to classify I did this:

for i in range(0, len(lst), 1):
    for e in range(0, 55, 5):
        for n in d:
            if n > e and n < (e + 5):
                lst[i].append(n)
            else:
                continue

My goal is that, once a number exceeds the upper limit of the current interval (5 for the first one), the code moves to the next level: it adds 5 to the interval limits (e) and moves to the next i to fill the next empty list in lst. I tried several variations, because I assume the loops must be nested in a specific order to work. This code is just one of the attempts, and they all gave similar results: either every list was filled with all the numbers, or only the first list was filled with all the numbers.
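For reference, the behaviour described above can probably be sketched without nested loops. This is only a minimal illustration, not the code from the question: the helper name bin_values and the sample numbers are made up, and it assumes all measurements are non-negative. Integer division by the interval width gives the index of the list each value belongs to.

```python
def bin_values(values, width=5):
    """Group numbers into half-open intervals [k*width, (k+1)*width)."""
    if not values:
        return []
    # One list per interval, up to the interval that holds the maximum.
    n_bins = int(max(values) // width) + 1
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        # Integer division gives the index of the interval containing v.
        bins[int(v // width)].append(v)
    return bins


sample = [0.5, 3, 5, 7.2, 12, 54]  # made-up measurements in km/h
result = bin_values(sample)
print(len(result))          # number of intervals created
print(result[0], result[1])  # contents of the first two intervals
```

A value that sits exactly on a boundary, such as 5, lands in the interval that starts at that boundary, which also avoids the strict comparisons (n > e and n < e + 5) that drop boundary values.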

CodePudding user response:

Your title mentions classifying the numbers -- are you looking for a categorical output like calm | gentle breeze | strong breeze | moderate gale | etc.? If so, take a look at the second example on the pd.qcut docs.
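If named categories are what you want, a minimal sketch with pd.cut and its labels parameter could look like the following. The thresholds and the sample speeds are invented for illustration only and are not taken from any official wind scale.

```python
import pandas as pd

speeds = pd.Series([2.0, 8.5, 19.0, 33.0, 47.5])  # made-up km/h values

# Hypothetical thresholds, chosen only for illustration.
edges = [0, 5, 20, 40, 60]
labels = ["calm", "gentle breeze", "strong breeze", "moderate gale"]

# right=False makes each bin include its lower edge: [0, 5), [5, 20), ...
categories = pd.cut(speeds, bins=edges, labels=labels, right=False)
print(categories.tolist())
```

There must be exactly one label per bin, so the list of labels is one element shorter than the list of edges.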

Since you're already using pandas, use cut with an IntervalIndex (constructed with the pd.interval_range function) to get a Series of bins, and then groupby on that.

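import pandas as pd
from math import ceil

BIN_WIDTH = 5

wind_velocity = pd.read_csv("wind.csv")["Value"].sort_values()

# Round the maximum up to the next multiple of BIN_WIDTH.
# Note: with closed='left', a maximum that is an exact multiple of
# BIN_WIDTH falls outside the last bin and is assigned NaN by pd.cut.
upper_bin_lim = BIN_WIDTH * ceil(wind_velocity.max() / BIN_WIDTH)
bins = pd.interval_range(
    start=0,
    end=upper_bin_lim,
    periods=upper_bin_lim // BIN_WIDTH,
    closed='left')
velocity_bins = pd.cut(wind_velocity, bins)
# observed=False keeps bins that contain no measurements.
groups = wind_velocity.groupby(velocity_bins, observed=False)

for name, group in groups:
    # TODO: use `groups` to do stuff; printing is only a placeholder
    print(name, group.tolist())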
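To get the list of lists from the question out of the groups, something along these lines should work. It is a sketch on made-up in-memory data rather than wind.csv, and the names by_lower_edge and lst are just illustrative. It sizes the bins as floor(max / width) + 1 so that the maximum always falls inside the last interval.

```python
import pandas as pd

BIN_WIDTH = 5
# Made-up km/h values standing in for the "Value" column of wind.csv.
wind_velocity = pd.Series([0.5, 3, 5, 7.2, 12, 54]).sort_values()

# One more bin than floor(max / width), so the maximum always fits.
n_bins = int(wind_velocity.max() // BIN_WIDTH) + 1
bins = pd.interval_range(start=0, periods=n_bins, freq=BIN_WIDTH, closed="left")

velocity_bins = pd.cut(wind_velocity, bins)
# observed=True yields only the intervals that actually contain data.
groups = wind_velocity.groupby(velocity_bins, observed=True)

# Map the lower edge of each non-empty interval to its measurements.
by_lower_edge = {interval.left: group.tolist() for interval, group in groups}

# One list per interval, in order, with [] for intervals without data.
lst = [by_lower_edge.get(interval.left, []) for interval in bins]
print(len(lst))
print(lst[0], lst[1], lst[-1])
```

Passing observed explicitly avoids depending on the default, which differs between pandas versions.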