I want to extract specific information from this csv file.
I need make a list of days with lowest visibility and give an overview of other time parameters for those days in tabular form.
I have tried to use
data = pandas.read_csv('Weather_2012.csv')
data.nsmallest(5, 'Visibility(km)')
but it returns several values for the same day. I don't know if I'm doing it correctly, since I need the list of days.
I also need the total number of foggy days. I have filtered all rows containing fog:
data.loc[data['Weather'].str.contains('Fog')]['Weather']
but I don't know how to count the number of days, I can only count the number of rows.
CodePudding user response:
You're looking for DataFrame.resample
. Based on a specific column, it will group the rows of the dataframe by a specific time interval.
First you need to do this, if you haven't already:
data['Date/Time'] = pd.to_datetime(data['Date/Time'])
Get the lowest 5 days of visibility:
>>> df.resample(rule='D', on='Date/Time')['Visibility (km)'].mean().nsmallest(5)
Date/Time
2012-03-01 2.791667
2012-03-14 5.350000
2012-12-27 6.104167
2012-01-17 6.433333
2012-02-01 6.795833
Name: Visibility (km), dtype: float64
Basically what that does is this:
- Groups all the rows by day
- Converts each group to the average value of all the
Visibility (km)
items for that day - Returns the 5 smallest
Count the number of foggy days
>>> df.resample(rule='D', on='Date/Time').apply(lambda x: x['Weather'].str.contains('Fog').any()).sum()
78
Basically what that does is this:
- Groups all the rows by day
- For each day, adds a
True
if any row inside that day contains'Fog'
in theWeather
column,False
otherwise - Counts how many
True
's there were, and thus the number of foggy days.
CodePudding user response:
This will get you an array of all unique foggy days. you can use the shape method to get its dimension
df[df["Weather"].apply(lambda x : "Fog" in x)]["Date/Time"].unique()
CodePudding user response:
I need make a list of days with lowest visibility and give an overview of other time parameters for those days in tabular form.
Since your Date/Time
column represents a particular hour, you'll need to do some grouping to get the minimum visibility for a particular day. The following will find the 5 least-visible days.
# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date
# Group on the new "Date" column and get the minimum values of
# each column for each group.
>>> min_by_day = data.groupby("Date").min()
# Now we can use nsmallest, since 1 row == 1 day in min_by_day.
# Since `nsmallest` returns a pandas.Series with "Date" as the index,
# we have to use `.index` to pull the date objects from the result.
>>> least_visible_days = min_by_day.nsmallest(5, "Visibility (km)").index
Then you can limit your original dataset to the least-visible days with
data[data["Date"].isin(least_visible_days)]
I also need the total number of foggy days.
We can use the extracted date in this case too:
# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date
# Filter on hours which have foggy weather
>>> foggy = data[data["Weather"].str.contains("Fog")]
# Count number of unique days
>>> len(foggy["Date"].unique())