I am working on python and new to it.I have a dataframe as
Date Emailable Lost_Fans New_Fans Country
12-10-2020 121134 JP
06-11-2020 120859 350 75 JP
18-12-2020 101857 19128 126 JP
29-01-2021 105760 5029 8932 JP
16-02-2021 115437 1279 10956 JP
01-03-2021 115512 517 592 JP
12-07-2021 137546 18074 40108 JP
11-08-2021 134319 23753 20526 JP
02-09-2021 130156 4355 192 JP
- First,I want to find the missing month from the dataframe
- to fill the missing month data for lost and new split the last month data value of lost and new into these missing month lost and new column. For example: the april, may june is missing month from the above dataframe and the next data available is for july which contains 18074 in lost and 40108 in new. we need to spilt these values into the missing month(april, may, june) including july.so for lost 18074/4 =4518 to each of the missing month including july. so the final output will be like.
output should be:
Date Emailable Lost_Fans New_Fans Country
12-10-2020 121134 JP
06-11-2020 120859 350 75 JP
18-12-2020 101857 19128 126 JP
29-01-2021 105760 5029 8932 JP
16-02-2021 115437 1279 10956 JP
01-03-2021 115512 517 592 JP
30-04-2021 0 4518 10027 JP
31-05-2021 0 4518 10027 JP
30-06-2021 0 4518 10027 JP
12-07-2021 137546 4518 10027 JP
11-08-2021 134319 23753 20526 JP
02-09-2021 130156 4355 192 JP
CodePudding user response:
To find the missing months, use the following code:
months = [] miss_months = [] for i in range(len(df.Date)): if df.Date[i].split('-')[1][0] != '0': months.append(df.Date[i].split('-')[1]) else: months.append(df.Date[i].split('-')[1][1]) for j in range(1,13): if str(j) in months: pass else: miss_months.append(j) print(miss_months)
Here while extracting the month from the dataframe, I noticed that all the retrieved months were 2 character strings. For example: '07','08','11',etc. So I wrote some code to check if the first character of the retrieved is 0 or not.
To get the final output, use the following code:
def Insert_row(row_number, df, row_value): start_upper = 0 end_upper = row_number start_lower = row_number end_lower = df.shape[0] upper_half = [*range(start_upper, end_upper, 1)] lower_half = [*range(start_lower, end_lower, 1)] lower_half = [x.__add__(1) for x in lower_half] index_ = upper_half lower_half df.index = index_ df.loc[row_number] = row_value df = df.sort_index() return df row_number = [6,7,8] the_date = ['30-04-2021','31-05-2021','30-06-2021'] for k in range(len(row_number)): row_value = [the_date[k],0,18074//4,10027,'JP'] df = Insert_row(row_number[k], df, row_value) print(df)
Here, I have created a function Insert_row() to make the row adding easier.
CodePudding user response:
The idea of the code is first to create rows of NaNs for the missing months. Then to fill in the rows by using the following value (bfill). And then divide the result either by 1 if there was no filling involved, or by the number of consecutive NaNs 1:
# If Date is a string, convert to datetime:
df.Date = pd.to_datetime(df.Date, format="%d-%m-%Y")
# Drop the days, keep months only
df.Date = df.Date.dt.to_period("M")
# make it the index and add rows for missing months
df.set_index("Date", inplace=True)
df = df.reindex(pd.period_range(df.index[0], df.index[-1], freq='M'))
# group the consecutive NaNs
group_nans = df.Lost_Fans.isna().groupby(df.Lost_Fans.notna().cumsum())
# for each NaN group, broadcast their length & shift it one row lower
consecutive_nans = group_nans.transform("sum").shift(fill_value=0).astype(int)
divisor = consecutive_nans 1
# fill NaNs backwards, and divide by number of consecutive nans 1
df.Lost_Fans = df.Lost_Fans.fillna(method="bfill").astype(int) // divisor
df.New_Fans = df.New_Fans.fillna(method="bfill").astype(int) // divisor
df.Emailable = df.Emailable.fillna(0).astype(int)
df.Country = df.Country.fillna(method="bfill")
The resulting DataFrame:
Emailable Lost_Fans New_Fans Country
2020-10 121134 0 0 JP
2020-11 120859 350 75 JP
2020-12 101857 19128 126 JP
2021-01 105760 5029 8932 JP
2021-02 115437 1279 10956 JP
2021-03 115512 517 592 JP
2021-04 0 4518 10027 JP
2021-05 0 4518 10027 JP
2021-06 0 4518 10027 JP
2021-07 137546 4518 10027 JP
2021-08 134319 23753 20526 JP
2021-09 130156 4355 192 JP