How do I insert rows in a dataframe based on sequence?

So I have this dataframe. The 'Time' column follows a sequence: it should contain every week between the first and last rows of the column. But one of the values, W43-2021, is missing.

Given DF

How do I insert a new row for W43-2021, with all the other column values either the same as in the other rows or None, and 'Value' set to 0? Basically, like the dataframe below.

New DF

I have tried the approach below to solve this problem.

f = int(df['Time.[Week]'][0][1:3])
l = int(df['Time.[Week]'].iloc[-1][1:3])
check = list(df['Time.[Week]'].str[1:3])
check = list(map(int, check))
c = []
for i in range(f, l + 1):
    if i not in check:
        c.append(i)
for week in c:
    temp_df = pd.DataFrame(columns = df.columns)

    temp_df.loc[0, 'Time.[Week]'] = 'W' + str(week) + '-2021'

df.append(temp_df)

This doesn't seem to be the most appropriate way of dealing with the issue: the order gets broken because the new dataframe is appended at the end, and there could be multiple missing rows, which could cause problems later. What would be a better and more Pythonic way of handling this case?

Here is the code to create the df, if required.

df = pd.DataFrame([[33534,9132,'Current','W41-2021',34],
              [33534,9132,'Current','W42-2021', 45],
              [33534,9132,'Current','W44-2021', 32],
              [33534,9132,'Current','W45-2021', 41],
              [33534,9132,'Current','W46-2021',49]], columns = ['Item', 'Location', 'Version', 'Time', 'Value'])

CodePudding user response：

You can use the following:

# build missing values
weeks = df['Time'].str.extract(r'(?<=W)(\d+)').astype(int)
idx = [f'W{w}-2021' for w in range(weeks.min()[0], weeks.max()[0] + 1)]

# add missing values
df2 = df.set_index('Time').reindex(idx).reset_index()

# ffill, except 'Value'
df2 = df2.combine_first(df2.drop(columns='Value').ffill())[df.columns]

output:

>>> df2
      Item  Location  Version      Time  Value
0  33534.0    9132.0  Current  W41-2021   34.0
1  33534.0    9132.0  Current  W42-2021   45.0
2  33534.0    9132.0  Current  W43-2021    NaN
3  33534.0    9132.0  Current  W44-2021   32.0
4  33534.0    9132.0  Current  W45-2021   41.0
5  33534.0    9132.0  Current  W46-2021   49.0
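The output above leaves 'Value' as NaN for the inserted week and shows 'Item' and 'Location' as floats. If the goal is a 0 in 'Value' and integer columns, as in the question, one possible follow-up is sketched below. It reuses the question's sample data and assumes every row belongs to 2021 and that the non-'Value' columns should simply copy the row above; the name `full_index` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[33534, 9132, 'Current', 'W41-2021', 34],
     [33534, 9132, 'Current', 'W42-2021', 45],
     [33534, 9132, 'Current', 'W44-2021', 32],
     [33534, 9132, 'Current', 'W45-2021', 41],
     [33534, 9132, 'Current', 'W46-2021', 49]],
    columns=['Item', 'Location', 'Version', 'Time', 'Value'])

# Build the full week range (assumption: a single year, 2021) and reindex on it.
weeks = df['Time'].str.extract(r'W(\d+)')[0].astype(int)
full_index = pd.Index(
    [f'W{w}-2021' for w in range(weeks.min(), weeks.max() + 1)], name='Time')
df2 = df.set_index('Time').reindex(full_index)

# Inserted rows get Value 0 first, so the forward fill cannot touch 'Value'.
df2['Value'] = df2['Value'].fillna(0)
df2 = df2.ffill().reset_index()[df.columns]

# Reindexing introduced NaN, which upcast the integer columns to float; cast back.
df2 = df2.astype({'Item': int, 'Location': int, 'Value': int})
print(df2)
```

Filling 'Value' before calling `ffill()` is what keeps the inserted week at 0 rather than inheriting 45 from W42-2021.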

CodePudding user response：

Use:

df['week'] = df['Time'].str[1:3].astype(int)
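# all weeks from the first to the last one, inclusive
df2 = pd.DataFrame(list(range(df['week'].min(), df['week'].max() + 1)), columns=['week']).merge(df, how='outer')
fill_val = df['Time'].iloc[0]
# rebuild the 'W<week>-<year>' label for the rows that were missing
df2['Time'] = df2['Time'].fillna(fill_val[0] + df2['week'].astype(str) + fill_val[3:])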
df2 = df2.fillna(0)

Output

   week      Time  Value
0    41  W41-2021   34.0
1    42  W42-2021   45.0
2    43  W43-2021    0.0
3    44  W44-2021   32.0
4    45  W45-2021   41.0
5    46  W46-2021   49.0

CodePudding user response：

You can do it like this:

Code:

df = pd.DataFrame([[33534,9132,'Current','W41-2021',34],
              [33534,9132,'Current','W42-2021', 45],
              [33534,9132,'Current','W44-2021', 32],
              [33534,9132,'Current','W45-2021', 41],
              [33534,9132,'Current','W46-2021',49]], columns = ['Item', 'Location', 'Version', 'Time', 'Value'])

new_df = df.copy(deep=True)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
new_df = pd.concat([new_df, pd.DataFrame([[33534,9132,'Current','W43-2021',0]],columns=['Item', 'Location', 'Version', 'Time', 'Value'])])
new_df = new_df.sort_values("Time",ascending=True)
print(new_df)

Because the 'Time' strings sort in week order, you can just append the new row and then sort the values by 'Time'.

Output:

    Item  Location  Version      Time  Value
0  33534      9132  Current  W41-2021     34
1  33534      9132  Current  W42-2021     45
0  33534      9132  Current  W43-2021      0
2  33534      9132  Current  W44-2021     32
3  33534      9132  Current  W45-2021     41
4  33534      9132  Current  W46-2021     49
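The append-and-sort idea can be extended so that the missing weeks are computed rather than typed in by hand, which covers the question's case of several gaps. The following is only a sketch under stated assumptions: all rows share one year, 'Time' always has the form 'W<week>-<year>', and 'Item', 'Location' and 'Version' are constant, so the first row's values are copied. The names `filler` and `week_key` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[33534, 9132, 'Current', 'W41-2021', 34],
     [33534, 9132, 'Current', 'W42-2021', 45],
     [33534, 9132, 'Current', 'W44-2021', 32],
     [33534, 9132, 'Current', 'W45-2021', 41],
     [33534, 9132, 'Current', 'W46-2021', 49]],
    columns=['Item', 'Location', 'Version', 'Time', 'Value'])

# Weeks present, as integers, and the (assumed single) year they belong to.
present = df['Time'].str.extract(r'W(\d+)-')[0].astype(int)
year = df['Time'].str[-4:].iloc[0]
missing = sorted(set(range(present.min(), present.max() + 1)) - set(present))

# One filler row per missing week: copy the constant columns, set Value to 0.
filler = pd.DataFrame({
    'Item': df['Item'].iloc[0],
    'Location': df['Location'].iloc[0],
    'Version': df['Version'].iloc[0],
    'Time': [f'W{w:02d}-{year}' for w in missing],
    'Value': 0,
})

def week_key(col):
    # Sort numerically by (year, week) instead of by the raw string.
    parts = col.str.extract(r'W(\d+)-(\d+)').astype(int)
    return parts[1] * 100 + parts[0]

new_df = (pd.concat([df, filler], ignore_index=True)
            .sort_values('Time', key=week_key)
            .reset_index(drop=True))
print(new_df)
```

Sorting through `week_key` avoids relying on the string order of 'Time', and `ignore_index=True` plus `reset_index(drop=True)` gives a clean 0..n-1 index instead of the duplicated 0 seen above.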