Assign Values from a Header within Dataframe to All Rows in Dataframe by Index


I have a large dataset I am working through with Pandas that is pulled from Excel.

The data contains a header for each section and the relevant data underneath for each unit, i.e.

<index> | Date/Time | UnitNo.  | Reading
   0    |    NaN    | UnitBc36 |   NaN
   1    | 1/1/2021  |   NaN    |   100
   2    | 1/1/2021  |   NaN    |   300
  ...
 260253 |    NaN    | UnitAN95 |   NaN
 272644 | 3/1/2021  |   NaN    |   150

I have been able to locate all indexes of the unit headers, as below, by identifying where the UnitNo. column is not NaN:

Loc = df[df["UnitNo."].notnull()]
In: Loc.index
Out: Int64Index([     0,  12393,  24786,  37179,  49572,  61965,  74358,  86751,
                  99144, 111537, 123930, 136323, 148716, 161109, 173502, 185895,
                 198288, 210681, 223074, 235467, 247860, 260253],
                dtype='int64')

I also have a dataframe that matches each index to the relevant header value.
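For reference, such a mapping can be taken straight from Loc (a minimal sketch using the names from the example above):

unit_map = Loc["UnitNo."]
# Series indexed by the header rows' indexes, e.g.
# unit_map.loc[0] -> 'UnitBc36', unit_map.loc[260253] -> 'UnitAN95'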

I want to use this information to attach the Unit number as a new column to each row of data, so that later I can groupby this information and analyse each unit.

For example, all rows between index 0 and 12393 (Loc.index[0] and Loc.index[1] above) should have UnitBc36 as the new column value. All rows from index 260253 (Loc.index[21] above) to the final line should have UnitAN95 in the new column. See below:

<index> | Date/Time | UnitNo.  | Reading | UnitNo._new_col
   0    |    NaN    | UnitBc36 |   NaN   |    UnitBc36
   1    | 1/1/2021  |   NaN    |   100   |    UnitBc36
   2    | 1/1/2021  |   NaN    |   300   |    UnitBc36
  ...
 260253 |    NaN    | UnitAN95 |   NaN   |    UnitAN95
 272644 | 3/1/2021  |   NaN    |   150   |    UnitAN95

Could anyone please provide assistance on the best way to approach the above?

I am currently doing this manually with a large number of conditional statements, but that is not scalable to different data sizes with different numbers of units.
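Roughly, the manual version looks like the sketch below (the second unit name is a placeholder, and the index boundaries are hard-coded from Loc.index above):

# one hand-written assignment per unit; .loc label slices are inclusive
df.loc[0:12392, "UnitNo._new_col"] = "UnitBc36"
df.loc[12393:24785, "UnitNo._new_col"] = "UnitXX01"  # placeholder unit name
# ... one statement per unit ...
df.loc[260253:, "UnitNo._new_col"] = "UnitAN95"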

Any help would be much appreciated, and I'm happy to clarify further if required.

Thanks, Jack

CodePudding user response:

Use ffill (or fillna(method='ffill'), though the method argument of fillna is deprecated in recent pandas versions):

df['UnitNo._new_col'] = df['UnitNo.'].ffill()
print(df)

# Output:

       Date/Time   UnitNo.  Reading UnitNo._new_col
0            NaN  UnitBc36      NaN        UnitBc36
1       1/1/2021       NaN    100.0        UnitBc36
2       1/1/2021       NaN    300.0        UnitBc36
260253       NaN  UnitAN95      NaN        UnitAN95
272644  3/1/2021       NaN    150.0        UnitAN95

To go further, maybe you want:

out = df.assign(**{'UnitNo.': df['UnitNo.'].ffill()}) \
        .loc[lambda x: x['Date/Time'].notna()]
print(out)

# Output:

       Date/Time   UnitNo.  Reading
1       1/1/2021  UnitBc36    100.0
2       1/1/2021  UnitBc36    300.0
272644  3/1/2021  UnitAN95    150.0
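
And if the end goal is the per-unit analysis mentioned in the question, the new column can be grouped on directly (mean is just an example aggregation):

# group the cleaned rows by unit and aggregate the readings
summary = out.groupby('UnitNo.')['Reading'].mean()
print(summary)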