Home > OS >  Python Pandas: Append column value, based on another same column value
Python Pandas: Append column value, based on another same column value

Time:12-01

I have a pandas dataframe like this.

   Time                      Source Level  County  Town
0  2021-12-01 10:01:41.443   NaN    NaN    NaN     NaN
1                      NaN   Test   3      C1      C1-T1
2                      NaN   Test   5-     C2      C2-T0
3                      NaN   Test   5-     C2      C2-T1
4  2021-12-01 10:01:46.452   NaN    NaN    NaN     NaN

I want to append Town value, which is based on row have the same Source, Level and County value.

I have tried isin, groupby, diff(but my value is str), but still not figure out.

Image below is what I want to get.

   Time                      Source Level  County  Town
0  2021-12-01 10:01:41.443   NaN    NaN    NaN     NaN
1                      NaN   Test   3      C1      C1-T0
2                      NaN   Test   5-     C2      C2-T0, C2-T1
3  2021-12-01 10:01:46.452   NaN    NaN    NaN     NaN

Really appreciate your help!

CodePudding user response:

The way we can make this work is by creating a list out of it using groupby() and apply(list), we can then transform this into a string separated by comma. Let's split it into 2 steps for better understanding.

Personally I would keep this data as a list within a pandas series and not do step 2. Formatting as string separated by comma might not be ideal to work with.

Step 1:

output = df.groupby(['Time','Source','Level','County'])['Town'].apply(list).reset_index()

Returns:

                      Time Source Level County            Town
0  2021-12-01 10:01:41.443    NaN   NaN    NaN           [nan]
1  2021-12-01 10:01:46.452    NaN   NaN    NaN           [nan]
2                      NaN   Test     3     C1         [C1-T1]
3                      NaN   Test    5-     C2  [C2-T0, C2-T1]

Now, we can format them correctly into strings (step 2):

output['Town'] = pd.Series([', '.join([y for y in x if type(y) == str]) for x in output['Town']]).replace('',np.nan)

Which outputs our desired result:

                      Time Source Level County          Town
0  2021-12-01 10:01:41.443    NaN   NaN    NaN           NaN
1  2021-12-01 10:01:46.452    NaN   NaN    NaN           NaN
2                      NaN   Test     3     C1         C1-T1
3                      NaN   Test    5-     C2  C2-T0, C2-T1
  • Related