Home > Blockchain >  Check if column name of a pandas df starts with "name" and split that column based on exis
Check if column name of a pandas df starts with "name" and split that column based on exis

Time:01-26

Let's say I have a pandas dataframe that looks like this:

df = pd.read_json('{"id":{"0":"21 Delta","1":"38 Bravo","2":"Charlie 37","3":"Alpha 56"},"name_1":{"0":"Tom","1":"Nick","2":"Chris","3":"David 56"},"name_2":{"0":"Peter 17","1":"Emma 53","2":"Jeff 11","3":"Oscar"},"name_3":{"0":"Jeffrey","1":"Olivier 12","2":null,"3":null},"name_4":{"0":"Henry 23","1":null,"2":null,"3":null}}') 
df

    id           name_1      name_2     name_3       name_4
0   21 Delta     Tom         Peter 17   Jeffrey      Henry 23 
1   38 Bravo     Nick        Emma 53    Olivier 12   None
2   Charlie 37   Chris       Jeff 11    None         None
3   Alpha 56     David 56    Oscar      None         None

What I would like to do is to iterate over the columns in this df and check if the column name starts with name. If so, I would like to add the number after the white space in each row of that particular column in an extra column called age_ which increments by one like so:

    id          name_1   name_2    name_3     name_4    age_1  age_2  age_3  age_4
0   21 Delta    Tom      Peter 17  Jeffrey    Henry 23  None   17     None   23
1   38 Bravo    Nick     Emma 53   Olivier 12 None      None   53     12     None
2   Charlie 37  Chris    Jeff 11   None       None      None   11     None   None    
3   Alpha 56    David 56 Oscar     None       None      56     None   None   None

So far I came up with this, but I struggle how to get to the end result:

for column in df.columns:
    if column.startswith("name"):
        age = df[column].str.split(" ").str.get(1)

CodePudding user response:

ages = (df.filter(like="name")
          .apply(lambda col: col.str.extract(r" (\d )$", expand=False))
          .rename(columns=lambda c: c.replace("name", "age")))
  • get the "name" involving columns
  • for each of them, extract the numbers near end with a regex
  • column names are still "name_*", so replace "name" with "age" there

and lastly join with the original frame to get

>>> df.join(ages)

           id    name_1    name_2      name_3    name_4 age_1 age_2 age_3 age_4
0    21 Delta       Tom  Peter 17     Jeffrey  Henry 23   NaN    17   NaN    23
1    38 Bravo      Nick   Emma 53  Olivier 12      None   NaN    53    12  None
2  Charlie 37     Chris   Jeff 11        None      None   NaN    11  None  None
3    Alpha 56  David 56     Oscar        None      None    56   NaN  None  None

CodePudding user response:

Besides Mustafa Aydın's approach. Here is the fixed version of your for-loop

for column in df.columns:
    if column.startswith("name"):
        age = f"age_{column[-1]}"
        df[age] = df[column].str.extract(r"(\d )")
        
print(df)

           id    name_1    name_2      name_3    name_4 age_1 age_2 age_3 age_4
0    21 Delta       Tom  Peter 17     Jeffrey  Henry 23   NaN    17   NaN    23
1    38 Bravo      Nick   Emma 53  Olivier 12      None   NaN    53    12   NaN
2  Charlie 37     Chris   Jeff 11        None      None   NaN    11   NaN   NaN
3    Alpha 56  David 56     Oscar        None      None    56   NaN   NaN   NaN
  • Related