Pandas dataframe: Creating a new column based on data from other columns

Time:12-08

I have a pandas dataframe, df:

   foo         bar
0  Supplies   Sample X
1  xyz        A   
2  xyz        B
3  Supplies   Sample Y
4  xyz        C
5  Supplies   Sample Z
6  xyz        D
7  xyz        E
8  xyz        F

I want to create a new df that looks something like this:

   bar
0  Sample X - A
1  Sample X - B
2  Sample Y - C
3  Sample Z - D
4  Sample Z - E
5  Sample Z - F

I am new to pandas and don't know how to achieve this. Could someone please help?

I tried DataFrame.iterrows, but had no luck.

CodePudding user response:

You can use boolean indexing and ffill:

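# True for the item rows, False for the 'Supplies' header rows
m = df['foo'].ne('Supplies')

# blank out the item rows, forward-fill the sample names, then join with the items
out = (df['bar'].mask(m).ffill()[m]
       .add(' - ' + df.loc[m, 'bar'])
       .to_frame().reset_index(drop=True)
       )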

Output:

            bar
0  Sample X - A
1  Sample X - B
2  Sample Y - C
3  Sample Z - D
4  Sample Z - E
5  Sample Z - F
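
To make the chain above easier to follow, here is a self-contained sketch that rebuilds the example frame from the question and runs the same steps one at a time. The intermediate name `headers` is only for illustration and is not part of the original answer.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "foo": ["Supplies", "xyz", "xyz", "Supplies", "xyz",
            "Supplies", "xyz", "xyz", "xyz"],
    "bar": ["Sample X", "A", "B", "Sample Y", "C",
            "Sample Z", "D", "E", "F"],
})

# True for the item rows, False for the 'Supplies' header rows
m = df["foo"].ne("Supplies")

# Replace item rows with NaN, then carry each sample name down to the rows below it
headers = df["bar"].mask(m).ffill()

# Keep only the item rows and join "<sample> - <item>"
out = (headers[m] + " - " + df.loc[m, "bar"]).to_frame().reset_index(drop=True)

print(out)
```

Note that this relies on every item row coming after the 'Supplies' row it belongs to, as in the question's data.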

CodePudding user response:

You can do:

s = (df["bar"].mask(df.foo == "xyz").ffill() + "-" + df["bar"]).reindex(
    df.loc[df.foo == "xyz"].index
)

df = s.to_frame()

Output of print(df):

           bar
1   Sample X-A
2   Sample X-B
4   Sample Y-C
6   Sample Z-D
7   Sample Z-E
8   Sample Z-F
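
This result keeps the original row labels (1, 2, 4, 6, 7, 8) and joins with "-" rather than " - ". If the exact layout from the question is wanted, a small variation along the following lines should work. It is a sketch that rebuilds the question's data; the names `is_item`, `combined` and `result` are illustrative only.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "foo": ["Supplies", "xyz", "xyz", "Supplies", "xyz",
            "Supplies", "xyz", "xyz", "xyz"],
    "bar": ["Sample X", "A", "B", "Sample Y", "C",
            "Sample Z", "D", "E", "F"],
})

# True for the item rows
is_item = df["foo"].eq("xyz")

# Forward-fill the sample names over the item rows and join with " - "
combined = df["bar"].mask(is_item).ffill() + " - " + df["bar"]

# Keep the item rows only and renumber them from 0
result = combined[is_item].to_frame().reset_index(drop=True)

print(result)
```

Here `reset_index(drop=True)` discards the old labels instead of adding them as a new column.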