Home > front end >  how to concatenate one column to another column when there is specific value in column in pandas
how to concatenate one column to another column when there is specific value in column in pandas

Time:03-16

My data frame contains two Column column_1 and column_2 I want to concatenate column_1 current row with column_2 prev value when there is null in column_2 in pandas dataframe.

|column_1  |column_2    |
|--------  |--------    |
|A:6400    | A:6400     |
|A:6406    | A:6406     |
|A:5607    | A:5607     |
|B:40AB4   |  null      |          
|A:5609    | A:5609     |
|B:5607    |   null     |
|B:5608    |   null     |

The resultant data frame should look like this.

  |column_1  |column_2         |
  |--------  |--------------   |
  |A:6400    | A:6400          |
  |A:6406    | A:6406          |
  |A:5607    | A:5607          |
  |B:40AB4   | A:5607B:40AB4   |          
  |A:5609    | A:5609          |
  |B:5607    |A:5609B:5607     |
  |B:5608    | A:5609B:5608    |

when I tried using loop I get following error. I tried several method but id did not work.
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().any help would be highly appreciated.

CodePudding user response:

IIUC, you can use a mask and ffill:

# locate the 'null' values
s = df['column2'].eq('null')
# concatenate
df['column2'] = df['column2'].mask(s).ffill() df['column1'].where(s, '')

output:

   column1        column2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608

CodePudding user response:

import pandas as pd

df = pd.DataFrame({
    'column_1': ['A:6400', 'A:6406', 'A:5607', 'B:40AB4', 'A:5609', 'B:5607', 'B:5608'],
    'column_2': ['A:6400', 'A:6406', 'A:5607', 'null', 'A:5609', 'null', 'null']
})

# Select missing values
sel = df['column_2'] == 'null'
# Convert "null" values to `None`
df.loc[sel, 'column_2'] = None
# Forward-fill null values
df['column_2'] = df['column_2'].ffill()
# Append string from "column_1" for selected values
df.loc[sel, 'column_2'] = df.loc[sel, 'column_2']   df.loc[sel, 'column_1']

print(df)

gives

      col1           col2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608

CodePudding user response:

Idea is create missing values instead nulls and then join columns with forward filling misisng values only for nulls rows:

#if null is not Nonetype or missing values
df['column_2'] = df['column_2'].replace({'null':np.nan})

df.loc[df['column_2'].isna(), 'column_2'] = df['column_2'].ffill()   df['column_1']
print (df)
  column_1      column_2
0   A:6400        A:6400
1   A:6406        A:6406
2   A:5607        A:5607
3  B:40AB4  A:5607A:5607
4   A:5609        A:5609
5   B:5607  A:5609A:5609
6   B:5608  A:5609A:5609

CodePudding user response:

Use indexing:

df.loc[df['column_2'] == 'null', 'column_2'] = df['column_2'].replace('null', np.nan).ffill()   df['column_1']
print(df)

# Output
  column_1       column_2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608

CodePudding user response:

Please try this:

df.loc[df['column_2']=='null',['column_2']] = df['column_2'].shift() df['column_1']

The above code gives NaN if first row of column_2 is 'null'. To add 'column_1' value there, add this line to above code:

df.loc[df['column_2'].isna()==True,['column_2']] = df['column_1']
  • Related