how to concatenate one column to another column when there is specific value in column in pandas-CodePudding

My data frame contains two Column column_1 and column_2 I want to concatenate column_1 current row with column_2 prev value when there is null in column_2 in pandas dataframe.

|column_1  |column_2    |
|--------  |--------    |
|A:6400    | A:6400     |
|A:6406    | A:6406     |
|A:5607    | A:5607     |
|B:40AB4   |  null      |          
|A:5609    | A:5609     |
|B:5607    |   null     |
|B:5608    |   null     |

The resultant data frame should look like this.

  |column_1  |column_2         |
  |--------  |--------------   |
  |A:6400    | A:6400          |
  |A:6406    | A:6406          |
  |A:5607    | A:5607          |
  |B:40AB4   | A:5607B:40AB4   |          
  |A:5609    | A:5609          |
  |B:5607    |A:5609B:5607     |
  |B:5608    | A:5609B:5608    |

when I tried using loop I get following error. I tried several method but id did not work.
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().any help would be highly appreciated.

CodePudding user response：

IIUC, you can use a mask and ffill:

# locate the 'null' values
s = df['column2'].eq('null')
# concatenate
df['column2'] = df['column2'].mask(s).ffill() df['column1'].where(s, '')

output:

   column1        column2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608

CodePudding user response：

import pandas as pd

df = pd.DataFrame({
    'column_1': ['A:6400', 'A:6406', 'A:5607', 'B:40AB4', 'A:5609', 'B:5607', 'B:5608'],
    'column_2': ['A:6400', 'A:6406', 'A:5607', 'null', 'A:5609', 'null', 'null']
})

# Select missing values
sel = df['column_2'] == 'null'
# Convert "null" values to `None`
df.loc[sel, 'column_2'] = None
# Forward-fill null values
df['column_2'] = df['column_2'].ffill()
# Append string from "column_1" for selected values
df.loc[sel, 'column_2'] = df.loc[sel, 'column_2']   df.loc[sel, 'column_1']

print(df)

gives

      col1           col2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608

CodePudding user response：

Idea is create missing values instead nulls and then join columns with forward filling misisng values only for nulls rows:

#if null is not Nonetype or missing values
df['column_2'] = df['column_2'].replace({'null':np.nan})

df.loc[df['column_2'].isna(), 'column_2'] = df['column_2'].ffill()   df['column_1']
print (df)
  column_1      column_2
0   A:6400        A:6400
1   A:6406        A:6406
2   A:5607        A:5607
3  B:40AB4  A:5607A:5607
4   A:5609        A:5609
5   B:5607  A:5609A:5609
6   B:5608  A:5609A:5609

CodePudding user response：

Use indexing:

df.loc[df['column_2'] == 'null', 'column_2'] = df['column_2'].replace('null', np.nan).ffill()   df['column_1']
print(df)

# Output
  column_1       column_2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608

CodePudding user response：

Please try this:

df.loc[df['column_2']=='null',['column_2']] = df['column_2'].shift() df['column_1']

The above code gives NaN if first row of column_2 is 'null'. To add 'column_1' value there, add this line to above code:

df.loc[df['column_2'].isna()==True,['column_2']] = df['column_1']