My data frame contains two columns, column_1 and column_2. Whenever column_2 is null, I want to concatenate the previous value of column_2 with the current row's column_1 value in my pandas DataFrame.
|column_1 |column_2 |
|-------- |-------- |
|A:6400 | A:6400 |
|A:6406 | A:6406 |
|A:5607 | A:5607 |
|B:40AB4 | null |
|A:5609 | A:5609 |
|B:5607 | null |
|B:5608 | null |
The resultant data frame should look like this.
|column_1 |column_2 |
|-------- |-------------- |
|A:6400 | A:6400 |
|A:6406 | A:6406 |
|A:5607 | A:5607 |
|B:40AB4 | A:5607B:40AB4 |
|A:5609 | A:5609 |
|B:5607 | A:5609B:5607 |
|B:5608 | A:5609B:5608 |
When I tried using a loop I got the following error. I tried several methods but it did not work:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any help would be highly appreciated.
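For context, this error is typically raised when a whole boolean Series is used where Python expects a single True/False, e.g. in an `if` test inside a row loop. The snippet below is a hypothetical sketch of that pattern (the asker's actual loop is not shown):
import pandas as pd

df = pd.DataFrame({
    'column_1': ['A:6400', 'B:40AB4'],
    'column_2': ['A:6400', 'null'],
})

# comparing a whole column inside `if` yields a boolean Series, not a single bool,
# so this raises "The truth value of a Series is ambiguous..."
if df['column_2'] == 'null':
    pass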
CodePudding user response:
IIUC, you can use a mask and ffill:
# locate the 'null' values
s = df['column_2'].eq('null')
# forward-fill the previous value over those rows and append column_1 there
df['column_2'] = df['column_2'].mask(s).ffill() + df['column_1'].where(s, '')
output:
  column_1       column_2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608
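Equivalently, the one-liner above can be spelled out in intermediate steps; this is the same logic, just easier to follow:
# previous non-'null' value carried forward onto the 'null' rows
filled = df['column_2'].mask(s).ffill()
# column_1 only on the 'null' rows, empty string elsewhere
suffix = df['column_1'].where(s, '')
df['column_2'] = filled + suffix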
CodePudding user response:
import pandas as pd
df = pd.DataFrame({
'column_1': ['A:6400', 'A:6406', 'A:5607', 'B:40AB4', 'A:5609', 'B:5607', 'B:5608'],
'column_2': ['A:6400', 'A:6406', 'A:5607', 'null', 'A:5609', 'null', 'null']
})
# Select missing values
sel = df['column_2'] == 'null'
# Convert "null" values to `None`
df.loc[sel, 'column_2'] = None
# Forward-fill null values
df['column_2'] = df['column_2'].ffill()
# Append string from "column_1" for selected values
df.loc[sel, 'column_2'] = df.loc[sel, 'column_2'] + df.loc[sel, 'column_1']
print(df)
This gives:
  column_1       column_2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608
CodePudding user response:
The idea is to create real missing values in place of the 'null' strings, and then join the columns, forward filling the missing values only for the 'null' rows:
import numpy as np

# if 'null' is a string rather than NoneType/real missing values, convert it first
df['column_2'] = df['column_2'].replace({'null': np.nan})
# forward fill and append column_1, but only on the rows that were missing
df.loc[df['column_2'].isna(), 'column_2'] = df['column_2'].ffill() + df['column_1']
print(df)
  column_1       column_2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608
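If column_2 already holds real missing values (NaN/None) rather than the string 'null', the replace step can be skipped; a minimal sketch under that assumption:
# assuming column_2 already contains real NaN/None instead of the string 'null'
mask = df['column_2'].isna()
df.loc[mask, 'column_2'] = df['column_2'].ffill() + df['column_1']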
CodePudding user response:
Use indexing:
import numpy as np

df.loc[df['column_2'] == 'null', 'column_2'] = df['column_2'].replace('null', np.nan).ffill() + df['column_1']
print(df)
# Output
  column_1       column_2
0   A:6400         A:6400
1   A:6406         A:6406
2   A:5607         A:5607
3  B:40AB4  A:5607B:40AB4
4   A:5609         A:5609
5   B:5607   A:5609B:5607
6   B:5608   A:5609B:5608
CodePudding user response:
Please try this:
df.loc[df['column_2'] == 'null', ['column_2']] = df['column_2'].shift() + df['column_1']
The above code gives NaN if the first row of column_2 is 'null'. To put the column_1 value there instead, add this line after the code above:
df.loc[df['column_2'].isna(), ['column_2']] = df['column_1']
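Note that shift() only looks one row back, so with consecutive 'null' rows (rows 5 and 6 in the sample data) the value carried over would itself be 'null'. An ffill-based variant, along the lines of the earlier answers, covers that case:
# sketch of an ffill-based variant that also handles consecutive 'null' rows
s = df['column_2'].eq('null')
df.loc[s, 'column_2'] = df['column_2'].mask(s).ffill() + df['column_1']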