I have a DataFrame in which, whenever a value in the first column is empty, I need to fill it with the concatenation of the other two columns.
Cuenta CeCo | GLAccount | CeCoCeBe |
---|---|---|
123 A | 123 | A |
234 S | 234 | S |
NaN | 345 | B |
NaN | 987 | A |
for x in df1["Cuenta CeCo"].isna():
    if x:
        df1["Cuenta CeCo"] = df1["GLAccount"].apply(str) + " " + df1["CeCoCeBe"]
    else:
        df1["Cuenta CeCo"]
Column types:
Cuenta CeCo: dtype('O')
GLAccount: dtype('float64')
CeCoCeBe: dtype('O')
Expected output:
Cuenta CeCo | GLAccount | CeCoCeBe |
---|---|---|
123 A | 123 | A |
234 S | 234 | S |
345 B | 345 | B |
987 A | 987 | A |
However, the concatenation seems to do something strange, and the column ends up with other numbers and letters:
Cuenta CeCo |
---|
251 O |
471 B |
791 R |
341 O |
Could someone help me understand why this happens and how to fix it so that I get my expected output?
Answer:
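Iterating over DataFrames is typically bad practice and rarely what you intend. Iterating over a DataFrame directly yields its column labels, not its rows. Try
for x in df:
    print(x)
and you will see it print the column headings. Your loop iterates over the Series returned by isna(), so x is each True/False value; but every time x is True, the assignment replaces the entire "Cuenta CeCo" column rather than just the current row, so the rows that already had a value are overwritten as well.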
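A minimal sketch reproducing the question's loop on the sample rows, assuming GLAccount is float64 as the question reports. It shows both effects described above: every row gets overwritten, and converting a float with str produces a trailing ".0". It does not reproduce the exact values shown in the question, which presumably come from the real data.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; GLAccount is float64 (an assumption taken from the reported dtypes).
df1 = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],
    "CeCoCeBe": ["A", "S", "B", "A"],
})

# Iterating over a DataFrame yields its column labels...
print(list(df1))
# ...while iterating over a Series yields its values (here, True/False).
print(list(df1["Cuenta CeCo"].isna()))

# The question's loop: each True rebinds the WHOLE column, not one row,
# so "123 A" and "234 S" are replaced too, and str(123.0) adds ".0".
for x in df1["Cuenta CeCo"].isna():
    if x:
        df1["Cuenta CeCo"] = df1["GLAccount"].apply(str) + " " + df1["CeCoCeBe"]

print(df1["Cuenta CeCo"].tolist())
```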
As for what you're trying to do, try this:
cols = ['Cuenta CeCo', 'GLAccount', 'CeCoCeBe']
mask = df[cols[0]].isna()
df.loc[mask, cols[0]] = df.loc[mask, cols[1]].map(str) + " " + df.loc[mask, cols[2]]
This builds a mask (a Series of True and False, True where 'Cuenta CeCo' is NaN) and uses it to select only the NaN rows. For those rows it converts the second column to strings, concatenates a space and the third column, and writes the result back, again using the mask so that only the NaN rows change.
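A self-contained version of the mask approach, run on sample data that assumes GLAccount is float64 as the question states. Because str(345.0) is "345.0", this sketch casts the account numbers to int before converting them to strings; that cast assumes the masked GLAccount values are whole numbers and not NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],  # float64, as reported in the question
    "CeCoCeBe": ["A", "S", "B", "A"],
})

cols = ["Cuenta CeCo", "GLAccount", "CeCoCeBe"]
mask = df[cols[0]].isna()  # True only for the rows that need filling

# Assumption: GLAccount holds whole numbers with no NaN on the masked rows,
# so int casting is safe and turns 345.0 into "345" instead of "345.0".
account = df.loc[mask, cols[1]].astype(int).astype(str)
df.loc[mask, cols[0]] = account + " " + df.loc[mask, cols[2]]

print(df)
```

Only the two masked rows are written; "123 A" and "234 S" are left untouched.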
Answer:
import pandas as pd
import numpy as np
df = pd.DataFrame([
    ['123 A', 123, 'A'],
    ['234 S', 234, 'S'],
    [np.nan, 345, 'B'],
    [np.nan, 987, 'A']
], columns=['Cuenta CeCo', 'GLAccount', 'CeCoCeBe'])
def f(r):
    # Keep an existing value; otherwise build "GLAccount CeCoCeBe"
    if pd.notna(r['Cuenta CeCo']):
        return r['Cuenta CeCo']
    else:
        return f"{r['GLAccount']} {r['CeCoCeBe']}"
df['Cuenta CeCo'] = df.apply(f, axis=1)
df
prints
index | Cuenta CeCo | GLAccount | CeCoCeBe |
---|---|---|---|
0 | 123 A | 123 | A |
1 | 234 S | 234 | S |
2 | 345 B | 345 | B |
3 | 987 A | 987 | A |
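As a vectorized alternative to the row-wise apply (a different technique from both answers), Series.fillna accepts a Series and fills the missing values by index alignment. This is a sketch under the same assumptions as the question's data: GLAccount is float64 holding whole numbers with no missing values anywhere, since the int cast below would fail on NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],  # assumed float64, no NaN
    "CeCoCeBe": ["A", "S", "B", "A"],
})

# Build the candidate string for every row in one vectorized step...
combined = df["GLAccount"].astype(int).astype(str) + " " + df["CeCoCeBe"]
# ...then use it only where "Cuenta CeCo" is missing (fillna aligns on the index).
df["Cuenta CeCo"] = df["Cuenta CeCo"].fillna(combined)

print(df)
```

Compared with apply(axis=1), this avoids calling a Python function per row, which usually matters only on large frames.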