Erroneous column concatenation Python

Time:08-13

I have a data frame where the first column should be filled by concatenating the other two columns whenever its value is empty.

 Cuenta CeCo   GLAccount   CeCoCeBe
  123 A           123         A
  234 S           234         S
  NaN             345         B
  NaN             987         A


for x in df1["Cuenta CeCo"].isna():
    if x:
        df1["Cuenta CeCo"] = df1["GLAccount"].apply(str) + " " + df1["CeCoCeBe"]
    else:
        df1["Cuenta CeCo"]

TYPES:

df1["Cuenta CeCo"] = dtype('O')
df1["GLAccount"] = dtype('float64')
df1["CeCoCeBe"] = dtype('O')

expected output:

Cuenta CeCo   GLAccount   CeCoCeBe
  123 A           123         A
  234 S           234         S
  345 B           345         B
  987 A           987         A

However, the concatenation seems to do something strange and gives me other numbers and letters:

 Cuenta CeCo   
  251 O
  471 B
  791 R
  341 O

Could someone help me understand why this happens and how to correct it to get my expected output?

CodePudding user response:

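Iterating over a DataFrame is typically bad practice and rarely does what you intend. Your loop walks the boolean values returned by isna(), and every time x is True the assignment overwrites the whole column rather than just that row. Iterating over the DataFrame itself is no better, since that yields the columns. Try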

for x in df:
    print(x)

and you will see it print the column headings.
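The two kinds of iteration can be compared in a minimal sketch. The frame is rebuilt from the sample table in the question, so the values and the float dtype of GLAccount are assumptions based on what the question reports, and the names labels, flags and overwritten are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed values)
df1 = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],  # float64, as the question reports
    "CeCoCeBe": ["A", "S", "B", "A"],
})

# Iterating over the DataFrame itself yields the column labels
labels = [x for x in df1]

# Iterating over the isna() Series yields one boolean per row
flags = [bool(x) for x in df1["Cuenta CeCo"].isna()]

# The right-hand side of the assignment in the question is a full-length
# Series, so assigning it replaces every row, not only the empty ones
overwritten = df1["GLAccount"].apply(str) + " " + df1["CeCoCeBe"]

print(labels)
print(flags)
print(overwritten.tolist())
```

Note that the float values are rendered with a trailing ".0" when converted with str, which is one more reason the whole-column assignment does not match the expected output.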

As for what you're trying to do, try this:

cols = ['Cuenta CeCo', 'GLAccount', 'CeCoCeBe']
mask = df[cols[0]].isna()
df.loc[mask, cols[0]] = df.loc[mask, cols[1]].map(str) + " " + df.loc[mask, cols[2]]

This builds a mask (a boolean Series that is True where the first column is NaN) and uses it to select only those rows. For the selected rows, the second column is converted to a string and concatenated with the third, and the result is assigned back through the same mask so that the other rows are left untouched.
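As a self-contained sketch of the masked assignment, the snippet below rebuilds the sample data from the question. Because the question reports GLAccount as float64, converting it directly to a string would likely give "345.0" rather than "345", so this version casts to int first. That cast is an addition for illustration and assumes the masked rows hold whole numbers with no missing values; the variable name account is also just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed values)
df = pd.DataFrame({
    "Cuenta CeCo": ["123 A", "234 S", np.nan, np.nan],
    "GLAccount": [123.0, 234.0, 345.0, 987.0],  # float64, as the question reports
    "CeCoCeBe": ["A", "S", "B", "A"],
})

# True only for the rows whose first column is empty
mask = df["Cuenta CeCo"].isna()

# Assumption: whole numbers and no NaN in the masked rows, so the int cast is safe
account = df.loc[mask, "GLAccount"].astype(int).astype(str)

# Fill only the masked rows; rows that already have a value are not touched
df.loc[mask, "Cuenta CeCo"] = account + " " + df.loc[mask, "CeCoCeBe"]

print(df)
```

If GLAccount could contain missing values in the rows being filled, the int cast would raise an error, so those rows would need to be handled separately.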

CodePudding user response:

import pandas as pd
import numpy as np

df = pd.DataFrame([
        ['123 A', 123, 'A'],
        ['234 S', 234, 'S'],
        [np.nan, 345, 'B'],
        [np.nan, 987, 'A']
    ], columns = ['Cuenta CeCo', 'GLAccount', 'CeCoCeBe']
)

def f(r):
    if pd.notna(r['Cuenta CeCo']):
        return r['Cuenta CeCo']
    else:
        return f"{r['GLAccount']} {r['CeCoCeBe']}"

df['Cuenta CeCo'] = df.apply(f, axis=1)
df

prints

index  Cuenta CeCo  GLAccount  CeCoCeBe
0      123 A        123        A
1      234 S        234        S
2      345 B        345        B
3      987 A        987        A