I have three columns in my data frame:
CaseID | FirstName | LastName |
---|---|---|
1 | rohit | pandey |
2 | rai | |
3 |
In the output, I am trying to add the fourth column and have values as LastName,FirstName
I have this Python code
df_ids['ContactName'] = df_ids[['LastName', 'FirstName']].agg(lambda x: ','.join(x.values), axis=1)
But it appends the blank values also which something like below that I am able to get like below:
CaseID | FirstName | LastName | ContactName |
---|---|---|---|
1 | rohit | pandey | pandey, rohit |
2 | rai | , rai | |
3 | , |
The expected output:
CaseID | FirstName | LastName | ContactName |
---|---|---|---|
1 | rohit | pandey | pandey, rohit |
2 | rai | rai | |
3 |
CodePudding user response:
This is the easy way, using apply
. apply
takes each row one at a time and passes it to the given function.
import pandas as pd
data = [
[ 1, 'rohit', 'pandey' ],
[ 2, '', 'rai' ],
[ 3, '', '' ]
]
df = pd.DataFrame(data, columns=['CaseID', 'FirstName', 'LastName'] )
def fixup( row ):
if not row['LastName']:
return ''
if not row['FirstName']:
return row['LastName']
return row['LastName'] ', ' row['FirstName']
print(df)
df['Contact1'] = df.apply(fixup, axis=1)
print(df)
Output:
CaseID FirstName LastName
0 1 rohit pandey
1 2 rai
2 3
CaseID FirstName LastName Contact1
0 1 rohit pandey pandey, rohit
1 2 rai rai
2 3
CodePudding user response:
Two (actually 1 and a half) other options, which are very close to your attempt:
df_ids['ContactName'] = (
df_ids[['LastName', 'FirstName']]
.agg(lambda row: ', '.join(name for name in row if name), axis=1)
)
or
df_ids['ContactName'] = (
df_ids[['LastName', 'FirstName']]
.agg(lambda row: ', '.join(filter(None, row)), axis=1)
)
In both version the ''
s are filtered out:
- Via a generator expression: The
if name
makes sure that''
isn't allowed, because its truth value isFalse
- tryprint(bool(''))
. - By the built-in function
filter()
with the first argument set toNone
.
CodePudding user response:
Someone has added PySpark tag. This is PySpark version:
from pyspark.sql import functions as F
df_ids = df_ids.replace('', None) # Replaces empty strings with nulls
df_ids = df_ids.withColumn('ContactName', F.concat_ws(', ', 'LastName', 'FirstName'))
df_ids = df_ids.fillna('') # Replaces nulls back to empty strings
df_ids.show()
# ------ --------- -------- -------------
# |CaseID|FirstName|LastName| ContactName|
# ------ --------- -------- -------------
# | 1| rohit| pandey|pandey, rohit|
# | 2| | rai| rai|
# | 3| | | |
# ------ --------- -------- -------------