I have a pandas dataframe that I read from google sheet.
I then added the tag
column using:
df['tag'] = df.filter(like = 'Subject', axis = 1).apply(lambda x: np.where(x == 'Y', x.name,'')).values.tolist()
df['tag'] = df['tag'].apply(lambda x: [i for i in x if i!= ''])
Resultant sample DataFrame:
Id Name Subject-A Subject-B Total tag
0 1 A Y 100 [Subject-A]
1 2 B Y 98 [Subject-B]
2 3 C Y Y 191 [Subject-A, Subject-B]
3 4 D Y 100 [Subject-B]
4 5 E Y 95 [Subject-B]
Then I export the dataframe to a MySQL Database after converting the tag
column into a comma separated string by:
df['tag'] = df['tag'].map(lambda x : ', '.join(str(i) for i in x)).str.replace('Subject-','')
df
Id Name Subject-A Subject-B Total tag
0 1 A Y 100 A
1 2 B Y 98 B
2 3 C Y Y 91 A, B
3 4 D Y 100 B
4 5 E Y 95 B
df.to_sql(name = 'table_name', con = conn, if_exists = 'replace', index = False)
But in the MySQL database the tag
columns is:
A,
,B
A,B
,B
,B
My actual data has many such "Subject" columns so the result looks like:
, , , D
A, ,C,
...
...
Could someone please let me know why it's giving expected out in Pandas but when I save the dataframe in cloud SQL, the column looks different. The expected output in MySQL database is same as how the tag
column is appearing in Pandas.
CodePudding user response:
Here is alternative solution, seems some data related problem.
First filter Subject
columns with remove Subject-
and then use DataFrame.dot
with columns names with separator, last strip separator from right side:
df1 = df.filter(like = 'Subject').rename(columns=lambda x: x.replace('Subject-',''))
print (df1)
A B
0 Y NaN
1 NaN Y
2 Y Y
3 NaN Y
4 NaN Y
df['tag'] = df1.eq('Y').dot(df1.columns ', ').str.rstrip(', ')
print (df)
Id Name Subject-A Subject-B Total tag
0 1 A Y NaN 100 A
1 2 B NaN Y 98 B
2 3 C Y Y 191 A, B
3 4 D NaN Y 100 B
4 5 E NaN Y 95 B