How to split elements listed in one cell into separate cells in Pandas?-CodePudding

I was able to solve the problem described below, but as I am a newbie, I am not sure if my solution is good. I'd be grateful for any tips on how to do it in a more efficient and/or more elegant manner.

What I have:

...and so on (the table's quite big).

What I need:

How I solved it:

Load the file

df = pd.read_csv("survey_data_cleaned_ver2.csv")

Define a function

def transform_df(df, list_2, column_2, list_1, column_1='Respondent'): 

    for ind in df.index:
    
        elements = df[column_2][ind].split(';')
        num_of_elements = len(elements)

        for num in range(num_of_elements):
            list_1.append(df['Respondent'][ind])

        for el in elements:
            list_2.append(el)

Dropna because NaNs are floats and that was causing errors later on.

df_LanguageWorkedWith = df[['Respondent', 'LanguageWorkedWith']]
df_LanguageWorkedWith.dropna(subset='LanguageWorkedWith', inplace=True)

Create empty lists

Respondent_As_List = []
LanguageWorkedWith_As_List = []

Call the function

transform_df(df_LanguageWorkedWith, LanguageWorkedWith_As_List, 'LanguageWorkedWith', Respondent_As_List)

Tranform the lists into dataframes

df_Respondent = pd.DataFrame(Respondent_As_List, columns=["Respondent"])
df_LanguageWorked = pd.DataFrame(LanguageWorkedWith_As_List, columns=["LanguageWorkedWith"])

Concatenate those dataframes

df_LanguageWorkedWith_final = pd.concat([df_Respondent, df_LanguageWorked], axis=1)

And that's it.

The code and input file can be found on my GitHub: https://github.com/jarsonX/Temp_files

Thanks in advance!

CodePudding user response：

You can try like this. I haven't tested but it should work

df['LanguageWorkedWith'] = df['LanguageWorkedWith'].str.replace(';',',')

df =df.assign(LanguageWorkedWith=df['LanguageWorkedWith'].str.split(',')).explode('LanguageWorkedWith')

#Tested

      LanguageWorkedWith  Respondent
0                      C           4
0                    C             4
0                     C#           4
0                 Python           4
0                    SQL           4
...                  ...         ...
10319                 Go       25142