Labelling a pandas dataframe based on data in a column

Time:09-27

I am trying to add integer labels (0, 1, 2, etc.) to a pandas DataFrame, so that rows whose paths share the same theta_* directory get the same label.

The code I have so far is:

import pandas as pd
# Initialise the data as a dict of lists.
data = {'Path': ['/content/gdrive/MyDrive/DOA_dataset/theta_30/s1',
                 '/content/gdrive/MyDrive/DOA_dataset/theta_30/s2',
                 '/content/gdrive/MyDrive/DOA_dataset/theta_60/s1',
                 '/content/gdrive/MyDrive/DOA_dataset/theta_60/s2',
                 '/content/gdrive/MyDrive/DOA_dataset/theta_90/s1',
                 '/content/gdrive/MyDrive/DOA_dataset/theta_90/s2']}

# Create DataFrame
df = pd.DataFrame(data)

Expected Output

                                   Path                             Label
0   /content/gdrive/MyDrive/DOA_dataset/theta_30/s1                  0
1   /content/gdrive/MyDrive/DOA_dataset/theta_30/s2                  0
2   /content/gdrive/MyDrive/DOA_dataset/theta_60/s1                  1
3   /content/gdrive/MyDrive/DOA_dataset/theta_60/s2                  1
4   /content/gdrive/MyDrive/DOA_dataset/theta_90/s1                  2
5   /content/gdrive/MyDrive/DOA_dataset/theta_90/s2                  2

I tried several approaches, such as matching with a regular-expression pattern, but none of them worked. Any help would be appreciated.

CodePudding user response:

Try splitting off the last path component with rsplit, then calling factorize on what remains:

df['new'] = df.Path.str.rsplit('/',n=1).str[0].factorize()[0]
df
                                              Path  new
0  /content/gdrive/MyDrive/DOA_dataset/theta_30/s1    0
1  /content/gdrive/MyDrive/DOA_dataset/theta_30/s2    0
2  /content/gdrive/MyDrive/DOA_dataset/theta_60/s1    1
3  /content/gdrive/MyDrive/DOA_dataset/theta_60/s2    1
4  /content/gdrive/MyDrive/DOA_dataset/theta_90/s1    2
5  /content/gdrive/MyDrive/DOA_dataset/theta_90/s2    2
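
To see why this works, the one-liner can be broken into two steps. This is a minimal sketch using the data from the question; the names parent, codes and uniques are illustrative, and the result is stored in a Label column (as in the expected output) rather than new.

```python
import pandas as pd

df = pd.DataFrame({'Path': [
    '/content/gdrive/MyDrive/DOA_dataset/theta_30/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_30/s2',
    '/content/gdrive/MyDrive/DOA_dataset/theta_60/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_60/s2',
    '/content/gdrive/MyDrive/DOA_dataset/theta_90/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_90/s2',
]})

# Step 1: split once from the right on '/', and keep the left part,
# i.e. the parent directory such as '.../DOA_dataset/theta_30'.
parent = df['Path'].str.rsplit('/', n=1).str[0]

# Step 2: factorize numbers the distinct values in order of first
# appearance and returns (codes, uniques).
codes, uniques = pd.factorize(parent)

df['Label'] = codes
print(df)
```

Because factorize assigns codes in order of first appearance, the labels follow the row order of the DataFrame, not the numeric value of the angle in the directory name.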

CodePudding user response:

If you want your DataFrame to have two columns built from two lists, say x and y, you need something like:

df = pd.DataFrame({"First_column":x,"Second_column":y})

So you could build x and y in a loop over your files before creating the DataFrame.
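
One way such a loop might look, as a rough sketch: it assumes the paths are already available in a list (here the six paths from the question, hard-coded), and that the label is determined by the parent directory. The names paths, seen and folder are hypothetical and not part of the answer above.

```python
import pandas as pd

# Assumption: the file paths have already been collected into a list.
paths = [
    '/content/gdrive/MyDrive/DOA_dataset/theta_30/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_30/s2',
    '/content/gdrive/MyDrive/DOA_dataset/theta_60/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_60/s2',
    '/content/gdrive/MyDrive/DOA_dataset/theta_90/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_90/s2',
]

x = []      # paths
y = []      # integer labels
seen = {}   # maps each parent directory to its label

for p in paths:
    # Parent directory: everything before the last '/'.
    folder = p.rsplit('/', 1)[0]
    # Give a new directory the next unused label.
    if folder not in seen:
        seen[folder] = len(seen)
    x.append(p)
    y.append(seen[folder])

df = pd.DataFrame({"Path": x, "Label": y})
print(df)
```

For the data in the question this gives the same labels as the factorize approach, at the cost of an explicit Python loop.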
