Home > OS >  convert data to 1NF
convert data to 1NF

Time:10-24

I have a dataset like this

import pandas as pd 
df = pd.read_csv("music.csv") 
df
name date singer language phase
1 Yes or No 02.01.20 Benjamin Smith en 1
2 Parabens 01.06.21 Rafael Galvao;Simon Murphy pt;en 2
3 Love 12.11.20 Michaela Condell en 1
4 Paz 11.07.19 Ana Perez; Eduarda Pinto es;pt 3
5 Stop 12.01.21 Michael Conway;Gabriel Lee en;en 1
6 Shalom 18.06.21 Shimon Cohen hebr 1
7 Habibi 22.12.19 Fuad Khoury ar 3
8 viva 01.08.21 Veronica Barnes en 1
9 Buznanna 23.09.20 Kurt Azzopardi mt 1
10 Frieden 21.05.21 Gabriel Meier dt 1
11 Uruguay 11.04.21 Julio Ramirez es 1
12 Beautiful 17.03.21 Cameron Armstrong en 3
13 Holiday 19.06.20 Bianca Watson en 3
14 Kiwi 21.10.20 Lachlan McNamara en 1
15 Amore 01.12.20 Vasco Grimaldi it 1
16 La vie 28.04.20 Victor Dubois fr 3
17 Yom 21.02.20 Ori Azerad; Naeem al-Hindi hebr;ar 2
18 Elefthería 15.06.19 Nikolaos Gekas gr 1

This table is not in 1NF. I would like to convert in the form of pd.DataFrame, that satiesfy 1NF.

How can I do that?

I did this, but seems not work

import pandas as pd
import numpy as np
 
df = pd.read_csv("music.csv") 
 
lens = list(map(len, df['singer','language].values))
 
res = pd.DataFrame({'name': np.repeat(
    df['name'], lens), 'singer': np.concatenate(df['singer'].values),'language': np.concatenate(df['language'].values)})
 
print(res)

It should satisfy only 1NF not 3NF and so on.

CodePudding user response:

Split language and singer by values and use pd.explode:

df['language']=df['language'].str.split(';')
df['singer']=df['singer'].str.split(";")
df.explode(['language','singer'])
Id name date singer language phase
1 Yes or No 02.01.20 Benjamin Smith en 1
2 Parabens 01.06.21 Rafael Galvao pt 2
2 Parabens 01.06.21 Simon Murphy en 2
3 Love 12.11.20 Michaela Condell en 1
4 Paz 11.07.19 Ana Perez es 3
4 Paz 11.07.19 Eduarda Pinto pt 3
5 Stop 12.01.21 Michael Conway en 1
5 Stop 12.01.21 Gabriel Lee en 1
6 Shalom 18.06.21 Shimon Cohen hebr 1
7 Habibi 22.12.19 Fuad Khoury ar 3
8 viva 01.08.21 Veronica Barnes en 1
9 Buznanna 23.09.20 Kurt Azzopardi mt 1
10 Frieden 21.05.21 Gabriel Meier dt 1
11 Uruguay 11.04.21 Julio Ramirez es 1
12 Beautiful 17.03.21 Cameron Armstrong en 3
13 Holiday 19.06.20 Bianca Watson en 3
14 Kiwi 21.10.20 Lachlan McNamara en 1
15 Amore 01.12.20 Vasco Grimaldi it 1
16 La vie 28.04.20 Victor Dubois fr 3
17 Yom 21.02.20 Ori Azerad hebr 2
17 Yom 21.02.20 Naeem al-Hindi ar 2
18 Elefthería 15.06.19 Nikolaos Gekas gr 1
  • Related