I'm working on a chord dictionary and for that I need to group the different types of chords into smaller groups.
However, I'm having trouble with chord names that contain characters like # (e.g. C#, C#m) and with slash chords like D7/F# and A/B, which I would like to put into an "others" group.
I believe the problem lies in the regex, which I confess I'm not very familiar with.
Here is the code so far:
triadeMaior = pd.DataFrame({'triadeMaior': ['C','C#','Db','D','D#','Eb','E','F','F#','Gb','G','G#','Ab','A','A#','Bb','B']
})
triadeMenor = pd.DataFrame({'triadeMenor': ['Cm','C#m','Dbm','Dm','D#m','Ebm','Em','Fm','F#m','Gbm','Gm','G#m','Abm','Am','A#m','Bbm','Bm']
})
triadeDiminuta = pd.DataFrame({'triadeDiminuta':['Cdim','C#dim','Dbdim', 'Ddim', 'D#dim', 'Ebdim', 'Edim', 'Fdim', 'F#dim', 'Gbdim','Gdim',
'G#dim', 'Abdim', 'Adim', 'A#dim', 'Bbdim', 'Bdim']
})
triadeAumentada = pd.DataFrame({'triadeAumentada':['Caug','C#aug','Dbaug','Daug','D#aug','Ebaug','Eaug','Faug','F#aug','Gbaug','Gaug','G#aug','Abaug','Aaug','A#aug','Bbaug','Baug' ]
})
setima = pd.DataFrame({'setima':['C7','C#7','Db7','D7','D#7','Eb7','E7','F7','F#7','Gb7','G7','G#7','Ab7','A7','A#7','Bb7','B7']
})
setimaMenor = pd.DataFrame({'setimaMenor':['Cm7','C#m7','Dbm7','Dm7','D#m7','Ebm7','Em7','Fm7','F#m7','Gbm7','Gm7','G#m7','Abm7','Am7','A#m7','Bbm7','Bm7']
})
setimaMaior = pd.DataFrame({'setimaMaior':['Cmaj7', 'C#maj7', 'Dbmaj7', 'Dmaj7', 'D#maj7', 'Ebmaj7', 'Emaj7', 'Fmaj7', 'F#maj7','Gbmaj7','Gmaj7', 'G#maj7','Abmaj7','Amaj7','A#maj7','Bbmaj7','Bmaj7']
})
setimaMenorQuinta = pd.DataFrame({'setimaMenorQuinta':['Cm7b5','C#m7b5', 'Dbm7b5', 'Dm7b5', 'D#m7b5', 'Ebm7b5','Em7b5', 'Fm7b5', 'F#m7b5', 'Gbm7b5', 'Gm7b5', 'G#m7b5', 'Abm7b5', 'Am7b5', 'A#m7b5', 'Bbm7b5', 'Bm7b5']
})
sexta= pd.DataFrame({'sexta':['C6','C#6','Db6','D6','D#6','Eb6','E6','F6','F#6','Gb6','G6','G#6','Ab6','A6','A#6','Bb6','B6']
})
sextaMenor = pd.DataFrame({'sextaMenor': ['Cm6','C#m6','Dbm6','Dm6','D#m6','Ebm6','Em6','Fm6','F#m6','Gbm6','Gm6','G#m6','Abm6','Am6','A#m6',
'Bbm6','Bm6']
})
triadeMaior_pat = fr"\b({'|'.join(triadeMaior['triadeMaior'])})\b"
triadeMenor_pat = fr"\b({'|'.join(triadeMenor['triadeMenor'])})\b"
triadeDiminuta_pat = fr"\b({'|'.join(triadeDiminuta['triadeDiminuta'])})\b"
triadeAumentada_pat = fr"\b({'|'.join(triadeAumentada['triadeAumentada'])})\b"
setima_pat = fr"\b({'|'.join(setima['setima'])})\b"
setimaMenor_pat = fr"\b({'|'.join(setimaMenor['setimaMenor'])})\b"
setimaMaior_pat = fr"\b({'|'.join(setimaMaior['setimaMaior'])})\b"
setimaMenorQuinta_pat = fr"\b({'|'.join(setimaMenorQuinta['setimaMenorQuinta'])})\b"
sexta_pat = fr"\b({'|'.join(sexta['sexta'])})\b"
sextaMenor_pat = fr"\b({'|'.join(sextaMenor['sextaMenor'])})\b"
df['chordType'] = df['chords'].replace({triadeMaior_pat: 'triadeMaj',
triadeMenor_pat: 'triadeMen',
triadeDiminuta_pat: 'triadeDim',
triadeAumentada_pat: 'triadeAug',
setima_pat: 'setima',
setimaMenor_pat: 'setimaMen',
setimaMaior_pat: 'setimaMaj',
setimaMenorQuinta_pat : 'setimaMenQui',
sexta_pat:'sexta',
sextaMenor_pat: 'sextaMen',
r'\b(?!triadeMaj|triadeMen|triadeDim|triadeAug|setima|setimaMen|setimaMen|setimaMaj|sexta|sextaMen\b)\w ': 'outros'},
regex=True)
Here is an example of some results:
chords | chordType |
---|---|
C#, E7, Abm, Amaj7, E, Abm, C#m, E | triadeMaj#, setima, triadeMen, setimaMaj, triadeMaj, triadeMen, triadeMaj#outros, triadeMaj |
E, A7, G6, D/F#, F6, E, Em, D7/F#, Fmaj7, E, A7, G6, D7/F#, F6, Em, D, Dm7, E | triadeMaj, setima, sexta, triadeMaj/triadeMaj#, sexta, triadeMaj, triadeMen, setima/triadeMaj#, setimaMaj, triadeMaj, setima, sexta, setima/triadeMaj#, sexta, triadeMen, triadeMaj, setimaMen, triadeMaj |
As you can see, for chords containing # or /, the current code treats the chord as two parts instead of one.
Does anyone know how to fix this? Also, as I mentioned, I don't have much regex experience, so I'd like to know whether the code can be shortened and made more robust and clean.
CodePudding user response:
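This might be more cleanly done without regex, actually.
The \b word boundary only falls between a word character (letter, digit or underscore) and a non-word character. Since # and / are non-word characters, \bC\b matches the C inside C# and leaves the # behind, which is where results like triadeMaj# come from. Looking up the whole chord name in a dict avoids that.
This example uses only a small subset of your data, but you can fill out the chord_types dict with all your mappings.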
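import pandas as pd

# Map each chord name to its group; add the remaining chords as required.
chord_types = {'C': 'triadeMaj', 'C#': 'triadeMaj', 'C7': 'setima'}
df = pd.DataFrame(['C, C7', 'C, C#'], columns=('chords',))  # Toy example

def map_fn(cs):
    # Look up each chord; anything missing from the dict becomes 'outros'.
    return ', '.join(chord_types.get(c, 'outros') for c in cs)

# Drop the spaces, split on commas, then map every chord in the resulting list.
df['chordType'] = df['chords'].str.replace(' ', '').str.split(',').apply(map_fn)
print(df)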
giving:
chords chordType
0 C, C7 triadeMaj, setima
1 C, C# triadeMaj, triadeMaj
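If typing out every chord by hand gets tedious, the chord_types dict could probably be generated from the root notes and suffixes used in the question instead. The following is only a sketch under that assumption: the ten groups and their labels are copied from the question's code, while the names ROOTS, SUFFIX_TO_TYPE and classify are made up here for illustration.

```python
import pandas as pd

# Root notes as listed in the question (sharps and flats spelled separately).
ROOTS = ['C', 'C#', 'Db', 'D', 'D#', 'Eb', 'E', 'F', 'F#', 'Gb',
         'G', 'G#', 'Ab', 'A', 'A#', 'Bb', 'B']

# Chord suffix -> group label, taken from the question's lists and labels.
SUFFIX_TO_TYPE = {
    '': 'triadeMaj',
    'm': 'triadeMen',
    'dim': 'triadeDim',
    'aug': 'triadeAug',
    '7': 'setima',
    'm7': 'setimaMen',
    'maj7': 'setimaMaj',
    'm7b5': 'setimaMenQui',
    '6': 'sexta',
    'm6': 'sextaMen',
}

# Combine every root with every suffix to get the full lookup table.
chord_types = {root + suffix: label
               for root in ROOTS
               for suffix, label in SUFFIX_TO_TYPE.items()}


def classify(chords: str) -> str:
    """Map a comma-separated chord string to a comma-separated label string."""
    names = [c.strip() for c in chords.split(',')]
    # Whole-name lookup: anything not in the table (e.g. D7/F#) becomes 'outros'.
    return ', '.join(chord_types.get(name, 'outros') for name in names)


df = pd.DataFrame({'chords': ['C#, E7, Abm, Amaj7, E, Abm, C#m, E',
                              'E, A7, D/F#, D7/F#, Dm7']})
df['chordType'] = df['chords'].apply(classify)
print(df['chordType'].tolist())
```

Because the lookup is on the whole chord name, slash chords such as D/F# and D7/F# are simply not found and fall into 'outros', which seems to be what the question asks for. If you would rather classify them by the part before the slash, you could split each name on '/' before the lookup.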