How to generate columns from count?

I need to count the most common values in each row and, if possible, create a column for each value with its respective count.

Currently, I can count and group the values of each row. However, I don't know if it is possible to generate columns for the observed values.

Here's an example of the data:

chordType	commonChord
triadeMaior, setima, triadeMenor, setimaMaior, triadeMaior, triadeMenor, triadeMenor, triadeMaior	triadeMaior (3), triadeMenor (3), setima (1), setimaMaior (1)
triadeMenor, setima, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setima, setima, setimaMenor, triadeMaior, triadeMaior, setimaMaior, triadeMaior, triadeMaior, triadeMenor, setima, triadeMaior, sexta, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setimaMenor	triadeMaior (11), setima (4), setimaMenor (4), triadeMenor (2), setimaMaior (1), sexta (1)

To generate the table above, I used the following code:

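import pandas as pd

# Split the comma-separated chords into a list per row and strip whitespace
df.chordType = df.chordType.str.split(", ").apply(lambda x: [v.strip() for v in x])
# Keep only rows whose list is non-empty
df = df[df.chordType.str.len() > 0]


# Build a "name (count)" summary per row, most frequent first
df["commonChord"] = df.chordType.apply(
    lambda x: ", ".join(
        f"{a} ({b})" for a, b in pd.Series(x).value_counts().items()
    )
)

# Turn the lists back into comma-separated strings for display
df.chordType = df.chordType.apply(", ".join)
df.head(5)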

My goal is to get a table with one column for each observed value (e.g. triadeMaior, triadeMenor) holding that value's count, i.e. the number in parentheses (3 and 3 in the first row).

Is this possible?
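A minimal sketch of one way to do this, assuming the chord names are always separated by ", " as in the sample: count each row's list with collections.Counter and let the DataFrame constructor turn the union of chord names into columns. The two sample rows are taken from the question; `chords`, `counts` and `result` are illustrative names, not part of the original code.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "chordType": [
        "triadeMaior, setima, triadeMenor, setimaMaior, triadeMaior, triadeMenor, triadeMenor, triadeMaior",
        "triadeMenor, setima, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setima, setima, setimaMenor, triadeMaior, triadeMaior, setimaMaior, triadeMaior, triadeMaior, triadeMenor, setima, triadeMaior, sexta, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setimaMenor",
    ]
})

# Assumption: ", " is the only separator used in chordType.
chords = df["chordType"].str.split(", ")

# One dict of chord -> count per row; the constructor makes one column per
# distinct chord and leaves NaN where a chord does not occur in that row.
counts = pd.DataFrame([dict(Counter(row)) for row in chords], index=df.index)
counts = counts.fillna(0).astype(int)

# Put the original column and the count columns side by side.
result = pd.concat([df, counts], axis=1)
print(result)
```

Passing `index=df.index` keeps the count rows aligned with the original rows, so the concat also works after rows have been filtered out.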

Answer:

If I understand correctly, you could start by splitting the 'chordType' column on ', ' (make sure this splitting criterion fits your data) and then exploding it, so that each chord gets its own row while keeping the index of the row it came from. Then reset the index of the exploded series, which turns the original row number into a column named 'index'. Now you can group by that 'index' column and the 'chordType' column to get the count of each chord in each row. Unstacking the grouped result gives one column per chord, which can easily be concatenated with your initial data frame.

Code:

import pandas as pd

df = pd.DataFrame({
    "chordType": ["triadeMaior, setima, triadeMenor, setimaMaior, triadeMaior, triadeMenor, triadeMenor, triadeMaior", "triadeMenor, setima, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setima, setima, setimaMenor, triadeMaior, triadeMaior, setimaMaior, triadeMaior, triadeMaior, triadeMenor, setima, triadeMaior, sexta, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setimaMenor"]
})

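counts = (df["chordType"].str.split(", ").explode()   # one row per chord, original index kept
          .reset_index()                              # original row number -> "index" column
          .groupby(["index", "chordType"]).size()     # occurrences of each chord per row
          .unstack(fill_value=0))                     # one int column per chord, 0 where absent
pd.concat([df, counts], axis=1)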

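A variant of the same idea, sketched under the same ", " separator assumption: grouping the exploded series by its index level with groupby(level=0) and calling value_counts skips the reset_index step, so the code no longer depends on the auto-generated "index" column name. `exploded`, `counts` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "chordType": [
        "triadeMaior, setima, triadeMenor, setimaMaior, triadeMaior, triadeMenor, triadeMenor, triadeMaior",
        "triadeMenor, setima, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setima, setima, setimaMenor, triadeMaior, triadeMaior, setimaMaior, triadeMaior, triadeMaior, triadeMenor, setima, triadeMaior, sexta, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setimaMenor",
    ]
})

# One row per chord; explode repeats the original index label for each chord.
exploded = df["chordType"].str.split(", ").explode()

# Count each chord within its source row (index level 0), then move the
# chord names from the inner index level into columns, filling gaps with 0.
counts = exploded.groupby(level=0).value_counts().unstack(fill_value=0)

result = pd.concat([df, counts], axis=1)
print(result)
```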

Answer:

Here's a way to do what your question asks. It also produces an intermediate result that may be useful on its own: the commonChord counts in a single column, sorted by frequency within each row:

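import pandas as pd

# df is the frame from the question, with a "chordType" string column
df = (df
    .assign(commonChord=df.chordType.str.replace(' ', '').str.split(','))  # list of chords per row
    .explode('commonChord')                                             # one row per chord
    .assign(count=0).groupby(['chordType', 'commonChord']).count()      # occurrences per row
    .sort_values(['chordType', 'count'], ascending=[True, False])       # most frequent first
    )
print('', 'df with commonChord and count columns', df, sep='\n')
# Pivot commonChord into columns, keeping only the counts (0 where a chord is absent)
df = df['count'].unstack(fill_value=0)
print('', 'df with one column for each commonChord and values equal to count per row', df, sep='\n')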

Output:

df with commonChord and count columns
                                                                count
chordType                                          commonChord
triadeMaior, setima, triadeMenor, setimaMaior, ... triadeMaior      3
                                                   triadeMenor      3
                                                   setima           1
                                                   setimaMaior      1
triadeMenor, setima, triadeMaior, setimaMenor, ... triadeMaior     11
                                                   setima           4
                                                   setimaMenor      4
                                                   triadeMenor      2
                                                   setimaMaior      1
                                                   sexta            1

df with one column for each commonChord and values equal to count per row
commonChord                                         setima  setimaMaior  setimaMenor  sexta  triadeMaior  triadeMenor
chordType
triadeMaior, setima, triadeMenor, setimaMaior, ...       1            1            0      0            3            3
triadeMenor, setima, triadeMaior, setimaMenor, ...       4            1            4      1           11            2
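Because the code above replaces df with a table indexed by the chordType string, you may want to attach the counts back onto the original rows. A minimal sketch, assuming the two sample rows from the question: build the same wide table with groupby(...).size() and join it on the chordType column. Rows with identical chordType strings share one row of the wide table, so they receive the same counts. `wide` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "chordType": [
        "triadeMaior, setima, triadeMenor, setimaMaior, triadeMaior, triadeMenor, triadeMenor, triadeMaior",
        "triadeMenor, setima, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setima, setima, setimaMenor, triadeMaior, triadeMaior, setimaMaior, triadeMaior, triadeMaior, triadeMenor, setima, triadeMaior, sexta, triadeMaior, setimaMenor, triadeMaior, triadeMaior, setimaMenor",
    ]
})

# Wide count table indexed by the chordType string, one column per chord.
wide = (
    df.assign(commonChord=df["chordType"].str.replace(" ", "").str.split(","))
    .explode("commonChord")
    .groupby(["chordType", "commonChord"])
    .size()
    .unstack(fill_value=0)
)

# Match each row's chordType value against wide's index to add the counts,
# keeping the original row index of df.
result = df.join(wide, on="chordType")
print(result)
```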