import pandas as pd
data = {'T1_sometext': [1,2,3,4,5,6],
'T1_anothertext': [1,2,3,4,5,6],
"T1_anothertext2": [1,2,3,4,5,6],
"T2_anothertext2": [1,2,3,4,5,6],
"T2_anothertext4": [1,2,3,4,5,6],
"T2_anothertext5": [1,2,3,4,5,6],
}
df = pd.DataFrame(data)
How do I extract the T1 columns together and T2 columns together, so that I could plot them separately?
Is there a regex-way to do it? Let's say I have up to T20. I would like to automate it using python code.
In other words, I would like to extract (or select) every 3 columns with the same prefix, that is, starting with same string (T1, T2, T3, etc...)
Thank you!
CodePudding user response:
Here's a way to do it by gathering all the columns into 1 dictionary. First I define a list of keys from the column name prefix as k_list
. Then I iterate over the column names, and append the list with the relevant columns using dict.get()
.
import pandas as pd
import re
data = {'T1_sometext': [11,2,3,4,5,6],
'T1_anothertext': [11,2,3,4,5,6],
"T1_anothertext2": [11,2,3,4,5,6],
"T2_anothertext2": [21,2,3,4,5,6],
"T2_anothertext4": [21,2,3,4,5,6],
"T2_anothertext5": [21,2,3,4,5,6],
}
df = pd.DataFrame(data)
d = {}
k_list = re.findall('(T\d )_', ','.join(df.columns))
for i, col in enumerate(df.columns):
d[k_list[i]] = d.get(k_list[i], list(df[col])) list(df[col])
print(d)
Output:
{'T1': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
'T2': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]}
You can also see how it looks like in a dataframe
df_new = pd.DataFrame(d)
T1 T2
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 6 6
6 1 1
7 2 2
8 3 3
9 4 4
10 5 5
11 6 6
12 1 1
13 2 2
14 3 3
15 4 4
16 5 5
17 6 6
CodePudding user response:
You can split the keys using "_" and group the data.
import pprint
data = {'T1_sometext': [1, 2, 3, 4, 5, 6],
'T1_anothertext': [1, 2, 3, 4, 5, 6],
"T1_anothertext2": [1, 2, 3, 4, 5, 6],
"T2_anothertext2": [1, 2, 3, 4, 5, 6],
"T2_anothertext4": [1, 2, 3, 4, 5, 6],
"T2_anothertext5": [1, 2, 3, 4, 5, 6],
}
newdata = {}
for key in data:
k = key.split("_")[0]
if k not in newdata:
newdata[k] = {}
newdata[k][key] = data[key]
pprint.pprint(newdata)
Output:
{'T1': {'T1_anothertext': [1, 2, 3, 4, 5, 6],
'T1_anothertext2': [1, 2, 3, 4, 5, 6],
'T1_sometext': [1, 2, 3, 4, 5, 6]},
'T2': {'T2_anothertext2': [1, 2, 3, 4, 5, 6],
'T2_anothertext4': [1, 2, 3, 4, 5, 6],
'T2_anothertext5': [1, 2, 3, 4, 5, 6]}}
CodePudding user response:
You can extract
the IDs and use groupby
to get subgroups:
names = df.columns.str.extract(r'([^_] )', expand=False)
for name, d in df.groupby(names, axis=1):
print(f'>>> {name}')
print(d)
Output:
>>> T1
T1_sometext T1_anothertext T1_anothertext2
0 1 1 1
1 2 2 2
2 3 3
...
>>> T2
T2_anothertext2 T2_anothertext4 T2_anothertext5
0 1 1 1
1 2 2 2
...
Or as dictionary:
d = {name: d for name, d in df.groupby(names, axis=1)}
Output:
{'T1': subdf_with_T1
'T2': subdf_with_T2
}
alternative using a MultiIndex
:
df2 = df.set_axis(df.columns.str.split('_', 1, expand=True), axis=1)
for c in df2. columns.unique(level=0):
print(f'>>> {c}')
print(df2[c])
Output:
>>> T1
sometext anothertext anothertext2
0 1 1 1
1 2 2 2
...
>>> T2
anothertext2 anothertext4 anothertext5
0 1 1 1
1 2 2 2
...
df2
:
T1 T2
sometext anothertext anothertext2 anothertext2 anothertext4 anothertext5
0 1 1 1 1 1 1
1 2 2 2 2 2 2
2 3 3 3 3 3 3
3 4 4 4 4 4 4
4 5 5 5 5 5 5
...