DF is a dataframe with columns [A1,B1,C1,A2,B2,C2,A3,B3,C3]
I want to split that 'DF' data frame into small dataframes DF1,DF2,DF3
DF1 to have [A1,B1,C1] as columns
DF2 to have [A2,B2,C2] as columns
DF3 to have [A3,B3,C3] as columns
The number in the name of the dataframe DF'3' should match with its columns [A'3',B'3',C'3']
i tried
for i in range(1,4):
'DF{}'.format(i)=DF[DF['A{}'.format(i),'B{}'.format(i),'C{}'.format(i)]]
Getting the error
SyntaxError: cannot assign to function call
Is it possible to do this in a single loop?
CodePudding user response:
The error message stems from trying to assign a dataframe to the string format function, instead of a variable.
Dynamically creating variables from DF1
to DFN
for N
numbers can be a bit tricky. It is easy to create key-item pairs in dicts though. Try the following:
dfs = {}
for i in range(1,4):
dfs["DF{}".format(i)] = DF[["A{}".format(i), "B{}".format(i), "C{}".format(i)]]
Instead of getting DF1
, DF2
and DF3
variables, you get dfs["DF1"]
, dfs["DF2"]
, and dfs["DF3"]
CodePudding user response:
You can't dynamically change an object's name.
You can use a list comprehension with explicit definition of the dfs:
df1,df2,df3=[df[['A{}'.format(i),'B{}'.format(i),'C{}'.format(i)]] for i in range(1,4)]
Update based on ViettelSolutions' comment
Here is a more concise way of doing that: df1,df2,df3=[df[[f'A{i}',f'B{i}','C{i}']] for i in range(1,4)]
You can also use a list instead of explicitly name the dfs, and unpack them when needed.
n=4 # Define the number of dfs
dfs=[df[['A{}'.format(i),'B{}'.format(i),'C{}'.format(i)]] for i in range(1,n)]
CodePudding user response:
You could make it completely configurable:
def split_dataframe(df, letters, numbers):
return [df[[f'{letter}{number}' for letter in letters]] for number in numbers]
letters = ("A","B","C")
numbers = range(1,4)
df1, df2, df3 = split_dataframe(df, letters, numbers)
You can make it the function even more general as follows:
import re
letters_pattern = re.compile("^\D ")
numbers_pattern = re.compile("\d $")
def split_dataframe(df):
letters = sorted(set(letters_pattern.findall(x)[0] for x in df.columns))
numbers = sorted(set(numbers_pattern.findall(x)[0] for x in df.columns))
return [df[[x for x in [f'{letter}{number}' for letter in letters] if x in df.columns]] for number in numbers]
This method has 2 advantages:
- you don't need to provide the letters and the numbers in advance, the method will discover what is available in the header and proceed
- it will manage "irregular" situations - when, for example,
d1
exists butd2
doesn't
To give a concrete example:
df = pd.DataFrame({"A1":[1,2], "B1":[2,3], "C1":[3,4], "D1":[4,5], "A2":[2,3], "B2":[10,11], "C2":[12,13]})
for sub_df in split_dataframe(df):
print(sub_df)
OUTPUT
A1 B1 C1 D1
0 1 2 3 4
1 2 3 4 5
A2 B2 C2
0 2 10 12
1 3 11 13
The columns names discovery process could be set as optional if you pass letters
and numbers
you only want to consider, as follows:
def split_dataframe(df, letters=None, numbers=None):
letters = sorted(set(letters_pattern.findall(x)[0] for x in df.columns)) if letters is None else letters
numbers = sorted(set(numbers_pattern.findall(x)[0] for x in df.columns)) if numbers is None else numbers
return [df[[x for x in [f'{letter}{number}' for letter in letters] if x in df.columns]] for number in numbers]
for sub_df in split_dataframe(df, letters=("B","C"), numbers=[1,2]):
print(sub_df)
OUTPUT
B1 C1
0 2 3
1 3 4
B2 C2
0 10 12
1 11 13