I have multiple financial statements and not all of them have the same entries - some have more entries than others - and I'd like to consolidate them all into a single one that has all of the entries.
I was able to do it manually in Excel since it's not that many, but I'd like to have the computer do it to double check I got them all.
So, here's what I did: I created a dataframe where each column has the entry names from one of the financial statements.
FinancialStatement1 | FinancialStatement2 | FinancialStatement3 |
---|---|---|
REVENUES | REVENUES | REVENUES |
Revenue1 | Revenue1 | Revenue1 |
Revenue2 | Revenue2 | Revenue2 |
EXPENSES | EXPENSES | Revenue3 |
Expense1 | Expense1 | EXPENSES |
Expense2 | PROFIT | Expense1 |
Expense3 | - | Expense2 |
PROFIT | - | PROFIT |
- | - | - |
My idea was to run a script that would analyse the number of cells between two of the group titles and return to a 'Consolidated' column all of the strings between these two values, including the first, but not the last.
My end result would look like this:
FinancialStatement1 | FinancialStatement2 | FinancialStatement3 | Consolidated |
---|---|---|---|
REVENUES | REVENUES | REVENUES | REVENUES |
Revenue1 | Revenue1 | Revenue1 | Revenue1 |
Revenue2 | Revenue2 | Revenue2 | Revenue2 |
EXPENSES | EXPENSES | Revenue3 | Revenue3 |
Expense1 | Expense1 | EXPENSES | EXPENSES |
Expense2 | PROFIT | Expense1 | Expense1 |
Expense3 | - | Expense2 | Expense2 |
PROFIT | - | PROFIT | Expense3 |
- | - | - | PROFIT |
I'm a beginner in Pandas and, so far, here's what I came up with by searching here in stack:
df = pd.read_excel(file)
df['Consolidated']=0
df.head()
df['Consolidated'].iloc[1] = df['FinancialStatement1'][df['FinancialStatement1'].between(
'REVENUES', 'EXPENSES',
inclusive=False
)].tolist()
However, this code gives me "A value is trying to be set on a copy of a slice from a DataFrame"
.
I tried using only df.iloc[3,0] but it also doesn't work. Anyway, this code won't do what I want anyway since it doesn't pick the list with most items between the two group titles.
CodePudding user response:
You can create a SET of all the entries in the column. Python SET will eliminate duplicates. Then you cast to a list and create a pd with the superset of all the entries and join it back to the dataframe.
s = set()
for col in df.columns:
s |= set(df[col])
l = sorted(list(s - set(['-'])))
df1 = pd.DataFrame(l,columns = ['Consolidated'])
df = df.join(df1,how='outer').replace(np.nan,'-')
FinancialStatement1 | FinancialStatement2 | FinancialStatement3 | Consolidated | |
---|---|---|---|---|
0 | REVENUES | REVENUES | REVENUES | EXPENSES |
1 | Revenue1 | Revenue1 | Revenue1 | Expense1 |
2 | Revenue2 | Revenue2 | Revenue2 | Expense2 |
3 | EXPENSES | EXPENSES | Revenue3 | Expense3 |
4 | Expense1 | Expense1 | EXPENSES | PROFIT |
5 | Expense2 | PROFIT | Expense1 | REVENUES |
6 | Expense3 | - | Expense2 | Revenue1 |
7 | PROFIT | - | PROFIT | Revenue2 |
8 | - | - | - | Revenue3 |
CodePudding user response:
If you're looking to just get a list of all line items needed to cover all financial statments, you can do the following
l = list(set(df.unstack().values.tolist()))
l.sort()
l.remove('-')
print(l)
['EXPENSES', 'Expense1', 'Expense2', 'Expense3', 'PROFIT', 'REVENUES', 'Revenue1', 'Revenue2', 'Revenue3']
If you want to just pull out the needed Revenues:
revs = [val for val in l if 'Revenue' in val]
revs.sort()
print(revs)
['Revenue1', 'Revenue2', 'Revenue3']
If you want to just pull out the needed Expenses:
exps = [val for val in l if 'Expense' in val]
exps.sort()
print(exps)
['Expense1', 'Expense2', 'Expense3']
You can piece together the data how you want an add the column to existing df or create a new one
line_items =['REVENUES'] revs ['EXPENSES'] exps ['PROFIT']
df1 = pd.DataFrame({'consolidated':line_items})
df_final = pd.concat([df,df1], axis=1, ignore_index=True)
df_final.columns = ['FinancialStatement1', 'FinancialStatement2', 'FinancialStatement3', 'consollidated']
df_final
FinancialStatement1 FinancialStatement2 FinancialStatement3 consollidated
0 REVENUES REVENUES REVENUES REVENUES
1 Revenue1 Revenue1 Revenue1 Revenue1
2 Revenue2 Revenue2 Revenue2 Revenue2
3 EXPENSES EXPENSES Revenue3 Revenue3
4 Expense1 Expense1 EXPENSES EXPENSES
5 Expense2 PROFIT Expense1 Expense1
6 Expense3 - Expense2 Expense2
7 PROFIT - PROFIT Expense3
8 NaN NaN NaN PROFIT