The problem I want to solve: use Len() on one column and apply each row's resulting character count to another column.
I have a dataframe with general ledger codes of different lengths, and I need to find the lowest level of detail to prevent double counting. I can find it by comparing the current row with the next row, looking only at as many leading characters of the next row as the current row has. For example, 11.0 and 111.0 are group accounts of 1111-1123. I only want 1111-1123 and want to exclude the group accounts.
I can use the LEN function to get the number of characters in the current row, but I can't apply it to the entire column.
My dataframe, df3, looks like this:
    Account    Amount
0      11.0   1000.82
1     111.0   1000.42
2    1111.0    791.51
3    1115.0   1802.19
4    1116.0    202.36
5    1117.0   1507.33
6    1118.0      0.03
7    1119.0      0.00
8    1120.0      0.00
9    1121.0     24.28
10   1122.0    376.87
11   1123.0      0.25
14     12.0  80179.92
15    121.0  80179.92
16  12101.0      0.00
I tried to calculate this by adding one column with the next row's account and another with the number of characters in the current row's account:
df3['Next_Account'] = df3['Account'].shift(-1)
df3['Len_account'] = df3['Account'].str.len()-2
    Account    Amount  Next_Account  Len_account
0      11.0   1000.82         111.0            2
1     111.0   1000.42        1111.0            3
2    1111.0    791.51        1115.0            4
3    1115.0   1802.19        1116.0            4
4    1116.0    202.36        1117.0            4
5    1117.0   1507.33        1118.0            4
6    1118.0      0.03        1119.0            4
7    1119.0      0.00        1120.0            4
8    1120.0      0.00        1121.0            4
9    1121.0     24.28        1122.0            4
10   1122.0    376.87        1123.0            4
11   1123.0      0.25          12.0            4
14     12.0  80179.92         121.0            2
15    121.0  80179.92       12101.0            3
16  12101.0      0.00       12102.0            5
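To reproduce the two helper columns, here is a minimal sketch that rebuilds df3 from the table above. It assumes Account holds strings such as "11.0" (the .str.len() call only works on strings). Because this copy stops at row 16, Next_Account for the last row is NaN here rather than 12102.0.

```python
import pandas as pd

# Rebuild df3 from the question's table.
# Assumption: Account is stored as strings such as "11.0".
df3 = pd.DataFrame(
    {
        "Account": ["11.0", "111.0", "1111.0", "1115.0", "1116.0", "1117.0",
                    "1118.0", "1119.0", "1120.0", "1121.0", "1122.0", "1123.0",
                    "12.0", "121.0", "12101.0"],
        "Amount": [1000.82, 1000.42, 791.51, 1802.19, 202.36, 1507.33, 0.03,
                   0.00, 0.00, 24.28, 376.87, 0.25, 80179.92, 80179.92, 0.00],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 15, 16],
)

# Account of the following row (shift is positional, so it ignores the index gap);
# the last row has no successor and gets NaN
df3["Next_Account"] = df3["Account"].shift(-1)

# Number of digits before the ".0" suffix, e.g. len("1111.0") - 2 == 4
df3["Len_account"] = df3["Account"].str.len() - 2
print(df3)
```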
Then I tried to take that many leading characters from the next account with the str accessor, but this doesn't work, and every row comes back as NaN:
df3['current_digits_next'] = df3['Next_Account'].str[:df3['Len_account']]
df3
current_digits_next
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
14 NaN
15 NaN
16 NaN
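As the first answer points out, slicing through the str accessor is documented for integer start/stop values: one scalar slice is applied to every element, so a different stop per row needs an explicit loop. A small sketch contrasting the two, using the Next_Account and Len_account values from rows 0, 1, 2, 11 and 15 of the question's table. The isinstance guard is an addition so a missing next account (the last row in the full frame) does not raise.

```python
import pandas as pd

next_account = pd.Series(["111.0", "1111.0", "1115.0", "12.0", "12101.0"])
lengths = pd.Series([2, 3, 4, 4, 3])

# .str[:4] uses the same scalar stop for every element
same_stop = next_account.str[:4]

# A per-row stop: zip pairs each string with the length at the same position;
# non-strings (e.g. NaN for the last row) become None instead of raising
per_row = pd.Series(
    [s[:n] if isinstance(s, str) else None for s, n in zip(next_account, lengths)],
    index=next_account.index,
)
print(same_stop.tolist())
print(per_row.tolist())
```

The per-row values match the preferred output for those rows.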
The preferred output is:
current_digits_next
0 11
1 111
2 1115
3 1116
4 1117
5 1118
6 1119
7 1120
8 1121
9 1122
10 1123
11 12.0
14 12
15 121
16 12102
With the preferred output, I can match the data and exclude the group accounts. What am I doing wrong?
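The matching step described here can be sketched as follows: a row is a group account when the next account's prefix (current_digits_next) equals the row's own digits. This uses a subset of the question's rows, assumes Account is stored as strings ending in ".0", and builds current_digits_next with a per-row list comprehension rather than the str accessor.

```python
import pandas as pd

# A subset of the question's rows; Account stored as strings such as "11.0"
df3 = pd.DataFrame({
    "Account": ["11.0", "111.0", "1111.0", "1115.0", "12.0", "121.0", "12101.0"],
    "Amount": [1000.82, 1000.42, 791.51, 1802.19, 80179.92, 80179.92, 0.00],
})
df3["Next_Account"] = df3["Account"].shift(-1)
df3["Len_account"] = df3["Account"].str.len() - 2

# Per-row prefix of the next account; the last row has no next account
df3["current_digits_next"] = [
    nxt[:n] if isinstance(nxt, str) else None
    for nxt, n in zip(df3["Next_Account"], df3["Len_account"])
]

# The current account's own digits (drops the ".0" suffix)
own_digits = df3["Account"].str[:-2]

# Group account: the next account starts with this account's digits
is_group = df3["current_digits_next"] == own_digits
lowest_level = df3[~is_group]
print(lowest_level[["Account", "Amount"]])
```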
Answer:
The str accessor accepts an int, not a Series, as a slice bound. You can use apply on rows instead:
df3['current_digits_next'] = df3.apply(lambda row: str(row['Next_Account'])[:row['Len_account']], axis=1)
Account Amount Next_Account Len_account current_digits_next
0 11.0 1000.82 111.0 2 11
1 111.0 1000.42 1111.0 3 111
2 1111.0 791.51 1115.0 4 1115
3 1115.0 1802.19 1116.0 4 1116
4 1116.0 202.36 1117.0 4 1117
5 1117.0 1507.33 1118.0 4 1118
6 1118.0 0.03 1119.0 4 1119
7 1119.0 0.00 1120.0 4 1120
8 1120.0 0.00 1121.0 4 1121
9 1121.0 24.28 1122.0 4 1122
10 1122.0 376.87 1123.0 4 1123
11 1123.0 0.25 12.0 4 12.0
12 12.0 80179.92 121.0 2 12
13 121.0 80179.92 12101.0 3 121
Answer:
You can convert your Account field to a string and then use apply to check for the required condition:
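import pandas as pd
# Account as a digit string; astype(float) first handles both "11.0" and 11.0
s1 = df['Account'].astype(float).astype(int).astype(str)
# Next row's digits; '' for the last row so the check below never sees NaN
s2 = s1.shift(-1).fillna('')
# True when the next account starts with the current one (prefix, not substring)
s3 = pd.concat([s1, s2], axis=1, ignore_index=True).apply(lambda x: x[1].startswith(x[0]), axis=1)
df = pd.concat([df, s3], axis=1)
print(df)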
Account Amount 0
0 11.0 1000.82 True
1 111.0 1000.42 True
2 1111.0 791.51 False
3 1115.0 1802.19 False
4 1116.0 202.36 False
5 1117.0 1507.33 False
6 1118.0 0.03 False
7 1119.0 0.00 False
8 1120.0 0.00 False
9 1121.0 24.28 False
10 1122.0 376.87 False
11 1123.0 0.25 False
14 12.0 80179.92 True
15 121.0 80179.92 True
16 12101.0 0.00 False
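The answers stop at the flag, while the question's goal is to exclude the group accounts so amounts are not double counted. A minimal sketch of that last step, assuming the flag column (named 0) from the second answer's output above. The flag for the final row depends on rows not shown in the question, so the total covers only these rows.

```python
import pandas as pd

# Rows and flags as printed by the second answer; the flag column is named 0
df = pd.DataFrame(
    {
        "Account": [11.0, 111.0, 1111.0, 1115.0, 1116.0, 1117.0, 1118.0, 1119.0,
                    1120.0, 1121.0, 1122.0, 1123.0, 12.0, 121.0, 12101.0],
        "Amount": [1000.82, 1000.42, 791.51, 1802.19, 202.36, 1507.33, 0.03,
                   0.00, 0.00, 24.28, 376.87, 0.25, 80179.92, 80179.92, 0.00],
        0: [True, True] + [False] * 10 + [True, True, False],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 15, 16],
)

# astype(bool) guards against an object-dtype flag column (concat plus fillna
# can produce one), where ~ may not behave as a boolean NOT
is_group = df[0].astype(bool)

# Keep only the lowest-level accounts and total them
leaves = df[~is_group]
print(leaves)
print(round(leaves["Amount"].sum(), 2))
```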