I am trying to build a condition with a for loop and an if statement on a pandas DataFrame. To specify exactly which rows of the table to extract under a given condition, I looked up the row indices and hard-coded the locations before the for loop. The code looks something like this:
import pandas as pd

input_csv_file = "./CSV/Officers_and_Shareholders.csv"
df = pd.read_csv(input_csv_file, skiprows=10, on_bad_lines='skip')
df.fillna('', inplace=True)
# df.drop([0, 3], inplace=True)
df.columns = ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total']
# print(df.shape)
# print(df.columns)
# print(df.iloc[:53])
# shareholders = df.iloc[24:42]
# print(shareholders)
# officers = df.iloc[0:23]
# print(officers)
dataframe = df.query("Total.ne('-')")

def get_shareholder_by_row_index():
    for column in df.columns:
        if object(df.iloc[column][:53]) == dataframe:
            shareholders = df.iloc[24:42]
            print(shareholders)
        # elif object(df[:53][column]) != dataframe:
        #     officers = df.iloc[0:23]
        #     print(officers)
Because the CSV file is not properly formatted, I overwrote the DataFrame's header with my own column names (the df.columns assignment above). On their own, df.iloc[24:42] and df.iloc[0:23] select exactly the right ranges, but nothing is returned when I use them inside the for loop. What I want is a function that returns the officers when the Total cell of a row is empty (-), and the shareholders when it is not. How should I modify the for loop and the if statement?
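A small sketch of why the loop above cannot reach the iloc slices, run on a tiny hypothetical frame (only the column names mirror the question; the values are made up). Iterating over df.columns yields column labels such as 'Nama', not row positions, and df.iloc only accepts integer positions, so df.iloc[column] raises a TypeError before the comparison is evaluated. object() also takes no arguments. Note too that the function in the question is defined but never called, so it would print nothing either way.

```python
import pandas as pd

# Hypothetical two-row frame; only the column names mirror the question.
df = pd.DataFrame({"Nama": ["ARLAN SEPTIA ANANDA", "PT PATIMBAN MAJU BERSAMA,"],
                   "Total": ["-", "Rp. 2.900.000.000"]})

# Iterating over df.columns yields column labels (strings), not row positions.
labels = list(df.columns)
print(labels)  # ['Nama', 'Total']

# .iloc only accepts integer positions, so a label such as 'Nama' is rejected.
iloc_rejects_labels = False
try:
    df.iloc[labels[0]]
except TypeError:
    iloc_rejects_labels = True
print("df.iloc['Nama'] raises TypeError:", iloc_rejects_labels)

# object() accepts no arguments, so object(...) fails as well.
object_rejects_args = False
try:
    object(df)
except TypeError:
    object_rejects_args = True
print("object(df) raises TypeError:", object_rejects_args)

# Positional row access needs an integer; the column is selected by label.
print(df["Total"].iloc[0])  # -
```

In short, a row-wise condition has to iterate over rows (or use a boolean mask), not over df.columns.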
The desired output for shareholders will be:
24 PT CTCORP INFRASTRUKTUR D INDONESIA, ... Rp. 3.200.000.000
25 Nomor SK :- I ...
26 JalanKaptenPierreTendeanKavling12-14A ...
27 PT INTRERPORT PATIMBAN AGUNG, ... Rp. 2.900.000.000
28 Nomor SK :- ...
29 ...
30 ...
31 ...
32 ...
33 ...
34 PT PATIMBAN MAJU BERSAMA, ... Rp. 2.900.000.000
35 Nomor SK :AHU- ...
36 0061318.AH.01.01.TAHUN 2021 ...
37 Tanggal SK :30 September 2021 ...
38 ...
39 ...
40 PT TERMINAL PETIKEMAS ... Rp. 1.000.000.000
41 SURABAYA, ...
42 Nomor SK :- ...
and for the officers, it will return:
Nama ... Total
1 NIK: 3171060201830005 ...
2 NPWP: 246383541071000 ...
3 TTL: Jakarta, 02 Januari 1983 ...
5 NIK: 1271121011700003 ...
6 NPWP: 070970173112000 ...
7 TTL: Bogor, 10 November 1970 ...
8 ARLAN SEPTIA ANANDA ...
9 RASAM, ...
10 NIK: 3174051209620003 ...
11 NPWP: 080878200013000 ...
12 TTL: Jakarta, 12 September ...
13 1962 ...
15 NIK: 3171011605660004 ...
16 NPWP: 070141650093000 ...
17 TTL: Jakarta, 16 Mei 1966 ...
18 FUAD RIZAL, ...
21 PURNOMO, UTAMA RASRINIK: 3578032408610001 ...
22 NPWP: 097468813615000 ...
23 TTL: SLEMAN, 24 Agustus 1961 ...
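The desired split can also be expressed without a loop, as a boolean mask on the Total column. This is a minimal sketch under two assumptions the question does not confirm: '-' in Total marks an officer entry, and blank cells ('' after fillna('')) are continuation lines (NIK, NPWP, Nomor SK, ...) belonging to the entry above. The helper name split_by_total and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's CSV (values abbreviated).
df = pd.DataFrame({
    "Nama": ["ARLAN SEPTIA ANANDA", "NIK: 3174051209620003",
             "PT PATIMBAN MAJU BERSAMA,", "Nomor SK :AHU-"],
    "Total": ["-", "", "Rp. 2.900.000.000", ""],
})


def split_by_total(frame):
    """Return (officers, shareholders) using the Total column.

    Assumptions: '-' marks an officer entry; a blank Total continues the
    entry on the row above, so it inherits that row's status.
    """
    total = frame["Total"].astype(str).str.strip()
    # Blank cells become NaN, then forward-fill copies the previous entry's value.
    filled = total.mask(total.eq("")).ffill()
    is_officer = filled.eq("-")
    return frame[is_officer], frame[~is_officer]


officers, shareholders = split_by_total(df)
print(officers)
print(shareholders)
```

On the real file this would replace the hard-coded iloc[0:23] and iloc[24:42] slices; if empty totals are marked differently there, only the eq("-") check needs to change.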
Answer:
This prints every row together with its index (row number), labelled as an officer or a shareholder. If this is not the desired answer, please add a few more details.
def get_shareholder_by_row_index():
    for i in range(len(df)):
        # Officers if the Total cell is empty ('-', or '' after fillna), shareholders otherwise
        if df["Total"].iloc[i] in ('', '-'):
            print(i, "officers")
            print(df.iloc[i])
            # whatever your code is, it goes here
        else:
            print(i, "shareholders")
            print(df.iloc[i])
            # whatever your code is, it goes here

get_shareholder_by_row_index()

# This prints the rows (within the first 53) whose Total is empty
totals = df["Total"].iloc[:53]
print(totals[totals.isin(['', '-'])])
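If you want to keep the for loop and if statement but get the two groups back as DataFrames instead of printing them, you can collect row labels inside the loop and slice once at the end. The rule that '-' starts an officer entry and a blank Total continues the previous entry is an assumption about the file; split_rows_with_loop and the sample data are hypothetical.

```python
import pandas as pd


def split_rows_with_loop(frame):
    """Return (officers, shareholders) by walking the Total column once.

    Assumption: '-' starts an officer entry, any other non-blank value starts
    a shareholder entry, and blank cells belong to the entry above.
    """
    officer_labels, shareholder_labels = [], []
    is_officer = False  # rows before the first non-blank Total count as shareholders
    for label, total in frame["Total"].items():
        value = str(total).strip()
        if value == "-":
            is_officer = True
        elif value != "":
            is_officer = False
        # A blank value leaves is_officer unchanged.
        if is_officer:
            officer_labels.append(label)
        else:
            shareholder_labels.append(label)
    return frame.loc[officer_labels], frame.loc[shareholder_labels]


# Hypothetical sample rows shaped like the question's CSV.
df = pd.DataFrame({
    "Nama": ["ARLAN SEPTIA ANANDA", "NIK: 3174051209620003",
             "PT PATIMBAN MAJU BERSAMA,", "Nomor SK :AHU-"],
    "Total": ["-", "", "Rp. 2.900.000.000", ""],
})
officers, shareholders = split_rows_with_loop(df)
print(officers)
print(shareholders)
```

Collecting labels and slicing once with .loc avoids growing a DataFrame inside the loop, which is slow for large files.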