Home > database >  Expanding dataset with declared dictionary using Pandas
Expanding dataset with declared dictionary using Pandas

Time:10-07

I have a dataset where I would like to append rows if this value is missing from a specific column.

Data

ID  Date    type    energy
AA  Q1 2022 a       1
AA  Q2 2022 ok      1
AA  Q3 2022 yes     1
AA  Q4 2022 yes     4
FC  Q1 2022 no      4
FC  Q2 2022 no      4
FC  Q3 2022 yes     45
FC  Q4 2022 yes     5
        
        

Desired

ID  Date    type    energy
AA  Q1 2022 a       1
AA  Q2 2022 ok      1
AA  Q3 2022 yes     1
AA  Q4 2022 yes     4
FC  Q1 2022 no      4
FC  Q2 2022 no      4
FC  Q3 2022 yes     45
FC  Q4 2022 yes     5
BC  Q1 2022         0
DA  Q1 2022         0
BC  Q1 2023         0
DA  Q1 2023         0

Doing

stat =   {
          'BC':'Q1 2022', Q1 2023
          'DA':'Q1 2022', Q1 2023  }
    

df0 = df.append(pd.DataFrame({'ID':[x for x in stat], 'Date':[stat[x] for x in stat]}))
df1 = df.merge(df0, how='left').fillna({'energy': 0})

However, this is not appending, not sure how to attach the values dictionary within this script. I am still researching, any suggestion is appreciated.

CodePudding user response:

You can append new rows to a dataframe in various ways. One of which is to use lists and loc method:

import pandas as pd

data = pd.read_csv('data.csv', sep=';')

new_data = [
    ['BC', 'Q1 2022', '', 0],
    ['DA', 'Q1 2022', '', 0],
    ['BC', 'Q1 2023', '', 0],
    ['DA', 'Q1 2023', '', 0]
]

for row in new_data:
    data.loc[len(data)] = row

print(data)

Which will give you the following output:

    ID     Date type  energy
0   AA  Q1 2022    a       1
1   AA  Q2 2022   ok       1
2   AA  Q3 2022  yes       1
3   AA  Q4 2022  yes       4
4   FC  Q1 2022   no       4
5   FC  Q2 2022   no       4
6   FC  Q3 2022  yes      45
7   FC  Q4 2022  yes       5
8   BC  Q1 2022            0
9   DA  Q1 2022            0
10  BC  Q1 2023            0
11  DA  Q1 2023            0

From the other hand, if you want to use dictonaries specifically, then you can use append method:

data = pd.read_csv('data.csv', sep=';')

new_data = {
    'ID': ['BC', 'DA', 'BC', 'DA'],
    'Date': ['Q1 2022', 'Q1 2022', 'Q1 2023', 'Q1 2023'],
    'type': ['', '', '', ''],
    'energy': [0, 0, 0, 0]
}

data = data.append(pd.DataFrame(new_data), ignore_index=True)

print(data)

It will give you the same output as previously.

Here you can find more information on how to append data to dataframe: Quick Examples of Append Row to DataFrame

  • Related