How to (elegantly) add single values and rows to a DataFrame?

Time:02-04

Imagine the following DataFrame.

import pandas as pd

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"], 
                             "Size": ["Large", "Small"]})

  Animal   Size
0  Horse  Large
1  Mouse  Small

I want to add another row for "Dog". If I understand correctly, I have to first create another DataFrame and then concatenate the new and the existing DataFrame.

pd.concat([animal_sizes,
           pd.DataFrame({"Animal": ["Dog"],
                         "Size": ["Medium"]})],
          ignore_index=True)
  Animal    Size
0  Horse   Large
1  Mouse   Small
2    Dog  Medium

This doesn't seem terribly elegant. Is there a simpler way? I imagine something like animal_sizes.append_row(["Dog", "Medium"]).
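Something close to the imagined append_row can be had by wrapping the concat pattern in a small function. A minimal sketch; append_row is a hypothetical helper, not a pandas method, and it assumes the values are given in column order:

```python
import pandas as pd

def append_row(df: pd.DataFrame, values: list) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with one row appended.

    Assumes `values` holds one entry per column, in column order.
    """
    if len(values) != len(df.columns):
        raise ValueError("expected one value per column")
    # Wrap the values in a one-row DataFrame with the same columns
    new_row = pd.DataFrame([values], columns=df.columns)
    # ignore_index=True renumbers the result 0..n-1 instead of repeating label 0
    return pd.concat([df, new_row], ignore_index=True)

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"],
                             "Size": ["Large", "Small"]})
animal_sizes = append_row(animal_sizes, ["Dog", "Medium"])
print(animal_sizes)
```

The helper returns a new DataFrame rather than modifying its argument, mirroring how pd.concat itself behaves.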

Imagine I only want to add another value to the Animal column. (Perhaps I haven't measured the size yet.) Again, pd.concat with an explicit empty (or NaN) value for the Size column seems awkward:

pd.concat([animal_sizes,
           pd.DataFrame({"Animal": ["Crow"], "Size": [""]})],
          ignore_index=True)
  Animal   Size
0  Horse  Large
1  Mouse  Small
2   Crow

Is there a simpler solution? I'm looking for something like animal_sizes["Animal"].append_value("Crow").
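An explicit empty value is not actually required: pd.concat aligns on column names and fills any column missing from the new piece with NaN. A sketch with another hypothetical helper, append_value (not part of pandas):

```python
import pandas as pd

def append_value(df: pd.DataFrame, column: str, value) -> pd.DataFrame:
    """Hypothetical helper: append a row that only sets `column`.

    pd.concat fills every other column of the new row with NaN.
    """
    return pd.concat([df, pd.DataFrame({column: [value]})], ignore_index=True)

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"],
                             "Size": ["Large", "Small"]})
animal_sizes = append_value(animal_sizes, "Animal", "Crow")
print(animal_sizes)
```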

I know about DataFrame.append (see this fine answer), but not only is it deprecated (it was removed in pandas 2.0), it also requires you to spell out the column name for each new value. That makes it slightly unwieldy for my taste.

animal_sizes.append({"Animal": "Crow"}, ignore_index=True)  # pandas < 2.0 only
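Since DataFrame.append no longer exists in pandas 2.0, a common replacement when rows arrive one at a time is to collect them as dictionaries in a plain list and concatenate once at the end, because every pd.concat call builds a new DataFrame. A sketch; the contents of new_rows are illustrative:

```python
import pandas as pd

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"],
                             "Size": ["Large", "Small"]})

# Illustrative rows; a dict may omit a column (e.g. a size not yet measured)
new_rows = [
    {"Animal": "Dog", "Size": "Medium"},
    {"Animal": "Crow"},
]

# One concat for all rows: pd.DataFrame(list_of_dicts) fills missing keys
# with NaN, and ignore_index=True renumbers the combined index.
animal_sizes = pd.concat([animal_sizes, pd.DataFrame(new_rows)],
                         ignore_index=True)
print(animal_sizes)
```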

Are there more elegant solutions for this very simple problem?

Answer:

I recommend defining an appropriate index (animals in this case) and using it to insert new rows by name. Use dictionaries to add incomplete rows.

import pandas as pd

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"], 
                             "Size": ["Large", "Small"],
                             "othercol": ["A", "B"]}
                           ).set_index("Animal")

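animal_sizes.loc["Dog"] = {"othercol": "C"}          # dict: missing columns become NaN
animal_sizes.loc["Elephant"] = ["verylarge", "D"]    # list: one value per column, in order
animal_sizes.loc["unspecifiedanimal"] = {}           # empty dict: an all-NaN row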

print(animal_sizes)

# result:
                        Size othercol
Animal                               
Horse                  Large        A
Mouse                  Small        B
Dog                      NaN        C
Elephant           verylarge        D
unspecifiedanimal        NaN      NaN

Assigning to an animal that is already in the index replaces that row, which may or may not be what you want. If the goal is to append rows blindly and accept duplicates, concat may still be the best option.
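To make the difference concrete, here is a small sketch: .loc with an existing label overwrites that row, a membership check on the index avoids accidental overwrites, and pd.concat keeps a second row under the same label:

```python
import pandas as pd

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"],
                             "Size": ["Large", "Small"]}).set_index("Animal")

# .loc with an existing label replaces the whole row
animal_sizes.loc["Horse"] = ["Huge"]

# Guard against overwriting when that is not intended
if "Dog" not in animal_sizes.index:
    animal_sizes.loc["Dog"] = ["Medium"]

# pd.concat appends another "Horse" row; the index now contains a duplicate
extra = pd.DataFrame({"Size": ["Large"]},
                     index=pd.Index(["Horse"], name="Animal"))
with_dupes = pd.concat([animal_sizes, extra])
print(with_dupes)
```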

Answer:

If the DataFrame has the default RangeIndex, you can always insert new rows at the end.

Assign a list to DataFrame.loc; the list must have the same length as the number of columns. The new index label is the current number of rows, len(animal_sizes):

animal_sizes.loc[len(animal_sizes)] = ["Dog", "Medium"]
print(animal_sizes)
  Animal    Size
0  Horse   Large
1  Mouse   Small
2    Dog  Medium

If you also want to name the columns explicitly, assign a dict instead:

animal_sizes.loc[len(animal_sizes)] = {"Animal": "Dog", "Size": "Medium"}
print(animal_sizes)
  Animal    Size
0  Horse   Large
1  Mouse   Small
2    Dog  Medium
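The len(animal_sizes) label only works while the index is exactly 0..n-1. After a row is dropped, the "new" label may already exist, and the assignment silently overwrites a row. A sketch of the pitfall, plus one workaround that uses the largest existing label plus one (this assumes a non-empty integer index):

```python
import pandas as pd

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse", "Dog"],
                             "Size": ["Large", "Small", "Medium"]})
animal_sizes = animal_sizes.drop(index=0)   # index is now [1, 2]

# len() is 2 and label 2 already exists, so this overwrites the "Dog" row
animal_sizes.loc[len(animal_sizes)] = ["Cat", "Small"]

# Workaround, assuming a non-empty integer index: next label = max + 1
animal_sizes.loc[animal_sizes.index.max() + 1] = ["Crow", None]
print(animal_sizes)
```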

Answer:

With the default RangeIndex, you can add a single row to a pandas DataFrame using the .loc indexer:

animal_sizes.loc[len(animal_sizes)] = ["Dog", "Medium"]

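To add a value only to the Animal column, you can create a one-row DataFrame holding that value and concatenate it:

animal_sizes['Size'] = animal_sizes['Size'].astype(str)
animal_sizes = pd.concat([animal_sizes, pd.DataFrame({"Animal": ["Crow"], "Size": [""]})], sort=False, ignore_index=True)

Strictly speaking, the cast to str is not required here: Size already holds strings (dtype object), and pd.concat falls back to object dtype when it combines a non-string column with strings. ignore_index=True gives the new row the next integer label instead of reusing 0.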
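Without ignore_index=True, pd.concat keeps each piece's original labels, so the appended row reuses label 0. A quick check:

```python
import pandas as pd

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"],
                             "Size": ["Large", "Small"]})
crow = pd.DataFrame({"Animal": ["Crow"], "Size": [""]})

kept = pd.concat([animal_sizes, crow], sort=False)
renumbered = pd.concat([animal_sizes, crow], sort=False, ignore_index=True)

print(list(kept.index))        # [0, 1, 0]: labels carried over from both pieces
print(list(renumbered.index))  # [0, 1, 2]: fresh labels
```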
