Imagine the following DataFrame.
import pandas as pd
animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"],
"Size": ["Large", "Small"]})
Animal | Size |
---|---|
Horse | Large |
Mouse | Small |
I want to add another row for "Dog". If I understand correctly, I have to first create another DataFrame and then concatenate the new and the existing DataFrame.
pd.concat([animal_sizes,
pd.DataFrame({"Animal": ["Dog"],
"Size": ["Medium"]})]
)
Animal | Size |
---|---|
Horse | Large |
Mouse | Small |
Dog | Medium |
This doesn't seem terribly elegant. Is there a simpler way? I imagine something like animal_sizes.append_row(["Dog", "Medium"]).
Imagine I only want to add another value to the Animal column. (Perhaps I haven't measured the size yet.) Again, pd.concat with an explicit empty (or NaN) value for the Size column seems awkward:
pd.concat([animal_sizes,
pd.DataFrame({"Animal": ["Crow"], "Size": [""]})]
)
Animal | Size |
---|---|
Horse | Large |
Mouse | Small |
Crow |
Is there a simpler solution? I'm looking for something like animal_sizes["Animal"].append_value("Crow").
I know about DataFrame.append (see this fine answer), but not only is it deprecated (and removed entirely in pandas 2.0), it also expects you to name the column for each new row value. This makes it slightly unwieldy for my taste.
animal_sizes.append({"Animal": "Crow"}, ignore_index=True)
Are there more elegant solutions for this very simple problem?
CodePudding user response:
I recommend defining an appropriate index (animals in this case) and using it to insert new rows by name. Use dictionaries to add incomplete rows.
import pandas as pd
animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"],
"Size": ["Large", "Small"],
"othercol": ["A", "B"]}
).set_index("Animal")
animal_sizes.loc["Dog"] = {"othercol": "C"}
animal_sizes.loc["Elephant"] = ["verylarge", "D"]
animal_sizes.loc["unspecifiedanimal"] = {}
print(animal_sizes)
# result:
                        Size  othercol
Animal
Horse                  Large         A
Mouse                  Small         B
Dog                      NaN         C
Elephant           verylarge         D
unspecifiedanimal        NaN       NaN
Adding an existing animal replaces a row. This may or may not be intended behavior. If the goal is to blindly dump rows into the table while accepting duplicates, the best solution might still be concat.
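To make the replace-versus-duplicate difference concrete, here is a minimal sketch. It assumes the same example frame as above; the replacement values ("Tiny", "Z") and the variable names are made up for illustration.

```python
import pandas as pd

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"],
                             "Size": ["Large", "Small"],
                             "othercol": ["A", "B"]}
                            ).set_index("Animal")

# Assigning to a label that already exists overwrites that row in place.
overwritten = animal_sizes.copy()
overwritten.loc["Mouse"] = ["Tiny", "Z"]

# pd.concat keeps both rows, so the label "Mouse" appears twice afterwards.
extra = pd.DataFrame({"Size": ["Tiny"], "othercol": ["Z"]},
                     index=pd.Index(["Mouse"], name="Animal"))
with_duplicates = pd.concat([animal_sizes, extra])

print(overwritten)
print(with_duplicates)
print(with_duplicates.index.is_unique)
```

If overwriting is not wanted, checking `"Mouse" in animal_sizes.index` before assigning is one way to guard against it.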
CodePudding user response:
Solution for a default RangeIndex, where new rows are always inserted at the end of the DataFrame:
Use DataFrame.loc with a list. The list only needs to have the same length as the number of columns; the new index value is taken from the current number of rows:
animal_sizes.loc[len(animal_sizes)] = ["Dog", "Medium"]
print(animal_sizes)
  Animal    Size
0  Horse   Large
1  Mouse   Small
2    Dog  Medium
If you also need to specify the column names:
animal_sizes.loc[len(animal_sizes)] = {"Animal": "Dog", "Size": "Medium"}
print(animal_sizes)
  Animal    Size
0  Horse   Large
1  Mouse   Small
2    Dog  Medium
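The len-based label only appends when the index really is the default 0..n-1 range. The sketch below shows what can happen otherwise, assuming a frame from which the first row has been dropped; the variable names are illustrative only.

```python
import pandas as pd

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse", "Dog"],
                             "Size": ["Large", "Small", "Medium"]})

# Dropping the first row leaves the index labels 1 and 2.
trimmed = animal_sizes.drop(index=0)

# len(trimmed) is 2, which is an existing label, so this overwrites "Dog".
overwritten = trimmed.copy()
overwritten.loc[len(overwritten)] = ["Crow", "Small"]

# Resetting the index first restores 0..n-1, so the row is appended instead.
appended = trimmed.reset_index(drop=True)
appended.loc[len(appended)] = ["Crow", "Small"]

print(overwritten)
print(appended)
```

Calling reset_index(drop=True) before appending is one way to stay on the safe side when the frame may have been filtered or sorted earlier.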
CodePudding user response:
You can add a single row to a Pandas DataFrame using the .loc indexing method:
animal_sizes.loc[len(animal_sizes)] = ["Dog", "Medium"]
To add a value to the Animal column only, you can create a one-row DataFrame with a placeholder for Size and concatenate the two DataFrames:
animal_sizes = pd.concat([animal_sizes, pd.DataFrame({"Animal": ["Crow"], "Size": [""]})], ignore_index=True)
Note that the Size column already holds strings, so no cast is needed to accommodate the empty string. Passing ignore_index=True renumbers the rows; without it, the new row keeps index 0 and the index contains a duplicate.
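For the "Animal only" case, the dict form of .loc assignment used in the other answers may be less awkward than a placeholder string: keys that are left out end up as NaN in the new row. A minimal sketch, assuming the default RangeIndex of the question's frame:

```python
import pandas as pd

animal_sizes = pd.DataFrame({"Animal": ["Horse", "Mouse"],
                             "Size": ["Large", "Small"]})

# Only "Animal" is given; the missing "Size" key becomes NaN for the new row.
animal_sizes.loc[len(animal_sizes)] = {"Animal": "Crow"}

print(animal_sizes)
print(animal_sizes["Size"].isna().tolist())
```

A NaN marks the size as missing, which isna() and dropna() can detect later; an empty string would not be treated as missing.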