I am sure this is not hard, but I can't figure it out!
I want to create a dataframe that starts at 1 for the first row and ends at 100,000 in increments of 1, 2, 4, 5, or whatever. I could do this in my sleep in Excel, but is there a slick way to do this without importing a .csv or .txt file?
I have needed to do this in variations many times and just settled on importing a .csv, but I am tired of that.
CodePudding user response:
Generating numbers
Generating numbers is not something special to pandas
, rather numpy
module or range
function (as mentioned by @Grismer) can do the trick. Let's say you want to generate a series of numbers and assign these numbers to a dataframe. As I said before, there are multiple approaches two of which I personally prefer.
range
function
Take range(1,1000,1)
as an Example. This function gets three arguments two of which are not mandatory. The first argument defines the start number, the second one defines the end number, and the last one points to the steps of this range. So the abovementioned example will result in the numbers 1 to 9999 (Note that this range is a half-open interval which is closed at the start and open at the end).
numpy.arange
function
To have the same results as the previous example, take numpy.arange(1,1000,1)
as an example. The arguments are completely the same as the range
's arguments.
Assigning to dataframe
Now, if you want to assign these numbers to a dataframe, you can easily do this by using the pandas
module. Code below is an example of how to generate a dataframe:
import numpy as np
import pandas as pd
myRange = np.arange(1,1001,1) # Could be something like myRange = range(1,1000,1)
df = pd.DataFrame({"numbers": myRange})
df.head(5)
which results in a dataframe like(Note that just the first five rows have been shown):
numbers | |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 5 |
Difference of numpy.arange
and range
To keep this answer short, I'd rather to refer to this answer by @hpaulj