Header And Skiprows Difference In Pandas Unclear
Solution 1:
You can follow this article, which explains the difference between the header and skiprows parameters with examples from the Olympics dataset, which can be downloaded here.
To summarize: the default behavior of pd.read_csv() is to read in all of the rows, which, in the case of this dataset, includes an unnecessary first row of column numbers that pandas then treats as the header.
import pandas as pd
df = pd.read_csv('olympics.csv')
df.head()
                   0         1     2     3     4  ...       11    12    13    14              15
0                NaN  № Summer  01 !  02 !  03 !  ...  № Games  01 !  02 !  03 !  Combined total
1  Afghanistan (AFG)        13     0     0     2  ...       13     0     0     2               2
2      Algeria (ALG)        12     5     2     8  ...       15     5     2     8              15
3    Argentina (ARG)        23    18    24    28  ...       41    18    24    28              70
4      Armenia (ARM)         5     1     2     9  ...       11     1     2     9              12
However, the skiprows parameter lets you skip one or more rows when you read in the .csv file. Passing an integer skips that many lines from the top of the file:
df1 = pd.read_csv('olympics.csv', skiprows = 1)
df1.head()
                Unnamed: 0  № Summer  01 !  02 !  ...  01 !.2  02 !.2  03 !.2  Combined total
0        Afghanistan (AFG)        13     0     0  ...       0       0       2               2
1            Algeria (ALG)        12     5     2  ...       5       2       8              15
2          Argentina (ARG)        23    18    24  ...      18      24      28              70
3            Armenia (ARM)         5     1     2  ...       1       2       9              12
4  Australasia (ANZ) [ANZ]         2     3     4  ...       3       4       5              12
And if you want to skip several specific rows, pass a list of zero-based line numbers instead. Here lines 0, 2, and 3 of the file are skipped (notice the missing countries):
df2 = pd.read_csv('olympics.csv', skiprows = [0, 2, 3])
df2.head()
                  Unnamed: 0  № Summer  01 !  02 !  ...  01 !.2  02 !.2  03 !.2  Combined total
0            Argentina (ARG)        23    18    24  ...      18      24      28              70
1              Armenia (ARM)         5     1     2  ...       1       2       9              12
2    Australasia (ANZ) [ANZ]         2     3     4  ...       3       4       5              12
3  Australia (AUS) [AUS] [Z]        25   139   152  ...     144     155     181             480
4              Austria (AUT)        26    18    33  ...      77     111     116             304
The header parameter tells pandas which row to use for the column names. The data starts on the row after it and everything above it is discarded, so in the following case it does the same thing as skiprows = 1:
# this gives the same result as df1 = pd.read_csv('olympics.csv', skiprows = 1)
df4 = pd.read_csv('olympics.csv', header = 1)
df4.head()
                Unnamed: 0  № Summer  01 !  02 !  ...  01 !.2  02 !.2  03 !.2  Combined total
0        Afghanistan (AFG)        13     0     0  ...       0       0       2               2
1            Algeria (ALG)        12     5     2  ...       5       2       8              15
2          Argentina (ARG)        23    18    24  ...      18      24      28              70
3            Armenia (ARM)         5     1     2  ...       1       2       9              12
4  Australasia (ANZ) [ANZ]         2     3     4  ...       3       4       5              12
However, you cannot use the header parameter to skip a bunch of different rows: it only picks the header row, and the rows above it are dropped as a side effect. You would not be able to replicate df2 using the header parameter alone. Hopefully this clears things up.
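If you want to try the comparison without downloading the dataset, here is a minimal sketch that uses a tiny in-memory CSV. The CSV_TEXT contents and the load() helper are made up for illustration and are not part of olympics.csv; the text only mimics its layout (a junk first line of column numbers, then the real header, then data).

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for olympics.csv: line 0 is a junk row of
# column numbers, line 1 is the real header, the remaining lines are data.
CSV_TEXT = """0,1,2
,Games,Gold
Afghanistan,13,0
Algeria,12,5
Argentina,23,18
Armenia,5,1
"""


def load(**kwargs):
    """Illustrative helper: parse CSV_TEXT with the given read_csv options."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)


raw = load()                           # junk line 0 becomes the header
by_skiprows = load(skiprows=1)         # drop line 0, then line 1 is the header
by_header = load(header=1)             # use line 1 as header, discard line 0
skip_list = load(skiprows=[0, 2, 3])   # drop lines 0, 2 and 3 specifically

print(by_skiprows.equals(by_header))   # the two approaches give the same frame
print(list(by_header.columns))
print(skip_list.iloc[:, 0].tolist())   # Afghanistan and Algeria are gone
```

With this toy data, skiprows=1 and header=1 produce identical frames, while the list form of skiprows removes rows that header has no way of targeting.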