
Header And Skiprows Difference In Pandas Unclear

Can anyone please explain, with a good example, the difference between header and skiprows in pd.read_excel('name', header=number, skiprows=number)?

Solution 1:

You can follow this article, which explains the difference between the header and skiprows parameters with examples from the Olympics dataset, which can be downloaded here.

To summarize: the default behavior of pd.read_csv() is to read in all of the rows, which for this dataset includes an unnecessary first row of numbers (0 through 15) that pandas then uses as the column names.

import pandas as pd

df = pd.read_csv('olympics.csv')
df.head()

                   0         1     2     3     4  ...       11    12    13    14              15
0                NaN  № Summer  01 !  02 !  03 !  ...  № Games  01 !  02 !  03 !  Combined total
1  Afghanistan (AFG)        13     0     0     2  ...       13     0     0     2               2
2      Algeria (ALG)        12     5     2     8  ...       15     5     2     8              15
3    Argentina (ARG)        23    18    24    28  ...       41    18    24    28              70
4      Armenia (ARM)         5     1     2     9  ...       11     1     2     9              12

However, the skiprows parameter lets you skip one or more rows while reading the .csv file. Passing an integer skips that many rows from the top of the file:

df1 = pd.read_csv('olympics.csv', skiprows = 1)
df1.head()

                Unnamed: 0  № Summer  01 !  02 !  ...  01 !.2  02 !.2  03 !.2  Combined total
0        Afghanistan (AFG)        13     0     0  ...       0       0       2               2
1            Algeria (ALG)        12     5     2  ...       5       2       8              15
2          Argentina (ARG)        23    18    24  ...      18      24      28              70
3            Armenia (ARM)         5     1     2  ...       1       2       9              12
4  Australasia (ANZ) [ANZ]         2     3     4  ...       3       4       5              12

To skip specific rows instead, pass a list of 0-indexed row numbers (notice the missing countries):

df2 = pd.read_csv('olympics.csv', skiprows = [0, 2, 3])
df2.head()

                  Unnamed: 0  № Summer  01 !  02 !  ...  01 !.2  02 !.2  03 !.2  Combined total
0            Argentina (ARG)        23    18    24  ...      18      24      28              70
1              Armenia (ARM)         5     1     2  ...       1       2       9              12
2    Australasia (ANZ) [ANZ]         2     3     4  ...       3       4       5              12
3  Australia (AUS) [AUS] [Z]        25   139   152  ...     144     155     181             480
4              Austria (AUT)        26    18    33  ...      77     111     116             304
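If you do not have olympics.csv at hand, the two forms of skiprows can be tried on a small in-memory file. This is only a sketch: the CSV text and the column names country, gold and silver are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for olympics.csv: a stray first line of numbers,
# followed by the real header line and a few data lines.
CSV_TEXT = """0,1,2
country,gold,silver
Afghanistan,0,0
Algeria,5,2
Argentina,18,24
Armenia,1,2
"""

# An integer skips that many lines from the top of the file,
# so the "country,gold,silver" line becomes the header.
skip_one = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1)

# A list skips exactly those file lines (0-indexed): the stray line (0),
# Afghanistan (2) and Algeria (3).
skip_some = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[0, 2, 3])

print(skip_one["country"].tolist())
print(skip_some["country"].tolist())
```

In both calls the header is still taken from the first line that survives the skipping.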

The header parameter tells pandas which row (0-indexed) holds the column names; any rows above it are discarded. In the following case, that does the same thing as skiprows = 1:

# this gives the same result as df1 = pd.read_csv('olympics.csv', skiprows = 1)
df4 = pd.read_csv('olympics.csv', header = 1)
df4.head()

                Unnamed: 0  № Summer  01 !  02 !  ...  01 !.2  02 !.2  03 !.2  Combined total
0        Afghanistan (AFG)        13     0     0  ...       0       0       2               2
1            Algeria (ALG)        12     5     2  ...       5       2       8              15
2          Argentina (ARG)        23    18    24  ...      18      24      28              70
3            Armenia (ARM)         5     1     2  ...       1       2       9              12
4  Australasia (ANZ) [ANZ]         2     3     4  ...       3       4       5              12
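One way to convince yourself that the two calls agree, and to see where the parameters stop being interchangeable, is the sketch below. It assumes a tiny made-up CSV rather than the real dataset; the names CSV_TEXT, by_skiprows, by_header and no_labels are illustrative only.

```python
import io

import pandas as pd

# Hypothetical miniature of olympics.csv: line 0 is a stray row of numbers,
# line 1 holds the real column names.
CSV_TEXT = """0,1,2
country,gold,silver
Afghanistan,0,0
Algeria,5,2
"""

by_skiprows = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1)
by_header = pd.read_csv(io.StringIO(CSV_TEXT), header=1)

# Same labels and same values when only one leading line is in the way.
print(by_skiprows.equals(by_header))

# Where they differ: skiprows only drops lines, while header decides which
# line supplies the column names. With header=None no line is used as
# labels, so the "country,gold,silver" line comes back as a data row.
no_labels = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1, header=None)
print(no_labels.iloc[0].tolist())
```

So skiprows answers "which lines should be ignored?", while header answers "which of the remaining lines contains the column names?".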

However, you cannot use the header parameter to skip an arbitrary set of rows, so df2 cannot be replicated with header alone. Hopefully this clears things up.
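Since the question passes both parameters at once, it helps to know how they interact: skiprows is applied first, and header is then counted among the rows that remain. The sketch below shows this with pd.read_csv on a made-up in-memory file (the data and column names are hypothetical). pd.read_excel accepts header and skiprows as well and should behave the same way, but that is not exercised here because it would need a spreadsheet file.

```python
import io

import pandas as pd

# Hypothetical file: a note line, a stray numeric line, then the real header.
CSV_TEXT = """exported,by,script
0,1,2
country,gold,silver
Afghanistan,0,0
Algeria,5,2
Argentina,18,24
"""

# skiprows drops file line 0 first. Among the remaining lines, "0,1,2" is
# row 0 and "country,gold,silver" is row 1, so header=1 picks the real names.
combined = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1, header=1)

# A list in skiprows can also remove a data line below the header, which
# header cannot do. File lines are 0-indexed; line 4 is Algeria.
trimmed = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[0, 1, 4])

print(combined.columns.tolist())
print(trimmed["country"].tolist())
```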
