Header And Skiprows Difference In Pandas Unclear
Solution 1:
You can follow this article, which explains the difference between the header and skiprows parameters using examples from the Olympics dataset (it can be downloaded here).
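To summarize: by default, pd.read_csv() uses the first line of the file as the column names. In this dataset that first line is an unnecessary row of column numbers (0, 1, 2, ...), so those numbers become the header and the real column names end up in the first data row: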
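To see this default behavior without downloading the file, here is a minimal sketch that parses a tiny, hypothetical stand-in for olympics.csv from a string with io.StringIO. The three-column layout (a row of column numbers, then the real header with an empty first field, then one row per country) is an assumption modelled on the output in this answer, not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for olympics.csv: line 0 is a row of column numbers,
# line 1 is the real header (first field empty), then one row per country.
CSV_TEXT = """0,1,2
,№ Summer,Combined total
Afghanistan (AFG),13,2
Algeria (ALG),12,15
Argentina (ARG),23,70
"""

# Default call (header="infer"): line 0 is taken as the column labels.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.columns.tolist())  # ['0', '1', '2']
print(df.iloc[0].tolist())  # [nan, '№ Summer', 'Combined total']
```

Because the real names sit in the first data row, the numeric columns also come back as strings ('13', '12', ...) rather than integers.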
import pandas as pd
df = pd.read_csv('olympics.csv')
df.head()
                   0         1     2     3     4  ...       11    12    13    14              15
0                NaN  № Summer  01 !  02 !  03 !  ...  № Games  01 !  02 !  03 !  Combined total
1  Afghanistan (AFG)        13     0     0     2  ...       13     0     0     2               2
2      Algeria (ALG)        12     5     2     8  ...       15     5     2     8              15
3    Argentina (ARG)        23    18    24    28  ...       41    18    24    28              70
4      Armenia (ARM)         5     1     2     9  ...       11     1     2     9              12
However, the skiprows parameter lets you skip one or more rows when you read in the .csv file:
df1 = pd.read_csv('olympics.csv', skiprows = 1)
df1.head()
                Unnamed: 0  № Summer  01 !  02 !  ...  01 !.2  02 !.2  03 !.2  Combined total
0        Afghanistan (AFG)        13     0     0  ...       0       0       2               2
1            Algeria (ALG)        12     5     2  ...       5       2       8              15
2          Argentina (ARG)        23    18    24  ...      18      24      28              70
3            Armenia (ARM)         5     1     2  ...       1       2       9              12
4  Australasia (ANZ) [ANZ]         2     3     4  ...       3       4       5              12
And if you want to skip several rows that are not next to each other, you can pass a list of row numbers (notice the missing countries):
df2 = pd.read_csv('olympics.csv', skiprows = [0, 2, 3])
df2.head()
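                  Unnamed: 0  № Summer  01 !  02 !  ...  01 !.2  02 !.2  03 !.2  Combined total
0            Argentina (ARG)        23    18    24  ...      18      24      28              70
1              Armenia (ARM)         5     1     2  ...       1       2       9              12
2    Australasia (ANZ) [ANZ]         2     3     4  ...       3       4       5              12
3  Australia (AUS) [AUS] [Z]        25   139   152  ...     144     155     181             480
4              Austria (AUT)        26    18    33  ...      77     111     116             304
The header parameter tells pandas which row (counting from 0) to use as the column names; every line above that row is discarded and the data starts on the line after it. In the following case, header = 1 therefore does the same thing as skiprows = 1: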
# this gives the same result as df1 = pd.read_csv('olympics.csv', skiprows = 1)
df4 = pd.read_csv('olympics.csv', header = 1)
df4.head()
                Unnamed: 0  № Summer  01 !  02 !  ...  01 !.2  02 !.2  03 !.2  Combined total
0        Afghanistan (AFG)        13     0     0  ...       0       0       2               2
1            Algeria (ALG)        12     5     2  ...       5       2       8              15
2          Argentina (ARG)        23    18    24  ...      18      24      28              70
3            Armenia (ARM)         5     1     2  ...       1       2       9              12
4  Australasia (ANZ) [ANZ]         2     3     4  ...       3       4       5              12
However, you cannot use the header parameter to skip an arbitrary set of rows: it only discards the lines above the header row, so you would not be able to replicate df2 (which also drops rows 2 and 3) using header. Hopefully this clears things up.
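The comparison can be checked without the real file. This is a minimal sketch on a small, hypothetical stand-in for olympics.csv (same assumed layout: a row of column numbers, the real header, then countries); the helper read() is made up here purely to keep the calls short.

```python
import io

import pandas as pd

# Hypothetical stand-in for olympics.csv: line 0 is a row of column numbers,
# line 1 is the real header (first field empty), then one row per country.
CSV_TEXT = """0,1,2
,№ Summer,Combined total
Afghanistan (AFG),13,2
Algeria (ALG),12,15
Argentina (ARG),23,70
Armenia (ARM),5,12
"""


def read(**kwargs):
    """Hypothetical helper: parse CSV_TEXT with the given pd.read_csv kwargs."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)


df1 = read(skiprows=1)          # skip line 0, so line 1 becomes the header
df4 = read(header=1)            # use line 1 as the header; line 0 is dropped
df2 = read(skiprows=[0, 2, 3])  # skip lines 0, 2 and 3 (Afghanistan, Algeria)

print(df1.equals(df4))             # True
print(df2["Unnamed: 0"].tolist())  # ['Argentina (ARG)', 'Armenia (ARM)']

# header=3 also leaves Argentina and Armenia as the data rows, but it turns
# the Algeria line into the column names, so it still does not match df2.
print(read(header=3).columns.tolist())  # ['Algeria (ALG)', '12', '15']
```

The design difference shows up directly: skiprows accepts a list of line numbers anywhere in the file, while header picks a single row as the column names and can only drop what lies above it.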