Parsing A GenBank File Format With Biopython's SeqIO
I'm trying to parse a protein GenBank file. Here's an example file (example.protein.gpff): LOCUS NP_001346895 208 aa linear PRI 20-JAN-2018 DEF
Solution 1:
Check out the Genebank-parser library. It accepts a GenBank filename and a batch size; next_batch
yields as many records as batch_size specifies.
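For comparison, the same batching idea can be sketched with Biopython alone. This is not the Genebank-parser API: the helper names batched and genbank_batches are hypothetical, and the sketch assumes Biopython is installed and that the file parses with SeqIO's "genbank" format.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def genbank_batches(path, batch_size):
    """Yield lists of SeqRecord objects read lazily from a GenBank/GenPept file.

    Hypothetical helper: Biopython is imported here so that the generic
    batched() helper above stays usable without it.
    """
    from Bio import SeqIO

    yield from batched(SeqIO.parse(path, "genbank"), batch_size)


# Example usage (assumes example.protein.gpff exists in the working directory):
# for batch in genbank_batches("example.protein.gpff", 100):
#     print(len(batch), batch[0].id)
```

Because SeqIO.parse returns an iterator, only one batch of records is held in memory at a time; the last batch may be shorter than batch_size.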
Solution 2:
The easiest way to deal with this file format seems to be converting it to JSON (for example, using Biopython's Bio package) and then reading it with a JSON parser (such as the rjson package in R, which parses a JSON file into a list of records).
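A minimal sketch of the conversion step, assuming Biopython is installed. The function names record_to_dict and genbank_to_json and the choice of JSON keys are illustrative, not a standard schema; values that json cannot serialize directly (such as reference objects in the annotations) are written out as strings.

```python
import json


def record_to_dict(record):
    """Flatten a SeqRecord-like object into plain, JSON-friendly types."""
    return {
        "id": record.id,
        "name": record.name,
        "description": record.description,
        "sequence": str(record.seq),
        "length": len(record.seq),
        "annotations": dict(record.annotations),
        "features": [
            {
                "type": feature.type,
                "location": str(feature.location),
                "qualifiers": dict(feature.qualifiers),
            }
            for feature in record.features
        ],
    }


def genbank_to_json(in_path, out_path):
    """Write every record of a GenBank/GenPept file to one JSON array.

    Hypothetical helper; returns the number of records written.
    """
    from Bio import SeqIO  # imported here so record_to_dict works on its own

    records = [record_to_dict(r) for r in SeqIO.parse(in_path, "genbank")]
    with open(out_path, "w") as handle:
        # default=str stringifies anything json cannot encode natively.
        json.dump(records, handle, indent=2, default=str)
    return len(records)


# Example usage (file names are placeholders):
# genbank_to_json("example.protein.gpff", "example.protein.json")
```

The resulting file is a JSON array with one object per record, which a parser such as rjson should read as a list of records.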