Can't Download Video Captions Using Youtube Api V3 In Python

I am trying to download closed captions for this public youtube video (just for testing) https://www.youtube.com/watch?v=Txvud7wPbv4 I am using the code sample(captions.py) below t

Solution 1:

Your app seems overly-complex... it's structured to be able to do everything that can be done w/captions, not just download. That makes it harder to debug, so I wrote an abridged (Python 2 or 3) version that just downloads captions:

# Usage example: $ python captions-download.py Txvud7wPbv4

from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
YOUTUBE = discovery.build('youtube', 'v3', http=creds.authorize(Http()))

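def process(vid):
    # Look up the caption tracks attached to the video
    caption_info = YOUTUBE.captions().list(
            part='id', videoId=vid).execute().get('items', [])
    if not caption_info:
        print('No caption tracks found for video %s' % vid)
        return
    # Download the first track in SubRip (SRT) format
    caption_str = YOUTUBE.captions().download(
            id=caption_info[0]['id'], tfmt='srt').execute()
    if not isinstance(caption_str, str):  # Python 3 may hand back bytes
        caption_str = caption_str.decode('utf-8')
    # Individual captions are separated by blank lines
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            i, cap_time, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), cap_time, ' '.join(caption.split())))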

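if __name__ == '__main__':
    import sys
    # Exit with a usage message instead of hitting a NameError on VID
    if len(sys.argv) != 2:
        sys.exit('Usage: python captions-download.py VIDEO_ID')
    process(sys.argv[1])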

The way it works is this:

  1. You pass in the video ID (VID) as the only argument (sys.argv[1])
  2. It uses that VID to look up the caption IDs with YOUTUBE.captions().list()
  3. Assuming the video has (at least) one caption track, I grab its ID (caption_info[0]['id'])
  4. Then it calls YOUTUBE.captions().download() with that caption ID, requesting the SRT track format (tfmt='srt')
  5. All individual captions are delimited by double NEWLINEs, so split on 'em
  6. Loop through each caption; there's data if there are at least 2 NEWLINEs in the line, so only split() on the 1st pair
  7. Display the caption#, timeline of when it appears, then the caption itself, changing all remaining NEWLINEs to spaces
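
Steps 5 through 7 can be tried without touching the API at all. Here's a minimal sketch that pulls the parsing into its own function and runs it on a hand-written SRT string. The name parse_srt and the SAMPLE text are made up for illustration (they aren't in the script above), and the bytes/CRLF handling is an extra precaution I'm assuming is useful, not something the API is documented to require:

```python
from __future__ import print_function


def parse_srt(srt_text):
    """Return a list of (number, time_range, text) tuples from SRT text."""
    # Assumption: the download may arrive as bytes, so decode defensively.
    if not isinstance(srt_text, str):
        srt_text = srt_text.decode('utf-8')
    # Normalize Windows line endings so the blank-line split works.
    srt_text = srt_text.replace('\r\n', '\n')
    cues = []
    # Captions are delimited by double NEWLINEs.
    for block in srt_text.strip().split('\n\n'):
        # A real caption has a number, a timeline and at least one text line.
        if block.count('\n') > 1:
            i, cap_time, caption = block.split('\n', 2)
            # Collapse any remaining NEWLINEs in the caption to spaces.
            cues.append((int(i), cap_time, ' '.join(caption.split())))
    return cues


# Hypothetical sample data, shaped like an SRT download.
SAMPLE = (
    "1\n00:00:06,390 --> 00:00:09,280\niterator cool\nbut that's cool\n\n"
    "2\n00:00:09,280 --> 00:00:12,280\nyour the moment\n"
)

for num, cap_time, text in parse_srt(SAMPLE):
    print('%02d) [%s] %s' % (num, cap_time, text))
```

Keeping the parsing separate from the API call means you can check the formatting logic even when the download itself is failing.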

When I run it, I get the expected result... here on a video I own:

$ python captions-download.py MY_VIDEO_ID
01) [00:00:06,390--> 00:00:09,280] iterator cool but that's cool
02) [00:00:09,280--> 00:00:12,280] your the moment
03) [00:00:13,380--> 00:00:16,380] and sellers very thrilled
    :

Couple of things...

  1. I think you need to be the owner of the video you're trying to download the captions for.
    • I tried my script on your video, and I get a 403 HTTP Forbidden error
    • Here are other errors you may get from the API
  2. In your case, it looks like something is messing up the video ID you're passing in.
    • It thinks you're giving it <code> and </code> (notice the hex 0x3c & 0x3e values)... rich text?
    • Anyway, this is why I wrote my own, shorter version... so I have a more controlled environment to experiment.
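
On the second point, one way to guard against stray markup is to clean and check the ID before it ever reaches the API. This is only a sketch: clean_video_id is a hypothetical helper, and the "11 characters of letters, digits, '-' and '_'" pattern is my assumption based on how video IDs usually look, not a format YouTube guarantees:

```python
import re

# Assumption: video IDs are 11 characters from letters, digits, '-' and '_'.
VIDEO_ID_RE = re.compile(r'[A-Za-z0-9_-]{11}\Z')


def clean_video_id(raw):
    """Strip markup such as <code>...</code> and return the bare video ID."""
    # Remove anything that looks like a tag (starts with 0x3c, ends with 0x3e).
    stripped = re.sub(r'<[^>]*>', '', raw).strip()
    if not VIDEO_ID_RE.match(stripped):
        raise ValueError('unexpected video ID: %r' % raw)
    return stripped


print(clean_video_id('<code>Txvud7wPbv4</code>'))
```

If the ID still fails the check after stripping, it's better to stop there with a clear message than to send it on and get a confusing error back from the API.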

FWIW, since you're new to using Google APIs, I've made a couple of intro videos to get developers on-boarded with using Google APIs in this playlist. The auth code is the toughest part, so focus on videos 3 and 4 in that playlist to help get you acclimated.

I don't really have any videos that cover YouTube APIs (as I focus more on G Suite APIs) although I do have the one Google Apps Script example (video 22 in playlist); if you're new to Apps Script, you need to review your JavaScript then check out video 5 first. Hope this helps!
