Can't Download Video Captions Using Youtube Api V3 In Python
Solution 1:
Your app seems overly-complex... it's structured to be able to do everything that can be done w/captions, not just download. That makes it harder to debug, so I wrote an abridged (Python 2 or 3) version that just downloads captions:
# Usage example: $ python captions-download.py Txvud7wPbv4
from __future__ import print_function
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
YOUTUBE = discovery.build('youtube', 'v3', http=creds.authorize(Http()))

def process(vid):
    # Look up the caption tracks for the video, then download the first one.
    caption_info = YOUTUBE.captions().list(
            part='id', videoId=vid).execute().get('items', [])
    caption_str = YOUTUBE.captions().download(
            id=caption_info[0]['id'], tfmt='srt').execute()
    # On Python 3 the download may come back as bytes; decode before splitting.
    if isinstance(caption_str, bytes):
        caption_str = caption_str.decode('utf-8')
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            i, cap_time, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), cap_time, ' '.join(caption.split())))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2:
        VID = sys.argv[1]
        process(VID)
The way it works is this:
- You pass in the video ID (VID) as the only argument (sys.argv[1])
- It uses that VID to look up the caption IDs with YOUTUBE.captions().list()
- Assuming the video has (at least) one caption track, I grab its ID (caption_info[0]['id'])
- Then it calls YOUTUBE.captions().download() with that caption ID, requesting the srt track format
- All individual captions are delimited by double NEWLINEs, so split on 'em
- Loop through each caption; there's data if there are at least 2 NEWLINEs in the line, so only split() on the 1st pair
- Display the caption#, timeline of when it appears, then the caption itself, changing all remaining NEWLINEs to spaces
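If you want to try the parsing steps without touching the API at all, here's a minimal sketch of just that part. The parse_srt name and the SAMPLE text are mine (made up for illustration, not something the API gives you), and the bytes check is an assumption about what the download may return on Python 3:

```python
from __future__ import print_function


def parse_srt(srt_text):
    """Split SRT text into (index, time_range, caption) tuples."""
    # Assumption: the download may arrive as bytes on Python 3.
    if isinstance(srt_text, bytes):
        srt_text = srt_text.decode('utf-8')
    # Normalize Windows line endings so the blank-line split works.
    srt_text = srt_text.replace('\r\n', '\n')
    cues = []
    for block in srt_text.strip().split('\n\n'):
        block = block.strip()
        # A usable cue has an index line, a timing line, and caption text.
        if block.count('\n') < 2:
            continue
        index, time_range, caption = block.split('\n', 2)
        # Collapse any remaining NEWLINEs in the caption into single spaces.
        cues.append((int(index), time_range, ' '.join(caption.split())))
    return cues


# Hypothetical sample data in SRT layout.
SAMPLE = (
    "1\n00:00:06,390 --> 00:00:09,280\niterator cool\nbut that's cool\n\n"
    "2\n00:00:09,280 --> 00:00:12,280\nyour the moment\n"
)

for index, time_range, caption in parse_srt(SAMPLE):
    print('%02d) [%s] %s' % (index, time_range, caption))
```

Keeping the parsing in its own function means you can test it on a saved .srt file while you're still sorting out auth.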
When I run it, I get the expected result... here on a video I own:
$ python captions-download.py MY_VIDEO_ID
01) [00:00:06,390 --> 00:00:09,280] iterator cool but that's cool
02) [00:00:09,280 --> 00:00:12,280] your the moment
03) [00:00:13,380 --> 00:00:16,380] and sellers very thrilled
:
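One thing my script glosses over: it blindly takes caption_info[0], which blows up with an IndexError if the video has no caption tracks. Here's a rough sketch of a safer pick. The pick_caption_track helper and the track IDs are hypothetical, and matching on language assumes you asked list() for the snippet part as well as id, since as far as I know that's where the language field lives:

```python
def pick_caption_track(items, language='en'):
    """Return the ID of the best-matching caption track, or None."""
    if not items:
        return None
    # Prefer a track in the requested language when snippet data is present.
    for item in items:
        if item.get('snippet', {}).get('language') == language:
            return item['id']
    # Otherwise fall back to the first track, like the original script.
    return items[0]['id']


# Hypothetical list() results, trimmed to the fields used here.
tracks = [
    {'id': 'trackA', 'snippet': {'language': 'fr'}},
    {'id': 'trackB', 'snippet': {'language': 'en'}},
]
print(pick_caption_track(tracks))
print(pick_caption_track([]))
```

A None result is your cue to print a friendly "no captions found" message instead of a traceback.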
Couple of things...
- I think you need to be the owner of the video you're trying to download the captions for.
  - I tried my script on your video, and I get a 403 HTTP Forbidden error
  - Here are other errors you may get from the API
- In your case, it looks like something is messing up the video ID you're passing in.
  - It thinks you're giving it <code> and </code> (notice the hex 0x3c & 0x3e values)... rich text?
  - Anyway, this is why I wrote my own, shorter version... so I have a more controlled environment to experiment.
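If stray markup keeps sneaking into the video ID, one option is to scrub and sanity-check it before calling the API. This is only a sketch: clean_video_id is a made-up helper, and the 11-character pattern is an assumption based on IDs seen in practice, not a documented guarantee:

```python
import re

# Assumption: video IDs are 11 characters drawn from letters, digits, '_' and '-'.
VIDEO_ID_RE = re.compile(r'^[A-Za-z0-9_-]{11}$')


def clean_video_id(raw):
    """Strip markup and whitespace from raw; return the ID or raise ValueError."""
    # Drop anything that looks like an HTML tag, e.g. <code> and </code>.
    candidate = re.sub(r'<[^>]*>', '', raw).strip()
    if not VIDEO_ID_RE.match(candidate):
        raise ValueError('suspicious video ID: %r' % raw)
    return candidate


print(clean_video_id('<code>Txvud7wPbv4</code>'))
```

Failing early with a ValueError makes it obvious the input was mangled, rather than leaving you to puzzle over an HTTP error from the API.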
FWIW, since you're new to using Google APIs, I've made a couple of intro videos to get developers on-boarded with using Google APIs in this playlist. The auth code is the toughest, so focus on videos 3 and 4 in that playlist to help get you acclimated.
I don't really have any videos that cover YouTube APIs (as I focus more on G Suite APIs) although I do have the one Google Apps Script example (video 22 in playlist); if you're new to Apps Script, you need to review your JavaScript then check out video 5 first. Hope this helps!