How To Get Google's "Fast Answer Box" Text?
Solution 1:
If you watch the Network requests closely while that page loads, you'll see that Google fires a second request whose response contains your data.
Try opening this URL in your browser:
https://www.google.com/search?q=definition:+calcium&bav=on.2,or.r_cp.&cad=b&fp=1&biw=1920&bih=984&dpr=1&tch=1&ech=1&psi=1489578048971.3
It downloads a file that contains your fast-box data. To verify this, search the file for "the chemical element of atomic number".
You'll have to clean up the file and scrape out the data you want.
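As a rough illustration of the verification step, the sketch below searches the response text for the expected phrase and returns some surrounding context. The helper name `find_context` is hypothetical, the sample string is made up, and nothing is assumed about the downloaded file's format beyond it being text.

```python
from typing import Optional


def find_context(text: str, phrase: str, width: int = 40) -> Optional[str]:
    """Return `phrase` plus up to `width` characters on each side, or None."""
    pos = text.find(phrase)
    if pos == -1:
        return None  # phrase not present in the text
    start = max(0, pos - width)
    end = min(len(text), pos + len(phrase) + width)
    return text[start:end]


# Stand-in for the downloaded file's contents (made-up sample, not real Google output).
sample = 'junk{"x":"<span>the chemical element of atomic number 20</span>"}junk'
print(find_context(sample, "the chemical element of atomic number", width=10))
```

For a real file, read its contents first (for example with `open(path, encoding="utf-8").read()`, where `path` is wherever you saved the download) and pass the result as `text`.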
Solution 2:
In my opinion, the easiest way is to grab the CSS selectors for this text with the SelectorGadget Chrome extension and use them with BeautifulSoup's select() or select_one() methods.
Also, the problem could be that you don't specify a user-agent. The User-agent header makes the request look like a real user visit, so that Google (or another website) is less likely to block it.
from bs4 import BeautifulSoup
import requests, lxml
headers = {
'User-agent':
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
html = requests.get('https://www.google.de/search?q=definition%20calcium', headers=headers)
soup = BeautifulSoup(html.text, 'lxml')
syllables = soup.select_one('.frCXef span').text
phonetic = soup.select_one('.g30o5d span span').text
noun = soup.select_one('.h3TRxf span').text
print(f'{syllables}\n{phonetic}\n{noun}')
# Output:
'''
cal·ci·um
ˈkalsēəm
the chemical element of atomic number 20, a soft gray metal.
'''
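Class names such as `.frCXef` can change, and `select_one()` returns `None` when nothing matches, so calling `.text` on the result directly would raise `AttributeError`. A minimal defensive sketch, where `safe_text` is a hypothetical helper name and `soup` is the BeautifulSoup object from the snippet above:

```python
def safe_text(soup, selector, default=None):
    """Return the stripped text of the first match for `selector`, or `default`."""
    node = soup.select_one(selector)  # None when nothing matches
    if node is None:
        return default
    return node.get_text(strip=True)
```

It would be used as `syllables = safe_text(soup, '.frCXef span')`, after which a `None` result signals that the selector needs updating.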
Alternatively, you can do the same thing with the Google Direct Answer Box API from SerpApi, without having to figure out how to grab specific HTML elements. It's a paid API with a free trial of 5,000 searches.
Code to integrate:
from serpapi import GoogleSearch
params = {
"api_key": "YOUR_API_KEY",
"engine": "google",
"q": "definition calcium",
"google_domain": "google.com",
}
search = GoogleSearch(params)
results = search.get_dict()
syllables = results['answer_box']['syllables']
phonetic = results['answer_box']['phonetic']
noun = results['answer_box']['definitions'][0]  # specifying index since the output is an array
print(f'{syllables}\n{phonetic}\n{noun}')
# Output:
'''
cal·ci·um
ˈkalsēəm
the chemical element of atomic number 20, a soft gray metal.
'''
Disclaimer: I work for SerpApi.
Solution 3:
SerpApi fully supports dictionary results that appear inside Google direct answer boxes. For example:
$ curl 'https://serpapi.com/search.json?q=definition%20calcium&google_domain=google.de'
...
"answer_box": {
"type": "dictionary_results",
"syllables": "cal·ci·um",
"phonetic": "/ˈkalsēəm/",
"word_type": "noun",
"definitions": [
"the chemical element of atomic number 20, a soft gray metal."
]
},
...
Some documentation for dictionary results is here: https://serpapi.com/direct-answer-box-api
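The fields in the excerpt above can be read defensively once the JSON is parsed. The sketch below uses that `answer_box` excerpt as sample data; `extract_definition` is a hypothetical helper name, and a live response would come from the API rather than from a string.

```python
import json

# Sample data copied from the excerpt above, not fetched from the API.
raw = '''
{
  "answer_box": {
    "type": "dictionary_results",
    "syllables": "cal·ci·um",
    "phonetic": "/ˈkalsēəm/",
    "word_type": "noun",
    "definitions": [
      "the chemical element of atomic number 20, a soft gray metal."
    ]
  }
}
'''


def extract_definition(results: dict) -> dict:
    """Pull the dictionary fields out of `results`, tolerating missing keys."""
    box = results.get("answer_box") or {}
    definitions = box.get("definitions") or []
    return {
        "syllables": box.get("syllables"),
        "phonetic": box.get("phonetic"),
        "word_type": box.get("word_type"),
        "definition": definitions[0] if definitions else None,
    }


info = extract_definition(json.loads(raw))
print(info["syllables"], info["phonetic"], info["definition"], sep="\n")
```

Using `.get()` instead of direct indexing means a query that returns no answer box yields `None` values rather than a `KeyError`.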