Python Beautiful Soup And Regex - Double Quotes Not Getting Replaced
Solution 1:
This website includes characters that aren't the 'normal' double quote character, i.e. not " (U+0022).
The site uses the Unicode left and right double quotation marks, “ and ” (U+201C and U+201D).
You can replace these:
y = y.replace('"', '')
y = y.replace('“', '')
y = y.replace('”', '')
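Since the question mentions regex, the three replacements above could also be collapsed into one substitution. This is only a minimal sketch: the names `QUOTE_PATTERN` and `strip_double_quotes` and the sample string are made up for illustration, and it assumes `y` is an ordinary Python 3 `str`.

```python
import re

# Character class matching the straight quote (U+0022) and the
# left/right curly quotes (U+201C, U+201D).
QUOTE_PATTERN = re.compile('["\u201c\u201d]')

def strip_double_quotes(text):
    """Return text with all three kinds of double quote removed."""
    return QUOTE_PATTERN.sub('', text)

# Sample input is an assumption, not taken from the scraped page.
y = '\u201cBlocked\u201d and "Allowed"'
y = strip_double_quotes(y)
print(y)  # Blocked and Allowed
```

Writing the curly quotes as `\u201c` and `\u201d` escapes keeps the source file unambiguous even if an editor silently converts quote characters.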
Solution 2:
I took a look at the website you are trying to scrape. Is " “Blocked” " an example of the double quotes you are trying to replace? If so, look at the difference between my own quotes and the ones that I copy-pasted from the website. They are not the same character.
You should copy/paste the punctuation characters you are trying to replace, or look up their code points, because a single punctuation sign can be written with several different characters on the web, and Python treats ", “ and ” as three distinct characters.
Hence you should have something like:
y = y.replace('“', '')
y = y.replace('”', '')
Since this probably won't be your only problem with punctuation marks, I suggest you build a list of everything you want to replace and then loop over that list.
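The loop idea could look roughly like this. It is a sketch, not a definitive list: `CHARS_TO_REMOVE` and `remove_chars` are hypothetical names, and the curly single quotes (U+2018, U+2019) are included only as an example of other punctuation you might run into.

```python
# Hypothetical list of characters to strip; extend it as you find more.
CHARS_TO_REMOVE = [
    '"',       # U+0022 straight double quote
    '\u201c',  # left double quotation mark
    '\u201d',  # right double quotation mark
    '\u2018',  # left single quotation mark (assumed extra)
    '\u2019',  # right single quotation mark (assumed extra)
]

def remove_chars(text, chars=CHARS_TO_REMOVE):
    """Remove every character in chars from text, one replace per character."""
    for ch in chars:
        text = text.replace(ch, '')
    return text

y = remove_chars('\u201cBlocked\u201d \u2018quoted\u2019 "plain"')
print(y)  # Blocked quoted plain
```

If the list grows long, `text.translate(str.maketrans('', '', ''.join(chars)))` should give the same result in a single pass over the string.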