
Python Beautiful Soup And Regex - Double Quotes Not Getting Replaced

I am trying to scrape this website using BeautifulSoup and Regex. While doing so, I encountered a question that contained 'double quotes', and I wanted to replace the 'double quot

Solution 1:

This website uses characters that aren't 'normal' double-quote characters, i.e. not " (U+0022).

Instead, the site uses the left and right double quotation marks, Unicode U+201C (“) and U+201D (”).
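A quick way to confirm this in Python is to print the code point and official Unicode name of each character with the standard unicodedata module, and to compare them directly. A minimal sketch:

```python
import unicodedata

# Three characters that all look like "double quotes" on screen
quotes = ['"', '\u201c', '\u201d']

for ch in quotes:
    # Show the code point and the official Unicode name of each character
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Python compares strings by code point, so these are all different characters
print('"' == '\u201c')
print('\u201c' == '\u201d')
```

This prints QUOTATION MARK, LEFT DOUBLE QUOTATION MARK and RIGHT DOUBLE QUOTATION MARK, followed by False twice, which is why replacing only " leaves the curly quotes untouched.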

You can replace these:

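y = y.replace('"', '')  # U+0022, the plain ASCII double quote
y = y.replace('“', '')  # U+201C, left double quotation mark
y = y.replace('”', '')  # U+201D, right double quotation mark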
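Since the question already uses regex, the three replace() calls could also be collapsed into a single re.sub with a character class. A minimal sketch, assuming y holds the scraped text; the sample string and the DOUBLE_QUOTES name are made up for illustration:

```python
import re

# Character class matching the straight quote (U+0022) and the
# curly left/right double quotes (U+201C, U+201D)
DOUBLE_QUOTES = re.compile('["\u201c\u201d]')

# Hypothetical scraped text containing both kinds of quotes
y = 'The status was \u201cBlocked\u201d, not "Open".'
y = DOUBLE_QUOTES.sub('', y)
print(y)  # The status was Blocked, not Open.
```

Compiling the pattern once is convenient if you clean many scraped strings in a loop.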

Solution 2:

I took a look at the website you are trying to scrape. Is " “Blocked” " an example of the double quotes you are trying to replace? If so, look at the difference between my own quote and the ones that I copy-pasted from the website. They are not the same character.

You should copy-paste the exact punctuation characters you are trying to replace (or look up their codes), because a single punctuation sign can be written with several different characters on the web, and Python treats ", “ and ” as three distinct characters.
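If you are not sure which characters the page actually uses, one way to find their codes is to list every distinct non-ASCII character in the scraped text together with its code point and Unicode name. A minimal sketch; the helper name find_non_ascii and the sample string are hypothetical:

```python
import unicodedata

def find_non_ascii(text):
    """Return a sorted list of (character, code point, Unicode name)
    for every distinct non-ASCII character in text."""
    found = {ch for ch in text if ord(ch) > 127}
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in sorted(found)
    ]

# Hypothetical scraped text mixing curly double and single quotes
sample = 'Is \u201cBlocked\u201d or \u2018Closed\u2019 the right word?'
for ch, code, name in find_non_ascii(sample):
    print(ch, code, name)
```

Running this on the text you scraped tells you exactly which characters to put in your replace() calls.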

Hence you should have something like:

y = y.replace('“', '')
y = y.replace('”', '')

Since this probably won't be your only problem with punctuation marks, I suggest you build a list of every character you want to replace and then loop over that list.
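Following that suggestion, a minimal sketch might keep the unwanted characters in a list and loop over it. The names UNWANTED, strip_chars and strip_chars_translate are hypothetical, and which characters belong in the list depends on the page you scrape:

```python
# Characters to remove: the straight double quote plus the curly
# left/right double quotes (extend this list as needed)
UNWANTED = ['"', '\u201c', '\u201d']

def strip_chars(text, chars=UNWANTED):
    """Remove every occurrence of each character in chars from text."""
    for ch in chars:
        text = text.replace(ch, '')
    return text

def strip_chars_translate(text, chars=UNWANTED):
    """Same result in a single pass, using a str.translate deletion table."""
    return text.translate(str.maketrans('', '', ''.join(chars)))

y = 'He said \u201cBlocked\u201d, then "Open".'
print(strip_chars(y))            # He said Blocked, then Open.
print(strip_chars_translate(y))  # He said Blocked, then Open.
```

The str.translate variant removes all listed characters in one pass, which may be tidier than chaining replace() calls once the list grows. Be careful before adding curly single quotes (U+2018, U+2019) to the list, since U+2019 is also commonly used as an apostrophe.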
