Regex Working Wrong, Matching Unexpected Things
Solution 1:
If you really want to do this right, I would start over by first establishing the pattern you intend to match. That is not evident from your regex, and it wasn't in your question either; you only described what you didn't want. Approach it as "what do I want?", not "what am I trying to exclude?". Once you specify what you require, all those nasty other possibilities are eliminated automatically.
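Your original regex isn't reproduced here, so the sketch below uses a hypothetical exclusion-style check to show the difference. It only rejects characters it considers bad, and compares that against a check that describes one wanted shape (XXX-XXX-XXXX, picked purely for illustration). The variable names are made up for this example.

```javascript
// Hypothetical exclusion-style check ("what am I trying to exclude?"):
// reject input containing anything other than digits, whitespace, "+", "(", ")" or "-".
const forbidden = /[^\d\s+()\-]/;
const passesExclusion = (s) => !forbidden.test(s);

// Specification-style check ("what do I want?"): exactly one shape, XXX-XXX-XXXX.
const wanted = /^\d{3}-\d{3}-\d{4}$/;

for (const s of ["212-555-0123", "))((--++", "1", "212-555-01234"]) {
  // Prints: input, exclusion result, specification result
  console.log(JSON.stringify(s), passesExclusion(s), wanted.test(s));
}
```

All four inputs get past the exclusion check, including the one made only of punctuation. Only the real number gets past the specification check.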
First decide what you will accept as a "valid phone number". Remember that even within the NANP (North American Numbering Plan), there are several common formats, such as:
- XXX-XXX-XXXX or
- XXX XXX-XXXX or
- 1-XXX-XXX-XXXX or
- (XXX) XXX-XXXX or
- +(XXX) XXX-XXXX or
- 1 (XXX) XXX-XXXX
All of those are valid numbers, so you have to decide which formats you'll accept. Then there are the formats used in the rest of the world, whose lengths vary from 9 (Portugal) to 13 (South Korea) digits, including the country and international code. So you have to decide:
- Will you be accepting only NANP numbers, or other numbers outside of that standard?
- Will you accept "+", or make users type the international code? If you require the code, will the codes for the countries your users enter have a fixed number of digits? If they do enter a code, will your regex handle it (if acceptable) or red-flag it (if not)?
- Will you enforce parentheses around the area code, merely allow them (make them optional), or refuse them outright?
On that last one, note that different countries put parentheses in different places in their numbers; e.g., Mexico uses 2-digit area codes for some cities (and is not part of the NANP, by the way).
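To see how the parenthesis question alone changes what gets accepted, here is a minimal JavaScript sketch. The policy names `required`, `optional` and `refused` are hypothetical labels for the three choices above. Everything after the area code is fixed as XXX-XXXX, so only the parenthesis decision varies.

```javascript
// Building blocks, written as strings so they can be combined with RegExp.
// "[2-9]\\d{2}" in a string literal becomes [2-9]\d{2} in the pattern.
const AREA = "[2-9]\\d{2}";        // area code: first digit 2-9
const REST = "[2-9]\\d{2}-\\d{4}"; // exchange and line number: XXX-XXXX

// Three hypothetical policies; only the area-code part differs.
const policies = {
  required: new RegExp(`^\\(${AREA}\\) ${REST}$`),              // "(XXX) XXX-XXXX" only
  optional: new RegExp(`^(?:\\(${AREA}\\) |${AREA}-)${REST}$`), // either form
  refused: new RegExp(`^${AREA}-${REST}$`),                     // "XXX-XXX-XXXX" only
};

for (const input of ["(212) 555-0123", "212-555-0123"]) {
  for (const [name, re] of Object.entries(policies)) {
    console.log(`${name.padEnd(8)} ${input} -> ${re.test(input)}`);
  }
}
```

Each stricter policy accepts fewer of the formats listed earlier, even though both test inputs are valid NANP numbers.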
Remember, too, that each time you make one of these decisions requiring a character in some position, you exclude other possible, valid phone numbers, unless you also allow the other valid characters in that position. That is why there is no one-size-fits-all solution to your problem. For that reason, many people will tell you to just strip out "+", "(", ")" and "-", and then count the digits. But this fails if you consider the "1" in a NANP number to be required and someone leaves it out (it's generally optional within the NANP). It also fails when countries use different numbers of digits, even within the same country, as in New Zealand.
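Here is a minimal JavaScript sketch of the strip-and-count approach and of the first failure mode just described. `stripPunctuation` and `isNanpWithLeadingOne` are hypothetical names. The "exactly 11 digits starting with 1" rule is an assumed policy that treats the NANP "1" as required.

```javascript
// Remove "+", "(", ")", "-" and whitespace, keeping everything else.
function stripPunctuation(input) {
  return input.replace(/[+()\-\s]/g, "");
}

// Hypothetical rule: the NANP "1" is required, so a valid number has
// exactly 11 digits and starts with 1.
function isNanpWithLeadingOne(input) {
  return /^1\d{10}$/.test(stripPunctuation(input));
}

console.log(stripPunctuation("1 (212) 555-0123"));     // 12125550123
console.log(isNanpWithLeadingOne("1 (212) 555-0123")); // true
console.log(isNanpWithLeadingOne("(212) 555-0123"));   // false: the optional "1" was left out
```

The second number is perfectly valid in the NANP. It fails only because the counting rule assumed the "1" would always be typed.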
There is a so-called comprehensive guide, "A comprehensive regex for phone number validation".
But I found it horribly lacking in explaining things like how to make a person enter "+" vs. "1" and a space (for NANP numbers), or how to enforce parentheses, hyphens, etc. It hands you finished regexes rather than explaining how to arrive at them. Hence this "blog" of an answer.
Here is the strict NANP regex I use. It accepts:
- +(XXX) XXX-XXXX
- 1 (XXX) XXX-XXXX
- (XXX) XXX-XXXX
It requires the parentheses, the space after the closing parenthesis, and the hyphen. For NANP numbers, I believe, that gives good flexibility while still conforming to a standard. Fortunately, I don't deal with international (non-NANP) numbers:
/^(\+|1\s)?[(][2-9]\d{2}[)][\s][2-9]\d{2}-\d{4}$/
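A quick way to confirm what this pattern does and doesn't accept is to run it against a handful of samples. This assumes a JavaScript environment, which is what the `/.../` literal suggests. The sample numbers are made up.

```javascript
// The strict NANP pattern, copied verbatim from above.
const strictNanp = /^(\+|1\s)?[(][2-9]\d{2}[)][\s][2-9]\d{2}-\d{4}$/;

const samples = [
  "(212) 555-0123",    // accepted: area code in parentheses
  "+(212) 555-0123",   // accepted: leading "+"
  "1 (212) 555-0123",  // accepted: leading "1" and a space
  "212-555-0123",      // rejected: no parentheses
  "(212)555-0123",     // rejected: no space after ")"
  "(112) 555-0123",    // rejected: area code starts with 1
  "1(212) 555-0123",   // rejected: "1" without the following space
  "+1 (212) 555-0123", // rejected: "+" and "1" can't be combined
];

for (const s of samples) {
  console.log(`${s.padEnd(18)} ${strictNanp.test(s)}`);
}
```

The last sample shows the trade-off discussed earlier. Requiring "+" or "1 " in that slot rules out the common "+1 " prefix.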
- `/^` = Anchors the match at the start of the string; basically it just marks the start of the expression (the `/` is the regex-literal delimiter).
- `(\+|1\s)?` group:
  - The parentheses create a group, and the `?` after it makes the whole group optional (it may appear zero times or once). Inside the group, the pipe character provides an "or" condition.
  - `\+` = An escaped "+", to match a literal plus sign. It must be escaped with a backslash because "+" is a special character (a quantifier) in regex.
  - `|` = The pipe character: match either what is to its left or what is to its right, within the group.
  - `1\s` = Requires the digit 1 followed by a whitespace character. I require the space with `\s` rather than a space in brackets (`[ ]`); the bracket form has not worked for me, though I've seen other posts that indicate it should. Note that `\s` matches any whitespace character (a tab, for example), not only a space.
- `[(]` = This is how you indicate that the opening parenthesis is required (`\(` is an equivalent way to write it).
- `[2-9]\d{2}` group (the area code):
  - `[2-9]` = Matches one digit from 2 to 9. In the NANP, 0 and 1 are invalid as the first digit of an area code (the first set of 3 digits) or of the exchange (the second set of 3 digits).
  - `\d{2}` = Requires exactly 2 digits from 0-9. It is shorthand for `[0-9][0-9]`. For a three-digit group from 000-999, you would simply write `\d{3}`.
- `[)]` = This is how you indicate that the closing parenthesis is required.
- `[\s]` = Requires one whitespace character (the brackets add nothing here; `\s` alone means the same thing).
- `[2-9]\d{2}-\d{4}` group (the exchange and line number):
  - The first part, before the hyphen, works the same as the area code.
  - The literal hyphen is required here. If you wrote `-?` instead, it would be optional.
  - `\d{4}` = Requires exactly 4 digits from 0-9. It is shorthand for `[0-9][0-9][0-9][0-9]`.
- `$/` = Anchors the match at the end of the string; basically it just marks the end of the expression.
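The variations mentioned in the breakdown can be checked side by side. The sketch below assumes JavaScript, where `\d` is equivalent to `[0-9]`. The variant names are hypothetical. One variant makes the hyphen optional with `-?`. The other swaps in the equivalent spellings `\(`, `\)`, `\s` and explicit `[0-9]` classes.

```javascript
// The strict pattern, copied verbatim.
const strict = /^(\+|1\s)?[(][2-9]\d{2}[)][\s][2-9]\d{2}-\d{4}$/;

// Variant: the hyphen made optional with "-?".
const optionalHyphen = /^(\+|1\s)?[(][2-9]\d{2}[)][\s][2-9]\d{2}-?\d{4}$/;

// Variant: equivalent spellings: \( and \) for [(] and [)], \s for [\s],
// [0-9][0-9] for \d{2} and [0-9]{4} for \d{4}. Should behave like `strict`.
const spelledOut = /^(\+|1\s)?\([2-9][0-9][0-9]\)\s[2-9][0-9][0-9]-[0-9]{4}$/;

const inputs = ["(212) 555-0123", "(212) 5550123", "(212)\t555-0123", "1 (212) 555-0123"];
for (const s of inputs) {
  // Prints: input, strict, optionalHyphen, spelledOut
  console.log(JSON.stringify(s), strict.test(s), optionalHyphen.test(s), spelledOut.test(s));
}
```

The tab-separated input is accepted by all three patterns, because `\s` means "whitespace", not "space". If a tab should not count, a literal space character in the pattern is the stricter choice.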
Hopefully this will help you to build your expression.