404 based on Special Character in URL


natasha_thomas

Folks,

 

Requirement:

I want an .htaccess-level solution that returns a 404 when the URL contains any character other than those allowed in the rewrite rule below:

 

RewriteRule ^([a-zA-Z0-9-!@#$^&*:"<>/?]{4,})\.html$ search.php?q=$1 [QSA,L]

 

So what I want is to show a 404 when the URL contains anything other than "a-zA-Z0-9-!@#$^&*:"<>/?".

 

 

What I have done:

RewriteRule ^([a-zA-Z0-9-!@#$^&*:"<>/?]{4,})\.html$ search.php?q=$1 [QSA,L]

 

 

Problem:

It's not working with special characters; it only works with English letters and digits in the URL.

 

Cheers

Natasha T


What you really want is a fall-through rule for the 404 that follows the working rule and otherwise matches your URLs broadly. The existing rule will match what you want, and anything else will fall through to the 404 rule. Trying to do a negative match is quite tricky and not really a strength of regex. In this case, what you're really saying is:

 

-I could have a valid character OR not

-and some number of invalid characters

-AND some valid characters OR not

-etc

 

This is probably why you're having a hard time crafting something that works.
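
A minimal sketch of the fall-through idea, assuming this lives in an .htaccess file with mod_rewrite enabled. The first rule is the one from the original post; the second rule is an illustration (not something posted in this thread) that answers 404 for any other .html request that reaches it.

```apache
RewriteEngine On

# Rule 1 (from the original post): names made only of the allowed
# characters, at least 4 of them, are rewritten to the search script.
RewriteRule ^([a-zA-Z0-9-!@#$^&*:"<>/?]{4,})\.html$ search.php?q=$1 [QSA,L]

# Rule 2 (assumed fall-through): any .html request that did not match
# Rule 1 ends up here and gets a 404. With a non-3xx status the
# substitution is ignored, so "-" is just a placeholder.
RewriteRule \.html$ - [R=404,L]
```

Note that with this sketch a real .html file on disk whose name does not fit Rule 1 would also get a 404, so you may want a RewriteCond %{REQUEST_FILENAME} !-f in front of the second rule if such files exist.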


 

Again, many thanks, Gizmola.

 

Can you tell me what the purpose of "{4,})" is in the above rewrite rule?

 

Cheers


Yeah, that's a quantifier for the character class that precedes it (the stuff inside the [] is called a character class). The character class defines the characters and ranges of characters that the regex is trying to match. The {} after it quantifies how many times it must match. There are a lot of different variations of these quantifiers. This site is a great reference: http://www.regular-expressions.info/reference.html

 

That particular quantifier means "must match at least 4 times".

 

That closing paren closes out a capture group for the pattern:

 

([a-zA-Z0-9-!@#$^&*:"<>/?]{4,})

 

So everything inside the () will be captured together. This is what gets substituted for the $1 in the rewrite. Since it's the first capture group (in this case the only one), it becomes $1.

 

search.php?q=$1
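
To make that concrete, here is a rough illustration of how the capture group and the quantifier behave. The request paths and the alternative quantifiers in the comments are made-up examples, and it assumes the rule sits in an .htaccess file, where the leading slash is not part of what the pattern sees.

```apache
# Hypothetical requests against the rule below:
#
#   /php-help.html  -> $1 = "php-help"  -> search.php?q=php-help
#   /abc.html       -> only 3 characters before ".html", so no match
#
RewriteRule ^([a-zA-Z0-9-!@#$^&*:"<>/?]{4,})\.html$ search.php?q=$1 [QSA,L]

# Other quantifiers that could replace {4,} (pick one, not all):
#   {4}     exactly 4 characters
#   {4,10}  between 4 and 10 characters
#   +       one or more characters
```

If the pattern had a second pair of parentheses, whatever it captured would be available as $2, and so on.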

 

 

 

 

 
