Q: Why not use the dot (“.”) in url rewrite?

I noticed that routes in most (perhaps all) PHP frameworks do not use dots in rewritten URLs, and the PHP built-in server (php-standalone-server) does not use dots either (note that the built-in server does not need mod_rewrite; it usually just works). Is this a convention? Should I avoid dots in rewritten URLs?

Please consider the following points first:

  1. If I run the PHP built-in server (php-standalone-server) in this folder structure:

    project
    │   index.php
    │   test.php
    │
    └── blog1
        └── index.html
    

    If I access http://localhost:8000/test.php, it shows the response from the ./test.php file. If I access http://localhost:8000/blog1, it shows the contents of blog1/index.html. This is simply how the built-in server natively handles such URLs, and it is not the issue.

    But if I access http://localhost:8000/blog, it shows the response from ./index.php.

  2. Stack Overflow itself is an example of a site that replaces dots with hyphens in its URLs. This very question is an example:

    • Title: Why not use the dot (“.”) in url rewrite?
    • Url: http://stackoverflow.com/questions/29977809/why-not-use-the-dot-in-url-rewrite
  3. One example of a PHP framework is CodeIgniter 3, which does not allow dots.

The question:

Given the above, should I allow dots (".") in rewritten URLs or not? Is avoiding dots a "standard"?

Answer 1:

I finally found the reason for this "controversy" when I searched (googled) for the term rfc dot path.

The problem with the dot . in URLs

It is fine to use dots in a URL (even a rewritten one), such as:

http://example/project/hello-new-world

or, supposing we create a "fake" URL, such as:

http://example/project/index.php/hello-new-world.html

The problem occurs when a dot is used like this:

http://example/project/test./

To the server, /project/test./ and /project/test/ are the same thing, even though they are visibly different URLs.

Note that the problem does NOT occur with a leading dot, as in /project/.test/, because file names that begin with a dot, such as .htaccess, are legitimate.

Rewritten URLs avoid dots to prevent this problem and to make the canonicalization of URLs (URL normalization) easier.

For a clearer example of the problem, create a file in a physical folder on localhost:

/var/www/images/test.jpg

Go to http://localhost/images/test.jpg and then try to access all of these:

  • http://localhost/images/test.jpg.
  • http://localhost/images/test.jpg...
  • http://localhost/images/test.jpg....
  • http://localhost/images/test.jpg.....
  • http://localhost/images/test.jpg......
  • http://localhost/images/test.jpg.......

All of these URLs deliver the image test.jpg to the client (a web browser, for example).

URL normalization (or URL canonicalization)

URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized (canonical) URL so that it is possible to determine whether two syntactically different URLs may be equivalent.

Search engines use URL normalization to assign importance to web pages and to reduce the indexing of duplicate pages. Crawlers perform URL normalization to avoid crawling the same resource more than once.

Types of normalization (several of the following are described in RFC 3986):

  • Removing the directory index. Default directory indexes are generally not needed in URLs:

    http://www.example.com/a/index.html → http://www.example.com/a/

  • Replacing an IP address with a domain name. Check whether the IP address maps to a canonical domain name:

    http://208.77.188.166/ → http://www.example.com/ (the Host: domain header helps with this)

  • Removing duplicate slashes. Paths that include two adjacent slashes can be converted to one:

    http://www.example.com/foo//bar.html → http://www.example.com/foo/bar.html

  • Removing or adding www as the first domain label. Both URLs often point to the same page:

    http://www.example.com/ → http://example.com/

  • Removing the ? when the query is empty. When the query is empty, there may be no need for the ?:

    http://www.example.com/display? → http://www.example.com/display

  • Adding a trailing / to directories:

    http://www.example.com/alice → http://www.example.com/alice/ (Apache and Nginx usually perform this redirect already if the path is a real folder).

    However, there is no way to know whether a URL path component is a directory or not. RFC 3986 notes that if the former URL redirects to the latter, that is an indication that they are equivalent.

  • Removing dot-segments. The segments .. and . can be removed from a URL according to the algorithm described in RFC 3986:

    http://www.example.com/../a/b/../c/./d.html → http://www.example.com/a/c/d.html

    However, if a removed .. component, e.g. b/.., is a symlink to a directory with a different parent, eliding b/.. will result in a different path and URL. In rare cases, depending on the web server, this may even be true for the root directory (e.g. //www.example.com/.. may not be equivalent to //www.example.com/). This is the likely reason to avoid the dot.
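
The dot-segment removal mentioned in the last item can be sketched in PHP as follows. This is only an illustration that follows the steps of RFC 3986, section 5.2.4; the function name removeDotSegments is made up for this example and is not part of PHP or of any framework.

```php
<?php
// Hypothetical helper: removes "." and ".." segments from a URL path,
// following the steps of RFC 3986, section 5.2.4.
function removeDotSegments(string $input): string
{
    $output = '';
    while ($input !== '') {
        if (strncmp($input, '../', 3) === 0) {
            // A: drop a leading "../"
            $input = substr($input, 3);
        } elseif (strncmp($input, './', 2) === 0) {
            // A: drop a leading "./"
            $input = substr($input, 2);
        } elseif (strncmp($input, '/./', 3) === 0) {
            // B: "/./x" becomes "/x"
            $input = substr($input, 2);
        } elseif ($input === '/.') {
            // B: a final "/." becomes "/"
            $input = '/';
        } elseif (strncmp($input, '/../', 4) === 0 || $input === '/..') {
            // C: "/../x" becomes "/x", and the last output segment is removed
            $input = ($input === '/..') ? '/' : substr($input, 3);
            $pos = strrpos($output, '/');
            $output = ($pos === false) ? '' : substr($output, 0, $pos);
        } elseif ($input === '.' || $input === '..') {
            // D: a lone "." or ".." is dropped
            $input = '';
        } else {
            // E: move the first segment (with its leading "/") to the output
            $next = strpos($input, '/', 1);
            if ($next === false) {
                $output .= $input;
                $input = '';
            } else {
                $output .= substr($input, 0, $next);
                $input = substr($input, $next);
            }
        }
    }
    return $output;
}

echo removeDotSegments('/../a/b/../c/./d.html'), "\n"; // /a/c/d.html
echo removeDotSegments('/images/test.jpg.'), "\n";     // /images/test.jpg.
```

Note that the second call returns its input unchanged: a trailing dot inside a file name is not a dot-segment, so this normalization step alone does not help with the test.jpg. case.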

So you may ask: must I then avoid dots in my rewritten URLs?

I would say that is one solution, but not the only one. If you are using mod_rewrite, you are probably also using a language such as PHP, and in that language you can detect whether the URL ends with dots. For example:

<IfModule mod_rewrite.c>
    RewriteEngine On

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d

    RewriteRule ^([a-zA-Z0-9\-\/.]+)$ index.php/$1 [QSA,L]
</IfModule>

This RewriteRule populates the variable $_SERVER['PATH_INFO'], and you can compare that variable with $_SERVER['REQUEST_URI'] to see whether the two differ. Alternatively, you can use REQUEST_URI together with rtrim to check for trailing dots and issue a permanent redirect, for example:

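<?php
$req = rtrim($_SERVER['REQUEST_URI'], '/'); // Remove the trailing / from the URL.
$clean = rtrim($req, '.'); // Remove any trailing dots.

if ($req !== $clean) {
    // Permanent redirect to the URL without the trailing dots.
    // REQUEST_URI includes the query string, so this only catches URLs without one.
    header('Location: ' . $clean, true, 301);
    exit;
}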
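
As a variation on the check above, the sketch below separates the path from the query string before stripping trailing dots, and leaves segments made only of dots (. and ..) alone. The function name stripTrailingDots and the regular expression are assumptions made for this example, not something taken from a framework.

```php
<?php
// Hypothetical helper: returns the request URI with trailing dots removed
// from the last path segment. The query string is left untouched.
function stripTrailingDots(string $requestUri): string
{
    // Separate the path from the query string, if there is one.
    $parts = explode('?', $requestUri, 2);

    // Strip a run of dots at the end of the path (optionally followed by "/"),
    // but only when it follows a character other than "/" or ".", so that
    // the dot-segments "." and ".." are not affected.
    $path = preg_replace('#(?<=[^/.])\.+(/?)$#', '$1', $parts[0]) ?? $parts[0];

    return isset($parts[1]) ? $path . '?' . $parts[1] : $path;
}

echo stripTrailingDots('/images/test.jpg...'), "\n"; // /images/test.jpg
echo stripTrailingDots('/project/test./'), "\n";     // /project/test/
echo stripTrailingDots('/project/.test/'), "\n";     // /project/.test/
```

A front controller could compare the result with $_SERVER['REQUEST_URI'] and, when they differ, send a Location header with status 301 to the returned value.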

php  mod-rewrite  url-rewriting  php-standalone-server