Document new features in --restrict-file-names.

Micah Cowan 2009-07-28 00:19:48 -07:00
parent fb0946c7fc
commit c784c334d3
4 changed files with 53 additions and 20 deletions

@ -1,3 +1,8 @@
+2009-07-28 Micah Cowan <micah@cowan.name>
+* NEWS: Mention some more previously undocumented items, and the
+new "ascii" specifier for --restrict-file-names.
 2009-07-27 Petr Pisar <petr.pisar@atlas.cz>
 * po/Makevars (MSGID_BUGS_ADDRESS): Fixed.

NEWS

@ -1,7 +1,7 @@
 GNU Wget NEWS -- history of user-visible changes.
 Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
-2006, 2007, 2008 Free Software Foundation, Inc.
+2006, 2007, 2008, 2009 Free Software Foundation, Inc.
 See the end for copying conditions.
 Please send GNU Wget bug reports to <bug-wget@gnu.org>.
@ -41,8 +41,13 @@ an external file.
 information on how it was built, and the set of configure-time options
 that were selected.
-** Several previously existing, but undocumented .wgetrc options
-are now documented: save_headers, spider, and user_agent.
+** An "ascii" specifier is now accepted by --restrict-file-names, which
+forces the percent-encoding of all non-ASCII bytes.
+** Several previously existing, but undocumented .wgetrc options are
+now documented: save_headers, spider, user_agent, auth_no_challenge,
+and keep_session_cookies. Also added documentation for the "lowercase"
+and "uppercase" values for --restrict-file-names, which had been
+present since Wget 1.11.
 * Changes in Wget 1.11.4

@ -1,3 +1,8 @@
+2009-07-28 Micah Cowan <micah@cowan.name>
+* wget.texi (Download Options): Document "lowercase", "uppercase",
+and the new "ascii" specifier for --restrict-file-names.
 2009-07-26 Micah Cowan <micah@cowan.name>
 * wget.texi (Download Options): Change --iri item to --no-iri;

@ -904,24 +904,36 @@ won't need it.
 @cindex file names, restrict
 @cindex Windows file names
-@item --restrict-file-names=@var{mode}
-Change which characters found in remote URLs may show up in local file
-names generated from those URLs. Characters that are @dfn{restricted}
+@item --restrict-file-names=@var{modes}
+Change which characters found in remote URLs must be escaped during
+generation of local filenames. Characters that are @dfn{restricted}
 by this option are escaped, i.e. replaced with @samp{%HH}, where
 @samp{HH} is the hexadecimal number that corresponds to the restricted
-character.
+character. This option may also be used to force all alphabetical
+cases to be either lower- or uppercase.

-By default, Wget escapes the characters that are not valid as part of
-file names on your operating system, as well as control characters that
-are typically unprintable. This option is useful for changing these
-defaults, either because you are downloading to a non-native partition,
-or because you want to disable escaping of the control characters.
+By default, Wget escapes the characters that are not valid or safe as
+part of file names on your operating system, as well as control
+characters that are typically unprintable. This option is useful for
+changing these defaults, perhaps because you are downloading to a
+non-native partition, or because you want to disable escaping of the
+control characters, or you want to further restrict characters to only
+those in the @sc{ascii} range of values.

-When mode is set to ``unix'', Wget escapes the character @samp{/} and
+The @var{modes} are a comma-separated set of text values. The
+acceptable values are @samp{unix}, @samp{windows}, @samp{nocontrol},
+@samp{ascii}, @samp{lowercase}, and @samp{uppercase}. The values
+@samp{unix} and @samp{windows} are mutually exclusive (one will
+override the other), as are @samp{lowercase} and
+@samp{uppercase}. Those last are special cases, as they do not change
+the set of characters that would be escaped, but rather force local
+file paths to be converted either to lower- or uppercase.
+
+When ``unix'' is specified, Wget escapes the character @samp{/} and
 the control characters in the ranges 0--31 and 128--159. This is the
-default on Unix-like OS'es.
+default on Unix-like operating systems.

-When mode is set to ``windows'', Wget escapes the characters @samp{\},
+When ``windows'' is given, Wget escapes the characters @samp{\},
 @samp{|}, @samp{/}, @samp{:}, @samp{?}, @samp{"}, @samp{*}, @samp{<},
 @samp{>}, and the control characters in the ranges 0--31 and 128--159.
 In addition to this, Wget in Windows mode uses @samp{+} instead of
@ -932,11 +944,17 @@ name from the rest. Therefore, a URL that would be saved as
 saved as @samp{www.xemacs.org+4300/search.pl@@input=blah} in Windows
 mode. This mode is the default on Windows.

-If you append @samp{,nocontrol} to the mode, as in
-@samp{unix,nocontrol}, escaping of the control characters is also
-switched off. You can use @samp{--restrict-file-names=nocontrol} to
-turn off escaping of control characters without affecting the choice of
-the OS to use as file name restriction mode.
+If you specify @samp{nocontrol}, then the escaping of the control
+characters is also switched off. This option may make sense
+when you are downloading URLs whose names contain UTF-8 characters, on
+a system which can save and display filenames in UTF-8 (some possible
+byte values used in UTF-8 byte sequences fall in the range of values
+designated by Wget as ``controls'').
+
+The @samp{ascii} mode is used to specify that any bytes whose values
+are outside the range of @sc{ascii} characters (that is, greater than
+127) shall be escaped. This can be useful when saving filenames
+whose encoding does not match the one used locally.

 @cindex IPv6
 @itemx -4
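
Reviewer's note: the mode semantics documented in the wget.texi hunks above can be sketched roughly as follows. This is a minimal illustration in Python, not Wget's actual C implementation. The names `parse_modes`, `is_restricted`, and `local_name` are hypothetical, and `unix` is assumed as the default OS mode. The Windows-mode `+` and `@` separator substitutions are not modeled. Applying case conversion after escaping is also an assumption.

```python
# Illustrative sketch only; NOT Wget's implementation. All names are hypothetical.

# Characters the documentation lists as escaped in "windows" mode.
WINDOWS_EXTRA = b'\\|/:?"*<>'


def parse_modes(spec, default_os="unix"):
    """Parse a comma-separated modes string into a settings dict."""
    settings = {"os": default_os, "control": True, "ascii": False, "case": None}
    for mode in spec.split(","):
        if mode in ("unix", "windows"):
            settings["os"] = mode        # mutually exclusive: later value overrides
        elif mode in ("lowercase", "uppercase"):
            settings["case"] = mode      # mutually exclusive: later value overrides
        elif mode == "nocontrol":
            settings["control"] = False  # switch off escaping of control bytes
        elif mode == "ascii":
            settings["ascii"] = True     # escape every byte greater than 127
        else:
            raise ValueError("unknown mode: %r" % mode)
    return settings


def is_restricted(byte, settings):
    """Return True if this byte value would be escaped as %HH."""
    if byte == ord("/"):
        return True
    if settings["os"] == "windows" and byte in WINDOWS_EXTRA:
        return True
    if settings["control"] and (byte <= 31 or 128 <= byte <= 159):
        return True
    if settings["ascii"] and byte > 127:
        return True
    return False


def local_name(component, spec):
    """Escape one file name component (bytes) according to the modes string."""
    settings = parse_modes(spec)
    out = bytearray()
    for byte in component:
        if is_restricted(byte, settings):
            out += ("%%%02X" % byte).encode("ascii")
        else:
            out.append(byte)
    # Assumption: case folding is applied to the already-escaped name.
    if settings["case"] == "lowercase":
        out = out.lower()
    elif settings["case"] == "uppercase":
        out = out.upper()
    return bytes(out)
```

In this sketch, the UTF-8 sequence `\xc4\x81` shows why `nocontrol` matters: its second byte, 0x81, falls in the 128--159 "control" range, so `unix` alone escapes half of the character, `unix,nocontrol` leaves it intact, and `ascii` escapes both bytes.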