* doc/wget.texi: Add description for --retry-on-http-error
* src/http.c (gethttp):
Consider given HTTP response codes as non-fatal, transient errors.
Supply a comma-separated list of 3-digit HTTP response codes as
argument. Useful to work around special circumstances where retries
are required, but the server responds with an error code normally not
retried by Wget. Such errors might be 503 (Service Unavailable) and
429 (Too Many Requests). Retries enabled by this option are performed
subject to the normal retry timing and retry count limitations of
Wget.
Using this option is intended to support special use cases only and is
generally not recommended, as it can force retries even in cases where
the server is actually trying to decrease its load. Please use it
wisely and only if you know what you are doing.
Example use and a starting point for manual testing:
wget --retry-on-http-error=429,503 http://httpbin.org/status/503
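The argument format described above (a comma-separated list of 3-digit
HTTP response codes) can be illustrated with a minimal Python sketch.
It is not the code in src/http.c; the helper names parse_retry_codes()
and is_retryable() are hypothetical, and the strict validation is an
assumption made for this example.

```python
def parse_retry_codes(spec):
    """Parse a comma-separated list of 3-digit HTTP status codes.

    Hypothetical helper: the strict validation is an assumption of this
    sketch, not a description of what Wget itself accepts.
    """
    codes = set()
    for token in spec.split(","):
        token = token.strip()
        if len(token) != 3 or not (token.isascii() and token.isdigit()):
            raise ValueError(f"not a 3-digit HTTP status code: {token!r}")
        codes.add(int(token))
    return codes


def is_retryable(status, retry_codes):
    """Return True if the status should be treated as a transient error."""
    return status in retry_codes


codes = parse_retry_codes("429,503")
print(is_retryable(503, codes))  # prints True
print(is_retryable(404, codes))  # prints False
```

A code matched this way would still be subject to the normal retry
count and retry timing limits mentioned above.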
* src/http.c (metalink_from_http): Process the Content-Type header.
Add an application/metalink4+xml URL as metalink metaurl. If the
option opt.content_disposition is true, the Content-Disposition's
filename is the metaurl's name
* doc/wget.texi: Update --content-disposition and --metalink-over-http
* doc/metalink-standard.txt: Update doc. Content-Type/Disposition
processing through --metalink-over-http. Update download naming
system about --trust-server-names and --content-disposition
* testenv/Makefile.am: Add new files
* testenv/Test-metalink-http-xml-type.py: New file. Metalink/HTTP
Content-Type/Disposition header automated Metalink/XML tests
* testenv/Test-metalink-http-xml-type-trust.py: New file. Metalink/HTTP
Content-Type/Disposition header with --trust-server-names automated
Metalink/XML tests
* testenv/Test-metalink-http-xml-type-content.py: New file. Metalink/HTTP
Content-Type/Disposition header with --content-disposition automated
Metalink/XML tests
* testenv/Test-metalink-http-xml-type-trust-content.py: New file.
Metalink/HTTP Content-Type/Disposition header with --trust-server-names
and --content-disposition automated Metalink/XML tests
Process the Content-Type header, identify an application/metalink4+xml
file. The Content-Disposition could provide an alternate name through
the "filename" field for the metalink xml file. Respectively, the cli
options --metalink-over-http and --content-disposition are required.
During Metalink/XML auto-processing, the cli option --trust-server-names
is also required in order to use the Content-Disposition's filename.
* src/metalink.c (retrieve_from_metalink): If opt.trustservernames is
true, use the basename of the metaurl's name to save the xml file
* doc/metalink-standard.txt: Update doc. With --trust-server-names any
Metalink/HTTP Link application/metalink4+xml file is saved using the
basename of the "name" field, if any. Update Metalink/HTTP examples
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-http-xml-trust-name.py: New file. Metalink/HTTP
automated Metalink/XML, save xml files using the "name" field tests
* NEWS: Mention the effect of --metalink-index over Metalink
* src/init.c: Add new option metalinkindex (opt.metalink_index),
initialize to -1
* src/main.c: Add new option metalink-index (--metalink-index=NUMBER)
* src/options.h: Add new option metalink_index (int)
* src/metalink.h: Add declaration of functions fetch_metalink_file(),
replace_metalink_basename()
* src/metalink.c: Add functions fetch_metalink_file() simple file
fetch, replace_metalink_basename() replace file basename
* src/metalink.c (retrieve_from_metalink): New. Process Metalink
application/metalink4+xml of opt.metalink_index ordinal number
* doc/wget.texi: Add new option metalink-index (--metalink-index)
documentation
* doc/metalink-standard.txt: Updated doc. Add documentation about
Metalink application/metalink4+xml metaurls download naming system
* doc/metalink-standard.txt: Update Metalink/XML and HTTP examples
* testenv/Makefile.am: Add new files
* testenv/Test-metalink-http-xml.py: New file. Metalink/HTTP automated
Metalink/XML "application/metalink4+xml" --metalink-index tests
* testenv/Test-metalink-http-xml-trust.py: New file. Metalink/HTTP
automated Metalink/XML "application/metalink4+xml" --metalink-index
retrieval with --trust-server-names tests
WARNING: Do not use lib/dirname.c (dir_name) to get the directory
name, it may append a dot '.' character to the directory name.
* NEWS: Mention the effect of --trust-server-names over Metalink
* src/metalink.h: Add declaration of function append_suffix_number()
* src/metalink.c: Add function append_suffix_number() append number to
string
* src/metalink.c (retrieve_from_metalink): Safer Metalink/XML and
Metalink/HTTP download naming system, opt.trustservernames based
* doc/metalink-standard.txt: Update doc. Explain new Metalink/XML and
Metalink/HTTP download naming system and --trust-server-names role
* testenv/Makefile.am: Add new files
* testenv/Test-metalink-xml-continue.py: Update test. Metalink/XML
continue/keep existing files (HTTP 416) with --continue tests
* testenv/Test-metalink-xml.py: Update test. Metalink/XML naming tests
* testenv/Test-metalink-xml-trust.py: New file. Metalink/XML naming
tests with --trust-server-names
* testenv/Test-metalink-xml-abspath.py: Update test. Metalink/XML
absolute path tests
* testenv/Test-metalink-xml-abspath-trust.py: New file. Metalink/XML
absolute path tests with --trust-server-names
* testenv/Test-metalink-xml-relpath.py: Update test. Metalink/XML
relative path tests
* testenv/Test-metalink-xml-relpath-trust.py: New file. Metalink/XML
relative path tests with --trust-server-names
* testenv/Test-metalink-xml-homepath.py: Update test. Metalink/XML
home path and ~ (tilde) tests
* testenv/Test-metalink-xml-homepath-trust.py: New file. Metalink/XML
home path and ~ (tilde) tests with --trust-server-names
* testenv/Test-metalink-xml-prefix.py: New file. Metalink/XML naming
tests with --directory-prefix
* testenv/Test-metalink-xml-prefix-trust.py: New file. Metalink/XML
naming tests with --directory-prefix and --trust-server-names
* testenv/Test-metalink-xml-absprefix.py: New file. Metalink/XML
absolute --directory-prefix tests
* testenv/Test-metalink-xml-absprefix-trust.py: New file. Metalink/XML
absolute --directory-prefix tests with --trust-server-names
* testenv/Test-metalink-xml-relprefix.py: New file. Metalink/XML
relative --directory-prefix tests
* testenv/Test-metalink-xml-relprefix-trust.py: New file. Metalink/XML
relative --directory-prefix tests with --trust-server-names
* testenv/Test-metalink-xml-homeprefix.py: New file. Metalink/XML home
--directory-prefix tests
* testenv/Test-metalink-xml-homeprefix-trust.py: New file. Metalink/XML
home --directory-prefix tests with --trust-server-names
The option --trust-server-names allows using the file names parsed
from a Metalink/XML file. Without --trust-server-names, the safety
mechanism provides secure and predictable file names.
* NEWS: Mention the use of a safe Metalink destination path
* src/metalink.h: Add declaration of functions get_metalink_basename(),
last_component(), metalink_check_safe_path()
* src/metalink.c: Add directive #include "dosname.h"
* src/metalink.c: Add function get_metalink_basename() to return the
basename of a file name, strip w32's drive letter prefixes
* src/metalink.c (retrieve_from_metalink): Enforce Metalink file name
verification, if the file name is unsafe try its basename
* doc/metalink.txt: Update document. Explain --directory-prefix
The function get_metalink_basename() uses FILE_SYSTEM_PREFIX_LEN to
catch any 'C:D:file' (w32 environment), then it removes each drive
letter prefix, i.e. 'C:' and 'D:'.
Unsafe file names contain an absolute, relative, or home path. Safe
paths can be verified by libmetalink's metalink_check_safe_path().
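The drive letter stripping described above can be sketched in Python as
follows. This is only an illustration that models the w32 behaviour
unconditionally; it is not the C code in src/metalink.c, and the name
metalink_basename() is hypothetical.

```python
import re

# One w32-style drive letter prefix such as "C:" (assumption: a single
# ASCII letter followed by a colon).
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def metalink_basename(name):
    """Return the last path component with drive letter prefixes removed."""
    # Keep only the last component, accepting both '/' and '\\'.
    base = re.split(r"[\\/]", name)[-1]
    # Remove every leading drive prefix, e.g. 'C:D:file' -> 'file'.
    while _DRIVE_PREFIX.match(base):
        base = base[2:]
    return base


print(metalink_basename("C:D:file"))       # prints file
print(metalink_basename("/abs/dir/file"))  # prints file
```

Falling back to the basename in this way turns an absolute, relative,
or home path into a plain file name.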
* NEWS: Mention the effect of --directory-prefix over Metalink
* src/metalink.c (retrieve_from_metalink): Add opt.dir_prefix as
prefix to the metalink:file name mfile->name
* doc/metalink.txt: Update document. Explain --directory-prefix
When --directory-prefix=<prefix> is used, set the top of the retrieval
tree to prefix. The default is . (the current directory). Metalink/XML
and Metalink/HTTP files will be downloaded under prefix.
* NEWS: Mention Metalink's file size verification
* src/metalink.c (retrieve_from_metalink): Add file size computation
* doc/metalink.txt: Update document. Remove resolved bugs
Reject downloaded files when they do not agree with their Metalink/XML
metalink:size: https://tools.ietf.org/html/rfc5854#section-4.2.14
At the moment of writing, Metalink/HTTP headers do not provide a file
size field. This information could be obtained from the Content-Length
header field: https://tools.ietf.org/html/rfc6249#section-7
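A size check of this kind can be sketched in Python. This is a minimal
illustration, not the code in src/metalink.c; the helper name
size_matches() is hypothetical, and using None to mean "no size was
declared" is an assumption of this example.

```python
import os
import tempfile


def size_matches(path, expected_size):
    """Return True when the file at path has exactly expected_size bytes.

    expected_size is None when no size was declared (assumption of this
    sketch); in that case there is nothing to verify.
    """
    if expected_size is None:
        return True
    return os.path.getsize(path) == expected_size


# Demonstration with a throwaway 5-byte file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"12345")
print(size_matches(tmp.name, 5))  # prints True
print(size_matches(tmp.name, 6))  # prints False
os.remove(tmp.name)
```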
* NEWS: Mention the Metalink "path/file" name format handling
* src/metalink.c (retrieve_from_metalink): Fix NULL filename, set
filename to the right "path/file" value
* src/metalink.c (retrieve_from_metalink): Fix NULL output_stream, set
output_stream to filename when it is created by retrieve_url()
* src/metalink.c (retrieve_from_metalink): Add RFC5854 comments about
proper metalink:file "path/file" name format handling
* doc/metalink.txt: Update document. Remove resolved bugs
If unique_create() cannot create/open the destination file, filename
and output_stream remain NULL. If fopen() is used instead, filename
always remains NULL. Both functions cannot create "path/file" trees.
Setting filename to the right value is sufficient to prevent the SIGSEGV
caused by testing a NULL value. This also allows retrieve_url()
to create a "path/file" tree through opt.output_document.
Reading NULL as output_stream, when it shall not be, leads to wrong
results. For instance, a non-NULL output_stream tells when a stream
was interrupted, reading NULL instead means to assume the contrary.
This patch conforms to the RFC5854 specification:
The Metalink Download Description Format
4.1.2.1. The "name" Attribute
https://tools.ietf.org/html/rfc5854#section-4.1.2.1
* doc/metalink.txt
Evaluation of "Directory Options" on the command line interacting with
the option '--input-metalink=file':
$ wget --input-metalink=file <directory options>
* doc/wget.texi: Add --use-askpass to documentation.
* src/init.c: Add cmd_use_askpass to set opt.use_askpass based on
argument, WGET_ASKPASS, and SSH_ASKPASS environment variables.
opt.wget-askpass is freed in cleanup ()
* src/main.c: Update options & add spawn process of opt.use_askpass
command.
* src/options.h: Addition of string use_askpass.
* src/url.c: Function scheme_leading_string to access the leading
string of a parsed url.
* src/url.h: Prototype for scheme_leading_string for returning the
leading string.
* bootstrap.conf: Add posix_spawn to gnulib_modules
This adds the --use-askpass option which is disabled by default.
--use-askpass=COMMAND will request the username and password for a given
URL by executing the external program COMMAND. If COMMAND is left
blank, then the external program in the environment variable
WGET_ASKPASS will be used. If WGET_ASKPASS is not set then the
environment variable SSH_ASKPASS is used. If there is no value set, an
error is returned. If an error occurs requesting the username or
password, wget will exit.
Signed-off-by: Liam R. Howlett <Liam.Howlett@WindRiver.com>
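The lookup order described in this entry (COMMAND, then WGET_ASKPASS,
then SSH_ASKPASS, else an error) can be sketched in Python. This is an
illustration only, not the code in src/init.c; resolve_askpass() is a
hypothetical name, and treating an empty variable as unset is an
assumption of this example.

```python
def resolve_askpass(command, environ):
    """Pick the askpass program using the fallback order described above.

    Order: explicit COMMAND, then WGET_ASKPASS, then SSH_ASKPASS.
    An empty value counts as "not set" (assumption of this sketch).
    """
    if command:
        return command
    for variable in ("WGET_ASKPASS", "SSH_ASKPASS"):
        value = environ.get(variable)
        if value:
            return value
    raise ValueError("no askpass command given and none set in environment")


print(resolve_askpass("", {"SSH_ASKPASS": "/usr/bin/ssh-askpass"}))
```

In real use the second argument would typically be os.environ; a plain
dict is used here so that the example does not depend on the caller's
environment.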
* wget.texi: Replace server.com by example.com,
replace ftp://wuarchive.wustl.edu by https://example.com,
use HTTPS instead of HTTP where possible,
fix list archive reference,
remove reference to wget-notify@addictivecode.org,
change bugtracker URL to bugtracker on Savannah,
replace yoyodyne.com by example.com,
fix URL to VMS port
* README.checkout: Add description for libares
* configure.ac: Add check for libares
* doc/wget.texi: Add docs for the new options
* src/build_info.c.in: Add +/-cares for --version output
* src/host.c:
(merge_address_lists): New static function
(address_list_from_hostent): New static function
(wait_ares): New static function
(callback): New static function
(lookup_host): Add libares resolver code
* src/init.c: Add new options,
(cleanup): Add cleanup code
* src/main.c: Add global libares channel variable
(cmdline_option option_data): Add new options
(print_help): Add short descriptions
(main): Add libares init code
* src/options.h (struct options): Add option members
The new options allow specifying alternative DNS servers and
an alternate packet route for the resolver packets.
Wget has to be built with libares, enabled at configure time by
./configure --with-cares.
* src/convert.c (convert_links_in_hashtable, convert_links):
test for CO_CONVERT_BASENAME_ONLY.
(convert_basename): new function.
* src/convert.h: new constant CO_CONVERT_BASENAME_ONLY.
* src/init.c, src/main.c, src/options.h: new option "--convert-file-only".
* doc/wget.texi: updated documentation.
Reviewed-by: Gabriel Somlo <somlo@cmu.edu>
* doc/wget.texi: updated documentation to reflect the new FTPS functionality.
* src/ftp-basic.c (ftp_greeting): new function to read the server's greeting.
(ftp_login): greeting code was previously here. Moved to ftp_greeting to
support FTPS implicit mode.
(ftp_auth): wrapper around the AUTH TLS command.
(ftp_ccc): wrapper around the CCC command.
(ftp_pbsz): wrapper around the PBSZ command.
(ftp_prot): wrapper around the PROT command.
* src/ftp.c (get_ftp_greeting): new static function.
(init_control_ssl_connection): new static function to start SSL/TLS on the
control channel.
(getftp): added hooks to support FTPS commands (RFCs 2228 and 4217).
(ftp_loop_internal): test for new FTPS error codes.
* src/ftp.h: new enum 'prot_level' with available FTPS protection levels +
prototypes of previous functions. New flag for enum 'wget_ftp_fstatus' to track
whether the data channel has some security mechanism enabled or not.
* src/gnutls.c (struct wgnutls_transport_context): new field 'session_data'.
(wgnutls_close): free GnuTLS session data before exiting.
(ssl_connect_wget): save/resume SSL/TLS session.
* src/http.c (establish_connection): refactor ssl_connect_wget call.
(metalink_from_http): take into account SCHEME_FTPS as well.
* src/init.c, src/main.c, src/options.h: new command line/wgetrc options.
(main): in recursive downloads, check for SCHEME_FTPS as well.
* src/openssl.c (struct openssl_transport_context): new field 'sess'.
(ssl_connect_wget): save/resume SSL/TLS session.
* src/retr.c (retrieve_url): check new scheme SCHEME_FTPS.
* src/ssl.h (ssl_connect_wget): refactor. New parameter of type 'int *'.
* src/url.c, src/url.h: new scheme SCHEME_FTPS.
* src/wget.h: new FTPS error codes.
* src/metalink.h: support FTPS scheme.
* main.c: Add "--rejected-log" option.
* init.c: Add "rejectedlog" command.
* options.h: Add "rejected_log" parameter string.
* wget.texi: Add brief documentation on new --rejected-log option.
* recur.c: Optionally log details of URLs not traversed.
Add reject_reason enum.
(download_child_p -> download_child): Return a reject_reason.
(descend_redirect_p -> descend_redirect): Return a reject_reason.
(retrieve_tree): Support logging reasons for rejection.
Add write_reject_log_header that writes a CSV format header to a file.
Add write_reject_log_url that writes a url struct to a file in CSV format.
Add write_reject_log_reason that writes the URL and parent URL as well as the
rejection reason to a CSV file.
* Test--rejected-log.px: Add a basic test for the --rejected-log command.
* tests/Makefile.am: Run Test--rejected-log.px.
This allows you to figure out why URLs are being rejected and some context
around it. CSV is used as the output format since it can be easily parsed;
it is delimited by tabs instead of commas to allow using all (quoted) URL
characters and includes column names which may be used for compatibility.
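Because the log is tab-delimited and starts with a header row, it can
be read with a generic CSV reader. The following Python sketch shows
one way to do so. The column names and the sample row are hypothetical
placeholders invented for this example; they are not the columns that
Wget writes.

```python
import csv
import io


def read_rejected_log(stream):
    """Read a tab-delimited log with a header row into a list of dicts."""
    return list(csv.DictReader(stream, delimiter="\t"))


# Hypothetical sample data: the column names below are placeholders
# only and do not reflect the real log layout.
sample = io.StringIO(
    "reason\turl\tparent_url\n"
    "EXAMPLE_REASON\thttp://example.com/a,b\thttp://example.com/\n"
)
rows = read_rejected_log(sample)
print(rows[0]["url"])  # prints http://example.com/a,b
```

The comma inside the sample URL is kept intact because the tab is the
only field delimiter.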
* doc/wget.texi: Add information about --preferred-location.
* src/init.c: Add --preferred-location option.
* src/main.c (option_data): Handle --preferred-location argument.
(main): Sort resources based on location if requested.
* src/metalink.c (metalink_res_cmp): Compare based on location if
priority and preference are equal.
* src/options.h (options): Add preferred_location option.
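The tie-breaking rule for metalink_res_cmp can be sketched in Python.
Only the rule "compare based on location if priority and preference are
equal" comes from the entry above. The ordering directions (lower
priority value first, higher preference first), the dict layout, and
the name make_resource_cmp() are assumptions made for this example.

```python
from functools import cmp_to_key


def make_resource_cmp(preferred_location):
    """Build a comparison function for Metalink resources.

    Assumptions of this sketch: a lower 'priority' sorts first, a higher
    'preference' sorts first, and locations compare case-insensitively.
    """
    wanted = (preferred_location or "").lower()

    def resource_cmp(a, b):
        if a["priority"] != b["priority"]:
            return a["priority"] - b["priority"]
        if a["preference"] != b["preference"]:
            return b["preference"] - a["preference"]
        if wanted:
            a_match = (a.get("location") or "").lower() == wanted
            b_match = (b.get("location") or "").lower() == wanted
            # A resource in the preferred location sorts first.
            return int(b_match) - int(a_match)
        return 0

    return resource_cmp


resources = [
    {"url": "u1", "priority": 1, "preference": 0, "location": "us"},
    {"url": "u2", "priority": 1, "preference": 0, "location": "de"},
]
ordered = sorted(resources, key=cmp_to_key(make_resource_cmp("de")))
print([r["url"] for r in ordered])  # prints ['u2', 'u1']
```

Location is consulted only as the last criterion, so it never overrides
an explicit priority or preference.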
Wget considers the file mentioned in the --post-file argument as a
binary file and does not strip any control characters. The lack of this
information in the documentation can cause a lot of headaches debugging
for a simple issue.
This commit causes the --show-progress option to print the progress bar
to stderr even when a logfile was explicitly provided on the command
line. Such a combination allows a user to log the output of Wget while
simultaneously keeping track of the download status.
This reverts commit fcd3b3c473.
Turns out that removing the ChangeLog files causes the Wget build to
fail. While this issue is investigated and sorted out, the commit is
reverted so that people can build Wget from master.
From v1.16.1 onwards, Wget no longer maintains an active ChangeLog file.
Instead the ChangeLog will be automatically generated on each release
through gnulib's gitlog-to-changelog script. However, the old versions
of the ChangeLog files are retained for reference. These files are
renamed with a .pre-gitlog appended to their filenames.
Also removed ChangeLog.README file which is not required anymore
Wget was susceptible to a symlink attack which could create arbitrary
files, directories or symbolic links and set their permissions when
retrieving a directory recursively through FTP. This commit changes the
default settings in Wget such that Wget no longer creates local symbolic
links, but rather traverses them and retrieves the pointed-to file in
such a retrieval.
The old behaviour can be attained by passing the --retr-symlinks=no
option to the Wget invocation command.