* NEWS: Mention the effect of --directory-prefix over Metalink
* src/metalink.c (retrieve_from_metalink): Add opt.dir_prefix as
prefix to the metalink:file name mfile->name
* doc/metalink.txt: Update document. Explain --directory-prefix
When --directory-prefix=<prefix> is used, set the top of the retrieval
tree to prefix. The default is . (the current directory). Metalink/XML
and Metalink/HTTP files will be downloaded under prefix.
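The prefixing step can be sketched as follows. This is only an
illustration: prefix_name() is a hypothetical helper, not a function
from src/metalink.c, and treating "." as "no prefix" is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from wget): return a newly allocated
   "PREFIX/NAME" string, or a copy of NAME when PREFIX is NULL, empty
   or ".".  The caller frees the result.  Returns NULL on allocation
   failure.  */
char *
prefix_name (const char *prefix, const char *name)
{
  size_t len;
  char *out;

  assert (name != NULL);

  if (prefix == NULL || *prefix == '\0' || strcmp (prefix, ".") == 0)
    {
      out = malloc (strlen (name) + 1);
      if (out != NULL)
        strcpy (out, name);
      return out;
    }

  /* prefix + '/' + name + terminating NUL.  */
  len = strlen (prefix) + 1 + strlen (name) + 1;
  out = malloc (len);
  if (out != NULL)
    snprintf (out, len, "%s/%s", prefix, name);
  return out;
}
```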
* src/metalink.c (retrieve_from_metalink): Change mfile->name to
filename when referring to the downloaded file
The file name could have been changed by unique_create() (or by any
other means) before downloading. Use the name of the downloaded file
(filename) when printing output which refers to it.
* NEWS: Mention Metalink's file size verification
* src/metalink.c (retrieve_from_metalink): Add file size computation
* doc/metalink.txt: Update document. Remove resolved bugs
Reject downloaded files when they do not agree with their Metalink/XML
metalink:size: https://tools.ietf.org/html/rfc5854#section-4.2.14
At the moment of writing, Metalink/HTTP headers do not provide a file
size field. This information could be obtained from the Content-Length
header field: https://tools.ietf.org/html/rfc6249#section-7
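A size check of this kind might look like the sketch below. The helper
file_size_matches() is hypothetical and is not the code added to
src/metalink.c; it uses fseek/ftell, so it is limited to sizes that fit
in a long.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return 1 when the file at PATH is exactly
   EXPECTED bytes long, 0 when it is not, and -1 when the size cannot
   be determined.  */
int
file_size_matches (const char *path, long expected)
{
  FILE *fp;
  long size;

  assert (path != NULL);

  fp = fopen (path, "rb");
  if (fp == NULL)
    return -1;

  if (fseek (fp, 0L, SEEK_END) != 0)
    {
      fclose (fp);
      return -1;
    }
  size = ftell (fp);
  fclose (fp);

  if (size < 0)
    return -1;
  return size == expected;
}
```

A caller would reject the download when the result is 0.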
* testenv/Test-metalink-xml-relpath.py: Update test
* testenv/Test-metalink-xml-homepath.py: New file. Reject home paths
* testenv/Makefile.am: Add new file
When --input-metalink=<file> is used, each metalink:file name is
verified by libmetalink's metalink_check_safe_path(). By design,
absolute, relative and home paths are rejected.
At the moment of writing, when --metalink-over-http is used, absolute,
relative, and home paths aren't a concern. The destination file name
is a combination of URL's file name and cli's "Directory Options"
handled by src/url.c (url_file_name).
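To illustrate the kind of names rejected for --input-metalink, here is
a rough sketch. is_safe_name() is hypothetical and is NOT libmetalink's
metalink_check_safe_path(); reading "relative" as "contains a '.' or
'..' component" is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check, NOT libmetalink's implementation: return 1 when
   NAME is a plain "path/file" name, 0 when it is empty, absolute
   ("/x"), home-relative ("~/x") or contains a "." or ".." component.  */
int
is_safe_name (const char *name)
{
  const char *p;

  assert (name != NULL);

  if (*name == '\0' || *name == '/' || *name == '~')
    return 0;

  p = name;
  while (*p != '\0')
    {
      /* Length of the current path component.  */
      size_t len = strcspn (p, "/");

      if ((len == 1 && p[0] == '.')
          || (len == 2 && p[0] == '.' && p[1] == '.'))
        return 0;

      p += len;
      if (*p == '/')
        p++;
    }
  return 1;
}
```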
* NEWS: Mention the effects of --continue over Metalink
* src/metalink.c (retrieve_from_metalink): On download error, resume
output_stream with the next mres->url. Keep fully downloaded files
started with --continue, otherwise rename/remove the file
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-xml-continue.py: New file. Metalink/XML
continue/keep existing files (HTTP 416) with --continue tests
Before this patch, with --continue, existing and/or fully retrieved
files which fail the sanity tests were renamed (--keep-badhash), or
removed.
This patch ensures that --continue doesn't rename/remove existing
and/or fully retrieved files (HTTP 416) which fail the sanity tests.
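The resulting policy can be summarised by the sketch below. The enum
and action_for_bad_file() are hypothetical names used only for
illustration; the real logic lives inline in retrieve_from_metalink().

```c
#include <assert.h>
#include <stdbool.h>

/* What to do with a file that failed the sanity tests.  */
enum bad_file_action { KEEP_FILE, RENAME_FILE, REMOVE_FILE };

/* Hypothetical decision helper.  CONTINUE_OPT mirrors --continue,
   KEEP_BADHASH mirrors --keep-badhash, and ALREADY_COMPLETE is true
   when the file existed and/or was fully retrieved (HTTP 416).  */
enum bad_file_action
action_for_bad_file (bool continue_opt, bool keep_badhash,
                     bool already_complete)
{
  if (continue_opt && already_complete)
    return KEEP_FILE;
  return keep_badhash ? RENAME_FILE : REMOVE_FILE;
}
```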
* NEWS: Mention the Metalink "path/file" name format handling
* src/metalink.c (retrieve_from_metalink): Fix NULL filename, set
filename to the right "path/file" value
* src/metalink.c (retrieve_from_metalink): Fix NULL output_stream, set
output_stream to filename when it is created by retrieve_url()
* src/metalink.c (retrieve_from_metalink): Add RFC5854 comments about
proper metalink:file "path/file" name format handling
* doc/metalink.txt: Update document. Remove resolved bugs
If unique_create() cannot create/open the destination file, filename
and output_stream remain NULL. If fopen() is used instead, filename
always remains NULL. Neither function can create "path/file" trees.
Setting filename to the right value is sufficient to prevent the SIGSEGV
caused by testing a NULL value. This also allows retrieve_url()
to create a "path/file" tree through opt.output_document.
Reading NULL as output_stream, when it should not be NULL, leads to
wrong results. For instance, a non-NULL output_stream indicates that a
stream was interrupted; reading NULL instead wrongly implies the contrary.
This patch conforms to the RFC5854 specification:
The Metalink Download Description Format
4.1.2.1. The "name" Attribute
https://tools.ietf.org/html/rfc5854#section-4.1.2.1
* doc/metalink.txt
Evaluation of "Directory Options" on the command line interacting with
the option '--input-metalink=file':
$ wget --input-metalink=file <directory options>
* testenv/Test-metalink-http.py: Use python .replace
* testenv/Test-metalink-xml.py: Use python .replace
* testenv/Test-metalink-xml-abspath.py: Use python .replace
* testenv/Test-metalink-xml-relpath.py: Use python .replace
Use python .replace instead of re.sub; remove 'import re'.
* testenv/Test-metalink-xml-abspath.py: Change Metalink/XML v3 file
name from test.meta4 into test.metalink
* testenv/Test-metalink-xml-relpath.py: Change Metalink/XML v3 file
name from test.meta4 into test.metalink
* testenv/Test-metalink-xml.py: Change Metalink/XML v3 file name from
test.meta4 into test.metalink
* src/html-url.c (tag_handle_img): Check append_url() for NULL
return value before dereference.
The crash was reproducible when parsing srcset="data:..." inline data.
Reported-by: Coverity
* src/http.c: Add const to first param of initialize_request(),
initialize_proxy_configuration(), establish_connection(),
check_file_output(), check_auth(), gethttp(), http_loop().
* src/http.h: Add const to first param of http_loop().
* src/connect.c (connect_to_ip): Check return value of setsockopt.
* src/ftp.c (ftp_retrieve_list): Check return value of chmod.
* src/http.c (digest_authentication_encode): Cleanup code.
* src/init.c (setval_internal): Explicitly check comind range.
* src/main.c (main): Explicitly check optarg.
* src/retr.c (retr_rate): Use snprintf instead of sprintf,
(retrieve_from_file): More verbose error message,
(rotate_backups): Use snprintf instead of sprintf, check return
value of rename().
* src/url.c (mkalldirs): Check return value of unlink().
* src/utils.c (strdupdelim): Explicitly check beg and end for NULL,
(merge_vecs): Fix sizeof argument to char *,
(stable_sort): Use malloc instead of alloca.
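As an illustration of the sprintf to snprintf change, the sketch below
builds a "FNAME.N" backup name with a bounded write. backup_name() is a
hypothetical helper in the spirit of rotate_backups(), not the code
from src/retr.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write "FNAME.N" into BUF of SIZE bytes.
   Returns 0 on success, or -1 when the result did not fit or snprintf
   failed.  In the "did not fit" case sprintf would have written past
   the end of BUF.  */
int
backup_name (char *buf, size_t size, const char *fname, int n)
{
  int written;

  assert (buf != NULL && fname != NULL);

  written = snprintf (buf, size, "%s.%d", fname, n);
  if (written < 0 || (size_t) written >= size)
    return -1;
  return 0;
}
```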
* bootstrap.conf: Add xmemdup0 and strpbrk.
* src/init.c (cmd_use_askpass): Add 'const' to char *,
remove check for file existence.
* src/main.c (run_use_askpass): C89 compat init of argv,
added \n to error messages,
fixed stripping of \n and \r from input,
make run_use_askpass and use_askpass static.
* doc/wget.texi: Add --use-askpass to documentation.
* src/init.c: Add cmd_use_askpass to set opt.use_askpass based on
argument, WGET_ASKPASS, and SSH_ASKPASS environment variables.
opt.use_askpass is freed in cleanup ()
* src/main.c: Update options & add spawn process of opt.use_askpass
command.
* src/options.h: Addition of string use_askpass.
* src/url.c: Function scheme_leading_string to access the leading
string of a parsed url.
* src/url.h: Prototype for scheme_leading_string for returning the
leading string.
* bootstrap.conf: Add posix_spawn to gnulib_modules
This adds the --use-askpass option which is disabled by default.
--use-askpass=COMMAND will request the username and password for a given
URL by executing the external program COMMAND. If COMMAND is left
blank, then the external program in the environment variable
WGET_ASKPASS will be used. If WGET_ASKPASS is not set then the
environment variable SSH_ASKPASS is used. If there is no value set, an
error is returned. If an error occurs requesting the username or
password, wget will exit.
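The lookup order described above can be sketched as follows.
resolve_askpass() is a hypothetical helper, not cmd_use_askpass() from
src/init.c, and treating an empty environment variable as unset is an
assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: pick the askpass program.  ARG is the value
   given to --use-askpass (NULL or "" when left blank).  Returns NULL
   when nothing is configured, which the caller should treat as an
   error.  */
const char *
resolve_askpass (const char *arg)
{
  const char *cmd;

  /* 1. An explicit COMMAND wins.  */
  if (arg != NULL && *arg != '\0')
    return arg;

  /* 2. Fall back to WGET_ASKPASS.  */
  cmd = getenv ("WGET_ASKPASS");
  if (cmd != NULL && *cmd != '\0')
    return cmd;

  /* 3. Fall back to SSH_ASKPASS.  */
  cmd = getenv ("SSH_ASKPASS");
  if (cmd != NULL && *cmd != '\0')
    return cmd;

  return NULL;
}
```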
Signed-off-by: Liam R. Howlett <Liam.Howlett@WindRiver.com>
* testenv/Test-recursive-basic.py: New file. Test basic recursion
* testenv/Test-recursive-include.py: New file. Recursion test with
include directories
* testenv/Test-redirect.py: New file. Basic redirection tests
* testenv/Makefile.am: Add new tests to makefile
* configure.ac: Add --with-psl-file to set a PSL file
* src/cookies.c (check_domain_match): Load PSL_FILE with
fallback to built-in data.
This change allows package maintainers to make Wget use the latest
PSL (DAFSA or plain text), without updating libpsl itself.
E.g. Debian now comes with a DAFSA binary within the 'publicsuffix'
package which allows very fast loading (no parsing or processing needed).
* src/cookies.c (cookie_header): Use heap instead of stack.
* src/http.c (request_send): Likewise.
If wget has to handle an insanely large number of cookies (~700,000 on
32 bit systems or ~530,000 on 64 bit systems), the stack is not large
enough to hold these pointers, leading to undefined behaviour according
to POSIX; expect a segmentation fault in real life. ;)
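The general idea, sketched with a hypothetical collect_pointers()
helper rather than the actual cookie code: an array whose size depends
on input is taken from the heap, so an oversized request fails cleanly
with NULL instead of overflowing the stack as alloca() could.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the cookie chain: build an array of COUNT
   pointers into ITEMS.  The array lives on the heap; the caller frees
   the result.  Returns NULL for COUNT == 0, on size overflow, or when
   malloc fails.  */
const char **
collect_pointers (const char *const *items, size_t count)
{
  const char **out;
  size_t i;

  assert (items != NULL);

  /* Reject sizes whose byte count would overflow size_t.  */
  if (count == 0 || count > (size_t) -1 / sizeof *out)
    return NULL;

  out = malloc (count * sizeof *out);
  if (out == NULL)
    return NULL;

  for (i = 0; i < count; i++)
    out[i] = items[i];
  return out;
}
```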
Signed-off-by: Tobias Stoeckmann <tobias@stoeckmann.org>
The signal handler for SIGALRM calls longjmp, but the handler is
installed before the jump target has been initialized. If another
process sends SIGALRM right between handler installation and target
initialization, the jump leads to undefined behavior.
This can easily be fixed by moving the signal handler installation
into the "SETJMP == 0" conditional block, which means that the target
has just been initialized.
* src/utils.c: call signal after SETJMP.
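The corrected ordering can be sketched as below. This is an
illustration only: it assumes POSIX sigsetjmp/siglongjmp, whereas
src/utils.c uses its own SETJMP macro, and call_with_timeout(),
quick_task() and slow_task() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <unistd.h>

static sigjmp_buf timeout_env;

/* SIGALRM handler: jump back to the sigsetjmp() call site.  */
static void
on_alarm (int sig)
{
  (void) sig;
  siglongjmp (timeout_env, 1);
}

/* Run FUN(ARG) for at most SECONDS seconds.  Returns 1 on timeout and
   0 when FUN finished in time.  */
int
call_with_timeout (unsigned int seconds, void (*fun) (void *), void *arg)
{
  assert (fun != NULL);

  if (sigsetjmp (timeout_env, 1) != 0)
    {
      /* Arrived here through siglongjmp(): FUN timed out.  */
      signal (SIGALRM, SIG_DFL);
      return 1;
    }

  /* The jump target is valid from here on, so only now is it safe to
     install the handler and arm the timer.  */
  signal (SIGALRM, on_alarm);
  alarm (seconds);
  fun (arg);

  alarm (0);
  signal (SIGALRM, SIG_DFL);
  return 0;
}

/* Demo callback: finishes immediately and sets the int ARG points to.  */
void
quick_task (void *arg)
{
  *(int *) arg = 1;
}

/* Demo callback: blocks until a signal interrupts it.  */
void
slow_task (void *arg)
{
  (void) arg;
  for (;;)
    pause ();
}
```

Installing the handler before the sigsetjmp() call would reopen the
window described above, because a SIGALRM delivered in between would
jump through an uninitialized buffer.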
Signed-off-by: Tobias Stoeckmann <tobias@stoeckmann.org>
* src/init.c: Remove hyphens from command names
* src/main.c: Likewise
Options with hyphens (or underscores) in their command name cannot be
set in a wgetrc file.
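A sketch of why such names can never match, assuming (as this entry
implies) that the wgetrc parser drops '-' and '_' from the user's
command before looking it up. normalize_command() is a hypothetical
helper, not the parser from src/init.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical normaliser: copy NAME into OUT (SIZE bytes) without
   '-' or '_'.  Returns 0 on success, -1 if OUT is too small.  */
int
normalize_command (char *out, size_t size, const char *name)
{
  size_t n = 0;

  assert (out != NULL && name != NULL);

  for (; *name != '\0'; name++)
    {
      if (*name == '-' || *name == '_')
        continue;
      /* Keep one byte free for the terminating NUL.  */
      if (n + 1 >= size)
        return -1;
      out[n++] = *name;
    }
  if (size == 0)
    return -1;
  out[n] = '\0';
  return 0;
}
```

With this normalisation, "use-askpass" and "use_askpass" from a wgetrc
file both become "useaskpass", which equals a table entry spelled
"useaskpass" but never one spelled "use-askpass".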
Signed-off-by: Jeffery To <jeffery.to@gmail.com>
* src/metalink.c (retrieve_from_metalink): Continue file download if
opt.always_rest is true
Without --continue, download as a new file with a unique name (this
conforms to the old behaviour).
* bootstrap.conf: Add crypto/md2, and crypto/md4
* src/metalink.c (retrieve_from_metalink): Add md2, and md4 support
This patch adds support for the deprecated (insecure) md2, and md4
Message-Digest algorithms to the Metalink module.
* bootstrap.conf: Add crypto/sha512
* src/metalink.c (retrieve_from_metalink): Add md5, sha1, sha224,
sha384, and sha512 support
Metalink's checksum verification was limited to sha256. This patch
adds support for md5, sha1, sha224, sha384, and sha512.
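For reference, the digest sizes of the algorithms named in these
entries can be tabulated as in the sketch below. digest_length() is a
hypothetical lookup, not part of src/metalink.c, and the exact type
strings a Metalink file uses (for example "sha256" versus "sha-256")
are an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hash_info
{
  const char *name;     /* Hash type name (spelling is an assumption).  */
  size_t digest_len;    /* Digest size in bytes.  */
};

static const struct hash_info known_hashes[] = {
  { "md2", 16 }, { "md4", 16 }, { "md5", 16 }, { "sha1", 20 },
  { "sha224", 28 }, { "sha256", 32 }, { "sha384", 48 }, { "sha512", 64 }
};

/* Hypothetical lookup: return the digest size in bytes for TYPE, or 0
   when TYPE is not one of the supported names.  */
size_t
digest_length (const char *type)
{
  size_t i;

  assert (type != NULL);

  for (i = 0; i < sizeof known_hashes / sizeof known_hashes[0]; i++)
    if (strcmp (type, known_hashes[i].name) == 0)
      return known_hashes[i].digest_len;
  return 0;
}
```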
* configure.ac: Check for xattr availability
* src/Makefile.am: Add xattr.c
* src/ftp.c: Include xattr.h.
(getftp): Set attributes if enabled.
* src/http.c: Include xattr.h.
(gethttp): Add parameter 'original_url',
set attributes if enabled.
(http_loop): Add 'original_url' to call of gethttp().
* src/init.c: Add new option --xattr.
* src/main.c: Add new option --xattr, add description to help text.
* src/options.h: Add new config member 'enable_xattr'.
* src/xattr.c: New file.
* src/xattr.h: New file.
These attributes provide a lightweight method of later determining
where a file was downloaded from.
This patch changes:
* autoconf detects whether extended attributes are available and
enables the code if they are.
* The new flags --xattr and --no-xattr control whether xattr is enabled.
* The new command "xattr = (on|off)" can be used in ~/.wgetrc or /etc/wgetrc
* The original and redirected URLs are recorded as shown below.
* This works for both single fetches and recursive mode.
The attributes that are set are:
user.xdg.origin.url: The URL that the content was fetched from.
user.xdg.referrer.url: The URL that was originally requested.
Here is an example, where http://archive.org redirects to https://archive.org:
$ wget --xattr http://archive.org
...
$ getfattr -d index.html
user.xdg.origin.url="https://archive.org/"
user.xdg.referrer.url="http://archive.org/"
These attributes were chosen based on those stored by Google Chrome
https://bugs.chromium.org/p/chromium/issues/detail?id=45903
and curl https://github.com/curl/curl/blob/master/src/tool_xattr.c
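Setting the two attributes can be sketched as follows. This is a
Linux-only illustration built on setxattr(2); set_origin_xattrs() is a
hypothetical helper and not the code in src/xattr.c.

```c
#include <assert.h>
#include <string.h>
#include <sys/xattr.h>

/* Linux-only sketch: record where PATH was fetched from.  ORIGIN is
   the URL the content was fetched from, REFERRER the URL originally
   requested (may be NULL).  Returns 0 on success, -1 with errno set on
   failure (for example ENOTSUP when the filesystem has no user xattr
   support).  */
int
set_origin_xattrs (const char *path, const char *origin,
                   const char *referrer)
{
  assert (path != NULL && origin != NULL);

  if (setxattr (path, "user.xdg.origin.url", origin,
                strlen (origin), 0) != 0)
    return -1;

  if (referrer != NULL
      && setxattr (path, "user.xdg.referrer.url", referrer,
                   strlen (referrer), 0) != 0)
    return -1;

  return 0;
}
```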