* src/url.c (url_skip_credentials): Properly re-implement userinfo parsing (rfc2396)
The implementation follows RFC 2396, an outdated standard, because the
whole file is based on that RFC and mixing standards here might be
dangerous.
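
To make the userinfo rule concrete, here is a minimal sketch of skipping
credentials according to the RFC 2396 grammar, where userinfo consists of
alphanumerics, the "mark" characters, %XX escapes, and ; : & = + $ , and
must be followed by '@'. The function name `skip_userinfo_sketch` is
hypothetical and the input is assumed to be the text right after
"scheme://". It illustrates the grammar and is not the code from src/url.c.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return a pointer past "userinfo@" if S starts with
   a valid RFC 2396 userinfo followed by '@', otherwise return S itself.
   Uses the C locale's isalnum/isxdigit; a locale-independent variant
   would be preferable in real code.  */
const char *
skip_userinfo_sketch (const char *s)
{
  /* mark characters plus the extra characters allowed in userinfo */
  static const char allowed[] = "-_.!~*'();:&=+$,";
  const char *p;

  assert (s != NULL);

  for (p = s; *p; p++)
    {
      unsigned char c = (unsigned char) *p;

      if (isalnum (c) || strchr (allowed, c))
        continue;

      /* escaped = "%" hex hex */
      if (c == '%' && isxdigit ((unsigned char) p[1])
          && isxdigit ((unsigned char) p[2]))
        {
          p += 2;
          continue;
        }

      if (c == '@')
        return p + 1;           /* everything before '@' was userinfo */

      break;                    /* '/', '?', '#', ...: no userinfo here */
    }

  return s;
}
```

With this sketch, "user:pw@example.com/x" yields "example.com/x", while
"example.com/a@b" is returned unchanged because the '/' ends the scan
before any '@' is seen.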
Rather than reading from stdin only once, leave the pipe open until the
other end closes it and keep reading from it after each set of URLs has
been processed.
* src/html-url.h(get_urls_file): Update prototype to add additional
param
* src/html-url.c(get_urls_file): Pass through read_again to
wget_read_from_file.
* src/retr.c(retrieve_from_file): Split the function into two. Introduce
`retrieve_from_url_list` that actually performs the retrieval.
Also, if `url_list` returns that the fd has been left open, then
continue reading from it until the fd is closed.
(retrieve_from_url_list): New function that does the retrieval from
a list of URLs that was read from a file.
* src/utils.c(wget_read_from_file): Rename old function `wget_read_file`
to this.
Accept an additional output parameter that states whether the fd was
left open and if we should continue reading from it after the current
set of URLs has been processed.
(wget_read_file): Write it as a new wrapper function around
`wget_read_from_file` to maintain API compatibility for other callers.
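
The pattern described above (a renamed function that gains an output
parameter, plus a thin wrapper under the old name) can be sketched as
follows. The names `read_batch_from` and `read_batch` are hypothetical
stand-ins, and the sketch works on a raw fd with a caller-supplied buffer,
which is an assumption made for brevity and not how src/utils.c is written.
EINTR and partial-line handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical new-style function: read one batch from FD and report
   through LEFT_OPEN whether the other end may still send more data.
   read() returning 0 means the writer closed its end of the pipe.  */
ssize_t
read_batch_from (int fd, char *buf, size_t cap, bool *left_open)
{
  ssize_t n;

  assert (buf != NULL && left_open != NULL);

  n = read (fd, buf, cap);
  *left_open = n > 0;           /* 0 is EOF, negative is an error */
  return n;
}

/* Hypothetical old-style wrapper: same behaviour as before the change,
   the extra information is simply discarded.  */
ssize_t
read_batch (int fd, char *buf, size_t cap)
{
  bool ignored;

  return read_batch_from (fd, buf, cap, &ignored);
}
```

A caller that wants the new behaviour would loop on `read_batch_from`
while `left_open` is true, processing each batch before reading again.
Existing callers keep using the wrapper unchanged.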
The definition of debug_logprintf in src/log.c is guarded by ENABLE_DEBUG
(although its prototype is unconditionally available in src/log.h).
The uses of debug_logprintf in src/retr.c aren't guarded by ENABLE_DEBUG.
Use the DEBUGP macro which is designed for this purpose.
* src/retr.c (getproxy): Use DEBUGP macro.
Fixes: https://gitlab.com/gnuwget/wget/-/issues/19
Copyright-paperwork-exempt: Yes
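
The build problem described in this change can be illustrated with a small
sketch of a debug macro that compiles to nothing when ENABLE_DEBUG is not
defined. The names `DEBUGP_SKETCH`, `debug_printf_sketch`, and
`proxy_is_set_sketch` are hypothetical and the macro body is an assumption
about the general technique, not a copy of src/log.h.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#ifdef ENABLE_DEBUG
static int debug_enabled = 1;

/* Only defined in debug builds, like the guarded definition in the
   change description.  */
static void
debug_printf_sketch (const char *fmt, ...)
{
  va_list ap;

  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
}

/* The double parentheses at the call site pass a whole argument list.  */
# define DEBUGP_SKETCH(args) \
  do { if (debug_enabled) debug_printf_sketch args; } while (0)
#else
/* Without ENABLE_DEBUG the call disappears, so nothing references the
   missing function and the build still links.  */
# define DEBUGP_SKETCH(args) do { } while (0)
#endif

/* Hypothetical caller: reports whether a proxy string is non-empty.  */
int
proxy_is_set_sketch (const char *proxy)
{
  assert (proxy != NULL);

  DEBUGP_SKETCH (("considering proxy '%s'\n", proxy));
  return proxy[0] != '\0';
}
```

Calling the debug function directly would fail to link in a build without
ENABLE_DEBUG, whereas the macro form builds in both configurations.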
* testenv/Test-recursive-pathmax.py: Add a new testcase. This test tries
to check that Wget allows downloading long filenames as far as allowed
by the OS and filesystem.
* tests/Makefile.am: Remove some tests that are redundant with the
Python testenv
* tests/Test-auth-basic.px: Delete file
* tests/Test-auth-no-challenge.px: Same
* tests/Test-auth-no-challenge-url.px: Same
* tests/Test-auth-retcode.px: Same
* tests/Test-auth-with-content-disposition.px: Same
* tests/Test-k.px: Same
* testenv/Makefile.am: Add two new tests, Test-k.py and Test-https-k.py
* testenv/Test-k.py: New file. Add a test based on tests/Test-k.px
* testenv/Test-https-k.py: New file. Add a new test to ensure that the
protocol of the original host URL is retained when creating absolute
links.
This test is added as a result of an issue reported on StackExchange:
https://superuser.com/questions/1348940/making-wgets-convert-links-respect-http-vs-https
Add support for libproxy, which is capable of extracting desktop
environment proxy configurations from dozens of systems and platforms.
This also enables wget to handle PAC/WPAD proxy servers.
* configure.ac: Add check for libproxy.
* src/retr.c (getproxy): Retrieve proxy via libproxy.
Copyright-paperwork-exempt: Yes
If the download rate is TB/s, a read buffer overflow happened
that either caused a crash or printed whatever string was pointed to.
* src/retr.c (retr_rate): Add missing array entries for TB/s and Tb/s.
(test_retr_rate): New test function.
* tests/unit-tests.c (all_tests): Run test 'test_retr_rate'.
* tests/unit-tests.h: Add prototype for test_retr_rate.
Reported-by: Wiebe Cazemier <wiebe@halfgaar.net>
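
The class of bug fixed here, indexing a unit-name table with an index one
past its last entry, can be sketched as below. The function
`format_rate_sketch` and its table are hypothetical. The sketch derives the
bound from the array itself so that the index cannot exceed the table, which
is one possible defensive approach and not necessarily how retr_rate is
written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Unit table including the TB/s entry whose absence caused the
   out-of-bounds read described above.  */
static const char *const rate_names_sketch[] =
  { "B/s", "KB/s", "MB/s", "GB/s", "TB/s" };

#define N_RATE_NAMES \
  (sizeof rate_names_sketch / sizeof rate_names_sketch[0])

/* Hypothetical formatter: scale BYTES_PER_SEC by 1024 until it fits,
   but never step past the last table entry.  */
const char *
format_rate_sketch (double bytes_per_sec, char *buf, size_t size)
{
  size_t unit = 0;

  assert (buf != NULL && size > 0);

  while (bytes_per_sec >= 1024.0 && unit + 1 < N_RATE_NAMES)
    {
      bytes_per_sec /= 1024.0;
      unit++;
    }

  snprintf (buf, size, "%.2f %s", bytes_per_sec, rate_names_sketch[unit]);
  return buf;
}
```

Rates beyond the largest unit are simply expressed in that unit, so the
lookup stays inside the table for any input.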
* testenv/server/http/http_server.py (HTTPSServer): Update for
ssl.SSLContext APIs instead of deprecated ssl.wrap_socket().
ssl.wrap_socket() was deprecated in 3.7 and removed in 3.12.
This should be compatible back to 3.6 (RHEL 8 and newer).
Copyright-paperwork-exempt: Yes
* src/convert.c(convert_links): Print the actual quoted newname when printing DEBUG output
(local_quote_string): Also quote the ' ' character as %20. While it is okay
to leave the character as-is, quoting it covers more edge cases.
It should also resolve a bug, more than 10 years old, with CSS url() parameters not being quoted.
Bug-Id: 64082
Reported-By: Ethan Gibbs <ethan@snowsign.net>
Discussed-At: https://stackoverflow.com/q/13300017
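
The quoting described in this change can be sketched as a small
percent-encoding routine. The function `quote_local_name_sketch` is
hypothetical, and the set of quoted characters (space, '#', '%') is a
simplified assumption chosen for illustration. Only the handling of ' '
as %20 is taken from the change description.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of FILE in which
   ' ', '#' and '%' are replaced by their %XX escapes.  The caller frees
   the result.  Returns NULL if allocation fails.  */
char *
quote_local_name_sketch (const char *file)
{
  static const char unsafe[] = " #%";
  static const char hex[] = "0123456789ABCDEF";
  size_t extra = 0;
  const char *p;
  char *res, *q;

  assert (file != NULL);

  /* Each quoted character grows from one byte to three.  */
  for (p = file; *p; p++)
    if (strchr (unsafe, *p))
      extra += 2;

  res = malloc (strlen (file) + extra + 1);
  if (res == NULL)
    return NULL;

  for (p = file, q = res; *p; p++)
    {
      if (strchr (unsafe, *p))
        {
          unsigned char c = (unsigned char) *p;

          *q++ = '%';
          *q++ = hex[c >> 4];
          *q++ = hex[c & 15];
        }
      else
        *q++ = *p;
    }
  *q = '\0';

  return res;
}
```

For example, a local name such as "my file#1.css" becomes
"my%20file%231.css", which stays intact when it is written inside a CSS
url() reference without surrounding quotes.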