2699 Commits

Author SHA1 Message Date
Tim Rühsen
e7a4d818fa * src/main.c (main): Unlink output document when --unlink is given 2022-01-09 18:58:41 +01:00
Tim Rühsen
c34c2529dc * src/log.c (logprintf): Check earlier for verbosity 2021-12-22 13:07:23 +01:00
Tim Rühsen
c7a37d82ee * src/http.c (http_loop): Fix memleak 2021-12-22 13:06:34 +01:00
Darshit Shah
f75fcf2985 * src/http.c (http_loop): Hide password when printing status with -nv
Reported-By: Per Lundberg <perlun@gmail.com>
Closes: #61492
2021-12-01 23:38:52 +01:00
Darshit Shah
e1bacd2fa5 * src/hsts.c (hsts_read_database): Read time_t values as long long 2021-12-01 22:42:42 +01:00
Thomas Niederberger
faeb4d90c2 * src/main.c (print_help): Add command line option for TLS 1.3 2021-12-01 22:17:11 +01:00
Darshit Shah
65e6d5b3b8 * retr.c (rotate_backups): Non existent files are not errors in this function 2021-10-11 23:06:38 +02:00
Darshit Shah
aecf5fbf1b * ftp.c (ftp_loop_internal): Fix computation of total_downloaded_bytes
When continuing an FTP download, or not starting one because the file is
already fully retrieved, don't include the size of the file in the
total_downloaded_bytes. Only the actual amount of data retrieved over
the network should be considered there.

Fixes: #61277
Reported-By: Michal Ruprich <formaiko>
2021-10-08 20:37:51 +02:00
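
The accounting rule described in the commit above can be sketched as follows. This is only an illustration, not Wget's code: `network_bytes` and its parameters are hypothetical names, and the real logic lives in `ftp_loop_internal`.

```c
#include <stdint.h>

/* Hypothetical helper, not part of Wget: returns how many bytes of a
   transfer actually crossed the network.
   restart_offset: bytes already on disk before this attempt (0 if none).
   final_length:   size of the local file after this attempt.  */
static int64_t
network_bytes (int64_t restart_offset, int64_t final_length)
{
  /* A file that was already complete, or an attempt that added nothing,
     contributes zero to the running total.  */
  if (final_length <= restart_offset)
    return 0;
  return final_length - restart_offset;
}
```

Under these assumptions, resuming a 100-byte file from offset 60 adds 40 to the total, and a file that is already fully retrieved adds nothing.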
Darshit Shah
3ea9658c07 Remove spurious print statements
* src/gnutls.c: Remove fprintf statements. We should never print to
  console directly. Always honor the log levels.
  Fixes: #61125
2021-09-08 17:52:32 +02:00
WB
ebb96761f5 Fix #60956 (improve --page-requisites)
* src/html-url.c (tag_handle_link): Check for "alternate stylesheet",
  "icon" and "manifest".
2021-08-21 19:51:12 +02:00
Tim Rühsen
7899e1d17b * src/html-url.c (tag_handle_meta): Fix integer overflow 2021-08-07 14:29:02 +02:00
Tim Rühsen
254b2d3c7c * src/recur.c (download_child): Remove temporary robots.txt.tmp 2021-07-05 15:43:13 +02:00
Josef Moellers
718ab3f79b Long pathnames patch 2021-06-14 08:33:57 +02:00
Tim Rühsen
91c42c799a * src/url.c (append_uri_pathel): Add cheap extra check to help static analyzers 2021-06-06 15:34:12 +02:00
Tim Rühsen
c778ac20b4 * src/http.c (gethttp): Add cheap extra check to help static analyzers 2021-06-06 15:34:06 +02:00
Tim Rühsen
a209bb1fac * src/main.c (main): Removed unused variable 2021-06-06 15:33:59 +02:00
Tim Rühsen
36e250e09a Revert "Long pathnames patch"
This reverts commit affad27664.

Manual tests with very long path names did not work with this patch.
We have to wait for a patch including automated tests.
2021-06-06 14:10:22 +02:00
jmoellers
affad27664 Long pathnames patch 2021-05-29 17:17:27 +00:00
Tim Rühsen
027d294114 * src/http.c (initialize_request): Send Host HTTP header first
This solves an issue where the server expects the Host: header
as the first one. This seems plausible (ahem) as the Host: header is the
only one that is required.
2021-05-03 17:49:58 +02:00
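
A minimal sketch of the header ordering this commit describes, assuming a plain GET request. `build_request` and the `toy-client` User-Agent value are hypothetical; Wget's own request construction in `initialize_request` is more involved.

```c
#include <stdio.h>

/* Hypothetical helper, not Wget's API: writes a minimal HTTP/1.1 request
   into BUF with the Host header placed directly after the request line.
   Returns snprintf's result, i.e. the length the full request needs.  */
static int
build_request (char *buf, size_t size, const char *host, const char *path)
{
  return snprintf (buf, size,
                   "GET %s HTTP/1.1\r\n"
                   "Host: %s\r\n"              /* first header */
                   "User-Agent: toy-client\r\n"
                   "Accept: */*\r\n"
                   "\r\n",
                   path, host);
}
```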
Tim Rühsen
5fe8d26904 Improve wget_options_fuzzer
* fuzz/fuzzer.h: Ignore -Wunused-parameter.
* fuzz/wget_options_fuzzer.c: Let getaddrinfo() fail while fuzzing.
* fuzz/wget_options_fuzzer.in/*: Update corpora from OSS-Fuzz.
2021-05-02 19:43:06 +02:00
Nekun
aabdf6eb66 Fix typo in VMS support code
* src/utils.c: Remove unpaired brace

Copyright-paperwork-exempt: Yes
2021-05-02 14:19:01 +02:00
Nils
1aada296dd Use "nofollow" instead of "no-follow" in messages
* src/html-url.c (get_urls_html_fm): Remove misleading debug message.
* src/recur.c (retrieve_tree): Fix no-follow -> follow in DEBUGP.

The attribute in html is "nofollow" so it is more consistent to call it
so than to hyphenate it.

Copyright-paperwork-exempt: Yes
2021-04-15 21:03:56 +02:00
Nils
f1cccd2c45 Print message for no-follow attribute only if norobots respected
* src/html-url.c (get_urls_html_fm): Remove misleading log message.
* src/recur.c (retrieve_tree): Add log message into correct if block.

Commit e39be32838 added a message that
said links will not be followed whenever the nofollow attribute is found
in a page. It didn't take into account that with -e robots=off (and
equivalents) links will still be followed.

This bug has been noticed multiple times:
* https://www.reddit.com/r/DataHoarder/comments/mprq89/wget_respects_nofollow_attribute_despite_e/
* https://gist.github.com/simonw/27e810771137408fd7834ad153750c41#gistcomment-3648191
* https://superuser.com/questions/1494761/wget-wont-ignore-no-follow-attributes

This commit makes it so that this message is only printed when a
nofollow link is found and the norobots convention is respected.

Copyright-paperwork-exempt: Yes
2021-04-15 21:02:28 +02:00
Tim Rühsen
90631a6fe5 * src/wget.h: Use strtoll() for str_to_wgint
This fixes a regression reported at https://savannah.gnu.org/bugs/?60353.

Reported-by: Michal Ruprich
2021-04-11 12:53:20 +02:00
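
A minimal sketch of parsing a 64-bit value with `strtoll()`, as the commit above does for `str_to_wgint`. `parse_int64` is a hypothetical helper for illustration. The point it shows is that `strtoll()` returns `long long`, which is at least 64 bits wide, whereas `strtol()` returns `long`, which is only 32 bits on some platforms.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser, not Wget's code: converts the decimal string S to
   an int64_t.  Returns false on empty input, trailing garbage, or
   overflow.  */
static bool
parse_int64 (const char *s, int64_t *out)
{
  char *end;
  long long v;

  errno = 0;
  v = strtoll (s, &end, 10);
  if (end == s || *end != '\0' || errno == ERANGE)
    return false;

  *out = (int64_t) v;
  return true;
}
```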
Shamil Gumirov
fd2a061f6a Minor output fix to use quote_n() instead of quote()
* src/ftp.c (ftp_retrieve_list): change quote to quote_n
* src/iri.c (do_conversion): change quote to quote_n
* src/url.c (convert_fname): change quote to quote_n

The implementation quote() reuses the buffer it returns which
leads to printing the same string for each quote() call in one
output line. Instead, quote_n() should be used as highlighted in
the doc:
https://www.gnu.org/software/gnulib/manual/html_node/Quoting.html

Copyright-paperwork-exempt: Yes
2021-04-11 12:42:07 +02:00
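
The buffer reuse described above can be illustrated with two toy stand-ins. `toy_quote` and `toy_quote_n` are hypothetical and are not gnulib's implementation (the fixed slot count and size are assumptions made for brevity). They only mimic the relevant behaviour: one shared buffer versus one buffer per slot.

```c
#include <stdio.h>

#define SLOTS 4
#define SLOT_SIZE 64

/* Stand-in for a quote()-style function: a single static buffer, so each
   call overwrites the result of the previous one.  */
static const char *
toy_quote (const char *s)
{
  static char buf[SLOT_SIZE];
  snprintf (buf, sizeof buf, "'%s'", s);
  return buf;
}

/* Stand-in for a quote_n()-style function: slot N has its own buffer, so
   results held in different slots can coexist in one output line.  */
static const char *
toy_quote_n (int n, const char *s)
{
  static char bufs[SLOTS][SLOT_SIZE];
  if (n < 0 || n >= SLOTS)
    n = 0;                      /* toy behaviour: fall back to slot 0 */
  snprintf (bufs[n], SLOT_SIZE, "'%s'", s);
  return bufs[n];
}
```

With these stand-ins, passing `toy_quote (a)` and `toy_quote (b)` to the same `printf` call prints one string twice, because both arguments point at the same buffer. Using `toy_quote_n (0, a)` and `toy_quote_n (1, b)` prints both strings.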
Tim Rühsen
27b12dad12 * src/Makefile.am: Add metalink.c and xattr.c to EXTRA_wget_SOURCES 2021-04-05 12:37:28 +02:00
Darshit Shah
f7835691b4 Fix double free in FTP Code
* src/ftp.c(getftp): Don't free `target`. If it is not pointing to
  targetbuf, then it is still pointing to its original location of u->dir.
  This location will be free'd later. Doing so now causes a double free
  and hence crashes Wget
* tests/Test-ftp-dir.px: New test to show double free error
* tests/Makefile.am: Add new test
2021-03-02 12:03:14 +01:00
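
The ownership rule behind this fix can be sketched with a toy example. The names `toy_url`, `toy_target_length` and `toy_url_free` are hypothetical and the code is not taken from `getftp`. It only shows the pattern: a pointer that may alias memory owned elsewhere must not be freed, while a separately allocated buffer may.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a parsed URL that owns its directory string.  */
struct toy_url
{
  char *dir;
};

/* Returns the length of the directory string that would be used.
   TARGET aliases u->dir unless a private copy is needed.  Only the
   private copy (TARGETBUF) is freed here, because u->dir is released
   later by toy_url_free().  Freeing TARGET itself would free u->dir
   a second time.  */
static size_t
toy_target_length (const struct toy_url *u, int add_slash)
{
  char *targetbuf = NULL;
  const char *target = u->dir;
  size_t len;

  if (add_slash)
    {
      size_t n = strlen (u->dir);
      targetbuf = malloc (n + 2);
      if (!targetbuf)
        return 0;
      memcpy (targetbuf, u->dir, n);
      targetbuf[n] = '/';
      targetbuf[n + 1] = '\0';
      target = targetbuf;
    }

  len = strlen (target);
  free (targetbuf);             /* free (NULL) is a no-op */
  return len;
}

/* The single place where u->dir is released.  */
static void
toy_url_free (struct toy_url *u)
{
  free (u->dir);
  u->dir = NULL;
}
```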
Tim Rühsen
7d9ed223fc Use gnulib's utime.h
* bootstrap.conf: Remove utime-h (included by utime).
* configure.ac: Remove header checks for utime.h and sys/utime.h.
* src/utils.c: Simply #include <utime.h>.
2021-01-23 19:28:58 +01:00
Tim Rühsen
ad36a467ac Fix --quota on systems with 32bit long type
* src/init.c (cmd_bytes_sum): Use WGINT_MIN and WGINT_MAX in check.
* src/options.h (struct options): Make 'quota' of type wgint.
* src/retr.c: Make 'total_downloaded_bytes' of type wgint.
* src/utils.h: Fix comment.
* src/wget.h: Add WGINT_MIN, remove SUM_SIZE_INT.
2021-01-16 20:00:39 +01:00
Darshit Shah
e9641d989b Use PRId64 to correctly identify the format specifier
* src/utils.c: Use PRId64 to correctly identify the format specifier for
wgint values. This fixes a warning on 32-bit systems where wgint is a
long long int instead of the long int that the format specifier
indicated.

Reported-by: Jeffrey Walton
2021-01-07 21:35:20 +01:00
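
A minimal sketch of the `PRId64` technique named in this commit. `format_int64` is a hypothetical helper, not a function from `src/utils.c`.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: formats a 64-bit value into BUF.  PRId64 expands
   to the conversion specifier that matches int64_t on the current
   platform, so the same format string is correct whether int64_t is a
   long or a long long.  */
static int
format_int64 (char *buf, size_t size, int64_t value)
{
  return snprintf (buf, size, "%" PRId64, value);
}
```

Hard-coding `%ld` instead would mismatch the argument type on systems where `int64_t` is `long long`, which is the warning the commit message refers to.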
Darshit Shah
9f3df123bb * src/retr.c(rotate_backups): Simplify logic for handling filename rotation 2021-01-03 15:59:49 +01:00
Darshit Shah
5a7f2f7e87 Run make update-copyright 2021-01-01 12:31:01 +01:00
Darshit Shah
37f0dca4e2 * src/main.c: Disable use-askpass on VMS 2020-12-30 23:04:13 +01:00
Steven M. Schweda
8af2171a34 Fixes for running on VMS
time_t on VMS is typically unsigned.  (Lazy man's solution to 2038?)
I added "(time_t)" type casts to negative values ("-1"), and changed
tests to avoid complaints.

* src/hsts.c (hsts_add_entry): Explicitly cast potentially negative time
  values to time_t to handle VMS quirks.
  (hsts_store_entry): Same
  (get_hsts_store_filename): Use new ajoin_dir_file function to join
  filenames
  (test_hsts_read_database): Same
* src/init.c (struct options): use-askpass is not implemented on VMS
  (ajoin_dir_file): New Function to join filenames in a platform
  agnostic manner
  (wgetrc_user_file_name): Use ajoin_dir_file to join paths. Doing this
  correctly, eliminates the need for a special case on VMS
* src/init.h: Add prototype for ajoin_dir_file
* src/log.c (check_redirect_output): Ignore on VMS
* src/main.c(option_data): Disable use-askpass on VMS
  (print_help): Same
  (get_hsts_database): Use ajoin_dir_file to join paths
  (print_version): Add VMS specific information to Version output
* src/utils.c (fork_to_background): Fix signature on VMS

Co-authored-by: Darshit Shah <darnir@gnu.org>
2020-12-30 22:50:32 +01:00
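
The `(time_t)` cast mentioned above can be sketched as follows. `time_is_unset` is a hypothetical helper and is not taken from `src/hsts.c`.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: reports whether T is the "no value" sentinel.
   Comparing against (time_t) -1 works whether time_t is signed or
   unsigned.  A test such as `t < 0` can never be true for an unsigned
   type, and compilers tend to complain about it.  */
static bool
time_is_unset (time_t t)
{
  return t == (time_t) -1;
}
```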
Tim Rühsen
7ec15b9c92 Remove SIZEOF_WGINT as wgint is always int64_t
* src/http.c (test_parse_range_header): Remove use of SIZEOF_WGINT.
* src/utils.c (human_readable): Remove superfluous HR_NUMTYPE.
  (number_to_string): Remove use of SIZEOF_WGINT.
* src/utils.h: Remove use of SIZEOF_WGINT and HR_NUMTYPE.
* src/wget.h: Remove #define SIZEOF_WGINT.
2020-12-29 12:44:20 +00:00
Tim Rühsen
a16149e5bb src/wget.h: Cleanup code around wgint 2020-12-29 12:44:20 +00:00
Darshit Shah
db88ad441e Remove portability handling for str[n]casecmp
* src/mswindows.c: Gnulib ensures we always have str{n}casecmp
* configure.ac: Don't need to define HAVE_STR[N]CASECMP anymore
2020-12-29 12:44:20 +00:00
Darshit Shah
8b1aeab783 Remove portability handling code for wgint
Gnulib's stdint.h module promises a C99 compliant stdint.h file on all
platforms. This allows us to directly use the fixed-width integer
type int64_t without needing to resort to all the checks being
performed.

* src/wget.h: Assume that int64_t is always available and use it
* src/mswindows.h: Remove portability code since gnulib handles it
* configure.ac: Remove sizeof checks for integer types that are no
  longer used
2020-12-29 12:44:20 +00:00
Tim Rühsen
5b7d068a4b Fix --accept-regex/--reject-regex for FTP
* src/ftp.c (ftp_retrieve_glob): Call accept_url() with the full URL

Reported-by: Frans de Boer <frans@fransdb.nl>
2020-12-28 23:33:48 +01:00
Darshit Shah
a11bfc2d4e Use a separate domain for translating gnulib
Use the --po-domain option to gnulib-tool to create a new textdomain
that can be used by gnulib files for translations. This way, we don't
have to maintain the list of all files that require translations in
gnulib.

* bootstrap.conf: Use --po-domain and --po-base options to create a
  separate base for gnulib translations
* src/main.c(i18n_initialize): Call bindtextdomain on wget-gnulib to
  include those translations as well
* Makefile.am: Add new directory gnulib_po to SUBDIRS
* configure.ac: Generate gnulib_po/Makefile.in
* lib/Makefile.am: Set AM_CPPFLAGS to empty since gnulib.mk expects it
  to be set
2020-12-27 21:15:45 +01:00
Darshit Shah
3636b2a5af main.c (main): Warn when trying to use password without username 2020-12-22 22:25:26 +01:00
Tim Rühsen
015afd7cc7 * src/http.c (http_cleanup): Reset wget_cookie_jar after freeing
This silences the wget_options_fuzzer which triggered #28610 on
OSS-Fuzz. This issue can not happen with the Wget utility.
The fuzzer runs main(),...,cleanup() in a loop which the Wget utility
never does.
2020-12-13 18:23:39 +01:00
Tim Rühsen
794b7b1dbe * src/main.c: Add description to --help output of wait options 2020-11-08 18:46:11 +01:00
Tim Rühsen
1656a1628c * src/ftp.c (ftp_loop_internal): Check for VERIFCERTERR to avoid SIGABRT
There is a bug that causes wget to exit with SIGABRT when trying to
receive files through FTP from a server with a certificate that failed
the verification.

The bug is filed in RedHat Bugzilla for Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=1475861

Reported-by: Artem Egorenkov <aegorenk@redhat.com>
2020-06-29 18:04:52 +02:00
Tim Rühsen
e830f5f42b * src/host.c (lookup_host): Fix uninitialized pointer access in c-ares code
Reported-by: Swapnil More
2020-06-21 11:37:28 +02:00
Tim Rühsen
470a7dfc84 * src/gnutls.c (ssl_init): Small cleanup fixing output of ncerts 2020-05-22 15:49:12 +02:00
Tim Rühsen
c23eaff56f * src/convert.c (downloaded_files_free): Only compile if DEBUG_MALLOC or TESTING is defined 2020-05-01 17:54:58 +02:00
Tim Rühsen
5a141065c4 * src/netrc.c (free_netrc): Only compile if DEBUG_MALLOC or TESTING is defined 2020-05-01 17:54:58 +02:00
Вячеслав Петрищев
7a3a82faf8 Fix SSL/TLS timeout issues.
* connect.c (fd_read, fd_peek): Let implementation take care about timeout.
* gnutls.c (_do_handshake, _do_reauth, wgnutls_read_timeout): Fix support for interactive timeout.
* gnutls.c (wgnutls_peek): Let wgnutls_read_timeout() take care about timeout.
* openssl.c (openssl_read_peek): Fix 0 (-1) timeout.
* retr.c (fd_read_body): Avoid wrong 'interactive timeout'.
2020-05-01 17:53:47 +02:00
Вячеслав Петрищев
c12a295496 Set interactive to true for bar progress.
* src/progress.c (bar_set_params): Set interactive to true.
* src/retr.c (fd_read_body): Avoid calling fd_read with 0 timeout.
2020-05-01 17:53:21 +02:00