[svn] When downloading recursively, don't ignore rejection of HTML documents that are themselves leaves of recursion.
hniksic 2002-04-15 14:57:10 -07:00
parent b9210ecbc8
commit f8b4b8bd12
2 changed files with 12 additions and 16 deletions

ChangeLog

@ -1,3 +1,8 @@
+2002-04-15  Hrvoje Niksic  <hniksic@arsdigita.com>
+
+        * recur.c (download_child_p): Don't ignore rejection of HTML
+        documents that are themselves leaves of recursion.
+
 2002-04-15  Ian Abbott  <abbotti@mev.co.uk>
 
         * Makefile.in: Updated several dependencies for object files to take
@ -16,6 +21,7 @@
         * utils.c: Don't define `SETJMP()', `run_with_timeout_env' or
         `abort_run_with_timeout()' when `USE_SIGNAL_TIMEOUT' is undefined.
+>>>>>>> 1.395
 
 2002-04-15  Hrvoje Niksic  <hniksic@arsdigita.com>
 
         * host.c (getaddrinfo_with_timeout): New function.

recur.c

@ -511,23 +511,13 @@ download_child_p (const struct urlpos *upos, struct url *parent, int depth,
   /* 6. */
   {
     /* Check for acceptance/rejection rules.  We ignore these rules
-       for HTML documents because they might lead to other files which
-       need to be downloaded.  Of course, we don't know which
-       documents are HTML before downloading them, so we guess.
-
-       A file is subject to acceptance/rejection rules if:
-
-       * u->file is not "" (i.e. it is not a directory)
-       and either:
-         + there is no file suffix,
-         + or there is a suffix, but is not "html" or "htm" or similar,
-         + both:
-           - recursion is not infinite,
-           - and we are at its very end. */
+       for directories (no file name to match) and for HTML documents,
+       which might lead to other files that do need to be downloaded.
+       That is, unless we've exhausted the recursion depth anyway.  */
     if (u->file[0] != '\0'
-        && (!has_html_suffix_p (url)
-            || (opt.reclevel != INFINITE_RECURSION && depth >= opt.reclevel)))
+        && !(has_html_suffix_p (u->file)
+             && depth < opt.reclevel - 1
+             && depth != INFINITE_RECURSION))
       {
         if (!acceptable (u->file))
           {
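
For illustration, here is a minimal standalone sketch of the rule the new test encodes. It is not wget source: has_html_suffix_p() and acceptable() below are simplified stand-ins (the real acceptable() is driven by the -A/-R options), download_child_sketch() is a hypothetical name, the INFINITE_RECURSION special case and download_child_p's other checks (host, directory, robots, etc.) are omitted, and depth is assumed to be the recursion depth of the page whose links are being examined. The point it demonstrates is that an HTML document sitting at the last recursion level is now subject to the accept/reject rules, while a shallower HTML document remains exempt because its links may still lead to acceptable files.

#include <stdio.h>
#include <string.h>

/* Simplified stand-in for wget's has_html_suffix_p(): true for names
   ending in ".html" or ".htm".  */
static int
has_html_suffix_p (const char *file)
{
  const char *dot = strrchr (file, '.');
  return dot != NULL
         && (strcmp (dot, ".html") == 0 || strcmp (dot, ".htm") == 0);
}

/* Hypothetical stand-in for wget's acceptable(): pretend the user
   passed "-R html", i.e. reject HTML files by name.  */
static int
acceptable (const char *file)
{
  return !has_html_suffix_p (file);
}

/* Sketch of the new test in download_child_p(): accept/reject rules
   apply to every named file except an HTML document that still has
   recursion levels left below it (depth < reclevel - 1).  */
static int
download_child_sketch (const char *file, int depth, int reclevel)
{
  if (file[0] != '\0'
      && !(has_html_suffix_p (file) && depth < reclevel - 1))
    {
      if (!acceptable (file))
        return 0;               /* rejected by the -A/-R rules */
    }
  return 1;                     /* allowed (all other checks omitted) */
}

int
main (void)
{
  /* Roughly "wget -r -l 2 -R html": */
  printf ("%d\n", download_child_sketch ("index.html", 0, 2)); /* 1: non-leaf HTML stays exempt */
  printf ("%d\n", download_child_sketch ("leaf.html", 1, 2));  /* 0: leaf HTML is now rejected */
  printf ("%d\n", download_child_sketch ("photo.jpg", 1, 2));  /* 1: not matched by -R html */
  return 0;
}

Under the old test, the exemption for HTML held until depth reached opt.reclevel, so an HTML link found on a page at the last descended level was fetched even when the rejection rules matched it; that is the behavior the commit message describes as ignoring rejection of HTML documents that are themselves leaves of recursion.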