annotate toolboxes/wget/man/cat1/wget.1.txt @ 0:e9a9cd732c1e tip

first hg version after svn
author wolffd
date Tue, 10 Feb 2015 15:05:51 +0000
WGET(1)                           GNU Wget                           WGET(1)

NAME
       Wget - The non-interactive network downloader.

SYNOPSIS
       wget [option]... [URL]...

DESCRIPTION
       GNU Wget is a free utility for non-interactive download of files
       from the Web.  It supports HTTP, HTTPS, and FTP protocols, as
       well as retrieval through HTTP proxies.

       Wget is non-interactive, meaning that it can work in the
       background while the user is not logged on.  This allows you to
       start a retrieval and disconnect from the system, letting Wget
       finish the work.  By contrast, most Web browsers require the
       user's constant presence, which can be a great hindrance when
       transferring a lot of data.

       Wget can follow links in HTML and XHTML pages and create local
       versions of remote web sites, fully recreating the directory
       structure of the original site.  This is sometimes referred to
       as "recursive downloading."  While doing that, Wget respects the
       Robot Exclusion Standard (/robots.txt).  Wget can be instructed
       to convert the links in downloaded HTML files to point to the
       local files, for offline viewing.

       Wget has been designed for robustness over slow or unstable
       network connections; if a download fails due to a network
       problem, it will keep retrying until the whole file has been
       retrieved.  If the server supports regetting, it will instruct
       the server to continue the download from where it left off.

OPTIONS
   Option Syntax
       Since Wget uses GNU getopt to process command-line arguments,
       every option has a long form along with the short one.  Long
       options are more convenient to remember, but take time to type.
       You may freely mix different option styles, or specify options
       after the command-line arguments.  Thus you may write:

               wget -r --tries=10 http://fly.srk.fer.hr/ -o log

       The space between the option accepting an argument and the
       argument may be omitted.  Instead of -o log you can write -olog.

       You may put several options that do not require arguments
       together, like:

               wget -drc <URL>

       This is completely equivalent to:

               wget -d -r -c <URL>

       Since the options can be specified after the arguments, you may
       terminate them with --.  So the following will try to download
       URL -x, reporting failure to log:

               wget -o log -- -x

       The options that accept comma-separated lists all respect the
       convention that specifying an empty list clears its value.  This
       can be useful to clear the .wgetrc settings.  For instance, if
       your .wgetrc sets "exclude_directories" to /cgi-bin, the
       following example will first reset it, and then set it to
       exclude /~nobody and /~somebody.  You can also clear the lists
       in .wgetrc.

               wget -X "" -X /~nobody,/~somebody

       Most options that do not accept arguments are boolean options,
       so named because their state can be captured with a yes-or-no
       ("boolean") variable.  For example, --follow-ftp tells Wget to
       follow FTP links from HTML files and, on the other hand,
       --no-glob tells it not to perform file globbing on FTP URLs.  A
       boolean option is either affirmative or negative (beginning with
       --no).  All such options share several properties.

       Unless stated otherwise, it is assumed that the default behavior
       is the opposite of what the option accomplishes.  For example,
       the documented existence of --follow-ftp assumes that the
       default is to not follow FTP links from HTML pages.

       Affirmative options can be negated by prepending --no- to the
       option name; negative options can be negated by omitting the
       --no- prefix.  This might seem superfluous---if the default for
       an affirmative option is to not do something, then why provide a
       way to explicitly turn it off?  But the startup file may in fact
       change the default.  For instance, using "follow_ftp = off" in
       .wgetrc makes Wget not follow FTP links by default, and using
       --no-follow-ftp is the only way to restore the factory default
       from the command line.

   Basic Startup Options
       -V
       --version
           Display the version of Wget.

       -h
       --help
           Print a help message describing all of Wget's command-line
           options.

       -b
       --background
           Go to background immediately after startup.  If no output
           file is specified via the -o option, output is redirected to
           wget-log.

       -e command
       --execute command
           Execute command as if it were a part of .wgetrc.  A command
           thus invoked will be executed after the commands in .wgetrc,
           thus taking precedence over them.  If you need to specify
           more than one wgetrc command, use multiple instances of -e.

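           As a sketch (the URL here is a placeholder), -e can inject
           wgetrc commands for a single run:

```shell
# Each wgetrc command needs its own -e instance; commands given with -e
# run after .wgetrc, so they override it for this invocation only.
wget -e robots=off -e wait=1 -r http://example.com/docs/
```
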
   Logging and Input File Options
       -o logfile
       --output-file=logfile
           Log all messages to logfile.  The messages are normally
           reported to standard error.

       -a logfile
       --append-output=logfile
           Append to logfile.  This is the same as -o, only it appends
           to logfile instead of overwriting the old log file.  If
           logfile does not exist, a new file is created.

       -d
       --debug
           Turn on debug output, meaning various information important
           to the developers of Wget if it does not work properly.
           Your system administrator may have chosen to compile Wget
           without debug support, in which case -d will not work.
           Please note that compiling with debug support is always
           safe---Wget compiled with debug support will not print any
           debug info unless requested with -d.

       -q
       --quiet
           Turn off Wget's output.

       -v
       --verbose
           Turn on verbose output, with all the available data.  The
           default output is verbose.

       -nv
       --no-verbose
           Turn off verbose without being completely quiet (use -q for
           that), which means that error messages and basic information
           still get printed.

       -i file
       --input-file=file
           Read URLs from file.  If - is specified as file, URLs are
           read from the standard input.  (Use ./- to read from a file
           literally named -.)

           If this function is used, no URLs need be present on the
           command line.  If there are URLs both on the command line
           and in an input file, those on the command line will be the
           first ones to be retrieved.  The file need not be an HTML
           document (but no harm if it is)---it is enough if the URLs
           are just listed sequentially.

           However, if you specify --force-html, the document will be
           regarded as HTML.  In that case you may have problems with
           relative links, which you can solve either by adding "<base
           href="url">" to the documents or by specifying --base=url on
           the command line.

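           For instance (placeholder URLs), a URL list can be piped to
           Wget on standard input via the - file name:

```shell
# Read URLs from standard input instead of a file on disk.
printf '%s\n' \
    http://example.com/a.txt \
    http://example.com/b.txt | wget -i -
```
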
       -F
       --force-html
           When input is read from a file, force it to be treated as an
           HTML file.  This enables you to retrieve relative links from
           existing HTML files on your local disk, by adding "<base
           href="url">" to HTML, or using the --base command-line
           option.

       -B URL
       --base=URL
           Prepends URL to relative links read from the file specified
           with the -i option.

   Download Options
       --bind-address=ADDRESS
           When making client TCP/IP connections, bind to ADDRESS on
           the local machine.  ADDRESS may be specified as a hostname
           or IP address.  This option can be useful if your machine is
           bound to multiple IPs.

       -t number
       --tries=number
           Set number of retries to number.  Specify 0 or inf for
           infinite retrying.  The default is to retry 20 times, with
           the exception of fatal errors like "connection refused" or
           "not found" (404), which are not retried.

       -O file
       --output-document=file
           The documents will not be written to the appropriate files,
           but all will be concatenated together and written to file.
           If - is used as file, documents will be printed to standard
           output, disabling link conversion.  (Use ./- to print to a
           file literally named -.)

           Use of -O is not intended to mean simply "use the name file
           instead of the one in the URL;" rather, it is analogous to
           shell redirection: wget -O file http://foo is intended to
           work like wget -O - http://foo > file; file will be
           truncated immediately, and all downloaded content will be
           written there.

           For this reason, -N (for timestamp-checking) is not
           supported in combination with -O: since file is always newly
           created, it will always have a very new timestamp.  A
           warning will be issued if this combination is used.

           Similarly, using -r or -p with -O may not work as you
           expect: Wget won't just download the first file to file and
           then download the rest to their normal names: all downloaded
           content will be placed in file.  This was disabled in
           version 1.11, but has been reinstated (with a warning) in
           1.11.2, as there are some cases where this behavior can
           actually have some use.

           Note that a combination with -k is only permitted when
           downloading a single document, as in that case it will just
           convert all relative URIs to external ones; -k makes no
           sense for multiple URIs when they're all being downloaded to
           a single file.

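           The shell-redirection analogy above can be sketched as
           follows (placeholder URLs); both commands concatenate
           everything into one file:

```shell
# These two invocations are intended to behave alike: combined.html is
# truncated immediately and all downloaded content is written to it.
wget -O combined.html http://example.com/a http://example.com/b
wget -O - http://example.com/a http://example.com/b > combined.html
```
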
       -nc
       --no-clobber
           If a file is downloaded more than once in the same
           directory, Wget's behavior depends on a few options,
           including -nc.  In certain cases, the local file will be
           clobbered, or overwritten, upon repeated download.  In other
           cases it will be preserved.

           When running Wget without -N, -nc, -r, or -p, downloading
           the same file in the same directory will result in the
           original copy of file being preserved and the second copy
           being named file.1.  If that file is downloaded yet again,
           the third copy will be named file.2, and so on.  When -nc is
           specified, this behavior is suppressed, and Wget will refuse
           to download newer copies of file.  Therefore, "no-clobber"
           is actually a misnomer in this mode---it's not clobbering
           that's prevented (as the numeric suffixes were already
           preventing clobbering), but rather the multiple version
           saving that's prevented.

           When running Wget with -r or -p, but without -N or -nc,
           re-downloading a file will result in the new copy simply
           overwriting the old.  Adding -nc will prevent this behavior,
           instead causing the original version to be preserved and any
           newer copies on the server to be ignored.

           When running Wget with -N, with or without -r or -p, the
           decision as to whether or not to download a newer copy of a
           file depends on the local and remote timestamp and size of
           the file.  -nc may not be specified at the same time as -N.

           Note that when -nc is specified, files with the suffixes
           .html or .htm will be loaded from the local disk and parsed
           as if they had been retrieved from the Web.

       -c
       --continue
           Continue getting a partially-downloaded file.  This is
           useful when you want to finish up a download started by a
           previous instance of Wget, or by another program.  For
           instance:

                   wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

           If there is a file named ls-lR.Z in the current directory,
           Wget will assume that it is the first portion of the remote
           file, and will ask the server to continue the retrieval from
           an offset equal to the length of the local file.

           Note that you don't need to specify this option if you just
           want the current invocation of Wget to retry downloading a
           file should the connection be lost midway through.  This is
           the default behavior.  -c only affects resumption of
           downloads started prior to this invocation of Wget, and
           whose local files are still sitting around.

           Without -c, the previous example would just download the
           remote file to ls-lR.Z.1, leaving the truncated ls-lR.Z file
           alone.

           Beginning with Wget 1.7, if you use -c on a non-empty file,
           and it turns out that the server does not support continued
           downloading, Wget will refuse to start the download from
           scratch, which would effectively ruin existing contents.  If
           you really want the download to start from scratch, remove
           the file.

           Also beginning with Wget 1.7, if you use -c on a file which
           is of equal size as the one on the server, Wget will refuse
           to download the file and print an explanatory message.  The
           same happens when the file is smaller on the server than
           locally (presumably because it was changed on the server
           since your last download attempt)---because "continuing" is
           not meaningful, no download occurs.

           On the other side of the coin, while using -c, any file
           that's bigger on the server than locally will be considered
           an incomplete download and only "(length(remote) -
           length(local))" bytes will be downloaded and tacked onto the
           end of the local file.  This behavior can be desirable in
           certain cases---for instance, you can use wget -c to
           download just the new portion that's been appended to a data
           collection or log file.

           However, if the file is bigger on the server because it's
           been changed, as opposed to just appended to, you'll end up
           with a garbled file.  Wget has no way of verifying that the
           local file is really a valid prefix of the remote file.  You
           need to be especially careful of this when using -c in
           conjunction with -r, since every file will be considered as
           an "incomplete download" candidate.

           Another instance where you'll get a garbled file if you try
           to use -c is if you have a lame HTTP proxy that inserts a
           "transfer interrupted" string into the local file.  In the
           future a "rollback" option may be added to deal with this
           case.

           Note that -c only works with FTP servers and with HTTP
           servers that support the "Range" header.

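           As a sketch of the append-only use case mentioned above (the
           URL is a placeholder):

```shell
# Fetch only the bytes appended to a growing remote log since the last
# run; safe only if the remote file is append-only, since Wget cannot
# verify that the local file is a valid prefix of the remote one.
wget -c http://example.com/logs/access.log
```
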
       --progress=type
           Select the type of the progress indicator you wish to use.
           Legal indicators are "dot" and "bar".

           The "bar" indicator is used by default.  It draws an ASCII
           progress bar graphic (a.k.a "thermometer" display)
           indicating the status of retrieval.  If the output is not a
           TTY, the "dot" indicator will be used by default.

           Use --progress=dot to switch to the "dot" display.  It
           traces the retrieval by printing dots on the screen, each
           dot representing a fixed amount of downloaded data.

           When using the dotted retrieval, you may also set the style
           by specifying the type as dot:style.  Different styles
           assign different meaning to one dot.  With the "default"
           style each dot represents 1K, there are ten dots in a
           cluster and 50 dots in a line.  The "binary" style has a
           more "computer"-like orientation---8K dots, 16-dots clusters
           and 48 dots per line (which makes for 384K lines).  The
           "mega" style is suitable for downloading very large
           files---each dot represents 64K retrieved, there are eight
           dots in a cluster, and 48 dots on each line (so each line
           contains 3M).

           Note that you can set the default style using the "progress"
           command in .wgetrc.  That setting may be overridden from the
           command line.  The exception is that, when the output is not
           a TTY, the "dot" progress will be favored over "bar".  To
           force the bar output, use --progress=bar:force.

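           For example (placeholder URL), the dot styles and the forced
           bar display can be selected like this:

```shell
# "mega" style: one dot per 64K retrieved, 48 dots (3M) per line.
wget --progress=dot:mega http://example.com/big.iso
# Force the bar display even when output is not a TTY (e.g. logging).
wget --progress=bar:force http://example.com/big.iso -o wget.log
```
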
       -N
       --timestamping
           Turn on time-stamping.

       -S
       --server-response
           Print the headers sent by HTTP servers and responses sent
           by FTP servers.

       --spider
           When invoked with this option, Wget will behave as a Web
           spider, which means that it will not download the pages,
           just check that they are there.  For example, you can use
           Wget to check your bookmarks:

                   wget --spider --force-html -i bookmarks.html

           This feature needs much more work for Wget to get close to
           the functionality of real web spiders.

       -T seconds
       --timeout=seconds
           Set the network timeout to seconds seconds.  This is
           equivalent to specifying --dns-timeout, --connect-timeout,
           and --read-timeout, all at the same time.

           When interacting with the network, Wget can check for
           timeout and abort the operation if it takes too long.  This
           prevents anomalies like hanging reads and infinite connects.
           The only timeout enabled by default is a 900-second read
           timeout.  Setting a timeout to 0 disables it altogether.
           Unless you know what you are doing, it is best not to change
           the default timeout settings.

           All timeout-related options accept decimal values, as well
           as subsecond values.  For example, 0.1 seconds is a legal
           (though unwise) choice of timeout.  Subsecond timeouts are
           useful for checking server response times or for testing
           network latency.

       --dns-timeout=seconds
           Set the DNS lookup timeout to seconds seconds.  DNS lookups
           that don't complete within the specified time will fail.  By
           default, there is no timeout on DNS lookups, other than that
           implemented by system libraries.

       --connect-timeout=seconds
           Set the connect timeout to seconds seconds.  TCP connections
           that take longer to establish will be aborted.  By default,
           there is no connect timeout, other than that implemented by
           system libraries.

       --read-timeout=seconds
           Set the read (and write) timeout to seconds seconds.  The
           "time" of this timeout refers to idle time: if, at any point
           in the download, no data is received for more than the
           specified number of seconds, reading fails and the download
           is restarted.  This option does not directly affect the
           duration of the entire download.

           Of course, the remote server may choose to terminate the
           connection sooner than this option requires.  The default
           read timeout is 900 seconds.

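           A sketch (placeholder URL) of -T against its three component
           timeouts:

```shell
# Setting -T 60 ...
wget --timeout=60 http://example.com/file
# ... is equivalent to setting the three timeouts individually:
wget --dns-timeout=60 --connect-timeout=60 --read-timeout=60 \
     http://example.com/file
```
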
       --limit-rate=amount
           Limit the download speed to amount bytes per second.  Amount
           may be expressed in bytes, kilobytes with the k suffix, or
           megabytes with the m suffix.  For example, --limit-rate=20k
           will limit the retrieval rate to 20KB/s.  This is useful
           when, for whatever reason, you don't want Wget to consume
           the entire available bandwidth.

           This option allows the use of decimal numbers, usually in
           conjunction with power suffixes; for example,
           --limit-rate=2.5k is a legal value.

           Note that Wget implements the limiting by sleeping the
           appropriate amount of time after a network read that took
           less time than specified by the rate.  Eventually this
           strategy causes the TCP transfer to slow down to
           approximately the specified rate.  However, it may take some
           time for this balance to be achieved, so don't be surprised
           if limiting the rate doesn't work well with very small
           files.

       -w seconds
       --wait=seconds
           Wait the specified number of seconds between the retrievals.
           Use of this option is recommended, as it lightens the server
           load by making the requests less frequent.  Instead of in
           seconds, the time can be specified in minutes using the "m"
           suffix, in hours using the "h" suffix, or in days using the
           "d" suffix.

           Specifying a large value for this option is useful if the
           network or the destination host is down, so that Wget can
           wait long enough to reasonably expect the network error to
           be fixed before the retry.  The waiting interval specified
           by this function is influenced by "--random-wait", which
           see.

       --waitretry=seconds
           If you don't want Wget to wait between every retrieval, but
           only between retries of failed downloads, you can use this
           option.  Wget will use linear backoff, waiting 1 second
           after the first failure on a given file, then waiting 2
           seconds after the second failure on that file, up to the
           maximum number of seconds you specify.  Therefore, a value
           of 10 will actually make Wget wait up to (1 + 2 + ... + 10)
           = 55 seconds per file.

           Note that this option is turned on by default in the global
           wgetrc file.

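           The worst-case figure quoted above can be checked with a
           line of shell arithmetic:

```shell
# Linear backoff for --waitretry=10: 1 + 2 + ... + 10 seconds total.
total=0
for s in $(seq 1 10); do
    total=$((total + s))
done
echo "$total"   # prints 55
```
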
       --random-wait
           Some web sites may perform log analysis to identify
           retrieval programs such as Wget by looking for
           statistically significant similarities in the time between
           requests.  This option causes the time between requests to
           vary between 0.5 and 1.5 * wait seconds, where wait was
           specified using the --wait option, in order to mask Wget's
           presence from such analysis.

           A 2001 article in a publication devoted to development on a
           popular consumer platform provided code to perform this
           analysis on the fly.  Its author suggested blocking at the
           class C address level to ensure automated retrieval programs
           were blocked despite changing DHCP-supplied addresses.

           The --random-wait option was inspired by this ill-advised
           recommendation to block many unrelated users from a web site
           due to the actions of one.

       --no-proxy
           Don't use proxies, even if the appropriate *_proxy
           environment variable is defined.

       -Q quota
       --quota=quota
           Specify download quota for automatic retrievals.  The value
           can be specified in bytes (default), kilobytes (with k
           suffix), or megabytes (with m suffix).

           Note that quota will never affect downloading a single file.
           So if you specify wget -Q10k
           ftp://wuarchive.wustl.edu/ls-lR.gz, all of the ls-lR.gz will
           be downloaded.  The same goes even when several URLs are
           specified on the command-line.  However, quota is respected
           when retrieving either recursively, or from an input file.
           Thus you may safely type wget -Q2m -i sites---download will
           be aborted when the quota is exceeded.

           Setting quota to 0 or to inf unlimits the download quota.

       --no-dns-cache
           Turn off caching of DNS lookups.  Normally, Wget remembers
           the IP addresses it looked up from DNS so it doesn't have to
           repeatedly contact the DNS server for the same (typically
           small) set of hosts it retrieves from.  This cache exists in
           memory only; a new Wget run will contact DNS again.

           However, it has been reported that in some situations it is
           not desirable to cache host names, even for the duration of
           a short-running application like Wget.  With this option
           Wget issues a new DNS lookup (more precisely, a new call to
           "gethostbyname" or "getaddrinfo") each time it makes a new
           connection.  Please note that this option will not affect
           caching that might be performed by the resolving library or
           by an external caching layer, such as NSCD.

           If you don't understand exactly what this option does, you
           probably won't need it.

       --restrict-file-names=mode
           Change which characters found in remote URLs may show up in
           local file names generated from those URLs.  Characters that
           are restricted by this option are escaped, i.e. replaced
           with %HH, where HH is the hexadecimal number that
           corresponds to the restricted character.

           By default, Wget escapes the characters that are not valid
           as part of file names on your operating system, as well as
           control characters that are typically unprintable.  This
           option is useful for changing these defaults, either because
           you are downloading to a non-native partition, or because
           you want to disable escaping of the control characters.

           When mode is set to "unix", Wget escapes the character / and
           the control characters in the ranges 0--31 and 128--159.
           This is the default on Unix-like OSes.

           When mode is set to "windows", Wget escapes the characters
           \, |, /, :, ?, ", *, <, >, and the control characters in the
           ranges 0--31 and 128--159.  In addition to this, Wget in
           Windows mode uses + instead of : to separate host and port
           in local file names, and uses @ instead of ? to separate the
           query portion of the file name from the rest.  Therefore, a
           URL that would be saved as
           www.xemacs.org:4300/search.pl?input=blah in Unix mode would
           be saved as www.xemacs.org+4300/search.pl@input=blah in
           Windows mode.  This mode is the default on Windows.

           If you append ,nocontrol to the mode, as in unix,nocontrol,
           escaping of the control characters is also switched off.
           You can use --restrict-file-names=nocontrol to turn off
           escaping of control characters without affecting the choice
           of the OS to use as file name restriction mode.

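           The %HH escape described above is simply the restricted
           character's byte value in hexadecimal; for instance, the ":"
           that Windows mode restricts maps to %3A.  A quick shell
           check, independent of Wget itself:

```shell
# POSIX printf: a leading quote in a numeric argument yields the
# character's codeset value, so this prints the %HH escape for ":".
c=':'
esc=$(printf '%%%02X' "'$c")
echo "$esc"   # prints %3A
```
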
wolffd@0 638 -4
wolffd@0 639 --inet4-only
wolffd@0 640 -6
wolffd@0 641 --inet6-only
wolffd@0 642 Force connecting to IPv4 or IPv6 addresses. With
wolffd@0 643 --inet4-only or -4, Wget will only connect to IPv4
wolffd@0 644 hosts, ignoring AAAA records in DNS, and refusing to
wolffd@0 645 connect to IPv6 addresses specified in URLs. Con-
wolffd@0 646 versely, with --inet6-only or -6, Wget will only
wolffd@0 647 connect to IPv6 hosts and ignore A records and IPv4
wolffd@0 648 addresses.
wolffd@0 649
wolffd@0 650 Neither options should be needed normally. By
wolffd@0 651 default, an IPv6-aware Wget will use the address
wolffd@0 652 family specified by the host's DNS record. If the
wolffd@0 653 DNS responds with both IPv4 and IPv6 addresses, Wget
wolffd@0 654 will try them in sequence until it finds one it can
wolffd@0 655 connect to. (Also see "--prefer-family" option
wolffd@0 656 described below.)
wolffd@0 657
wolffd@0 658 These options can be used to deliberately force the
wolffd@0 659 use of IPv4 or IPv6 address families on dual family
wolffd@0 660 systems, usually to aid debugging or to deal with
wolffd@0 661 broken network configuration. Only one of
wolffd@0 662 --inet6-only and --inet4-only may be specified at
wolffd@0 663 the same time. Neither option is available in Wget
wolffd@0 664 compiled without IPv6 support.
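
A minimal sketch of forcing one address family (example.com stands in
for a dual-stack host):

```shell
# Only A records are used; AAAA records and IPv6 literals are refused.
wget -4 http://example.com/file.txt

# Conversely, only IPv6 addresses are tried.
wget --inet6-only http://example.com/file.txt
```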
wolffd@0 665
wolffd@0 666 --prefer-family=IPv4/IPv6/none
wolffd@0 667 When given a choice of several addresses, connect to
wolffd@0 668 the addresses with specified address family first.
wolffd@0 669 IPv4 addresses are preferred by default.
wolffd@0 670
wolffd@0 671 This avoids spurious errors and connect attempts
wolffd@0 672 when accessing hosts that resolve to both IPv6 and
wolffd@0 673 IPv4 addresses from IPv4 networks. For example,
wolffd@0 674 www.kame.net resolves to
wolffd@0 675 2001:200:0:8002:203:47ff:fea5:3085 and to
wolffd@0 676 203.178.141.194. When the preferred family is
wolffd@0 677 "IPv4", the IPv4 address is used first; when the
wolffd@0 678 preferred family is "IPv6", the IPv6 address is used
wolffd@0 679 first; if the specified value is "none", the address
wolffd@0 680 order returned by DNS is used without change.
wolffd@0 681
wolffd@0 682 Unlike -4 and -6, this option doesn't inhibit access
wolffd@0 683 to any address family, it only changes the order in
wolffd@0 684 which the addresses are accessed. Also note that
wolffd@0 685 the reordering performed by this option is sta-
wolffd@0 686 ble---it doesn't affect order of addresses of the
wolffd@0 687 same family. That is, the relative order of all
wolffd@0 688 IPv4 addresses and of all IPv6 addresses remains
wolffd@0 689 intact in all cases.
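
Using the dual-homed host mentioned above, the preference can be
sketched as:

```shell
# Try the IPv6 address of www.kame.net first, falling back to IPv4.
wget --prefer-family=IPv6 http://www.kame.net/

# Leave the address order returned by DNS untouched.
wget --prefer-family=none http://www.kame.net/
```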
wolffd@0 690
wolffd@0 691 --retry-connrefused
wolffd@0 692 Consider "connection refused" a transient error and
wolffd@0 693 try again. Normally Wget gives up on a URL when it
wolffd@0 694 is unable to connect to the site because failure to
wolffd@0 695 connect is taken as a sign that the server is not
wolffd@0 696 running at all and that retries would not help.
wolffd@0 697 This option is for mirroring unreliable sites whose
wolffd@0 698 servers tend to disappear for short periods of time.
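
For mirroring such a site, this option is typically combined with retry
limits (the URL is hypothetical):

```shell
# Retry up to 10 times, waiting up to 30 seconds between retries,
# and treat "connection refused" as a transient error.
wget --retry-connrefused --tries=10 --waitretry=30 -m http://flaky.example.com/
```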
wolffd@0 699
wolffd@0 700 --user=user
wolffd@0 701 --password=password
wolffd@0 702 Specify the username user and password password for
wolffd@0 703 both FTP and HTTP file retrieval. These parameters
wolffd@0 704 can be overridden using the --ftp-user and
wolffd@0 705 --ftp-password options for FTP connections and the
wolffd@0 706 --http-user and --http-password options for HTTP
wolffd@0 707 connections.
wolffd@0 708
wolffd@0 709 Directory Options
wolffd@0 710
wolffd@0 711
wolffd@0 712 -nd
wolffd@0 713 --no-directories
wolffd@0 714 Do not create a hierarchy of directories when
wolffd@0 715 retrieving recursively. With this option turned on,
wolffd@0 716 all files will get saved to the current directory,
wolffd@0 717 without clobbering (if a name shows up more than
wolffd@0 718 once, the filenames will get extensions .n).
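
For instance, a flat recursive download might look like this (host taken
from the examples elsewhere in this manual):

```shell
# All files land in the current directory; a repeated name
# gets a numeric suffix: file, file.1, file.2, ...
wget -r -nd http://fly.srk.fer.hr/
```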
wolffd@0 719
wolffd@0 720 -x
wolffd@0 721 --force-directories
wolffd@0 722 The opposite of -nd---create a hierarchy of directo-
wolffd@0 723 ries, even if one would not have been created other-
wolffd@0 724 wise. E.g. wget -x http://fly.srk.fer.hr/robots.txt
wolffd@0 725 will save the downloaded file to fly.srk.fer.hr/ro-
wolffd@0 726 bots.txt.
wolffd@0 727
wolffd@0 728 -nH
wolffd@0 729 --no-host-directories
wolffd@0 730 Disable generation of host-prefixed directories. By
wolffd@0 731 default, invoking Wget with -r
wolffd@0 732 http://fly.srk.fer.hr/ will create a structure of
wolffd@0 733 directories beginning with fly.srk.fer.hr/. This
wolffd@0 734 option disables such behavior.
wolffd@0 735
wolffd@0 736 --protocol-directories
wolffd@0 737 Use the protocol name as a directory component of
wolffd@0 738 local file names. For example, with this option,
wolffd@0 739 wget -r http://host will save to http/host/...
wolffd@0 740 rather than just to host/....
wolffd@0 741
wolffd@0 742 --cut-dirs=number
wolffd@0 743 Ignore number directory components. This is useful
wolffd@0 744 for getting a fine-grained control over the direc-
wolffd@0 745 tory where recursive retrieval will be saved.
wolffd@0 746
wolffd@0 747 Take, for example, the directory at
wolffd@0 748 ftp://ftp.xemacs.org/pub/xemacs/. If you retrieve
wolffd@0 749 it with -r, it will be saved locally under
wolffd@0 750 ftp.xemacs.org/pub/xemacs/. While the -nH option
wolffd@0 751 can remove the ftp.xemacs.org/ part, you are still
wolffd@0 752 stuck with pub/xemacs. This is where --cut-dirs
wolffd@0 753 comes in handy; it makes Wget not "see" number
wolffd@0 754 remote directory components. Here are several exam-
wolffd@0 755 ples of how --cut-dirs option works.
wolffd@0 756
wolffd@0 757 No options -> ftp.xemacs.org/pub/xemacs/
wolffd@0 758 -nH -> pub/xemacs/
wolffd@0 759 -nH --cut-dirs=1 -> xemacs/
wolffd@0 760 -nH --cut-dirs=2 -> .
wolffd@0 761
wolffd@0 762 --cut-dirs=1 -> ftp.xemacs.org/xemacs/
wolffd@0 763 ...
wolffd@0 764
wolffd@0 765 If you just want to get rid of the directory struc-
wolffd@0 766 ture, this option is similar to a combination of -nd
wolffd@0 767 and -P. However, unlike -nd, --cut-dirs does not
wolffd@0 768 lose subdirectories---for instance, with -nH
wolffd@0 769 --cut-dirs=1, a beta/ subdirectory will be saved to
wolffd@0 770 xemacs/beta, as one would expect.
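
The table above corresponds to invocations such as (assuming the same
FTP site):

```shell
# Saved locally under xemacs/ rather than ftp.xemacs.org/pub/xemacs/.
wget -r -nH --cut-dirs=1 ftp://ftp.xemacs.org/pub/xemacs/
```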
wolffd@0 771
wolffd@0 772 -P prefix
wolffd@0 773 --directory-prefix=prefix
wolffd@0 774 Set directory prefix to prefix. The directory pre-
wolffd@0 775 fix is the directory where all other files and sub-
wolffd@0 776 directories will be saved to, i.e. the top of the
wolffd@0 777 retrieval tree. The default is . (the current
wolffd@0 778 directory).
wolffd@0 779
wolffd@0 780 HTTP Options
wolffd@0 781
wolffd@0 782
wolffd@0 783 -E
wolffd@0 784 --html-extension
wolffd@0 785 If a file of type application/xhtml+xml or text/html
wolffd@0 786 is downloaded and the URL does not end with the reg-
wolffd@0 787 exp \.[Hh][Tt][Mm][Ll]?, this option will cause the
wolffd@0 788 suffix .html to be appended to the local filename.
wolffd@0 789 This is useful, for instance, when you're mirroring
wolffd@0 790 a remote site that uses .asp pages, but you want the
wolffd@0 791 mirrored pages to be viewable on your stock Apache
wolffd@0 792 server. Another good use for this is when you're
wolffd@0 793 downloading CGI-generated materials. A URL like
wolffd@0 794 http://site.com/article.cgi?25 will be saved as
wolffd@0 795 article.cgi?25.html.
wolffd@0 796
wolffd@0 797 Note that filenames changed in this way will be re-
wolffd@0 798 downloaded every time you re-mirror a site, because
wolffd@0 799 Wget can't tell that the local X.html file corre-
wolffd@0 800 sponds to remote URL X (since it doesn't yet know
wolffd@0 801 that the URL produces output of type text/html or
wolffd@0 802 application/xhtml+xml). To prevent this re-download-
wolffd@0 803 ing, you must use -k and -K so that the original
wolffd@0 804 version of the file will be saved as X.orig.
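
Putting those pieces together, a mirror-friendly invocation might look
like this (URL reused from above):

```shell
# -E appends .html, -k converts links for local viewing,
# -K keeps the original file as X.orig so re-mirroring works.
wget -E -k -K "http://site.com/article.cgi?25"
```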
wolffd@0 805
wolffd@0 806 --http-user=user
wolffd@0 807 --http-password=password
wolffd@0 808 Specify the username user and password password on
wolffd@0 809 an HTTP server. According to the type of the chal-
wolffd@0 810 lenge, Wget will encode them using either the
wolffd@0 811 "basic" (insecure), the "digest", or the Windows
wolffd@0 812 "NTLM" authentication scheme.
wolffd@0 813
wolffd@0 814 Another way to specify username and password is in
wolffd@0 815 the URL itself. Either method reveals your password
wolffd@0 816 to anyone who bothers to run "ps". To prevent the
wolffd@0 817 passwords from being seen, store them in .wgetrc or
wolffd@0 818 .netrc, and make sure to protect those files from
wolffd@0 819 other users with "chmod". If the passwords are
wolffd@0 820 really important, do not leave them lying in those
wolffd@0 821 files either---edit the files and delete them after
wolffd@0 822 Wget has started the download.
wolffd@0 823
wolffd@0 824 --no-cache
wolffd@0 825 Disable server-side cache. In this case, Wget will
wolffd@0 826 send the remote server an appropriate directive
wolffd@0 827 (Pragma: no-cache) to get the file from the remote
wolffd@0 828 server, rather than returning the cached version.
wolffd@0 829 This is especially useful for retrieving and flush-
wolffd@0 830 ing out-of-date documents on proxy servers.
wolffd@0 831
wolffd@0 832 Caching is allowed by default.
wolffd@0 833
wolffd@0 834 --no-cookies
wolffd@0 835 Disable the use of cookies. Cookies are a mechanism
wolffd@0 836 for maintaining server-side state. The server sends
wolffd@0 837 the client a cookie using the "Set-Cookie" header,
wolffd@0 838 and the client responds with the same cookie upon
wolffd@0 839 further requests. Since cookies allow the server
wolffd@0 840 owners to keep track of visitors and for sites to
wolffd@0 841 exchange this information, some consider them a
wolffd@0 842 breach of privacy. The default is to use cookies;
wolffd@0 843 however, storing cookies is not on by default.
wolffd@0 844
wolffd@0 845 --load-cookies file
wolffd@0 846 Load cookies from file before the first HTTP
wolffd@0 847 retrieval. file is a textual file in the format
wolffd@0 848 originally used by Netscape's cookies.txt file.
wolffd@0 849
wolffd@0 850 You will typically use this option when mirroring
wolffd@0 851 sites that require that you be logged in to access
wolffd@0 852 some or all of their content. The login process
wolffd@0 853 typically works by the web server issuing an HTTP
wolffd@0 854 cookie upon receiving and verifying your creden-
wolffd@0 855 tials. The cookie is then resent by the browser
wolffd@0 856 when accessing that part of the site, and so proves
wolffd@0 857 your identity.
wolffd@0 858
wolffd@0 859 Mirroring such a site requires Wget to send the same
wolffd@0 860 cookies your browser sends when communicating with
wolffd@0 861 the site. This is achieved by --load-cookies---sim-
wolffd@0 862 ply point Wget to the location of the cookies.txt
wolffd@0 863 file, and it will send the same cookies your browser
wolffd@0 864 would send in the same situation. Different
wolffd@0 865 browsers keep textual cookie files in different
wolffd@0 866 locations:
wolffd@0 867
wolffd@0 868 Netscape 4.x.
wolffd@0 869 The cookies are in ~/.netscape/cookies.txt.
wolffd@0 870
wolffd@0 871 Mozilla and Netscape 6.x.
wolffd@0 872 Mozilla's cookie file is also named cookies.txt,
wolffd@0 873 located somewhere under ~/.mozilla, in the
wolffd@0 874 directory of your profile. The full path usu-
wolffd@0 875 ally ends up looking somewhat like
wolffd@0 876 ~/.mozilla/default/some-weird-string/cook-
wolffd@0 877 ies.txt.
wolffd@0 878
wolffd@0 879 Internet Explorer.
wolffd@0 880 You can produce a cookie file Wget can use by
wolffd@0 881 using the File menu, Import and Export, Export
wolffd@0 882 Cookies. This has been tested with Internet
wolffd@0 883 Explorer 5; it is not guaranteed to work with
wolffd@0 884 earlier versions.
wolffd@0 885
wolffd@0 886 Other browsers.
wolffd@0 887 If you are using a different browser to create
wolffd@0 888 your cookies, --load-cookies will only work if
wolffd@0 889 you can locate or produce a cookie file in the
wolffd@0 890 Netscape format that Wget expects.
wolffd@0 891
wolffd@0 892 If you cannot use --load-cookies, there might still
wolffd@0 893 be an alternative. If your browser supports a
wolffd@0 894 "cookie manager", you can use it to view the cookies
wolffd@0 895 used when accessing the site you're mirroring.
wolffd@0 896 Write down the name and value of the cookie, and
wolffd@0 897 manually instruct Wget to send those cookies,
wolffd@0 898 bypassing the "official" cookie support:
wolffd@0 899
wolffd@0 900 wget --no-cookies --header "Cookie: <name>=<value>"
wolffd@0 901
wolffd@0 902 --save-cookies file
wolffd@0 903 Save cookies to file before exiting. This will not
wolffd@0 904 save cookies that have expired or that have no
wolffd@0 905 expiry time (so-called "session cookies"), but also
wolffd@0 906 see --keep-session-cookies.
wolffd@0 907
wolffd@0 908 --keep-session-cookies
wolffd@0 909 When specified, causes --save-cookies to also save
wolffd@0 910 session cookies. Session cookies are normally not
wolffd@0 911 saved because they are meant to be kept in memory
wolffd@0 912 and forgotten when you exit the browser. Saving
wolffd@0 913 them is useful on sites that require you to log in
wolffd@0 914 or to visit the home page before you can access some
wolffd@0 915 pages. With this option, multiple Wget runs are
wolffd@0 916 considered a single browser session as far as the
wolffd@0 917 site is concerned.
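
A two-run session can be sketched as follows (the login URL and
parameters are hypothetical):

```shell
# Run 1: log in and save cookies, including session cookies.
wget --save-cookies cookies.txt --keep-session-cookies \
     "http://example.com/login?user=foo"

# Run 2: reuse the saved session as if the browser never closed.
wget --load-cookies cookies.txt http://example.com/members/page.html
```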
wolffd@0 918
wolffd@0 919 Since the cookie file format does not normally carry
wolffd@0 920 session cookies, Wget marks them with an expiry
wolffd@0 921 timestamp of 0. Wget's --load-cookies recognizes
wolffd@0 922 those as session cookies, but it might confuse other
wolffd@0 923 browsers. Also note that cookies so loaded will be
wolffd@0 924 treated as other session cookies, which means that
wolffd@0 925 if you want --save-cookies to preserve them again,
wolffd@0 926 you must use --keep-session-cookies again.
wolffd@0 927
wolffd@0 928 --ignore-length
wolffd@0 929 Unfortunately, some HTTP servers (CGI programs, to
wolffd@0 930 be more precise) send out bogus "Content-Length"
wolffd@0 931 headers, which makes Wget go wild, as it thinks not
wolffd@0 932 all the document was retrieved. You can spot this
wolffd@0 933 syndrome if Wget retries getting the same document
wolffd@0 934 again and again, each time claiming that the (other-
wolffd@0 935 wise normal) connection has closed on the very same
wolffd@0 936 byte.
wolffd@0 937
wolffd@0 938 With this option, Wget will ignore the "Con-
wolffd@0 939 tent-Length" header---as if it never existed.
wolffd@0 940
wolffd@0 941 --header=header-line
wolffd@0 942 Send header-line along with the rest of the headers
wolffd@0 943 in each HTTP request. The supplied header is sent
wolffd@0 944 as-is, which means it must contain name and value
wolffd@0 945 separated by colon, and must not contain newlines.
wolffd@0 946
wolffd@0 947 You may define more than one additional header by
wolffd@0 948 specifying --header more than once.
wolffd@0 949
wolffd@0 950 wget --header='Accept-Charset: iso-8859-2' \
wolffd@0 951 --header='Accept-Language: hr' \
wolffd@0 952 http://fly.srk.fer.hr/
wolffd@0 953
wolffd@0 954 Specification of an empty string as the header value
wolffd@0 955 will clear all previous user-defined headers.
wolffd@0 956
wolffd@0 957 As of Wget 1.10, this option can be used to override
wolffd@0 958 headers otherwise generated automatically. This
wolffd@0 959 example instructs Wget to connect to localhost, but
wolffd@0 960 to specify foo.bar in the "Host" header:
wolffd@0 961
wolffd@0 962 wget --header="Host: foo.bar" http://localhost/
wolffd@0 963
wolffd@0 964 In versions of Wget prior to 1.10 such use of
wolffd@0 965 --header caused sending of duplicate headers.
wolffd@0 966
wolffd@0 967 --max-redirect=number
wolffd@0 968 Specifies the maximum number of redirections to fol-
wolffd@0 969 low for a resource. The default is 20, which is
wolffd@0 970 usually far more than necessary. However, on those
wolffd@0 971 occasions where you want to allow more (or fewer),
wolffd@0 972 this is the option to use.
wolffd@0 973
wolffd@0 974 --proxy-user=user
wolffd@0 975 --proxy-password=password
wolffd@0 976 Specify the username user and password password for
wolffd@0 977 authentication on a proxy server. Wget will encode
wolffd@0 978 them using the "basic" authentication scheme.
wolffd@0 979
wolffd@0 980 Security considerations similar to those with
wolffd@0 981 --http-password pertain here as well.
wolffd@0 982
wolffd@0 983 --referer=url
wolffd@0 984 Include `Referer: url' header in HTTP request. Use-
wolffd@0 985 ful for retrieving documents with server-side pro-
wolffd@0 986 cessing that assume they are always being retrieved
wolffd@0 987 by interactive web browsers and only come out prop-
wolffd@0 988 erly when Referer is set to one of the pages that
wolffd@0 989 point to them.
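
For instance (both URLs are placeholders):

```shell
# Pretend the request was made by following a link from gallery.html.
wget --referer="http://example.com/gallery.html" http://example.com/images/photo.jpg
```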
wolffd@0 990
wolffd@0 991 --save-headers
wolffd@0 992 Save the headers sent by the HTTP server to the
wolffd@0 993 file, preceding the actual contents, with an empty
wolffd@0 994 line as the separator.
wolffd@0 995
wolffd@0 996 -U agent-string
wolffd@0 997 --user-agent=agent-string
wolffd@0 998 Identify as agent-string to the HTTP server.
wolffd@0 999
wolffd@0 1000 The HTTP protocol allows the clients to identify
wolffd@0 1001 themselves using a "User-Agent" header field. This
wolffd@0 1002 enables distinguishing the WWW software, usually for
wolffd@0 1003 statistical purposes or for tracing of protocol vio-
wolffd@0 1004 lations. Wget normally identifies as Wget/version,
wolffd@0 1005 version being the current version number of Wget.
wolffd@0 1006
wolffd@0 1007 However, some sites have been known to impose the
wolffd@0 1008 policy of tailoring the output according to the
wolffd@0 1009 "User-Agent"-supplied information. While this is
wolffd@0 1010 not such a bad idea in theory, it has been abused by
wolffd@0 1011 servers denying information to clients other than
wolffd@0 1012 (historically) Netscape or, more frequently,
wolffd@0 1013 Microsoft Internet Explorer. This option allows you
wolffd@0 1014 to change the "User-Agent" line issued by Wget. Use
wolffd@0 1015 of this option is discouraged, unless you really
wolffd@0 1016 know what you are doing.
wolffd@0 1017
wolffd@0 1018 Specifying an empty user agent with --user-agent=""
wolffd@0 1019 instructs Wget not to send the "User-Agent" header
wolffd@0 1020 in HTTP requests.
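
For example (the agent string and host are placeholders):

```shell
# Present a browser-like identity to the server.
wget --user-agent="Mozilla/5.0 (compatible)" http://example.com/

# Send no User-Agent header at all.
wget --user-agent="" http://example.com/
```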
wolffd@0 1021
wolffd@0 1022 --post-data=string
wolffd@0 1023 --post-file=file
wolffd@0 1024 Use POST as the method for all HTTP requests and
wolffd@0 1025 send the specified data in the request body.
wolffd@0 1026 "--post-data" sends string as data, whereas
wolffd@0 1027 "--post-file" sends the contents of file. Other
wolffd@0 1028 than that, they work in exactly the same way.
wolffd@0 1029
wolffd@0 1030 Please be aware that Wget needs to know the size of
wolffd@0 1031 the POST data in advance. Therefore the argument to
wolffd@0 1032 "--post-file" must be a regular file; specifying a
wolffd@0 1033 FIFO or something like /dev/stdin won't work. It's
wolffd@0 1034 not quite clear how to work around this limitation
wolffd@0 1035 inherent in HTTP/1.0. Although HTTP/1.1 introduces
wolffd@0 1036 chunked transfer that doesn't require knowing the
wolffd@0 1037 request length in advance, a client can't use chun-
wolffd@0 1038 ked unless it knows it's talking to an HTTP/1.1
wolffd@0 1039 server. And it can't know that until it receives a
wolffd@0 1040 response, which in turn requires the request to have
wolffd@0 1041 been completed -- a chicken-and-egg problem.
wolffd@0 1042
wolffd@0 1043 Note: if Wget is redirected after the POST request
wolffd@0 1044 is completed, it will not send the POST data to the
wolffd@0 1045 redirected URL. This is because URLs that process
wolffd@0 1046 POST often respond with a redirection to a regular
wolffd@0 1047 page, which does not desire or accept POST. It is
wolffd@0 1048 not completely clear that this behavior is optimal;
wolffd@0 1049 if it doesn't work out, it might be changed in the
wolffd@0 1050 future.
wolffd@0 1051
wolffd@0 1052 This example shows how to log to a server using POST
wolffd@0 1053 and then proceed to download the desired pages, pre-
wolffd@0 1054 sumably only accessible to authorized users:
wolffd@0 1055
wolffd@0 1056 # Log in to the server. This can be done only once.
wolffd@0 1057 wget --save-cookies cookies.txt \
wolffd@0 1058 --post-data 'user=foo&password=bar' \
wolffd@0 1059 http://server.com/auth.php
wolffd@0 1060
wolffd@0 1061 # Now grab the page or pages we care about.
wolffd@0 1062 wget --load-cookies cookies.txt \
wolffd@0 1063 -p http://server.com/interesting/article.php
wolffd@0 1064
wolffd@0 1065 If the server is using session cookies to track user
wolffd@0 1066 authentication, the above will not work because
wolffd@0 1067 --save-cookies will not save them (and neither will
wolffd@0 1068 browsers) and the cookies.txt file will be empty.
wolffd@0 1069 In that case use --keep-session-cookies along with
wolffd@0 1070 --save-cookies to force saving of session cookies.
wolffd@0 1071
wolffd@0 1072 --content-disposition
wolffd@0 1073 If this is set to on, experimental (not fully-func-
wolffd@0 1074 tional) support for "Content-Disposition" headers is
wolffd@0 1075 enabled. This can currently result in extra round-
wolffd@0 1076 trips to the server for a "HEAD" request, and is
wolffd@0 1077 known to suffer from a few bugs, which is why it is
wolffd@0 1078 not currently enabled by default.
wolffd@0 1079
wolffd@0 1080 This option is useful for some file-downloading CGI
wolffd@0 1081 programs that use "Content-Disposition" headers to
wolffd@0 1082 describe what the name of a downloaded file should
wolffd@0 1083 be.
wolffd@0 1084
wolffd@0 1085 --auth-no-challenge
wolffd@0 1086 If this option is given, Wget will send Basic HTTP
wolffd@0 1087 authentication information (plaintext username and
wolffd@0 1088 password) for all requests, just like Wget 1.10.2
wolffd@0 1089 and prior did by default.
wolffd@0 1090
wolffd@0 1091 Use of this option is not recommended, and is
wolffd@0 1092 intended only to support a few obscure servers,
wolffd@0 1093 which never send HTTP authentication challenges, but
wolffd@0 1094 accept unsolicited auth info, say, in addition to
wolffd@0 1095 form-based authentication.
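
A sketch of its use against such a server (host and credentials are
hypothetical):

```shell
# Send Basic credentials preemptively, without waiting for a challenge.
wget --auth-no-challenge --http-user=foo --http-password=bar \
     http://legacy.example.com/protected/file
```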
wolffd@0 1096
wolffd@0 1097 HTTPS (SSL/TLS) Options
wolffd@0 1098
wolffd@0 1099 To support encrypted HTTP (HTTPS) downloads, Wget must
wolffd@0 1100 be compiled with an external SSL library, currently
wolffd@0 1101 OpenSSL. If Wget is compiled without SSL support, none
wolffd@0 1102 of these options are available.
wolffd@0 1103
wolffd@0 1104 --secure-protocol=protocol
wolffd@0 1105 Choose the secure protocol to be used. Legal values
wolffd@0 1106 are auto, SSLv2, SSLv3, and TLSv1. If auto is used,
wolffd@0 1107 the SSL library is given the liberty of choosing the
wolffd@0 1108 appropriate protocol automatically, which is
wolffd@0 1109 achieved by sending an SSLv2 greeting and announcing
wolffd@0 1110 support for SSLv3 and TLSv1. This is the default.
wolffd@0 1111
wolffd@0 1112 Specifying SSLv2, SSLv3, or TLSv1 forces the use of
wolffd@0 1113 the corresponding protocol. This is useful when
wolffd@0 1114 talking to old and buggy SSL server implementations
wolffd@0 1115 that make it hard for OpenSSL to choose the correct
wolffd@0 1116 protocol version. Fortunately, such servers are
wolffd@0 1117 quite rare.
wolffd@0 1118
wolffd@0 1119 --no-check-certificate
wolffd@0 1120 Don't check the server certificate against the
wolffd@0 1121 available certificate authorities. Also don't
wolffd@0 1122 require the URL host name to match the common name
wolffd@0 1123 presented by the certificate.
wolffd@0 1124
wolffd@0 1125 As of Wget 1.10, the default is to verify the
wolffd@0 1126 server's certificate against the recognized certifi-
wolffd@0 1127 cate authorities, breaking the SSL handshake and
wolffd@0 1128 aborting the download if the verification fails.
wolffd@0 1129 Although this provides more secure downloads, it
wolffd@0 1130 does break interoperability with some sites that
wolffd@0 1131 worked with previous Wget versions, particularly
wolffd@0 1132 those using self-signed, expired, or otherwise
wolffd@0 1133 invalid certificates. This option forces an "inse-
wolffd@0 1134 cure" mode of operation that turns the certificate
wolffd@0 1135 verification errors into warnings and allows you to
wolffd@0 1136 proceed.
wolffd@0 1137
wolffd@0 1138 If you encounter "certificate verification" errors
wolffd@0 1139 or ones saying that "common name doesn't match
wolffd@0 1140 requested host name", you can use this option to
wolffd@0 1141 bypass the verification and proceed with the down-
wolffd@0 1142 load. Only use this option if you are otherwise
wolffd@0 1143 convinced of the site's authenticity, or if you
wolffd@0 1144 really don't care about the validity of its certifi-
wolffd@0 1145 cate. It is almost always a bad idea not to check
wolffd@0 1146 the certificates when transmitting confidential or
wolffd@0 1147 important data.
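
A typical use against a test server with a self-signed certificate
might look like this (hypothetical host):

```shell
# Certificate errors become warnings; use only for hosts you trust.
wget --no-check-certificate https://self-signed.example.com/file.tar.gz
```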
wolffd@0 1148
wolffd@0 1149 --certificate=file
wolffd@0 1150 Use the client certificate stored in file. This is
wolffd@0 1151 needed for servers that are configured to require
wolffd@0 1152 certificates from the clients that connect to them.
wolffd@0 1153 Normally a certificate is not required and this
wolffd@0 1154 switch is optional.
wolffd@0 1155
wolffd@0 1156 --certificate-type=type
wolffd@0 1157 Specify the type of the client certificate. Legal
wolffd@0 1158 values are PEM (assumed by default) and DER, also
wolffd@0 1159 known as ASN1.
wolffd@0 1160
wolffd@0 1161 --private-key=file
wolffd@0 1162 Read the private key from file. This allows you to
wolffd@0 1163 provide the private key in a file separate from the
wolffd@0 1164 certificate.
wolffd@0 1165
wolffd@0 1166 --private-key-type=type
wolffd@0 1167 Specify the type of the private key. Accepted val-
wolffd@0 1168 ues are PEM (the default) and DER.
wolffd@0 1169
wolffd@0 1170 --ca-certificate=file
wolffd@0 1171 Use file as the file with the bundle of certificate
wolffd@0 1172 authorities ("CA") to verify the peers. The cer-
wolffd@0 1173 tificates must be in PEM format.
wolffd@0 1174
wolffd@0 1175 Without this option Wget looks for CA certificates
wolffd@0 1176 at the system-specified locations, chosen at OpenSSL
wolffd@0 1177 installation time.
wolffd@0 1178
wolffd@0 1179 --ca-directory=directory
wolffd@0 1180 Specifies directory containing CA certificates in
wolffd@0 1181 PEM format. Each file contains one CA certificate,
wolffd@0 1182 and the file name is based on a hash value derived
wolffd@0 1183 from the certificate. This is achieved by process-
wolffd@0 1184 ing a certificate directory with the "c_rehash"
wolffd@0 1185 utility supplied with OpenSSL. Using --ca-directory
wolffd@0 1186 is more efficient than --ca-certificate when many
wolffd@0 1187 certificates are installed because it allows Wget to
wolffd@0 1188 fetch certificates on demand.
wolffd@0 1189
wolffd@0 1190 Without this option Wget looks for CA certificates
wolffd@0 1191 at the system-specified locations, chosen at OpenSSL
wolffd@0 1192 installation time.
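
The hashed layout can be prepared with OpenSSL's c_rehash utility and
then pointed at (paths are illustrative):

```shell
# Hash the CA directory once, then let Wget load certificates on demand.
c_rehash /path/to/ca-certs
wget --ca-directory=/path/to/ca-certs https://example.com/
```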
wolffd@0 1193
wolffd@0 1194 --random-file=file
wolffd@0 1195 Use file as the source of random data for seeding
wolffd@0 1196 the pseudo-random number generator on systems with-
wolffd@0 1197 out /dev/random.
wolffd@0 1198
wolffd@0 1199 On such systems the SSL library needs an external
wolffd@0 1200 source of randomness to initialize. Randomness may
wolffd@0 1201 be provided by EGD (see --egd-file below) or read
wolffd@0 1202 from an external source specified by the user. If
wolffd@0 1203 this option is not specified, Wget looks for random
wolffd@0 1204 data in $RANDFILE or, if that is unset, in
wolffd@0 1205 $HOME/.rnd. If none of those are available, it is
wolffd@0 1206 likely that SSL encryption will not be usable.
wolffd@0 1207
wolffd@0 1208 If you're getting the "Could not seed OpenSSL PRNG;
wolffd@0 1209 disabling SSL." error, you should provide random
wolffd@0 1210 data using some of the methods described above.
wolffd@0 1211
wolffd@0 1212 --egd-file=file
wolffd@0 1213 Use file as the EGD socket. EGD stands for Entropy
wolffd@0 1214 Gathering Daemon, a user-space program that collects
wolffd@0 1215 data from various unpredictable system sources and
wolffd@0 1216 makes it available to other programs that might need
wolffd@0 1217 it. Encryption software, such as the SSL library,
wolffd@0 1218 needs sources of non-repeating randomness to seed
wolffd@0 1219 the random number generator used to produce crypto-
wolffd@0 1220 graphically strong keys.
wolffd@0 1221
wolffd@0 1222 OpenSSL allows the user to specify his own source of
wolffd@0 1223 entropy using the "RAND_FILE" environment variable.
wolffd@0 1224 If this variable is unset, or if the specified file
wolffd@0 1225 does not produce enough randomness, OpenSSL will
wolffd@0 1226 read random data from the EGD socket specified using
wolffd@0 1227 this option.
wolffd@0 1228
wolffd@0 1229 If this option is not specified (and the equivalent
wolffd@0 1230 startup command is not used), EGD is never con-
wolffd@0 1231 tacted. EGD is not needed on modern Unix systems
wolffd@0 1232 that support /dev/random.
wolffd@0 1233
wolffd@0 1234 FTP Options
wolffd@0 1235
wolffd@0 1236
wolffd@0 1237 --ftp-user=user
wolffd@0 1238 --ftp-password=password
wolffd@0 1239 Specify the username user and password password on
wolffd@0 1240 an FTP server. Without this, or the corresponding
wolffd@0 1241 startup option, the password defaults to -wget@,
wolffd@0 1242 normally used for anonymous FTP.
wolffd@0 1243
wolffd@0 1244 Another way to specify username and password is in
wolffd@0 1245 the URL itself. Either method reveals your password
wolffd@0 1246 to anyone who bothers to run "ps". To prevent the
wolffd@0 1247 passwords from being seen, store them in .wgetrc or
wolffd@0 1248 .netrc, and make sure to protect those files from
wolffd@0 1249 other users with "chmod". If the passwords are
wolffd@0 1250 really important, do not leave them lying in those
wolffd@0 1251 files either---edit the files and delete them after
wolffd@0 1252 Wget has started the download.
wolffd@0 1253
wolffd@0 1254 --no-remove-listing
wolffd@0 1255 Don't remove the temporary .listing files generated
wolffd@0 1256 by FTP retrievals. Normally, these files contain
wolffd@0 1257 the raw directory listings received from FTP
wolffd@0 1258 servers. Not removing them can be useful for debug-
wolffd@0 1259 ging purposes, or when you want to be able to easily
wolffd@0 1260 check on the contents of remote server directories
wolffd@0 1261 (e.g. to verify that a mirror you're running is com-
wolffd@0 1262 plete).
wolffd@0 1263
wolffd@0 1264 Note that even though Wget writes to a known file-
wolffd@0 1265 name for this file, this is not a security hole in
wolffd@0 1266 the scenario of a user making .listing a symbolic
wolffd@0 1267 link to /etc/passwd or something and asking "root"
wolffd@0 1268 to run Wget in his or her directory. Depending on
wolffd@0 1269 the options used, either Wget will refuse to write
wolffd@0 1270 to .listing, making the globbing/recur-
wolffd@0 1271 sion/time-stamping operation fail, or the symbolic
wolffd@0 1272 link will be deleted and replaced with the actual
wolffd@0 1273 .listing file, or the listing will be written to a
wolffd@0 1274 .listing.number file.
wolffd@0 1275
wolffd@0 1276 Even though this situation isn't a problem,
wolffd@0 1277 "root" should never run Wget in a non-trusted user's
wolffd@0 1278 directory. A user could do something as simple as
wolffd@0 1279 linking index.html to /etc/passwd and asking "root"
wolffd@0 1280 to run Wget with -N or -r so the file will be over-
wolffd@0 1281 written.
wolffd@0 1282
wolffd@0 1283 --no-glob
wolffd@0 1284 Turn off FTP globbing. Globbing refers to the use
wolffd@0 1285 of shell-like special characters (wildcards), like
wolffd@0 1286 *, ?, [ and ] to retrieve more than one file from
wolffd@0 1287 the same directory at once, like:
wolffd@0 1288
wolffd@0 1289 wget ftp://gnjilux.srk.fer.hr/*.msg
wolffd@0 1290
wolffd@0 1291 By default, globbing will be turned on if the URL
wolffd@0 1292 contains a globbing character. This option may be
wolffd@0 1293 used to turn globbing on or off permanently.
wolffd@0 1294
wolffd@0 1295 You may have to quote the URL to protect it from
wolffd@0 1296 being expanded by your shell. Globbing makes Wget
wolffd@0 1297 look for a directory listing, which is system-spe-
wolffd@0 1298 cific. This is why it currently works only with
wolffd@0 1299 Unix FTP servers (and the ones emulating Unix "ls"
wolffd@0 1300 output).
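
The quoting caveat above matters in practice; for the example URL:

```shell
# Quote the URL so the shell does not expand the * itself;
# Wget then performs the globbing against the FTP listing.
wget "ftp://gnjilux.srk.fer.hr/*.msg"
```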
wolffd@0 1301
       --no-passive-ftp
              Disable the use of the passive FTP transfer mode.
              Passive FTP mandates that the client connect to the
              server to establish the data connection rather than
              the other way around.

              If the machine is connected to the Internet directly,
              both passive and active FTP should work equally well.
              Behind most firewall and NAT configurations passive
              FTP has a better chance of working.  However, in some
              rare firewall configurations, active FTP actually
              works when passive FTP doesn't.  If you suspect this
              to be the case, use this option, or set
              "passive_ftp=off" in your init file.

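              To make that choice permanent, a minimal init-file
              fragment (assuming the usual ~/.wgetrc location)
              might look like:

```
# ~/.wgetrc -- prefer active FTP when passive mode is blocked
passive_ftp = off
```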
       --retr-symlinks
              Usually, when retrieving FTP directories recursively
              and a symbolic link is encountered, the linked-to
              file is not downloaded.  Instead, a matching symbolic
              link is created on the local filesystem.  The
              pointed-to file will not be downloaded unless this
              recursive retrieval would have encountered it
              separately and downloaded it anyway.

              When --retr-symlinks is specified, however, symbolic
              links are traversed and the pointed-to files are
              retrieved.  At this time, this option does not cause
              Wget to traverse symlinks to directories and recurse
              through them, but in the future it should be enhanced
              to do this.

              Note that when retrieving a file (not a directory)
              because it was specified on the command-line, rather
              than because it was recursed to, this option has no
              effect.  Symbolic links are always traversed in this
              case.

       --no-http-keep-alive
              Turn off the "keep-alive" feature for HTTP downloads.
              Normally, Wget asks the server to keep the connection
              open so that, when you download more than one
              document from the same server, they get transferred
              over the same TCP connection.  This saves time and at
              the same time reduces the load on the server.

              This option is useful when, for some reason,
              persistent (keep-alive) connections don't work for
              you, for example due to a server bug or due to the
              inability of server-side scripts to cope with the
              connections.

   Recursive Retrieval Options

       -r
       --recursive
              Turn on recursive retrieving.

       -l depth
       --level=depth
              Specify recursion maximum depth level depth.  The
              default maximum depth is 5.

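              For instance, to recurse at most two links away from
              the start page (the URL is a placeholder):

```shell
# Follow links up to two hops from the starting page.
wget -r -l 2 http://www.example.com/
```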
       --delete-after
              This option tells Wget to delete every single file it
              downloads, after having done so.  It is useful for
              pre-fetching popular pages through a proxy, e.g.:

                      wget -r -nd --delete-after http://whatever.com/~popular/page/

              The -r option is to retrieve recursively, and -nd not
              to create directories.

              Note that --delete-after deletes files on the local
              machine.  It does not issue the DELE command to
              remote FTP sites, for instance.  Also note that when
              --delete-after is specified, --convert-links is
              ignored, so .orig files are simply not created in the
              first place.

       -k
       --convert-links
              After the download is complete, convert the links in
              the document to make them suitable for local viewing.
              This affects not only the visible hyperlinks, but any
              part of the document that links to external content,
              such as embedded images, links to style sheets,
              hyperlinks to non-HTML content, etc.

              Each link will be changed in one of two ways:

              *   The links to files that have been downloaded by
                  Wget will be changed to refer to the file they
                  point to as a relative link.

                  Example: if the downloaded file /foo/doc.html
                  links to /bar/img.gif, also downloaded, then the
                  link in doc.html will be modified to point to
                  ../bar/img.gif.  This kind of transformation
                  works reliably for arbitrary combinations of
                  directories.

              *   The links to files that have not been downloaded
                  by Wget will be changed to include host name and
                  absolute path of the location they point to.

                  Example: if the downloaded file /foo/doc.html
                  links to /bar/img.gif (or to ../bar/img.gif),
                  then the link in doc.html will be modified to
                  point to http://hostname/bar/img.gif.

              Because of this, local browsing works reliably: if a
              linked file was downloaded, the link will refer to
              its local name; if it was not downloaded, the link
              will refer to its full Internet address rather than
              presenting a broken link.  The fact that the former
              links are converted to relative links ensures that
              you can move the downloaded hierarchy to another
              directory.

              Note that only at the end of the download can Wget
              know which links have been downloaded.  Because of
              that, the work done by -k will be performed at the
              end of all the downloads.

       -K
       --backup-converted
              When converting a file, back up the original version
              with a .orig suffix.  Affects the behavior of -N.

       -m
       --mirror
              Turn on options suitable for mirroring.  This option
              turns on recursion and time-stamping, sets infinite
              recursion depth and keeps FTP directory listings.  It
              is currently equivalent to -r -N -l inf
              --no-remove-listing.

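              The two invocations below are therefore
              interchangeable (the URL is a placeholder):

```shell
# These two commands are equivalent:
wget -m http://www.example.com/
wget -r -N -l inf --no-remove-listing http://www.example.com/
```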
       -p
       --page-requisites
              This option causes Wget to download all the files
              that are necessary to properly display a given HTML
              page.  This includes such things as inlined images,
              sounds, and referenced stylesheets.

              Ordinarily, when downloading a single HTML page, any
              requisite documents that may be needed to display it
              properly are not downloaded.  Using -r together with
              -l can help, but since Wget does not ordinarily
              distinguish between external and inlined documents,
              one is generally left with "leaf documents" that are
              missing their requisites.

              For instance, say document 1.html contains an "<IMG>"
              tag referencing 1.gif and an "<A>" tag pointing to
              external document 2.html.  Say that 2.html is similar
              but that its image is 2.gif and it links to 3.html.
              Say this continues up to some arbitrarily high
              number.

              If one executes the command:

                      wget -r -l 2 http://<site>/1.html

              then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be
              downloaded.  As you can see, 3.html is without its
              requisite 3.gif because Wget is simply counting the
              number of hops (up to 2) away from 1.html in order to
              determine where to stop the recursion.  However, with
              this command:

                      wget -r -l 2 -p http://<site>/1.html

              all the above files and 3.html's requisite 3.gif will
              be downloaded.  Similarly,

                      wget -r -l 1 -p http://<site>/1.html

              will cause 1.html, 1.gif, 2.html, and 2.gif to be
              downloaded.  One might think that:

                      wget -r -l 0 -p http://<site>/1.html

              would download just 1.html and 1.gif, but
              unfortunately this is not the case, because -l 0 is
              equivalent to -l inf, that is, infinite recursion.
              To download a single HTML page (or a handful of them,
              all specified on the command-line or in a -i URL
              input file) and its (or their) requisites, simply
              leave off -r and -l:

                      wget -p http://<site>/1.html

              Note that Wget will behave as if -r had been
              specified, but only that single page and its
              requisites will be downloaded.  Links from that page
              to external documents will not be followed.
              Actually, to download a single page and all its
              requisites (even if they exist on separate websites),
              and make sure the lot displays properly locally, this
              author likes to use a few options in addition to -p:

                      wget -E -H -k -K -p http://<site>/<document>

              To finish off this topic, it's worth knowing that
              Wget's idea of an external document link is any URL
              specified in an "<A>" tag, an "<AREA>" tag, or a
              "<LINK>" tag other than "<LINK REL="stylesheet">".

       --strict-comments
              Turn on strict parsing of HTML comments.  The default
              is to terminate comments at the first occurrence of
              -->.

              According to specifications, HTML comments are
              expressed as SGML declarations.  A declaration is
              special markup that begins with <! and ends with >,
              such as <!DOCTYPE ...>, that may contain comments
              between a pair of -- delimiters.  HTML comments are
              "empty declarations", SGML declarations without any
              non-comment text.  Therefore, <!--foo--> is a valid
              comment, and so is <!--one-- --two-->, but
              <!--1--2--> is not.

              On the other hand, most HTML writers don't perceive
              comments as anything other than text delimited with
              <!-- and -->, which is not quite the same.  For
              example, something like <!------------> works as a
              valid comment as long as the number of dashes is a
              multiple of four (!).  If not, the comment
              technically lasts until the next --, which may be at
              the other end of the document.  Because of this, many
              popular browsers completely ignore the specification
              and implement what users have come to expect:
              comments delimited with <!-- and -->.

              Until version 1.9, Wget interpreted comments
              strictly, which resulted in missing links in many web
              pages that displayed fine in browsers, but had the
              misfortune of containing non-compliant comments.
              Beginning with version 1.9, Wget has joined the ranks
              of clients that implement "naive" comments,
              terminating each comment at the first occurrence of
              -->.

              If, for whatever reason, you want strict comment
              parsing, use this option to turn it on.

   Recursive Accept/Reject Options

       -A acclist --accept acclist
       -R rejlist --reject rejlist
              Specify comma-separated lists of file name suffixes
              or patterns to accept or reject.  Note that if any of
              the wildcard characters, *, ?, [ or ], appear in an
              element of acclist or rejlist, it will be treated as
              a pattern, rather than a suffix.

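              For example, to mirror only the images from a
              hypothetical site, quoting the patterns so the shell
              does not expand them:

```shell
# Accept only GIF and JPEG files during the recursive crawl.
wget -r -A '*.gif,*.jpg,*.jpeg' http://www.example.com/gallery/
```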
       -D domain-list
       --domains=domain-list
              Set domains to be followed.  domain-list is a comma-
              separated list of domains.  Note that it does not
              turn on -H.

       --exclude-domains domain-list
              Specify the domains that are not to be followed.

       --follow-ftp
              Follow FTP links from HTML documents.  Without this
              option, Wget will ignore all the FTP links.

       --follow-tags=list
              Wget has an internal table of HTML tag / attribute
              pairs that it considers when looking for linked
              documents during a recursive retrieval.  If a user
              wants only a subset of those tags to be considered,
              however, he or she should specify such tags in a
              comma-separated list with this option.

       --ignore-tags=list
              This is the opposite of the --follow-tags option.  To
              skip certain HTML tags when recursively looking for
              documents to download, specify them in a comma-
              separated list.

              In the past, this option was the best bet for
              downloading a single page and its requisites, using a
              command-line like:

                      wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>

              However, the author of this option came across a page
              with tags like "<LINK REL="home" HREF="/">" and came
              to the realization that specifying tags to ignore was
              not enough.  One can't just tell Wget to ignore
              "<LINK>", because then stylesheets will not be
              downloaded.  Now the best bet for downloading a
              single page and its requisites is the dedicated
              --page-requisites option.

       --ignore-case
              Ignore case when matching files and directories.
              This influences the behavior of the -R, -A, -I, and
              -X options, as well as globbing implemented when
              downloading from FTP sites.  For example, with this
              option, -A *.txt will match file1.txt, but also
              file2.TXT, file3.TxT, and so on.

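              -D is typically combined with -H so that the crawl
              may span hosts yet stay within related domains.  A
              sketch with placeholder domains:

```shell
# Follow links across hosts, but only within these two domains.
wget -r -H -D example.com,example.org http://www.example.com/
```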
       -H
       --span-hosts
              Enable spanning across hosts when doing recursive
              retrieving.

       -L
       --relative
              Follow relative links only.  Useful for retrieving a
              specific home page without any distractions, not even
              those from the same hosts.

       -I list
       --include-directories=list
              Specify a comma-separated list of directories you
              wish to follow when downloading.  Elements of list
              may contain wildcards.

       -X list
       --exclude-directories=list
              Specify a comma-separated list of directories you
              wish to exclude from download.  Elements of list may
              contain wildcards.

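              The two options can be combined; a sketch with
              placeholder paths:

```shell
# Recurse only under /pub, but skip the /pub/tmp subtree.
wget -r -I /pub -X /pub/tmp ftp://ftp.example.com/
```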
       -np
       --no-parent
              Do not ever ascend to the parent directory when
              retrieving recursively.  This is a useful option,
              since it guarantees that only the files below a
              certain hierarchy will be downloaded.

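              For example, to fetch a single directory subtree
              (hypothetical URL):

```shell
# Everything under /docs/manual/ may be fetched;
# /docs/ and the rest of the site will not be.
wget -r -np http://www.example.com/docs/manual/
```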
FILES
       /usr/local/etc/wgetrc
              Default location of the global startup file.

       .wgetrc
              User startup file.

BUGS
       You are welcome to submit bug reports via the GNU Wget bug
       tracker (see <http://wget.addictivecode.org/BugTracker>).

       Before actually submitting a bug report, please try to
       follow a few simple guidelines.

       1.  Please try to ascertain that the behavior you see really
           is a bug.  If Wget crashes, it's a bug.  If Wget does
           not behave as documented, it's a bug.  If things work
           strangely, but you are not sure about the way they are
           supposed to work, it might well be a bug, but you might
           want to double-check the documentation and the mailing
           lists.

       2.  Try to repeat the bug in as simple circumstances as
           possible.  E.g. if Wget crashes while downloading wget
           -rl0 -kKE -t5 --no-proxy http://yoyodyne.com -o
           /tmp/log, you should try to see if the crash is
           repeatable, and if it will occur with a simpler set of
           options.  You might even try to start the download at
           the page where the crash occurred to see if that page
           somehow triggered the crash.

           Also, while I will probably be interested to know the
           contents of your .wgetrc file, just dumping it into the
           debug message is probably a bad idea.  Instead, you
           should first try to see if the bug repeats with .wgetrc
           moved out of the way.  Only if it turns out that .wgetrc
           settings affect the bug, mail me the relevant parts of
           the file.

       3.  Please start Wget with the -d option and send us the
           resulting output (or relevant parts thereof).  If Wget
           was compiled without debug support, recompile it; it is
           much easier to trace bugs with debug support on.

           Note: please make sure to remove any potentially
           sensitive information from the debug log before sending
           it to the bug address.  The -d option won't go out of
           its way to collect sensitive information, but the log
           will contain a fairly complete transcript of Wget's
           communication with the server, which may include
           passwords and pieces of downloaded data.  Since the bug
           address is publicly archived, you may assume that all
           bug reports are visible to the public.

       4.  If Wget has crashed, try to run it in a debugger, e.g.
           "gdb `which wget` core" and type "where" to get the
           backtrace.  This may not work if the system
           administrator has disabled core files, but it is safe to
           try.

SEE ALSO
       This is not the complete manual for GNU Wget.  For more
       complete information, including more detailed explanations
       of some of the options, and a number of commands available
       for use with .wgetrc files and the -e option, see the GNU
       Info entry for wget.

AUTHOR
       Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
       Currently maintained by Micah Cowan <micah@cowan.name>.

COPYRIGHT
       Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002,
       2003, 2004, 2005, 2006, 2007, 2008 Free Software Foundation,
       Inc.

       Permission is granted to copy, distribute and/or modify this
       document under the terms of the GNU Free Documentation
       License, Version 1.2 or any later version published by the
       Free Software Foundation; with no Invariant Sections, no
       Front-Cover Texts, and no Back-Cover Texts.  A copy of the
       license is included in the section entitled "GNU Free
       Documentation License".



GNU Wget 1.11.4                2008-06-29                   WGET(1)