WGET(1)                          GNU Wget                          WGET(1)

NAME
    Wget - The non-interactive network downloader.

SYNOPSIS
    wget [option]... [URL]...

DESCRIPTION
    GNU Wget is a free utility for non-interactive download of files from
    the Web.  It supports HTTP, HTTPS, and FTP protocols, as well as
    retrieval through HTTP proxies.

    Wget is non-interactive, meaning that it can work in the background
    while the user is not logged on.  This allows you to start a retrieval
    and disconnect from the system, letting Wget finish the work.  By
    contrast, most Web browsers require the user's constant presence, which
    can be a great hindrance when transferring a lot of data.

    Wget can follow links in HTML and XHTML pages and create local versions
    of remote web sites, fully recreating the directory structure of the
    original site.  This is sometimes referred to as "recursive
    downloading."  While doing that, Wget respects the Robot Exclusion
    Standard (/robots.txt).  Wget can be instructed to convert the links in
    downloaded HTML files to the local files for offline viewing.

    Wget has been designed for robustness over slow or unstable network
    connections; if a download fails due to a network problem, it will keep
    retrying until the whole file has been retrieved.  If the server
    supports regetting, it will instruct the server to continue the
    download from where it left off.

OPTIONS
  Option Syntax
    Since Wget uses GNU getopt to process command-line arguments, every
    option has a long form along with the short one.  Long options are more
    convenient to remember, but take time to type.  You may freely mix
    different option styles, or specify options after the command-line
    arguments.  Thus you may write:

        wget -r --tries=10 http://fly.srk.fer.hr/ -o log

    The space between the option accepting an argument and the argument may
    be omitted.  Instead of -o log you can write -olog.

    You may put several options that do not require arguments together,
    like:

        wget -drc <URL>

    This is completely equivalent to:

        wget -d -r -c <URL>

    Since the options can be specified after the arguments, you may
    terminate them with --.  So the following will try to download URL -x,
    reporting failure to log:

        wget -o log -- -x

    The options that accept comma-separated lists all respect the
    convention that specifying an empty list clears its value.  This can be
    useful to clear the .wgetrc settings.  For instance, if your .wgetrc
    sets "exclude_directories" to /cgi-bin, the following example will
    first reset it, and then set it to exclude /~nobody and /~somebody.
    You can also clear the lists in .wgetrc.

        wget -X "" -X /~nobody,/~somebody

    Most options that do not accept arguments are boolean options, so named
    because their state can be captured with a yes-or-no ("boolean")
    variable.  For example, --follow-ftp tells Wget to follow FTP links
    from HTML files and, on the other hand, --no-glob tells it not to
    perform file globbing on FTP URLs.  A boolean option is either
    affirmative or negative (beginning with --no).  All such options share
    several properties.

    Unless stated otherwise, it is assumed that the default behavior is the
    opposite of what the option accomplishes.  For example, the documented
    existence of --follow-ftp assumes that the default is to not follow FTP
    links from HTML pages.

    Affirmative options can be negated by prepending --no- to the option
    name; negative options can be negated by omitting the --no- prefix.
    This might seem superfluous---if the default for an affirmative option
    is to not do something, then why provide a way to explicitly turn it
    off?  But the startup file may in fact change the default.  For
    instance, using "follow_ftp = on" in .wgetrc makes Wget follow FTP
    links by default, and using --no-follow-ftp is the only way to restore
    the factory default from the command line.

  Basic Startup Options
    -V
    --version
        Display the version of Wget.

    -h
    --help
        Print a help message describing all of Wget's command-line options.

    -b
    --background
        Go to background immediately after startup.  If no output file is
        specified via the -o option, output is redirected to wget-log.
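
        For example, the following sketch (reusing the URL from the Option
        Syntax examples; the log file name is illustrative) starts a
        retrieval in the background and logs all messages to download.log:

            wget -b -o download.log http://fly.srk.fer.hr/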

    -e command
    --execute command
        Execute command as if it were a part of .wgetrc.  A command thus
        invoked will be executed after the commands in .wgetrc, thus taking
        precedence over them.  If you need to specify more than one wgetrc
        command, use multiple instances of -e.
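
        As a minimal sketch, the following run behaves as if your .wgetrc
        contained "follow_ftp = on", without editing the file (the URL is
        reused from the earlier examples):

            wget -e follow_ftp=on -r http://fly.srk.fer.hr/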

  Logging and Input File Options
    -o logfile
    --output-file=logfile
        Log all messages to logfile.  The messages are normally reported to
        standard error.

    -a logfile
    --append-output=logfile
        Append to logfile.  This is the same as -o, only it appends to
        logfile instead of overwriting the old log file.  If logfile does
        not exist, a new file is created.

    -d
    --debug
        Turn on debug output, meaning various information important to the
        developers of Wget if it does not work properly.  Your system
        administrator may have chosen to compile Wget without debug
        support, in which case -d will not work.  Please note that
        compiling with debug support is always safe---Wget compiled with
        the debug support will not print any debug info unless requested
        with -d.

    -q
    --quiet
        Turn off Wget's output.

    -v
    --verbose
        Turn on verbose output, with all the available data.  The default
        output is verbose.

    -nv
    --no-verbose
        Turn off verbose without being completely quiet (use -q for that),
        which means that error messages and basic information still get
        printed.

    -i file
    --input-file=file
        Read URLs from file.  If - is specified as file, URLs are read from
        the standard input.  (Use ./- to read from a file literally named
        -.)

        If this function is used, no URLs need be present on the command
        line.  If there are URLs both on the command line and in an input
        file, those on the command line will be the first ones to be
        retrieved.  The file need not be an HTML document (but no harm if
        it is)---it is enough if the URLs are just listed sequentially.

        However, if you specify --force-html, the document will be regarded
        as html.  In that case you may have problems with relative links,
        which you can solve either by adding "<base href="url">" to the
        documents or by specifying --base=url on the command line.

    -F
    --force-html
        When input is read from a file, force it to be treated as an HTML
        file.  This enables you to retrieve relative links from existing
        HTML files on your local disk, by adding "<base href="url">" to
        HTML, or using the --base command-line option.

    -B URL
    --base=URL
        Prepends URL to relative links read from the file specified with
        the -i option.
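
        For example, a sketch that reads URLs from a local HTML file,
        treats it as HTML, and resolves its relative links against a base
        URL (bookmarks.html is reused from the --spider example below; the
        base URL is illustrative):

            wget -i bookmarks.html -F -B http://fly.srk.fer.hr/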

  Download Options
    --bind-address=ADDRESS
        When making client TCP/IP connections, bind to ADDRESS on the local
        machine.  ADDRESS may be specified as a hostname or IP address.
        This option can be useful if your machine is bound to multiple IPs.

    -t number
    --tries=number
        Set number of retries to number.  Specify 0 or inf for infinite
        retrying.  The default is to retry 20 times, with the exception of
        fatal errors like "connection refused" or "not found" (404), which
        are not retried.
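
        For example, a sketch that retries a flaky download up to 50 times
        (URL reused from the earlier examples):

            wget --tries=50 http://fly.srk.fer.hr/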

    -O file
    --output-document=file
        The documents will not be written to the appropriate files, but all
        will be concatenated together and written to file.  If - is used as
        file, documents will be printed to standard output, disabling link
        conversion.  (Use ./- to print to a file literally named -.)

        Use of -O is not intended to mean simply "use the name file instead
        of the one in the URL;" rather, it is analogous to shell
        redirection: wget -O file http://foo is intended to work like
        wget -O - http://foo > file; file will be truncated immediately,
        and all downloaded content will be written there.

        For this reason, -N (for timestamp-checking) is not supported in
        combination with -O: since file is always newly created, it will
        always have a very new timestamp.  A warning will be issued if this
        combination is used.

        Similarly, using -r or -p with -O may not work as you expect: Wget
        won't just download the first file to file and then download the
        rest to their normal names: all downloaded content will be placed
        in file.  This was disabled in version 1.11, but has been
        reinstated (with a warning) in 1.11.2, as there are some cases
        where this behavior can actually have some use.

        Note that a combination with -k is only permitted when downloading
        a single document, as in that case it will just convert all
        relative URIs to external ones; -k makes no sense for multiple URIs
        when they're all being downloaded to a single file.
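
        For instance, a sketch that fetches a page and writes it to
        standard output for further processing in a pipeline (URL reused
        from the earlier examples):

            wget -q -O - http://fly.srk.fer.hr/ | head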

    -nc
    --no-clobber
        If a file is downloaded more than once in the same directory,
        Wget's behavior depends on a few options, including -nc.  In
        certain cases, the local file will be clobbered, or overwritten,
        upon repeated download.  In other cases it will be preserved.

        When running Wget without -N, -nc, -r, or -p, downloading the same
        file in the same directory will result in the original copy of file
        being preserved and the second copy being named file.1.  If that
        file is downloaded yet again, the third copy will be named file.2,
        and so on.  When -nc is specified, this behavior is suppressed, and
        Wget will refuse to download newer copies of file.  Therefore,
        "no-clobber" is actually a misnomer in this mode---it's not
        clobbering that's prevented (as the numeric suffixes were already
        preventing clobbering), but rather the multiple version saving
        that's prevented.

        When running Wget with -r or -p, but without -N or -nc,
        re-downloading a file will result in the new copy simply
        overwriting the old.  Adding -nc will prevent this behavior,
        instead causing the original version to be preserved and any newer
        copies on the server to be ignored.

        When running Wget with -N, with or without -r or -p, the decision
        as to whether or not to download a newer copy of a file depends on
        the local and remote timestamp and size of the file.  -nc may not
        be specified at the same time as -N.

        Note that when -nc is specified, files with the suffixes .html or
        .htm will be loaded from the local disk and parsed as if they had
        been retrieved from the Web.

    -c
    --continue
        Continue getting a partially-downloaded file.  This is useful when
        you want to finish up a download started by a previous instance of
        Wget, or by another program.  For instance:

            wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

        If there is a file named ls-lR.Z in the current directory, Wget
        will assume that it is the first portion of the remote file, and
        will ask the server to continue the retrieval from an offset equal
        to the length of the local file.

        Note that you don't need to specify this option if you just want
        the current invocation of Wget to retry downloading a file should
        the connection be lost midway through.  This is the default
        behavior.  -c only affects resumption of downloads started prior to
        this invocation of Wget, and whose local files are still sitting
        around.

        Without -c, the previous example would just download the remote
        file to ls-lR.Z.1, leaving the truncated ls-lR.Z file alone.

        Beginning with Wget 1.7, if you use -c on a non-empty file, and it
        turns out that the server does not support continued downloading,
        Wget will refuse to start the download from scratch, which would
        effectively ruin existing contents.  If you really want the
        download to start from scratch, remove the file.

        Also beginning with Wget 1.7, if you use -c on a file which is of
        equal size as the one on the server, Wget will refuse to download
        the file and print an explanatory message.  The same happens when
        the file is smaller on the server than locally (presumably because
        it was changed on the server since your last download
        attempt)---because "continuing" is not meaningful, no download
        occurs.

        On the other side of the coin, while using -c, any file that's
        bigger on the server than locally will be considered an incomplete
        download and only "(length(remote) - length(local))" bytes will be
        downloaded and tacked onto the end of the local file.  This
        behavior can be desirable in certain cases---for instance, you can
        use wget -c to download just the new portion that's been appended
        to a data collection or log file.

        However, if the file is bigger on the server because it's been
        changed, as opposed to just appended to, you'll end up with a
        garbled file.  Wget has no way of verifying that the local file is
        really a valid prefix of the remote file.  You need to be
        especially careful of this when using -c in conjunction with -r,
        since every file will be considered as an "incomplete download"
        candidate.

        Another instance where you'll get a garbled file if you try to use
        -c is if you have a lame HTTP proxy that inserts a "transfer
        interrupted" string into the local file.  In the future a
        "rollback" option may be added to deal with this case.

        Note that -c only works with FTP servers and with HTTP servers that
        support the "Range" header.

    --progress=type
        Select the type of the progress indicator you wish to use.  Legal
        indicators are "dot" and "bar".

        The "bar" indicator is used by default.  It draws an ASCII progress
        bar graphic (a.k.a. "thermometer" display) indicating the status of
        retrieval.  If the output is not a TTY, the "dot" bar will be used
        by default.

        Use --progress=dot to switch to the "dot" display.  It traces the
        retrieval by printing dots on the screen, each dot representing a
        fixed amount of downloaded data.

        When using the dotted retrieval, you may also set the style by
        specifying the type as dot:style.  Different styles assign
        different meaning to one dot.  With the "default" style each dot
        represents 1K, there are ten dots in a cluster and 50 dots in a
        line.  The "binary" style has a more "computer"-like
        orientation---8K dots, 16-dots clusters and 48 dots per line (which
        makes for 384K per line).  The "mega" style is suitable for
        downloading very large files---each dot represents 64K retrieved,
        there are eight dots in a cluster, and 48 dots on each line (so
        each line contains 3M).

        Note that you can set the default style using the "progress"
        command in .wgetrc.  That setting may be overridden from the
        command line.  The exception is that, when the output is not a TTY,
        the "dot" progress will be favored over "bar".  To force the bar
        output, use --progress=bar:force.
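
        For example, a sketch that forces the bar display even though
        output goes to a log file rather than a TTY (URL reused from the -c
        example above):

            wget --progress=bar:force -o log ftp://sunsite.doc.ic.ac.uk/ls-lR.Z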

    -N
    --timestamping
        Turn on time-stamping.

    -S
    --server-response
        Print the headers sent by HTTP servers and responses sent by FTP
        servers.

    --spider
        When invoked with this option, Wget will behave as a Web spider,
        which means that it will not download the pages, just check that
        they are there.  For example, you can use Wget to check your
        bookmarks:

            wget --spider --force-html -i bookmarks.html

        This feature needs much more work for Wget to get close to the
        functionality of real web spiders.

    -T seconds
    --timeout=seconds
        Set the network timeout to seconds seconds.  This is equivalent to
        specifying --dns-timeout, --connect-timeout, and --read-timeout,
        all at the same time.

        When interacting with the network, Wget can check for timeout and
        abort the operation if it takes too long.  This prevents anomalies
        like hanging reads and infinite connects.  The only timeout enabled
        by default is a 900-second read timeout.  Setting a timeout to 0
        disables it altogether.  Unless you know what you are doing, it is
        best not to change the default timeout settings.

        All timeout-related options accept decimal values, as well as
        subsecond values.  For example, 0.1 seconds is a legal (though
        unwise) choice of timeout.  Subsecond timeouts are useful for
        checking server response times or for testing network latency.

    --dns-timeout=seconds
        Set the DNS lookup timeout to seconds seconds.  DNS lookups that
        don't complete within the specified time will fail.  By default,
        there is no timeout on DNS lookups, other than that implemented by
        system libraries.

    --connect-timeout=seconds
        Set the connect timeout to seconds seconds.  TCP connections that
        take longer to establish will be aborted.  By default, there is no
        connect timeout, other than that implemented by system libraries.

    --read-timeout=seconds
        Set the read (and write) timeout to seconds seconds.  The "time" of
        this timeout refers to idle time: if, at any point in the download,
        no data is received for more than the specified number of seconds,
        reading fails and the download is restarted.  This option does not
        directly affect the duration of the entire download.

        Of course, the remote server may choose to terminate the connection
        sooner than this option requires.  The default read timeout is 900
        seconds.
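
        As an illustrative sketch, the following tightens all three
        timeouts for a quick-failing retrieval (the values are arbitrary;
        the URL is reused from the earlier examples):

            wget --dns-timeout=5 --connect-timeout=10 --read-timeout=60 \
                http://fly.srk.fer.hr/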

    --limit-rate=amount
        Limit the download speed to amount bytes per second.  Amount may be
        expressed in bytes, kilobytes with the k suffix, or megabytes with
        the m suffix.  For example, --limit-rate=20k will limit the
        retrieval rate to 20KB/s.  This is useful when, for whatever
        reason, you don't want Wget to consume the entire available
        bandwidth.

        This option allows the use of decimal numbers, usually in
        conjunction with power suffixes; for example, --limit-rate=2.5k is
        a legal value.

        Note that Wget implements the limiting by sleeping the appropriate
        amount of time after a network read that took less time than
        specified by the rate.  Eventually this strategy causes the TCP
        transfer to slow down to approximately the specified rate.
        However, it may take some time for this balance to be achieved, so
        don't be surprised if limiting the rate doesn't work well with very
        small files.
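
        For instance, a sketch that throttles a recursive retrieval to
        roughly 100KB/s (URL reused from the earlier examples):

            wget --limit-rate=100k -r http://fly.srk.fer.hr/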

    -w seconds
    --wait=seconds
        Wait the specified number of seconds between the retrievals.  Use
        of this option is recommended, as it lightens the server load by
        making the requests less frequent.  Instead of in seconds, the time
        can be specified in minutes using the "m" suffix, in hours using
        the "h" suffix, or in days using the "d" suffix.

        Specifying a large value for this option is useful if the network
        or the destination host is down, so that Wget can wait long enough
        to reasonably expect the network error to be fixed before the
        retry.  The waiting interval specified by this function is
        influenced by "--random-wait", which see.

    --waitretry=seconds
        If you don't want Wget to wait between every retrieval, but only
        between retries of failed downloads, you can use this option.  Wget
        will use linear backoff, waiting 1 second after the first failure
        on a given file, then waiting 2 seconds after the second failure on
        that file, up to the maximum number of seconds you specify.
        Therefore, a value of 10 will actually make Wget wait up to (1 + 2
        + ... + 10) = 55 seconds per file.

        Note that this option is turned on by default in the global wgetrc
        file.

    --random-wait
        Some web sites may perform log analysis to identify retrieval
        programs such as Wget by looking for statistically significant
        similarities in the time between requests.  This option causes the
        time between requests to vary between 0.5 and 1.5 * wait seconds,
        where wait was specified using the --wait option, in order to mask
        Wget's presence from such analysis.

        A 2001 article in a publication devoted to development on a popular
        consumer platform provided code to perform this analysis on the
        fly.  Its author suggested blocking at the class C address level to
        ensure automated retrieval programs were blocked despite changing
        DHCP-supplied addresses.

        The --random-wait option was inspired by this ill-advised
        recommendation to block many unrelated users from a web site due to
        the actions of one.
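
        For example, a sketch of a polite recursive retrieval that waits
        about two seconds (randomized) between requests (URL reused from
        the earlier examples):

            wget -w 2 --random-wait -r http://fly.srk.fer.hr/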

    --no-proxy
        Don't use proxies, even if the appropriate *_proxy environment
        variable is defined.

    -Q quota
    --quota=quota
        Specify download quota for automatic retrievals.  The value can be
        specified in bytes (default), kilobytes (with k suffix), or
        megabytes (with m suffix).

        Note that quota will never affect downloading a single file.  So if
        you specify wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz, all of
        the ls-lR.gz will be downloaded.  The same goes even when several
        URLs are specified on the command-line.  However, quota is
        respected when retrieving either recursively, or from an input
        file.  Thus you may safely type wget -Q2m -i sites---download will
        be aborted when the quota is exceeded.

        Setting quota to 0 or to inf unlimits the download quota.

    --no-dns-cache
        Turn off caching of DNS lookups.  Normally, Wget remembers the IP
        addresses it looked up from DNS so it doesn't have to repeatedly
        contact the DNS server for the same (typically small) set of hosts
        it retrieves from.  This cache exists in memory only; a new Wget
        run will contact DNS again.

        However, it has been reported that in some situations it is not
        desirable to cache host names, even for the duration of a
        short-running application like Wget.  With this option Wget issues
        a new DNS lookup (more precisely, a new call to "gethostbyname" or
        "getaddrinfo") each time it makes a new connection.  Please note
        that this option will not affect caching that might be performed by
        the resolving library or by an external caching layer, such as
        NSCD.

        If you don't understand exactly what this option does, you probably
        won't need it.

    --restrict-file-names=mode
        Change which characters found in remote URLs may show up in local
        file names generated from those URLs.  Characters that are
        restricted by this option are escaped, i.e. replaced with %HH,
        where HH is the hexadecimal number that corresponds to the
        restricted character.

        By default, Wget escapes the characters that are not valid as part
        of file names on your operating system, as well as control
        characters that are typically unprintable.  This option is useful
        for changing these defaults, either because you are downloading to
        a non-native partition, or because you want to disable escaping of
        the control characters.

        When mode is set to "unix", Wget escapes the character / and the
        control characters in the ranges 0--31 and 128--159.  This is the
        default on Unix-like OSes.

        When mode is set to "windows", Wget escapes the characters \, |, /,
        :, ?, ", *, <, >, and the control characters in the ranges 0--31
        and 128--159.  In addition to this, Wget in Windows mode uses +
        instead of : to separate host and port in local file names, and
        uses @ instead of ? to separate the query portion of the file name
        from the rest.  Therefore, a URL that would be saved as
        www.xemacs.org:4300/search.pl?input=blah in Unix mode would be
        saved as www.xemacs.org+4300/search.pl@input=blah in Windows mode.
        This mode is the default on Windows.

        If you append ,nocontrol to the mode, as in unix,nocontrol,
        escaping of the control characters is also switched off.  You can
        use --restrict-file-names=nocontrol to turn off escaping of control
        characters without affecting the choice of the OS to use as file
        name restriction mode.
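
        For example, a sketch that retrieves a site recursively while
        keeping Windows-safe local file names, e.g. for copying onto a
        Windows partition later (URL reused from the earlier examples):

            wget --restrict-file-names=windows -r http://fly.srk.fer.hr/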

    -4
    --inet4-only
    -6
    --inet6-only
        Force connecting to IPv4 or IPv6 addresses.  With --inet4-only or
        -4, Wget will only connect to IPv4 hosts, ignoring AAAA records in
        DNS, and refusing to connect to IPv6 addresses specified in URLs.
        Conversely, with --inet6-only or -6, Wget will only connect to IPv6
        hosts and ignore A records and IPv4 addresses.

        Neither option should be needed normally.  By default, an
        IPv6-aware Wget will use the address family specified by the host's
        DNS record.  If the DNS responds with both IPv4 and IPv6 addresses,
        Wget will try them in sequence until it finds one it can connect
        to.  (Also see the "--prefer-family" option described below.)

        These options can be used to deliberately force the use of IPv4 or
        IPv6 address families on dual family systems, usually to aid
        debugging or to deal with broken network configuration.  Only one
        of --inet6-only and --inet4-only may be specified at the same time.
        Neither option is available in Wget compiled without IPv6 support.

    --prefer-family=IPv4/IPv6/none
        When given a choice of several addresses, connect to the addresses
        with the specified address family first.  IPv4 addresses are
        preferred by default.

        This avoids spurious errors and connect attempts when accessing
        hosts that resolve to both IPv6 and IPv4 addresses from IPv4
        networks.  For example, www.kame.net resolves to
        2001:200:0:8002:203:47ff:fea5:3085 and to 203.178.141.194.  When
        the preferred family is "IPv4", the IPv4 address is used first;
        when the preferred family is "IPv6", the IPv6 address is used
        first; if the specified value is "none", the address order returned
        by DNS is used without change.

        Unlike -4 and -6, this option doesn't inhibit access to any address
        family, it only changes the order in which the addresses are
        accessed.  Also note that the reordering performed by this option
        is stable---it doesn't affect the order of addresses of the same
        family.  That is, the relative order of all IPv4 addresses and of
        all IPv6 addresses remains intact in all cases.
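
        For example, a sketch that prefers the IPv6 address of a
        dual-stacked host (the host name is reused from the paragraph
        above):

            wget --prefer-family=IPv6 http://www.kame.net/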

    --retry-connrefused
        Consider "connection refused" a transient error and try again.
        Normally Wget gives up on a URL when it is unable to connect to the
        site because failure to connect is taken as a sign that the server
        is not running at all and that retries would not help.  This option
        is for mirroring unreliable sites whose servers tend to disappear
        for short periods of time.

    --user=user
    --password=password
        Specify the username user and password password for both FTP and
        HTTP file retrieval.  These parameters can be overridden using the
        --ftp-user and --ftp-password options for FTP connections and the
        --http-user and --http-password options for HTTP connections.
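
        For instance, a minimal sketch (the credentials and URL are
        placeholders reused from the --post-data example below):

            wget --user=foo --password=bar http://server.com/interesting/article.php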

  Directory Options
    -nd
    --no-directories
        Do not create a hierarchy of directories when retrieving
        recursively.  With this option turned on, all files will get saved
        to the current directory, without clobbering (if a name shows up
        more than once, the filenames will get extensions .n).

    -x
    --force-directories
        The opposite of -nd---create a hierarchy of directories, even if
        one would not have been created otherwise.  E.g.
        wget -x http://fly.srk.fer.hr/robots.txt will save the downloaded
        file to fly.srk.fer.hr/robots.txt.

    -nH
    --no-host-directories
        Disable generation of host-prefixed directories.  By default,
        invoking Wget with -r http://fly.srk.fer.hr/ will create a
        structure of directories beginning with fly.srk.fer.hr/.  This
        option disables such behavior.

    --protocol-directories
        Use the protocol name as a directory component of local file names.
        For example, with this option, wget -r http://host will save to
        http/host/... rather than just to host/....

    --cut-dirs=number
        Ignore number directory components.  This is useful for getting a
        fine-grained control over the directory where recursive retrieval
        will be saved.

        Take, for example, the directory at ftp://ftp.xemacs.org/pub/xemacs/.
        If you retrieve it with -r, it will be saved locally under
        ftp.xemacs.org/pub/xemacs/.  While the -nH option can remove the
        ftp.xemacs.org/ part, you are still stuck with pub/xemacs.  This is
        where --cut-dirs comes in handy; it makes Wget not "see" number
        remote directory components.  Here are several examples of how the
        --cut-dirs option works.

            No options        -> ftp.xemacs.org/pub/xemacs/
            -nH               -> pub/xemacs/
            -nH --cut-dirs=1  -> xemacs/
            -nH --cut-dirs=2  -> .

            --cut-dirs=1      -> ftp.xemacs.org/xemacs/
            ...

        If you just want to get rid of the directory structure, this option
        is similar to a combination of -nd and -P.  However, unlike -nd,
        --cut-dirs does not lose with subdirectories---for instance, with
        -nH --cut-dirs=1, a beta/ subdirectory will be placed to
        xemacs/beta, as one would expect.

    -P prefix
    --directory-prefix=prefix
        Set directory prefix to prefix.  The directory prefix is the
        directory where all other files and subdirectories will be saved
        to, i.e. the top of the retrieval tree.  The default is . (the
        current directory).
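
        For example, a sketch that places a recursive retrieval under an
        illustrative downloads/ directory instead of the current one:

            wget -P downloads -r http://fly.srk.fer.hr/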

  HTTP Options
    -E
    --html-extension
        If a file of type application/xhtml+xml or text/html is downloaded
        and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, this
        option will cause the suffix .html to be appended to the local
        filename.  This is useful, for instance, when you're mirroring a
        remote site that uses .asp pages, but you want the mirrored pages
        to be viewable on your stock Apache server.  Another good use for
        this is when you're downloading CGI-generated materials.  A URL
        like http://site.com/article.cgi?25 will be saved as
        article.cgi?25.html.

        Note that filenames changed in this way will be re-downloaded every
        time you re-mirror a site, because Wget can't tell that the local
        X.html file corresponds to remote URL X (since it doesn't yet know
        that the URL produces output of type text/html or
        application/xhtml+xml).  To prevent this re-downloading, you must
        use -k and -K so that the original version of the file will be
        saved as X.orig.
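
        For example, a sketch that saves a CGI-generated page with an
        .html suffix while keeping an X.orig copy to avoid needless
        re-downloads on later mirror runs (URL reused from the paragraph
        above):

            wget -E -k -K http://site.com/article.cgi?25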

    --http-user=user
    --http-password=password
        Specify the username user and password password on an HTTP server.
        According to the type of the challenge, Wget will encode them using
        either the "basic" (insecure), the "digest", or the Windows "NTLM"
        authentication scheme.

        Another way to specify username and password is in the URL itself.
        Either method reveals your password to anyone who bothers to run
        "ps".  To prevent the passwords from being seen, store them in
        .wgetrc or .netrc, and make sure to protect those files from other
        users with "chmod".  If the passwords are really important, do not
        leave them lying in those files either---edit the files and delete
        them after Wget has started the download.

    --no-cache
        Disable server-side cache.  In this case, Wget will send the remote
        server an appropriate directive (Pragma: no-cache) to get the file
        from the remote service, rather than returning the cached version.
        This is especially useful for retrieving and flushing out-of-date
        documents on proxy servers.

        Caching is allowed by default.

    --no-cookies
        Disable the use of cookies.  Cookies are a mechanism for
        maintaining server-side state.  The server sends the client a
        cookie using the "Set-Cookie" header, and the client responds with
        the same cookie upon further requests.  Since cookies allow the
        server owners to keep track of visitors and for sites to exchange
        this information, some consider them a breach of privacy.  The
        default is to use cookies; however, storing cookies is not on by
        default.

    --load-cookies file
        Load cookies from file before the first HTTP retrieval.  file is a
        textual file in the format originally used by Netscape's
        cookies.txt file.

        You will typically use this option when mirroring sites that
        require that you be logged in to access some or all of their
        content.  The login process typically works by the web server
        issuing an HTTP cookie upon receiving and verifying your
        credentials.  The cookie is then resent by the browser when
        accessing that part of the site, and so proves your identity.

        Mirroring such a site requires Wget to send the same cookies your
        browser sends when communicating with the site.  This is achieved
        by --load-cookies---simply point Wget to the location of the
        cookies.txt file, and it will send the same cookies your browser
        would send in the same situation.  Different browsers keep textual
        cookie files in different locations:

        Netscape 4.x.
            The cookies are in ~/.netscape/cookies.txt.

        Mozilla and Netscape 6.x.
            Mozilla's cookie file is also named cookies.txt, located
            somewhere under ~/.mozilla, in the directory of your profile.
            The full path usually ends up looking somewhat like
            ~/.mozilla/default/some-weird-string/cookies.txt.

        Internet Explorer.
            You can produce a cookie file Wget can use by using the File
            menu, Import and Export, Export Cookies.  This has been tested
            with Internet Explorer 5; it is not guaranteed to work with
            earlier versions.

        Other browsers.
            If you are using a different browser to create your cookies,
            --load-cookies will only work if you can locate or produce a
            cookie file in the Netscape format that Wget expects.

        If you cannot use --load-cookies, there might still be an
        alternative.  If your browser supports a "cookie manager", you can
        use it to view the cookies used when accessing the site you're
        mirroring.  Write down the name and value of the cookie, and
        manually instruct Wget to send those cookies, bypassing the
        "official" cookie support:

            wget --no-cookies --header "Cookie: <name>=<value>"

    --save-cookies file
        Save cookies to file before exiting.  This will not save cookies
        that have expired or that have no expiry time (so-called "session
        cookies"), but also see --keep-session-cookies.

    --keep-session-cookies
        When specified, causes --save-cookies to also save session cookies.
        Session cookies are normally not saved because they are meant to be
        kept in memory and forgotten when you exit the browser.  Saving
        them is useful on sites that require you to log in or to visit the
        home page before you can access some pages.  With this option,
        multiple Wget runs are considered a single browser session as far
        as the site is concerned.

        Since the cookie file format does not normally carry session
        cookies, Wget marks them with an expiry timestamp of 0.  Wget's
        --load-cookies recognizes those as session cookies, but it might
        confuse other browsers.  Also note that cookies so loaded will be
        treated as other session cookies, which means that if you want
        --save-cookies to preserve them again, you must use
        --keep-session-cookies again.

    --ignore-length
        Unfortunately, some HTTP servers (CGI programs, to be more precise)
        send out bogus "Content-Length" headers, which makes Wget go wild,
        as it thinks not all the document was retrieved.  You can spot this
        syndrome if Wget retries getting the same document again and again,
        each time claiming that the (otherwise normal) connection has
        closed on the very same byte.

        With this option, Wget will ignore the "Content-Length" header---as
        if it never existed.

    --header=header-line
        Send header-line along with the rest of the headers in each HTTP
        request.  The supplied header is sent as-is, which means it must
        contain name and value separated by colon, and must not contain
        newlines.

        You may define more than one additional header by specifying
        --header more than once.

            wget --header='Accept-Charset: iso-8859-2' \
                 --header='Accept-Language: hr' \
                 http://fly.srk.fer.hr/

        Specification of an empty string as the header value will clear all
        previous user-defined headers.

        As of Wget 1.10, this option can be used to override headers
        otherwise generated automatically.  This example instructs Wget to
        connect to localhost, but to specify foo.bar in the "Host" header:

            wget --header="Host: foo.bar" http://localhost/

        In versions of Wget prior to 1.10 such use of --header caused
        sending of duplicate headers.

    --max-redirect=number
        Specifies the maximum number of redirections to follow for a
        resource.  The default is 20, which is usually far more than
        necessary.  However, on those occasions where you want to allow
        more (or fewer), this is the option to use.

    --proxy-user=user
    --proxy-password=password
        Specify the username user and password password for authentication
        on a proxy server.  Wget will encode them using the "basic"
        authentication scheme.

        Security considerations similar to those with --http-password
        pertain here as well.

    --referer=url
        Include `Referer: url' header in HTTP request.  Useful for
        retrieving documents with server-side processing that assume they
        are always being retrieved by interactive web browsers and only
        come out properly when Referer is set to one of the pages that
        point to them.

    --save-headers
        Save the headers sent by the HTTP server to the file, preceding the
        actual contents, with an empty line as the separator.

    -U agent-string
    --user-agent=agent-string
        Identify as agent-string to the HTTP server.

        The HTTP protocol allows the clients to identify themselves using a
        "User-Agent" header field.  This enables distinguishing the WWW
        software, usually for statistical purposes or for tracing of
        protocol violations.  Wget normally identifies as Wget/version,
        version being the current version number of Wget.

        However, some sites have been known to impose the policy of
        tailoring the output according to the "User-Agent"-supplied
        information.  While this is not such a bad idea in theory, it has
        been abused by servers denying information to clients other than
        (historically) Netscape or, more frequently, Microsoft Internet
        Explorer.  This option allows you to change the "User-Agent" line
        issued by Wget.  Use of this option is discouraged, unless you
        really know what you are doing.

        Specifying an empty user agent with --user-agent="" instructs Wget
        not to send the "User-Agent" header in HTTP requests.
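
        For example, a sketch that suppresses the "User-Agent" header
        entirely (URL reused from the earlier examples):

            wget --user-agent="" http://fly.srk.fer.hr/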

    --post-data=string
    --post-file=file
        Use POST as the method for all HTTP requests and send the specified
        data in the request body.  "--post-data" sends string as data,
        whereas "--post-file" sends the contents of file.  Other than that,
        they work in exactly the same way.

        Please be aware that Wget needs to know the size of the POST data
        in advance.  Therefore the argument to "--post-file" must be a
        regular file; specifying a FIFO or something like /dev/stdin won't
        work.  It's not quite clear how to work around this limitation
        inherent in HTTP/1.0.  Although HTTP/1.1 introduces chunked
        transfer that doesn't require knowing the request length in
        advance, a client can't use chunked unless it knows it's talking to
        an HTTP/1.1 server.  And it can't know that until it receives a
        response, which in turn requires the request to have been
        completed -- a chicken-and-egg problem.

        Note: if Wget is redirected after the POST request is completed, it
        will not send the POST data to the redirected URL.  This is because
        URLs that process POST often respond with a redirection to a
        regular page, which does not desire or accept POST.  It is not
        completely clear that this behavior is optimal; if it doesn't work
        out, it might be changed in the future.

        This example shows how to log in to a server using POST and then
        proceed to download the desired pages, presumably only accessible
        to authorized users:

            # Log in to the server.  This can be done only once.
            wget --save-cookies cookies.txt \
                 --post-data 'user=foo&password=bar' \
                 http://server.com/auth.php

            # Now grab the page or pages we care about.
            wget --load-cookies cookies.txt \
                 -p http://server.com/interesting/article.php

        If the server is using session cookies to track user
        authentication, the above will not work because --save-cookies will
        not save them (and neither will browsers) and the cookies.txt file
        will be empty.  In that case use --keep-session-cookies along with
        --save-cookies to force saving of session cookies.
wolffd@0
|
1071
|
    --content-disposition
        If this is set to on, experimental (not fully-func-
        tional) support for "Content-Disposition" headers is
        enabled.  This can currently result in extra round-
        trips to the server for a "HEAD" request, and is
        known to suffer from a few bugs, which is why it is
        not currently enabled by default.

        This option is useful for some file-downloading CGI
        programs that use "Content-Disposition" headers to
        describe what the name of a downloaded file should
        be.

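        As an illustration, a CGI download script that names
        the file through a "Content-Disposition" header could
        be fetched like this (the URL is only a placeholder):

            wget --content-disposition \
                 "http://server.com/cgi-bin/fetch.cgi?id=123"

        Without the option, the local file name is derived
        from the query URL rather than from the name the
        server suggests.
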
    --auth-no-challenge
        If this option is given, Wget will send Basic HTTP
        authentication information (plaintext username and
        password) for all requests, just like Wget 1.10.2
        and prior did by default.

        Use of this option is not recommended, and is
        intended only to support a few obscure servers,
        which never send HTTP authentication challenges, but
        accept unsolicited auth info, say, in addition to
        form-based authentication.

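        A minimal sketch of such a request, assuming a
        hypothetical server.com that expects unsolicited
        Basic credentials, might be:

            wget --auth-no-challenge \
                 --http-user=foo --http-password=bar \
                 http://server.com/protected/report.html

        Remember that the credentials travel in plaintext,
        so this is only reasonable over trusted networks or
        HTTPS.
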
  HTTPS (SSL/TLS) Options

    To support encrypted HTTP (HTTPS) downloads, Wget must
    be compiled with an external SSL library, currently
    OpenSSL.  If Wget is compiled without SSL support, none
    of these options are available.

    --secure-protocol=protocol
        Choose the secure protocol to be used.  Legal values
        are auto, SSLv2, SSLv3, and TLSv1.  If auto is used,
        the SSL library is given the liberty of choosing the
        appropriate protocol automatically, which is
        achieved by sending an SSLv2 greeting and announcing
        support for SSLv3 and TLSv1.  This is the default.

        Specifying SSLv2, SSLv3, or TLSv1 forces the use of
        the corresponding protocol.  This is useful when
        talking to old and buggy SSL server implementations
        that make it hard for OpenSSL to choose the correct
        protocol version.  Fortunately, such servers are
        quite rare.

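        For example, to force TLSv1 against a server whose
        protocol negotiation is broken (the host name below
        is a placeholder):

            wget --secure-protocol=TLSv1 https://server.com/file.tar.gz
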
    --no-check-certificate
        Don't check the server certificate against the
        available certificate authorities.  Also don't
        require the URL host name to match the common name
        presented by the certificate.

        As of Wget 1.10, the default is to verify the
        server's certificate against the recognized certifi-
        cate authorities, breaking the SSL handshake and
        aborting the download if the verification fails.
        Although this provides more secure downloads, it
        does break interoperability with some sites that
        worked with previous Wget versions, particularly
        those using self-signed, expired, or otherwise
        invalid certificates.  This option forces an "inse-
        cure" mode of operation that turns the certificate
        verification errors into warnings and allows you to
        proceed.

        If you encounter "certificate verification" errors
        or ones saying that "common name doesn't match
        requested host name", you can use this option to
        bypass the verification and proceed with the down-
        load.  Only use this option if you are otherwise
        convinced of the site's authenticity, or if you
        really don't care about the validity of its certifi-
        cate.  It is almost always a bad idea not to check
        the certificates when transmitting confidential or
        important data.

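        For instance, to fetch a file from an internal host
        that uses a self-signed certificate (the host name
        is a placeholder), accepting the reduced security:

            wget --no-check-certificate https://intranet.example/report.pdf
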
    --certificate=file
        Use the client certificate stored in file.  This is
        needed for servers that are configured to require
        certificates from the clients that connect to them.
        Normally a certificate is not required and this
        switch is optional.

    --certificate-type=type
        Specify the type of the client certificate.  Legal
        values are PEM (assumed by default) and DER, also
        known as ASN1.

    --private-key=file
        Read the private key from file.  This allows you to
        provide the private key in a file separate from the
        certificate.

    --private-key-type=type
        Specify the type of the private key.  Accepted val-
        ues are PEM (the default) and DER.

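        A typical client-certificate setup, sketched with
        placeholder file names and host, combines these
        options:

            wget --certificate=client.pem \
                 --private-key=client.key \
                 https://server.com/private/data.tar.gz

        Both files are assumed to be in PEM format here, so
        the corresponding --*-type options can be omitted.
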
    --ca-certificate=file
        Use file as the file with the bundle of certificate
        authorities ("CA") to verify the peers.  The cer-
        tificates must be in PEM format.

        Without this option Wget looks for CA certificates
        at the system-specified locations, chosen at OpenSSL
        installation time.

    --ca-directory=directory
        Specifies directory containing CA certificates in
        PEM format.  Each file contains one CA certificate,
        and the file name is based on a hash value derived
        from the certificate.  This is achieved by process-
        ing a certificate directory with the "c_rehash"
        utility supplied with OpenSSL.  Using --ca-directory
        is more efficient than --ca-certificate when many
        certificates are installed because it allows Wget to
        fetch certificates on demand.

        Without this option Wget looks for CA certificates
        at the system-specified locations, chosen at OpenSSL
        installation time.

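        As a sketch, assuming your trusted CA certificates
        live in a hypothetical directory ~/ca-certs:

            # Create the hash symlinks OpenSSL expects.
            c_rehash ~/ca-certs

            # Verify the server against that directory only.
            wget --ca-directory=$HOME/ca-certs https://server.com/file.zip
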
    --random-file=file
        Use file as the source of random data for seeding
        the pseudo-random number generator on systems with-
        out /dev/random.

        On such systems the SSL library needs an external
        source of randomness to initialize.  Randomness may
        be provided by EGD (see --egd-file below) or read
        from an external source specified by the user.  If
        this option is not specified, Wget looks for random
        data in $RANDFILE or, if that is unset, in
        $HOME/.rnd.  If none of those are available, it is
        likely that SSL encryption will not be usable.

        If you're getting the "Could not seed OpenSSL PRNG;
        disabling SSL." error, you should provide random
        data using one of the methods described above.

    --egd-file=file
        Use file as the EGD socket.  EGD stands for Entropy
        Gathering Daemon, a user-space program that collects
        data from various unpredictable system sources and
        makes it available to other programs that might need
        it.  Encryption software, such as the SSL library,
        needs sources of non-repeating randomness to seed
        the random number generator used to produce crypto-
        graphically strong keys.

        OpenSSL allows the user to specify their own source
        of entropy using the "RAND_FILE" environment vari-
        able.  If this variable is unset, or if the speci-
        fied file does not produce enough randomness,
        OpenSSL will read random data from the EGD socket
        specified using this option.

        If this option is not specified (and the equivalent
        startup command is not used), EGD is never con-
        tacted.  EGD is not needed on modern Unix systems
        that support /dev/random.

  FTP Options

    --ftp-user=user
    --ftp-password=password
        Specify the username user and password password on
        an FTP server.  Without this, or the corresponding
        startup option, the password defaults to -wget@,
        normally used for anonymous FTP.

        Another way to specify username and password is in
        the URL itself.  Either method reveals your password
        to anyone who bothers to run "ps".  To prevent the
        passwords from being seen, store them in .wgetrc or
        .netrc, and make sure to protect those files from
        other users with "chmod".  If the passwords are
        really important, do not leave them lying in those
        files either---edit the files and delete them after
        Wget has started the download.

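        The two styles look like this (host, user, and
        password are placeholders):

            # Credentials given as options.
            wget --ftp-user=foo --ftp-password=bar \
                 ftp://server.com/pub/archive.tar.gz

            # Credentials embedded in the URL.
            wget ftp://foo:bar@server.com/pub/archive.tar.gz

        Both forms are visible to "ps"; prefer .wgetrc or
        .netrc for anything sensitive, as noted above.
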
    --no-remove-listing
        Don't remove the temporary .listing files generated
        by FTP retrievals.  Normally, these files contain
        the raw directory listings received from FTP
        servers.  Not removing them can be useful for debug-
        ging purposes, or when you want to be able to easily
        check on the contents of remote server directories
        (e.g. to verify that a mirror you're running is com-
        plete).

        Note that even though Wget writes to a known file-
        name for this file, this is not a security hole in
        the scenario of a user making .listing a symbolic
        link to /etc/passwd or something and asking "root"
        to run Wget in his or her directory.  Depending on
        the options used, either Wget will refuse to write
        to .listing, making the globbing/recur-
        sion/time-stamping operation fail, or the symbolic
        link will be deleted and replaced with the actual
        .listing file, or the listing will be written to a
        .listing.number file.

        Even though this situation isn't a problem, "root"
        should never run Wget in a non-trusted user's
        directory.  A user could do something as simple as
        linking index.html to /etc/passwd and asking "root"
        to run Wget with -N or -r so the file will be over-
        written.

    --no-glob
        Turn off FTP globbing.  Globbing refers to the use
        of shell-like special characters (wildcards), like
        *, ?, [ and ] to retrieve more than one file from
        the same directory at once, like:

            wget ftp://gnjilux.srk.fer.hr/*.msg

        By default, globbing will be turned on if the URL
        contains a globbing character.  This option may be
        used to turn globbing on or off permanently.

        You may have to quote the URL to protect it from
        being expanded by your shell.  Globbing makes Wget
        look for a directory listing, which is system-spe-
        cific.  This is why it currently works only with
        Unix FTP servers (and the ones emulating Unix "ls"
        output).

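        For example, quoting keeps the shell from expanding
        the wildcard so that Wget performs the globbing
        itself (this reuses the example host above):

            wget "ftp://gnjilux.srk.fer.hr/*.msg"
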
    --no-passive-ftp
        Disable the use of the passive FTP transfer mode.
        Passive FTP mandates that the client connect to the
        server to establish the data connection rather than
        the other way around.

        If the machine is connected to the Internet
        directly, both passive and active FTP should work
        equally well.  Behind most firewall and NAT configu-
        rations passive FTP has a better chance of working.
        However, in some rare firewall configurations,
        active FTP actually works when passive FTP doesn't.
        If you suspect this to be the case, use this option,
        or set "passive_ftp=off" in your init file.

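        For instance, if transfers stall behind an unusual
        firewall, active FTP can be tried like this (the
        host is a placeholder), or made permanent by putting
        "passive_ftp=off" in .wgetrc:

            wget --no-passive-ftp ftp://server.com/pub/file.iso
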
    --retr-symlinks
        Usually, when retrieving FTP directories recursively
        and a symbolic link is encountered, the linked-to
        file is not downloaded.  Instead, a matching sym-
        bolic link is created on the local filesystem.  The
        pointed-to file will not be downloaded unless this
        recursive retrieval would have encountered it sepa-
        rately and downloaded it anyway.

        When --retr-symlinks is specified, however, symbolic
        links are traversed and the pointed-to files are
        retrieved.  At this time, this option does not cause
        Wget to traverse symlinks to directories and recurse
        through them, but in the future it should be
        enhanced to do this.

        Note that when retrieving a file (not a directory)
        because it was specified on the command-line, rather
        than because it was recursed to, this option has no
        effect.  Symbolic links are always traversed in this
        case.

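        A recursive FTP retrieval that downloads the files
        behind symlinks instead of recreating the links
        might look like this (placeholder host):

            wget -r --retr-symlinks ftp://server.com/pub/
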
    --no-http-keep-alive
        Turn off the "keep-alive" feature for HTTP down-
        loads.  Normally, Wget asks the server to keep the
        connection open so that, when you download more than
        one document from the same server, they get trans-
        ferred over the same TCP connection.  This saves
        time and at the same time reduces the load on the
        server.

        This option is useful when, for some reason, persis-
        tent (keep-alive) connections don't work for you,
        for example due to a server bug or due to the
        inability of server-side scripts to cope with the
        connections.

  Recursive Retrieval Options

    -r
    --recursive
        Turn on recursive retrieving.

    -l depth
    --level=depth
        Specify recursion maximum depth level depth.  The
        default maximum depth is 5.

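        For example, to retrieve a directory tree but follow
        links at most three levels deep (placeholder URL):

            wget -r -l 3 http://server.com/docs/
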
    --delete-after
        This option tells Wget to delete every single file
        it downloads, after having done so.  It is useful
        for pre-fetching popular pages through a proxy,
        e.g.:

            wget -r -nd --delete-after http://whatever.com/~popular/page/

        The -r option is to retrieve recursively, and -nd to
        not create directories.

        Note that --delete-after deletes files on the local
        machine.  It does not issue the DELE command to
        remote FTP sites, for instance.  Also note that when
        --delete-after is specified, --convert-links is
        ignored, so .orig files are simply not created in
        the first place.

    -k
    --convert-links
        After the download is complete, convert the links in
        the document to make them suitable for local view-
        ing.  This affects not only the visible hyperlinks,
        but any part of the document that links to external
        content, such as embedded images, links to style
        sheets, hyperlinks to non-HTML content, etc.

        Each link will be changed in one of two ways:

        *   The links to files that have been downloaded by
            Wget will be changed to refer to the file they
            point to as a relative link.

            Example: if the downloaded file /foo/doc.html
            links to /bar/img.gif, also downloaded, then the
            link in doc.html will be modified to point to
            ../bar/img.gif.  This kind of transformation
            works reliably for arbitrary combinations of
            directories.

        *   The links to files that have not been downloaded
            by Wget will be changed to include host name and
            absolute path of the location they point to.

            Example: if the downloaded file /foo/doc.html
            links to /bar/img.gif (or to ../bar/img.gif),
            then the link in doc.html will be modified to
            point to http://hostname/bar/img.gif.

        Because of this, local browsing works reliably: if a
        linked file was downloaded, the link will refer to
        its local name; if it was not downloaded, the link
        will refer to its full Internet address rather than
        presenting a broken link.  The fact that the former
        links are converted to relative links ensures that
        you can move the downloaded hierarchy to another
        directory.

        Note that only at the end of the download can Wget
        know which links have been downloaded.  Because of
        that, the work done by -k will be performed at the
        end of all the downloads.

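        A common combination is to download a directory
        recursively and rewrite the links for offline brows-
        ing (placeholder URL):

            wget -r -k http://server.com/manual/
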
    -K
    --backup-converted
        When converting a file, back up the original version
        with a .orig suffix.  Affects the behavior of -N.

    -m
    --mirror
        Turn on options suitable for mirroring.  This option
        turns on recursion and time-stamping, sets infinite
        recursion depth and keeps FTP directory listings.
        It is currently equivalent to -r -N -l inf
        --no-remove-listing.

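        In other words, the following two commands behave
        the same way (placeholder URL):

            wget -m http://server.com/
            wget -r -N -l inf --no-remove-listing http://server.com/
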
    -p
    --page-requisites
        This option causes Wget to download all the files
        that are necessary to properly display a given HTML
        page.  This includes such things as inlined images,
        sounds, and referenced stylesheets.

        Ordinarily, when downloading a single HTML page, any
        requisite documents that may be needed to display it
        properly are not downloaded.  Using -r together with
        -l can help, but since Wget does not ordinarily dis-
        tinguish between external and inlined documents, one
        is generally left with "leaf documents" that are
        missing their requisites.

        For instance, say document 1.html contains an
        "<IMG>" tag referencing 1.gif and an "<A>" tag
        pointing to external document 2.html.  Say that
        2.html is similar but that its image is 2.gif and it
        links to 3.html.  Say this continues up to some
        arbitrarily high number.

        If one executes the command:

            wget -r -l 2 http://<site>/1.html

        then 1.html, 1.gif, 2.html, 2.gif, and 3.html will
        be downloaded.  As you can see, 3.html is without
        its requisite 3.gif because Wget is simply counting
        the number of hops (up to 2) away from 1.html in
        order to determine where to stop the recursion.
        However, with this command:

            wget -r -l 2 -p http://<site>/1.html

        all the above files and 3.html's requisite 3.gif
        will be downloaded.  Similarly,

            wget -r -l 1 -p http://<site>/1.html

        will cause 1.html, 1.gif, 2.html, and 2.gif to be
        downloaded.  One might think that:

            wget -r -l 0 -p http://<site>/1.html

        would download just 1.html and 1.gif, but unfortu-
        nately this is not the case, because -l 0 is equiva-
        lent to -l inf---that is, infinite recursion.  To
        download a single HTML page (or a handful of them,
        all specified on the command-line or in a -i URL
        input file) and its (or their) requisites, simply
        leave off -r and -l:

            wget -p http://<site>/1.html

        Note that Wget will behave as if -r had been speci-
        fied, but only that single page and its requisites
        will be downloaded.  Links from that page to exter-
        nal documents will not be followed.  Actually, to
        download a single page and all its requisites (even
        if they exist on separate websites), and make sure
        the lot displays properly locally, this author likes
        to use a few options in addition to -p:

            wget -E -H -k -K -p http://<site>/<document>

        To finish off this topic, it's worth knowing that
        Wget's idea of an external document link is any URL
        specified in an "<A>" tag, an "<AREA>" tag, or a
        "<LINK>" tag other than "<LINK REL="stylesheet">".

    --strict-comments
        Turn on strict parsing of HTML comments.  The
        default is to terminate comments at the first occur-
        rence of -->.

        According to specifications, HTML comments are
        expressed as SGML declarations.  A declaration is
        special markup that begins with <! and ends with >,
        such as <!DOCTYPE ...>, that may contain comments
        between a pair of -- delimiters.  HTML comments are
        "empty declarations", SGML declarations without any
        non-comment text.  Therefore, <!--foo--> is a valid
        comment, and so is <!--one-- --two-->, but
        <!--1--2--> is not.

        On the other hand, most HTML writers don't perceive
        comments as anything other than text delimited with
        <!-- and -->, which is not quite the same.  For
        example, something like <!------------> works as a
        valid comment as long as the number of dashes is a
        multiple of four (!).  If not, the comment techni-
        cally lasts until the next --, which may be at the
        other end of the document.  Because of this, many
        popular browsers completely ignore the specification
        and implement what users have come to expect: com-
        ments delimited with <!-- and -->.

        Until version 1.9, Wget interpreted comments
        strictly, which resulted in missing links in many
        web pages that displayed fine in browsers, but had
        the misfortune of containing non-compliant comments.
        Beginning with version 1.9, Wget has joined the
        ranks of clients that implement "naive" comments,
        terminating each comment at the first occurrence of
        -->.

        If, for whatever reason, you want strict comment
        parsing, use this option to turn it on.

  Recursive Accept/Reject Options

    -A acclist --accept acclist
    -R rejlist --reject rejlist
        Specify comma-separated lists of file name suffixes
        or patterns to accept or reject.  Note that if any of
        the wildcard characters, *, ?, [ or ], appear in an
        element of acclist or rejlist, it will be treated as
        a pattern, rather than a suffix.

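        For example, to fetch only PostScript and PDF files
        from a directory tree, or to skip compressed
        archives (placeholder URLs):

            wget -r -A "*.ps,*.pdf" http://server.com/papers/
            wget -r -R "*.gz,*.zip" http://server.com/pub/
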
    -D domain-list
    --domains=domain-list
        Set domains to be followed.  domain-list is a comma-
        separated list of domains.  Note that it does not
        turn on -H.

    --exclude-domains domain-list
        Specify the domains that are not to be followed.

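        For instance, to span hosts but stay within one
        organization's domains while skipping a known mirror
        (all domain names below are placeholders):

            wget -r -H -D server.com \
                 --exclude-domains mirror.server.com \
                 http://www.server.com/
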
    --follow-ftp
        Follow FTP links from HTML documents.  Without this
        option, Wget will ignore all the FTP links.

    --follow-tags=list
        Wget has an internal table of HTML tag / attribute
        pairs that it considers when looking for linked doc-
        uments during a recursive retrieval.  If a user
        wants only a subset of those tags to be considered,
        such tags should be specified in a comma-separated
        list with this option.

    --ignore-tags=list
        This is the opposite of the --follow-tags option.
        To skip certain HTML tags when recursively looking
        for documents to download, specify them in a comma-
        separated list.

        In the past, this option was the best bet for down-
        loading a single page and its requisites, using a
        command-line like:

            wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>

        However, the author of this option came across a
        page with tags like "<LINK REL="home" HREF="/">" and
        came to the realization that specifying tags to
        ignore was not enough.  One can't just tell Wget to
        ignore "<LINK>", because then stylesheets will not
        be downloaded.  Now the best bet for downloading a
        single page and its requisites is the dedicated
        --page-requisites option.

    --ignore-case
        Ignore case when matching files and directories.
        This influences the behavior of -R, -A, -I, and -X
        options, as well as globbing implemented when down-
        loading from FTP sites.  For example, with this
        option, -A *.txt will match file1.txt, but also
        file2.TXT, file3.TxT, and so on.

    -H
    --span-hosts
        Enable spanning across hosts when doing recursive
        retrieving.

    -L
    --relative
        Follow relative links only.  Useful for retrieving a
        specific home page without any distractions, not
        even those from the same hosts.

    -I list
    --include-directories=list
        Specify a comma-separated list of directories you
        wish to follow when downloading.  Elements of list
        may contain wildcards.

    -X list
    --exclude-directories=list
        Specify a comma-separated list of directories you
        wish to exclude from download.  Elements of list may
        contain wildcards.

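        For example, to mirror a site while skipping its
        /cgi-bin and /tmp trees (placeholder URL):

            wget -r -X /cgi-bin,/tmp http://server.com/
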
    -np
    --no-parent
        Do not ever ascend to the parent directory when
        retrieving recursively.  This is a useful option,
        since it guarantees that only the files below a cer-
        tain hierarchy will be downloaded.

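        For instance, the following stays inside the manual/
        subtree and never climbs up to /docs/ or the site
        root (placeholder URL):

            wget -r -np http://server.com/docs/manual/
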
FILES
    /usr/local/etc/wgetrc
        Default location of the global startup file.

    .wgetrc
        User startup file.

BUGS
    You are welcome to submit bug reports via the GNU Wget
    bug tracker (see <http://wget.addictivecode.org/BugTracker>).

    Before actually submitting a bug report, please try to
    follow a few simple guidelines.

    1.  Please try to ascertain that the behavior you see
        really is a bug.  If Wget crashes, it's a bug.  If
        Wget does not behave as documented, it's a bug.  If
        things work strangely, but you are not sure about
        the way they are supposed to work, it might well be
        a bug, but you might want to double-check the docu-
        mentation and the mailing lists.

    2.  Try to repeat the bug in as simple circumstances as
        possible.  E.g. if Wget crashes while downloading
        wget -rl0 -kKE -t5 --no-proxy http://yoyodyne.com -o
        /tmp/log, you should try to see if the crash is
        repeatable, and if it will occur with a simpler set
        of options.  You might even try to start the down-
        load at the page where the crash occurred to see if
        that page somehow triggered the crash.

        Also, while I will probably be interested to know
        the contents of your .wgetrc file, just dumping it
        into the debug message is probably a bad idea.
        Instead, you should first try to see if the bug
        repeats with .wgetrc moved out of the way.  Only if
        it turns out that .wgetrc settings affect the bug,
        mail me the relevant parts of the file.

    3.  Please start Wget with the -d option and send us the
        resulting output (or relevant parts thereof).  If
        Wget was compiled without debug support, recompile
        it---it is much easier to trace bugs with debug sup-
        port on.

        Note: please make sure to remove any potentially
        sensitive information from the debug log before
        sending it to the bug address.  The "-d" won't go
        out of its way to collect sensitive information, but
        the log will contain a fairly complete transcript of
        Wget's communication with the server, which may
        include passwords and pieces of downloaded data.
        Since the bug address is publicly archived, you may
        assume that all bug reports are visible to the
        public.

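        A convenient way to capture such a debug log is to
        combine -d with -o (the URL is a placeholder), then
        review wget.log before attaching it to the report:

            wget -d -o wget.log http://server.com/failing/url
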
    4.  If Wget has crashed, try to run it in a debugger,
        e.g. "gdb `which wget` core" and type "where" to get
        the backtrace.  This may not work if the system
        administrator has disabled core files, but it is
        safe to try.

SEE ALSO
    This is not the complete manual for GNU Wget.  For more
    complete information, including more detailed explana-
    tions of some of the options, and a number of commands
    available for use with .wgetrc files and the -e option,
    see the GNU Info entry for wget.

AUTHOR
    Originally written by Hrvoje Niksic
    <hniksic@xemacs.org>.  Currently maintained by Micah
    Cowan <micah@cowan.name>.

COPYRIGHT
    Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002,
    2003, 2004, 2005, 2006, 2007, 2008 Free Software Founda-
    tion, Inc.

    Permission is granted to copy, distribute and/or modify
    this document under the terms of the GNU Free Documenta-
    tion License, Version 1.2 or any later version published
    by the Free Software Foundation; with no Invariant Sec-
    tions, no Front-Cover Texts, and no Back-Cover Texts.  A
    copy of the license is included in the section entitled
    "GNU Free Documentation License".



GNU Wget 1.11.4                2008-06-29                      WGET(1)