WGET(1)                          GNU Wget                          WGET(1)

NAME
    Wget - The non-interactive network downloader.

SYNOPSIS
    wget [option]... [URL]...

DESCRIPTION
    GNU Wget is a free utility for non-interactive download of files
    from the Web.  It supports HTTP, HTTPS, and FTP protocols, as
    well as retrieval through HTTP proxies.

    Wget is non-interactive, meaning that it can work in the
    background, while the user is not logged on.  This allows you to
    start a retrieval and disconnect from the system, letting Wget
    finish the work.  By contrast, most Web browsers require the
    user's constant presence, which can be a great hindrance when
    transferring a lot of data.

    Wget can follow links in HTML and XHTML pages and create local
    versions of remote web sites, fully recreating the directory
    structure of the original site.  This is sometimes referred to as
    "recursive downloading."  While doing that, Wget respects the
    Robot Exclusion Standard (/robots.txt).  Wget can be instructed
    to convert the links in downloaded HTML files to point to the
    local files, for offline viewing.

    Wget has been designed for robustness over slow or unstable
    network connections; if a download fails due to a network
    problem, it will keep retrying until the whole file has been
    retrieved.  If the server supports regetting, it will instruct
    the server to continue the download from where it left off.

OPTIONS
   Option Syntax

    Since Wget uses GNU getopt to process command-line arguments,
    every option has a long form along with the short one.  Long
    options are more convenient to remember, but take time to type.
    You may freely mix different option styles, or specify options
    after the command-line arguments.  Thus you may write:

        wget -r --tries=10 http://fly.srk.fer.hr/ -o log

    The space between the option accepting an argument and the
    argument may be omitted.  Instead of -o log you can write -olog.

    You may put several options that do not require arguments
    together, like:

        wget -drc

    This is completely equivalent to:

        wget -d -r -c

    Since the options can be specified after the arguments, you may
    terminate them with --.  So the following will try to download
    URL -x, reporting failure to log:

        wget -o log -- -x

    The options that accept comma-separated lists all respect the
    convention that specifying an empty list clears its value.  This
    can be useful to clear the .wgetrc settings.  For instance, if
    your .wgetrc sets "exclude_directories" to /cgi-bin, the
    following example will first reset it, and then set it to exclude
    /~nobody and /~somebody.  You can also clear the lists in
    .wgetrc:

        wget -X "" -X /~nobody,/~somebody

    Most options that do not accept arguments are boolean options, so
    named because their state can be captured with a yes-or-no
    ("boolean") variable.
For example, --follow-ftp tells Wget to follow FTP links from HTML
    files and, on the other hand, --no-glob tells it not to perform
    file globbing on FTP URLs.  A boolean option is either
    affirmative or negative (beginning with --no).  All such options
    share several properties.

    Unless stated otherwise, it is assumed that the default behavior
    is the opposite of what the option accomplishes.  For example,
    the documented existence of --follow-ftp assumes that the default
    is to not follow FTP links from HTML pages.

    Affirmative options can be negated by prepending --no- to the
    option name; negative options can be negated by omitting the
    --no- prefix.  This might seem superfluous---if the default for
    an affirmative option is to not do something, then why provide a
    way to explicitly turn it off?  But the startup file may in fact
    change the default.  For instance, using "follow_ftp = off" in
    .wgetrc makes Wget not follow FTP links by default, and using
    --no-follow-ftp is the only way to restore the factory default
    from the command line.

   Basic Startup Options

    -V
    --version
        Display the version of Wget.

    -h
    --help
        Print a help message describing all of Wget's command-line
        options.

    -b
    --background
        Go to background immediately after startup.  If no output
        file is specified via the -o option, output is redirected to
        wget-log.

    -e command
    --execute command
        Execute command as if it were a part of .wgetrc.  A command
        thus invoked will be executed after the commands in .wgetrc,
        thus taking precedence over them.  If you need to specify
        more than one wgetrc command, use multiple instances of -e.

   Logging and Input File Options

    -o logfile
    --output-file=logfile
        Log all messages to logfile.  The messages are normally
        reported to standard error.

    -a logfile
    --append-output=logfile
        Append to logfile.  This is the same as -o, only it appends
        to logfile instead of overwriting the old log file.  If
        logfile does not exist, a new file is created.

    -d
    --debug
        Turn on debug output, meaning various information important
        to the developers of Wget if it does not work properly.  Your
        system administrator may have chosen to compile Wget without
        debug support, in which case -d will not work.  Please note
        that compiling with debug support is always safe---Wget
        compiled with debug support will not print any debug info
        unless requested with -d.

    -q
    --quiet
        Turn off Wget's output.

    -v
    --verbose
        Turn on verbose output, with all the available data.  The
        default output is verbose.

    -nv
    --no-verbose
        Turn off verbose without being completely quiet (use -q for
        that), which means that error messages and basic information
        still get printed.
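        As an illustrative sketch (the URL is only a placeholder),
        the logging options combine naturally: the following appends
        terse progress information to an existing log across
        repeated runs:

            wget -nv -a fetch.log http://example.com/data.csv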
    -i file
    --input-file=file
        Read URLs from file.  If - is specified as file, URLs are
        read from the standard input.  (Use ./- to read from a file
        literally named -.)

        If this function is used, no URLs need be present on the
        command line.  If there are URLs both on the command line and
        in an input file, those on the command line will be the first
        ones to be retrieved.  The file need not be an HTML document
        (but no harm if it is)---it is enough if the URLs are just
        listed sequentially.

        However, if you specify --force-html, the document will be
        regarded as HTML.  In that case you may have problems with
        relative links, which you can solve either by adding
        "<base href="url">" to the documents or by specifying
        --base=url on the command line.

    -F
    --force-html
        When input is read from a file, force it to be treated as an
        HTML file.  This enables you to retrieve relative links from
        existing HTML files on your local disk, by adding
        "<base href="url">" to HTML, or using the --base command-line
        option.

    -B URL
    --base=URL
        Prepends URL to relative links read from the file specified
        with the -i option.

   Download Options

    --bind-address=ADDRESS
        When making client TCP/IP connections, bind to ADDRESS on the
        local machine.  ADDRESS may be specified as a hostname or IP
        address.  This option can be useful if your machine is bound
        to multiple IPs.

    -t number
    --tries=number
        Set number of retries to number.  Specify 0 or inf for
        infinite retrying.  The default is to retry 20 times, with
        the exception of fatal errors like "connection refused" or
        "not found" (404), which are not retried.

    -O file
    --output-document=file
        The documents will not be written to the appropriate files,
        but all will be concatenated together and written to file.
        If - is used as file, documents will be printed to standard
        output, disabling link conversion.  (Use ./- to print to a
        file literally named -.)

        Use of -O is not intended to mean simply "use the name file
        instead of the one in the URL;" rather, it is analogous to
        shell redirection: wget -O file http://foo is intended to
        work like wget -O - http://foo > file; file will be truncated
        immediately, and all downloaded content will be written
        there.

        For this reason, -N (for timestamp-checking) is not supported
        in combination with -O: since file is always newly created,
        it will always have a very new timestamp.  A warning will be
        issued if this combination is used.

        Similarly, using -r or -p with -O may not work as you expect:
        Wget won't just download the first file to file and then
        download the rest to their normal names: all downloaded
        content will be placed in file.  This was disabled in version
        1.11, but has been reinstated (with a warning) in 1.11.2, as
        there are some cases where this behavior can actually have
        some use.

        Note that a combination with -k is only permitted when
        downloading a single document, as in that case it will just
        convert all relative URIs to external ones; -k makes no sense
        for multiple URIs when they're all being downloaded to a
        single file.
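        For instance, because -O - writes to standard output, it can
        be combined with a pipeline; the URL below is only a
        placeholder:

            wget -O - http://example.com/archive.tar.gz | tar xzf -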
    -nc
    --no-clobber
        If a file is downloaded more than once in the same directory,
        Wget's behavior depends on a few options, including -nc.  In
        certain cases, the local file will be clobbered, or
        overwritten, upon repeated download.  In other cases it will
        be preserved.

        When running Wget without -N, -nc, -r, or -p, downloading the
        same file in the same directory will result in the original
        copy of file being preserved and the second copy being named
        file.1.  If that file is downloaded yet again, the third copy
        will be named file.2, and so on.  When -nc is specified, this
        behavior is suppressed, and Wget will refuse to download
        newer copies of file.  Therefore, "no-clobber" is actually a
        misnomer in this mode---it's not clobbering that's prevented
        (as the numeric suffixes were already preventing clobbering),
        but rather the multiple version saving that's prevented.

        When running Wget with -r or -p, but without -N or -nc,
        re-downloading a file will result in the new copy simply
        overwriting the old.  Adding -nc will prevent this behavior,
        instead causing the original version to be preserved and any
        newer copies on the server to be ignored.

        When running Wget with -N, with or without -r or -p, the
        decision as to whether or not to download a newer copy of a
        file depends on the local and remote timestamp and size of
        the file.  -nc may not be specified at the same time as -N.

        Note that when -nc is specified, files with the suffixes
        .html or .htm will be loaded from the local disk and parsed
        as if they had been retrieved from the Web.
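        As a minimal sketch (hypothetical URL), re-running the same
        recursive download with -nc leaves all previously fetched
        files untouched instead of overwriting them:

            wget -r -nc http://example.com/docs/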
    -c
    --continue
        Continue getting a partially-downloaded file.  This is useful
        when you want to finish up a download started by a previous
        instance of Wget, or by another program.  For instance:

            wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

        If there is a file named ls-lR.Z in the current directory,
        Wget will assume that it is the first portion of the remote
        file, and will ask the server to continue the retrieval from
        an offset equal to the length of the local file.

        Note that you don't need to specify this option if you just
        want the current invocation of Wget to retry downloading a
        file should the connection be lost midway through.  This is
        the default behavior.  -c only affects resumption of
        downloads started prior to this invocation of Wget, and whose
        local files are still sitting around.

        Without -c, the previous example would just download the
        remote file to ls-lR.Z.1, leaving the truncated ls-lR.Z file
        alone.

        Beginning with Wget 1.7, if you use -c on a non-empty file,
        and it turns out that the server does not support continued
        downloading, Wget will refuse to start the download from
        scratch, which would effectively ruin existing contents.  If
        you really want the download to start from scratch, remove
        the file.

        Also beginning with Wget 1.7, if you use -c on a file which
        is of equal size as the one on the server, Wget will refuse
        to download the file and print an explanatory message.  The
        same happens when the file is smaller on the server than
        locally (presumably because it was changed on the server
        since your last download attempt)---because "continuing" is
        not meaningful, no download occurs.

        On the other side of the coin, while using -c, any file
        that's bigger on the server than locally will be considered
        an incomplete download and only "(length(remote) -
        length(local))" bytes will be downloaded and tacked onto the
        end of the local file.  This behavior can be desirable in
        certain cases---for instance, you can use wget -c to download
        just the new portion that's been appended to a data
        collection or log file.

        However, if the file is bigger on the server because it's
        been changed, as opposed to just appended to, you'll end up
        with a garbled file.  Wget has no way of verifying that the
        local file is really a valid prefix of the remote file.  You
        need to be especially careful of this when using -c in
        conjunction with -r, since every file will be considered as
        an "incomplete download" candidate.

        Another instance where you'll get a garbled file if you try
        to use -c is if you have a lame HTTP proxy that inserts a
        "transfer interrupted" string into the local file.  In the
        future a "rollback" option may be added to deal with this
        case.

        Note that -c only works with FTP servers and with HTTP
        servers that support the "Range" header.

    --progress=type
        Select the type of the progress indicator you wish to use.
        Legal indicators are "dot" and "bar".

        The "bar" indicator is used by default.  It draws an ASCII
        progress bar graphic (a.k.a "thermometer" display) indicating
        the status of retrieval.  If the output is not a TTY, the
        "dot" indicator will be used by default.

        Use --progress=dot to switch to the "dot" display.  It traces
        the retrieval by printing dots on the screen, each dot
        representing a fixed amount of downloaded data.

        When using the dotted retrieval, you may also set the style
        by specifying the type as dot:style.  Different styles assign
        different meaning to one dot.  With the "default" style each
        dot represents 1K, there are ten dots in a cluster and 50
        dots in a line.  The "binary" style has a more
        "computer"-like orientation---8K dots, 16-dots clusters and
        48 dots per line (which makes for 384K lines).  The "mega"
        style is suitable for downloading very large files---each dot
        represents 64K retrieved, there are eight dots in a cluster,
        and 48 dots on each line (so each line contains 3M).

        Note that you can set the default style using the "progress"
        command in .wgetrc.  That setting may be overridden from the
        command line.  The exception is that, when the output is not
        a TTY, the "dot" progress will be favored over "bar".  To
        force the bar output, use --progress=bar:force.
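        For example (URL hypothetical), the "mega" dot style keeps
        the output of a very large download compact:

            wget --progress=dot:mega http://example.com/dvd.iso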
The "binary" style has a more "computer"-like wolffd@0: orientation---8K dots, 16-dots clusters and 48 dots wolffd@0: per line (which makes for 384K lines). The "mega" wolffd@0: style is suitable for downloading very large wolffd@0: files---each dot represents 64K retrieved, there are wolffd@0: eight dots in a cluster, and 48 dots on each line wolffd@0: (so each line contains 3M). wolffd@0: wolffd@0: Note that you can set the default style using the wolffd@0: "progress" command in .wgetrc. That setting may be wolffd@0: overridden from the command line. The exception is wolffd@0: that, when the output is not a TTY, the "dot" wolffd@0: progress will be favored over "bar". To force the wolffd@0: bar output, use --progress=bar:force. wolffd@0: wolffd@0: -N wolffd@0: --timestamping wolffd@0: Turn on time-stamping. wolffd@0: wolffd@0: -S wolffd@0: --server-response wolffd@0: Print the headers sent by HTTP servers and responses wolffd@0: sent by FTP servers. wolffd@0: wolffd@0: --spider wolffd@0: When invoked with this option, Wget will behave as a wolffd@0: Web spider, which means that it will not download wolffd@0: the pages, just check that they are there. For wolffd@0: example, you can use Wget to check your bookmarks: wolffd@0: wolffd@0: wget --spider --force-html -i bookmarks.html wolffd@0: wolffd@0: This feature needs much more work for Wget to get wolffd@0: close to the functionality of real web spiders. wolffd@0: wolffd@0: -T seconds wolffd@0: --timeout=seconds wolffd@0: Set the network timeout to seconds seconds. This is wolffd@0: equivalent to specifying --dns-timeout, --con- wolffd@0: nect-timeout, and --read-timeout, all at the same wolffd@0: time. wolffd@0: wolffd@0: When interacting with the network, Wget can check wolffd@0: for timeout and abort the operation if it takes too wolffd@0: long. This prevents anomalies like hanging reads wolffd@0: and infinite connects. The only timeout enabled by wolffd@0: default is a 900-second read timeout. Setting a wolffd@0: timeout to 0 disables it altogether. Unless you wolffd@0: know what you are doing, it is best not to change wolffd@0: the default timeout settings. wolffd@0: wolffd@0: All timeout-related options accept decimal values, wolffd@0: as well as subsecond values. For example, 0.1 sec- wolffd@0: onds is a legal (though unwise) choice of timeout. wolffd@0: Subsecond timeouts are useful for checking server wolffd@0: response times or for testing network latency. wolffd@0: wolffd@0: --dns-timeout=seconds wolffd@0: Set the DNS lookup timeout to seconds seconds. DNS wolffd@0: lookups that don't complete within the specified wolffd@0: time will fail. By default, there is no timeout on wolffd@0: DNS lookups, other than that implemented by system wolffd@0: libraries. wolffd@0: wolffd@0: --connect-timeout=seconds wolffd@0: Set the connect timeout to seconds seconds. TCP wolffd@0: connections that take longer to establish will be wolffd@0: aborted. By default, there is no connect timeout, wolffd@0: other than that implemented by system libraries. wolffd@0: wolffd@0: --read-timeout=seconds wolffd@0: Set the read (and write) timeout to seconds seconds. wolffd@0: The "time" of this timeout refers to idle time: if, wolffd@0: at any point in the download, no data is received wolffd@0: for more than the specified number of seconds, read- wolffd@0: ing fails and the download is restarted. This wolffd@0: option does not directly affect the duration of the wolffd@0: entire download. 
    --limit-rate=amount
        Limit the download speed to amount bytes per second.  Amount
        may be expressed in bytes, kilobytes with the k suffix, or
        megabytes with the m suffix.  For example, --limit-rate=20k
        will limit the retrieval rate to 20KB/s.  This is useful
        when, for whatever reason, you don't want Wget to consume the
        entire available bandwidth.

        This option allows the use of decimal numbers, usually in
        conjunction with power suffixes; for example,
        --limit-rate=2.5k is a legal value.

        Note that Wget implements the limiting by sleeping the
        appropriate amount of time after a network read that took
        less time than specified by the rate.  Eventually this
        strategy causes the TCP transfer to slow down to
        approximately the specified rate.  However, it may take some
        time for this balance to be achieved, so don't be surprised
        if limiting the rate doesn't work well with very small files.

    -w seconds
    --wait=seconds
        Wait the specified number of seconds between the retrievals.
        Use of this option is recommended, as it lightens the server
        load by making the requests less frequent.  Instead of in
        seconds, the time can be specified in minutes using the "m"
        suffix, in hours using the "h" suffix, or in days using the
        "d" suffix.

        Specifying a large value for this option is useful if the
        network or the destination host is down, so that Wget can
        wait long enough to reasonably expect the network error to be
        fixed before the retry.  The waiting interval specified by
        this function is influenced by --random-wait, which see.

    --waitretry=seconds
        If you don't want Wget to wait between every retrieval, but
        only between retries of failed downloads, you can use this
        option.  Wget will use linear backoff, waiting 1 second after
        the first failure on a given file, then waiting 2 seconds
        after the second failure on that file, up to the maximum
        number of seconds you specify.  Therefore, a value of 10 will
        actually make Wget wait up to (1 + 2 + ... + 10) = 55 seconds
        per file.

        Note that this option is turned on by default in the global
        wgetrc file.

    --random-wait
        Some web sites may perform log analysis to identify retrieval
        programs such as Wget by looking for statistically
        significant similarities in the time between requests.  This
        option causes the time between requests to vary between 0.5
        and 1.5 * wait seconds, where wait was specified using the
        --wait option, in order to mask Wget's presence from such
        analysis.

        A 2001 article in a publication devoted to development on a
        popular consumer platform provided code to perform this
        analysis on the fly.  Its author suggested blocking at the
        class C address level to ensure automated retrieval programs
        were blocked despite changing DHCP-supplied addresses.

        The --random-wait option was inspired by this ill-advised
        recommendation to block many unrelated users from a web site
        due to the actions of one.
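        Putting the pacing options together, a polite mirroring run
        might look like this (URL hypothetical):

            wget -r --wait=2 --random-wait --limit-rate=50k \
                 http://example.com/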
    --no-proxy
        Don't use proxies, even if the appropriate *_proxy
        environment variable is defined.

    -Q quota
    --quota=quota
        Specify download quota for automatic retrievals.  The value
        can be specified in bytes (default), kilobytes (with k
        suffix), or megabytes (with m suffix).

        Note that quota will never affect downloading a single file.
        So if you specify wget -Q10k
        ftp://wuarchive.wustl.edu/ls-lR.gz, all of the ls-lR.gz will
        be downloaded.  The same goes even when several URLs are
        specified on the command-line.  However, quota is respected
        when retrieving either recursively, or from an input file.
        Thus you may safely type wget -Q2m -i sites---download will
        be aborted when the quota is exceeded.

        Setting quota to 0 or to inf unlimits the download quota.

    --no-dns-cache
        Turn off caching of DNS lookups.  Normally, Wget remembers
        the IP addresses it looked up from DNS so it doesn't have to
        repeatedly contact the DNS server for the same (typically
        small) set of hosts it retrieves from.  This cache exists in
        memory only; a new Wget run will contact DNS again.

        However, it has been reported that in some situations it is
        not desirable to cache host names, even for the duration of a
        short-running application like Wget.  With this option Wget
        issues a new DNS lookup (more precisely, a new call to
        "gethostbyname" or "getaddrinfo") each time it makes a new
        connection.  Please note that this option will not affect
        caching that might be performed by the resolving library or
        by an external caching layer, such as NSCD.

        If you don't understand exactly what this option does, you
        probably won't need it.

    --restrict-file-names=mode
        Change which characters found in remote URLs may show up in
        local file names generated from those URLs.  Characters that
        are restricted by this option are escaped, i.e. replaced with
        %HH, where HH is the hexadecimal number that corresponds to
        the restricted character.

        By default, Wget escapes the characters that are not valid as
        part of file names on your operating system, as well as
        control characters that are typically unprintable.  This
        option is useful for changing these defaults, either because
        you are downloading to a non-native partition, or because you
        want to disable escaping of the control characters.

        When mode is set to "unix", Wget escapes the character / and
        the control characters in the ranges 0--31 and 128--159.
        This is the default on Unix-like OSes.
        When mode is set to "windows", Wget escapes the characters \,
        |, /, :, ?, ", *, <, >, and the control characters in the
        ranges 0--31 and 128--159.  In addition to this, Wget in
        Windows mode uses + instead of : to separate host and port in
        local file names, and uses @ instead of ? to separate the
        query portion of the file name from the rest.  Therefore, a
        URL that would be saved as
        www.xemacs.org:4300/search.pl?input=blah in Unix mode would
        be saved as www.xemacs.org+4300/search.pl@input=blah in
        Windows mode.  This mode is the default on Windows.

        If you append ,nocontrol to the mode, as in unix,nocontrol,
        escaping of the control characters is also switched off.  You
        can use --restrict-file-names=nocontrol to turn off escaping
        of control characters without affecting the choice of the OS
        to use as file name restriction mode.

    -4
    --inet4-only
    -6
    --inet6-only
        Force connecting to IPv4 or IPv6 addresses.  With
        --inet4-only or -4, Wget will only connect to IPv4 hosts,
        ignoring AAAA records in DNS, and refusing to connect to IPv6
        addresses specified in URLs.  Conversely, with --inet6-only
        or -6, Wget will only connect to IPv6 hosts and ignore A
        records and IPv4 addresses.

        Neither option should normally be needed.  By default, an
        IPv6-aware Wget will use the address family specified by the
        host's DNS record.  If the DNS responds with both IPv4 and
        IPv6 addresses, Wget will try them in sequence until it finds
        one it can connect to.  (Also see the --prefer-family option
        described below.)

        These options can be used to deliberately force the use of
        IPv4 or IPv6 address families on dual family systems, usually
        to aid debugging or to deal with broken network
        configuration.  Only one of --inet6-only and --inet4-only may
        be specified at the same time.  Neither option is available
        in Wget compiled without IPv6 support.

    --prefer-family=IPv4/IPv6/none
        When given a choice of several addresses, connect to the
        addresses with the specified address family first.  IPv4
        addresses are preferred by default.

        This avoids spurious errors and connect attempts when
        accessing hosts that resolve to both IPv6 and IPv4 addresses
        from IPv4 networks.  For example, www.kame.net resolves to
        2001:200:0:8002:203:47ff:fea5:3085 and to 203.178.141.194.
        When the preferred family is "IPv4", the IPv4 address is used
        first; when the preferred family is "IPv6", the IPv6 address
        is used first; if the specified value is "none", the address
        order returned by DNS is used without change.

        Unlike -4 and -6, this option doesn't inhibit access to any
        address family, it only changes the order in which the
        addresses are accessed.  Also note that the reordering
        performed by this option is stable---it doesn't affect the
        order of addresses of the same family.  That is, the relative
        order of all IPv4 addresses and of all IPv6 addresses remains
        intact in all cases.
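        For example, to try the IPv6 address of the dual-homed host
        mentioned above first:

            wget --prefer-family=IPv6 http://www.kame.net/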
    --retry-connrefused
        Consider "connection refused" a transient error and try
        again.  Normally Wget gives up on a URL when it is unable to
        connect to the site because failure to connect is taken as a
        sign that the server is not running at all and that retries
        would not help.  This option is for mirroring unreliable
        sites whose servers tend to disappear for short periods of
        time.

    --user=user
    --password=password
        Specify the username user and password password for both FTP
        and HTTP file retrieval.  These parameters can be overridden
        using the --ftp-user and --ftp-password options for FTP
        connections and the --http-user and --http-password options
        for HTTP connections.

   Directory Options

    -nd
    --no-directories
        Do not create a hierarchy of directories when retrieving
        recursively.  With this option turned on, all files will get
        saved to the current directory, without clobbering (if a name
        shows up more than once, the filenames will get extensions
        .n).

    -x
    --force-directories
        The opposite of -nd---create a hierarchy of directories, even
        if one would not have been created otherwise.  E.g. wget -x
        http://fly.srk.fer.hr/robots.txt will save the downloaded
        file to fly.srk.fer.hr/robots.txt.

    -nH
    --no-host-directories
        Disable generation of host-prefixed directories.  By default,
        invoking Wget with -r http://fly.srk.fer.hr/ will create a
        structure of directories beginning with fly.srk.fer.hr/.
        This option disables such behavior.

    --protocol-directories
        Use the protocol name as a directory component of local file
        names.  For example, with this option, wget -r http://host
        will save to http/host/... rather than just to host/....

    --cut-dirs=number
        Ignore number directory components.  This is useful for
        getting fine-grained control over the directory where
        recursive retrieval will be saved.

        Take, for example, the directory at
        ftp://ftp.xemacs.org/pub/xemacs/.  If you retrieve it with
        -r, it will be saved locally under
        ftp.xemacs.org/pub/xemacs/.  While the -nH option can remove
        the ftp.xemacs.org/ part, you are still stuck with
        pub/xemacs.  This is where --cut-dirs comes in handy; it
        makes Wget not "see" number remote directory components.
        Here are several examples of how the --cut-dirs option works.

            No options        -> ftp.xemacs.org/pub/xemacs/
            -nH               -> pub/xemacs/
            -nH --cut-dirs=1  -> xemacs/
            -nH --cut-dirs=2  -> .

            --cut-dirs=1      -> ftp.xemacs.org/xemacs/
            ...

        If you just want to get rid of the directory structure, this
        option is similar to a combination of -nd and -P.  However,
        unlike -nd, --cut-dirs does not lose track of
        subdirectories---for instance, with -nH --cut-dirs=1, a beta/
        subdirectory will be placed at xemacs/beta, as one would
        expect.
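        Combining these with the -P option described next, the
        following sketch saves the xemacs/ tree directly under a
        chosen local directory:

            wget -r -nH --cut-dirs=2 -P xemacs-mirror \
                 ftp://ftp.xemacs.org/pub/xemacs/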
    -P prefix
    --directory-prefix=prefix
        Set directory prefix to prefix.  The directory prefix is the
        directory where all other files and subdirectories will be
        saved to, i.e. the top of the retrieval tree.  The default is
        . (the current directory).

   HTTP Options

    -E
    --html-extension
        If a file of type application/xhtml+xml or text/html is
        downloaded and the URL does not end with the regexp
        \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html
        to be appended to the local filename.  This is useful, for
        instance, when you're mirroring a remote site that uses .asp
        pages, but you want the mirrored pages to be viewable on your
        stock Apache server.  Another good use for this is when
        you're downloading CGI-generated materials.  A URL like
        http://site.com/article.cgi?25 will be saved as
        article.cgi?25.html.

        Note that filenames changed in this way will be re-downloaded
        every time you re-mirror a site, because Wget can't tell that
        the local X.html file corresponds to remote URL X (since it
        doesn't yet know that the URL produces output of type
        text/html or application/xhtml+xml).  To prevent this
        re-downloading, you must use -k and -K so that the original
        version of the file will be saved as X.orig.

    --http-user=user
    --http-password=password
        Specify the username user and password password on an HTTP
        server.  According to the type of the challenge, Wget will
        encode them using either the "basic" (insecure), the
        "digest", or the Windows "NTLM" authentication scheme.

        Another way to specify username and password is in the URL
        itself.  Either method reveals your password to anyone who
        bothers to run "ps".  To prevent the passwords from being
        seen, store them in .wgetrc or .netrc, and make sure to
        protect those files from other users with "chmod".  If the
        passwords are really important, do not leave them lying in
        those files either---edit the files and delete them after
        Wget has started the download.

    --no-cache
        Disable server-side cache.  In this case, Wget will send the
        remote server an appropriate directive (Pragma: no-cache) to
        get the file from the remote service, rather than returning
        the cached version.  This is especially useful for retrieving
        and flushing out-of-date documents on proxy servers.

        Caching is allowed by default.

    --no-cookies
        Disable the use of cookies.  Cookies are a mechanism for
        maintaining server-side state.  The server sends the client a
        cookie using the "Set-Cookie" header, and the client responds
        with the same cookie upon further requests.  Since cookies
        allow the server owners to keep track of visitors and for
        sites to exchange this information, some consider them a
        breach of privacy.  The default is to use cookies; however,
        storing cookies is not on by default.
    --load-cookies file
        Load cookies from file before the first HTTP retrieval.  file
        is a textual file in the format originally used by Netscape's
        cookies.txt file.

        You will typically use this option when mirroring sites that
        require that you be logged in to access some or all of their
        content.  The login process typically works by the web server
        issuing an HTTP cookie upon receiving and verifying your
        credentials.  The cookie is then resent by the browser when
        accessing that part of the site, and so proves your identity.

        Mirroring such a site requires Wget to send the same cookies
        your browser sends when communicating with the site.  This is
        achieved by --load-cookies---simply point Wget to the
        location of the cookies.txt file, and it will send the same
        cookies your browser would send in the same situation.
        Different browsers keep textual cookie files in different
        locations:

        Netscape 4.x.
            The cookies are in ~/.netscape/cookies.txt.

        Mozilla and Netscape 6.x.
            Mozilla's cookie file is also named cookies.txt, located
            somewhere under ~/.mozilla, in the directory of your
            profile.  The full path usually ends up looking somewhat
            like ~/.mozilla/default/some-weird-string/cookies.txt.

        Internet Explorer.
            You can produce a cookie file Wget can use by using the
            File menu, Import and Export, Export Cookies.  This has
            been tested with Internet Explorer 5; it is not
            guaranteed to work with earlier versions.

        Other browsers.
            If you are using a different browser to create your
            cookies, --load-cookies will only work if you can locate
            or produce a cookie file in the Netscape format that Wget
            expects.

        If you cannot use --load-cookies, there might still be an
        alternative.  If your browser supports a "cookie manager",
        you can use it to view the cookies used when accessing the
        site you're mirroring.  Write down the name and value of the
        cookie, and manually instruct Wget to send those cookies,
        bypassing the "official" cookie support:

            wget --no-cookies --header "Cookie: <name>=<value>"

    --save-cookies file
        Save cookies to file before exiting.  This will not save
        cookies that have expired or that have no expiry time
        (so-called "session cookies"), but also see
        --keep-session-cookies.

    --keep-session-cookies
        When specified, causes --save-cookies to also save session
        cookies.  Session cookies are normally not saved because they
        are meant to be kept in memory and forgotten when you exit
        the browser.  Saving them is useful on sites that require you
        to log in or to visit the home page before you can access
        some pages.  With this option, multiple Wget runs are
        considered a single browser session as far as the site is
        concerned.

        Since the cookie file format does not normally carry session
        cookies, Wget marks them with an expiry timestamp of 0.
        Wget's --load-cookies recognizes those as session cookies,
        but it might confuse other browsers.  Also note that cookies
        so loaded will be treated as other session cookies, which
        means that if you want --save-cookies to preserve them again,
        you must use --keep-session-cookies again.
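        A typical sketch (site and paths hypothetical): log in once,
        however the site expects credentials (see also --post-data),
        saving session cookies, then reuse them in a later run:

            wget --save-cookies session.txt --keep-session-cookies \
                 http://example.com/login.php
            wget --load-cookies session.txt \
                 -p http://example.com/members/index.html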
    --ignore-length
        Unfortunately, some HTTP servers (CGI programs, to be more
        precise) send out bogus "Content-Length" headers, which makes
        Wget go wild, as it thinks not all the document was
        retrieved.  You can spot this syndrome if Wget retries
        getting the same document again and again, each time claiming
        that the (otherwise normal) connection has closed on the very
        same byte.

        With this option, Wget will ignore the "Content-Length"
        header---as if it never existed.

    --header=header-line
        Send header-line along with the rest of the headers in each
        HTTP request.  The supplied header is sent as-is, which means
        it must contain name and value separated by a colon, and must
        not contain newlines.

        You may define more than one additional header by specifying
        --header more than once.

            wget --header='Accept-Charset: iso-8859-2' \
                 --header='Accept-Language: hr' \
                 http://fly.srk.fer.hr/

        Specification of an empty string as the header value will
        clear all previous user-defined headers.

        As of Wget 1.10, this option can be used to override headers
        otherwise generated automatically.  This example instructs
        Wget to connect to localhost, but to specify foo.bar in the
        "Host" header:

            wget --header="Host: foo.bar" http://localhost/

        In versions of Wget prior to 1.10 such use of --header caused
        sending of duplicate headers.

    --max-redirect=number
        Specifies the maximum number of redirections to follow for a
        resource.  The default is 20, which is usually far more than
        necessary.  However, on those occasions where you want to
        allow more (or fewer), this is the option to use.

    --proxy-user=user
    --proxy-password=password
        Specify the username user and password password for
        authentication on a proxy server.  Wget will encode them
        using the "basic" authentication scheme.

        Security considerations similar to those with --http-password
        pertain here as well.

    --referer=url
        Include `Referer: url' header in HTTP request.  Useful for
        retrieving documents with server-side processing that assume
        they are always being retrieved by interactive web browsers
        and only come out properly when Referer is set to one of the
        pages that point to them.
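        For instance (URLs hypothetical), to fetch an image that the
        server only serves when the request appears to come from its
        gallery page:

            wget --referer=http://example.com/gallery.html \
                 http://example.com/images/photo.jpg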
    --save-headers
        Save the headers sent by the HTTP server to the file,
        preceding the actual contents, with an empty line as the
        separator.

    -U agent-string
    --user-agent=agent-string
        Identify as agent-string to the HTTP server.

        The HTTP protocol allows the clients to identify themselves
        using a "User-Agent" header field.  This enables
        distinguishing the WWW software, usually for statistical
        purposes or for tracing of protocol violations.  Wget
        normally identifies as Wget/version, version being the
        current version number of Wget.

        However, some sites have been known to impose the policy of
        tailoring the output according to the "User-Agent"-supplied
        information.  While this is not such a bad idea in theory, it
        has been abused by servers denying information to clients
        other than (historically) Netscape or, more frequently,
        Microsoft Internet Explorer.  This option allows you to
        change the "User-Agent" line issued by Wget.  Use of this
        option is discouraged, unless you really know what you are
        doing.

        Specifying an empty user agent with --user-agent=""
        instructs Wget not to send the "User-Agent" header in HTTP
        requests.

    --post-data=string
    --post-file=file
        Use POST as the method for all HTTP requests and send the
        specified data in the request body.  --post-data sends string
        as data, whereas --post-file sends the contents of file.
        Other than that, they work in exactly the same way.

        Please be aware that Wget needs to know the size of the POST
        data in advance.  Therefore the argument to --post-file must
        be a regular file; specifying a FIFO or something like
        /dev/stdin won't work.  It's not quite clear how to work
        around this limitation inherent in HTTP/1.0.  Although
        HTTP/1.1 introduces chunked transfer that doesn't require
        knowing the request length in advance, a client can't use
        chunked unless it knows it's talking to an HTTP/1.1 server.
        And it can't know that until it receives a response, which in
        turn requires the request to have been completed -- a
        chicken-and-egg problem.

        Note: if Wget is redirected after the POST request is
        completed, it will not send the POST data to the redirected
        URL.  This is because URLs that process POST often respond
        with a redirection to a regular page, which does not desire
        or accept POST.  It is not completely clear that this
        behavior is optimal; if it doesn't work out, it might be
        changed in the future.

        This example shows how to log in to a server using POST and
        then proceed to download the desired pages, presumably only
        accessible to authorized users:

            # Log in to the server.  This can be done only once.
            wget --save-cookies cookies.txt \
                 --post-data 'user=foo&password=bar' \
                 http://server.com/auth.php

            # Now grab the page or pages we care about.
            wget --load-cookies cookies.txt \
                 -p http://server.com/interesting/article.php

        If the server is using session cookies to track user
        authentication, the above will not work because
        --save-cookies will not save them (and neither will browsers)
        and the cookies.txt file will be empty.  In that case use
        --keep-session-cookies along with --save-cookies to force
        saving of session cookies.

    --content-disposition
        If this is set to on, experimental (not fully-functional)
        support for "Content-Disposition" headers is enabled.  This
        can currently result in extra round-trips to the server for a
        "HEAD" request, and is known to suffer from a few bugs, which
        is why it is not currently enabled by default.

        This option is useful for some file-downloading CGI programs
        that use "Content-Disposition" headers to describe what the
        name of a downloaded file should be.

    --auth-no-challenge
        If this option is given, Wget will send Basic HTTP
        authentication information (plaintext username and password)
        for all requests, just like Wget 1.10.2 and prior did by
        default.

        Use of this option is not recommended, and is intended only
        to support some few obscure servers, which never send HTTP
        authentication challenges, but accept unsolicited auth info,
        say, in addition to form-based authentication.

   HTTPS (SSL/TLS) Options

    To support encrypted HTTP (HTTPS) downloads, Wget must be
    compiled with an external SSL library, currently OpenSSL.  If
    Wget is compiled without SSL support, none of these options are
    available.

    --secure-protocol=protocol
        Choose the secure protocol to be used.  Legal values are
        auto, SSLv2, SSLv3, and TLSv1.  If auto is used, the SSL
        library is given the liberty of choosing the appropriate
        protocol automatically, which is achieved by sending an SSLv2
        greeting and announcing support for SSLv3 and TLSv1.  This is
        the default.

        Specifying SSLv2, SSLv3, or TLSv1 forces the use of the
        corresponding protocol.  This is useful when talking to old
        and buggy SSL server implementations that make it hard for
        OpenSSL to choose the correct protocol version.  Fortunately,
        such servers are quite rare.

    --no-check-certificate
        Don't check the server certificate against the available
        certificate authorities.  Also don't require the URL host
        name to match the common name presented by the certificate.

        As of Wget 1.10, the default is to verify the server's
        certificate against the recognized certificate authorities,
        breaking the SSL handshake and aborting the download if the
        verification fails.  Although this provides more secure
        downloads, it does break interoperability with some sites
        that worked with previous Wget versions, particularly those
        using self-signed, expired, or otherwise invalid
        certificates.
        This option forces an "insecure" mode of operation that turns
        the certificate verification errors into warnings and allows
        you to proceed.

        If you encounter "certificate verification" errors or ones
        saying that "common name doesn't match requested host name",
        you can use this option to bypass the verification and
        proceed with the download.  Only use this option if you are
        otherwise convinced of the site's authenticity, or if you
        really don't care about the validity of its certificate.  It
        is almost always a bad idea not to check the certificates
        when transmitting confidential or important data.

    --certificate=file
        Use the client certificate stored in file.  This is needed
        for servers that are configured to require certificates from
        the clients that connect to them.  Normally a certificate is
        not required and this switch is optional.

    --certificate-type=type
        Specify the type of the client certificate.  Legal values are
        PEM (assumed by default) and DER, also known as ASN1.

    --private-key=file
        Read the private key from file.  This allows you to provide
        the private key in a file separate from the certificate.

    --private-key-type=type
        Specify the type of the private key.  Accepted values are PEM
        (the default) and DER.
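        Combining the client-certificate options above (file names
        and URL are hypothetical), a client-authenticated download
        might look like this:

            wget --certificate=client.pem --private-key=client.key \
                 https://example.com/secure/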
    --ca-certificate=file
        Use file as the file with the bundle of certificate
        authorities ("CA") to verify the peers.  The certificates
        must be in PEM format.

        Without this option Wget looks for CA certificates at the
        system-specified locations, chosen at OpenSSL installation
        time.

    --ca-directory=directory
        Specifies directory containing CA certificates in PEM format.
        Each file contains one CA certificate, and the file name is
        based on a hash value derived from the certificate.  This is
        achieved by processing a certificate directory with the
        "c_rehash" utility supplied with OpenSSL.  Using
        --ca-directory is more efficient than --ca-certificate when
        many certificates are installed because it allows Wget to
        fetch certificates on demand.

        Without this option Wget looks for CA certificates at the
        system-specified locations, chosen at OpenSSL installation
        time.

    --random-file=file
        Use file as the source of random data for seeding the
        pseudo-random number generator on systems without
        /dev/random.

        On such systems the SSL library needs an external source of
        randomness to initialize.  Randomness may be provided by EGD
        (see --egd-file below) or read from an external source
        specified by the user.  If this option is not specified, Wget
        looks for random data in $RANDFILE or, if that is unset, in
        $HOME/.rnd.  If none of those are available, it is likely
        that SSL encryption will not be usable.

        If you're getting the "Could not seed OpenSSL PRNG;
        disabling SSL." error, you should provide random data using
        some of the methods described above.

    --egd-file=file
        Use file as the EGD socket.  EGD stands for Entropy Gathering
        Daemon, a user-space program that collects data from various
        unpredictable system sources and makes it available to other
        programs that might need it.  Encryption software, such as
        the SSL library, needs sources of non-repeating randomness to
        seed the random number generator used to produce
        cryptographically strong keys.

        OpenSSL allows the user to specify his own source of entropy
        using the "RAND_FILE" environment variable.  If this variable
        is unset, or if the specified file does not produce enough
        randomness, OpenSSL will read random data from the EGD socket
        specified using this option.

        If this option is not specified (and the equivalent startup
        command is not used), EGD is never contacted.  EGD is not
        needed on modern Unix systems that support /dev/random.

   FTP Options

    --ftp-user=user
    --ftp-password=password
        Specify the username user and password password on an FTP
        server.  Without this, or the corresponding startup option,
        the password defaults to -wget@, normally used for anonymous
        FTP.

        Another way to specify username and password is in the URL
        itself.  Either method reveals your password to anyone who
        bothers to run "ps".  To prevent the passwords from being
        seen, store them in .wgetrc or .netrc, and make sure to
        protect those files from other users with "chmod".  If the
        passwords are really important, do not leave them lying in
        those files either---edit the files and delete them after
        Wget has started the download.

    --no-remove-listing
        Don't remove the temporary .listing files generated by FTP
        retrievals.  Normally, these files contain the raw directory
        listings received from FTP servers.  Not removing them can be
        useful for debugging purposes, or when you want to be able to
        easily check on the contents of remote server directories
        (e.g. to verify that a mirror you're running is complete).

        Note that even though Wget writes to a known filename for
        this file, this is not a security hole in the scenario of a
        user making .listing a symbolic link to /etc/passwd or
        something and asking "root" to run Wget in his or her
        directory.  Depending on the options used, either Wget will
        refuse to write to .listing, making the
        globbing/recursion/time-stamping operation fail, or the
        symbolic link will be deleted and replaced with the actual
        .listing file, or the listing will be written to a
        .listing.number file.

        Even though this situation isn't a problem, though, "root"
        should never run Wget in a non-trusted user's directory.  A
        user could do something as simple as linking index.html to
        /etc/passwd and asking "root" to run Wget with -N or -r so
        the file will be overwritten.

    --no-glob
        Turn off FTP globbing.
--no-glob
    Turn off FTP globbing. Globbing refers to the use of shell-like special characters (wildcards), like *, ?, [ and ], to retrieve more than one file from the same directory at once, like:

        wget ftp://gnjilux.srk.fer.hr/*.msg

    By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently.

    You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).

--no-passive-ftp
    Disable the use of the passive FTP transfer mode. Passive FTP mandates that the client connect to the server to establish the data connection rather than the other way around.

    If the machine is connected to the Internet directly, both passive and active FTP should work equally well. Behind most firewall and NAT configurations passive FTP has a better chance of working. However, in some rare firewall configurations, active FTP actually works when passive FTP doesn't. If you suspect this to be the case, use this option, or set "passive_ftp=off" in your init file.

--retr-symlinks
    Usually, when retrieving FTP directories recursively and a symbolic link is encountered, the linked-to file is not downloaded. Instead, a matching symbolic link is created on the local filesystem. The pointed-to file will not be downloaded unless this recursive retrieval would have encountered it separately and downloaded it anyway.

    When --retr-symlinks is specified, however, symbolic links are traversed and the pointed-to files are retrieved. At this time, this option does not cause Wget to traverse symlinks to directories and recurse through them, but in the future it should be enhanced to do this.

    Note that when retrieving a file (not a directory) because it was specified on the command line, rather than because it was recursed to, this option has no effect. Symbolic links are always traversed in this case.

--no-http-keep-alive
    Turn off the "keep-alive" feature for HTTP downloads. Normally, Wget asks the server to keep the connection open so that, when you download more than one document from the same server, they get transferred over the same TCP connection. This saves time and at the same time reduces the load on the server.

    This option is useful when, for some reason, persistent (keep-alive) connections don't work for you, for example due to a server bug or due to the inability of server-side scripts to cope with the connections.

Recursive Retrieval Options

-r
--recursive
    Turn on recursive retrieving.

-l depth
--level=depth
    Specify recursion maximum depth level depth. The default maximum depth is 5.
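    For example (the URL is illustrative), to retrieve a site recursively but follow links at most three levels deep:

        wget -r -l 3 http://example.com/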
--delete-after
    This option tells Wget to delete every single file it downloads, after having done so. It is useful for pre-fetching popular pages through a proxy, e.g.:

        wget -r -nd --delete-after http://whatever.com/~popular/page/

    The -r option is to retrieve recursively, and -nd to not create directories.

    Note that --delete-after deletes files on the local machine. It does not issue the DELE command to remote FTP sites, for instance. Also note that when --delete-after is specified, --convert-links is ignored, so .orig files are simply not created in the first place.

-k
--convert-links
    After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

    Each link will be changed in one of two ways:

    *   The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.

        Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ../bar/img.gif. This kind of transformation works reliably for arbitrary combinations of directories.

    *   The links to files that have not been downloaded by Wget will be changed to include the host name and absolute path of the location they point to.

        Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.

    Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet address rather than presenting a broken link. The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.

    Note that only at the end of the download can Wget know which links have been downloaded. Because of that, the work done by -k will be performed at the end of all the downloads.

-K
--backup-converted
    When converting a file, back up the original version with a .orig suffix. Affects the behavior of -N.

-m
--mirror
    Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
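    For instance (the host name is illustrative), a simple mirroring run is just:

        wget -m http://example.com/

    which, per the equivalence above, behaves like wget -r -N -l inf --no-remove-listing http://example.com/.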
-p
--page-requisites
    This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.

    Ordinarily, when downloading a single HTML page, any requisite documents that may be needed to display it properly are not downloaded. Using -r together with -l can help, but since Wget does not ordinarily distinguish between external and inlined documents, one is generally left with "leaf documents" that are missing their requisites.

    For instance, say document 1.html contains an "<IMG>" tag referencing 1.gif and an "<A>" tag pointing to external document 2.html. Say that 2.html is similar but that its image is 2.gif and it links to 3.html. Say this continues up to some arbitrarily high number.

    If one executes the command:

        wget -r -l 2 http://<site>/1.html

    then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded. As you can see, 3.html is without its requisite 3.gif because Wget is simply counting the number of hops (up to 2) away from 1.html in order to determine where to stop the recursion. However, with this command:

        wget -r -l 2 -p http://<site>/1.html

    all the above files and 3.html's requisite 3.gif will be downloaded. Similarly,

        wget -r -l 1 -p http://<site>/1.html

    will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded. One might think that:

        wget -r -l 0 -p http://<site>/1.html

    would download just 1.html and 1.gif, but unfortunately this is not the case, because -l 0 is equivalent to -l inf---that is, infinite recursion. To download a single HTML page (or a handful of them, all specified on the command line or in a -i URL input file) and its (or their) requisites, simply leave off -r and -l:

        wget -p http://<site>/1.html

    Note that Wget will behave as if -r had been specified, but only that single page and its requisites will be downloaded. Links from that page to external documents will not be followed. Actually, to download a single page and all its requisites (even if they exist on separate websites), and make sure the lot displays properly locally, this author likes to use a few options in addition to -p:

        wget -E -H -k -K -p http://<site>/<document>

    To finish off this topic, it's worth knowing that Wget's idea of an external document link is any URL specified in an "<A>" tag, an "<AREA>" tag, or a "<LINK>" tag other than "<LINK REL="stylesheet">".
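    Along the same lines (the URL is illustrative), a single article saved with its requisites and with links rewritten for local viewing could be fetched as:

        wget -p -k http://example.com/article.html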
--strict-comments
    Turn on strict parsing of HTML comments. The default is to terminate comments at the first occurrence of -->.

    According to specifications, HTML comments are expressed as SGML declarations. Declaration is special markup that begins with <! and ends with >, such as <!DOCTYPE ...>, that may contain comments between a pair of -- delimiters. HTML comments are "empty declarations", SGML declarations without any non-comment text. Therefore, <!--foo--> is a valid comment, and so is <!--one-- --two-->, but <!--1--2--3--> is not.

    On the other hand, most HTML writers don't perceive comments as anything other than text delimited with <!-- and -->, which is not quite the same. For example, something like <!------------> works as a valid comment as long as the number of dashes is a multiple of four (!). If not, the comment technically lasts until the next --, which may be at the other end of the document. Because of this, many popular browsers completely ignore the specification and implement what users have come to expect: comments delimited with <!-- and -->.

    Until version 1.9, Wget interpreted comments strictly, which resulted in missing links in many web pages that displayed fine in browsers, but had the misfortune of containing non-compliant comments. Beginning with version 1.9, Wget has joined the ranks of clients that implement "naive" comments, terminating each comment at the first occurrence of -->.

    If, for whatever reason, you want strict comment parsing, use this option to turn it on.

Recursive Accept/Reject Options

-A acclist --accept acclist
-R rejlist --reject rejlist
    Specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard characters, *, ?, [ or ], appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix.

-D domain-list
--domains=domain-list
    Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.

--exclude-domains domain-list
    Specify the domains that are not to be followed.

--follow-ftp
    Follow FTP links from HTML documents. Without this option, Wget will ignore all the FTP links.

--follow-tags=list
    Wget has an internal table of HTML tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, however, he or she should specify such tags in a comma-separated list with this option.

--ignore-tags=list
    This is the opposite of the --follow-tags option. To skip certain HTML tags when recursively looking for documents to download, specify them in a comma-separated list.

    In the past, this option was the best bet for downloading a single page and its requisites, using a command line like:

        wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>

    However, the author of this option came across a page with tags like "<LINK REL="home" HREF="/">" and came to the realization that specifying tags to ignore was not enough. One can't just tell Wget to ignore "<LINK>", because then stylesheets will not be downloaded. Now the best bet for downloading a single page and its requisites is the dedicated --page-requisites option.

--ignore-case
    Ignore case when matching files and directories. This influences the behavior of the -R, -A, -I, and -X options, as well as globbing implemented when downloading from FTP sites. For example, with this option, -A *.txt will match file1.txt, but also file2.TXT, file3.TxT, and so on.
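    As an illustration (the host and suffix list are hypothetical), the following accepts only image files during a recursive crawl; the list is quoted so the shell does not expand the wildcards:

        wget -r -A "*.jpg,*.png" http://example.com/gallery/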
-H
--span-hosts
    Enable spanning across hosts when doing recursive retrieving.

-L
--relative
    Follow relative links only. Useful for retrieving a specific home page without any distractions, not even those from the same hosts.

-I list
--include-directories=list
    Specify a comma-separated list of directories you wish to follow when downloading. Elements of list may contain wildcards.

-X list
--exclude-directories=list
    Specify a comma-separated list of directories you wish to exclude from download. Elements of list may contain wildcards.

-np
--no-parent
    Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
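    For example (the URL and the excluded directory are hypothetical), to mirror a manual subtree without ascending to the parent directory or entering a scripts directory:

        wget -r -np -X /cgi-bin http://example.com/manual/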
The "-d" won't go wolffd@0: out of its way to collect sensitive information, but wolffd@0: the log will contain a fairly complete transcript of wolffd@0: Wget's communication with the server, which may wolffd@0: include passwords and pieces of downloaded data. wolffd@0: Since the bug address is publically archived, you wolffd@0: may assume that all bug reports are visible to the wolffd@0: public. wolffd@0: wolffd@0: 4. If Wget has crashed, try to run it in a debugger, wolffd@0: e.g. "gdb `which wget` core" and type "where" to get wolffd@0: the backtrace. This may not work if the system wolffd@0: administrator has disabled core files, but it is wolffd@0: safe to try. wolffd@0: wolffd@0: SEE ALSO wolffd@0: This is not the complete manual for GNU Wget. For more wolffd@0: complete information, including more detailed explana- wolffd@0: tions of some of the options, and a number of commands wolffd@0: available for use with .wgetrc files and the -e option, wolffd@0: see the GNU Info entry for wget. wolffd@0: wolffd@0: AUTHOR wolffd@0: Originally written by Hrvoje Niksic wolffd@0: . Currently maintained by Micah wolffd@0: Cowan . wolffd@0: wolffd@0: COPYRIGHT wolffd@0: Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, wolffd@0: 2003, 2004, 2005, 2006, 2007, 2008 Free Software Founda- wolffd@0: tion, Inc. wolffd@0: wolffd@0: Permission is granted to copy, distribute and/or modify wolffd@0: this document under the terms of the GNU Free Documenta- wolffd@0: tion License, Version 1.2 or any later version published wolffd@0: by the Free Software Foundation; with no Invariant Sec- wolffd@0: tions, no Front-Cover Texts, and no Back-Cover Texts. A wolffd@0: copy of the license is included in the section entitled wolffd@0: "GNU Free Documentation License". wolffd@0: wolffd@0: wolffd@0: wolffd@0: GNU Wget 1.11.4 2008-06-29 WGET(1)