largefiles

track large binary files

Description

Large binary files tend to be not very compressible, not very diffable, and not at all mergeable. Such files are not handled efficiently by Mercurial's storage format (revlog), which is based on compressed binary deltas; storing large binary files as regular Mercurial files wastes bandwidth and disk space and increases Mercurial's memory usage. The largefiles extension addresses these problems by adding a centralized client-server layer on top of Mercurial: largefiles live in a central store out on the network somewhere, and you only fetch the revisions that you need when you need them.

largefiles works by maintaining a "standin file" in .hglf/ for each largefile. The standins are small (41 bytes: an SHA-1 hash plus newline) and are tracked by Mercurial. Largefile revisions are identified by the SHA-1 hash of their contents, which is written to the standin. largefiles uses that revision ID to get/put largefile revisions from/to the central store. This saves both disk space and bandwidth, since you don't need to retrieve all historical revisions of large files when you clone or pull.
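
The relationship between a largefile and its standin can be sketched in a few lines of Python. This is only an illustration of the "SHA-1 hash plus newline" layout described above, not the extension's own code; the helper names standin_content and standin_path are hypothetical, and the assumption that a standin sits at the largefile's own path under .hglf/ is inferred from the description.

```python
import hashlib


def standin_content(data: bytes) -> bytes:
    """Return the bytes a standin would hold for the given largefile contents."""
    # 40 hexadecimal SHA-1 characters followed by a newline: 41 bytes in total.
    return hashlib.sha1(data).hexdigest().encode("ascii") + b"\n"


def standin_path(largefile_path: str) -> str:
    """Hypothetical helper: place the standin under .hglf/ at the same relative path."""
    return ".hglf/" + largefile_path


content = standin_content(b"example largefile contents")
print(len(content))                # 41
print(standin_path("randomdata"))  # .hglf/randomdata
```

Because the revision ID depends only on the file contents, two largefiles with identical contents share the same ID in this sketch.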

To start a new repository or add new large binary files, just add --large to your hg add command. For example:

$ dd if=/dev/urandom of=randomdata count=2000
$ hg add --large randomdata
$ hg commit -m "add randomdata as a largefile"

When you push a changeset that adds/modifies largefiles to a remote repository, its largefile revisions will be uploaded along with it. Note that the remote Mercurial must also have the largefiles extension enabled for this to work.

When you pull a changeset that affects largefiles from a remote repository, the largefiles for the changeset are not pulled down by default. However, when you update to such a revision, any largefiles needed by that revision are downloaded and cached (if they have never been downloaded before). One way to pull largefiles when pulling is thus to use --update, which will update your working copy to the latest pulled revision (and thereby download any new largefiles).

If you want to pull largefiles that you do not yet need for an update, then you can use pull with the --lfrev option or the hg lfpull command.

If you know you are pulling from a non-default location and want to download all the largefiles that correspond to the new changesets at the same time, then you can pull with --lfrev "pulled()".

If you just want to ensure that you will have the largefiles needed to merge or rebase with new heads that you are pulling, then you can pull with the --lfrev "head(pulled())" option to pre-emptively download any largefiles that are new in the heads you are pulling.

Keep in mind that network access may now be required to update to changesets that you have not previously updated to. The nature of the largefiles extension means that updating is no longer guaranteed to be a local-only operation.
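
Whether an update needs the network comes down to which largefile revisions are already cached locally. The following Python sketch models that check under simplifying assumptions: hashes are plain strings, the cache is a set, and the function name largefiles_to_download is hypothetical. It does not reflect how the extension actually organizes its cache.

```python
def largefiles_to_download(standin_hashes, cached_hashes):
    """Return the hashes a revision needs that are absent from the local cache."""
    cached = set(cached_hashes)
    missing = []
    for h in standin_hashes:
        # Keep the first occurrence only, so each missing revision is listed once.
        if h not in cached and h not in missing:
            missing.append(h)
    return missing


needed = ["aaa", "bbb", "aaa", "ccc"]  # placeholder IDs, not real SHA-1 hashes
cache = {"bbb"}
missing = largefiles_to_download(needed, cache)
print(missing)        # ['aaa', 'ccc']
print(bool(missing))  # True: this update would require network access
```

An empty result corresponds to an update that can complete locally.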

If you already have large files tracked by Mercurial without the largefiles extension, you will need to convert your repository in order to benefit from largefiles. This is done with the hg lfconvert command:

$ hg lfconvert --size 10 oldrepo newrepo

In repositories that already have largefiles in them, any new file over 10MB will automatically be added as a largefile. To change this threshold, set largefiles.minsize in your Mercurial config file to the minimum size in megabytes to track as a largefile, or use the --lfsize option to the add command (also in megabytes):

[largefiles]
minsize = 2

$ hg add --lfsize 2

The largefiles.patterns config option allows you to specify a list of filename patterns (see hg help patterns) that should always be tracked as largefiles:

[largefiles]
patterns =
  *.jpg
  re:.*\.(png|bmp)$
  library.zip
  content/audio/*

Files that match one of these patterns will be added as largefiles regardless of their size.

The largefiles.minsize and largefiles.patterns config options will be ignored for any repositories not already containing a largefile. To add the first largefile to a repository, you must explicitly do so with the --large flag passed to the hg add command.
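
The rules above for when hg add treats a file as a largefile can be summarized in a short Python sketch. It is a simplified model rather than the extension's implementation: the function name added_as_largefile is hypothetical, fnmatch stands in for Mercurial's richer pattern syntax (it does not understand re: patterns), a megabyte is assumed to be 2**20 bytes, and a file exactly at the threshold is assumed to qualify.

```python
from fnmatch import fnmatch

MB = 1024 * 1024  # assumption: one megabyte is 2**20 bytes


def added_as_largefile(name, size_bytes, repo_has_largefiles,
                       large_flag=False, minsize_mb=10, patterns=()):
    """Decide whether a newly added file would be tracked as a largefile."""
    if large_flag:
        # --large always wins, and is the only way to add the first largefile.
        return True
    if not repo_has_largefiles:
        # minsize and patterns are ignored until the repository has a largefile.
        return False
    if any(fnmatch(name, pattern) for pattern in patterns):
        # A pattern match applies regardless of size.
        return True
    # Assumption: the threshold is inclusive ("minimum size").
    return size_bytes >= minsize_mb * MB


print(added_as_largefile("photo.jpg", 100, True, patterns=["*.jpg"]))   # True
print(added_as_largefile("photo.jpg", 100, False, patterns=["*.jpg"]))  # False
print(added_as_largefile("notes.txt", 100, False, large_flag=True))     # True
```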

Commands

Uncategorized commands

lfconvert

convert a normal repository to a largefiles repository:

hg lfconvert SOURCE DEST [FILE ...]

Convert repository SOURCE to a new repository DEST, identical to SOURCE except that certain files will be converted as largefiles: specifically, any file that matches any FILE pattern given on the command line or whose size is above the minimum size threshold is converted as a largefile. The size used to determine whether or not to track a file as a largefile is the size of the first version of the file. The minimum size can be specified either with --size or in configuration as largefiles.minsize.

After running this command you will need to make sure that largefiles is enabled anywhere you intend to push the new repository.

Use --to-normal to convert largefiles back to normal files; after this, the DEST repository can be used without largefiles at all.

Options:

-s, --size <SIZE>
 minimum size (MB) for files to be converted as largefiles
--to-normal
 convert from a largefiles repo to a normal repo

lfpull

pull largefiles for the specified revisions from the specified source:

hg lfpull -r REV... [-e CMD] [--remotecmd CMD] [SOURCE]

Pull largefiles that are referenced from local changesets but missing locally, pulling from a remote repository to the local cache.

If SOURCE is omitted, the 'default' path will be used. See hg help urls for more information.

Some examples:

  • pull largefiles for all branch heads:

    hg lfpull -r "head() and not closed()"
    
  • pull largefiles on the default branch:

    hg lfpull -r "branch(default)"
    

Options:

-r, --rev <VALUE[+]>
 pull largefiles for these revisions
-e, --ssh <CMD>
 specify ssh command to use
--remotecmd <CMD>
 specify hg command to run on the remote side
--insecure
 do not verify server certificate (ignoring web.cacerts config)

[+] marked option can be specified multiple times