cp - Copy files and objects  |  Cloud Storage  |  Google Cloud

Synopsis

gsutil cp [OPTION]... src_url dst_url
gsutil cp [OPTION]... src_url... dst_url
gsutil cp [OPTION]... -I dst_url

Description

The gsutil cp command allows you to copy data between your local filesystem and the cloud, within the cloud, and between cloud storage providers. For example, to upload all text files from the local directory to a bucket, you can run:

gsutil cp *.txt gs://my-bucket

You can also download data from a bucket. The following command downloads all text files from the top-level of a bucket to your current directory:

gsutil cp gs://my-bucket/*.txt .

You can use the -n option to prevent overwriting the content of existing files. The following example downloads text files from a bucket without clobbering the data in your directory:

gsutil cp -n gs://my-bucket/*.txt .

Use the -r option to copy an entire directory tree. For example, to upload the directory tree dir:

gsutil cp -r dir gs://my-bucket

If you have a large number of files to transfer, you can perform a parallel multi-threaded/multi-processing copy using the top-level gsutil -m option:

gsutil -m cp -r dir gs://my-bucket

You can use the -I option with stdin to specify a list of URLs to copy, one per line. This allows you to use gsutil in a pipeline to upload or download objects as generated by a program:

cat filelist | gsutil -m cp -I gs://my-bucket

or:

cat filelist | gsutil -m cp -I ./download_dir

where the output of cat filelist is a list of files, cloud URLs, and wildcards of files and cloud URLs.

How Names Are Constructed

The gsutil cp command attempts to name objects in ways that are consistent with the Linux cp command. This means that names are constructed depending on whether you're performing a recursive directory copy or copying individually-named objects, or whether you're copying to an existing or non-existent directory.

When you perform recursive directory copies, object names are constructed to mirror the source directory structure starting at the point of recursive processing. For example, if dir1/dir2 contains the file a/b/c, then the following command creates the object gs://my-bucket/dir2/a/b/c:

gsutil cp -r dir1/dir2 gs://my-bucket

In contrast, copying individually-named files results in objects named by the final path component of the source files. For example, assuming again that dir1/dir2 contains a/b/c, the following command creates the object gs://my-bucket/c:

gsutil cp dir1/dir2/** gs://my-bucket

Note that in the above example, the '**' wildcard matches all names anywhere under dir1/dir2. The wildcard '*' matches names just one level deep. For more details, see URI wildcards.

The same rules apply for uploads and downloads: recursive copies of buckets and bucket subdirectories produce a mirrored filename structure, while copying individually or wildcard-named objects produces flatly-named files.
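
For example, under the same assumption that dir1/dir2 contains a/b/c (so the bucket holds gs://my-bucket/dir2/a/b/c), a recursive download like the first command below produces the mirrored local file dir2/a/b/c, while the wildcard download in the second command produces a flat local file named c:

gsutil cp -r gs://my-bucket/dir2 .
gsutil cp gs://my-bucket/dir2/** .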

In addition, the resulting names depend on whether the destination subdirectory exists. For example, if gs://my-bucket/subdir exists as a subdirectory, the following command creates the object gs://my-bucket/subdir/dir2/a/b/c:

gsutil cp -r dir1/dir2 gs://my-bucket/subdir

In contrast, if gs://my-bucket/subdir does not exist, this same gsutil cp command creates the object gs://my-bucket/subdir/a/b/c.

Copying To/From Subdirectories; Distributing Transfers Across Machines

You can use gsutil to copy to and from subdirectories by using a command like this:

gsutil cp -r dir gs://my-bucket/data

This causes dir and all of its files and nested subdirectories to be copied under the specified destination, resulting in objects with names like gs://my-bucket/data/dir/a/b/c. Similarly, you can download from bucket subdirectories using the following command:

gsutil cp -r gs://my-bucket/data dir

This causes everything nested under gs://my-bucket/data to be downloaded into dir, resulting in files with names like dir/data/a/b/c.

Copying subdirectories is useful if you want to add data to an existing bucket directory structure over time. It's also useful if you want to parallelize uploads and downloads across multiple machines (potentially reducing overall transfer time compared with running gsutil -m cp on one machine). For example, if your bucket contains this structure:

gs://my-bucket/data/result_set_01/
gs://my-bucket/data/result_set_02/
...
gs://my-bucket/data/result_set_99/

you can perform concurrent downloads across 3 machines by running these commands on each machine, respectively:

gsutil -m cp -r gs://my-bucket/data/result_set_[0-3]* dir
gsutil -m cp -r gs://my-bucket/data/result_set_[4-6]* dir
gsutil -m cp -r gs://my-bucket/data/result_set_[7-9]* dir

Note that dir could be a local directory on each machine, or a directory mounted off of a shared file server. The performance of the latter depends on several factors, so we recommend experimenting to find out what works best for your computing environment.

If both the source and destination URL are cloud URLs from the same provider, gsutil copies data "in the cloud" (without downloading to and uploading from the machine where you run gsutil). In addition to the performance and cost advantages of doing this, copying in the cloud preserves metadata such as Content-Type and Cache-Control. In contrast, when you download data from the cloud, it ends up in a file with no associated metadata, unless you have some way to keep or re-create that metadata.

Copies spanning locations and/or storage classes cause data to be rewritten in the cloud, which may take some time (but is still faster than downloading and re-uploading). Such operations can be resumed with the same command if they are interrupted, so long as the command parameters are identical.

Note that by default, the gsutil cp command does not copy the object ACL to the new object, and instead uses the default bucket ACL (see gsutil help defacl). You can override this behavior with the -p option.

When copying in the cloud, if the destination bucket has Object Versioning enabled, by default gsutil cp copies only live versions of the source object. For example, the following command causes only the single live version of gs://bucket1/obj to be copied to gs://bucket2, even if there are noncurrent versions of gs://bucket1/obj:

gsutil cp gs://bucket1/obj gs://bucket2

To also copy noncurrent versions, use the -A flag:

gsutil cp -A gs://bucket1/obj gs://bucket2

The top-level gsutil -m flag is not allowed when using the cp -A flag.

Checksum Validation

gsutil automatically performs checksum validation for copies to and from Cloud Storage. For more information, see Hashes and ETags.

Retry Handling

The cp command retries when failures occur, but if enough failures happen during a particular copy or delete operation, or if a failure isn't retryable, the cp command skips that object and moves on. If any failures were not successfully retried by the end of the copy run, the cp command reports the number of failures and exits with a non-zero status.

For details about gsutil's overall retry handling, see Retry strategy.

Resumable Transfers

gsutil automatically resumes interrupted downloads and interrupted resumable uploads, except when performing streaming transfers. In the case of an interrupted download, a partially downloaded temporary file is visible in the destination directory with the suffix _.gstmp in its name. Upon completion, the original file is deleted and replaced with the downloaded contents.

Resumable transfers store state information in files under ~/.gsutil, named by the destination object or file.

Use the gsutil help prod command for details on using resumable transfers in production.

Streaming Transfers

Use '-' in place of src_url or dst_url to perform a streaming transfer.
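
For example, a pipeline like the following (where collect_measurements and analyze_measurements stand in for hypothetical programs that write to and read from standard streams) performs a streaming upload and a streaming download, respectively:

collect_measurements | gsutil cp - gs://my-bucket/measurements
gsutil cp gs://my-bucket/measurements - | analyze_measurements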

Streaming uploads using the JSON API are buffered in memory part-way back into the file and can thus sometimes resume in the event of network or service problems.

gsutil does not support resuming streaming uploads using the XML API or resuming streaming downloads for either JSON or XML. If you have a large amount of data to transfer in these cases, we recommend that you write the data to a local file and copy that file rather than streaming it.

Sliced Object Downloads

gsutil can automatically use ranged GET requests to perform downloads in parallel for large files being downloaded from Cloud Storage. See the sliced object download documentation for a complete discussion.

Parallel Composite Uploads

gsutil can automatically use object composition to perform uploads in parallel for large, local files being uploaded to Cloud Storage. See the parallel composite uploads documentation for a complete discussion.

Changing Temp Directories

gsutil writes data to a temporary directory in several cases:

  • when compressing data to be uploaded (see the -z and -Z options)

  • when decompressing data being downloaded (for example, when the data has Content-Encoding:gzip as a result of being uploaded using gsutil cp -z or gsutil cp -Z)

  • when running integration tests using the gsutil test command

In these cases, it's possible the temporary file location on your system that gsutil selects by default may not have enough space. If gsutil runs out of space during one of these operations (for example, raising "CommandException: Inadequate temp space available to compress <your file>" during a gsutil cp -z operation), you can change where it writes these temp files by setting the TMPDIR environment variable. On Linux and macOS, you can set the variable as follows:

TMPDIR=/some/directory gsutil cp ...

You can also add this line to your ~/.bashrc file and restart the shell before running gsutil:

export TMPDIR=/some/directory

On Windows 7, you can change the TMPDIR environment variable from Start -> Computer -> System -> Advanced System Settings -> Environment Variables. You need to reboot after making this change for it to take effect. Rebooting is not necessary after running the export command on Linux and macOS.

Synchronizing Over OS-Specific File Types (Such As Symlinks And Devices)

Please see the section about OS-specific file types in gsutil help rsync. While that section refers to the rsync command, analogous points apply to the cp command.

Options

-a predef_acl

Applies the specific predefined ACL to uploaded objects. See "gsutil help acls" for further details.

-A

Copy all source versions from a source bucket or folder. If not set, only the live version of each source object is copied.

-c

If an error occurs, continue attempting to copy the remaining files. If any copies are unsuccessful, gsutil's exit status is non-zero, even if this flag is set. This option is implicitly set when running gsutil -m cp...

-D

Copy in "daisy chain" mode, which means copying between two bucketsby first downloading to the machine where gsutil is run, thenuploading to the destination bucket. The default mode is a"copy in the cloud," where data is copied between two buckets withoutuploading or downloading.

During a "copy in the cloud," a source composite object remains compositeat its destination. However, you can use "daisy chain" mode to change acomposite object into a non-composite object. For example:

gsutil cp -D gs://bucket/obj gs://bucket/obj_tmp
gsutil mv gs://bucket/obj_tmp gs://bucket/obj

-e

Exclude symlinks. When specified, symbolic links are not copied.
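
For example, a recursive upload of dir that skips any symbolic links it contains could look like this:

gsutil cp -r -e dir gs://my-bucket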

-I

Use stdin to specify a list of files or objects to copy. You can use gsutil in a pipeline to upload or download objects as generated by a program. For example:

cat filelist | gsutil -m cp -I gs://my-bucket

where the output of cat filelist is a one-per-line list of files, cloud URLs, and wildcards of files and cloud URLs.

-j <ext,...>

Applies gzip transport encoding to any file upload whose extension matches the -j extension list. This is useful when uploading files with compressible content such as .js, .css, or .html files. This also saves network bandwidth while leaving the data uncompressed in Cloud Storage.

When you specify the -j option, files being uploaded are compressed in-memory and on-the-wire only. Both the local files and Cloud Storage objects remain uncompressed. The uploaded objects retain the Content-Type and name of the original files.

Note that if you want to use the -m top-level option to parallelize copies along with the -j/-J options, your performance may be bottlenecked by the "max_upload_compression_buffer_size" boto config option, which is set to 2 GiB by default. You can change this compression buffer size to a higher limit. For example:

gsutil -o "GSUtil:max_upload_compression_buffer_size=8G" \ -m cp -j html,txt -r /local/source/dir gs://bucket/path
-J

Applies gzip transport encoding to file uploads. This option works like the -j option described above, but it applies to all uploaded files, regardless of extension.
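
For example, a command along these lines (reusing the source directory and bucket path from the -j example above) applies transport compression to every uploaded file, regardless of extension:

gsutil cp -J -r /local/source/dir gs://bucket/path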

-L <file>

Outputs a manifest log file with detailed information about each item that was copied. This manifest contains the following information for each item:

  • Source path.

  • Destination path.

  • Source size.

  • Bytes transferred.

  • MD5 hash.

  • Transfer start time and date in UTC and ISO 8601 format.

  • Transfer completion time and date in UTC and ISO 8601 format.

  • Upload id, if a resumable upload was performed.

  • Final result of the attempted transfer, either success or failure.

  • Failure details, if any.

If the log file already exists, gsutil uses the file as an input to the copy process, and appends log items to the existing file. Objects that are marked in the existing log file as having been successfully copied or skipped are ignored. Objects without entries are copied and ones previously marked as unsuccessful are retried. This option can be used in conjunction with the -c option to build a script that copies a large number of objects reliably, using a bash script like the following:

until gsutil cp -c -L cp.log -r ./dir gs://bucket; do
  sleep 1
done

The -c option enables copying to continue after failures occur, and the -L option allows gsutil to pick up where it left off without duplicating work. The loop continues running as long as gsutil exits with a non-zero status. A non-zero status indicates there was at least one failure during the copy operation.

-n

No-clobber. When specified, existing files or objects at the destination are not replaced. Any items that are skipped by this option are reported as skipped. gsutil performs an additional GET request to check if an item exists before attempting to upload the data. This saves gsutil from retransmitting data, but the additional HTTP requests may make small object transfers slower and more expensive.

-p

Preserves ACLs when copying in the cloud. Note that this option has performance and cost implications only when using the XML API, as the XML API requires separate HTTP calls for interacting with ACLs. You can mitigate this performance issue using gsutil -m cp to perform parallel copying. Note that this option only works if you have OWNER access to all objects that are copied. If you want all objects in the destination bucket to end up with the same ACL, you can avoid these performance issues by setting a default object ACL on that bucket instead of using cp -p. See gsutil help defacl.

Note that it's not valid to specify both the -a and -p options together.
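
For example, a command along these lines copies gs://bucket1/obj in the cloud while carrying its ACL over to the destination bucket (assuming you have OWNER access to the object):

gsutil cp -p gs://bucket1/obj gs://bucket2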

-P

Enables POSIX attributes to be preserved when objects are copied. gsutil cp copies fields provided by stat. These fields are the user ID of the owner, the group ID of the owning group, the mode or permissions of the file, and the access and modification time of the file. For downloads, these attributes are only set if the source objects were uploaded with this flag enabled.

On Windows, this flag only sets and restores access time and modification time. This is because Windows doesn't support POSIX uid/gid/mode.
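
For example, the following commands (using a hypothetical file name, data.csv) preserve POSIX attributes on upload and restore them on a later download:

gsutil cp -P data.csv gs://my-bucket
gsutil cp -P gs://my-bucket/data.csv .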

-R, -r

The -R and -r options are synonymous. They enable directories, buckets, and bucket subdirectories to be copied recursively. If you don't use this option for an upload, gsutil copies objects it finds and skips directories. Similarly, if you don't specify this option for a download, gsutil copies objects at the current bucket directory level and skips subdirectories.

-s <class>

Specifies the storage class of the destination object. If not specified, the default storage class of the destination bucket is used. This option is not valid for copying to non-cloud destinations.
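
For example, assuming nearline is a storage class available to your bucket, a command like the following uploads a hypothetical local file directly to that class rather than to the bucket's default class:

gsutil cp -s nearline data.csv gs://my-bucket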

-U

Skips objects with unsupported object types instead of failing. Unsupported object types include Amazon S3 objects in the GLACIER storage class.

-v

Prints the version-specific URL for each uploaded object. You can use these URLs to safely make concurrent upload requests, because Cloud Storage refuses to perform an update if the current object version doesn't match the version-specific URL. See generation numbers for more details.
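
For example, a command like the following uploads a local file (here, a hypothetical data.csv) and prints the resulting version-specific URL:

gsutil cp -v data.csv gs://my-bucket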

-z <ext,...>

Applies gzip content-encoding to any file upload whose extension matches the -z extension list. This is useful when uploading files with compressible content such as .js, .css, or .html files, because it reduces network bandwidth and storage sizes. This can both improve performance and reduce costs.

When you specify the -z option, the data from your files is compressed before it is uploaded, but your actual files are left uncompressed on the local disk. The uploaded objects retain the Content-Type and name of the original files, but have their Content-Encoding metadata set to gzip to indicate that the object data stored are compressed on the Cloud Storage servers and have their Cache-Control metadata set to no-transform.

For example, the following command:

gsutil cp -z html \
  cattypes.html tabby.jpeg gs://mycats

does the following:

  • The cp command uploads the files cattypes.html and tabby.jpeg to the bucket gs://mycats.

  • Based on the file extensions, gsutil sets the Content-Type of cattypes.html to text/html and tabby.jpeg to image/jpeg.

  • The -z option compresses the data in the file cattypes.html.

  • The -z option also sets the Content-Encoding for cattypes.html to gzip and the Cache-Control for cattypes.html to no-transform.

Because the -z/-Z options compress data prior to upload, they are not subject to the same compression buffer bottleneck that can affect the -j/-J options.

Note that if you download an object with Content-Encoding:gzip, gsutil decompresses the content before writing the local file.

-Z

Applies gzip content-encoding to file uploads. This option works like the -z option described above, but it applies to all uploaded files, regardless of extension.
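
For example, a command like the following (reusing the files from the -z example above) gzip-compresses both uploads before sending them, regardless of their extensions:

gsutil cp -Z cattypes.html tabby.jpeg gs://mycats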

--stet

If the STET binary can be found in boto or PATH, cp will use the split-trust encryption tool for end-to-end encryption.
