AWS Glacier Backups

A short list of commands to help zip or image data and content into mid-size chunks that can be stored in AWS Glacier vaults. Since backups on AWS S3 Glacier must be restored before they can be accessed, we want to keep the number of files reasonable. AWS also charges for each API request, which is an incentive to store fewer, bigger files. The maximum size of an archive uploaded in a single request is 4 GB; larger archives need a multipart upload. So let's aim for file sizes in the range of 100 MB to 3 GB.
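An archive may come out larger than the 4 GB single-request limit. One option is to split it into fixed-size parts and upload each part as its own archive. The sketch below assumes the standard `split` and `cat` tools, which ship with both macOS and Linux. The function names `split_archive` and `join_archive` and the `.part-` suffix are hypothetical.

```shell
# Hypothetical helpers: split a large archive into fixed-size parts before
# upload, and join the parts back together after retrieval.

# split_archive FILE SIZE
#   SIZE is passed straight to `split -b`, e.g. 2000m for ~2 GB parts.
#   Produces FILE.part-aa, FILE.part-ab, ...
split_archive() {
    local file="$1" size="$2"
    split -b "$size" "$file" "${file}.part-"
}

# join_archive FILE
#   Concatenates FILE.part-* (the glob expands in aa, ab, ac ... order) into FILE.
join_archive() {
    local file="$1"
    cat "${file}".part-* > "$file"
}
```

For example, running `split_archive pics-2008.tar.gz.enc 2000m` keeps every part inside the 100 MB to 3 GB range. After all parts have been retrieved, `join_archive pics-2008.tar.gz.enc` rebuilds the original file.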

Common commands:

# Determine total size of a dir (-s prints a single summary line)
du -sh /path/to/folder1
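A folder's total size determines whether it should become one archive, be bundled with other folders, or be split. The sketch below compares that size against the 100 MB to 3 GB target range. `size_class` is a hypothetical helper. It assumes MB and GB mean 1024-based units, matching the kilobyte blocks reported by `du -k`.

```shell
# size_class DIR
#   Prints "small", "ok" or "large" depending on where the total size of DIR
#   falls relative to the 100 MB - 3 GB target range.
size_class() {
    local kb
    # -s: one summary line, -k: size in 1024-byte blocks (same on macOS and Linux)
    kb=$(du -sk "$1" | awk '{print $1}')
    if [ "$kb" -lt $((100 * 1024)) ]; then
        echo small
    elif [ "$kb" -gt $((3 * 1024 * 1024)) ]; then
        echo large
    else
        echo ok
    fi
}
```

Folders reported as "small" are candidates to tar together. Folders reported as "large" can be split into several archives.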

# Create encrypted zip
zip -er file.zip /path/to/folder

# Create tar.gz file
# Check free space on the target volume first
df -h /Volumes/tmp
tar -czf /Volumes/tmp/folders.tar.gz folder1 folder2
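Instead of reading `df` output by eye, the free-space check can be scripted. The sketch below makes one assumption: the tarball needs at most as much space as the source folders take on disk. This is only approximate for data that is already compressed. `fits_on_volume` is a hypothetical helper, and it relies on the POSIX `df -P` output format.

```shell
# fits_on_volume TARGET_DIR SRC...
#   Returns 0 if the free space on the volume holding TARGET_DIR is at least
#   the on-disk size of all SRC folders combined, 1 otherwise.
fits_on_volume() {
    local target="$1"; shift
    local need avail
    # Sum the size of all sources in KB
    need=$(du -sk "$@" | awk '{ total += $1 } END { print total + 0 }')
    # -P keeps one line per filesystem; column 4 is the available space in KB
    avail=$(df -Pk "$target" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$need" ]
}
```

Usage: `fits_on_volume /Volumes/tmp folder1 folder2 && tar -czf /Volumes/tmp/folders.tar.gz folder1 folder2`.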

# Encrypt, then decrypt, using openssl (passphrase is the first line of storage.enc_key)
openssl aes-256-cbc -pass file:storage.enc_key -in file.txt -out file.enc
openssl aes-256-cbc -d -pass file:storage.enc_key -in file.enc -out file.txt
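The commands above read the passphrase from `storage.enc_key` but do not show how that file is created. The sketch below covers two steps: creating a random key file, and checking that an encrypt/decrypt round trip reproduces the original bytes. It assumes `openssl rand` as the random source. `make_key` and `roundtrip_ok` are hypothetical helper names. Keep a copy of the key file somewhere other than the backups, since without it the archives cannot be decrypted.

```shell
# make_key KEYFILE
#   Writes a random 32-byte passphrase (base64, one line) readable only by you.
make_key() {
    ( umask 077 && openssl rand -base64 32 > "$1" )
}

# roundtrip_ok KEYFILE FILE
#   Encrypts FILE, decrypts the result and checks it matches byte for byte.
roundtrip_ok() {
    local key="$1" file="$2" tmp rc
    tmp=$(mktemp -d)
    openssl aes-256-cbc -pass "file:$key" -in "$file" -out "$tmp/file.enc" &&
    openssl aes-256-cbc -d -pass "file:$key" -in "$tmp/file.enc" -out "$tmp/file.dec" &&
    cmp -s "$file" "$tmp/file.dec"
    rc=$?
    rm -rf "$tmp"
    return $rc
}
```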

# Create and extract encrypted tars (-f - writes/reads the archive on stdout/stdin)
tar -czf - folder1 folder2 | openssl aes-256-cbc -pass file:storage.enc_key -out /Volumes/tmp/folders.tar.gz.enc
mkdir -p /Volumes/tmp/abx
openssl aes-256-cbc -d -pass file:storage.enc_key -in /Volumes/tmp/folders.tar.gz.enc | tar -xzf - -C /Volumes/tmp/abx
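In the tar/openssl pipeline above, a failure in `tar` can go unnoticed, because a pipeline's exit status is that of its last command. The sketch below wraps backup and restore in functions that turn on `pipefail`. It also records a SHA-256 checksum next to the encrypted file, so a copy restored from Glacier can be verified before decryption. The function names and the `.sha256` sidecar file are hypothetical. The checksum uses `openssl dgst` so that no tool beyond openssl is required.

```shell
# backup_encrypted OUT KEYFILE SRC...
backup_encrypted() {
    local out="$1" key="$2"; shift 2
    (
        set -o pipefail
        # Fail if either tar or openssl fails
        tar -czf - "$@" | openssl aes-256-cbc -pass "file:$key" -out "$out"
    ) || return 1
    # Record a checksum of the encrypted file for later verification
    openssl dgst -sha256 < "$out" | awk '{print $NF}' > "$out.sha256"
}

# restore_encrypted IN KEYFILE DEST
restore_encrypted() {
    local in="$1" key="$2" dest="$3"
    # Refuse to decrypt if the file does not match the recorded checksum
    [ "$(openssl dgst -sha256 < "$in" | awk '{print $NF}')" = "$(cat "$in.sha256")" ] || return 1
    mkdir -p "$dest"
    (
        set -o pipefail
        openssl aes-256-cbc -d -pass "file:$key" -in "$in" | tar -xzf - -C "$dest"
    )
}
```

Usage: `backup_encrypted /Volumes/tmp/folders.tar.gz.enc storage.enc_key folder1 folder2`. Upload the `.sha256` file, or its contents, together with the archive.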

# Install & configure awscli
# Settings are stored in ~/.aws (config and credentials files)
brew install awscli
aws configure
    AWS Access Key ID:      <access-id>
    AWS Secret Access Key:  <secret-access>
    Default region name:    us-west-2
    Default output format:  json

# List vaults in Glacier
aws glacier list-vaults --account-id -

# List jobs in Glacier Vault (generic)
# After "initiate-job", check status using "list-jobs"
aws glacier list-jobs \
    --account-id - \
    --vault-name photos
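Inventory and retrieval jobs take hours, so checking `list-jobs` by hand gets tedious. The sketch below polls `aws glacier describe-job`, which reports a job's `StatusCode` (`InProgress`, `Succeeded` or `Failed`). `wait_for_job` and its default 15-minute interval are hypothetical choices.

```shell
# wait_for_job VAULT JOB_ID [INTERVAL_SECONDS]
#   Polls the job until it finishes; returns 0 on Succeeded, 1 on Failed
#   or if the aws call itself fails.
wait_for_job() {
    local vault="$1" job="$2" interval="${3:-900}" status
    while true; do
        status=$(aws glacier describe-job \
            --account-id - \
            --vault-name "$vault" \
            --job-id "$job" \
            --query StatusCode \
            --output text) || return 1
        case "$status" in
            Succeeded) return 0 ;;
            Failed)    return 1 ;;
            *)         sleep "$interval" ;;
        esac
    done
}
```

Usage: `wait_for_job photos "<job-id>" && aws glacier get-job-output ...`.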

# Get list of archives in Glacier Vault
# Job takes 3-5 hours to complete
aws glacier initiate-job \
    --account-id - \
    --vault-name photos \
    --job-parameters '{ "Type": "inventory-retrieval" }'

# Use "list-jobs" to get job-id
# File archiveList.json contains details of archives in vault
aws glacier get-job-output \
    --account-id - \
    --vault-name photos \
    --job-id "j6ig7qCeJ4Ortc-D83EgHsNxm3RriaAkyEFma37EU07Wxc_5BQfwllggqsgH_JfLusxIV" \
    archiveList.json
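The inventory in archiveList.json has an `ArchiveList` array with one entry per archive. Each entry carries fields such as `ArchiveId`, `ArchiveDescription`, `Size` and `CreationDate`. To find the `ArchiveId` needed for a retrieval job, the sketch below prints one line per archive. It assumes `python3` is installed, which avoids depending on `jq`. `list_archives` is a hypothetical helper.

```shell
# list_archives INVENTORY_JSON
#   Prints one tab-separated line per archive: ArchiveId, Size, ArchiveDescription
list_archives() {
    python3 - "$1" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    inventory = json.load(f)

for archive in inventory.get("ArchiveList", []):
    print("\t".join([
        archive.get("ArchiveId", ""),
        str(archive.get("Size", "")),
        archive.get("ArchiveDescription", ""),
    ]))
EOF
}
```

Usage: `list_archives archiveList.json | grep pics-2008`.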

# Upload archive to Glacier (small files, up to 4 GB in a single request)
aws glacier upload-archive \
    --account-id - \
    --vault-name photos \
    --body pics-2008.tar.gz
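`upload-archive` sends the whole file in one request, so it only works for archives up to 4 GB. The wrapper below refuses larger files before calling the CLI. It also sets `--archive-description` to the file name, so the inventory later shows which archive is which. `upload_small` is a hypothetical name. The limit is taken as 4 GiB (4,294,967,296 bytes), an assumption worth checking against the Glacier documentation.

```shell
# Assumed single-request limit: 4 GiB
MAX_SINGLE_UPLOAD=$((4 * 1024 * 1024 * 1024))

# upload_small VAULT FILE
upload_small() {
    local vault="$1" file="$2" size
    # wc -c works on macOS and Linux; tr strips the padding BSD wc adds
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -gt "$MAX_SINGLE_UPLOAD" ]; then
        echo "upload_small: $file is $size bytes, use a multipart upload" >&2
        return 1
    fi
    aws glacier upload-archive \
        --account-id - \
        --vault-name "$vault" \
        --archive-description "$(basename "$file")" \
        --body "$file"
}
```

Usage: `upload_small photos pics-2008.tar.gz`.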

# Upload archive to Glacier (large files)
# Uses the high-level upload API of the AWS SDK for Java
# aws-archive.sh (below) runs the Java program with the SDK jars on the classpath
aws-archive.sh photos pics-2008.tar.gz

# Download/Retrieve archive from Glacier
# Job takes 3-5 hours to complete
aws glacier initiate-job \
    --account-id - \
    --vault-name photos \
    --job-parameters file://archive-retrieval.json

# Use "list-jobs" to get job-id
aws glacier get-job-output \
    --account-id - \
    --vault-name photos \
    --job-id "xGvIJyQPC9weheMNwIf4s2z8Zct1lYGvjzdxz84VwhD3qadoOkMGo-FYaLJ5psLKhhcFDjC1n" \
    pics-2008.tar.gz


archive-retrieval.json

{
  "Type": "archive-retrieval",
  "ArchiveId": "AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUj",
  "Description": "Retrieve SQL dump for audit team",
  "SNSTopic": "arn:aws:sns:us-west-2:112233445566:glacier-sandbox"
}
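Writing archive-retrieval.json by hand for every restore is error-prone. The helper below generates it from an archive ID, for example one taken from the inventory. `make_retrieval_json` is hypothetical, and the SNS topic argument is optional. The description must not contain double quotes, because the helper does no JSON escaping.

```shell
# make_retrieval_json ARCHIVE_ID DESCRIPTION [SNS_TOPIC_ARN]
#   Prints job parameters for: aws glacier initiate-job --job-parameters file://...
make_retrieval_json() {
    local id="$1" desc="$2" topic="${3:-}"
    printf '{\n'
    printf '  "Type": "archive-retrieval",\n'
    printf '  "ArchiveId": "%s",\n' "$id"
    if [ -n "$topic" ]; then
        printf '  "Description": "%s",\n' "$desc"
        printf '  "SNSTopic": "%s"\n' "$topic"
    else
        # No trailing comma when SNSTopic is omitted
        printf '  "Description": "%s"\n' "$desc"
    fi
    printf '}\n'
}
```

Usage: `make_retrieval_json "<archive-id>" "Restore pics-2008" > archive-retrieval.json`.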


aws-archive.sh

#!/bin/bash

CLASSPATH="/Users/bhira/Code/programming"
CLASSPATH="$CLASSPATH:/Users/bhira/Code/lib/aws-java-sdk-1.11.245.jar"
CLASSPATH="$CLASSPATH:/Users/bhira/Code/lib/commons-logging-1.2.jar"
CLASSPATH="$CLASSPATH:/Users/bhira/Code/lib/jackson-databind.jar"
CLASSPATH="$CLASSPATH:/Users/bhira/Code/lib/jackson-core-2.2.3.jar"
CLASSPATH="$CLASSPATH:/Users/bhira/Code/lib/jackson-annotations-2.1.2.jar"
CLASSPATH="$CLASSPATH:/Users/bhira/Code/lib/httpclient-4.5.4.jar"
CLASSPATH="$CLASSPATH:/Users/bhira/Code/lib/httpcore-4.4.7.jar"
CLASSPATH="$CLASSPATH:/Users/bhira/Code/lib/joda-time-2.9.9.jar"

CLASSNAME="AWSArchiveUpload"

# Quote variables so paths and arguments containing spaces are passed intact
java -cp "$CLASSPATH" "$CLASSNAME" "$@"
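Listing every jar by hand means editing aws-archive.sh whenever a library is upgraded. An alternative is to build the classpath from every jar in the lib directory, as sketched below with the hypothetical helper `build_classpath`. Java's own wildcard classpath entries (`-cp 'dir/*'`) would be another option.

```shell
# build_classpath CLASSES_DIR LIB_DIR
#   Prints CLASSES_DIR followed by every *.jar in LIB_DIR, separated by ':'
build_classpath() {
    local cp="$1" jar
    for jar in "$2"/*.jar; do
        # Skip the literal pattern when LIB_DIR contains no jars
        [ -e "$jar" ] || continue
        cp="$cp:$jar"
    done
    printf '%s\n' "$cp"
}
```

Usage: `CLASSPATH=$(build_classpath /Users/bhira/Code/programming /Users/bhira/Code/lib)`.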


References:
http://docs.aws.amazon.com/cli/latest/reference/glacier/index.html
https://www.madboa.com/blog/2016/09/23/glacier-cli-intro/
http://docs.aws.amazon.com/amazonglacier/latest/dev/uploading-an-archive-single-op-using-java.html