The first part of this article covers preliminary considerations for configuring the Slingshot engine for replication jobs. If you're already sure how you'd like it configured, skip ahead to the "Configuring Replication Jobs" section.
Summary of Slingshot's replication features in EVO
EVO's Slingshot automation engine has been expanded to include replication functionality. A new type of job can be created specifically for backup and other redundancy goals where complex directory structures need to be handled.
EVO's replication jobs are flexible: The source files can be located on EVO, but it is not required. In fact, a job can be created to handle files from a source outside of EVO to a destination outside of EVO. Jobs can also be created to handle file replication completely internally to EVO.
Sources and destinations can be shares hosted by EVO, USB drives hosted by EVO, remote SMB storage or an AWS S3 bucket.
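Replication jobs are intended to be one-way (master-slave): the source is treated as authoritative, and changes flow from the source to the destination only.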
Differences between sync and copy
Each replication job can be set to either "Sync" mode or "Copy" mode. These options let you choose how a replication job reacts to files deleted at the source. When files are deleted at the source, the replication job will do one of two things: either it will ignore the deletion (keeping the files on the destination), or it will notice that a file has been deleted and make the destination match.
Use "Sync" when the destination should represent the current state of the source. Any changes you make to the source directory or share will be carried over. New files will be replicated, files that were moved will only appear in the new path and files that are deleted from the source will be removed from the destination as well.
Use "Copy" (default) when the destination should retain everything in the state it was last sent. New files will be added just like in "Sync" mode, but when a file is deleted from the source it will be kept on the destination. Files that are moved or renamed will be copied to the destination again using the new name and path, and the original file name and path will also be retained on the destination. This can result in multiple duplicate copies on the destination if files are frequently moved at the source.
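The difference between the two modes can be illustrated with a small model. This is a conceptual sketch only, not Slingshot's implementation: the `replicate` function and the dictionaries standing in for a source and a destination are hypothetical.

```python
def replicate(source: dict, destination: dict, mode: str = "copy") -> dict:
    """Return the destination contents after one run of a hypothetical job.

    Both arguments map relative paths to file contents.
    """
    if mode not in ("copy", "sync"):
        raise ValueError("mode must be 'copy' or 'sync'")
    result = dict(destination)
    # Both modes send new and changed files from the source.
    result.update(source)
    if mode == "sync":
        # Sync also removes anything that no longer exists on the source.
        for path in list(result):
            if path not in source:
                del result[path]
    return result


# A file is moved on the source after the first run.
first_run = {"a/clip.mov": "v1", "notes.txt": "n1"}
after_move = {"b/clip.mov": "v1", "notes.txt": "n1"}

dest = replicate(first_run, {}, mode="copy")
copy_result = replicate(after_move, dest, mode="copy")
sync_result = replicate(after_move, dest, mode="sync")

print(sorted(copy_result))  # ['a/clip.mov', 'b/clip.mov', 'notes.txt']
print(sorted(sync_result))  # ['b/clip.mov', 'notes.txt']
```

In the "copy" result the moved file exists under both the old and the new path, which is the duplication described above.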
Choosing the path for your destination and why it might matter
As you can imagine, Sync mode and Copy mode each have their own benefits and possible drawbacks.
As with any backup plan, it's important to review the data at the destination before and after running a job to make sure it contains what you expect. Whenever possible, create a subdirectory to use as the destination for replication jobs rather than using the root of a share. This is especially important if you intend to use "Sync" mode. "Sync" mode makes the destination match the source, which means that any data already at the selected destination that is not present on the source could be removed. Again: "Sync" mode will delete all existing data from the destination that is not on the source! Unless you're using an empty volume as the destination, this can be very destructive. Creating a new directory as the destination ensures that this doesn't create a problem, because replication jobs will not interact with anything "above" the destination path.
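One way to review a destination before pointing a "Sync" job at it is to list the files that exist there but not on the source. The sketch below assumes both locations are mounted as ordinary directories on a workstation; `files_sync_would_remove` is a hypothetical helper, not part of EVO, and it compares by relative path only.

```python
import tempfile
from pathlib import Path


def files_sync_would_remove(source_root: str, dest_root: str) -> list[str]:
    """List destination files (relative paths) with no counterpart on the source."""
    src = Path(source_root)
    dst = Path(dest_root)
    source_files = {p.relative_to(src).as_posix() for p in src.rglob("*") if p.is_file()}
    dest_files = {p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file()}
    return sorted(dest_files - source_files)


# Demo with throwaway directories standing in for a source and a destination.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "source")
    dst = Path(tmp, "destination")
    (src / "project").mkdir(parents=True)
    (dst / "project").mkdir(parents=True)
    (src / "project" / "edit.prproj").write_text("current")
    (dst / "project" / "edit.prproj").write_text("older copy")
    (dst / "unrelated_archive.zip").write_text("only on destination")
    at_risk = files_sync_would_remove(str(src), str(dst))

print(at_risk)  # ['unrelated_archive.zip']
```

An empty list suggests the destination holds nothing that a "Sync" run would be expected to delete.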
Consider how much free space is available on the destination.
It's important to ensure that the destination does not fill to 100% capacity. The free space requirements differ somewhat depending on whether Sync mode or Copy mode is selected. Copy mode has the potential to need much more capacity on the destination than on the source, because files removed from the source are retained on the destination. Sync mode will remove files from the destination that have been deleted on the source, but additional capacity is still needed in some situations. By default, new files are copied before the Replication job checks for files that need to be removed. This means that the total size of the destination can temporarily exceed the source while a sync job is running. If your workflow involves replacing lots of large files, the capacity demand at the destination could be up to twice the capacity of the source. It is possible to reverse this logic by enabling "Remove deleted files (at destination) before sync'ing new files" in the Preferences section of the job.
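The arithmetic behind the "up to twice" figure can be sketched as below. The function name, its parameters, and the example sizes are illustrative assumptions; actual usage also depends on file system overhead and anything else stored on the destination.

```python
def peak_destination_usage(current_dest_bytes: int, incoming_bytes: int,
                           removed_bytes: int, remove_first: bool = False) -> int:
    """Estimate the peak space used on the destination during one Sync run."""
    if remove_first:
        # Deleted files are cleared before new files arrive.
        after_run = current_dest_bytes - removed_bytes + incoming_bytes
        return max(current_dest_bytes, after_run)
    # Default order: new files arrive before deleted files are cleared.
    return current_dest_bytes + incoming_bytes


TB = 10**12
# 10 TB already replicated. On the source, 4 TB of old files were deleted
# and 4 TB of new files were added under different names.
default_peak = peak_destination_usage(10 * TB, 4 * TB, 4 * TB)
reversed_peak = peak_destination_usage(10 * TB, 4 * TB, 4 * TB, remove_first=True)

print(default_peak / TB)   # 14.0
print(reversed_peak / TB)  # 10.0
```

If the entire 10 TB were replaced in one cycle, the default order would peak at 20 TB, twice the size of the source.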
Use a common time server for sources and destinations
Replication jobs evaluate what data should be synchronized or copied based on a number of factors, including the modification time and the file size. It's important to ensure that the reference clock on EVO matches the workstations and any other server being used as a source or destination. The easiest way to keep everything synchronized is to have all systems configured to use the same NTP (time) server. Also be sure to check that the time zone matches on each system.
Set the destination to be Read Only for everything but the sending system
If a file is modified at the source it will need to be updated on the destination. That means that an existing file on the destination will need to be overwritten. Before doing that, the replication job will check to make sure that the file being overwritten isn't actually newer than the file at the source. If the destination file is newer than the source, this creates a conflict in which there's no way to know which is the "correct" file to preserve. If this situation occurs, the replication job will skip overwriting the destination to avoid data loss. A user must manually resolve this conflict, but it's not always obvious that a conflict has occurred unless the transfer logs are being monitored. One way to avoid this situation is to only grant write permission on the destination to the EVO handling replication jobs. In this configuration, other users can safely read files on the destination, but modification of the files at the destination is then prevented outside of administrator maintenance.
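The conflict rule described above can be modeled roughly as follows. This is a conceptual sketch, not Slingshot's actual algorithm: the `FileInfo` class, the `decide` function, and the order of the comparisons are assumptions. The article only states that modification time and file size are among the factors considered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def decide(source: FileInfo, dest: Optional[FileInfo]) -> str:
    """Return the action a hypothetical job would take for one file."""
    if dest is None:
        return "copy"            # not on the destination yet
    if dest.mtime > source.mtime:
        return "conflict-skip"   # destination is newer: leave it alone
    if dest.mtime == source.mtime and dest.size == source.size:
        return "up-to-date"      # nothing to do
    return "overwrite"           # source is newer or differs in size


print(decide(FileInfo(100, 1000.0), None))                   # copy
print(decide(FileInfo(120, 2000.0), FileInfo(100, 1000.0)))  # overwrite
print(decide(FileInfo(100, 1000.0), FileInfo(100, 1000.0)))  # up-to-date
print(decide(FileInfo(100, 1000.0), FileInfo(130, 3000.0)))  # conflict-skip
```

The last case is the one that a read-only destination prevents: if nothing but the sending EVO can write there, the destination copy should never be newer than the source.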
Consider how to handle your Recycle Bin when using Sync mode
If you're synchronizing a source share from an EVO that has its recycle bin enabled, consider whether the contents of that recycle bin should be replicated to the destination. For example, if your destination is metered, you may want to set your job to exclude the recycle bin's contents.
In general we suggest not including the recycle bin in the replication, but there are some situations where it may be desired.
Here are some recommendations based on typical cases:
|Scenario||Recommended configuration||Reasoning|
|Syncing EVO to S3||Enable recycle bin on EVO (source), do not include the recycle bin in the replication job.||S3 is metered storage, so including the recycle bin in the replication job will result in higher data usage. It may also be difficult to interact with the recycle bin via S3.|
|Syncing EVO to another EVO||Enable recycle bin on the source EVO, do not include the recycle bin in the replication job, do not enable recycle bin on the destination EVO.||If both EVO systems are being used in production, system resources including network and disk should be available to editors as much as possible. Replicating the recycle bin would consume more of those resources for files that are likely not needed (because they were deleted by someone).|
|Syncing EVO to EVO Nearline||Disable recycle bin on the source EVO, do not include the recycle bin in the replication job, enable recycle bin on the destination EVO.||This configuration allows users to free up space on their tier 1 EVO storage more quickly. Deleted files will be handled by the recycle bin on the Nearline. Restoring files in this configuration will require using the Nearline's trash browser and manually copying the file back to the original share.|
|Syncing EVO to local non-EVO storage||Enable recycle bin on the EVO (source), include the recycle bin in the replication job.||This option highlights where recycle bin redundancy can be useful. This configuration extends some of EVO's recycle bin functionality outside of EVO and results in an additional measure of data protection.|
Verify data integrity
Features like replication are convenient automatic processes that reduce the need for regular human intervention in normal system operations. However, from time to time it's good to check in on the process to make sure everything is working as expected. The details will be unique to every situation, but periodic verification should include at least the following regular tasks:
- Check the Replication job summary file for any errors or warnings that might indicate trouble
- Check that the directory structure matches expectations
- Check that modification times for recently edited files are correct
It's also recommended that file integrity be confirmed from time to time via a full hash of all files. Depending on the features available, this may require all files be "read" by the system performing the verification.
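A full-hash comparison could look something like the sketch below. It assumes the source and destination are both mounted as ordinary directories on the machine running the check; the helper names are hypothetical and SHA-256 is simply one reasonable choice of hash. Every file is read in full on both sides, so a run on a large share can take a long time.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large media files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(source_root: Path, dest_root: Path) -> list[str]:
    """Relative paths that are missing on the destination or differ in content."""
    problems = []
    for src_file in sorted(source_root.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source_root)
        dst_file = dest_root / rel
        if not dst_file.is_file() or sha256_of(src_file) != sha256_of(dst_file):
            problems.append(rel.as_posix())
    return problems


# Demo with throwaway directories standing in for a source and a destination.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    src.mkdir()
    dst.mkdir()
    (src / "a.bin").write_bytes(b"same")
    (dst / "a.bin").write_bytes(b"same")
    (src / "b.bin").write_bytes(b"original")
    (dst / "b.bin").write_bytes(b"corrupted")
    (src / "c.bin").write_bytes(b"never replicated")
    problems = mismatched_files(src, dst)

print(problems)  # ['b.bin', 'c.bin']
```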
If there are any changes to network connectivity or server credentials, make sure to re-verify that replication jobs can complete successfully. If a job fails, the summary or detailed logs can assist in determining what went wrong.
Be aware of file attributes
Replication jobs can preserve ShareBrowser metadata (tags and comments) by retaining that information in an internal database.
However, it is not currently possible to preserve extended metadata (i.e. extended attributes/xattr)—such as Finder-level colors and Finder-level tags associated with files and folders—in the replication process. This is a common consideration when moving data from one file system to another. If preservation of extended metadata is important, it will be necessary to add files to a compressed archive that preserves this information while the file is still on the source. To test, make a replication job that sends a sample file to the destination and a second replication job that pulls it back to the original source (in a different directory) or another location where the file can be analyzed. It should be possible to review the resulting file to verify if significant attributes have been retained.
Filter files manually for Replication Job
It is possible to include or exclude files from a job based on their names. These options are available in the Preferences section of each job and use regular expressions to match whole filenames or partial filename patterns. More information about regular expressions and their use, along with a utility for testing custom examples, is available at https://regexr.com/.
If you'd like to exclude a full filename like ".DS_Store" only the filename itself is needed. Check the box next to "Exclude files matching pattern" and fill in the field to the right with any filename or Regular Expression Pattern desired.
Note that this specific example only applies when using an external share as the source. When using a local EVO share as the source .DS_Store files are ignored automatically.
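Patterns can be tried out locally before they are entered into a job. The sketch below uses Python's `re` module as a stand-in; the article does not state which regular expression flavor Slingshot uses or whether a pattern must match the whole filename, so treat the results as a guide and confirm them with a test job. The `is_excluded` helper and the sample filenames are hypothetical.

```python
import re


def is_excluded(filename: str, pattern: str) -> bool:
    """True if the pattern matches anywhere in the filename (re.search semantics)."""
    return re.search(pattern, filename) is not None


names = [".DS_Store", "clip_001.mov", "render.tmp", "Thumbs.db", "notes.tmp.txt"]

# A literal filename; the dot is escaped so it matches only a dot.
print([n for n in names if is_excluded(n, r"\.DS_Store")])  # ['.DS_Store']

# Any file whose name ends in .tmp
print([n for n in names if is_excluded(n, r"\.tmp$")])  # ['render.tmp']

# Either of two exact filenames
print([n for n in names if is_excluded(n, r"^(Thumbs\.db|\.DS_Store)$")])  # ['.DS_Store', 'Thumbs.db']
```

An unescaped dot matches any single character, so a plain `.DS_Store` pattern still matches the intended file. Escaping the dot only makes the match stricter.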
Configuring Replication Jobs
Replication Jobs are a subset of Slingshot automations that are geared toward backing up, restoring, or transferring files or entire shares. These jobs can be scheduled to run automatically and can even pause during normal working hours. In addition to traditional backup tasks, this feature can be used to migrate data from systems that are being replaced or to maintain a synchronized nearline share. It is even possible to initiate jobs between two external shares (server to server operation), without the need to tie up a workstation’s resources.
First, ensure the Slingshot database is configured for automatic backup. This will ensure any configured jobs can be restored in the event plugins are upgraded or the OS disk ever needs to be replaced.
This is configured at the Slingshot > Administration page:
Enable the scheduled backup to select and save the frequency, and set and save the Backup and Restore Locations:
Depending on what kind of Replication Job you'd like to configure, there may be some tasks that need to be completed before actually creating a job. Login credentials aren’t entered when defining the job itself. Instead, they are handled differently depending on the type of server outside of EVO that will be used.
Local EVO to EVO jobs
If data is being transferred between two locations on the same EVO, no special preparation is needed. The user creating the Replication job will need access to the share(s).
Using External NAS (SMB) shares
Replication jobs can interact with remote SMB shares very much like a local share by adding a remote connection. You can define a new remote connection from the “External NAS Shares” section at the bottom of the NAS & Project Sharing page or from the link at the top of any “Create job” page. These connections will enable EVO to act as an SMB client and mount the remote share locally. This enables any replication job using the remote share to take advantage of high speed connectivity if it is available. It also allows users to manage the protocol version, security mode, and monitor the current status of external connections.
To add a connection:
- Enter a unique Connection Name for the external share
- Enter the IP address or hostname of the remote server hosting an SMB share
- Enter a valid username and password
- Click the “Refresh” button to retrieve a list of available shares from the remote server
- Select the desired share
- Select either the root of the share or a root level directory
- Click “Create” to attempt to connect to the external share
Note: The best practice is to make the Connection Name match the Share Name, but this is not required. The Connection Name defined may also be displayed in ShareBrowser if metadata is copied to the remote share.
External NAS Examples:
One important consideration is whether the remote server is a second EVO that has had AD (Active Directory) configured. To mount an EVO share using an EVO-local user when that EVO is joined to AD, the client (in this case the EVO performing Replication tasks) needs to be instructed to use a non-domain user. This is done by prepending a non-domain prefix to the user's name, like `evo\username`.
Amazon AWS S3
Replicating data between EVO and an S3 bucket is possible by storing the required credentials in an Alias. An Alias is a set of stored credentials that are used across various Slingshot features to connect to other servers. For Replication Jobs they are only needed if you will be using Amazon S3.
Create an Alias using the “Alias” option under the “Slingshot” menu in the sidebar of EVO's web interface or from the link at the top of any “Create job” page. Using this utility you can add as many aliases as you need:
- Create a unique name for each Alias
- Select the Schema type “Amazon S3” from the dropdown menu
- Select the Region where your Amazon S3 bucket is accessible (endpoint)
- Enter the Bucket name (this must match exactly)
- Enter your AWS AccessKey and AWS SecretKey
Note: The access key must be for a user who has full write access to the bucket in order to add any data to the bucket. Also, only Schema “Amazon S3” will be seen by Replication Jobs.
Configuring a new job
Replication Jobs can be configured using EVO’s web GUI. You can access this interface by entering EVO’s IP address or domain name into a browser address bar.
Any user can create a job; jobs will only be visible to the user who created them.
Click to begin.
Job names must be unique alphanumeric names, so avoid special characters and spaces. Underscores and dashes are allowed.
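The naming rule can be checked ahead of time with a short pattern. This sketch covers only the character rule stated above (letters, digits, underscores and dashes); it does not check uniqueness, and any additional limits EVO may enforce, such as a maximum length, are not known here. `is_valid_job_name` is a hypothetical helper.

```python
import re

# Letters, digits, underscores and dashes only; at least one character.
JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_job_name(name: str) -> bool:
    return JOB_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_job_name("nightly_backup-01"))  # True
print(is_valid_job_name("nightly backup"))     # False
print(is_valid_job_name("backup#1"))           # False
```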
Select type of source to be used:
- This-EVO - Any share hosted locally to which the current user has (at least) read access
- Remote SMB - External NAS shares defined and currently displayed as connected on the external shares page
- Amazon-S3 - Any Amazon AWS S3 bucket that has been defined by an Alias
Local and remote SMB shares are accessible by clicking the “Browse…” button. This allows the user to select a directory without the need to manually enter the relative path.
Choose the Replication behavior for this job:
The most significant difference between the two methods is how the job will handle deleted files.
In Copy/Replace, files deleted on the source are ignored, so they are preserved at the destination.
In Sync/Remove, files deleted on the source are also removed at the destination.
In most cases these can remain at defaults. More details about these options, and a deeper discussion about Replication in general can be found in the first section of this article.
Any location that can be used as a source can be used as a destination. Any additional directories specified in the Destination relative path will be created by the job.
Configure the Schedule
There are four options for scheduling Replication jobs:
- Manual - No scheduled time, job will only run when a user clicks “Save & Run Now”
- Hourly - Job will run every hour of every day, no additional configuration is needed
- Daily - Job will run at a specific time on the hour every day, the hour must be selected
- Weekly - Job will run only on the Day and Hour selected
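To reason about when a job will next start, the four options can be modeled as below. This is an illustrative sketch: the `next_start` function is hypothetical, and it assumes that Hourly jobs start at the top of each hour, which the article does not state. Weekdays follow Python's convention (Monday is 0, Sunday is 6).

```python
from datetime import datetime, timedelta
from typing import Optional


def next_start(now: datetime, schedule: str, hour: int = 0,
               weekday: int = 0) -> Optional[datetime]:
    """Return the next scheduled start time, or None for manual jobs."""
    if schedule == "manual":
        return None  # runs only when a user starts it
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    if schedule == "hourly":
        return top_of_hour + timedelta(hours=1)
    candidate = top_of_hour.replace(hour=hour)
    if schedule == "daily":
        if candidate <= now:
            candidate += timedelta(days=1)
        return candidate
    if schedule == "weekly":
        candidate += timedelta(days=(weekday - now.weekday()) % 7)
        if candidate <= now:
            candidate += timedelta(days=7)
        return candidate
    raise ValueError(f"unknown schedule: {schedule}")


now = datetime(2024, 1, 3, 14, 30)  # a Wednesday afternoon

print(next_start(now, "manual"))                       # None
print(next_start(now, "hourly"))                       # 2024-01-03 15:00:00
print(next_start(now, "daily", hour=2))                # 2024-01-04 02:00:00
print(next_start(now, "weekly", hour=23, weekday=5))   # 2024-01-06 23:00:00
```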
Force End at (day/hour)
This option allows the job to stop running during hours when EVO or network bandwidth is in high demand for editors or other tasks. If the job has not completed by the “Force end” time, it will stop and wait for the next scheduled start time.
Once the email section on the Remote Notification page has been configured, it is possible to receive a job summary each time the job completes a scheduled run.
Save / Save & Run Now
After the job has been configured, you can choose to run the job immediately or save the job in an “enabled” state that will run at the next scheduled start time.
Once a job is enabled it will run only at the scheduled times. In order to manually trigger a job to start it must first be disabled using the toggle at the top of the page.
As always, please contact SNS support with any questions or if there's any trouble tracking down an issue with running jobs.