Using OneDrive for Business to Sync Large Volumes of Data

Do you use Office 365? If you do, you can also use 1 TB of cloud storage at no additional cost, courtesy of OneDrive for Business. This gives Microsoft a leg up on competitors: Dropbox, Box Sync, Google Drive, iCloud, etc. As we learned during the First Browser War (IE vs Netscape Navigator), it’s hard for a paid product to compete with a free one. On the other hand, OneDrive places a lot of restrictions on file names, file path lengths and so on, and might be a bit rough around the edges as of this writing (February 2017). Nevertheless, why not give it a shot?

I took the plunge on my Mac, put about 150 GB of data into OneDrive on Office 365 and have been syncing it for several months. The road was bumpy, to say the least, but in the end I was able to resolve all the issues I have faced so far.

Here is what I learned. (These tips apply to macOS or Windows. The included scripts are Bash scripts for Mac, but they could potentially be ported to PowerShell for Windows.)

The information below applies to syncing your local files to the OneDrive cloud and between computers. If you need to sync existing (e.g. on-premises) SharePoint libraries, you will face additional restrictions.

Tip 0. Give up on syncing OS-specific data.

On Macs, the following artifacts either don’t sync or sync incorrectly:

  1. Symbolic links
  2. Aliases
  3. Extended file attributes: tags, etc.
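Symbolic links are the easiest of these to find ahead of time. Here is a minimal sketch, assuming a plain `find` is enough for your folder; the function name `list_symlinks` is my own invention and not part of the scripts below. It does not detect Finder aliases or extended attributes.

```shell
#!/usr/bin/env bash
# list_symlinks DIR
# Prints every symbolic link under DIR (default: current directory), one per line.
# Hypothetical helper for illustration only.
list_symlinks() {
  local dir="${1:-.}"
  # -type l matches the links themselves; find does not follow them by default.
  find "$dir" -type l 2>/dev/null
}
```

For example, `list_symlinks ~/Documents` lists the links you would need to replace with real files or leave out before moving the folder into OneDrive.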

Tip 1. Scan your files and folders for long path names.

File path length restriction: The entire path, including the file name, must contain fewer than 255 characters.

Long file paths were immediately killing my OneDrive client (with a pop-up dialog message). They were fatal.

You can scan for these problems before putting your files into OneDrive (see the shell script in the Tip 2 section). Once you identify long file paths, shorten your folder structure or archive files as necessary.
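If you have many long paths, it helps to see the worst offenders first. The following is a rough sketch, not a replacement for the combined script in Tip 2. It assumes the limit is counted relative to the folder you scan (I don't know exactly how OneDrive counts it), that no file name contains a newline, and that your `awk` counts characters rather than bytes for non-ASCII names. The function name `longest_paths` is hypothetical.

```shell
#!/usr/bin/env bash
# longest_paths DIR [LIMIT]
# Prints "<length> <path>" for every path under DIR that is LIMIT characters
# or longer (default 255), longest first. Paths are relative to DIR.
longest_paths() {
  local dir="${1:-.}" limit="${2:-255}"
  # Run find from inside DIR so the lengths do not include DIR itself.
  ( cd "$dir" && find . -mindepth 1 2>/dev/null ) |
    awk -v limit="$limit" '{
      p = substr($0, 3)               # drop the leading "./"
      if (length(p) >= limit) print length(p), p
    }' |
    sort -rn
}
```

Running `longest_paths ~/Documents 200` is a way to spot paths that are getting close to the limit before they become a problem.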

Tip 2. Scan your files and folder names for problematic characters.

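OneDrive on Office 365 does not sync files or folders that:

  1. Include any of the following characters: \#%:"|?*/
  2. Begin or end with a space
  3. End with .
  4. Begin with ..
  5. Have very long path names (255 characters or more, see Tip 1)

(BTW, practically all characters are allowed in macOS filenames except ‘/’ at the Unix layer. ‘:’ is not allowed at the Carbon layer, but it is possible to create filenames with ‘:’.)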

The problematic files or folders can’t be synced and show up as sync errors (that is, as those numbered badges on the OneDrive Dock icon on Mac). If you have many of them, they can cause other issues, for example, an unresponsive OneDrive client:

[Screenshot: OneDrive client busy]

Or very wide OneDrive UI windows:

[Screenshot: very wide OneDrive window]

Here is the combined shell script to detect Tip 1 and Tip 2 problems:

#!/usr/bin/env bash
#set -x
if [[ "$1" == "-h" ]]; then
exec 1>&2
cat << EOF
Finds files and directories in OneDrive that are invalid to sync

Practically all characters are allowed in macOS filenames except / at the Unix layer.
: is not allowed at the Carbon layer, but it is possible to create filenames with :. See http://superuser.com/questions/326103/what-are-invalid-characters-for-a-file-name-under-os-x and https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations.

OneDrive does not sync files or folders that:
* Include any of the following characters: \#%:"|?*/
* Begin or end with a space
* End with .
* Begin with ..
* Have very long pathnames (>= 255 characters)

Usage: $0 [-h]

Parameters:
-h Help
Notes:
Ignores unreachable directories (and other errors)

Examples:
$0
EOF
exit 127
fi

# Comment out the following line to search prior to putting files into OneDrive
cd ~/OneDrive* || exit 1   # stop here rather than scan the wrong directory if cd fails
echo "Searching for files and directories that are invalid to sync in $PWD..."
echo

find -E . -regex '(.*[\#%:"|?*].*)|.{255,}' -or -name ' *' -or -name '* ' -or -name '?*.' -or -name '..*' 2>/dev/null
# -regex pattern is true if the whole path of the file matches pattern.
# For example: to match a file named `./foo/xyzzy`, you can use the regular expression `.*/[xyz]*` or `.*/foo/.*`, but not `xyzzy` or `/foo/`.
#
# -name '*/*' could be added to test for filenames including /, but those are illegal in macOS.
#
# \ (backslash) inside a pattern character class is used as escape only when followed by any of these: ^-]\
#
# '?*.' glob pattern prevents the root directory (.) from matching

echo "Done."

Once you identify problematic names, you have several choices on how to resolve them:

  1. Rename
  2. Archive (into ZIP, RAR, etc.)
  3. Move out of OneDrive
  4. Delete if not needed.
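If you go with renaming, the fix for a single name can be sketched as a small function. This is only an illustration under my own assumptions: the name `sanitize_name` is made up, replacing bad characters with an underscore is an arbitrary choice, and the function returns a new name without touching any files.

```shell
#!/usr/bin/env bash
# sanitize_name NAME
# Prints a version of NAME (a single file or folder name, not a path) that
# avoids the rules listed in the script above. Hypothetical helper.
sanitize_name() {
  local name="$1"
  # Replace each of \ # % : " | ? * / with an underscore.
  name=$(printf '%s' "$name" | tr '\\#%:"|?*/' '_________')
  # Strip leading spaces and reduce a leading ".." to a single ".".
  while :; do
    case "$name" in
      " "*) name="${name# }" ;;
      ..*)  name="${name#.}" ;;
      *)    break ;;
    esac
  done
  # Strip trailing spaces and dots.
  while :; do
    case "$name" in
      *" "|*.) name="${name%?}" ;;
      *)       break ;;
    esac
  done
  # Fall back to a placeholder if nothing is left.
  [ -n "$name" ] || name="_"
  printf '%s\n' "$name"
}
```

To actually rename a file you could combine it with something like `mv -n -- "$old" "$(dirname "$old")/$(sanitize_name "$(basename "$old")")"`. Try it on a copy first, because two different names can collapse into the same sanitized name.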

Tip 3. Watch for sync conflicts.

If you sync among two or more computers, be prepared for conflicts to occur sooner or later. A conflict is a situation where OneDrive can’t decide which version of a file is the definitive one. In this case it silently creates a second file in the same directory using the naming convention {original name}-{computer name}.{original extension}. Often the contents of the two files are the same, but to avoid losing data or wasting storage you need to detect and resolve these conflicts manually.
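Conflict copies that already exist can be found with a one-off scan based on that naming convention. This is a minimal sketch, assuming the computer name in the file name matches the output of `hostname -s`; the function name `find_conflict_copies` is hypothetical. It can report false positives, since any file whose name happens to end in -{computer name} also matches.

```shell
#!/usr/bin/env bash
# find_conflict_copies DIR [HOST]
# Prints files under DIR named like "{name}-{HOST}.{ext}" or "{name}-{HOST}".
# HOST defaults to the short host name of this computer.
find_conflict_copies() {
  local dir="${1:-.}" host="${2:-$(hostname -s)}"
  find "$dir" -type f \( -name "*-${host}.*" -o -name "*-${host}" \) 2>/dev/null
}
```

For each file reported, compare it with the original next to it (for example with `cmp`) before deciding which one to keep.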

The following shell script watches for conflicts in real time. When a conflict is found, the script logs the conflict and notifies via Growl. This script is designed to run in the background. You can launch it when your computer starts, e.g. using Login Items or launchd:

#!/usr/bin/env bash
#set -x
exec 1>>~/Library/Logs/conflict-watches.log 2>>~/Library/Logs/conflict-watches.log
echo "`date` conflict-watch-onedrive> Starting ..."

watchedPath=~/OneDrive*

# Base file name marker, e.g. "Communications-MyComputer.msf" or "Communications-MyComputer"
conflictMarker=-`hostname -s`

echo "`date` conflict-watch-onedrive> Watching" $watchedPath/ "..."

# Enters the waiting loop
# Includes only Created and Renamed (= moved) events. Deletions, updates and other events are ignored.
fswatch -r0 --event Created --event Renamed -e ".*" -i "$conflictMarker" $watchedPath | while IFS= read -r -d "" filename
do
echo "`date` OneDrive conflict: $filename"
growlnotify "OneDrive conflict" -m "$filename"
done

Tip 4. Reboot less often.

After OneDrive starts, it takes hours on my machine to finish scanning about 100 GB of data. During all that time OneDrive consumes about 100% of one CPU core:

[Screenshot: OneDrive CPU load]

The temperature goes up and the fans spin the whole time. This is a lot of wasted resources. To avoid this, I changed my habits and stopped shutting down my laptop daily.

Tip 5. Sync only what needs to be synced. Don’t sync large, frequently changed files.

If your data already lives on some servers and gets synced among your devices, you don’t need to put it into OneDrive. For example, corporate Exchange email is stored on Exchange servers and any changes to it are automatically synced among Outlook clients. On the other hand, personal Outlook archives might be locally stored files, which you might want to put into OneDrive.

I have private email stored in a Mozilla Thunderbird profile, in a combination of Thunderbird IMAP and POP mailboxes and email archives. I tried to put the entire profile into OneDrive to sync all email along with all the Thunderbird settings. It turned out to be a bad idea because of the way Thunderbird stores and caches mail. Even for IMAP mailboxes, I have, for example, a 1.7 GB (after compacting) Gmail All Mail folder, which is stored in a single file.

In OneDrive for Business, unlimited versioning is usually turned on by default (this might depend on how your IT administrators configured your Office 365 SharePoint). Unlike in Dropbox, for example, previous versions in OneDrive consume storage and count towards the 1 TB storage limit.

Frequent updates to the 1.7 GB file and my other large mailbox folders were stored as separate versions, and they quickly consumed all 1 TB of OneDrive space. The solution was to move the Thunderbird profile out of OneDrive and sync it by other means.
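To spot files like that before they eat your quota, you can look for files that are both large and recently modified. Here is a rough sketch. The function name `large_recent_files` and the default thresholds (500 MB, 7 days) are my own assumptions, so tune them to your data.

```shell
#!/usr/bin/env bash
# large_recent_files DIR [MIN_MB] [DAYS]
# Prints files under DIR that are larger than MIN_MB megabytes (default 500)
# and were modified within the last DAYS days (default 7).
large_recent_files() {
  local dir="${1:-.}" min_mb="${2:-500}" days="${3:-7}"
  # find's "k" suffix means kilobytes, so convert megabytes first.
  find "$dir" -type f -size +"$((min_mb * 1024))"k -mtime -"$days" 2>/dev/null
}
```

Anything that keeps showing up in `large_recent_files ~/OneDrive*` week after week is a candidate for moving out of OneDrive.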

Tip 6. Watch for phantom syncs.

A couple of my files caused OneDrive to never finish syncing: the files were always shown as being synced, e.g.:

[Screenshot: OneDrive phantom download]

Earlier versions of the OneDrive client didn’t have a detailed progress window to see what’s going on. After the progress window was added in later versions, I was able to catch these phantoms and dig deeper. It turned out that the OneDrive server was reporting some sort of virus problem in a couple of my old mailbox folders, and that’s probably why the phantoms were occurring. Once I removed the offending files, the problem was resolved.

Tip 7. Use selective sync to solve round-robin crashes.

OneDrive synced all the files, but after some time entered a round-robin crash cycle that went like this:

  1. OneDrive keeps crashing after start on computer A and prompts to reset it. Reset OneDrive on computer A.
  2. Wait hours for initial sync to complete on computer A.
  3. OneDrive crashes on computer B immediately and keeps crashing after start. Reset OneDrive on computer B.
  4. Wait hours for initial sync to complete on computer B.
  5. Go to step 1.

I resolved this situation by selectively syncing only some files on both computers and gradually increasing the sync scope until the whole drive was covered and full sync (sync all files) could be turned on.

When you switch to selective sync, the OneDrive folder is emptied initially. Hence, if you need files from it in the meantime, move them aside first. A full backup is highly recommended.

And here it is: a successfully synced OneDrive.

[Screenshot: OneDrive sync completed successfully]

July 17, 2017 Update: Added Tip 0 and Tip 7.

March 29, 2017 Update: conflict-watch-onedrive.sh script has been updated to respond only to Created and Renamed events.
