11 May 2019

Cloning APFS Volumes & Containers ("APFS inverter failed")

(This is an older article that I hadn't published back then because it might not be fully accurate, i.e. the steps may not be applicable. Yet, it contains some valuable information so I decided to publish it now. Read with a grain of salt. If you have corrections, don't hesitate to comment or email me.)

TL;DR

If Disk Utility fails to clone an APFS container, reporting "APFS inverter failed" as the error message, a work-around is to copy the partition with the "dd" command into a same-sized partition and then fix the cloned container's UUIDs.

About cloning (copying) entire Mac volumes

Usually, when you want to clone a Mac volume, you'd use Disk Utility's Restore operation. It performs a sector-by-sector copy (while skipping unused sectors). The alternative would be to perform a file-by-file copy, as it's done by 3rd party tools like Carbon Copy Cloner.

A clone operation by copying individual files has several disadvantages over a sector copy:
  • It's much slower.
  • If you're using Finder Alias files, these may not work any more afterwards (that's because they rely on each file's unique ID, and those IDs change when copying files over to the destination).
  • More programs may request re-activation (re-registration).
However, when I recently tried to make an identical copy of my macOS Mojave system from my MacBook Air (2015), copying it to an external SSD, I ran into problems:

After the copy and verification operation had apparently finished, an additional inverter process still needed to run on the cloned APFS volumes. I understand that this is necessary when I copy only a single volume out of an APFS container, or copy a volume into a target APFS container without replacing it entirely. But even when I try to clone the entire container, erasing the target, it still wants to run an inversion process - and that makes no sense to me.

Now, the error that I kept seeing is: APFS inverter failed to invert the volume

And I'm not the only one, see here and here.

I tried many things, including First Aid, removing all snapshots, and running the command line tool "asr", which showed me more detailed error messages. Still, no success. I kept getting variations of the same issue.

What I'm showing now is a way to clone a complete APFS container (with all its volumes) that actually works.

How to clone an APFS container

Note: This may not work with encrypted volumes. Or it might. I have not tried.

We simply copy the entire partition (which contains the APFS container) sector by sector. (Small disadvantage over Disk Utility's Restore operation: This will also copy unused sectors, so it'll take a bit longer.)

Afterwards, we need to change the UUIDs of the cloned container and its volumes, or the Mac (and especially Disk Utility) may get confused when both the original and the cloned volumes are present on the same Mac.

Perform the sector copy operation

In case you want to copy your bootable macOS system, you will first have to start up from a different macOS system. If you have no other external disk or partition with another macOS system, simply start up from the Recovery system.

When ready, connect the target disk, then start Terminal.app.

Get an overview of our disk names by entering (and always pressing Return afterwards):
diskutil list
Here's a sample output of my Mac that has four partitions, two of which use HFS+ ("AirElCap", "AirData") and the other two use APFS ("AirMojave", "AirHighSierra"):

/dev/disk0 (internal):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                         121.3 GB   disk0
   1:                        EFI EFI                     314.6 MB   disk0s1
   2:                 Apple_APFS Container disk1         40.9 GB    disk0s2
   3:                  Apple_HFS AirElCap                20.1 GB    disk0s3
   4:                 Apple_Boot Boot OS X               134.2 MB   disk0s4
   5:                 Apple_APFS Container disk2         39.8 GB    disk0s5
   6:                  Apple_HFS AirData                 19.9 GB    disk0s6

/dev/disk1 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +40.9 GB    disk1
                                 Physical Store disk0s2
   1:                APFS Volume AirMojave               20.7 GB    disk1s1
   2:                APFS Volume Preboot                 47.1 MB    disk1s2
   3:                APFS Volume Recovery                512.7 MB   disk1s3
   4:                APFS Volume VM                      2.1 GB     disk1s4

/dev/disk2 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +39.8 GB    disk2
                                 Physical Store disk0s5
   1:                APFS Volume AirHighSierra           23.2 GB    disk2s1
   2:                APFS Volume Preboot                 20.9 MB    disk2s2
   3:                APFS Volume Recovery                519.0 MB   disk2s3
   4:                APFS Volume VM                      3.2 GB     disk2s4

I plan to clone "AirMojave", so the disk identifier would be disk1, because that is the container for that volume. In your case it may be a different identifier. I will write the commands below using diskS for the source container and diskTsP for the target partition.

Now unmount all volumes of our source container. In Terminal, enter:
diskutil unmountDisk diskS
If this does not succeed, then you either have programs or files open on one of the volumes, or you're trying to unmount the boot volume (in that case, you should have restarted into a different system as explained above). Do not continue if the unmount was not successful, or you're likely to end up with a corrupted clone.
Next, find out the exact size of the source container:
diskutil apfs list diskS
+-- Container disk2 5099518D-36C0-47CC-B034-0CDA52C51CE8
    ====================================================
    APFS Container Reference:     disk2
    Size (Capacity Ceiling):      39838416896 B (39.8 GB)
    Capacity In Use By Volumes:   27042205696 B (27.0 GB) (67.9% used)
    Capacity Not Allocated:       12796211200 B (12.8 GB) (32.1% free)
Now resize the target partition so that it matches the source container's size:
sudo diskutil resizeVolume diskTsP size
In place of size, use the number shown after "Size (Capacity Ceiling):" in the output above (append a "B" if diskutil asks for a unit, to indicate bytes).
Then unmount the target partition as well:
diskutil unmount diskTsP
Finally, copy the source partition onto the target partition, sector by sector:
sudo dd bs=65536 if=/dev/diskS of=/dev/diskTsP
dd prints nothing while it runs; press ctrl-T to see its progress, and wait for its summary output before continuing.

Fix the UUIDs

Enter in Terminal:
sudo /System/Library/Filesystems/apfs.fs/Contents/Resources/apfs.util -s /dev/diskT
You'll likely not get any response from this command indicating success or failure.

To check whether the UUIDs have been changed successfully, enter in Terminal:
diskutil apfs list
In the output, find your source and target disks, and compare the UUIDs of their container and container volumes. They should all be different. If the UUIDs of source and target volumes show the same code, then the fix did not work. Try again.

Make the system volume bootable

The last step is to make the cloned volume bootable again (in case it was before). For that, you need to mount the cloned container's Preboot volume, then rename the single folder inside, which is named after the old UUID, to the new UUID of the main volume.

To mount it:
diskutil mount diskTsX
(X stands for the number of the "Preboot" volume inside the cloned container, as shown by diskutil apfs list)

To show it in Finder:
open /Volumes/Preboot
Now you should see a Finder window with a folder named after the UUID of the original volume you copied. Rename that folder to the new UUID of the cloned volume (as shown by diskutil apfs list). To verify that you used the correct UUID, open System Preferences, Startup Disk, select the cloned bootable volume and click the Restart button. If it says that it can't bless the volume, you got the UUID wrong. Otherwise, the system should now restart, meaning the UUID was set correctly on a bootable volume.

Understanding bugs in Xojo, or not getting them fixed

A former employee of Xojo Inc. once met with an Apple Developer Technical Support (DTS) engineer to look over some code. The Apple engineer saw a note mentioning my name and told the Xojo employee: "You know Thomas Tempelmann? I know him, too. He's the best type of user, the one that debugs a problem so far that he can tell you exactly what you're doing wrong, and how to fix it."

I'm not infallible, but I believe I can claim that I have quite some experience and understanding of a lot of things under a computer's hood. After all, I've been doing this for nearly 40 years. I'm not so great with abstract algorithms, but when it comes to writing efficient code or debugging it, I'm surely not the best, but my skills are well above average.


And I've proven that a lot of times with a development system I really love to use: Xojo, formerly known as REALbasic.


I've been using Xojo (or RB) for about 20 years now. I've been one of the first to write plugins for it, and they were quite popular (the plugins eventually became part of the MBS plugins).


Here are a few examples of bugs and solutions I found in Xojo:


  • Back when we still used 68k CPUs, there was a serious issue that only a few customers had: Their apps crashed when they got large. I had some suspicions, looked at the compiled code and soon found the issue - an overflow with a 16 bit offset in a jmp instruction. Worth noting here is that the Xojo engineers had previously not been able to find the bug. Yet I, without even having the source code, found it within an hour or so, because I had once written a 68k compiler myself and knew what could go wrong. I also like to point out that Xojo's current EULA prohibits us from looking at the compiled code the way I did, in order to learn how it works. I later argued against that, even citing this example where I had to do exactly that because no one else was able to find the bug - to no avail. This was not the last time I had to dig into Xojo's code to work around a bug when Xojo refused to look into it; I solved those issues for myself, but I can't publish the work-arounds for the benefit of others any more because, well, Xojo's EULA and the threat it encompasses.
  • In 2008, I ran into a very rare case where I'd lose some data when I had tens of thousands of objects in a particular data structure (WeakRefs, IIRC). It turned out that the internal dictionary code did not handle collisions correctly. After proposing a code change, the reproducible issue was gone. I had to personally pursue this issue down into the code because, again, the responsible engineer (who was, admittedly, not its original author) did not even believe the problem I described existed. Regardless, thanks to some lucky circumstances I was able to get the fix into the framework, for the benefit of all of us.
  • The Xojo IDE's Back (History) button has not worked reliably since 2013, when this IDE was introduced. More often than not, the Back button simply does not go back to previously visited locations. As a proof of concept, I wrote an external program that talks to the IDE via the IDE communication socket, regularly requesting the current location and offering a list with the history. You can then click any history item and the IDE actually jumps back to it. Surprisingly, this works more reliably than the IDE's own Back button. Yet, when Xojo CEO Geoff Perlman was recently asked at the MBS conference in Munich about this shortcoming, he insisted that this is a very complex matter that is not easy to solve. I find that hard to believe if even I can do better with an external program.
  • Xojo code can use Threads, but can run them only cooperatively, not concurrently. That's because the runtime functions do not use locking to protect sensitive operations, such as object creation, against interruption by another thread. Thus, Xojo's runtime has its own thread scheduler that uses semaphores to make sure only one Xojo thread runs at any time. Now, there are cases where we users need to use Declare statements to invoke OS-provided functions. Some of them may even call back into our own Xojo functions. But if those callbacks happen on threads that Xojo does not control, this can lead to crashes when our Xojo code then accesses Xojo objects. I've come up with a proposal to make this safe, effectively by using locks that suspend the callback's thread until Xojo is in a safe state. Apart from the possibility of creating a deadlock (which is under the control of the programmer), I was able to supply a demo project that has yet to be shown to be unstable. Xojo, however, ignores all my explanations and demonstrations and simply keeps telling their users that this can't ever be safe.
  • Related to callbacks, there's also a long-known issue with passing function addresses to the OS. This is done by creating a so-called delegate object via the AddressOf operator. The delegate object can be used inside Xojo like a function variable, i.e. one can store the address of a function in a property and call it later. This even works for object instance methods, i.e. methods that are part of an object and have a "self" reference. This self reference is simply a pointer to the object that gets passed to the method when it's invoked on an object. A delegate stores this object reference and passes it to the function if necessary. However, when passing such a delegate to an OS function (via Declare), Xojo does not pass a pointer to the stub function that sets up the self reference but instead passes the target function address. That means that when the callback is invoked, the self reference is not properly set up, leading to a crash. The issue has been known for a long time, and in the past bug reports of this kind have been closed as "works as designed". Since this could be fixed, the official answer sounds like an excuse for "we don't like to deal with it" or "we don't really care" to me.


All this shows that there are a lot of things in Xojo, mostly low level, that could be fixed - but they aren't, because of a lack of comprehension. Sadly, in many cases where I offered solutions, even proofs, I hit a wall. I don't understand how a company that specifically caters to developers can be so ignorant of the needs and offerings of their willing customers.

22 April 2019

Performance considerations when reading directories on macOS

(Latest update: 11 May 2019, see end of text)

I'm developing (and selling) a fairly popular file search program for the Mac called Find Any File, or just FAF.

It works differently from Spotlight, the Mac's primary search tool, in that it always scans the live file system instead of using a database. This makes it somewhat slower in many cases, but has the advantage that it looks at every file on the targeted disk (whereas Spotlight skips system files by default, for instance).

My primary goal is to make the search as fast as possible.

Fast search built into macOS


Until recently, this went quite well, because Mac disks (volumes) were formatted in HFS+ (aka Mac OS Extended), and Apple provides a special file search operation (CatalogSearch or searchfs) for these volumes, by which FAF could ask for the file name the user is looking for, and macOS would search the volume's directory by itself and return only the matching files. This is very fast.

Unfortunately, with Apple's new file system APFS, and the fact that any Mac running High Sierra or Mojave got its startup volume converted from HFS+ to APFS, search performance has decreased by a factor of 5 to 6! Where searching the entire startup disk for a file like "hosts" took just 5 seconds on a fast Mac with an HFS+ volume, it now takes half a minute or more on APFS.

Besides, the old network file server protocol AFP also supports the fast search operation, but only on real Mac servers - some NAS systems pretend to support it as well, but my experience shows that this is very unreliable. The newer SMB protocol, OTOH, does not appear to support searchfs.

Searching the classic way


When the searchfs operation is not available, unreliable or inefficient, FAF falls back to looking at every directory entry itself, looking for matches to the search, then looking at the contents of every subdirectory, and so on. This is called a recursive search (whereas searchfs performs a flat search over all directory entries of a volume).

There are several ways to read these directories. I'll list the most interesting ones:

  • -[NSFileManager contentsOfDirectoryAtURL:includingPropertiesForKeys:options:error:]
  • opendir() & readdir_r()
  • getattrlistbulk()
  • fts_open() & fts_read()

The first is the standard high-level (Foundation) function. It lets you choose which attributes (besides the file name) it shall fetch alongside. This is useful if you want to look at the file sizes, for instance. If you let it fetch them along, it'll cache them in the NSURL objects, thereby increasing performance if you call [NSURL getResourceValue] later to read the values.

readdir() is a very old UNIX / POSIX function to read a directory's file names, and nothing else, one by one.
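
For illustration, a minimal sketch of this approach (not FAF's actual code) could look like this; note that readdir() delivers only names, so any file sizes or dates require an extra lstat() call per entry:

#include <dirent.h>
#include <stdio.h>
#include <string.h>

// Minimal sketch: print the names in one directory using opendir()/readdir().
// A real scanner would recurse into subdirectories itself.
static void list_names(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) { perror(path); return; }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;                   // skip the pseudo entries
        printf("%s\n", entry->d_name);  // only the name; sizes/dates would need lstat()
    }
    closedir(dir);
}

int main(void)
{
    list_names("/Applications");
    return 0;
}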

getattrlistbulk() is a special Mac BSD function that's an extension to the older getattrlist(). It is supposed to be optimized for faster reading, as it can fetch the entire contents of a directory at once, along with attributes such as file dates and sizes. [NSFileManager contentsOfDirectoryAtURL...] supposedly uses this function, thereby making use of its performance advantage.
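
For illustration, here is a minimal sketch of how it is called (loosely following the pattern from the getattrlistbulk man page, not FAF's actual code); it prints the names of one directory's entries:

#include <sys/attr.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

// Minimal sketch: read a directory with getattrlistbulk(), which packs many
// entries into one buffer per syscall. Only the names are requested here;
// a real scanner would also ask for ATTR_CMN_OBJTYPE to recurse into folders.
static void bulk_list(const char *path)
{
    int dirfd = open(path, O_RDONLY, 0);
    if (dirfd < 0) { perror(path); return; }

    struct attrlist attrList;
    memset(&attrList, 0, sizeof(attrList));
    attrList.bitmapcount = ATTR_BIT_MAP_COUNT;
    attrList.commonattr  = ATTR_CMN_RETURNED_ATTRS | ATTR_CMN_NAME;

    char buf[64 * 1024];   // the kernel fills this with as many entries as fit
    for (;;) {
        int count = getattrlistbulk(dirfd, &attrList, buf, sizeof(buf), 0);
        if (count <= 0) {
            if (count < 0) perror("getattrlistbulk");
            break;         // 0 means: no more entries
        }
        char *entry = buf;
        for (int i = 0; i < count; i++) {
            char *field = entry;
            uint32_t entryLength = *(uint32_t *)field;             // total size of this entry
            field += sizeof(uint32_t);
            attribute_set_t returned = *(attribute_set_t *)field;  // attrs actually present
            field += sizeof(attribute_set_t);
            if (returned.commonattr & ATTR_CMN_NAME) {
                attrreference_t nameRef = *(attrreference_t *)field;
                printf("%s\n", field + nameRef.attr_dataoffset);   // offset is relative to this field
            }
            entry += entryLength;                                  // jump to the next packed entry
        }
    }
    close(dirfd);
}

int main(void)
{
    bulk_list("/Applications");
    return 0;
}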

fts_open() is a long-existing BSD function that is specialized in traversing directory trees. I added it only after the initial tests, so its discussion below is a bit briefer.
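
For illustration, here is a minimal sketch of such a tree walk (not FAF's actual code); it prints the path of every item whose name contains "hosts":

#include <fts.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Minimal sketch of a recursive scan with fts_open()/fts_read().
// FTS_PHYSICAL: don't follow symlinks; FTS_XDEV: stay on the starting volume;
// FTS_NOSTAT: skip the per-entry stat() for non-directories (names only).
int main(int argc, char *argv[])
{
    char defaultRoot[] = "/Applications";
    char *root = (argc > 1) ? argv[1] : defaultRoot;
    char *const paths[] = { root, NULL };

    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR | FTS_XDEV | FTS_NOSTAT, NULL);
    if (fts == NULL) { perror("fts_open"); return EXIT_FAILURE; }

    FTSENT *ent;
    while ((ent = fts_read(fts)) != NULL) {
        switch (ent->fts_info) {
        case FTS_DP:            // directory seen again in post-order: already handled
            break;
        case FTS_DNR:           // unreadable directory
        case FTS_ERR:           // generic error
            fprintf(stderr, "%s: error %d\n", ent->fts_path, ent->fts_errno);
            break;
        default:                // FTS_D, FTS_F, FTS_SL, or FTS_NSOK (due to FTS_NOSTAT)
            if (strstr(ent->fts_name, "hosts") != NULL)   // crude name match
                printf("%s\n", ent->fts_path);
            break;
        }
    }
    fts_close(fts);
    return EXIT_SUCCESS;
}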

Test methods


I tried to find out which of the various methods of reading directories is the fastest when looking only for file names. To do that, I had to scan the same directory tree with every method separately.

Testing performance is a bit difficult because macOS tends to cache recently read directories for a short while. For instance, the first time I scan a directory tree with 100,000 items, it may take 10s, and when I run the same test again within a few seconds, it'll take only 2s. If I wait half a minute, it may again take 10s. And if I'm searching on a file server, that server may also cache the information in RAM. For instance, my NAS, equipped with hard disks, will be rather loud the first time I search on it, due to the HDs performing lots of seeking, whereas a repeat search will make hardly any noise due to little to no seeking, which also increases the search performance.

Therefore, I performed the tests twice in succession: Once after freshly mounting the target volume (so that the cache was clear) and once again right after. This would give me both the worst and best case performances. I repeated this several times and averaged the results.

I also had to test on different media (e.g. fast SSD vs. slower HD) and formats (HFS+, APFS, NTFS) and network protocols (AFP vs. SMB, from both a NAS and another Mac) because they all behave quite differently.

The Xcode project I used for timing the scanning functions can be downloaded here.

Test results


Most tests were performed on macOS 10.13.6. The NAS is a Synology DS213j with firmware DSM 6.2.1, connected over 1 GBit Ethernet, and both AFP and SMB tests were made on the same NAS directory. The Terminal cmd "smbutil statshares -a" indicates that the latest SMB3 protocol was used. The remote Mac ran macOS 10.14.4, and the targeted directory on it was on a HFS+ volume so that I could compare performance between AFP and SMB (APFS vols can't be shared over AFP). I also did a few tests on the 10.14.4 Mac, though I only recorded the best case results as the others were difficult to create (I'd have had to reboot between every test and I wasn't too keen on that).

I was expecting that contentsOfDirectoryAtURL would always be as fast as its low level version getattrlistbulk, whereas readdir would be slower as it wasn't optimized for this purpose. Surprisingly, this was not always the case. (I did not include the fts method when I did this run of tests - its results will instead be discussed in a separate chapter below).

The values show the elapsed time in seconds for completing a search of a deep folder structure. The values are only comparable within each line, not across lines, because the folder contents were different. The exceptions are the network volumes, where the same folders were used for AFP and SMB.

The best results are the lowest times in each row. One anomaly stands out: the worst-case value for "Mac via SMB" under contentsOfDirectoryAtURL, discussed in the observations below.


                  contentsOfDirectoryAtURL    getattrlistbulk          opendir/readdir
                  worst case    best case     worst case    best case  worst case    best case
HD HFS+           12.4          2.8           12.4          2.3        12.4          2.35
SSD HFS+          4.9           2.8           4.6           2.26       4.6           2.47
SSD APFS          11.2          10.6          10            6.8        8.6           3.2
SSD NTFS          28            6             28            6          8             4.7
10.14 APFS        -             12            -             10         -             8
10.14 HFS+        -             4.1           -             3.8        -             4
NAS via AFP       5.6           2.5           4.8           2.14       5.6           2.7
NAS via SMB       15            15            17            15         9             5.7
Mac via SMB       4.4           5.4           6.8           5.5        6.5           5
Mac via AFP       5.3           3.6           5.1           3.7        5.9           4.3

(Times in seconds; "-" = not measured, see above.)


Observations

  • HD vs. SSD shows that the initial search takes much longer on HDs, which makes sense because HDs have a higher latency. Once the data is in the cache, though, both are equally fast (which makes sense as well).
  • contentsOfDirectoryAtURL and getattrlistbulk do indeed perform equally, just as predicted, with the latter usually being a bit faster once the data comes from the cache.
  • On APFS, NTFS and SMB, readdir() is significantly faster than the other methods, which is quite surprising to me.
  • SMB performance is worse than AFP in nearly all cases (even though Apple has declared AFP obsolete).
  • When accessing a Mac via SMB, contentsOfDirectoryAtURL is faster than the other methods, but only on the first run (the anomaly mentioned above). Once the caches have been filled, it's slower. I can't make sense of it, but it's a very consistent effect in my tests.

The fts functions


fts_open() / fts_read() are, in most cases, faster than readdir(), contentsOfDirectoryAtURL and getattrlistbulk. Exceptions are the network protocols, where especially the retrieval of additional attributes makes the fts functions slower than the other methods.

Fetching additional attributes


When extra attributes, such as file dates or sizes, are needed during the scan, the timing of the various methods changes as follows:

  • For contentsOfDirectoryAtURL and getattrlistbulk, there is little impact if these extra attributes are requested with the function call.
  • For readdir(), fetching additional attributes (through lstat()) turns it into the slowest method.
  • The fts functions are the least affected when fetching attributes that are also available through lstat(), as long as a local file system is targeted. However, for network volumes via AFP, they become about 20% slower in my tests, whereas getattrlistbulk stays faster.

Differences between macOS versions


When searching the same volumes (both HFS+ and APFS) from Sierra (10.12.6), High Sierra (10.13.6) and Mojave (10.14.4), I measure consistently worse performance on Mojave, meaning that scanning directories got slower in 10.14 vs. 10.13, by about 15%.

Also, getting additional attributes on 10.12, compared to 10.13 and later, takes about twice as long, across all methods. This could mean that something improved in 10.13 regarding fetching attributes.

Conclusion


It appears that for optimal performance, I need to implement several methods, and select them depending on which file system or protocol I talk to.

Here's my current list of fastest method per file system:

  • HFS+: Always fts
  • APFS: Always fts
  • AFP: Always getattrlistbulk
  • SMB: If no attributes are needed: readdir, otherwise fts or getattrlistbulk

Update on 29 Apr 2019


When traversing a directory tree, one must take care not to cross over into other volumes, which can happen if you encounter mounted file systems in your path, such as when you descend from "/" into "/Volumes".

The safe way to check for this is to determine the volume a folder is on before you dive into it. To identify a volume, get its volume or device ID. One way is to call stat() and check the st_dev value; another is to get the NSURLVolumeIdentifierKey. Or, in the case of fts_read, it's already provided - which adds to its superior efficiency.
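
As a minimal sketch (the helper name is made up for illustration; a real scanner would remember the device ID of its search root), the stat() variant could look like this:

#include <sys/stat.h>
#include <stdio.h>

// Minimal sketch: before descending into a subdirectory, compare its st_dev
// with the device ID of the search root. A different st_dev means the
// directory is the mount point of another volume, so the scan should skip it.
static int is_on_same_volume(const char *subdir, dev_t root_dev)
{
    struct stat st;
    if (lstat(subdir, &st) != 0)
        return 0;                  // can't stat it: treat as "don't descend"
    return st.st_dev == root_dev;  // same device ID means same volume
}

int main(void)
{
    struct stat rootSt;
    if (stat("/", &rootSt) != 0)
        return 1;

    // "/Volumes" itself lives on the root volume; the mount points inside it do not:
    printf("same volume: %d\n", is_on_same_volume("/Volumes", rootSt.st_dev));
    return 0;
}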

My testing shows an unpleasant performance impact, though:

When traversing with contentsOfDirectoryAtURL, calling stat() is less efficient than getting the value for NSURLVolumeIdentifierKey. That makes sense, because stat() fetches more data, and that could cause additional disk I/O.

OTOH, the file system layer should know the ID of the volume without the need to perform disk I/O.

Meaning, getting the value for NSURLVolumeIdentifierKey should cost no significant time at all, because the information is known to the upper file system layer before the request is even passed on to the actual file system driver for the particular volume. Therefore the value should be readily available at a much higher level - regardless, fetching this volume ID takes about as much time as getting an actual value from the lowest level, such as a file size or date.

However, when I add fetching this volume ID for every encountered file & folder, the scan time increases by over 30%. Fortunately, one only has to fetch this value for directories, not for files, which reduces the overall impact. Still, the performance of this could be better if Apple engineering were to look into it, I believe. After all, identifying the volume is something almost any directory scanner needs to do.

Update on 11 May 2019


When discussing my findings on an Apple forum (actually, on one of the few remaining Apple mailing lists), Jim Luther suggested trying enumeratorAtURL. And, indeed, this method does better than any of the others, at least in my tests on local disks, both on HFS+ and APFS. Like fts_read, it takes care of staying on the same volume, so I do not have to check the volume ID myself.

I have updated my test project with the use of this function.

Comments, concerns?


Feel free to download the Xcode project and run your own tests.

Comments are welcome here or on Twitter to me: @tempelorg

10 August 2018

Locating and updating symlinks and Finder Aliases with FAF

Today I renamed one of my internal disks in my Mac Pro. I then realized that I had created a few symlinks to that volume, and those would now become invalid.

For example, if the disk used to be called "Data" and is now called "Backups", then symlinks I may have created would still point to "/Volumes/Data/..." but would now need to point to the new name instead.

Since I knew that there would only be a handful of such symlink files on my other disks, I could easily update them by hand (using Terminal.app, with the "ln -s" command).

All I needed to do was to find all those symlinks first, making sure I would not miss any.

With Find Any File, this is quite easy. Set up a search like this:


To get the "File Type Code" option, you need to hold down the option (alt) key before clicking the popup-menu. Searching for files of type code "slnk" will address symlinks, and nothing else.

This will find all matching symlinks, which you can then reveal in the Finder and update manually.

Similarly, you can also find related Finder Aliases, by searching like this:


After renaming a disk, updating Finder Aliases pointing to that disk is usually not necessary, because Aliases use redundant information to locate moved and renamed files.

However, if you should ever copy all your content to a new (larger) disk, file by file, Finder Aliases won't work any more if the targeted files have also been moved or their disk has been renamed.

So, it can't hurt to update your Aliases right away after moving or renaming the target item. To update your Aliases, simply locate or reveal them in Finder, then select the Alias file and hit cmd+R to have it reveal its target. Should the target have been moved or renamed in the meantime, macOS will automatically update all the redundant information.

19 July 2017

APFS and fast catalog search

This is about FSCatalogSearch / searchfs support in macOS with the APFS file system.

Updated 3 Oct 2017: Find Any File 1.9 will support fast search on APFS on High Sierra (10.13) by using the searchfs function. Version 1.9 is currently in open beta, see the FAF web site.

Updated 25 July 2017: Clarified why FSCatalogSearch doesn't work on APFS, added issues about hard links and 64 bit CNID resolving.

Some background on FSCatalogSearch in general


Programs like EasyFind and my own Find Any File (FAF) are able to search for file names (as well as file dates, sizes and a few other rarely needed attributes) on disks in a quite fast manner by using a little-known function macOS offers.

This Carbon level function is known as CatSearch or (FS)CatalogSearch and has been around for more than 25 years. There's also a BSD level function called searchfs.

The advantage of this function is that it performs the search for names at the file system driver level, meaning that when you search for files containing ".png" in their name, the file system can look at the entire directory tree much faster, sort out the matches, and report only those to the program that initiated the search and then shows the results to the user.

Without this special function, the search program would have to start at the root of the disk, read each folder (directory) recursively, and then sort out the matches itself, which all takes much more computing time.

For example, a search on a disk with millions of files and folders on it would take only a few seconds with FSCatalogSearch, whereas a classic recursive search would take minutes.

Getting even more technical


Apple added the FSCatalogSearch function to Mac OS long ago, after introducing the HFS file system. This was supported by the fact that HFS did, unlike Windows' FAT, arrange the entire directory tree in one large file on the disk, with interlinked nodes that did not match the hierarchical folder structure. FSCatalogSearch would then iterate over the nodes in a most efficient way, not caring about the folder structure, thereby minimizing disk seek times, which was a significant factor in disk access before SSDs. This also meant that FSCatalogSearch would only work on volume formats that use a single (invisible) file for the entire directory tree, so it was never available for FAT disks, for instance. It would also be optimal for NTFS volumes, but since Apple never used NTFS other than to support reading from Bootcamp partitions, they never made the effort to add FSCatalogSearch to their NTFS file system driver.

What about APFS?


Now Apple is about to replace HFS(+) with APFS on macOS. And fans of EasyFind and FAF start wondering: Will I still be able to perform fast disk-wide file name searches the way I'm used to?

The good news is: The APFS file system code has support for the lower-level searchfs function, and that was apparently already added in 2016, for macOS 10.12. Which ultimately means: Yes, FAF and EasyFind can continue to provide fast search on APFS formatted disks, provided extra work is put into updating the apps accordingly.

However, there are still some issues:
  • The high-level FSCatalogSearch does not work on APFS. Both EasyFind and FAF rely on this function and therefore won't find files the fast way on APFS volumes right now. The reason for this is that APFS uses larger values (64 bit) than HFS+ for identifying the files, and the FSCatalogSearch function cannot handle those larger values. (rdar://33454922)
  • As of now (10.12.6, 10.13 beta 3), the searchfs function searches case-sensitively, not case-insensitively as it should. That means that searching for ".png" won't find files using ".PNG". I confirmed this with an Apple engineer - it's a known issue, just one with a low priority right now. So there's a chance that this will get resolved eventually, and I hope it'll be done before 10.13 is released. This issue may not get fixed for 10.12.x, though. We'll have to see what Apple does in this regard. (rdar://33455597)
  • Hard links can't be identified correctly - if there are multiple hard links to the same file, then searchfs can't currently tell them apart, and the results will all point to the same directory entry. (rdar://33473247)
  • searchfs() returns CNIDs (Catalog Node IDs, 64 bit wide) instead of paths to the found items. This requires resolving these IDs to the paths later. However, there is currently no documented API provided in macOS to do so. There's a hackish way around this, but that's not a proper solution. (rdar://33507188)

What this all means


Current versions of FAF and EasyFind can't fast search on APFS. They need to be rewritten using the searchfs API.

I will be working on a quick-fix version of FAF that'll add fast search on APFS and which I hope to release before 10.13 (High Sierra) is officially released. I have quite a few other improvements for FAF in the works (64 bit app, content search, icon view, server support etc.) which will have to wait so that I can get this APFS issue resolved ASAP.

05 July 2017

Recover lost BootCamp Windows partitions on a Mac

I have installed several Windows 7 and 10 versions on several of my Macs using Apple's Boot Camp feature.

Recently, I found that almost all of them have disappeared: I was not able to boot from them any more when I held down the option (alt, ⌥) key at startup - the Windows partitions would either not appear at all or not boot up.

The main reason in my case was that the MBR had been reset to a plain GUID (protective) entry, and my Windows versions do not like that, because they cannot handle the EFI / GUID partition info that the Mac prefers. Why did that even happen? Probably because of repartitioning operations I frequently perform on my disks - and Apple's Disk Utility is quite ignorant of the need to keep Windows bootable in this regard.

To fix that, I had to edit the MBR partition info again, in order to make the Windows NTFS partition visible to the Windows systems again.

In short, I used iBored to edit the partition layout of each disk that contains a BootCamp partition from something like this:


Into this:


Note that this reduces the size of the first partition (you could as well change its size to the minimum, which is 33), and adds a new partition with the start and size matching what you can inquire using the Partitions window (see Disk menu):

This modification makes the Windows partition available in the MBR, and after that, I can boot again from it. And it won't mess with macOS booting because that uses the GUID partition info which isn't getting modified by this procedure (for more info, read my older article on using BootCamp on a non-startup disk).

Let me know in the comments or by email if you find this interesting and need more instructions, and I'll see if I can improve this article.

08 April 2017

Adding fast external disks to various Mac models, with benchmarks

2nd Update on 9 Apr 17: See end of article
3rd Update on 18 Apr 17: Added more adapters

I wanted a fast external disk for my three Macs, which are now all 5 years or more old:
  • MacBook Pro Mid 2012 ("MacBookPro10,1"), 2.6 GHz Core i7
  • iMac 27" Mid 2011 ("iMac12,1"), 3.4 GHz Core i7
  • Mac Pro Late 2008 ("MacPro3,1"), 8 Core-Xeon 2.8 GHz

The Mac Pro has no Thunderbolt ports but I've installed a USB 3.0 PCI card from Inateck.

The iMac has Thunderbolt but only USB 2 built-in. I've added USB 3.0 support with the Elgato Thunderbolt 2 Dock (230 €).

I was interested to find out whether I should just go with a cheap USB 3 disk enclosure or use a significantly more expensive Thunderbolt enclosure.

The short answer is: If you have USB 3 ports on your Mac(s), you'll be fine with a USB 3 enclosure from a performance standpoint. But you won't get TRIM support, which is especially important if you plan to use the SSD as a faster boot volume - in that case, I recommend Thunderbolt instead.

For testing, I was using a new SanDisk Ultra II SSD 960GB (SATA III). I connected it to these adapters / enclosures:


Note: Even though the ICY BOX supports USB 3.1, which allows up to 10 Gbit/s, it connects to my Macs only at USB 3.0 speeds (5 Gbit/s) as they do not support 3.1. The provided USB cables are compatible with these Macs' "standard" USB ports.

The Thunderbolt enclosure by Delock requires an external power supply, and the AKiTiO comes with a second cable that needs to be plugged into a regular USB port to provide power to the disk. The USB adapters, on the other hand, neither come with nor need an extra power supply - they pull all the power needed for the disk from the USB port they're connected to.

Test Procedures


The testing was done in various ways, but I always only attempted to figure out the maximum possible transfer speed, such as for copying large single and unfragmented files, as that's the main use case for me (I often transfer entire disks, sector by sector, to another disk, e.g. for migration to a newer Mac, data analysis or quick backups).

For testing read performance, the easiest way is to use the dd command in Terminal.app:

sudo dd if=/dev/rdiskN of=/dev/null bs=512000

N has to be replaced by the disk number, which can be learned by using this command:

diskutil list

Note that this command lists the disk names without the leading "r", but when doing the test, the "r" is important for best speed results.

Write performance testing is a bit more difficult, because you need some data to copy from. You could use a file or another disk, but then you get a result that also involves the reading speed from the other disk. If that disk is an internal SSD, this may not affect the overall result too much, but in my case, it would, at least on the Mac Pro, despite having installed a Samsung SSD 850 EVO 1TB on a Velocity Solo x1 PCI card with a SATA III socket.

Copying with dd from /dev/zero may be a valid method for hard disks, but it is not entirely honest for SSDs: it does test the transfer speed over the USB or TB bus, but since SSDs treat empty (all-zero) sectors specially, by not writing them to their flash memory, the result would be cheating. I wanted to measure the true effective write speed, and therefore I'd need to write random (non-zero) data.

Copying from /dev/random is no solution either, because generating the random data is very slow. I ended up getting unrealistic write speeds around 10 MB/s.

Being the author of a versatile disk editor, iBored, I instead added a feature to it: Random data is generated once in memory and then written repeatedly to disk, to consecutive sectors. That gives me realistic results that also correlate fairly well with copying from /dev/zero (i.e. it's a bit slower, as expected). The version I used has not been officially released yet, but it's the 1.2b5 in the downloads. The Read/Write Speed Test commands are found in the Disk menu. If you try them yourself, be aware that this tool is not fool-proof: If you accidentally choose the wrong disk, you could erase your precious files! So be careful and always have an external backup, just in case.

Due to caching done by the SSD, I also had to make sure to write a large amount of data, to avoid reading and writing just through the faster cache.
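
The idea, as a simplified sketch (this is not iBored's actual code; the device path is a placeholder, writing to it destroys that disk's contents, and the program has to run as root):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

// Simplified sketch of the write-speed test idea: fill a buffer with random
// bytes once, then write it repeatedly to consecutive sectors of the raw
// device and measure the elapsed time.
// WARNING: writing to /dev/rdiskN destroys the data on that disk!
int main(void)
{
    const char  *device    = "/dev/rdisk9";               // placeholder - pick your test disk carefully!
    const size_t blockSize = 512 * 1024;                  // 512 KB per write call
    const size_t totalSize = 4ULL * 1024 * 1024 * 1024;   // 4 GB, enough to defeat the SSD's cache

    unsigned char *buf = malloc(blockSize);
    if (buf == NULL) return EXIT_FAILURE;
    arc4random_buf(buf, blockSize);                        // random data, generated only once

    int fd = open(device, O_WRONLY);                       // needs root privileges
    if (fd < 0) { perror(device); return EXIT_FAILURE; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t written = 0;
    while (written < totalSize) {
        ssize_t n = write(fd, buf, blockSize);             // consecutive sectors, no seeking
        if (n <= 0) { perror("write"); break; }
        written += (size_t)n;
    }
    close(fd);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("wrote %zu bytes in %.1f s = %.1f MB/s\n", written, secs, written / secs / 1e6);

    free(buf);
    return EXIT_SUCCESS;
}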

Performance Test Results



The graphic does not include the Inateck, Teorder and AKiTiO adapters, as I got them later. The Teorder performed similarly to the Delock 42486, and the AKiTiO similarly to the Delock 42510. The Inateck is about 15 MB/s faster than the ICY BOX on the MacBook (I didn't test it on the other Macs yet).

I also made a test on a PC with USB 3 ports, using iBored again. There, read speed was 240 MB/s with the Delock 42486 and the other USB3-SATA II adapter, and 320 MB/s with the ICY BOX. Write speed was only getting to 150 MB/s, though. That's odd, but a test with a regular Windows benchmark tool showed similar results.

Observations


  • The USB 3 performance with the Thunderbolt Dock is not as good as with the built-in USB 3 port on the MacBook Pro. That's not a weakness of the Dock, though, as far as I can tell, but a weakness of the iMac. I suspect its memory bus isn't up to it. When I connected the Dock to the MacBook Pro and then connected the ICY BOX to the Dock's USB 3 port instead of the MacBook Pro's port, I read at 330 MB/s and wrote at 300 MB/s, which is much closer to the built-in USB port.
  • The Delock 42486 USB 3 and the Teorder adapters are mislabelled, in my opinion: Both supposedly support SATA III (i.e. 6 Gbit/s) but perform like SATA II (3 Gbit/s) adapters.
  • My Delock 42486 has a problem where it reports incorrect power needs over USB, often just 24mA when it needs 10 times as much or more, and that causes intermittent problems with some Macs but not with others. Delock support has not been helpful in resolving this, insisting that it's not their fault and not even offering a replacement. This problem did not affect the performance testing, though.
  • The Thunderbolt adapters perform about as well as or better than a good USB 3 adapter.
  • All the USB adapters have trouble reporting disk information such as name and serial number. While most at least report most of the disk's name ("Ultra II 960GB"), the ICY BOX only reports "ICY BOX" for any disk attached to it, and the Inateck even just "2115". Only the Thunderbolt adapters are able to report the complete name ("Sandisk Ultra II 960GB") and the SSD's serial number to the Mac. These limitations of the USB adapters are minor, though, and will not affect performance or usability.
  • The Macs are also unable to report S.M.A.R.T. status for disks attached via USB - that's a known limitation of OS X / macOS. Again, with Thunderbolt there is no such issue.
  • UASP support may or may not be having a positive effect on performance. Obviously, the Teorder adapter, claiming UASP support, does not perform well at all. OTOH, the Inateck adapter is slightly faster than the ICY BOX, which may be due to UASP.

Conclusions


If you have a Mac that supports USB 3, there's hardly a need for the Delock Thunderbolt adapter.

The disadvantages of the Thunderbolt adapter (much higher price and the need for an extra power supply) are not offset by any advantages I can see.

The only exception I can think of is when you use a disk that requires more power than USB 3 can supply. Though, my MacBook Pro, in System Profiler, under USB, tells me it can supply 1800mA, while the ICY BOX adapter with the SSD installed only requests 224mA.

It remains to be seen whether faster disks (i.e. SSDs) are able to provide even more throughput, and whether that would give the Thunderbolt adapter an advantage. However, I believe that SATA III, which is used by all three adapters as the interface to the disk, is the bottleneck here, though there is still a bit of room to fill that out (nominally, SATA III can transmit about 500 MB/s, and supposedly the SanDisk SSD can achieve that, but I currently have no way to verify this).

(9 Apr 17) After considering TRIM support, I've changed my mind on this: The lack of TRIM support in the tested USB 3 adapters makes them less well suited for external SSDs than a Thunderbolt adapter, which invariably supports TRIM. See below for more details.

If you have a Mac without USB 3 but with Thunderbolt, you still have two choices:

  • Either purchase an AKiTiO Thunderbolt adapter at about 125 € (as of 12 April 2017 at Amazon). That's probably the cheapest option in the short run.
  • Or get the Thunderbolt 2 Dock from Elgato at about twice the price, plus a cheap USB 3 enclosure for your disk. The Dock has the advantage that it equips your Mac not only with three USB 3 ports but also with an HDMI port to connect a second monitor, an easier-to-access audio port (which apparently is also of higher quality than that of some Macs) and even a microphone port. It is more bulky and requires a power supply, too (and here it makes actual sense). That leaves you with slightly worse performance than with the Thunderbolt case, but in return your external disk can be used even with older Macs that have no Thunderbolt, and even with regular PCs.

1. Update on 9 Apr 17:

@felix_schwarz pointed out to me that there's an advanced protocol for external disks called UAS (aka UASP). UAS can provide better performance over USB (and, according to AKiTiO, apparently also over Thunderbolt), and Mac OS X has supported UAS since 10.8, according to Wikipedia.

Of the adapters I tried, some claim UASP support, others do not. The results are inconclusive, as one adapter claiming UASP (Teorder) was very slow, whereas another one with it (Inateck) was slightly faster than the one not claiming UASP support (ICY BOX).

I just learned that AKiTiO makes various Thunderbolt products of interest. There is a Thunderbolt Dock with USB 3 ports as well as an external Thunderbolt + USB SSD that supports UASP. AKiTiO claims that this gives over 500 MB/s effective read speed with their 1TB SSD model, though their benchmark makes me wonder why the effective write speed is below 200 MB/s. Maybe they're using an older type of SSD that is not up to it. The Neutrino model seems to be better in this regard, but it does not offer a 1 TB SSD version. Also, the price is quite high compared to the setups I described above, although actual prices seem to be significantly lower than the given MSRP. If you need maximum performance, these SSDs may still be worth considering. (Update 18 Apr 17: I've now received and tested the AKiTiO Thunder SATA Go adapter, see above.)

2. Update on 9 Apr 17:

I had so far overlooked one important property that affects SSDs: Trim support.

It turns out that the tested USB 3 adapters provide no TRIM support. It's unclear why that is - I suspect that it's simply lack of support for it in OS X, similar to the missing S.M.A.R.T. support.

Missing TRIM support will quickly lead to performance degradation of the installed SSDs.

If you only use hard disks with these adapters, that won't matter, but for SSDs, especially if they're used frequently for adding and deleting files, TRIM support is fairly significant to the performance. However, if the SSD is mainly used as a permanent backup device, e.g. as a Time Machine backup destination, TRIM support is not that important, even if files get replaced by TM once the disk is full.

Only the Thunderbolt adapters support TRIM. I've verified this by first checking in System Profiler whether the disk is even listed as "TRIM Support: Yes", and then by using the technique described in this Ars Technica article.

This realization inverts my earlier recommendation: Of the tested adapters, the ones with Thunderbolt are the best choice if you want lasting optimal performance, especially if you plan to boot from it regularly (which is a scenario for iMacs with only a HD installed, to speed up their system with the permanently connected SSD that'll host the system software). Of course, that still requires that you enable system-wide TRIM support in OS X / macOS, e.g. by using the "trimforce" command as explained in the Ars Technica article.



There is even an awkward work-around for missing TRIM support, in case you decide to go with an SSD and USB 3 regardless: Regularly fill the free space on the disk with a file that just contains zeros, and then delete the file again. That tells the SSD that this space is actually unused, and should make the disk perform better again for a while. This can be done using iBored (menu Tools, Erase Unused Disk Space...) or with this Terminal command:

cp /dev/zero "/Volumes/YourSSDsName"

Once the copy has finished, filling up the entire disk, delete the file again:


rm "/Volumes/YourSSDsName/zero"


(Update 18 Apr 17): BTW, I now learned how to issue the TRIM command (only for supporting adapters, i.e. Thunderbolt) and will soon add this feature to my disk tool iBored.