22 April 2019

Performance considerations when reading directories on macOS

(Latest update: 11 May 2019, see end of text)

I'm developing (and selling) a fairly popular file search program for the Mac called Find Any File, or just FAF.

It works differently from Spotlight, the Mac's primary search tool, in that it always scans the live file system instead of using a database. This makes it somewhat slower in many cases, but has the advantage that it looks at every file on the targeted disk (whereas Spotlight skips system files by default, for instance).

My primary goal is to make the search as fast as possible.

Fast search built into macOS


Until recently, this went quite well, because Mac disks (volumes) were formatted in HFS+ (aka Mac OS Extended), and Apple provides a special file search operation (CatalogSearch or searchfs) for these volumes, by which FAF could ask for the file name the user is looking for, and macOS would search the volume's directory on itself and only return the matching files. This is very fast.

Unfortunately, with Apple's new file system APFS, and the fact that any macOS running High Sierra or Mojave got their startup volume converted from HFS+ to APFS, search performance has decreased by factor 5 to 6! Where searching the entire startup disk for a file like "hosts" did take just 5 seconds on a fast Mac with a HFS volume, it now takes half a minute or more on APFS.

Besides, the old network file server protocol AFP does also support the fast search operation, but only on real Mac servers - some NAS systems pretend to support this as well, but my experience shows that this is very unreliable. The newer SMB protocol, OTOH, does not appear to support searchfs.

Searching the classic way


When the searchfs operation is not available, unreliable or inefficient, FAF falls back to looking at every directory entry itself, looking for matches to the search, then looking at the contents of every subdirectory, and so on. This is called a recursive search (whereas searchfs performs a flat search over all directory entries of a volume).

There are several ways to read these directories. I'll list the most interesting ones:

  • -[NSFileManager contentsOfDirectoryAtURL:includingPropertiesForKeys:options:error:]
  • opendir() & readdir_r()
  • getattrlistbulk()
  • fts_open() && fts_read()

The first is the standard high-level (Foundation) function. It lets you choose which attributes (besides the file name) it shall fetch alongside. This is useful if you want to look at the file sizes, for instance. If you let them fetch along, it'll cache them in the NSURL objects, thereby increase performance if you call [NSURL getResourceValue] later to read the value.

readdir() is a very old UNIX / POSIX function to read a directory's file names, and nothing else, one by one.

getattrlistbulk() is a special Mac BSD function that's an extension to the older getattrlist(). It is supposed to be optimized for faster reading, as it can fetch the entire contents of a directory at once, along with attributes such as file dates and sizes. [NSFileManager contentsOfDirectoryAtURL...] supposedly uses this function, thereby making use of its performance advantage.

fts_open() is a long-existing BSD or POSIX function that is specialized on traversing directory trees. I've added this only after the initial tests, i.e. its discussion is a bit more brief below.

Test methods


I've tried to find out which of the various methods of reading directories, looking only for file names, is the fastest: I had to scan the same directory tree with every method separately.

Testing performance is a bit difficult because macOS tends to cache recently read directories for a short while. For instance, the first time I scan a directory tree with a 100,000 items, it may take 10s, and when I run the same test again withing a few seconds, it'll take only 2s. If I wait half a minute, it may again take 10s. And if I'm searching on a file server, that server may also cache the information in RAM. For instance, my NAS, equipped with Hard Disks, will be rather loud the first time I search on it, due to the HDs performing lots of seeking, whereas a repeat search will make hardly any noise due to little to no seeking, which also increases the search performance.

Therefore, I performed the tests twice in succession: Once after freshly mounting the target volume (so that the cache was clear) and once again right after. This would give me both the worst and best case performances. I repeated this several times and averaged the results.

I also had to test on different media (e.g. fast SSD vs. slower HD) and formats (HFS+, APFS, NTFS) and network protocols (AFP vs. SMB, from both a NAS and another Mac) because they all behave quite differently.

The Xcode project I used for timing all three scanning functions can be downloaded here.

Test results


Most tests were performed on macOS 10.13.6. The NAS is a Synology DS213j with firmware DSM 6.2.1, connected over 1 GBit Ethernet, and both AFP and SMB tests were made on the same NAS directory. The Terminal cmd "smbutil statshares -a" indicates that the latest SMB3 protocol was used. The remote Mac ran macOS 10.14.4, and the targeted directory on it was on a HFS+ volume so that I could compare performance between AFP and SMB (APFS vols can't be shared over AFP). I also did a few tests on the 10.14.4 Mac, though I only recorded the best case results as the others were difficult to create (I'd have had to reboot between every test and I wasn't too keen on that).

I was expecting that contentsOfDirectoryAtURL would always be as fast as its low level version getattrlistbulk, whereas readdir would be slower as it wasn't optimized for this purpose. Surprisingly, this was not always the case. (I did not include the fts method when I did this run of tests - its results will instead be discussed in a separate chapter below).

The values show passed time in seconds for completing a search of a deep folder structure. The values are only comparable in each line, but not across lines, because the folder contents were different. The exception are the network volumes, where the same folders were used for AFP and SMB.

The green fields point out the best results. The red one points out an anomaly.


contentsOfDirectoryAtURL
getattrlistbulk
opendir/readdir

worst casebest caseworst casebest caseworst casebest case
HD HFS+12.42.812.42.312.42.35
SSD HFS+4.92.84.62.264.62.47
SSD APFS11.210.6106.88.63.2
SSD NTFS28628684.7
10.14 APFS

12

10

8
10.14 HFS+

4.1

3.8

4
NAS via AFP5.62.54.82.145.62.7
NAS via SMB1515171595.7
Mac via SMB4.45.46.85.56.55
Mac via AFP5.33.65.13.75.94.3


Observations

  • HD vs. SSD shows that the initial search takes much longer on HDs, which makes sense because HDs have a higher latency. Once the data is in the cache, though, both are equally fast (which makes sense as well).
  • contentsOfDirectoryAtURL and getattrlistbulk perform equally indeed, just as predicted, with the latter usually being a bit faster once the data comes from the cache.
  • On APFS, NTFS and SMB, readdir() is significantly faster than the other methods, which is quite surprising to me.
  • SMB performance is worse than AFP (regardless, Apple declared AFP obsolete) in nearly all cases.
  • When accessing a Mac via SMB, contentsOfDirectoryAtURL is faster than the other methods, but only on the first run (see red field). Once the caches have been filled, it's slower. I can't make sense of it, but it's a very consistent effect in my tests.

The fts functions


fts_open() / fts_read() are, in most cases, faster than readdir()contentsOfDirectoryAtURL and getattrlistbulk. Exceptions are network protocols, where especially the retrieval of additional attributes makes it slower than the other methods.

Fetching additional attributes


When extra attributes such as file dates or sizes, are needed during the scan, the timing of the various methods changes as follows:

  • For contentsOfDirectoryAtURL and getattrlistbulk, there is little impact if these extra attributes are requested with the function call.
  • For readdir(), fetching additional attributes (through lstat()) turns it into the slowest method.
  • The fts functions are the least affected by getting attributes that are also available through the lstat() function if a local file system is targeted. However, for network volumes via AFP, they become about 20% slower in my tests, whereas getattrlistbulk stays faster.

Differences between macOS versions


When searching the same volumes (both HFS+ and APFS) from Sierra (10.12.6), High Sierra (10.13.6) and Mojave (10.14.4), I measure a consistent worse performance on Mojave. Meaning that scanning directories got slower in 10.14 vs. 10.13, by about 15%.

Also, getting additional attributes 10.12,  compared to 10.13 and later, takes about twice as long, across all methods. Which could mean that something improved in 10.13 regarding fetching attributes.

Conclusion


It appears that for optimal performance, I need to implement several methods, and select them depending on which file system or protocol I talk to.

Here's my current list of fastest method per file system:

  • HFS+: Always fts
  • APFS: Always fts
  • AFP: Always getattrlistbulk
  • SMB: If not attributes needed: readdir, otherwise fts or getattrlistbulk

Update on 29 Apr 2019


When traversing a directory tree, one must take care not to cross over into other volumes, which can happen if you encounted mounted file systems in your path, such as when you parse "/" into "/Volumes".

The safe way to check for this is to determine the volume a folder is on before you dive into it. To identify volumes is to get their volume or device ID. One way is to call stat(), then check its st_dev value, another is to get the NSURLVolumeIdentifierKey. Or, in the case of fts_read, it's already provided - which adds to its superior efficiency.

My testing shows an unpleasant performance impact, though:

When traversing with contentsOfDirectoryAtURL, calling stat() is less efficient than getting the value for NSURLVolumeIdentifierKey. That makes sense, because the stat() fetches more data, and that could cause additional disk I/O.

OTOH, the file system layer should know the ID of the volume without the need to perform disk I/O.

Meaning, getting the value for NSURLVolumeIdentifierKey should cost no significant time at all, because the information is known to the upper file system level, before even passing the request on to the actual file system driver for the parcular volume. Therefore the value should be readily available at a much higher level - regardless, fetching this volume ID takes about as much time as getting an actual value from the lowest level, such as a file size or date.

However, when I add fetching this volume ID to every encountered file & folder, the scan time increases by over 30%. Fortunately, for the scanning, one only has to fetch this value for directories, not for files, which makes this have a smaller overall impact. Still, the performance of this could be better if Apple engineering would consider this, I believe. After all, identifying  the volume ID is needed by almost any directory scanner.

Update on 11 May 2019


When discussing my findings on an Apple forum (actually, on one of the few remaining Apple mailing lists), Jim Luther pointed out to try enumeratorAtURL. And, indeed, this function does better than any of the others, at least with my tests on local disks, both on HFS+ and APFS. Like fts_read, it takes care of staying on the same volume, so that I do not have to check the volume ID myself.

I have updated my test project with the use of this function.

Comments, concerns?


Feel free to download the Xcode project and run your own tests.

Comments are welcome here or on Twitter to me: @tempelorg