19 July 2017

APFS and fast catalog search

This is about FSCatalogSearch / searchfs support in macOS with the APFS file system.

Updated 25 July 2017: Clarified why FSCatalogSearch doesn't work on APFS; added issues about hard links and resolving 64-bit CNIDs.

Some background on FSCatalogSearch in general


Programs like EasyFind and my own Find Any File (FAF) can search disks for file names (as well as file dates, sizes and a few other, rarely needed attributes) quite fast by using a little-known function that macOS offers.

This Carbon-level function is known as CatSearch or (FS)CatalogSearch and has been around for more than 25 years. There's also a BSD-level function called searchfs.

The advantage of this function is that the name search is performed at the file system driver level. When you search for files containing ".png" in their name, the file system can scan the entire directory tree much faster, sort out the matches itself, and report only those to the program that initiated the search, which then shows the results to the user.

Without this special function, the search program would have to start at the root of the disk, read each folder (directory) recursively, and then sort out the matches itself, which all takes much more computing time.

For example, a search on a disk with millions of files and folders on it would take only a few seconds with FSCatalogSearch, whereas a classic recursive search would take minutes.

Getting even more technical


Apple added the FSCatalogSearch function to Mac OS long ago, after introducing the HFS file system. This was possible because HFS, unlike Windows' FAT, keeps the entire directory tree in one large file on the disk (the catalog), made of interlinked nodes that do not mirror the hierarchical folder structure. FSCatalogSearch could then iterate over these nodes in the most efficient order, ignoring the folder structure and thereby minimizing disk seek times, which was a significant factor in disk access before SSDs. It also meant that FSCatalogSearch could only work on volume formats that use a single (invisible) file for their entire directory tree, which is why it was never available for FAT disks, for instance. It would also have suited NTFS volumes, but since Apple never used NTFS other than to support reading from Boot Camp partitions, they never made the effort to add FSCatalogSearch to their NTFS file system driver.

What about APFS?


Now Apple is about to replace HFS(+) with APFS on macOS, and fans of EasyFind and FAF are starting to wonder: Will I still be able to perform fast disk-wide file name searches the way I'm used to?

The good news is: The APFS file system code supports the lower-level searchfs function, and that support was apparently added as early as 2016, for macOS 10.12. Which ultimately means: Yes, FAF and EasyFind can continue to provide fast search on APFS-formatted disks, provided the extra work is put into updating the apps accordingly.

However, there are still some issues:
  • The high-level FSCatalogSearch does not work on APFS. Both EasyFind and FAF rely on this function and therefore can't currently find files the fast way on APFS volumes. The reason is that APFS uses larger values (64 bit) than HFS+ (32 bit) to identify files, and FSCatalogSearch cannot handle these larger values. (rdar://33454922)
  • As of now (10.12.6, 10.13 beta 3), searchfs searches case-sensitively instead of case-insensitively as it should. That means that searching for ".png" won't find files ending in ".PNG". I confirmed with an Apple engineer that this is a known issue, just one with a low priority right now. So there's a chance that it will get resolved eventually, and I hope that happens before 10.13 is released. It may not get fixed for 10.12.x, though; we'll have to see what Apple does in this regard. (rdar://33455597)
  • Hard links can't be identified correctly: if there are multiple hard links to the same file, searchfs currently can't tell them apart, and the results all point to the same directory entry. (rdar://33473247)
  • searchfs() returns CNIDs (Catalog Node IDs, 64 bits wide) instead of paths to the found items, so these IDs have to be resolved to paths afterwards. However, macOS currently provides no documented API to do so. There's a hackish workaround, but that's not a proper solution. (rdar://33507188)


What this all means


Current versions of FAF and EasyFind can't perform fast searches on APFS. They need to be rewritten to use the searchfs API.

I will be working on a quick-fix version of FAF that adds fast search on APFS, which I hope to release before 10.13 (High Sierra) is officially released. I have quite a few other improvements for FAF in the works (64-bit app, content search, icon view, server support, etc.), which will have to wait so that I can get this APFS issue resolved ASAP.

10 comments:

  1. Thank you, keep up the good work!

  2. Thanks for the info. What about the /.vol/ directory, is it available on APFS?

    1. I could not figure out the IDs to use with the /.vol/ dir.

  3. Doesn't the traditional way work, using the volume ID and file ID reported by stat?
    Like:
    m12:XCode chris$ stat about_xcode_and_ios_sdk.pdf
    771751958 95058 -rw-r--r-- 1 chris staff 0 ...
    m12:XCode chris$ GetFileInfo /.vol/771751958/95058
    file: "/Volumes/Files/XCode/about_xcode_and_ios_sdk.pdf"
    ...
    Apple must somehow resolve their file reference CFURLs, so there is still hope.

    Btw., diskutil in 10.12.6 cannot create APFS volumes anymore, or am I too stupid? I have no 10.13 around to check this myself.

    The 10.12.6 diskutil does not know createContainer:

    chris$ diskutil APFS
    Usage: diskutil [quiet] ap[fs] <verb> <options>
    where <verb> is as follows:

    list (Show status of all current APFS Containers)
    deleteContainer (Delete an APFS Container and reformat old disks to HFS)
    deleteVolume (Remove an APFS Volume from its APFS Container)
    unlockVolume (Unlock an encrypted APFS Volume which is locked)

    diskutil apfs <verb> with no options will provide help on that verb

    1. For diskutil in 10.12.6 see https://eclecticlight.co/2017/07/24/sierra-isnt-finished-yet-apfs-support-is-undone-in-10-12-6/

    2. Okay, even if I use the correct IDs as shown in your example, I cannot figure out how to resolve the /.vol/ path to a normal path using BSD/POSIX functions. GetFileInfo probably uses Carbon, which isn't 64 bit ID capable.

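    3. #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/param.h>  /* MAXPATHLEN */

      /* Prints the full path of the file given as the only argument,
         e.g. a /.vol/<volume ID>/<file ID> path. F_GETPATH is macOS-specific. */
      int main(int ac, char **av) {
          int res;
          int fd;
          char buf[MAXPATHLEN];

          if (ac != 2) {
              fprintf(stderr, "usage: %s path\n", av[0]);
              return 1;
          }
          fprintf(stdout, "Testing FILE ...\n");
          if ((fd = open(av[1], O_RDONLY)) < 0) {
              perror(av[1]);
              return 1;
          }
          if ((res = fcntl(fd, F_GETPATH, buf)) < 0)
              perror("fcntl(FILE)");
          else
              fprintf(stdout, "FILE: path='%s' res = %d\n", buf, res);
          close(fd);
          return res < 0 ? 1 : 0;
      }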

      compile as fullpath and test:

      m12:~ chris$ stat Simpel.atg
      16777218 9849 -rwxr-xr-x 1 chris admin 0 1955833 ...
      m12:~ chris$ ./fullpath /.vol/16777218/9849
      Testing FILE ...
      FILE: path='/Volumes/Data/Users/chris/Simpel.atg' res = 0

      Works on 10.11.6 at least.

    4. Damn, the code got mangled by the blog software. Please tell me if it's OK to mail it to your Gmail address and I will do so.

    5. Ah, I didn't know about the fcntl call; that's hard to figure out if you don't know about it. I'm about to create an APFS volume with node IDs larger than 32 bits and will see if that works.

  4. Yes, everyone: Feel free to contact me via e-mail (see my About page)
