22 April 2019

Performance considerations when reading directories on macOS

I'm developing (and selling) a fairly popular file search program for the Mac called Find Any File, or just FAF.

It works differently from Spotlight, the Mac's primary search tool, in that it always scans the live file system instead of using a database. This makes it somewhat slower in many cases, but has the advantage that it looks at every file on the targeted disk (whereas Spotlight skips system files by default, for instance).

My primary goal is to make the search as fast as possible.

Fast search built into macOS


Until recently, this went quite well, because Mac disks (volumes) were formatted as HFS+ (aka Mac OS Extended), and Apple provides a special file search operation (CatalogSearch or searchfs) for these volumes, by which FAF could ask for the file name the user is looking for, and macOS would search the volume's directory by itself and return only the matching files. This is very fast.

Unfortunately, with Apple's new file system APFS, and the fact that any Mac running High Sierra or Mojave got its startup volume converted from HFS+ to APFS, search performance has decreased by a factor of 5 to 6! Where searching the entire startup disk for a file like "hosts" took just 5 seconds on a fast Mac with an HFS+ volume, it now takes half a minute or more on APFS.

Besides, the old network file server protocol AFP also supports the fast search operation, but only on real Mac servers. Some NAS systems pretend to support this as well, but my experience shows that this is very unreliable. The newer SMB protocol, OTOH, does not appear to support searchfs at all.

Searching the classic way


When the searchfs operation is not available, unreliable or inefficient, FAF falls back to reading every directory entry itself, checking each entry for matches to the search, then descending into every subdirectory, and so on. This is called a recursive search (whereas searchfs performs a flat search over all directory entries of a volume).

There are several ways to read these directories. I'll list the most interesting ones:

  • -[NSFileManager contentsOfDirectoryAtURL:includingPropertiesForKeys:options:error:]
  • opendir() & readdir_r()
  • getattrlistbulk()
  • fts_open() & fts_read()

The first is the standard high-level (Foundation) method. It lets you choose which attributes (besides the file name) it shall fetch alongside. This is useful if you want to look at the file sizes, for instance. If you have them fetched along, they'll be cached in the NSURL objects, thereby increasing performance if you later call -[NSURL getResourceValue:forKey:error:] to read the values.

readdir() is a very old UNIX / POSIX function to read a directory's file names, and nothing else, one by one.
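For illustration, a scanner based on these calls might look like this minimal sketch (the function name is mine):

#include <dirent.h>
#include <stdio.h>

// Minimal sketch: print all names in one directory with opendir()/readdir().
void list_dir_posix(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        // d_type distinguishes directories (DT_DIR) from files on most
        // file systems, without requiring an extra stat() call
        printf("%s\n", entry->d_name);
    }
    closedir(dir);
}

A recursive search would simply call this function again for every entry whose d_type is DT_DIR.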

getattrlistbulk() is a special Mac BSD function that's an extension to the older getattrlist(). It is supposed to be optimized for faster reading, as it can fetch the entire contents of a directory at once, along with attributes such as file dates and sizes. [NSFileManager contentsOfDirectoryAtURL...] supposedly uses this function, thereby making use of its performance advantage.
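Here's a sketch of how it's used, following the pattern from its man page (error handling kept minimal, function name is mine):

#include <sys/attr.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

// Sketch: list the names in one directory via getattrlistbulk().
void list_dir_bulk(const char *path)
{
    int dirfd = open(path, O_RDONLY, 0);
    if (dirfd < 0)
        return;

    struct attrlist attrList;
    memset(&attrList, 0, sizeof(attrList));
    attrList.bitmapcount = ATTR_BIT_MAP_COUNT;
    attrList.commonattr  = ATTR_CMN_RETURNED_ATTRS | ATTR_CMN_NAME;

    char buf[64 * 1024];    // receives many packed entries per call
    for (;;) {
        int count = getattrlistbulk(dirfd, &attrList, buf, sizeof(buf), 0);
        if (count <= 0)
            break;          // 0 = all entries read, -1 = error
        char *entry = buf;
        for (int i = 0; i < count; i++) {
            uint32_t length = *(uint32_t *)entry;   // size of this entry
            // Skip the length word and the returned-attributes set to get
            // to the name, which is stored as an attrreference_t.
            attrreference_t *nameInfo = (attrreference_t *)
                (entry + sizeof(uint32_t) + sizeof(attribute_set_t));
            printf("%s\n", (char *)nameInfo + nameInfo->attr_dataoffset);
            entry += length;
        }
    }
    close(dirfd);
}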

fts_open() is a long-existing BSD / POSIX function that is specialized in traversing directory trees. I added it only after the initial tests, i.e. its discussion below is a bit more brief.
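A basic tree scan with fts looks like this sketch (again, the function name is mine):

#include <fts.h>
#include <stdio.h>

// Sketch: recursively print every entry under `root` using fts.
void scan_tree_fts(const char *root)
{
    char *const paths[] = { (char *)root, NULL };
    // FTS_NOSTAT skips the per-entry stat() call when only names are needed
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_NOSTAT, NULL);
    if (fts == NULL)
        return;
    FTSENT *ent;
    while ((ent = fts_read(fts)) != NULL) {
        if (ent->fts_info == FTS_DP)
            continue;       // post-order visit of a directory; already seen
        printf("%s\n", ent->fts_name);
    }
    fts_close(fts);
}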

Test methods


To find out which of the various methods of reading directories is fastest when looking only at file names, I scanned the same directory tree with each method separately.

Testing performance is a bit difficult because macOS tends to cache recently read directories for a short while. For instance, the first time I scan a directory tree with 100,000 items, it may take 10s, but when I run the same test again within a few seconds, it'll take only 2s. If I wait half a minute, it may again take 10s. And if I'm searching on a file server, that server may also cache the information in RAM. For instance, my NAS, equipped with hard disks, will be rather loud the first time I search on it, due to the HDs performing lots of seeking, whereas a repeat search makes hardly any noise due to little or no seeking, which also increases the search performance.

Therefore, I performed the tests twice in succession: Once after freshly mounting the target volume (so that the cache was clear) and once again right after. This would give me both the worst and best case performances. I repeated this several times and averaged the results.
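In case you want to reproduce this, the timing logic amounts to something like this sketch (scan_tree_fts is the sample function from above; any of the methods can be substituted):

#include <stdio.h>
#include <time.h>

extern void scan_tree_fts(const char *root);    // one of the methods under test

// Time one scan of the tree, in seconds.
static double time_scan(void (*scan)(const char *), const char *root)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    scan(root);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    // Run right after mounting the volume: the first scan gives the
    // worst case, the immediate repeat the best (cached) case.
    printf("worst case: %.2fs\n", time_scan(scan_tree_fts, "/Volumes/TestVol"));
    printf("best case:  %.2fs\n", time_scan(scan_tree_fts, "/Volumes/TestVol"));
    return 0;
}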

I also had to test on different media (e.g. fast SSD vs. slower HD) and formats (HFS+, APFS, NTFS) and network protocols (AFP vs. SMB, from both a NAS and another Mac) because they all behave quite differently.

The Xcode project I used for timing the scanning functions can be downloaded here.

Test results


Most tests were performed on macOS 10.13.6. The NAS is a Synology DS213j with firmware DSM 6.2.1, connected over 1 GBit Ethernet, and both the AFP and SMB tests were made on the same NAS directory. The Terminal command "smbutil statshares -a" indicates that the latest SMB3 protocol was used. The remote Mac ran macOS 10.14.4, and the targeted directory on it was on an HFS+ volume so that I could compare performance between AFP and SMB (APFS volumes can't be shared over AFP). I also did a few tests on the 10.14.4 Mac itself, though I only recorded the best case results, as the others were difficult to create (I'd have had to reboot between every test, and I wasn't too keen on that).

I was expecting that contentsOfDirectoryAtURL would always be as fast as its low-level counterpart getattrlistbulk, whereas readdir would be slower as it wasn't optimized for this purpose. Surprisingly, this was not always the case. (I did not include the fts method in this run of tests; its results are discussed in a separate chapter below.)

The values show elapsed time in seconds for completing a search of a deep folder structure. The values are only comparable within each line, not across lines, because the folder contents were different. The exceptions are the network volumes, where the same folders were used for AFP and SMB.

The anomalous value in the Mac via SMB row is marked with an asterisk; it is discussed under Observations below.


                 contentsOfDirectoryAtURL     getattrlistbulk          opendir/readdir
                 worst case   best case       worst case   best case   worst case   best case
HD HFS+          12.4         2.8             12.4         2.3         12.4         2.35
SSD HFS+         4.9          2.8             4.6          2.26        4.6          2.47
SSD APFS         11.2         10.6            10           6.8         8.6          3.2
SSD NTFS         28           6               28           6           8            4.7
10.14 APFS       -            12              -            10          -            8
10.14 HFS+       -            4.1             -            3.8         -            4
NAS via AFP      5.6          2.5             4.8          2.14        5.6          2.7
NAS via SMB      15           15              17           15          9            5.7
Mac via SMB      4.4          5.4 *           6.8          5.5         6.5          5
Mac via AFP      5.3          3.6             5.1          3.7         5.9          4.3

(A "-" means the worst case was not recorded; see above.)


Observations

  • HD vs. SSD shows that the initial search takes much longer on HDs, which makes sense because HDs have a higher latency. Once the data is in the cache, though, both are equally fast (which makes sense as well).
  • contentsOfDirectoryAtURL and getattrlistbulk do indeed perform about equally, as predicted, with the latter usually being a bit faster once the data comes from the cache.
  • On APFS, NTFS and SMB, readdir() is significantly faster than the other methods, which is quite surprising to me.
  • SMB performance is worse than AFP in nearly all cases (even though Apple has declared AFP obsolete).
  • When accessing a Mac via SMB, contentsOfDirectoryAtURL is faster than the other methods, but only on the first run (see red field). Once the caches have been filled, it's slower. I can't make sense of it, but it's a very consistent effect in my tests.

The fts functions


fts_open() / fts_read() are, in most cases, faster than readdir(), contentsOfDirectoryAtURL and getattrlistbulk. Exceptions are the network protocols, where especially the retrieval of additional attributes makes fts slower than the other methods.

Fetching additional attributes


When extra attributes, such as file dates or sizes, are needed during the scan, the timing of the various methods changes as follows:

  • For contentsOfDirectoryAtURL and getattrlistbulk, there is little impact if these extra attributes are requested with the function call.
  • For readdir(), fetching additional attributes (through lstat(), see the sketch after this list) turns it into the slowest method.
  • The fts functions are the least affected by getting attributes that are also available through the lstat() function if a local file system is targeted. However, for network volumes via AFP, they become about 20% slower in my tests, whereas getattrlistbulk stays faster.
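To illustrate the readdir() case, this is the extra per-entry work (a sketch; a real scanner would recurse instead of just printing):

#include <dirent.h>
#include <sys/stat.h>
#include <limits.h>
#include <stdio.h>

// Sketch: readdir() plus one lstat() syscall per entry to get the size.
void list_dir_with_sizes(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return;
    struct dirent *entry;
    char fullPath[PATH_MAX];
    while ((entry = readdir(dir)) != NULL) {
        snprintf(fullPath, sizeof(fullPath), "%s/%s", path, entry->d_name);
        struct stat st;
        if (lstat(fullPath, &st) == 0)      // the extra syscall per entry
            printf("%s: %lld bytes\n", entry->d_name, (long long)st.st_size);
    }
    closedir(dir);
}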

Differences between macOS versions


When searching the same volumes (both HFS+ and APFS) from Sierra (10.12.6), High Sierra (10.13.6) and Mojave (10.14.4), I measured consistently worse performance on Mojave: scanning directories got slower in 10.14 vs. 10.13, by about 15%.

Also, getting additional attributes on 10.12 takes about twice as long as on 10.13 and later, across all methods. This could mean that something improved in 10.13 regarding fetching attributes.

Conclusion


It appears that for optimal performance, I need to implement several methods, and select them depending on which file system or protocol I talk to.

Here's my current list of the fastest method per file system (a dispatch sketch follows the list):

  • HFS+: Always fts
  • APFS: Always fts
  • AFP: Always getattrlistbulk
  • SMB: If no attributes are needed: readdir; otherwise fts or getattrlistbulk
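In code, that selection could look like this minimal sketch, using statfs() to learn the file system type of the search root (the enum and function names are mine; the type-name strings are the ones macOS reports for these file systems):

#include <sys/param.h>
#include <sys/mount.h>
#include <string.h>

// Hypothetical dispatcher: pick a scan method from the fs type name.
typedef enum { SCAN_FTS, SCAN_GETATTRLISTBULK, SCAN_READDIR } ScanMethod;

ScanMethod pick_method(const char *path, int needAttributes)
{
    struct statfs fs;
    if (statfs(path, &fs) != 0)
        return SCAN_FTS;                                   // fallback
    if (strcmp(fs.f_fstypename, "hfs") == 0 ||
        strcmp(fs.f_fstypename, "apfs") == 0)
        return SCAN_FTS;                                   // local: always fts
    if (strcmp(fs.f_fstypename, "afpfs") == 0)
        return SCAN_GETATTRLISTBULK;                       // AFP: always bulk
    if (strcmp(fs.f_fstypename, "smbfs") == 0)
        return needAttributes ? SCAN_GETATTRLISTBULK : SCAN_READDIR;
    return SCAN_FTS;
}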

Comments, concerns?


Feel free to download the Xcode project and run your own tests.

Comments are welcome here or on Twitter: @tempelorg

10 August 2018

Locating and updating symlinks and Finder Aliases with FAF

Today I renamed one of my internal disks in my Mac Pro. I then realized that I had created a few symlinks to that volume, and those would now become invalid.

For example, if the disk used to be called "Data" and is now called "Backups", then symlinks I may have created would still point to "/Volumes/Data/..." but now need to point to the new name instead.

Since I knew that there would only be a handful of such symlink files on my other disks, I could easily update them by hand (using Terminal.app, with the "ln -s" command).
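For example, a symlink can be recreated with the new volume name like this (the paths are just illustrative; the "-f" flag replaces the existing link):

ln -sf "/Volumes/Backups/some/folder" "/path/to/the/symlink"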

All I needed to do was to find all those symlinks first, making sure I would not miss any.

With Find Any File, this is quite easy. Set up a search like this:


To get the "File Type Code" option, you need to hold down the option (alt) key before clicking the popup-menu. Searching for files of type code "slnk" will address symlinks, and nothing else.

This will then find all matching symlinks, which you can then reveal in the Finder and manually update accordingly.

Similarly, you can also find related Finder Aliases, by searching like this:


After renaming a disk, updating Finder Aliases pointing to that disk is usually not necessary, because Aliases use redundant information to locate moved and renamed files.

However, if you should ever copy all your content to a new (larger) disk, file by file, Finder Aliases won't work any more if the targeted files have also been moved or their disk has been renamed.

So, it can't hurt to update your Aliases right away after moving or renaming the target item. To update your Aliases, simply locate or reveal them in the Finder, then select the Alias file and hit cmd+R to have it reveal its target. Should the target have been moved or renamed in the meantime, macOS will automatically update all the redundant information.

19 July 2017

APFS and fast catalog search

This is about FSCatalogSearch / searchfs support in macOS with the APFS file system.

Updated 3 Oct 2017: Find Any File 1.9 will support fast search on APFS on High Sierra (10.13) by using the searchfs function. Version 1.9 is currently in open beta, see the FAF web site.

Updated 25 July 2017: Clarified why FSCatalogSearch doesn't work on APFS; added issues about hard links and 64 bit CNID resolving.

Some background on FSCatalogSearch in general


Programs like EasyFind and my own Find Any File (FAF) are able to search for file names (as well as file dates, sizes and a few other rarely needed attributes) on disks in a quite fast manner by using a little-known function macOS offers.

This Carbon level function is known as CatSearch or (FS)CatalogSearch and has been around for more than 25 years. There's also a BSD level function called searchfs.

The advantage of this function is that it performs the search for names at the file system driver level: when you search for files containing ".png" in their name, the file system can look at the entire directory tree much faster, sort out the matches, and report only those to the program that initiated the search and then shows the results to the user.

Without this special function, the search program would have to start at the root of the disk, read each folder (directory) recursively, and then sort out the matches itself, which all takes much more computing time.

For example, a search on a disk with millions of files and folders on it would take only a few seconds with FSCatalogSearch, whereas a classic recursive search would take minutes.

Getting even more technical


Apple added the FSCatalogSearch function to Mac OS long ago, after introducing the HFS file system. This was supported by the fact that HFS did, unlike Windows' FAT, arrange the entire directory tree in one large file on the disk, with interlinked nodes that did not match the hierarchical folder structure. FSCatalogSearch would then iterate over the nodes in a most efficient way, not caring about the folder structure, thereby minimizing disk seek times, which were a significant factor in disk access before SSDs.

This also meant that FSCatalogSearch would only work on volume formats that use a single (invisible) file for the entire directory tree; it was never available for FAT disks, for instance. It would also be a good fit for NTFS volumes, but since Apple never used NTFS other than to support reading from Bootcamp partitions, they never made the effort to add FSCatalogSearch support to their NTFS file system driver.

What about APFS?


Now Apple is about to replace HFS(+) with APFS on macOS. And fans of EasyFind and FAF start wondering: Will I still be able to perform fast disk-wide file name searches the way I'm used to?

The good news is: the APFS file system code has support for the lower-level searchfs function, and that support was apparently already added in 2016, for macOS 10.12. Which ultimately means: yes, FAF and EasyFind can continue to provide fast search on APFS formatted disks, provided extra work is put into updating the apps accordingly.
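To give an idea of what that work involves, here's a sketch of a partial-name search with searchfs, modeled on the example in the searchfs(2) man page (the function name is mine and error handling is minimal):

#include <sys/attr.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

// Sketch: print all items on the volume whose name contains `match`.
void search_volume(const char *volPath, const char *match)
{
    // The name pattern is packed like a returned attribute value:
    // an attrreference_t followed by the string data.
    struct { attrreference_t ref; char name[256]; } param;
    param.ref.attr_dataoffset = sizeof(attrreference_t);
    param.ref.attr_length = (u_int32_t)strlen(match) + 1;
    strlcpy(param.name, match, sizeof(param.name));

    struct attrlist returnAttrs = {
        .bitmapcount = ATTR_BIT_MAP_COUNT,
        .commonattr  = ATTR_CMN_NAME,       // we only want the names back
    };

    struct fssearchblock sb;
    memset(&sb, 0, sizeof(sb));
    char resultBuf[16 * 1024];
    sb.returnattrs         = &returnAttrs;
    sb.returnbuffer        = resultBuf;
    sb.returnbuffersize    = sizeof(resultBuf);
    sb.maxmatches          = 100;
    sb.searchparams1       = &param;
    sb.sizeofsearchparams1 = sizeof(attrreference_t) + param.ref.attr_length;
    sb.searchattrs.bitmapcount = ATTR_BIT_MAP_COUNT;
    sb.searchattrs.commonattr  = ATTR_CMN_NAME;   // match on the name

    unsigned long matches;
    struct searchstate state;
    unsigned int options = SRCHFS_START | SRCHFS_MATCHPARTIALNAMES |
                           SRCHFS_MATCHFILES | SRCHFS_MATCHDIRS;
    int err;
    do {
        matches = 0;
        err = searchfs(volPath, &sb, &matches, 0x08000103 /* UTF-8 */,
                       options, &state);
        if (err != 0)
            err = errno;
        options &= ~SRCHFS_START;   // subsequent calls continue the search
        char *entry = resultBuf;
        for (unsigned long i = 0; i < matches; i++) {
            uint32_t length = *(uint32_t *)entry;
            attrreference_t *nameInfo =
                (attrreference_t *)(entry + sizeof(uint32_t));
            printf("%s\n", (char *)nameInfo + nameInfo->attr_dataoffset);
            entry += length;
        }
    } while (err == EAGAIN);        // EAGAIN: buffer full, more matches follow
}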

However, there are still some issues:
  • The high-level FSCatalogSearch does not work on APFS. Both EasyFind and FAF rely on this function and therefore won't find files the fast way on APFS volumes right now. The reason for this is that APFS uses larger values (64 bit) than HFS+ for identifying the files, and the FSCatalogSearch function cannot handle those larger values. (rdar://33454922)
  • As of now (10.12.6, 10.13 beta 3), the searchfs function searches case-sensitively, not case-insensitively as it should. That means that searching for ".png" won't find files named ".PNG". I confirmed this with an Apple engineer - it's a known issue, just one with a low priority right now. So there's a chance that this will get resolved eventually, and I hope it'll be done before 10.13 is released. This issue may not get fixed for 10.12.x, though. We'll have to see what Apple does in this regard. (rdar://33455597)
  • Hard links can't be identified correctly - if there are multiple hard links to the same file, then searchfs can't currently tell them apart, and the results will all point to the same directory entry. (rdar://33473247)
  • searchfs() returns CNIDs (Catalog Node IDs, 64 bit wide) instead of paths to the found items. This requires resolving these IDs to the paths later. However, there is currently no documented API provided in macOS to do so. There's a hackish way around this, but that's not a proper solution. (rdar://33507188)

What this all means


Current versions of FAF and EasyFind can't fast search on APFS. They need to be rewritten using the searchfs API.

I will be working on a quick-fix version of FAF that'll add fast search on APFS and which I hope to release before 10.13 (High Sierra) is officially released. I have quite a few other improvements for FAF in the works (64 bit app, content search, icon view, server support etc.) which will have to wait so that I can get this APFS issue resolved ASAP.

05 July 2017

Recover lost BootCamp Windows partitions on a Mac

I have installed several Windows 7 and 10 versions on several of my Macs using Apple's Boot Camp feature.

Recently, I found that almost all of them have disappeared: I was not able to boot from them any more when I held down the option (alt, ⌥) key at startup - the Windows partitions would either not appear at all or not boot up.

The main reason in my case was that the MBR was reset to a plain GUID entry, and my Windows versions do not like that, because they cannot handle the EFI / GUID partition info that the Mac prefers. Why did that even happen? Probably from repartitioning operations I frequently perform on my disks - and Apple's Disk Utility is quite ignorant of the need to keep Windows bootable in this regard.

To fix that, I had to edit the MBR partition info again, in order to make the Windows NTFS partition visible again to the Windows systems.

In short, I used iBored to edit the partition layout of each disk that contains a BootCamp partition from something like this:


Into this:


Note that this reduces the size of the first partition (you could as well change its size to the minimum, which is 33), and adds a new partition with the start and size matching what you can inquire using the Partitions window (see Disk menu):

This modification makes the Windows partition available in the MBR, and after that, I can boot again from it. And it won't mess with macOS booting because that uses the GUID partition info which isn't getting modified by this procedure (for more info, read my older article on using BootCamp on a non-startup disk).

Let me know in the comments or by email if you find this interesting and need more instructions, and I'll see if I can improve this article.

08 April 2017

Adding fast external disks to various Mac models, with benchmarks

2nd Update on 9 Apr 17: See end of article
3rd Update on 18 Apr 17: Added more adapters

I wanted a fast external disk for my three Macs, which are now all 5 years or more old:
  • MacBook Pro Mid 2012 ("MacBookPro10,1"), 2.6 GHz Core i7
  • iMac 27" Mid 2011 ("iMac12,1"), 3.4 GHz Core i7
  • Mac Pro Late 2008 ("MacPro3,1"), 8 Core-Xeon 2.8 GHz

The Mac Pro has no Thunderbolt ports but I've installed a USB 3.0 PCI card from Inateck.

The iMac has Thunderbolt but only USB 2 built-in. I've added USB 3.0 support with the Elgato Thunderbolt 2 Dock (230 €).

I was interested to find out whether I should just go with a cheap USB 3 disk enclosure or use a significantly more expensive Thunderbolt enclosure.

The short answer is: If you have USB 3 ports on your Mac(s), you'll be fine with a USB 3 enclosure from a performance standpoint. But you won't get TRIM support, which is especially important if you plan to use the SSD as a faster boot volume - in that case, I recommend Thunderbolt instead.

For testing, I was using a new SanDisk Ultra II SSD 960GB (SATA III). I connected it to these adapters / enclosures:


Note: Even though the ICY BOX supports USB 3.1, which allows up to 10 Gbit/s, it connects to my Macs only at USB 3.0 speeds (5 Gbit/s) as they do not support 3.1. The provided USB cables are compatible with these Macs' "standard" USB ports.

The Thunderbolt enclosure by Delock requires an external power supply, and the AKiTiO comes with a second cable that needs to be plugged into a regular USB port to provide power to the disk. The USB adapters, on the other hand, neither have nor need an extra power supply: they pull all the needed power for the disk from the USB port they're connected to.

Test Procedures


The testing was done in various ways, but I always only attempted to figure out the maximum possible transfer speed, such as for copying large single and unfragmented files, as that's the main use case for me (I often transfer entire disks, sector by sector, to another disk, e.g. for migration to a newer Mac, data analysis or quick backups).

For testing read performance, the easiest way is to use the dd command in Terminal.app:

sudo dd if=/dev/rdiskN of=/dev/null bs=512000

N has to be replaced by the disk number, which can be learned by using this command:

diskutil list

Note that this command lists the disk names without the leading "r", but when doing the test, the "r" (which addresses the raw device) is important for best speed results.

Write performance testing is a bit more difficult, because you need some data to copy from. You could use a file or another disk, but then you get a result that also involves the reading speed from the other disk. If that disk is an internal SSD, this may not affect the overall result too much, but in my case, it would, at least on the Mac Pro, despite having installed a Samsung SSD 850 EVO 1TB on a Velocity Solo x1 PCI card with a SATA III socket.

Copying with dd from /dev/zero may be a valid method for hard disks, but is not entirely honest for SSDs, as that would indeed test the transfer speed over the USB or TB bus, but since SSDs treat empty sectors specially, by not writing them to their flash memory, that would be cheating. I wanted to measure the true effective write speed, and therefore I'd need to write random (non-zero) data.

Copying from /dev/random is no solution either, because generating the random data is very slow. I ended up with unrealistic write speeds of around 10 MB/s.

Being the author of a versatile disk editor, iBored, I have instead added a feature to it: random data is generated once in memory, and then written repeatedly to disk, to consecutive sectors. That gives me realistic results that also correlate fairly well to copying from /dev/zero (i.e. it's a bit slower, as expected). The version I used has not been officially released yet, but it's the 1.2b5 in the downloads. The Read/Write Speed Test commands are found in the Disk menu. If you try them yourself, be aware that this tool is not fool-proof: if you accidentally choose the wrong disk, you could erase your precious files! So be careful and always have an external backup, just in case.

Due to caching done by the SSD, I also had to make sure to write a large amount of data, to avoid reading and writing just through the faster cache.
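The core idea, reduced to a sketch (the names are mine, and this is not iBored's actual code; be aware that writing to a raw disk device destroys its contents):

#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

// Sketch: fill one buffer with random bytes once, then write it
// repeatedly to consecutive sectors of the raw device.
// WARNING: this overwrites the disk's contents!
int write_speed_test(const char *rawDevice, size_t totalMB)
{
    const size_t chunk = 512 * 1024;     // 512 KB per write, sector-aligned
    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1;
    arc4random_buf(buf, chunk);          // random data defeats SSD zero-detection
    int fd = open(rawDevice, O_WRONLY);  // e.g. "/dev/rdiskN", needs root
    if (fd < 0) { free(buf); return -1; }
    size_t total = totalMB * 1024 * 1024;
    for (size_t written = 0; written < total; written += chunk) {
        if (write(fd, buf, chunk) != (ssize_t)chunk)
            break;                       // device full or error
    }
    close(fd);
    free(buf);
    return 0;
}

Timing the loop (and dividing the amount written by the elapsed time) then gives the effective write speed.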

Performance Test Results



The graphic does not include the Inateck, Teorder and AKiTiO adapters, as I got them later. The Teorder performed similarly to the Delock 42486, and the AKiTiO similarly to the Delock 42510. The Inateck is about 15 MB/s faster than the ICY BOX on the MacBook (I didn't test it on the other Macs yet).

I also made a test on a PC with USB 3 ports, using iBored again. There, read speed was 240 MB/s with the Delock 42486 and the other USB3-SATA II adapter, and 320 MB/s with the ICY BOX. Write speed was only getting to 150 MB/s, though. That's odd, but a test with a regular Windows benchmark tool showed similar results.

Observations


  • The USB 3 performance with the Thunderbolt Dock is not as good as with the built-in USB 3 port on the MacBook Pro. That's not a weakness of the Dock, though, as far as I can tell, but a weakness of the iMac. I suspect its memory bus isn't up to it. When I connected the Dock to the MacBook Pro and then connected the ICY BOX to the Dock's USB 3 port instead of the MacBook Pro's port, I read at 330 MB/s and wrote at 300 MB/s, which is much closer to the built-in USB port.
  • The Delock 42486 USB 3 and the Teorder adapters are mislabelled in my opinion. Both supposedly support SATA III (i.e. 6Gbit/s) but perform like a SATA II (3Gbit/s) adapter.
  • My Delock 42486 has a problem where it reports incorrect power needs over USB, often just 24 mA when it needs ten times that or more, and that causes intermittent problems with some Macs but not with others. Delock support has not been helpful resolving this, insisting that it's not their fault, not even offering a replacement. This problem did not affect the performance testing, though.
  • The Thunderbolt adapters perform about as well as or better than a good USB 3 adapter.
  • All the USB adapters have trouble reporting disk information such as name and serial number. While most at least report most of the disk's name ("Ultra II 960GB"), the ICY BOX only reports "ICY BOX" for any disk attached to it, and the Inateck even just "2115". Only the Thunderbolt adapters are able to report the complete name ("Sandisk Ultra II 960GB") and the SSD's serial number to the Mac. These limitations of the USB adapters are minor, though, and will not affect performance or usability.
  • The Macs are also unable to report S.M.A.R.T. status for disks attached via USB - that's a known limitation of OS X / macOS. Again, with Thunderbolt there is no such issue.
  • UASP support may or may not have a positive effect on performance. Notably, the Teorder adapter, despite claiming UASP support, does not perform well at all. OTOH, the Inateck adapter is slightly faster than the ICY BOX, which may be due to UASP.

Conclusions


If you have a Mac that supports USB 3, there's hardly a need for the Delock Thunderbolt adapter.

The disadvantages of the Thunderbolt adapter (much higher price and having to use an extra power supply) have no advantages I can see.

The only exception I can think of is when you use a disk that requires more power than USB 3 can supply. Though my MacBook Pro, in System Profiler, under USB, tells me it can supply 1800mA, while the ICY BOX adapter with the SSD installed only requests 224mA.

It remains to be seen if faster disks (i.e. SSDs) are able to provide even more throughput, and if that would give the Thunderbolt adapter an advantage. However, I believe that SATA III, which is used by all three adapters as the interface to the disk, is the bottleneck here, though there is still a bit of room to fill that out (nominally, SATA III can transmit about 500 MB/s, and supposedly the SanDisk SSD can achieve that, but I currently have no way to verify this).

(9 Apr 17) After considering TRIM support, I've changed my mind on this: the lack of TRIM support in the tested USB 3 adapters makes them less well suited for external SSDs than a Thunderbolt adapter, which invariably supports TRIM. See below for more details.

If you have a Mac without USB 3 but with Thunderbolt, you still have two choices:

  • Either purchase an AKiTiO Thunderbolt adapter at about 125 € (as of 12 April 2017 at Amazon). That's probably the cheapest option in the short run.
  • Or get the Thunderbolt 2 Dock from Elgato at about twice the price, plus a cheap USB 3 enclosure for your disk. The Dock has the advantage that it equips your Mac not only with three USB 3 ports but also with an HDMI port to connect a second monitor, an easier-to-access audio port (which apparently is also higher-quality than that of some Macs) and even a microphone port. It is more bulky and requires a power supply, too (and here it makes actual sense). That leaves you with slightly worse performance than with the Thunderbolt case, but in return your external disk can be used even with older Macs that have no Thunderbolt, and even with regular PCs.

1. Update on 9 Apr 17:

@felix_schwarz pointed out to me that there's an advanced protocol for external disks called UAS (aka UASP). UAS can provide better performance over USB (and, according to AKiTiO, apparently also over Thunderbolt), and Mac OS X has supported UAS since 10.8, according to Wikipedia.

Of the adapters I tried, some claim UASP support, others do not. The results are inconclusive: one adapter claiming UASP (Teorder) was very slow, whereas another claiming it (Inateck) was slightly faster than one without a UASP claim (ICY BOX).

I just learned that AKiTiO makes various Thunderbolt products of interest. There is a Thunderbolt Dock with USB 3 ports as well as an external Thunderbolt + USB SSD that supports UASP. AKiTiO claims that this gives over 500 MB/s effective read speed with their 1TB SSD model, though their benchmark makes me wonder why the effective write speed is below 200 MB/s. Maybe they're using an older type of SSD that is not up to it. The Neutrino model seems to be better in this regard, but it does not offer a 1 TB SSD version. Also, the price is quite high compared to the setups I described above, although actual prices seem to be significantly lower than the given MSRP. If you need maximum performance, these SSDs may still be worth considering. (Update 18 Apr 17: I've now received and tested the AKiTiO Thunder SATA Go adapter, see above.)

2. Update on 9 Apr 17:

I had so far overlooked one important property that affects SSDs: Trim support.

It turns out that the tested USB 3 adapters provide no TRIM support. It's unclear why that is - I suspect that it's simply lack of support for it in OS X, similar to the missing S.M.A.R.T. support.

Missing TRIM support will quickly lead to performance degradation of the installed SSDs.

If you only use hard disks with these adapters, that won't matter, but for SSDs, especially if they're used frequently for adding and deleting files, TRIM support is fairly significant to the performance. However, if the SSD is mainly used as a permanent backup device, e.g. as a Time Machine backup destination, TRIM support is not that important, even if files get replaced by TM once the disk is full.

Only the Thunderbolt adapters support TRIM. I've verified this by first checking in System Profiler whether the disk is even listed as "TRIM Support: Yes", and then by using the technique described in this Ars Technica article.

This realization inverts my earlier recommendation: Of the tested adapters, the ones with Thunderbolt are the best choice if you want lasting optimal performance, especially if you plan to boot from it regularly (which is a scenario for iMacs with only a HD installed, to speed up their system with the permanently connected SSD that'll host the system software). Of course, that still requires that you enable system-wide TRIM support in OS X / macOS, e.g. by using the "trimforce" command as explained in the Ars Technica article.
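For reference, that command is simply the following (it asks for confirmation, since Apple gives no guarantees for third-party SSDs, and reboots the Mac afterwards):

sudo trimforce enable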



There is even an awkward work-around for missing TRIM support, in case you decide to go with an SSD and USB 3 regardless: Regularly fill the free space on the disk with a file that just contains zeros, and then delete the file again, as that'll tell the SSD that this space is actually unused, and should make the disk perform better again for a while. This can be done using iBored (menu Tools, Erase Unused Disk Space...) or with this Terminal command:

cp /dev/zero "/Volumes/YourSSDsName/zero"

Once the copy has finished, filling up the entire disk, delete the file again:


rm "/Volumes/YourSSDsName/zero"


(Update 18 Apr 17): BTW, I now learned how to issue the TRIM command (only for supporting adapters, i.e. Thunderbolt) and will soon add this feature to my disk tool iBored.

28 March 2017

Xojo: How to improve performance when using WeakRefs

If you're building tree structures with leaves (children) that shall keep a reference to the branch (ancestor, parent) they're attached to, the simplest (but not smart) solution is to store a direct reference to the parent object in the child object.

Imagine this node class:

class Person
  property name as String
  property parent as Person
  property children() as Person
end class

And then you'd have a Constructor like this:

sub Constructor (parentIn as Person, nameIn as String)
  self.parent = parentIn
  self.name = nameIn
end sub

and create new children like this:

sub AddAChildNamed (name as String)
  dim newChild as new Person (self, name)
  self.children.Append newChild
end sub

This is not optimal because you'd get circular references (the parent references its children, who in turn reference back their parent). And since Xojo uses ref-counting to keep its objects alive, and does not have a separate garbage collection task, removing all references from your main code to the top parent won't free all those Person objects, because they still keep referencing each other, thereby keeping themselves alive.

One solution would be to add a "FreeAllChildren" method that you'd have to call right before you are ready to release the entire tree, but that's not very elegant (it's the fastest solution, though, if you get it right).

A more natural and fool-proof way is to use weak references for the child-to-parent connections, like this:

class Person
  property name as String
  property parent as WeakRef // **changed**
  property children() as Person
end class

sub Constructor (parentIn as Person, nameIn as String)
  self.parent = new WeakRef (parentIn)
  self.name = nameIn
end sub

To access the actual parent, you'd use code like this to temporarily re-create a hard reference to the parent object:

  dim myParent as Person = Person (self.parent.Value)

With the above changes, you won't have circular references any more.

However, it's not very efficient. If your tree has many thousands of objects, you'll get as many WeakRef objects. And the downside of WeakRef is that it involves a search operation to locate the actual parent object when you use the WeakRef.Value function. And the more WeakRef objects you have, the more time it takes to look up these values.

Now consider this: Usually, with these parent-child relationships, there are more children to a parent than vice versa. In the above code, every child creates a WeakRef, often for the same Parent. And that's what we can optimize for:

Instead of having the child create its own WeakRef, let us have the parent provide it, as it'll remain constant.

Change the constructor to accept a WeakRef:

sub Constructor (parentRefIn as WeakRef, nameIn as String)
  self.parent = parentRefIn
  self.name = nameIn
end sub

add a new property to the class:

class Person
  property selfRef as WeakRef
  ...

and update the way a child gets added accordingly:

sub AddAChildNamed (name as String)
  if selfRef = nil then selfRef = new WeakRef (self)
  dim newChild as new Person (selfRef, name)
  self.children.Append newChild
end sub

And that's all.

TL;DR

If you use WeakRefs, and you create many WeakRefs for the same object, consider creating the WeakRef only once and caching it, to gain runtime performance.

24 March 2016

Using USB barcode scanners on Macs with Xojo

Barcode scanners translate 1D and 2D bar codes, such as Code128 and QR Code, into readable characters.

Usually, handheld bar code scanners with a USB interface simulate a keyboard by default. With that, they work out of the box: If you scan a code with such a reader, it'll send keyboard strokes to the computer.

This makes them easy to use, but it also causes some issues:
  • If you have a program that expects input from a scanner, you have to make sure the typed characters are properly caught by your program and don't go somewhere they don't belong. E.g., if the user accidentally switches to another program and then scans a bar code, not only will your program not get the code from the scanner, but the keystrokes may cause odd effects in the program that's active then. It's like your cat walking over your keyboard while you wonder what'll go wrong.
  • Even worse, it can pose a security threat. Hackers have been able to show scanners customized bar codes containing keystrokes that would let them remote-control computers the user was not supposed to access that way. Imagine a so-called "kiosk" terminal that's supposed to run only a particular application but gets quit by an alt-F4 or cmd-Q keystroke, followed by opening another program such as a Terminal to mess with the computer. In other words, using a bar code scanner in "keyboard emulation" mode opens the computer to all kinds of abuse that you may have wanted to prevent by not giving the user a keyboard and mouse.

One way to prevent these issues is to put the scanner into a mode where it does not emulate a keyboard any more. That mode is usually called "HID POS" or "USB OEM" mode. In such a mode, the software needs to run special code to communicate with the scanner directly. Another option would be to not use the scanner's USB interface but rather its serial interface (many scanners can be used with either) and then use a Serial-to-USB adapter to connect the serially operated scanner to the Mac or PC.

In case you want to use the USB HID POS mode with a scanner (e.g. GoDEX models), I've posted some details and a demo project on the Xojo forum a while ago: https://forum.xojo.com/24557

In the meantime, I've also managed to support scanners from Datalogic (Gryphon 44xx models) and similar scanners using the sparsely documented IBM / Toshiba "OEM" USB protocol. If you are interested in supporting those, contact me directly.

31 October 2015

Searching multiple Xojo projects

Finding that useful method or class in one of your numerous Xojo projects


If you've used Xojo (Real Studio) for a while, you'll probably have collected more than just one project.

And each of these projects contains unique code, and some of that may even be re-usable for other projects.

How do you keep track of all the code and methods you've written in the past? E.g, you do remember you've once written that nifty function to count the words in a string, but where is it?

I am going to show you several ways how you can find code across multiple projects.


Mac OS X only: Use Spotlight


For Spotlight to be able to search Xojo projects, it needs to be able to understand their content so that it can extract any text from it and add it to its searchable database. By default, textual project formats (.xojo_project, .xojo_xml_project, .xojo_script, .xml, .rbvcp, .rbs) are regarded as text files and are thus automatically scanned by Spotlight. However, it won't scan binary file formats such as .xojo_binary_project and .rbp - to make them usable by Spotlight, a so-called Spotlight Importer is needed.

Real Studio and early Xojo IDEs did include such an importer, but recent versions don't any more. However, as I wrote this importer originally, I've now made the effort and updated it for the latest Xojo file format (as of Xojo 2015r3). So, if you're on a Mac I suggest you install this latest importer for Xojo if you like to use Spotlight:


This Spotlight importer will remember all the class and function names of all your projects, along with any other text they contain.

Let's assume you have a method named "CountWords". If you enter "countwords" into the Spotlight search field, you should see the project file listed in the results.
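BTW, the same Spotlight index can also be queried from the Terminal, e.g.:

mdfind countwords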

However, the results may be cluttered with a lot of other types of files not related to Xojo.

So, here's how you tell Spotlight to only show Xojo project files:
  • Open the Spotlight Finder window (e.g. by pressing cmd+F in the Finder).
  • If you do not see a popup menu under the Search: row, click the [+] button on the right.
  • Click on the leftmost popup menu (which probably reads Kind) and choose Other...
  • A new sheet dialog appears in which you can select a search attribute.
  • Find the "File extension" attribute. Tick its checkbox under the In Menu column.
  • While you're at it, also find the "Xojo class" attribute and tick its checkbox as well.
  • Then click OK to dismiss the dialog.
Now, whenever you use this Find window, you can choose File extension from the popup menu to limit the found items to file types that your Real Studio projects use: rbp, rbbas, xojo_binary_project etc.

Furthermore, if you are looking for a particular class, you can choose the Xojo class attribute and enter the (partial) class name there, leaving the main search field empty. That way, you'll see all the projects that contain that class.


All platforms: Searching inside VCP and XML projects


If you are used to saving your projects in the textual VCP (.rbvcp or .xojo_project and related class files) or XML (.xml or .xojo_xml_project) format, then you may have success performing simple searches using Spotlight on OS X or Windows Search on Windows.

If you need more control over what files are found, so that you don't get lots of false results from non-RB files, here are a few 3rd party programs you could try:

OS X

  • TextWrangler is free and has a Multi-File search in which you can set up filters with the extensions you want to search.
  • EasyFind mainly finds files by names, but if you specify to search files ending in .rbbas, .xojo_class etc., then you can also search for their content. (I'd have liked to tell you that my program Find Any File could also be of help here, but it doesn't search file contents - yet.)

Windows

  • UltraFileSearch does a good job. You'd use the Wildcards search in the Files and Folders tab, searching for "*.xojo_*", and then enter the name of the function or code under the Containing Text tab.
Unfortunately, most of these methods (apart from TextWrangler), including Spotlight, won't show you the actual code snippets, and the Xojo IDE also won't open ".xojo_window" and similar files alone, so you'll have to find their main project file and double click that, then use the IDE again to search for the function name.


All platforms: Arbed


Arbed is a tool that was developed mainly for dealing with larger projects and the tasks around them. One of its many features lets you search inside project files within a folder and all its sub folders (and while some features require a license purchase, this is a free feature).

Contrary to Spotlight, Arbed runs on Windows and Linux as well, so you can use this even if you do not use a Mac.

The advantage of Arbed over the above methods is that Arbed can show you the results conveniently, so you don't have to re-enter the search in the IDE before you know if the results contain what you were searching for.

Launch Arbed and then either drop a folder onto its Search Multiple Projects box or choose Find on Disk... from the File menu:


You can then enter the search text, even as a Regex formula, and also choose the folder in which you want to search:


Arbed then lists all found projects:


Double clicking any of the found items opens a new window showing that project, along with a window listing all occurrences. Clicking on any of the occurrences will show you the related project item (class, function etc.):


Right-Clicking on an item in the project list gives you the option to reveal the file on disk or open it in the Xojo / Real Studio IDE.

21 September 2015

New: OS X Automator Action plugin for invoking Services

When I started adding Services to my apps (Find Any File, iClip), I realized that this was not enough for users who want to write workflows with Automator or AppleScript, because there seems to be no way to invoke Services from either Automator or AppleScript.

So I wrote an Automator Action that allows you to run any Service that operates on Text or Files & Folders:



Find instructions and the downloads on github (including Xcode source).

License is unrestricted, i.e. totally free.