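Using the File System Events API

The File System Events API consists of several distinct groups of functions. Functions whose names begin with FSEvents let you obtain general information about volumes and events. Functions whose names begin with FSEventStream let you create a new event stream, perform operations on that stream, and so on.

The life cycle of a file system events stream is as follows: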

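  1. The application creates a stream by calling FSEventStreamCreate or FSEventStreamCreateRelativeToDevice.
  2. The application schedules the stream on a run loop by calling FSEventStreamScheduleWithRunLoop.
  3. The application tells the file system daemon to start sending events by calling FSEventStreamStart.
  4. The application services events as they arrive. The API posts events by calling the callback function specified in step 1.
  5. The application tells the daemon to stop sending events by calling FSEventStreamStop.
  6. If the application needs to restart the stream, go to step 3.
  7. The application unschedules the stream from its run loop by calling FSEventStreamUnscheduleFromRunLoop.
  8. The application invalidates the stream by calling FSEventStreamInvalidate.
  9. The application releases its reference to the stream by calling FSEventStreamRelease.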

The following sections explain these steps in more detail.

To use the file system event stream API, you must include the Core Services framework as follows:

#include <CoreServices/CoreServices.h>

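When compiling, you must add the Core Services framework to your target in Xcode, or add the flag -framework CoreServices to your linker flags on the command line or in a Makefile.

The File System Events API supports two types of event streams: per-disk event streams and per-host event streams. Before you create a stream, you must decide which type to create. FSEventStreamCreate creates a per-host stream, and FSEventStreamCreateRelativeToDevice creates a per-disk stream.

A per-host event stream consists of events whose IDs increase with respect to other events on that host. These IDs are guaranteed to be unique, with one exception: if additional disks are added that were used on another computer running OS X v10.5 or later, historical IDs may conflict between those volumes. Any new events automatically start after the highest historical ID on any attached disk.

A per-disk event stream, by contrast, consists of events whose IDs increase with respect to previous events on that disk. It has no relationship with events on other disks, so you must create a separate event stream for each physical device that you want to monitor.

In general, if you are writing software that requires persistence, you should use per-disk streams to avoid ID conflicts. Per-host streams, on the other hand, are best suited to monitoring changes in a directory or tree of directories during normal execution, such as watching a queue directory.

Because disks can be modified by computers running earlier versions of OS X (or potentially other operating systems), you should treat the event list as advisory rather than as a definitive list of all changes to the volume. If a disk is modified on a computer running an earlier version of OS X, the change log is discarded.

For example, backup software should still periodically perform a full scan of every volume to make sure that no changes have been missed.

If you are monitoring files on the root file system, both stream mechanisms behave similarly.

For example, the following snippet shows how to create an event stream: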

/* Define and create a CFArray object containing
   CFString objects for the paths to watch.
*/
CFStringRef mypath = CFSTR("/path/to/scan");
CFArrayRef pathsToWatch = CFArrayCreate(NULL, (const void **)&mypath, 1, NULL);
void *callbackInfo = NULL; // NULL here; a pointer to an FSEventStreamContext could carry stream-specific data.
FSEventStreamRef stream;
CFAbsoluteTime latency = 3.0; /* Latency in seconds */

/* Create the stream, passing in a callback */
stream = FSEventStreamCreate(NULL,
    &myCallbackFunction,
    callbackInfo,
    pathsToWatch,
    kFSEventStreamEventIdSinceNow, /* Or a previous event ID */
    latency,
    kFSEventStreamCreateFlagNone /* Flags explained in the function reference */
);

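Once you have created an event stream, you should schedule it on your application's run loop. To do this, call FSEventStreamScheduleWithRunLoop, passing in the newly created stream, a reference to your run loop, and a run loop mode. For more information about run loops, read Run Loops.

If you do not already have a run loop, you will need to devote a thread to this task. After creating a thread using your API of choice, call CFRunLoopGetCurrent to allocate an initial run loop for that thread. Any later calls to CFRunLoopGetCurrent return the same run loop.

For example, the following snippet shows how to schedule a stream, called stream, on the current thread's run loop (which is not yet running):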

FSEventStreamRef stream;
/* Create the stream before calling this function. */
FSEventStreamScheduleWithRunLoop(stream, CFRunLoopGetCurrent(), kCFRunLoopDefaultMode);

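The final step in setting up an event stream is to call FSEventStreamStart. This function tells the event stream to begin sending events. Its sole parameter is the event stream to start.

Once the event stream has been created and scheduled, if your run loop is not already running, you should start it by calling CFRunLoopRun.

Your event handler callback must conform to the prototype for FSEventStreamCallback. The parameters are described in the reference documentation for the FSEventStreamCallback data type.

Your event handler receives three lists: a list of paths, a list of identifiers, and a list of flags. In effect, these represent a list of events. The first event consists of the first entry taken from each of the arrays, and so on. Your handler must iterate through these lists, processing the events as needed.

For each event, you should scan the directory at the specified path, processing its contents as desired. Normally, you need to scan only the exact directory specified by the path. However, there are three situations in which this is not the case: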

  • If an event in a directory occurs at about the same time as one or more events in a subdirectory of that directory, the events may be coalesced into a single event. In this case, you will receive an event with the kFSEventStreamEventFlagMustScanSubDirs flag set. When you receive such an event, you must recursively rescan the path listed in the event. The additional changes are not necessarily in an immediate child of the listed path.
  • If a communication error occurs between the kernel and user space, you may receive an event with either the kFSEventStreamEventFlagKernelDropped or kFSEventStreamEventFlagUserDropped flag set. In either case, you must do a full scan of every directory that you are monitoring because there is no way to determine what may have changed.
    Note: When an event is dropped, the kFSEventStreamEventFlagMustScanSubDirs flag is also set. Thus, it is not necessary to explicitly check for the dropped event flags when determining whether to perform a full rescan of a path. The dropped event flags are provided purely for informational purposes.
  • If the root directory that you are watching is deleted, moved, or renamed (or if any of its parent directories are moved or renamed), the directory may cease to exist. If you care about this, you should pass the flag kFSEventStreamCreateFlagWatchRoot when creating the stream. In this case, you will receive an event with the flag kFSEventStreamEventFlagRootChanged and an event ID of zero (0). When you receive it, you must rescan the entire directory because it may not exist.
    If you need to figure out where the directory moved, you should open the root directory with open, then pass F_GETPATH to fcntl to find its current path. See the manual page for fcntl for more information.

If the number of events approaches 2^64, the event identifier will wrap around. When this happens, you will receive an event with the flag kFSEventStreamEventFlagEventIdsWrapped. Fortunately, at least in the near term, this is unlikely to occur in practice, as 64 bits allows enough room for about one event per eraser-sized region on the Earth’s surface (including water) and would require about 2000 exabytes (2 million million gigabytes) of storage to hold them all. However, you should still check for this flag and take appropriate action if you receive it.
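
The special cases above can be folded into one small decision function. The following is a minimal sketch, not code from the original guide: the rescan_kind type and the classify_event function are hypothetical names. The #else branch exists only so that the sketch builds on systems without Core Services, and its fallback values are assumed to mirror those declared in FSEvents.h.

```c
#include <stdint.h>

#ifdef __APPLE__
#include <CoreServices/CoreServices.h>
#else
/* Assumed fallback definitions so the sketch builds without Core Services;
   on OS X the real constants from FSEvents.h are used instead. */
typedef uint32_t FSEventStreamEventFlags;
enum {
    kFSEventStreamEventFlagMustScanSubDirs = 0x00000001,
    kFSEventStreamEventFlagUserDropped     = 0x00000002,
    kFSEventStreamEventFlagKernelDropped   = 0x00000004,
    kFSEventStreamEventFlagEventIdsWrapped = 0x00000008,
    kFSEventStreamEventFlagRootChanged     = 0x00000020
};
#endif

/* Hypothetical result type: how much work a single event calls for. */
typedef enum {
    SCAN_DIRECTORY_ONLY,  /* scan just the listed directory */
    SCAN_RECURSIVELY,     /* rescan the listed path and everything below it */
    SCAN_EVERYTHING,      /* events were dropped: rescan all watched paths */
    SCAN_ROOT_CHANGED,    /* watched root moved or vanished: locate it again */
    SCAN_IDS_WRAPPED      /* stored event IDs can no longer be compared */
} rescan_kind;

/* Maps the flags of one event to the most drastic action they require.
   Dropped events are tested before MustScanSubDirs because a dropped
   event carries both flags. */
rescan_kind classify_event(FSEventStreamEventFlags flags)
{
    if (flags & kFSEventStreamEventFlagEventIdsWrapped)
        return SCAN_IDS_WRAPPED;
    if (flags & kFSEventStreamEventFlagRootChanged)
        return SCAN_ROOT_CHANGED;
    if (flags & (kFSEventStreamEventFlagUserDropped |
                 kFSEventStreamEventFlagKernelDropped))
        return SCAN_EVERYTHING;
    if (flags & kFSEventStreamEventFlagMustScanSubDirs)
        return SCAN_RECURSIVELY;
    return SCAN_DIRECTORY_ONLY;
}
```

A handler could call classify_event(eventFlags[i]) for each event and switch on the result.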

As part of your handler, you may sometimes need to obtain a list of paths being watched by the current event stream. You can obtain that list by calling FSEventStreamCopyPathsBeingWatched.

Sometimes, you may wish to monitor where you are in the stream. You might, for example, choose to do less processing if your code is slipping significantly behind. You can find out the latest event included in the current batch of events by calling FSEventStreamGetLatestEventId (or by examining the last event in the list). You can then compare this with the value returned by FSEventsGetCurrentEventId, which returns the highest numbered event in the system.

For example, the following code snippet shows a very simple handler.

void mycallback(
    ConstFSEventStreamRef streamRef,
    void *clientCallBackInfo,
    size_t numEvents,
    void *eventPaths,
    const FSEventStreamEventFlags eventFlags[],
    const FSEventStreamEventId eventIds[])
{
    size_t i;
    /* Without kFSEventStreamCreateFlagUseCFTypes, eventPaths is a C array of C strings. */
    char **paths = eventPaths;

    for (i = 0; i < numEvents; i++) {
        /* Flags are 32-bit and IDs are 64-bit; the casts make the
           arguments match the format specifiers. */
        printf("Change %llu in %s, flags %lu\n",
            (unsigned long long)eventIds[i], paths[i], (unsigned long)eventFlags[i]);
    }
}

Note: If you passed the flag kFSEventStreamCreateFlagUseCFTypes when creating the stream, you should cast the eventPaths value to a CFArrayRef object.

Using Persistent Events

One of the most powerful features of file system events is their persistence across reboots. This means that your application can easily find out what happened since a particular time or a particular event in the distant past. By doing so, you can find out what files have been modified even when your application is not running. This can greatly simplify tasks such as backing up modified files, checking for changed dependencies in multi-file projects, and so on.

To work with persistent events, your application should regularly store the last event ID that it processes. Then, when it needs to go back and see what files have changed, it only needs to look at events that occurred after the last known event. To obtain all events since a particular event in the past, you pass the event ID in the sinceWhen argument to FSEventStreamCreate or FSEventStreamCreateRelativeToDevice.
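
Storing and reloading the last processed event ID might look like the following minimal sketch. The function names and the one-line text file format are assumptions made for illustration, and uint64_t stands in for FSEventStreamEventId, which is a 64-bit unsigned integer.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the last processed event ID to a small text file.
   Returns false if the file could not be written. */
bool save_last_event_id(const char *file, uint64_t event_id)
{
    FILE *f = fopen(file, "w");
    if (f == NULL)
        return false;
    bool ok = fprintf(f, "%" PRIu64 "\n", event_id) > 0;
    return fclose(f) == 0 && ok;
}

/* Reads the stored event ID. Returns fallback if the file is missing
   or does not contain a number. */
uint64_t load_last_event_id(const char *file, uint64_t fallback)
{
    uint64_t event_id = fallback;
    FILE *f = fopen(file, "r");
    if (f == NULL)
        return fallback;
    if (fscanf(f, "%" SCNu64, &event_id) != 1)
        event_id = fallback;
    fclose(f);
    return event_id;
}
```

On OS X, the value returned by load_last_event_id would be passed as the sinceWhen argument, with kFSEventStreamEventIdSinceNow as a natural fallback on the first run.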

On a per-device basis, you can also easily use a time stamp to determine which events to include. To do this, you must first call FSEventsGetLastEventIdForDeviceBeforeTime to obtain the last event ID for that device prior to the specified time stamp. You then pass the resulting value to FSEventStreamCreateRelativeToDevice. This is described further in Special Considerations for Per-Device Streams.

When working with persistent events, a commonly-used technique is to combine file system event notifications with a cached “snapshot” of the metadata of files within the tree. This process is described further in Building a Directory Hierarchy Snapshot.

Building a Directory Hierarchy Snapshot

File system events tell you that something in a given directory changed. In some cases, this is sufficient—for example, if your application is a print or mail spooler, all it needs to know is that a file has been added to the directory.

In some cases, however, this is not enough, and you need to know precisely what changed within the directory. The simplest way to solve this problem is to take a snapshot of the directory hierarchy, storing your own copy of the state of the system at a given point in time. You might, for example, store a list of filenames and last modified dates, thus allowing you to determine which files have been modified since the last time you performed a backup.

You do this by iterating through the hierarchy and building up a data structure of your choice. As you cache this metadata, if you see changes during the caching process, you can reread the directory or directories that changed to obtain an updated snapshot. Once you have a cached tree of metadata that accurately reflects the current state of the hierarchy you are concerned with, you can then determine what file or files changed within a directory or hierarchy (after a file system event notification) by comparing the current directory state with your snapshot.

Important: To avoid missing changes, you must start monitoring the directory before you start scanning it. Because of the inherently non-deterministic latency in any notification mechanism on a multitasking operating system, it may not always be obvious whether the action that triggered an event occurred before or after a nested subdirectory was scanned. To guarantee that no changes are lost, it is best to always rescan any subdirectory that is modified during scanning rather than taking a time stamp for each subdirectory and trying to compare those time stamps with event time stamps.

OS X provides a number of APIs that can make this easier. The scandir function returns an array of directory entries that you can quickly iterate through. This is somewhat easier than reading a directory manually with opendir, readdir, and so on, and is slightly more efficient since you will always iterate through the entire directory while caching anyway.

The binary tree functions tsearch, tfind, twalk, and tdelete can simplify working with large search trees. In particular, binary trees are an easy way of quickly finding the cached file information from a particular directory. The following code snippet demonstrates the proper way to call these functions:

Listing 2-1 Using the tsearch, tfind, twalk, and tdelete API.

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <dirent.h>
#include <sys/stat.h>
#include <string.h>
#include <search.h>

int array[] = { 1, 17, 2432, 645, 2456, 1234, 6543, 214, 3, 45, 34 };

/* Root of the tree; tsearch fills this in on the first insertion. */
void *dirtree;

/* Orders two keys; each argument points to an int. */
static int cmp(const void *a, const void *b)
{
  if (*(const int *)a < *(const int *)b) return -1;
  if (*(const int *)a > *(const int *)b) return 1;
  return 0;
}

void printtree(void);

/* Inserts the sample integers, deletes one, then looks each one up. */
int main(int argc, char *argv[])
{
  size_t i;
  for (i = 0; i < sizeof(array) / sizeof(array[0]); i++) {
      void *x = tsearch(&array[i], &dirtree, &cmp);
      printf("Inserted %p\n", x);
  }
  printtree();
  void *deleted_node = tdelete(&array[2], &dirtree, &cmp);
  printf("Deleted node %p with value %d (parent node contains %d)\n",
      deleted_node, array[2], **(int **)deleted_node);
  for (i = 0; i < sizeof(array) / sizeof(array[0]); i++) {
      void *node = tfind(&array[i], &dirtree, &cmp);
      if (node) {
          int **x = node;
          printf("Found %d (%d) at %p\n", array[i], **x, node);
      } else {
          printf("Not found: %d\n", array[i]);
      }
  }
  exit(0);
}

/* twalk callback: prints each value once, in sorted order. */
static void printme(const void *node, VISIT v, int k)
{
  const void *myvoid = *(void **)node;
  const int *myint = (const int *)myvoid;
  if (v != postorder && v != leaf) return;
  printf("%d\n", *myint);
}

void printtree(void)
{
  twalk(dirtree, &printme);
}

Two unusual design decisions in this API can make it tricky to use correctly if you haven’t used it before on other UNIX-based or UNIX-like operating systems:

  • The tsearch and tdelete functions take the address of the tree variable, not the tree variable itself. This is because they must modify the value stored in the tree variable when they create or delete the initial root node, respectively.
    Even though tfind does not modify the value of the root, it still takes the address of the root as its parameter, not the root pointer itself. A common mistake is to pass in the dirtree pointer. In fact, you must pass in &dirtree (the address of the dirtree pointer).
    Note: Despite the seeming consistency, the twalk function does not take the address of the root, so the ampersand is not needed, and indeed, will cause a crash if you use it.
  • The values passed to the callback by twalk and the values returned by tfind and tsearch are the address where the pointer to the data is stored, not the data value itself. Because this code passed in the address of an integer, it is necessary to dereference that value twice: once for the original address-of operator and once to dereference the pointer to that pointer that these functions return.
    Unlike the other functions, however, the function tdelete does not return an address within the tree where the data is stored. This is because the data is no longer stored in the tree. Instead, it returns the parent node of the node that it deleted.

The POSIX functions stat and lstat provide easy access to file metadata. These two functions differ in their treatment of symbolic links. The lstat function provides information about the link itself, while the stat function provides information about the file that the link points to. Generally speaking, when working with file system event notifications, you will probably want to use lstat, because changes to the underlying file will not result in a change notification for the directory containing the symbolic link to that file. However, if you are working with a controlled file structure in which symbolic links always point within your watched tree, you might have reason to use stat.
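
A per-file snapshot record built on lstat could be sketched as follows. The file_snapshot type and both function names are hypothetical, and the choice of size, modification time, and mode as the compared fields is an assumption; an application may need to track more.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical snapshot record for one file system object. */
typedef struct {
    off_t  size;
    time_t mtime;
    mode_t mode;
} file_snapshot;

/* Fills snap with metadata for path. Because this uses lstat, a symbolic
   link is described itself rather than the file it points to.
   Returns false if the path cannot be examined. */
bool take_snapshot(const char *path, file_snapshot *snap)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return false;
    snap->size = st.st_size;
    snap->mtime = st.st_mtime;
    snap->mode = st.st_mode;
    return true;
}

/* Returns true if path differs from the earlier snapshot or has vanished. */
bool has_changed(const char *path, const file_snapshot *old)
{
    file_snapshot now;
    if (!take_snapshot(path, &now))
        return true;
    return now.size != old->size ||
           now.mtime != old->mtime ||
           now.mode != old->mode;
}
```

After an event arrives for a directory, calling has_changed on each cached entry narrows the change down to individual files.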

For an example of a tool that builds a directory snapshot, see the Watcher sample code.

Cleaning Up

When you no longer need a file system event stream, you should always clean up the stream to avoid leaking memory and descriptors. Before cleaning up, however, you must first stop the event stream by calling FSEventStreamStop.

Next, you should call FSEventStreamInvalidate. This function unschedules the stream from all run loops with a single call. If you need to unschedule it from only a single run loop, or if you need to move the event stream between two run loops, you should instead call FSEventStreamUnscheduleFromRunLoop. You can then reschedule the event stream, if desired, by calling FSEventStreamScheduleWithRunLoop.

Once you have invalidated the event stream, you can release it by calling FSEventStreamRelease. When the stream's retain and release calls balance and nothing retains the stream any longer, the stream is freed.

There are three other cleanup-related functions that you should be aware of under certain circumstances. If your application needs to make certain that the file system has reached a steady state prior to cleaning up the stream, you may find it useful to flush the stream. You can do this with one of two functions: FSEventStreamFlushAsync and FSEventStreamFlushSync.

When flushing events, the synchronous call will not return until all pending events are flushed. The asynchronous call will return immediately, and will return the event ID (of type FSEventStreamEventId) of the last event pending. You can then use this value in your callback function to determine when the last event has been processed, if desired.

The final function related to cleaning up is FSEventsPurgeEventsForDeviceUpToEventId. This function can only be called by the root user because it destroys the historical record of events on a volume prior to a given event ID. As a general rule, you should never call this function because you cannot safely assume that your application is the only consumer of event data.

If you are writing a specialized application (an enterprise backup solution, for example), it may be appropriate to call this function to trim the event record to some reasonable size to prevent it from growing arbitrarily large. You should do this only if the administrator explicitly requests this behavior, however, and you should always ask for confirmation (either before performing the operation or before enabling any rule that would cause it to be performed at a later time).

Special Considerations for Per-Device Streams

In addition to the considerations described in Handling Events, per-device streams (those created with FSEventStreamCreateRelativeToDevice) have some special characteristics that you should be aware of:

  • All paths are relative to the root of the volume that you are monitoring, not relative to the system root. This applies to both the path used when creating the stream and to any path that your callback receives as part of an event.
  • Device IDs may not remain the same across reboots (particularly with removable devices). It is your responsibility to ensure that the volume you are looking at is the right one by comparing the UUID.

In addition to the functions provided for systemwide streams, you can obtain the device ID (a dev_t value) of the device associated with a stream by calling FSEventStreamGetDeviceBeingWatched.

You can obtain the unique ID for a device by calling FSEventsCopyUUIDForDevice. If this unique ID is different than the one obtained from a previous run, this can mean many things. It could mean that the user has two volumes with the same name, that the user has reformatted the volume with the same name, or that the event IDs have been purged for the volume. In any of these cases, any previous events for the volume do not apply to this particular volume, but they may still be valid for another volume.

If you find that the UUID for a volume matches what was stored on a previous run, but the event ID is lower than the last version you stored, this may mean that the user restored a volume from a backup, or it may mean that the IDs have wrapped around or have been purged. In either case, any stored events you may have for the device are invalid.
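
The two checks described above can be combined into one decision, sketched below under stated assumptions. The history_state type and check_history function are hypothetical names. The UUIDs are assumed to have been converted to C strings already, for example from the CFUUIDRef returned by FSEventsCopyUUIDForDevice.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical verdict on event history stored during a previous run. */
typedef enum {
    HISTORY_USABLE,        /* same volume, and event IDs have not gone backward */
    HISTORY_OTHER_VOLUME,  /* UUID differs: stored events do not apply here */
    HISTORY_INVALID        /* UUID matches but the event ID is lower than stored */
} history_state;

/* Compares the UUID and event ID saved on a previous run with the
   values observed now. */
history_state check_history(const char *stored_uuid, const char *current_uuid,
                            uint64_t stored_event_id, uint64_t current_event_id)
{
    if (strcmp(stored_uuid, current_uuid) != 0)
        return HISTORY_OTHER_VOLUME;
    if (current_event_id < stored_event_id)
        return HISTORY_INVALID;
    return HISTORY_USABLE;
}
```

Any result other than HISTORY_USABLE would normally be followed by a full rescan of the volume.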

Finally, if you are using persistent events, you can also use the function FSEventsGetLastEventIdForDeviceBeforeTime to find the last event prior to a time stamp. This event ID is persistent, and can be particularly useful for performing incremental backups.

The time format used is a CFAbsoluteTime value, which is measured in seconds since January 1, 2001. For other timestamp formats, you must convert them to this format as follows:

  • If you are writing a Cocoa application, you should use an NSDate object to perform any conversions, then use CFDateGetAbsoluteTime to obtain the corresponding CFAbsoluteTime value. (You can transparently pass an NSDate object as a CFDateRef.)
  • If you are starting with a POSIX timestamp in a non-Cocoa application, you should subtract kCFAbsoluteTimeIntervalSince1970 from the value to convert to a CFAbsoluteTime value. Be sure to always use timestamps based on GMT.
  • If you are working with a legacy Carbon timestamp in a non-Cocoa application, you would subtract kCFAbsoluteTimeIntervalSince1904. Be sure to always use timestamps based on GMT.
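
The two subtractions can be sketched in portable C as follows. The function and macro names are hypothetical. The constants are written out numerically so that the sketch builds without Core Foundation, and double stands in for CFAbsoluteTime.

```c
#include <stdint.h>
#include <time.h>

/* Seconds from January 1, 1970 to January 1, 2001 (GMT); the value that
   Core Foundation exposes as kCFAbsoluteTimeIntervalSince1970. */
#define SECONDS_1970_TO_2001 978307200.0

/* Seconds from January 1, 1904 to January 1, 2001 (GMT); the value that
   Core Foundation exposes as kCFAbsoluteTimeIntervalSince1904. */
#define SECONDS_1904_TO_2001 3061152000.0

/* Converts a POSIX timestamp to seconds since January 1, 2001. */
double absolute_time_from_posix(time_t posix_seconds)
{
    return (double)posix_seconds - SECONDS_1970_TO_2001;
}

/* Converts a legacy Carbon timestamp (seconds since January 1, 1904)
   to seconds since January 1, 2001. */
double absolute_time_from_carbon(uint32_t carbon_seconds)
{
    return (double)carbon_seconds - SECONDS_1904_TO_2001;
}
```

On OS X, the result could then be passed to FSEventsGetLastEventIdForDeviceBeforeTime as its time argument.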

For more information about date and time types, you should read Date and Time Programming Guide for Core Foundation.

  • translation/working/翻訳5.txt
  • Last modified: 2016-08-24 13:46
  • by Decomo