flock() Timings
Last Update: 2021-04-09
Testing the effects of file locking on I/O throughput.
> Part 1: Lots of characters
NOTE: All files are available here
This was an assignment for OS. The goal was to write to a file simultaneously from multiple threads and see how the operating system interleaved their writes and what effects locking the files might have on performance.
The first task was to write a million A's and a million B's to a file. This is pretty easy: open the file, loop a million times, close the file. The programs to write A's and B's are '1a.c' and '1b.c', respectively. To start these programs at nearly the same time, 'ps.c' was written to make sure that no instances of the A or B writers are still running from previous trials, then start both writers and time their individual runs.
Once I had a file with a lot of A's and B's in it, the next task was to determine the length of the runs of A's and B's in the file. The program 'consecutive.c' does just this: it takes a file name as an argument and reports each line at which the character[1] changes, along with how long the run of characters was before the change. Interestingly, the output shows the OS scheduler interleaving the file writes with remarkable consistency:
Change on line 65537 - B's: 2048
Change on line 67585 - A's: 2048
Change on line 69633 - B's: 2048
Change on line 71681 - A's: 2048
Change on line 73729 - B's: 2048
Change on line 75777 - A's: 2048
Change on line 77825 - B's: 2048
Change on line 79873 - A's: 2048
Change on line 81921 - B's: 2048
Change on line 83969 - A's: 2048
And here are the times from five runs of the 'ps.c' program:[2]
| Trial | A/B | Time (s) |
| ----- | --- | -------- |
| 1 | A | 0.10 |
| 1 | B | 0.10 |
| 2 | A | 0.12 |
| 2 | B | 0.12 |
| 3 | A | 0.10 |
| 3 | B | 0.10 |
| 4 | A | 0.12 |
| 4 | B | 0.12 |
| 5 | A | 0.09 |
| 5 | B | 0.09 |
| AVERAGE | | 0.106 |
> Part 2: I want my file!
The next bit was to introduce file locking into the loop that writes the characters to the output file. This introduces a loop inside the previous loop, so that a fixed number of characters (1000, to be exact) is written to the file between locking and unlocking it. Finally, to make sure the buffered characters were handed off to the OS before the lock was released, 'fflush()' was called after the completion of the inner loop, before unlocking the file. Here is the resulting loop (files '4a.c' and '4b.c'):
for (int i = 0; i < 1000; i++)
{
    flock(fileno(fp), LOCK_EX);     /* take an exclusive lock */
    for (int j = 0; j < 1000; j++)
        fprintf(fp, "A\n");
    fflush(fp);                     /* flush buffered writes before unlocking */
    flock(fileno(fp), LOCK_UN);
}
The enhanced programs, complete with explicit file locking, were run in a similar manner to the first set of character writers. Program 'ps4.c' takes care of timing and running these programs simultaneously.
And here are the times from five runs:
| Trial | A/B | Time (s) |
| ----- | --- | -------- |
| 1 | A | 0.25 |
| 1 | B | 0.23 |
| 2 | A | 0.22 |
| 2 | B | 0.12 |
| 3 | A | 0.12 |
| 3 | B | 0.22 |
| 4 | A | 0.13 |
| 4 | B | 0.22 |
| 5 | A | 0.22 |
| 5 | B | 0.12 |
| AVERAGE | | 0.185 |
> Part 3: How slow can we make this?
The last part was to vary the number of characters written during the inner loop. Groups of 1, 10, 100, 1000, and 10000 characters were tested both with and without the 'fflush()' call. This code is contained in files '5.c' and 'ps5.c'. These files are essentially the same as the '4x.c' and 'ps4.c' files, except that they accept arguments to tell the program the number of characters to write per lock/unlock and whether or not to flush the write cache before releasing the lock.
Here are the times from a bunch of runs with different options:
| Write Size | Use 'fflush()'? | A/B | Time (s) |
| ---------- | --------------- | --- | -------- |
| 1 | No | A | 8.62 |
| 1 | No | B | 8.62 |
| 1 | Yes | A | 30.95 |
| 1 | Yes | B | 28.78 |
| 10 | No | A | 1.07 |
| 10 | No | B | 1.07 |
| 10 | Yes | A | 3.29 |
| 10 | Yes | B | 3.28 |
| 100 | No | A | 0.19 |
| 100 | No | B | 0.32 |
| 100 | Yes | A | 0.31 |
| 100 | Yes | B | 0.54 |
| 1000 | No | A | 0.14 |
| 1000 | No | B | 0.26 |
| 1000 | Yes | A | 0.13 |
| 1000 | Yes | B | 0.25 |
| 10000 | No | A | 0.23 |
| 10000 | No | B | 0.11 |
| 10000 | Yes | A | 0.16 |
| 10000 | Yes | B | 0.29 |
Looking at the table above, it is clear that flushing the buffer results in a significant increase in the time to perform the writes when it happens frequently: with a write size of 1, adding 'fflush()' takes the run time from under 9 seconds to roughly 30.
The tests were repeated five times for each of the above scenarios. The raw data is available in the 'results.csv' file. Below is a graph of the averages of the runs:
Notes
1. I am using the term 'character' somewhat loosely throughout the post. In reality, each 'character' is composed of two actual characters: the 'A' (or 'B') and a newline.
2. All runs were completed on a Raspberry Pi 4B, running Raspberry Pi OS, with no overclocking. The disk being accessed is an mdadm RAID 1 set of two WD Green HDDs, connected via USB 3 to SATA interfaces (which have a peak throughput of about 119 MB/s).