The previous article about creating file containers was written in 2017. After working with multiple containers simultaneously for a long time, I found several things to improve. First of all, creating them with dd was slow and not very precise. Secondly, read/write speeds were very slow. This article explains how to fix these issues.
Correctly sizing file containers
The examples in the previous article showed creating the containers using both dd's count and bs parameters. Creating a container of 1GB was done as follows:
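A command of the following shape fits that description. It is only a sketch: the helper name make_container and the file name container.img are hypothetical, and bs=1M with count=1000 is an assumption about the earlier listing (GNU dd assumed).

```shell
# Hypothetical helper: pre-allocate a container of COUNT blocks of 1 MiB,
# filled with zeroes (GNU dd assumed; status=none silences the summary).
make_container() {
    dd if=/dev/zero of="$1" bs=1M count="$2" status=none
}

# make_container container.img 1000   # the count method as described
# make_container container.img 1024   # what a full 1 GiB would need

# The resulting sizes in bytes, without writing anything to disk:
mib=$((1024 * 1024))
echo "count=1000 gives $((1000 * mib)) bytes"
echo "count=1024 gives $((1024 * mib)) bytes"
```

The two dd calls are left commented out because each of them really writes about a gigabyte of zeroes.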
The problem is that this doesn't create a 1GB container; it creates a 1000MB container. What we want is a gibibyte (GiB) as defined by the IEC: a real GB of 1024MB. Of course, we could simply raise the count to 1024, but that means doing maths whenever we create containers of multiple GBs. And I'm too lazy for that.
Therefore, we'll use the seek option to help us with that:
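A minimal sketch of the seek approach, assuming GNU dd. The scratch directory and the file name container.img are hypothetical.

```shell
dir=$(mktemp -d)            # scratch directory for the example
img=$dir/container.img      # hypothetical file name

# Write nothing (count=0), but seek 1G blocks of 1 byte into the output.
# GNU dd then sets the file length to exactly that offset: 1024^3 bytes.
dd if=/dev/zero of="$img" bs=1 count=0 seek=1G status=none

size=$(stat -c %s "$img")
echo "$img is $size bytes"
rm -rf "$dir"
```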
And voila, a container of exactly one real GB.
Faster creation of file containers
For the examples above, creating a 1GB container using count instead of seek is slower, but fast enough not to be a problem. However, creating a 100G container takes 5 minutes with count against 3 milliseconds with seek. In other words, the seek method is vastly faster. If you don't believe me, just test it for yourself:
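One way to run such a comparison is sketched below, scaled down to 256 MiB so it finishes quickly. It assumes GNU dd, date and stat; the file names and the now_ms helper are hypothetical.

```shell
dir=$(mktemp -d)     # scratch directory; file names are hypothetical
blocks=256           # 256 blocks of 1 MiB; raise this to see the gap grow

# Hypothetical helper: current time in milliseconds (GNU date assumed).
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

t0=$(now_ms)
# count method: really writes every block of zeroes
dd if=/dev/zero of="$dir/count.img" bs=1M count="$blocks" status=none
t1=$(now_ms)
# seek method: writes nothing, only sets the file length
dd if=/dev/zero of="$dir/seek.img" bs=1M count=0 seek="$blocks" status=none
t2=$(now_ms)

count_size=$(stat -c %s "$dir/count.img")
seek_size=$(stat -c %s "$dir/seek.img")
echo "count: $count_size bytes in $((t1 - t0)) ms"
echo "seek:  $seek_size bytes in $((t2 - t1)) ms"
rm -rf "$dir"
```

Both files end up with the same apparent size; only the time needed to create them differs.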
I must be honest, though: dd gets faster when you tune the block size properly, but this brings us back to the maths.
Using truncate instead of dd
Perhaps all I learned about dd was in vain (except that dd is an excellent tool for imaging, corrupting or clearing disks), because I then came across truncate for creating containers:
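A minimal sketch, assuming truncate from GNU coreutils. The scratch directory and the file name container.img are hypothetical.

```shell
dir=$(mktemp -d)            # scratch directory for the example
img=$dir/container.img      # hypothetical file name

# -s sets the file size; the G suffix means GiB (1024^3 bytes).
# The file is created if it doesn't exist yet.
truncate -s 1G "$img"

size=$(stat -c %s "$img")
echo "$img is $size bytes"
rm -rf "$dir"
```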
This does exactly the same as the seek method from above: same speed, less to type.
Sparse files
Both the truncate
and dd's seek
method, create sparse files. It means the file exists and has a size boundary (apparent size), but the created file isn't actually filled up with zeroes as it is when using our first example, dd's count
method. It means we can create a file that's bigger than the actual free space, but moreover it means the file grows with the amount of data in it (occupied space).
This difference can be made visible using du versus ls:
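A sketch of such a comparison, assuming GNU coreutils and a filesystem that supports sparse files. The file name sparse.img is hypothetical.

```shell
dir=$(mktemp -d)            # scratch directory for the example
img=$dir/sparse.img         # hypothetical file name
truncate -s 1G "$img"

ls -lh "$img"                   # reports the apparent size
du -h "$img"                    # reports the space actually occupied
du -h --apparent-size "$img"    # du can report the apparent size as well

# The same two numbers in bytes, taken from stat:
apparent=$(stat -c %s "$img")
occupied=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))
echo "apparent=$apparent occupied=$occupied"
rm -rf "$dir"
```

On a fresh sparse file the occupied figure should be far below the apparent one, and it grows as data is written into the file.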
When the file is read, the kernel adds the zeroes where needed. Therefore, sparse files save space, but add both CPU and IO overhead.
Improving on read/write speeds
Even though the read/write speeds are mostly dependent on the actual hardware in use, the best filesystem for a given setup can be found by simply looping over the candidates and timing each one:
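A loop of this kind could look like the sketch below. It is an illustration under assumptions, not the script behind the results in this article: the names filesystems, img, mnt and bench_fs are hypothetical, it needs root for losetup and mount, and it only measures sequential dd throughput with GNU tools.

```shell
#!/usr/bin/env bash
# Sketch of a filesystem comparison loop. Needs root (losetup, mount).
# All names below (filesystems, img, mnt, bench_fs) are hypothetical.

filesystems="ext2 ext3 ext4"             # change to what your system supports
img=$(mktemp /tmp/fstest.XXXXXX)         # backing file for the container
mnt=$(mktemp -d /tmp/fstest.mnt.XXXXXX)  # temporary mount point

bench_fs() {
    fs=$1
    # Start from a fresh, empty 1 GiB sparse container.
    truncate -s 0 "$img" && truncate -s 1G "$img"
    dev=$(losetup --find --show "$img") || return 1
    if mkfs -t "$fs" "$dev" >/dev/null 2>&1 && mount "$dev" "$mnt"; then
        echo "== $fs: write =="
        # conv=fdatasync makes dd flush to disk before reporting the speed.
        dd if=/dev/zero of="$mnt/testfile" bs=1M count=256 conv=fdatasync 2>&1 | tail -n 1
        # Drop the page cache so the read below isn't served from memory.
        echo 3 > /proc/sys/vm/drop_caches
        echo "== $fs: read =="
        dd if="$mnt/testfile" of=/dev/null bs=1M 2>&1 | tail -n 1
        umount "$mnt"
    else
        echo "skipping $fs (mkfs or mount failed)" >&2
    fi
    losetup -d "$dev"
}

if [ "$(id -u)" -eq 0 ]; then
    for fs in $filesystems; do
        bench_fs "$fs" || echo "skipping $fs (no loop device available)" >&2
    done
else
    echo "run as root to execute the benchmark" >&2
fi
rm -rf "$img" "$mnt"
```

The last line of dd's summary contains the throughput, which is why only that line is printed for each run.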
Using this method, I found that bfs is the best candidate on an SSD, but ext2 works best on an old-fashioned HDD. Find out for yourself what's best on your setup: simply save the contents of the example above to a bash file, for example rwtest.sh, change the filesystems variable to what your system supports and run it using bash rwtest.sh.
The script uses sparse files, as described above. If you want to test using pre-allocated space, replace the truncate command with the line below:
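A pre-allocating replacement might look like this sketch, which falls back on dd's count method from the first example. The variable names are hypothetical, and the size is kept small here so it runs quickly.

```shell
img=$(mktemp)      # hypothetical container file
size_mib=64        # small for a quick try; use 1024 for a 1 GiB container

# Unlike truncate, this really writes size_mib blocks of zeroes to disk.
dd if=/dev/zero of="$img" bs=1M count="$size_mib" status=none

apparent=$(stat -c %s "$img")
occupied=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))
echo "apparent=$apparent occupied=$occupied"
rm -f "$img"
```

On most filesystems the occupied figure now roughly matches the apparent size, in contrast to a sparse file.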
In our tests, the speeds remained the same.
Conclusion
First of all, using sparse files created with truncate is a nice way to go, provided you've got the CPU and IO to spare. Read/write speeds don't change when you use pre-allocated files instead of sparse files, but if you use many sparse files at once, for example mounted as loop devices, they slow down quickly because of the computational power required.
Finding the best filesystem for your container on your configuration is a matter of trial and error. Simply use ext2 if you're unsure or too lazy to run the test. Use ext3 if you need journaling, which for containers you probably don't. Finding the right filesystem might also lower the CPU and IO usage.
Long story short, if you've got the disk space, use regular pre-allocated files. If you've got CPU and IO but less disk space, use sparse files instead.
Changelog
- 2021-02-07
- Added clearer remarks about CPU and IO usage for sparse files. We found out the hard way that many sparse files mounted as loop devices require a lot of computation.