The previous article about creating file containers was written in 2017. Having worked with multiple containers simultaneously for a long time since then, I found room for improvement. First of all, creating containers with dd was slow and not very precise. Secondly, read/write speeds were very slow. This article explains how to fix these issues.
Correctly sizing file containers
The examples in the previous article showed creating the containers using both dd's count and bs parameters. Creating a container of 1GB was done as follows:
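The original command isn't preserved here; a reconstruction of the count-based approach, assuming the container file is named container.img:

```shell
# Write 1000 blocks of 1 MiB each -- 1000 MiB in total, not a full GiB.
dd if=/dev/zero of=container.img bs=1M count=1000
```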
The problem is that this doesn't create a 1GB container, it creates a 1000MB container, but we want a gibibyte as defined by the IEC, a real GB. Of course, we could simply increase the count to 1024, but that means we have to use maths when creating containers of multiple GBs. And I'm too lazy for that.
Therefore, we'll use the seek option to help us with that:
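A sketch, again assuming container.img as the file name:

```shell
# count=0 writes no data; seek=1G moves the write offset to the 1 GiB mark,
# so dd simply sets the file size to exactly one gibibyte (1073741824 bytes).
dd if=/dev/zero of=container.img bs=1 count=0 seek=1G
```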
And voila, a container of exactly one real GB.
Faster creation of file containers
For the examples above, creating a 1GB container using count instead of seek is slower, but fast enough not to be a problem. However, creating a 100G container using count takes 5 minutes, whereas the seek method takes 3 milliseconds. In other words, the seek method is a huge amount faster. If you don't believe me, just test it for yourself:
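A quick comparison, scaled down to 1G so it finishes in seconds (file names are placeholders; use 100G to reproduce the difference described above):

```shell
# The count method writes every byte; the seek method only sets the size.
time dd if=/dev/zero of=slow.img bs=1M count=1024
time dd if=/dev/zero of=fast.img bs=1 count=0 seek=1G
```

Both files end up with the same apparent size; only the time spent differs.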
I must be honest though: the speed of dd increases when you tune the block size properly, but that brings us back to the maths.
truncate instead of dd
Perhaps all I learned about dd was in vain (except that dd is an excellent tool for imaging, corrupting or clearing disks), because then I came across truncate for creating containers:
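Assuming the same container.img file name:

```shell
# Create (or extend) a file to exactly 1 GiB; -s accepts K, M, G, ... suffixes.
truncate -s 1G container.img
```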
This really does exactly the same as the seek method from above: same speed, less to type.
Both truncate and dd's seek method create sparse files. This means the file exists and has a size boundary (apparent size), but it isn't actually filled with zeroes as it is with our first example, dd's count method. As a result we can create a file that's bigger than the actual free space, but more importantly, the file only grows with the amount of data in it (occupied space).
This difference can be made visible by comparing the file's apparent size with its actual disk usage:
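One way, assuming a sparse file named sparse.img, is to compare the apparent size reported by ls with the occupied space reported by du:

```shell
truncate -s 1G sparse.img
ls -lh sparse.img                 # apparent size: 1.0G
du -h sparse.img                  # actual disk usage: close to zero
du -h --apparent-size sparse.img  # apparent size again, via du
```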
When the file is read, the kernel adds the zeroes where needed. Therefore, sparse files save space, but add both CPU and IO overhead.
Improving on read/write speeds
Even though the read/write speeds are mostly dependent on the actual hardware in use, the best filesystem can be found by simply looping over them:
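The original benchmark script isn't reproduced here; a minimal sketch of such a loop, where the filesystem list, mount point and file names are my assumptions (mounting loop devices requires root):

```shell
#!/bin/bash
# For each candidate filesystem: create a sparse container, format it,
# mount it as a loop device and time a simple sequential write.
if [ "$(id -u)" -ne 0 ]; then
    echo "This benchmark needs root to mount loop devices." >&2
else
    filesystems="ext2 ext3 ext4"   # adjust to what your system supports

    for fs in $filesystems; do
        truncate -s 1G test.img            # sparse container, as above
        mkfs -t "$fs" -q -F test.img       # -F: format a regular file (mke2fs)
        mkdir -p /mnt/fstest
        mount -o loop test.img /mnt/fstest
        echo "== $fs =="
        dd if=/dev/zero of=/mnt/fstest/speedtest bs=1M count=256 conv=fsync 2>&1 | tail -n 1
        umount /mnt/fstest
        rm test.img                        # clean up before the next run
    done
fi
```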
Using this method, I found that bfs is the best candidate on an SSD, while ext2 works best on an old-fashioned HDD. Find out for yourself what's best on your setup: save the contents of the example above to a bash file, change the filesystems variable to whatever your system supports and run it.
The script uses sparse files, as described above. If you want to test using pre-allocated space, replace the truncate command with the line below:
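The original replacement line isn't shown here; one way to pre-allocate, using fallocate (dd with count, as in the first example, would work too):

```shell
# Reserve the full 1 GiB on disk up front instead of creating a sparse file.
fallocate -l 1G test.img
```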
In our findings, the speeds remained the same.
First of all, using sparse files created with truncate is a nice way to go, given you've got the CPU and IO to spare. Read/write speeds don't change when you use pre-allocated files instead of sparse files, but if you're using many sparse files at once, for example mounted as loop devices, they slow down quickly due to the required computational power.
Finding the best filesystem for your container to run on top of your configuration is a matter of trial and error. Simply use ext2 if you're unsure or too lazy to run the test. Use ext3 if you need journaling, which for containers you probably don't. Picking the right filesystem might also lower CPU and IO usage.
Long story short, if you've got the disk space, use regular pre-allocated files. If you've got CPU and IO but less disk space, use sparse files instead.
- Added clearer remarks about CPU and IO usage for sparse files. We found out the hard way that many sparse files mounted as loop devices require a lot of computation.