Q: ActiveStorage DiskService directory sharding

Q: ActiveStorage DiskService directory sharding

Dominik Menke
Hello list,

I've noticed ActiveStorage::Service::DiskService does two levels of directory sharding for Blob pathnames:

$ tree storage/
storage/
+-- aa/
    +-- bb/
    |   +-- aabb00...
    +-- cc/
        +-- aacc11...

However, when it comes to variants, the sharding is circumvented:

$ tree storage/
storage/
+-- aa/
|   +-- bb/
|   |   +-- aabb00...
|   +-- cc/
|       +-- aacc11...
+-- va/
    +-- ri/
        +-- variants/
            +-- aabb00.../
            |   +-- <encoded variant file name>
            +-- aacc11.../
                +-- <encoded variant file name>
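
The two layouts above can be reproduced with a small sketch. It paraphrases the path logic of the 6-0-stable DiskService (first two and next two characters of the key become directory levels); passing the root as a parameter and both example keys are assumptions made here for illustration, not the library's exact code.

```ruby
# Sketch of the sharding logic, paraphrased from DiskService (6-0-stable).
# The real methods are instance methods; `root` as an argument is an assumption.
def folder_for(key)
  # Characters 0-1 and 2-3 of the key form the two shard levels.
  [key[0..1], key[2..3]].join("/")
end

def path_for(root, key)
  File.join(root, folder_for(key), key)
end

blob_key    = "aabb00example"                    # hypothetical blob key
variant_key = "variants/#{blob_key}/somedigest"  # hypothetical variant key

puts path_for("storage", blob_key)     # storage/aa/bb/aabb00example
puts path_for("storage", variant_key)  # storage/va/ri/variants/aabb00example/somedigest
```

Because the shard is taken from the start of the whole key, every key beginning with "variants/" ends up in the same va/ri/ directory.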


Should sharding exclude the variants/ key prefix? Maybe the directory layout could look like this:

$ tree storage/
storage/
+-- aa/
|   +-- bb/
|   |   +-- aabb00...
|   +-- cc/
|       +-- aacc11...
+-- variants/
    +-- aa/
        +-- bb/
        |   +-- aabb00.../
        |       +-- <encoded variant file name>
        +-- cc/
            +-- aacc11.../
                +-- <encoded variant file name>
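
One way the proposed layout could be computed is sketched below. This is purely hypothetical: `proposed_path_for`, `shard`, and the `VARIANT_PREFIX` constant are names invented for illustration and do not exist in ActiveStorage.

```ruby
# Hypothetical sketch of the proposed layout; not part of ActiveStorage.
VARIANT_PREFIX = "variants/"

def shard(key)
  # Same two-level scheme as today: characters 0-1 and 2-3.
  [key[0..1], key[2..3]].join("/")
end

def proposed_path_for(root, key)
  if key.start_with?(VARIANT_PREFIX)
    # Shard on the blob key that follows the prefix, not on the prefix itself.
    rest = key[VARIANT_PREFIX.length..-1]
    File.join(root, "variants", shard(rest), rest)
  else
    File.join(root, shard(key), key)
  end
end

puts proposed_path_for("storage", "aabb00example")
# storage/aa/bb/aabb00example
puts proposed_path_for("storage", "variants/aabb00example/somedigest")
# storage/variants/aa/bb/aabb00example/somedigest
```

Existing variant files would sit at the old va/ri/ location, so a change like this would need some migration or fallback lookup.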


This might not actually be a problem at all, as the limiting factor on ext4 file systems is the inode index, which allows directories to contain ~10 million entries with 32-character-long names (reference: https://medium.com/@hartator/benchmark-deep-directory-structure-vs-flat-directory-structure-to-store-millions-of-files-on-ext4-cac1000ca28); the Base58 blob id is only 24 characters long. I guess with ZFS it's even less of a problem.
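
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes keys are uniformly random over a 58-character alphabet and uses a made-up figure of 10 million blobs; `shard_directories` is a hypothetical helper.

```ruby
# Rough estimate, assuming uniformly random Base58 keys.
ALPHABET_SIZE = 58

def shard_directories(levels, chars_per_level: 2)
  # Number of distinct directories reachable after `levels` shard levels.
  ALPHABET_SIZE**(levels * chars_per_level)
end

blobs = 10_000_000  # hypothetical number of stored blobs

puts shard_directories(1)                          # 3364
puts shard_directories(2)                          # 11316496
puts blobs.fdiv(shard_directories(2)).round(2)     # 0.88 blobs per leaf directory on average
```

By contrast, with the current variant layout every blob that has variants adds one entry to the single va/ri/variants/ directory, so that directory could grow to as many entries as there are blobs.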

I was going to create an issue for this, but this might actually be expected/desired behaviour. I'm also not sure whether this would be a bug report or a feature request. I know changing this will cause some headache. I'm primarily looking for other opinions.

Kind Regards,
Dominik

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/6624225b-b14e-4143-82d2-c75a94e56c6c%40googlegroups.com.
Re: Q: ActiveStorage DiskService directory sharding

Josef Strzibny
At first glance, I like your layout better (it is what I would expect or design myself). However, I don't know the background on why it is the way it is.

On Tuesday, 27 August 2019 at 13:24:02 UTC+2, Dominik Menke wrote:
Hello list,

I've noticed ActiveStorage::Service::DiskService does two levels of directory sharding (https://github.com/rails/rails/blob/6-0-stable/activestorage/lib/active_storage/service/disk_service.rb#L143-L145) for Blob pathnames:

$ tree storage/
storage/
+-- aa/
    +-- bb/
    |   +-- aabb00...
    +-- cc/
        +-- aacc11...

However, when it comes to variants, the sharding is circumvented:

$ tree storage/
storage/
+-- aa/
|   +-- bb/
|   |   +-- aabb00...
|   +-- cc/
|       +-- aacc11...
+-- va/
    +-- ri/
        +-- variants/
            +-- aabb00.../
            |   +-- <encoded variant file name>
            +-- aacc11.../
                +-- <encoded variant file name>


Should sharding exclude the variants/ key prefix? Maybe the directory layout could look like this:

$ tree storage/
storage/
+-- aa/
|   +-- bb/
|   |   +-- aabb00...
|   +-- cc/
|       +-- aacc11...
+-- variants/
    +-- aa/
        +-- bb/
        |   +-- aabb00.../
        |       +-- <encoded variant file name>
        +-- cc/
            +-- aacc11.../
                +-- <encoded variant file name>


This might not actually be a problem at all, as the limiting factor on ext4 file systems is the inode index, which allows directories to contain ~10 million entries with 32-character-long names (reference: https://medium.com/@hartator/benchmark-deep-directory-structure-vs-flat-directory-structure-to-store-millions-of-files-on-ext4-cac1000ca28); the Base58 blob id is only 24 characters long. I guess with ZFS it's even less of a problem.

I was going to create an issue for this, but this might actually be expected/desired behaviour. I'm also not sure whether this would be a bug report or a feature request. I know changing this will cause some headache. I'm primarily looking for other opinions.

Kind Regards,
Dominik

--
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/af649317-8e22-4833-b1da-51d9b2d33029%40googlegroups.com.